
Pipeline run

5c415ce7-e9d4-4ca3-97c6-e28132bfcdbe

Pipeline LLM cost (USD)
API 1: $0.0035 · API 2: $0.0002 · API 3: $0.0000 · Total: $0.0038

Client output enrichment

v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description
role baseline loaded sources · ai_index: jd · nature_of_work: jd · tech_stack_maturity: jd
Nature of work · Data pipeline development
Operate Airflow DAGs and SQL/Spark pipelines to migrate and deprecate workflows, backfill and validate data, and keep data processing jobs reliable, performant, and version-controlled in Git.
"Use Apache Airflow to schedule, monitor, and automate data workflows."
Tech stack maturity
Mainstream Modern
Apache Airflow, Apache Spark, Git, and SQL are widely adopted, current data engineering tools that fit a mainstream modern stack rather than legacy or bleeding-edge.
AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)
0.00 / 5
· Title match
· Has AI skill
· AI skill (primary)
· AI skill (secondary)
· On AI team
· Builds AI products
vocab breakdown (legacy)
Assistants (×1):
Frameworks (×2):
Models / concepts (×3):
Evidence — skills matched in JD (10)
Apache Airflow · DAGs · SQL · Apache Spark · Git · Data Migration · Data Validation · Anomaly Detection · Data Governance · Data Security
Skill cluster (5 dimension groups, role-scoped)
Data Pipeline Orchestration
Apache Airflow
Data Quality and Reconciliation
Anomaly Detection
ETL and ELT Tooling
Apache Spark
Programming Languages for Data Work
SQL
Cross-cutting / unaligned
DAGs · Git · Data Migration · Data Validation · Data Governance · Data Security
KRA description
1. Workflow Deprecation: Plan and execute the deprecation of migrated workflows by evaluating current workflows' dependencies and consumption. Utilize tools and best practices to identify, mark, and communicate deprecated workflows to stakeholders.
2. Data Migration: Plan and execute data migration tasks to move data between different storage systems or formats. Ensure the accuracy and completeness of data during migration processes. Implement strategies to accelerate the pace of data migration by backfilling, validating, and making new data assets ready for use.
3. Data Validation: Define and implement data validation rules to ensure data accuracy, completeness, and reliability. Utilize data validation solutions and anomaly detection methods to monitor data quality.
4. Workflow Management: Use Apache Airflow to schedule, monitor, and automate data workflows. Develop and manage DAGs (Directed Acyclic Graphs) in Airflow to orchestrate complex data processing tasks.
5. Data Processing: Develop and maintain data processing scripts using SQL and Apache Spark. Optimize data processing for performance and efficiency.
6. Version Control: Use Git for version control, collaborating with the team to manage the codebase and track changes. Ensure best practices in code quality and repository management.
7. Continuous Improvement: Keep up to date with the latest developments in data engineering and related technologies. Continuously improve and refactor data pipelines, tooling, and processes to enhance performance and reliability.
  • Proficient in Git for version control and collaborative development.
  • Proficiency in SQL and experience with database technologies.
  • Experience in data pipeline tools such as Apache Airflow.
  • Strong knowledge of Apache Spark for data processing and transformation.
  • Experience with data migration and validation techniques.
  • Knowledge of data governance and security practices.
  • Strong problem-solving skills and the ability to work independently and in a team.
  • Ability to communicate with global team
  • Ability to work as a team in high performing environment.

Signals

Skill · data-engineer · 0.43
Alias · data-engineer · 1.00
KRA · data-engineer · 0.72
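
The Skill and KRA signals above can be reproduced from the raw stage3_signals values recorded for this run, under an assumption about how they are derived: the skill score equals matched_count / total_count (3 / 7), and the KRA score equals the mean of the three listed sentence similarities. The sketch below only checks that arithmetic. The function names are hypothetical and this is not the pipeline's actual scoring code.

```python
from statistics import mean


def skill_overlap_score(matched_count: int, total_count: int) -> float:
    """Assumed formula: matched_count / total_count as reported in stage3_signals."""
    if total_count == 0:
        return 0.0
    return matched_count / total_count


def kra_score(similarities: list[float]) -> float:
    """Assumed formula: plain mean of the listed sentence-to-KRA similarities."""
    return mean(similarities) if similarities else 0.0


# Values copied from this run's stage3_signals for data-engineer.
skill = skill_overlap_score(matched_count=3, total_count=7)
kra = kra_score([0.7516, 0.7061, 0.6936])

print(f"skill={skill:.4f} kra={kra:.4f}")  # skill=0.4286 kra=0.7171
```

The same mean also reproduces the 0.5315 reported for react-native-developer from its three similarities (0.7165, 0.4426, 0.4355), which is consistent with the assumption but does not prove it.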

Post-classification

Centroid updated · n=164
Alias collision log
New-role queue
New skills captured 5
New KRA captured

Captured for admin review

DAGs primary Data Engineer pending
Data Migration primary Data Engineer pending
Data Validation primary Data Engineer pending
Data Governance Data Engineer pending
Data Security Data Engineer pending
Status: completed · Created: 2026-05-27T14:22:49.498017Z · Updated: 2026-05-27T14:24:01.260927Z · API 3 duration: 19343 ms
Flow Current 3-step pipeline

1 POST /skills/extract-from-jd

2 POST /skills/extract-details

3 POST /skills/final-role-output

Role Chosen role & resolution

Data Engineer

CASE A

slug: data-engineer · id: 2 · source: db

Exact alias hit on data-engineer (1.0) — no other alias at this confidence; skill_top data-engineer 0.43 does not contradict

Resolution: in_db — role exists in library; skill↔dim and role↔dim links saved when applicable.
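
The resolution note above (one exact alias hit, no competing alias at the same confidence, top skill match not contradicting) can be read as a small decision rule. The sketch below is one possible reading, not the pipeline's implementation: `RoleHit`, `resolve_role`, and the interpretation of "does not contradict" as "the top skill-match role has the same slug" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoleHit:
    slug: str
    score: float


def resolve_role(
    alias_hits: list[RoleHit],
    skill_hits: list[RoleHit],
    exact: float = 1.0,
) -> Optional[str]:
    """Return the role slug when a single exact alias hit is uncontested.

    Assumed rule: exactly one role has an alias score at `exact`, and the
    top skill-match role (if any) is that same role. Otherwise return None
    so the caller can fall back to another resolution path.
    """
    exact_aliases = {h.slug for h in alias_hits if h.score >= exact}
    if len(exact_aliases) != 1:
        return None
    (chosen,) = exact_aliases
    if skill_hits:
        skill_top = max(skill_hits, key=lambda h: h.score)
        if skill_top.slug != chosen:
            return None  # top skill match points at a different role
    return chosen


# Values from this run: alias score 1.0, skill_top score 0.4286.
chosen = resolve_role(
    alias_hits=[RoleHit("data-engineer", 1.0)],
    skill_hits=[RoleHit("data-engineer", 0.4286)],
)
print(chosen)  # data-engineer
```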

New skills: 0
Skill↔dim saved: 0
Role↔dim saved: 0
Skipped: 0

Job description

JD


Data Engineer II Job Desc Deprecation Accelerator scope
We are looking for a Data Engineer who has working knowledge of building and
maintaining scalable data pipelines on-premises and on the cloud. This includes
understanding the input and output data sources, upstream downstream dependencies
and ensuring data quality. A key aspect of this role will be focusing on the deprecation
of migrated workflows and migration of workflows into new systems (if needed). The
ideal candidate should be experienced with tools and technologies such as Git, Apache
Airflow, Apache Spark, SQL, data migration, and data validation.
Key Responsibilities:
1. Workflow Deprecation
  • Plan and execute the deprecation of migrated workflows by evaluating current workflows' dependencies and consumption.
  • Utilize tools and best practices to identify, mark, and communicate deprecated workflows to stakeholders.

2. Data Migration
  • Plan and execute data migration tasks to move data between different storage systems or formats.
  • Ensure the accuracy and completeness of data during migration processes.
  • Implement strategies to accelerate the pace of data migration by backfilling, validating, and making new data assets ready for use.

3. Data Validation
  • Define and implement data validation rules to ensure data accuracy, completeness, and reliability.
  • Utilize data validation solutions and anomaly detection methods to monitor data quality.

4. Workflow Management
  • Use Apache Airflow to schedule, monitor, and automate data workflows.
  • Develop and manage DAGs (Directed Acyclic Graphs) in Airflow to orchestrate complex data processing tasks.

5. Data Processing
  • Develop and maintain data processing scripts using SQL and Apache Spark.
  • Optimize data processing for performance and efficiency.

6. Version Control
  • Use Git for version control, collaborating with the team to manage the codebase and track changes.
  • Ensure best practices in code quality and repository management.

7. Continuous Improvement
  • Keep up to date with the latest developments in data engineering and related technologies.
  • Continuously improve and refactor data pipelines, tooling, and processes to enhance performance and reliability.

Skills and Qualifications:
  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Proficient in Git for version control and collaborative development.
  • Proficiency in SQL and experience with database technologies.
  • Experience in data pipeline tools such as Apache Airflow.
  • Strong knowledge of Apache Spark for data processing and transformation.
  • Experience with data migration and validation techniques.
  • Knowledge of data governance and security practices.
  • Strong problem-solving skills and the ability to work independently and in a team.
  • Ability to communicate with global team
  • Ability to work as a team in high performing environment.

Skills from this JD

Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.

Apache Airflow · Primary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: Apache Airflow id=110 · apache-airflow

Aliases — catalog

  • Apache Airflow (CANONICAL) primary

Context tags (catalog)

CeleryExecutor DAG ETL KubernetesExecutor Sensors XCom backfill catchup cron data pipelines executor hooks operators scheduler task dependencies

Stored enrichment (catalog DB)

Category: Tool
Sub-category: Workflow Orchestration Tool
Vendor: Apache Software Foundation
License: apache_2
Year introduced: 2015
Confidence: 0.98
Version strategy: NOT_APPLICABLE

Maturity reasoning: Frequently listed in data engineering JDs and widely adopted for workflow orchestration; strong GitHub activity and managed offerings from AWS/GCP/Azure signal broad market demand.

Skill profile (library / DB)

Skill nature: TOOL
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 13
Sub-category id: 130
Extractable: True
Also category: False

Dimensions (API 2 worklist)

  • Data Pipeline Orchestration Catalog dimension db id 23

    Library dimension (catalog)

    Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Data Pipeline Orchestration
data-pipeline-orchestration
Existing dimension (library) · Role↔dimension saved

DAGs · Primary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category: Data Engineering Tools
Sub-category: general
Skill nature: CONCEPT
Volatility: MEDIUM
Typical lifespan: MULTI_YEAR
Version strategy: UNVERSIONED

SQL · Primary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: SQL id=101 · sql

Aliases — catalog

  • SQL (CANONICAL) primary

Context tags (catalog)

ACID CTE DDL DML ETL JOIN MySQL NoSQL OLAP ORM PostgreSQL SQL injection SQLite T-SQL data modeling data warehousing database normalization execution plan indexing joins normalization query optimization stored procedures subquery transaction isolation transaction management window functions

Stored enrichment (catalog DB)

Category: Language
Sub-category: Query Language
Vendor: ANSI
License: unknown
Year introduced: 1974
Confidence: 0.99
Version strategy: NOT_APPLICABLE

Maturity reasoning: SQL appears in a large share of data, backend, and analytics job descriptions and remains the default query language for PostgreSQL, MySQL, and cloud warehouses like Snowflake/BigQuery.

Skill profile (library / DB)

Skill nature: LANGUAGE
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 6
Sub-category id: 97
Extractable: True
Also category: False

Dimensions (API 2 worklist)

  • Pega Programming Languages & DSLs Catalog dimension db id 267

    Library dimension (catalog)

    Roles linked in library: Pega Developer

  • Programming Languages for Data Work Catalog dimension db id 21

    Library dimension (catalog)

    Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Pega Programming Languages & DSLs
pega-programming-languages-dsls
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Programming Languages for Data Work
programming-languages-for-data-work
Existing dimension (library) · Role↔dimension saved

Apache Spark · Primary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: Apache Spark id=1350 · apache-spark

Aliases — catalog

  • Apache Spark (CANONICAL)
  • apache spark 3 (VERSION)
  • spark (VERSION)
  • spark 3 (VERSION)
  • spark 3.x (VERSION)
  • spark3 (VERSION)

Context tags (catalog)

Apache Kafka Cluster Manager DAGScheduler Data Lake DataFrame ETL Hadoop MLlib Machine Learning PySpark RDD Scala Spark SQL Spark Streaming SparkSession

Stored enrichment (catalog DB)

Category: Framework
Sub-category: Distributed Data Processing Framework
Vendor: Apache Software Foundation
License: apache_2
Year introduced: 2010
Confidence: 0.94
Version strategy: SEPARATE_ENTITY
Version tag: 3.x

Maturity reasoning: Apache Spark appears in many data engineering JDs and remains a standard for distributed ETL/ELT; its GitHub and vendor ecosystem activity stay strong, with Databricks and cloud platforms still promoting it.

Skill profile (library / DB)

Skill nature: FRAMEWORK
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 5
Sub-category id: 1021
Extractable: True
Also category: False

Dimensions (API 2 worklist)

  • ETL and ELT Tooling Catalog dimension db id 24

    Library dimension (catalog)

    Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
ETL and ELT Tooling
etl-and-elt-tooling
Existing dimension (library) · Role↔dimension saved

Git · Primary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: Git id=1002 · git

Aliases — catalog

  • Git (CANONICAL)

Context tags (catalog)

CI/CD GitHub GitLab branching checkout clone commit fork merging pull request rebase remote repository stash versioning

Stored enrichment (catalog DB)

Category: Tool
Sub-category: Version Control Tool
Vendor: Linus Torvalds
License: gpl_v2
Year introduced: 2005
Confidence: 0.99
Version strategy: NOT_APPLICABLE

Maturity reasoning: Git is a hiring-pipeline staple: it appears in the vast majority of software engineering job descriptions and is the default VCS on GitHub/GitLab/Bitbucket.

Skill profile (library / DB)

Skill nature: TOOL
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 13
Sub-category id: 730
Extractable: True
Also category: False

Dimensions (API 2 worklist)

  • React Frontend Development Catalog dimension db id 96

    Library dimension (catalog)

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
React Frontend Development
d_init_01
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Data Migration · Primary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category: Data Engineering Tools
Sub-category: general
Skill nature: PRACTICE
Volatility: MEDIUM
Typical lifespan: MULTI_YEAR
Version strategy: UNVERSIONED

Data Validation · Primary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category: Data Engineering Tools
Sub-category: general
Skill nature: PRACTICE
Volatility: MEDIUM
Typical lifespan: MULTI_YEAR
Version strategy: UNVERSIONED

Anomaly Detection · Secondary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: Anomaly detection id=134 · anomaly-detection

Aliases — catalog

  • Anomaly detection (CANONICAL) primary

Context tags (catalog)

CUSUM EWMA Mahalanobis distance autoencoder automated alerts change point detection control charts data drift density estimation false positives feature engineering isolation forest machine learning model validation monitoring novelty detection one-class SVM outlier outlier detection predictive maintenance real-time analysis root cause analysis seasonality statistical methods thresholding time series unsupervised learning z-score

Stored enrichment (catalog DB)

Category: Concept
Sub-category: Ml Monitoring Concept
Confidence: 0.90
Version strategy: NOT_APPLICABLE

Maturity reasoning: Common in ML/observability job descriptions and vendor docs (Datadog, Splunk, AWS, Azure) for fraud, monitoring, and alerting; broad market adoption across production systems.

Skill profile (library / DB)

Skill nature: CONCEPT
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 2
Sub-category id: 1117
Extractable: True
Also category: False

Dimensions (API 2 worklist)

  • Data Quality and Reconciliation Catalog dimension db id 27

    Library dimension (catalog)

    Roles linked in library: Data Engineer

  • Model Monitoring and Drift Detection Catalog dimension db id 45

    Library dimension (catalog)

    Roles linked in library: ML Engineer, MLOps Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Data Quality and Reconciliation
data-quality-and-reconciliation
Existing dimension (library) · Role↔dimension saved
Model Monitoring and Drift Detection
model-monitoring-and-drift-detection
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Data Governance · Secondary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category: Data Engineering Tools
Sub-category: general
Skill nature: CONCEPT
Volatility: MEDIUM
Typical lifespan: MULTI_YEAR
Version strategy: UNVERSIONED

Data Security · Secondary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category: Security Tools
Sub-category: general
Skill nature: CONCEPT
Volatility: MEDIUM
Typical lifespan: MULTI_YEAR
Version strategy: UNVERSIONED

All API 3 persistence rows

Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.

Skill Tag Dimension Skill↔dim Role↔dim Outcome Notes
Apache Airflow in_db
Data Pipeline Orchestration
data-pipeline-orchestration
Existing dimension (library) · Role↔dimension saved
SQL in_db
Pega Programming Languages & DSLs
pega-programming-languages-dsls
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
SQL in_db
Programming Languages for Data Work
programming-languages-for-data-work
Existing dimension (library) · Role↔dimension saved
Apache Spark in_db
ETL and ELT Tooling
etl-and-elt-tooling
Existing dimension (library) · Role↔dimension saved
Git in_db
React Frontend Development
d_init_01
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Anomaly Detection in_db
Data Quality and Reconciliation
data-quality-and-reconciliation
Existing dimension (library) · Role↔dimension saved
Anomaly Detection in_db
Model Monitoring and Drift Detection
model-monitoring-and-drift-detection
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
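
The grid above lists one row per (skill × dimension) pair, and the outcome column appears to depend on whether the dimension is linked to the chosen role. The sketch below rebuilds the seven rows from the "Roles linked in library" entries shown earlier on this page. It is an illustration under that assumption, not the API 3 persistence code: `persistence_rows` and the two dictionaries are hypothetical names, and the empty role list for React Frontend Development reflects only that no linked roles are shown for it here.

```python
CHOSEN_ROLE = "Data Engineer"

# Dimension -> roles linked in the library, as displayed on this page.
DIM_ROLES = {
    "Data Pipeline Orchestration": ["Data Engineer"],
    "Pega Programming Languages & DSLs": ["Pega Developer"],
    "Programming Languages for Data Work": ["Data Engineer"],
    "ETL and ELT Tooling": ["Data Engineer"],
    "React Frontend Development": [],  # no linked roles shown on this page
    "Data Quality and Reconciliation": ["Data Engineer"],
    "Model Monitoring and Drift Detection": ["ML Engineer", "MLOps Engineer"],
}

# Library-matched skill -> dimensions from its API 2 worklist.
SKILL_DIMS = {
    "Apache Airflow": ["Data Pipeline Orchestration"],
    "SQL": [
        "Pega Programming Languages & DSLs",
        "Programming Languages for Data Work",
    ],
    "Apache Spark": ["ETL and ELT Tooling"],
    "Git": ["React Frontend Development"],
    "Anomaly Detection": [
        "Data Quality and Reconciliation",
        "Model Monitoring and Drift Detection",
    ],
}


def persistence_rows(skill_dims, dim_roles, chosen_role):
    """Expand skills into (skill, dimension, outcome) work items."""
    rows = []
    for skill, dims in skill_dims.items():
        for dim in dims:
            if chosen_role in dim_roles.get(dim, []):
                outcome = "Role↔dimension saved"
            else:
                outcome = "Role↔dimension skipped (dimension not under chosen role)"
            rows.append((skill, dim, outcome))
    return rows


rows = persistence_rows(SKILL_DIMS, DIM_ROLES, CHOSEN_ROLE)
for row in rows:
    print(" | ".join(row))
```

With the data from this run, the sketch yields seven rows, four saved and three skipped, matching the outcomes listed in the grid.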

Library artifacts (this run)

Kind Detail DB id
canonical_skill_proposed DAGs | type=Data Engineering Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR
canonical_skill_proposed Data Migration | type=Data Engineering Tools subtype=general nature=PRACTICE lifespan=MULTI_YEAR
canonical_skill_proposed Data Validation | type=Data Engineering Tools subtype=general nature=PRACTICE lifespan=MULTI_YEAR
canonical_skill_proposed Data Governance | type=Data Engineering Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR
canonical_skill_proposed Data Security | type=Security Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR
nano JD Parser (gpt-4.1-nano)
Role: Data Engineer II
Domain: Other
JD type: pass
Raw JSON
{
  "JD_type": "pass",
  "about_company": null,
  "certifications": [],
  "company_name": null,
  "ctc": null,
  "domain": {
    "primary": {
      "aliases": [],
      "domain": "Other"
    },
    "secondary": null
  },
  "education": [
    {
      "level": "Bachelor\u0027s",
      "qualification": "BTECH/BE - Computer Science (or related)",
      "raw": "Bachelor\u0027s degree in Computer Science, Engineering, or a related field.",
      "requirement": "required"
    }
  ],
  "experience": null,
  "job_locations": [],
  "role": "Data Engineer II",
  "role_aliases": [
    "Data Engineer",
    "Data Engineer II",
    "Data Pipeline Engineer"
  ],
  "role_archetype": "Data",
  "roles_and_responsibilities": [
    {
      "bullet_count": 7,
      "heading": "Key Responsibilities",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "1. Workflow Deprecation",
        "last_5_words": "and reliability."
      },
      "text": "1. Workflow Deprecation\n\n\nPlan and execute the deprecation of migrated workflows by evaluating current workflows\u0027 dependencies and consumption.\nUtilize tools and best practices to identify, mark, and communicate deprecated workflows to stakeholders.\n\n\n2. Data Migration\n\n\nPlan and execute data migration tasks to move data between different storage systems or formats.\nEnsure the accuracy and completeness of data during migration processes.\nImplement strategies to accelerate the pace of data migration by backfilling, validating, and making new data assets ready for use.\n\n\n3. Data Validation\n\n\nDefine and implement data validation rules to ensure data accuracy, completeness, and reliability.\nUtilize data validation solutions and anomaly detection methods to monitor data quality.\n\n\n4. Workflow Management\n\n\nUse Apache Airflow to schedule, monitor, and automate data workflows.\nDevelop and manage DAGs (Directed Acyclic Graphs) in Airflow to orchestrate complex data processing tasks.\n\n\n5. Data Processing\n\n\nDevelop and maintain data processing scripts using SQL and Apache Spark.\nOptimize data processing for performance and efficiency.\n\n\n6. Version Control\n\n\nUse Git for version control, collaborating with the team to manage the codebase and track changes.\nEnsure best practices in code quality and repository management.\n\n\n7. Continuous Improvement\n\n\nKeep up to date with the latest developments in data engineering and related technologies.\nContinuously improve and refactor data pipelines, tooling, and processes to enhance performance and reliability.",
      "word_count": 366
    },
    {
      "bullet_count": 9,
      "heading": "Skills and Qualifications",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "Proficient in Git for version",
        "last_5_words": "high performing environment."
      },
      "text": "\uf0b7 Proficient in Git for version control and collaborative development.\n\uf0b7 Proficiency in SQL and experience with database technologies.\n\uf0b7 Experience in data pipeline tools such as Apache Airflow.\n\uf0b7 Strong knowledge of Apache Spark for data processing and transformation.\n\uf0b7 Experience with data migration and validation techniques.\n\uf0b7 Knowledge of data governance and security practices.\n\uf0b7 Strong problem-solving skills and the ability to work independently and in a team.\n\uf0b7 Ability to communicate with global team\n\uf0b7 Ability to work as a team in high performing environment.",
      "word_count": 81
    }
  ],
  "urls": []
}
API 1: extract-from-jd
{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "Apache Airflow"
    },
    {
      "is_primary": true,
      "skill_name": "DAGs"
    },
    {
      "is_primary": true,
      "skill_name": "SQL"
    },
    {
      "is_primary": true,
      "skill_name": "Apache Spark"
    },
    {
      "is_primary": true,
      "skill_name": "Git"
    },
    {
      "is_primary": true,
      "skill_name": "Data Migration"
    },
    {
      "is_primary": true,
      "skill_name": "Data Validation"
    },
    {
      "is_primary": false,
      "skill_name": "Anomaly Detection"
    },
    {
      "is_primary": false,
      "skill_name": "Data Governance"
    },
    {
      "is_primary": false,
      "skill_name": "Data Security"
    }
  ],
  "jd_role": {
    "display_name": "Data Engineer II",
    "rationale": null,
    "role_aliases": [
      "Data Engineer",
      "Data Engineer II",
      "Data Pipeline Engineer"
    ],
    "role_archetype": "Data",
    "slug": ""
  },
  "nano_parsed": {
    "JD_type": "pass",
    "about_company": null,
    "certifications": [],
    "company_name": null,
    "ctc": null,
    "domain": {
      "primary": {
        "aliases": [],
        "domain": "Other"
      },
      "secondary": null
    },
    "education": [
      {
        "level": "Bachelor\u0027s",
        "qualification": "BTECH/BE - Computer Science (or related)",
        "raw": "Bachelor\u0027s degree in Computer Science, Engineering, or a related field.",
        "requirement": "required"
      }
    ],
    "experience": null,
    "job_locations": [],
    "role": "Data Engineer II",
    "role_aliases": [
      "Data Engineer",
      "Data Engineer II",
      "Data Pipeline Engineer"
    ],
    "role_archetype": "Data",
    "roles_and_responsibilities": [
      {
        "bullet_count": 7,
        "heading": "Key Responsibilities",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "1. Workflow Deprecation",
          "last_5_words": "and reliability."
        },
        "text": "1. Workflow Deprecation\n\n\nPlan and execute the deprecation of migrated workflows by evaluating current workflows\u0027 dependencies and consumption.\nUtilize tools and best practices to identify, mark, and communicate deprecated workflows to stakeholders.\n\n\n2. Data Migration\n\n\nPlan and execute data migration tasks to move data between different storage systems or formats.\nEnsure the accuracy and completeness of data during migration processes.\nImplement strategies to accelerate the pace of data migration by backfilling, validating, and making new data assets ready for use.\n\n\n3. Data Validation\n\n\nDefine and implement data validation rules to ensure data accuracy, completeness, and reliability.\nUtilize data validation solutions and anomaly detection methods to monitor data quality.\n\n\n4. Workflow Management\n\n\nUse Apache Airflow to schedule, monitor, and automate data workflows.\nDevelop and manage DAGs (Directed Acyclic Graphs) in Airflow to orchestrate complex data processing tasks.\n\n\n5. Data Processing\n\n\nDevelop and maintain data processing scripts using SQL and Apache Spark.\nOptimize data processing for performance and efficiency.\n\n\n6. Version Control\n\n\nUse Git for version control, collaborating with the team to manage the codebase and track changes.\nEnsure best practices in code quality and repository management.\n\n\n7. Continuous Improvement\n\n\nKeep up to date with the latest developments in data engineering and related technologies.\nContinuously improve and refactor data pipelines, tooling, and processes to enhance performance and reliability.",
        "word_count": 366
      },
      {
        "bullet_count": 9,
        "heading": "Skills and Qualifications",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "Proficient in Git for version",
          "last_5_words": "high performing environment."
        },
        "text": "\uf0b7 Proficient in Git for version control and collaborative development.\n\uf0b7 Proficiency in SQL and experience with database technologies.\n\uf0b7 Experience in data pipeline tools such as Apache Airflow.\n\uf0b7 Strong knowledge of Apache Spark for data processing and transformation.\n\uf0b7 Experience with data migration and validation techniques.\n\uf0b7 Knowledge of data governance and security practices.\n\uf0b7 Strong problem-solving skills and the ability to work independently and in a team.\n\uf0b7 Ability to communicate with global team\n\uf0b7 Ability to work as a team in high performing environment.",
        "word_count": 81
      }
    ],
    "urls": []
  },
  "rejected": false,
  "rejection_reason": null,
  "run_id": "5c415ce7-e9d4-4ca3-97c6-e28132bfcdbe",
  "stage3_signals": {
    "alias_found": true,
    "alias_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": null,
        "matched_count": null,
        "matched_skills": null,
        "role_id": 2,
        "score": 1.0,
        "slug": "data-engineer",
        "total_count": null
      }
    ],
    "kra_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": [
          {
            "kra_text": "Implements data quality validation rules, reconciliation checks, and anomaly detection to ensure data completeness, accuracy, and consistency.",
            "sentence": "Define and implement data validation rules to ensure data accuracy, completeness, and reliability.",
            "similarity": 0.7516
          },
          {
            "kra_text": "Implements data quality validation rules, reconciliation checks, and anomaly detection to ensure data completeness, accuracy, and consistency.",
            "sentence": "Utilize data validation solutions and anomaly detection methods to monitor data quality.",
            "similarity": 0.7061
          },
          {
            "kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
            "sentence": "\uf0b7 Experience in data pipeline tools such as Apache Airflow.",
            "similarity": 0.6936
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 2,
        "score": 0.7171,
        "slug": "data-engineer",
        "total_count": null
      },
      {
        "display_name": "React Native Developer",
        "kra_matches": [
          {
            "kra_text": "maintain code quality",
            "sentence": "Ensure best practices in code quality and repository management.",
            "similarity": 0.7165
          },
          {
            "kra_text": "support offline-aware data flow",
            "sentence": "Implement strategies to accelerate the pace of data migration by backfilling, validating, and making new data assets ready for use.",
            "similarity": 0.4426
          },
          {
            "kra_text": "maintain code quality",
            "sentence": "Continuously improve and refactor data pipelines, tooling, and processes to enhance performance and reliability.",
            "similarity": 0.4355
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 73,
        "score": 0.5315,
        "slug": "react-native-developer",
        "total_count": null
      },
      {
        "display_name": "Fullstack Developer",
        "kra_matches": [
          {
            "kra_text": "Optimizes application performance from database query efficiency through API response latency to frontend rendering speed and bundle size.",
            "sentence": "Optimize data processing for performance and efficiency.",
            "similarity": 0.5783
          },
          {
            "kra_text": "Delivers features through CI/CD pipelines using automated tests, staged rollouts, feature flags, and incremental deployments.",
            "sentence": "Continuously improve and refactor data pipelines, tooling, and processes to enhance performance and reliability.",
            "similarity": 0.5286
          },
          {
            "kra_text": "Designs and queries relational databases like PostgreSQL and document stores like MongoDB, writing migrations, indexes, and optimized queries.",
            "sentence": "\uf0b7 Proficiency in SQL and experience with database technologies.",
            "similarity": 0.4813
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 15,
        "score": 0.5294,
        "slug": "full-stack-engineer",
        "total_count": null
      },
      {
        "display_name": "Java Backend Developer",
        "kra_matches": [
          {
            "kra_text": "backend performance tuning",
            "sentence": "Optimize data processing for performance and efficiency.",
            "similarity": 0.5833
          },
          {
            "kra_text": "code refactoring and defect fixes",
            "sentence": "Ensure best practices in code quality and repository management.",
            "similarity": 0.5049
          },
          {
            "kra_text": "request validation and error handling",
            "sentence": "Define and implement data validation rules to ensure data accuracy, completeness, and reliability.",
            "similarity": 0.4997
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 79,
        "score": 0.5293,
        "slug": "java-backend-developer",
        "total_count": null
      },
      {
        "display_name": "Scala Backend Developer",
        "kra_matches": [
          {
            "kra_text": "business rule and validation logic",
            "sentence": "Define and implement data validation rules to ensure data accuracy, completeness, and reliability.",
            "similarity": 0.539
          },
          {
            "kra_text": "performance and reliability tuning",
            "sentence": "Optimize data processing for performance and efficiency.",
            "similarity": 0.5095
          },
          {
            "kra_text": "backend workflow orchestration",
            "sentence": "Develop and manage DAGs (Directed Acyclic Graphs) in Airflow to orchestrate complex data processing tasks.",
            "similarity": 0.4915
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 87,
        "score": 0.5133,
        "slug": "scala-backend-developer",
        "total_count": null
      }
    ],
    "skill_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": null,
        "matched_count": 3,
        "matched_skills": [
          "Apache Airflow",
          "Apache Spark",
          "SQL"
        ],
        "role_id": 2,
        "score": 0.4286,
        "slug": "data-engineer",
        "total_count": 7
      },
      {
        "display_name": "Pega Developer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "SQL"
        ],
        "role_id": 24,
        "score": 0.1429,
        "slug": "pega-developer",
        "total_count": 7
      }
    ]
  },
  "stage4_decision": {
    "alias_collision_detected": false,
    "case": "A",
    "chosen_role": {
      "display_name": "Data Engineer",
      "kra_matches": null,
      "matched_count": null,
      "matched_skills": null,
      "role_id": 2,
      "score": 1.0,
      "slug": "data-engineer",
      "total_count": null
    },
    "confidence": 1.0,
    "is_new_role": false,
    "llm2_fired": false,
    "llm2_reasoning": null,
    "matched_dimensions": [],
    "matched_kras": [],
    "matched_skills": [],
    "new_role_display_name": null,
    "new_role_slug": null,
    "queued": false,
    "reasoning": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.43 does not contradict",
    "sub_role": null
  },
  "stage5_updates": {
    "centroid_n_after": 164,
    "centroid_updated": true,
    "collision_log_id": null,
    "new_kra_attached": null,
    "new_skills_attached": [
      {
        "is_primary": true,
        "queue_id": 8628,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "DAGs",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 8629,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Data Migration",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 8630,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Data Validation",
        "status": "pending"
      },
      {
        "is_primary": false,
        "queue_id": 8631,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Data Governance",
        "status": "pending"
      },
      {
        "is_primary": false,
        "queue_id": 8632,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Data Security",
        "status": "pending"
      }
    ],
    "queue_entry_id": null,
    "v3_pipeline_triggered": false,
    "v3_role_slug": null,
    "v3_run_id": null
  }
}
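Note on the skill_match_roles scores above: both values are consistent with matched_count divided by total_count, rounded to four decimal places (3/7 and 1/7). The sketch below checks that arithmetic. The helper name and the rounding rule are assumptions inferred from the logged numbers, not the pipeline's actual scoring code, which this run does not show.

```python
# Hypothetical helper: inferred from the logged values, not taken from the pipeline source.
def skill_match_score(matched_count: int, total_count: int) -> float:
    """Return matched_count / total_count rounded to 4 decimal places."""
    if total_count <= 0:
        # Assumption: an empty denominator scores zero rather than raising.
        return 0.0
    return round(matched_count / total_count, 4)


# Counts copied from the two skill_match_roles entries in the API 1 response.
data_engineer_score = skill_match_score(3, 7)
pega_developer_score = skill_match_score(1, 7)

print(data_engineer_score, pega_developer_score)
```

The stage4_decision score of 1.0 for data-engineer comes from the alias hit described in its reasoning field, so it is not expected to follow this ratio.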
API 2 — extract-details
{
  "alias_matches": [
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 304,
      "existing_alias_text": "Apache Airflow",
      "input_term": "Apache Airflow",
      "matched_canonical": {
        "category_id": 13,
        "display_name": "Apache Airflow",
        "id": 110,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "apache-airflow",
        "sub_category_id": 130,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 271,
      "existing_alias_text": "SQL",
      "input_term": "SQL",
      "matched_canonical": {
        "category_id": 6,
        "display_name": "SQL",
        "id": 101,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LANGUAGE",
        "slug": "sql",
        "sub_category_id": 97,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 2004,
      "existing_alias_text": "Apache Spark",
      "input_term": "Apache Spark",
      "matched_canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 1613,
      "existing_alias_text": "Git",
      "input_term": "Git",
      "matched_canonical": {
        "category_id": 13,
        "display_name": "Git",
        "id": 1002,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "git",
        "sub_category_id": 730,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 338,
      "existing_alias_text": "Anomaly detection",
      "input_term": "Anomaly Detection",
      "matched_canonical": {
        "category_id": 2,
        "display_name": "Anomaly detection",
        "id": 134,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CONCEPT",
        "slug": "anomaly-detection",
        "sub_category_id": 1117,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    }
  ],
  "candidate_roles": [
    {
      "display_name": "Data Engineer",
      "id": 2,
      "rationale": null,
      "role_archetype": null,
      "slug": "data-engineer",
      "source": "db"
    },
    {
      "display_name": "Pega Developer",
      "id": 24,
      "rationale": null,
      "role_archetype": null,
      "slug": "pega-developer",
      "source": "db"
    },
    {
      "display_name": "ML Engineer",
      "id": 3,
      "rationale": null,
      "role_archetype": null,
      "slug": "ml-engineer",
      "source": "db"
    },
    {
      "display_name": "MLOps Engineer",
      "id": 16,
      "rationale": null,
      "role_archetype": null,
      "slug": "ml-ops-engineer",
      "source": "db"
    }
  ],
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.43 does not contradict",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "dimensions": [
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Data Pipeline Orchestration",
        "id": 23,
        "rationale": "Workflow engines that schedule, coordinate, and recover batch data jobs. This cluster covers dependency management, retries, backfills, sensors, and operational control of pipeline DAGs.",
        "slug": "data-pipeline-orchestration",
        "source": "db"
      },
      "input_skill": "Apache Airflow",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Pega Programming Languages \u0026 DSLs",
        "id": 267,
        "rationale": "Programming languages and domain-specific languages used in Pega development.",
        "slug": "pega-programming-languages-dsls",
        "source": "db"
      },
      "input_skill": "SQL",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Pega Developer",
          "id": 24,
          "rationale": null,
          "role_archetype": null,
          "slug": "pega-developer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for Data Work",
        "id": 21,
        "rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
        "slug": "programming-languages-for-data-work",
        "source": "db"
      },
      "input_skill": "SQL",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "ETL and ELT Tooling",
        "id": 24,
        "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
        "slug": "etl-and-elt-tooling",
        "source": "db"
      },
      "input_skill": "Apache Spark",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "React Frontend Development",
        "id": 96,
        "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "Git",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Data Quality and Reconciliation",
        "id": 27,
        "rationale": "Validation and reconciliation practices that ensure data is accurate, complete, and trustworthy. This includes rule-based checks, anomaly detection, cross-system reconciliation, and failure triage.",
        "slug": "data-quality-and-reconciliation",
        "source": "db"
      },
      "input_skill": "Anomaly Detection",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Model Monitoring and Drift Detection",
        "id": 45,
        "rationale": "Production observability for model behavior, data drift, concept drift, latency, and quality regressions. ML engineers use this to detect degradation and trigger remediation or retraining.",
        "slug": "model-monitoring-and-drift-detection",
        "source": "db"
      },
      "input_skill": "Anomaly Detection",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "ML Engineer",
          "id": 3,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-engineer",
          "source": "db"
        },
        {
          "display_name": "MLOps Engineer",
          "id": 16,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-ops-engineer",
          "source": "db"
        }
      ]
    }
  ],
  "input_final_skills": [
    "Apache Airflow",
    "DAGs",
    "SQL",
    "Apache Spark",
    "Git",
    "Data Migration",
    "Data Validation",
    "Anomaly Detection",
    "Data Governance",
    "Data Security"
  ],
  "input_llm_skills": [
    "Apache Airflow",
    "DAGs",
    "SQL",
    "Apache Spark",
    "Git",
    "Data Migration",
    "Data Validation",
    "Anomaly Detection",
    "Data Governance",
    "Data Security"
  ],
  "new_aliases_persisted": 0,
  "run_id": "5c415ce7-e9d4-4ca3-97c6-e28132bfcdbe",
  "skills_detail": [
    {
      "aliases_in_db": [
        {
          "alias_text": "Apache Airflow",
          "alias_type": "CANONICAL",
          "id": 304,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 13,
        "display_name": "Apache Airflow",
        "id": 110,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "apache-airflow",
        "sub_category_id": 130,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Data Pipeline Orchestration",
            "id": 23,
            "rationale": "Workflow engines that schedule, coordinate, and recover batch data jobs. This cluster covers dependency management, retries, backfills, sensors, and operational control of pipeline DAGs.",
            "slug": "data-pipeline-orchestration",
            "source": "db"
          },
          "input_skill": "Apache Airflow",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Apache Airflow",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "DAGs",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Data Engineering Tools",
          "skill_nature": "CONCEPT",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "dags",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "SQL",
          "alias_type": "CANONICAL",
          "id": 271,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 6,
        "display_name": "SQL",
        "id": 101,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LANGUAGE",
        "slug": "sql",
        "sub_category_id": 97,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Pega Programming Languages \u0026 DSLs",
            "id": 267,
            "rationale": "Programming languages and domain-specific languages used in Pega development.",
            "slug": "pega-programming-languages-dsls",
            "source": "db"
          },
          "input_skill": "SQL",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Pega Developer",
              "id": 24,
              "rationale": null,
              "role_archetype": null,
              "slug": "pega-developer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for Data Work",
            "id": 21,
            "rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
            "slug": "programming-languages-for-data-work",
            "source": "db"
          },
          "input_skill": "SQL",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "SQL",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Apache Spark",
          "alias_type": "CANONICAL",
          "id": 2004,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "apache spark 3",
          "alias_type": "VERSION",
          "id": 2006,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark",
          "alias_type": "VERSION",
          "id": 2510,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3",
          "alias_type": "VERSION",
          "id": 2007,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3.x",
          "alias_type": "VERSION",
          "id": 2009,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark3",
          "alias_type": "VERSION",
          "id": 2008,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "ETL and ELT Tooling",
            "id": 24,
            "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
            "slug": "etl-and-elt-tooling",
            "source": "db"
          },
          "input_skill": "Apache Spark",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Apache Spark",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Git",
          "alias_type": "CANONICAL",
          "id": 1613,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 13,
        "display_name": "Git",
        "id": 1002,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "git",
        "sub_category_id": 730,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "React Frontend Development",
            "id": 96,
            "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "Git",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "Git",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Data Migration",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Data Engineering Tools",
          "skill_nature": "PRACTICE",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "data-migration",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Data Validation",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Data Engineering Tools",
          "skill_nature": "PRACTICE",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "data-validation",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Anomaly detection",
          "alias_type": "CANONICAL",
          "id": 338,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 2,
        "display_name": "Anomaly detection",
        "id": 134,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CONCEPT",
        "slug": "anomaly-detection",
        "sub_category_id": 1117,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Data Quality and Reconciliation",
            "id": 27,
            "rationale": "Validation and reconciliation practices that ensure data is accurate, complete, and trustworthy. This includes rule-based checks, anomaly detection, cross-system reconciliation, and failure triage.",
            "slug": "data-quality-and-reconciliation",
            "source": "db"
          },
          "input_skill": "Anomaly Detection",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Model Monitoring and Drift Detection",
            "id": 45,
            "rationale": "Production observability for model behavior, data drift, concept drift, latency, and quality regressions. ML engineers use this to detect degradation and trigger remediation or retraining.",
            "slug": "model-monitoring-and-drift-detection",
            "source": "db"
          },
          "input_skill": "Anomaly Detection",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "ML Engineer",
              "id": 3,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-engineer",
              "source": "db"
            },
            {
              "display_name": "MLOps Engineer",
              "id": 16,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-ops-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Anomaly Detection",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Data Governance",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Data Engineering Tools",
          "skill_nature": "CONCEPT",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "data-governance",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Data Security",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Security Tools",
          "skill_nature": "CONCEPT",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "data-security",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    }
  ],
  "unmatched_skills": [
    "DAGs",
    "Data Migration",
    "Data Validation",
    "Data Governance",
    "Data Security"
  ]
}
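Note on the unmatched_skills list above: it contains exactly the skills_detail entries whose canonical field is null, in input order. The sketch below reproduces that relationship on an abbreviated copy of the response. Reducing each canonical object to its slug is a simplification made for this example, and the filter is an inference from the logged output rather than the pipeline's own code.

```python
# Abbreviated copy of skills_detail: canonical is reduced to its slug, or None when null.
skills_detail = [
    {"input_skill": "Apache Airflow", "canonical": "apache-airflow"},
    {"input_skill": "DAGs", "canonical": None},
    {"input_skill": "SQL", "canonical": "sql"},
    {"input_skill": "Apache Spark", "canonical": "apache-spark"},
    {"input_skill": "Git", "canonical": "git"},
    {"input_skill": "Data Migration", "canonical": None},
    {"input_skill": "Data Validation", "canonical": None},
    {"input_skill": "Anomaly Detection", "canonical": "anomaly-detection"},
    {"input_skill": "Data Governance", "canonical": None},
    {"input_skill": "Data Security", "canonical": None},
]

# Assumed rule: a skill is unmatched when no canonical record was resolved for it.
unmatched_skills = [
    entry["input_skill"] for entry in skills_detail if entry["canonical"] is None
]

print(unmatched_skills)
```

These five names are the same ones queued as pending under new_skills_attached in the API 1 response.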
API 3 — final-role-output
{
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.43 does not contradict",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "chosen_role_resolution": "in_db",
  "final_input_skills": [
    {
      "skill": "Apache Airflow",
      "tag": "in_db"
    },
    {
      "skill": "DAGs",
      "tag": "new"
    },
    {
      "skill": "SQL",
      "tag": "in_db"
    },
    {
      "skill": "Apache Spark",
      "tag": "in_db"
    },
    {
      "skill": "Git",
      "tag": "in_db"
    },
    {
      "skill": "Data Migration",
      "tag": "new"
    },
    {
      "skill": "Data Validation",
      "tag": "new"
    },
    {
      "skill": "Anomaly Detection",
      "tag": "in_db"
    },
    {
      "skill": "Data Governance",
      "tag": "new"
    },
    {
      "skill": "Data Security",
      "tag": "new"
    }
  ],
  "llm_cost_api1_usd": null,
  "llm_cost_api2_usd": null,
  "llm_cost_api3_usd": null,
  "llm_cost_total_usd": null,
  "persistence": {
    "items": [
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Data Pipeline Orchestration",
          "id": 23,
          "rationale": "Workflow engines that schedule, coordinate, and recover batch data jobs. This cluster covers dependency management, retries, backfills, sensors, and operational control of pipeline DAGs.",
          "slug": "data-pipeline-orchestration",
          "source": "db"
        },
        "dimension_id": 23,
        "input_skill": "Apache Airflow",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 110,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Pega Programming Languages \u0026 DSLs",
          "id": 267,
          "rationale": "Programming languages and domain-specific languages used in Pega development.",
          "slug": "pega-programming-languages-dsls",
          "source": "db"
        },
        "dimension_id": 267,
        "input_skill": "SQL",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Pega Developer",
            "id": 24,
            "rationale": null,
            "role_archetype": null,
            "slug": "pega-developer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 101,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for Data Work",
          "id": 21,
          "rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
          "slug": "programming-languages-for-data-work",
          "source": "db"
        },
        "dimension_id": 21,
        "input_skill": "SQL",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 101,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "ETL and ELT Tooling",
          "id": 24,
          "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
          "slug": "etl-and-elt-tooling",
          "source": "db"
        },
        "dimension_id": 24,
        "input_skill": "Apache Spark",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 1350,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "React Frontend Development",
          "id": 96,
          "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 96,
        "input_skill": "Git",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 1002,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Data Quality and Reconciliation",
          "id": 27,
          "rationale": "Validation and reconciliation practices that ensure data is accurate, complete, and trustworthy. This includes rule-based checks, anomaly detection, cross-system reconciliation, and failure triage.",
          "slug": "data-quality-and-reconciliation",
          "source": "db"
        },
        "dimension_id": 27,
        "input_skill": "Anomaly Detection",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 134,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Model Monitoring and Drift Detection",
          "id": 45,
          "rationale": "Production observability for model behavior, data drift, concept drift, latency, and quality regressions. ML engineers use this to detect degradation and trigger remediation or retraining.",
          "slug": "model-monitoring-and-drift-detection",
          "source": "db"
        },
        "dimension_id": 45,
        "input_skill": "Anomaly Detection",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "ML Engineer",
            "id": 3,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-engineer",
            "source": "db"
          },
          {
            "display_name": "MLOps Engineer",
            "id": 16,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-ops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 134,
        "skill_tag": "in_db",
        "skipped_reason": null
      }
    ],
    "new_skills_created": 0,
    "role_dimension_saved": 0,
    "skill_dimension_saved": 0,
    "skipped": 0
  },
  "planner_output": null,
  "run_id": "5c415ce7-e9d4-4ca3-97c6-e28132bfcdbe"
}
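
A minimal sketch, assuming the JSON payload above is available as a string or parsed object: the per-item flags under `persistence.items` can be tallied and compared with the summary counters (`role_dimension_saved`, `skill_dimension_saved`) at the end of the `persistence` object. In this run those summary counters read 0 while several items carry `true` flags, so a tally like this is one way to cross-check them. The function name `tally_persistence` and the abbreviated two-item sample are hypothetical; how the pipeline computes its own counters is not shown in this output.

```python
import json


def tally_persistence(items):
    """Count boolean outcome flags across a list of persistence items.

    Only fields that appear in the payload above are read; missing or
    null values are treated as False.
    """
    counts = {
        "items": 0,
        "matched_chosen_role": 0,
        "role_dimension_saved": 0,
        "skill_dimension_saved": 0,
    }
    for item in items:
        counts["items"] += 1
        for flag in ("matched_chosen_role", "role_dimension_saved", "skill_dimension_saved"):
            if item.get(flag) is True:
                counts[flag] += 1
    return counts


# Abbreviated, hypothetical sample shaped like two of the items above.
sample = json.loads("""
{
  "persistence": {
    "items": [
      {"input_skill": "Apache Airflow", "matched_chosen_role": true,
       "role_dimension_saved": true, "skill_dimension_saved": true},
      {"input_skill": "Git", "matched_chosen_role": false,
       "role_dimension_saved": false, "skill_dimension_saved": true}
    ],
    "role_dimension_saved": 0,
    "skill_dimension_saved": 0
  }
}
""")

persistence = sample["persistence"]
tally = tally_persistence(persistence["items"])

# Report any summary counter that disagrees with the per-item tally.
mismatches = {
    key: (persistence[key], tally[key])
    for key in ("role_dimension_saved", "skill_dimension_saved")
    if persistence.get(key) != tally[key]
}
print(tally)
print(mismatches)
```

Each entry in `mismatches` pairs the reported summary value with the tallied value, which makes it easy to see whether the counters and the item list were written at different stages of the run.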

LLM Calls

Every model call made for this run, in pipeline order. Click a card to see the model's response.
