Pipeline run

9ca819dc-2f75-4d3f-abdf-4d447fa207ae

Pipeline LLM cost (USD)

API 1: $0.0069 · API 2: $0.0002 · API 3: $0.0000 · Total: $0.0070

Client output enrichment

v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description

SPARSE JD · role baseline loaded · sources: ai_index: jd · nature_of_work: jd · tech_stack_maturity: jd

Nature of work · Data pipeline development

Build ETL/data pipelines and automate workflows to feed dashboards and other data products, while monitoring the data lake, pulling in third-party sources, and supporting ML-related data enrichment with stakeholders.

"Build ETL & data pipelines to help feed the data into different business facing data products/dashboards"

Tech stack maturity

Mainstream Modern · cache hit

A data engineer with machine learning as a primary skill typically works on modern data and ML platforms, but the role alone implies neither a cutting-edge AI-native stack nor a legacy-only one.

AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)

1.70 / 5

· Title match

✓ Has AI skill

✓ AI skill (primary)

· AI skill (secondary)

· On AI team

· Builds AI products

vocab breakdown (legacy)

Assistants (×1): —

Frameworks (×2): —

Models / concepts (×3): ML, Machine Learning

Evidence — skills matched in JD (5)

ETL · Data Pipelines · Data Lake · Machine Learning · Data Science

Skill cluster (2 dimension groups, role-scoped)

AI Governance and Model Security

Machine Learning

Cross-cutting / unaligned

ETL · Data Pipelines · Data Lake · Data Science

KRA description

Liaise with different client stakeholders on ad-hoc analyses related to monitoring the entire data lake
Build ETL & data pipelines to help feed the data into different business facing data products/dashboards
Explore options to automate processes & workflows and thus drive efficiencies for client
Work on ML Model based initiatives to enrich the overall data ecosystem
Build algorithms to ingest different 3rd party data sources in the client ecosystem.
BA/BS/B.Tech.
Have prior experience in data engineering projects, built automation workflows
Are interested in learning about Data Science
Have a strong attention to detail and care deeply about data quality
Proactively reach out to stakeholders to understand data better
Enjoy collaborating with team members to drive impact
Are a strong communicator; you can adjust communication for technical stakeholders and non-technical stakeholders.

Signals

Skill: ml-engineer · 0.25

Alias: — · —

KRA: data-engineer · 0.61
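The skill signal of 0.25 is consistent with a coverage ratio: in the API 1 output, ml-engineer has matched_count 1 and total_count 4. A minimal sketch, assuming the score is matched_count / total_count with case-insensitive skill comparison; `skill_match_score` and the three placeholder skills are hypothetical, since the run does not list the role's other library skills.

```python
def skill_match_score(jd_skills: set[str], role_skills: set[str]) -> float:
    """Fraction of a role's library skills that also appear in the JD (assumed formula)."""
    if not role_skills:
        return 0.0
    jd = {s.casefold() for s in jd_skills}  # normalise case before comparing
    matched = sum(1 for s in role_skills if s.casefold() in jd)
    return matched / len(role_skills)


jd_skills = {"ETL", "Data Pipelines", "Data Lake", "Machine Learning", "Data Science"}
# Only "Machine Learning" is known from this run; the other three are placeholders.
ml_engineer_skills = {"Machine Learning", "placeholder_a", "placeholder_b", "placeholder_c"}
print(skill_match_score(jd_skills, ml_engineer_skills))  # 0.25
```

A pure coverage ratio penalises roles with large skill libraries, which may be why the KRA signal, not this one, drives the role choice here.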

Post-classification

Centroid updated · n=254

Alias collision log: —

New-role queue: —

New skills captured: 4

New KRA captured: —
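The page reports only that the role centroid was updated and now covers n=254 samples. If the centroid is a running mean of JD embeddings (an assumption, not confirmed by this run), each new embedding can be folded in without revisiting earlier ones. `update_centroid` is a hypothetical name:

```python
def update_centroid(centroid: list[float], n: int, x: list[float]) -> tuple[list[float], int]:
    """Fold embedding x into a running-mean centroid currently built from n vectors."""
    if len(centroid) != len(x):
        raise ValueError("embedding dimension mismatch")
    n_after = n + 1
    # c_new = c_old + (x - c_old) / n_after equals the exact mean of all n_after vectors.
    updated = [c + (xi - c) / n_after for c, xi in zip(centroid, x)]
    return updated, n_after


c, n = [0.0, 0.0], 0
for vec in ([2.0, 4.0], [4.0, 0.0]):
    c, n = update_centroid(c, n, vec)
print(c, n)  # [3.0, 2.0] 2
```

Storing n alongside the vector is what keeps the incremental update exact; without it the centroid could only be approximated with a fixed decay.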

Captured for admin review

ETL primary ↔ Data Engineer pending

Data Pipelines primary ↔ Data Engineer pending

Data Lake primary ↔ Data Engineer pending

Data Science ↔ Data Engineer pending

Status: completed · Created: 2026-05-27T15:06:11.854651Z · Updated: 2026-06-12T16:53:24.675382Z · API 3 duration: 6250 ms

Flow · Current 3-step pipeline

1 POST /skills/extract-from-jd

2 POST /skills/extract-details

3 POST /skills/final-role-output
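The three steps run in sequence and share one run_id. A minimal orchestration sketch, assuming a caller-supplied `post(path, payload)` function that returns decoded JSON. The endpoint paths come from this page and the response keys `run_id` and `final_skills` appear in the API 1 output; the request payload field names (`jd_text`, etc.) are hypothetical.

```python
from typing import Any, Callable

Post = Callable[[str, dict[str, Any]], dict[str, Any]]


def run_pipeline(post: Post, jd_text: str) -> dict[str, Any]:
    """Call the three endpoints in order, threading run_id and skills forward."""
    # Step 1: extract skills from the raw JD (payload key is hypothetical).
    api1 = post("/skills/extract-from-jd", {"jd_text": jd_text})
    run_id = api1["run_id"]
    skills = [s["skill_name"] for s in api1["final_skills"]]
    # Step 2: library matching and dimension orchestration for those skills.
    api2 = post("/skills/extract-details", {"run_id": run_id, "final_skills": skills})
    # Step 3: final role output and persistence for the same run.
    api3 = post("/skills/final-role-output", {"run_id": run_id})
    return {"run_id": run_id, "api1": api1, "api2": api2, "api3": api3}


# Usage with a fake transport that records the call order.
calls: list[str] = []


def fake_post(path: str, payload: dict[str, Any]) -> dict[str, Any]:
    calls.append(path)
    if path == "/skills/extract-from-jd":
        return {"run_id": "demo-run", "final_skills": [{"skill_name": "ETL", "is_primary": True}]}
    return {"echo": payload}


result = run_pipeline(fake_post, "Build ETL & data pipelines ...")
print(calls)
```

Injecting `post` keeps the ordering logic testable without a live service; a real client would wrap an HTTP library behind the same signature.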

Role · Chosen role & resolution

Data Engineer

domain · Data Engineering & Analytics · case: DOMAIN

slug: data-engineer · id: 2 · source: db

Domain = Data Engineering & Analytics. The JD centers on building ETL/data pipelines, automating workflows, ingesting third-party data, and supporting data lake and ML-related data engineering work.

Matched skills

ETL · data pipelines · data lake · automation workflows · ML Model · 3rd party data sources · data quality · data engineering projects

Matched dimensions

Data Pipeline Engineering · Data Lake Monitoring · Workflow Automation · Third-party Data Ingestion · Data Quality · Stakeholder Collaboration · Data Ecosystem Support · ML Data Enablement

Matched KRAs

Liaise with different client stakeholders on ad-hoc analyses · Build ETL & data pipelines · Explore options to automate processes & workflows · Drive efficiencies for client · Work on ML Model based initiatives · Build algorithms to ingest different 3rd party data sources

Resolution: in_db — role exists in library; skill↔dim and role↔dim links saved when applicable.

New skills

Skill↔dim saved

Role↔dim saved

Skipped

Job description

Role And Responsibilities

Liaise with different client stakeholders on ad-hoc analyses related to monitoring the entire data lake
Build ETL & data pipelines to help feed the data into different business facing data products/dashboards
Explore options to automate processes & workflows and thus drive efficiencies for client
Work on ML Model based initiatives to enrich the overall data ecosystem
Build algorithms to ingest different 3rd party data sources in the client ecosystem.


Requirement

BA/BS/B.Tech.
Have prior experience in data engineering projects, built automation workflows
Are interested in learning about Data Science
Have a strong attention to detail and care deeply about data quality
Proactively reach out to stakeholders to understand data better
Enjoy collaborating with team members to drive impact
Are a strong communicator; you can adjust communication for technical stakeholders and non-technical stakeholders.


(ref:hirist.com)

Skills from this JD

Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.

ETL · Primary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields

Category: Data Engineering Tools
Sub-category: general
Skill nature: PRACTICE
Volatility: MEDIUM
Typical lifespan: MULTI_YEAR
Version strategy: UNVERSIONED

Data Pipelines · Primary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields

Category: Data Engineering Tools
Sub-category: general
Skill nature: PRACTICE
Volatility: MEDIUM
Typical lifespan: MULTI_YEAR
Version strategy: UNVERSIONED

Data Lake · Primary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)

Canonical: Data Lakes id=1358 · data-lakes

Aliases — catalog

Data Lakes (CANONICAL)
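Two match routes show up in this run: API 2 reports matched_via "alias" for Machine Learning and "embedding_alias" for Data Lake (resolved to the Data Lakes canonical), and the catalog alias carries match_strategy CASE_INSENSITIVE. A sketch of such a two-tier lookup, assuming an exact case-insensitive match first and cosine similarity over embeddings as the fallback; the 0.85 threshold, the toy vectors, and `resolve_alias` are hypothetical.

```python
import math
from typing import Callable, Optional


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def resolve_alias(
    term: str,
    aliases: dict[str, str],                  # alias text -> canonical slug
    embed: Callable[[str], list[float]],
    threshold: float = 0.85,                  # hypothetical cut-off
) -> tuple[Optional[str], Optional[str]]:
    """Return (canonical_slug, matched_via) or (None, None)."""
    key = term.casefold()
    # Tier 1: case-insensitive exact alias match.
    for text, slug in aliases.items():
        if text.casefold() == key:
            return slug, "alias"
    # Tier 2: most similar alias by embedding, if at or above the threshold.
    query = embed(term)
    best_slug, best_sim = None, threshold
    for text, slug in aliases.items():
        sim = cosine(query, embed(text))
        if sim >= best_sim:
            best_slug, best_sim = slug, sim
    return (best_slug, "embedding_alias") if best_slug is not None else (None, None)


# Toy 3-d vectors standing in for a real embedding model.
TOY = {
    "Machine Learning": [1.0, 0.0, 0.0],
    "Data Lakes": [0.0, 1.0, 0.1],
    "Data Lake": [0.0, 1.0, 0.0],
    "Kubernetes": [0.0, 0.0, 1.0],
}
ALIASES = {"Machine Learning": "machine-learning", "Data Lakes": "data-lakes"}
print(resolve_alias("machine learning", ALIASES, TOY.__getitem__))  # ('machine-learning', 'alias')
print(resolve_alias("Data Lake", ALIASES, TOY.__getitem__))         # ('data-lakes', 'embedding_alias')
```

Trying the cheap exact match first means the embedding model is only consulted for near-miss spellings such as singular/plural variants.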

Context tags (catalog)

AWS Lake Formation · Azure Data Lake · ETL · big data · data catalog · data governance · data ingestion · data lakes vs data warehouses · data modeling · data pipelines · data warehousing · partitioning · real-time analytics · schema evolution · serverless architecture

Stored enrichment (catalog DB)

Category: Architecture
Sub-category: Data Lake Architecture
Confidence: 0.90
Version strategy: NOT_APPLICABLE

Maturity reasoning: Data lakes are widely listed in cloud/data platform job descriptions and are a standard architecture in AWS, Azure, and GCP ecosystems; they’re a common hiring-pipeline staple rather than a niche pattern.

Skill profile (library / DB)

Skill nature: PATTERN
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 1
Sub-category id: 1025
Extractable: True
Also category: False

Dimensions (API 2 worklist)

Cloud Storage and Data Services · Catalog dimension · db id 144

Library dimension (catalog)

Roles linked in library: Cloud Architect

React Frontend Development · Catalog dimension · db id 96

Library dimension (catalog)

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome	Notes
Cloud Storage and Data Services cloud-storage-and-data-services	—	—	Skipped — no persistable v3 meta for new skill	skill_not_in_db_v3_proposed
React Frontend Development d_init_01	—	—	Skipped — no persistable v3 meta for new skill	skill_not_in_db_v3_proposed

Machine Learning · Primary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)

Canonical: Machine Learning id=1356 · machine-learning

Aliases — catalog

Machine Learning (CANONICAL)

Context tags (catalog)

Keras · PyTorch · TensorFlow · cross-validation · data preprocessing · ensemble methods · feature engineering · hyperparameter tuning · model evaluation · natural language processing · neural networks · reinforcement learning · scikit-learn · supervised learning · unsupervised learning

Stored enrichment (catalog DB)

Category: Concept
Sub-category: Machine Learning
Confidence: 0.98
Version strategy: NOT_APPLICABLE

Maturity reasoning: Machine Learning appears in large volumes of job descriptions across data, product, and platform roles, and major cloud vendors (AWS, Google Cloud, Azure) offer dedicated ML services and certifications, indicating broad adoption.

Skill profile (library / DB)

Skill nature: CONCEPT
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 2
Sub-category id: 1024
Extractable: True
Also category: False

Dimensions (API 2 worklist)

AI Governance and Model Security · Catalog dimension · db id 50

Library dimension (catalog)

Roles linked in library: AI Engineer, ML Engineer, MLOps Engineer

React Frontend Development · Catalog dimension · db id 96

Library dimension (catalog)

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
AI Governance and Model Security ai-governance-and-model-security	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
React Frontend Development d_init_01	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Data Science · Secondary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields

Category: Concepts
Sub-category: general
Skill nature: CONCEPT
Volatility: MEDIUM
Typical lifespan: MULTI_YEAR
Version strategy: UNVERSIONED

All API 3 persistence rows

Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.

Skill	Tag	Dimension	Skill↔dim	Role↔dim	Outcome	Notes
Data Lake	new	Cloud Storage and Data Services cloud-storage-and-data-services	—	—	Skipped — no persistable v3 meta for new skill	skill_not_in_db_v3_proposed
Data Lake	new	React Frontend Development d_init_01	—	—	Skipped — no persistable v3 meta for new skill	skill_not_in_db_v3_proposed
Machine Learning	in_db	AI Governance and Model Security ai-governance-and-model-security	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Machine Learning	in_db	React Frontend Development d_init_01	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
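The four rows follow a consistent pattern: a skill tagged new is skipped entirely, while an in_db skill gets its skill↔dimension link and gets a role↔dimension link only when the chosen role is among the dimension's linked library roles. A sketch of that decision, inferred from these rows rather than taken from the pipeline source; the `WorkItem` type and the linked-role branch (which no row here exercises) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WorkItem:
    skill: str
    tag: str                      # "in_db" or "new", as in the Tag column
    dimension: str
    dimension_roles: list[str] = field(default_factory=list)  # roles linked to the dimension


def persistence_outcome(item: WorkItem, chosen_role: str) -> tuple[bool, bool, str]:
    """Return (skill_dim_saved, role_dim_saved, outcome) for one (skill x dimension) row."""
    if item.tag != "in_db":
        return False, False, "Skipped: no persistable v3 meta for new skill"
    if chosen_role in item.dimension_roles:
        return True, True, "Existing dimension (library) · Role↔dimension linked"
    return True, False, "Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)"


rows = [
    WorkItem("Data Lake", "new", "Cloud Storage and Data Services", ["Cloud Architect"]),
    WorkItem("Machine Learning", "in_db", "AI Governance and Model Security",
             ["AI Engineer", "ML Engineer", "MLOps Engineer"]),
    WorkItem("Machine Learning", "in_db", "React Frontend Development"),
]
for row in rows:
    print(row.skill, row.dimension, persistence_outcome(row, "Data Engineer"))
```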

Library artifacts (this run)

Kind	Detail	DB id
canonical_skill_proposed	ETL | type=Data Engineering Tools subtype=general nature=PRACTICE lifespan=MULTI_YEAR
canonical_skill_proposed	Data Pipelines | type=Data Engineering Tools subtype=general nature=PRACTICE lifespan=MULTI_YEAR
canonical_skill_proposed	Data Science | type=Concepts subtype=general nature=CONCEPT lifespan=MULTI_YEAR
dimension_skill_link_proposed	Data Lake ↔ Cloud Storage and Data Services
dimension_skill_link_proposed	Data Lake ↔ React Frontend Development
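The canonical_skill_proposed detail strings line up with the derived fields that API 2 attaches to each unmatched skill. A sketch, assuming the mapping type=category, subtype=sub_category, nature=skill_nature, lifespan=typical_lifespan; `proposed_skill_detail` and `proposed_link_detail` are hypothetical names.

```python
def proposed_skill_detail(name: str, derived: dict[str, str]) -> str:
    """Format a canonical_skill_proposed detail string from derived legacy fields."""
    return (
        f"{name} | type={derived['category']} subtype={derived['sub_category']} "
        f"nature={derived['skill_nature']} lifespan={derived['typical_lifespan']}"
    )


def proposed_link_detail(skill: str, dimension: str) -> str:
    """Format a dimension_skill_link_proposed detail string."""
    return f"{skill} ↔ {dimension}"


etl_derived = {  # copied from the ETL new_skill_meta.derived block in API 2
    "category": "Data Engineering Tools",
    "skill_nature": "PRACTICE",
    "sub_category": "general",
    "typical_lifespan": "MULTI_YEAR",
}
print(proposed_skill_detail("ETL", etl_derived))
# ETL | type=Data Engineering Tools subtype=general nature=PRACTICE lifespan=MULTI_YEAR
```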

nano JD Parser (gpt-4.1-nano)

Domain: Other

JD type: pass

Raw JSON

{
  "JD_type": "pass",
  "about_company": null,
  "certifications": [],
  "company_name": null,
  "ctc": null,
  "domain": {
    "primary": {
      "aliases": [],
      "domain": "Other"
    },
    "secondary": null
  },
  "education": [
    {
      "level": "Bachelor\u0027s",
      "qualification": "BTECH/BE/BSC - Any Discipline",
      "raw": "BA/BS/B.Tech.",
      "requirement": "required"
    }
  ],
  "experience": null,
  "job_locations": [],
  "role": null,
  "role_aliases": [],
  "role_archetype": "Data",
  "roles_and_responsibilities": [
    {
      "bullet_count": 5,
      "heading": "Role And Responsibilities",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "Liaise with different client stakeholders",
        "last_5_words": "in the client ecosystem."
      },
      "text": "Liaise with different client stakeholders on ad-hoc analyses related to monitoring the entire data lake\nBuild ETL \u0026 data pipelines to help feed the data into different business facing data products/dashboards\nExplore options to automate processes \u0026 workflows and thus drive efficiencies for client\nWork on ML Model based initiatives to enrich the overall data ecosystem\nBuild algorithms to ingest different 3rd party data sources in the client ecosystem.",
      "word_count": 54
    },
    {
      "bullet_count": 7,
      "heading": "Requirement",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "BA/BS/B.Tech. Have prior experience",
        "last_5_words": "technical stakeholders and non-technical stakeholders."
      },
      "text": "BA/BS/B.Tech.\nHave prior experience in data engineering projects, built automation workflows\nAre interested in learning about Data Science\nHave a strong attention to detail and care deeply about data quality\nProactively reach out to stakeholders to understand data better\nEnjoy collaborating with team members to drive impact\nAre a strong communicator; you can adjust communication for technical stakeholders and non-technical stakeholders.",
      "word_count": 66
    }
  ],
  "urls": []
}
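The parser's bullet_count and five-word source markers can be cross-checked deterministically. A minimal sketch, assuming bullets are newline-separated lines and words are whitespace-separated; `section_markers` is a hypothetical helper. On the responsibilities text it returns bullet_count 5, matching the parser, and the last five words "sources in the client ecosystem.", whereas the stored last_5_words value contains four words.

```python
def section_markers(text: str, n: int = 5) -> dict[str, object]:
    """Recompute bullet_count and first/last-n-word markers for one JD section."""
    bullets = [line for line in text.split("\n") if line.strip()]
    words = text.split()  # whitespace split; newlines also act as separators
    return {
        "bullet_count": len(bullets),
        "first_words": " ".join(words[:n]),
        "last_words": " ".join(words[-n:]),
    }


RESPONSIBILITIES = "\n".join([
    "Liaise with different client stakeholders on ad-hoc analyses related to monitoring the entire data lake",
    "Build ETL & data pipelines to help feed the data into different business facing data products/dashboards",
    "Explore options to automate processes & workflows and thus drive efficiencies for client",
    "Work on ML Model based initiatives to enrich the overall data ecosystem",
    "Build algorithms to ingest different 3rd party data sources in the client ecosystem.",
])
print(section_markers(RESPONSIBILITIES))
```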

API 1: extract-from-jd

{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "ETL"
    },
    {
      "is_primary": true,
      "skill_name": "Data Pipelines"
    },
    {
      "is_primary": true,
      "skill_name": "Data Lake"
    },
    {
      "is_primary": true,
      "skill_name": "Machine Learning"
    },
    {
      "is_primary": false,
      "skill_name": "Data Science"
    }
  ],
  "jd_role": null,
  "nano_parsed": {
    "JD_type": "pass",
    "about_company": null,
    "certifications": [],
    "company_name": null,
    "ctc": null,
    "domain": {
      "primary": {
        "aliases": [],
        "domain": "Other"
      },
      "secondary": null
    },
    "education": [
      {
        "level": "Bachelor\u0027s",
        "qualification": "BTECH/BE/BSC - Any Discipline",
        "raw": "BA/BS/B.Tech.",
        "requirement": "required"
      }
    ],
    "experience": null,
    "job_locations": [],
    "role": null,
    "role_aliases": [],
    "role_archetype": "Data",
    "roles_and_responsibilities": [
      {
        "bullet_count": 5,
        "heading": "Role And Responsibilities",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "Liaise with different client stakeholders",
          "last_5_words": "in the client ecosystem."
        },
        "text": "Liaise with different client stakeholders on ad-hoc analyses related to monitoring the entire data lake\nBuild ETL \u0026 data pipelines to help feed the data into different business facing data products/dashboards\nExplore options to automate processes \u0026 workflows and thus drive efficiencies for client\nWork on ML Model based initiatives to enrich the overall data ecosystem\nBuild algorithms to ingest different 3rd party data sources in the client ecosystem.",
        "word_count": 54
      },
      {
        "bullet_count": 7,
        "heading": "Requirement",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "BA/BS/B.Tech. Have prior experience",
          "last_5_words": "technical stakeholders and non-technical stakeholders."
        },
        "text": "BA/BS/B.Tech.\nHave prior experience in data engineering projects, built automation workflows\nAre interested in learning about Data Science\nHave a strong attention to detail and care deeply about data quality\nProactively reach out to stakeholders to understand data better\nEnjoy collaborating with team members to drive impact\nAre a strong communicator; you can adjust communication for technical stakeholders and non-technical stakeholders.",
        "word_count": 66
      }
    ],
    "urls": []
  },
  "rejected": false,
  "rejection_reason": null,
  "run_id": "9ca819dc-2f75-4d3f-abdf-4d447fa207ae",
  "stage3_signals": {
    "alias_found": false,
    "alias_match_roles": [],
    "kra_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": [
          {
            "kra_text": "Builds data ingestion pipelines to collect data from transactional databases, third-party APIs, event streams, and file sources into centralized data platforms.",
            "sentence": "Build ETL \u0026 data pipelines to help feed the data into different business facing data products/dashboards",
            "similarity": 0.643
          },
          {
            "kra_text": "Builds data ingestion pipelines to collect data from transactional databases, third-party APIs, event streams, and file sources into centralized data platforms.",
            "sentence": "Build algorithms to ingest different 3rd party data sources in the client ecosystem.",
            "similarity": 0.6373
          },
          {
            "kra_text": "Works with data analysts, data scientists, and business stakeholders to define data models, ingestion schedules, and data delivery requirements.",
            "sentence": "Proactively reach out to stakeholders to understand data better",
            "similarity": 0.5502
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 2,
        "score": 0.6102,
        "slug": "data-engineer",
        "total_count": null
      },
      {
        "display_name": "Flutter Developer",
        "kra_matches": [
          {
            "kra_text": "collaborate with design, product, and backend teams",
            "sentence": "Enjoy collaborating with team members to drive impact",
            "similarity": 0.5813
          },
          {
            "kra_text": "integrate external APIs and data sources",
            "sentence": "Build algorithms to ingest different 3rd party data sources in the client ecosystem.",
            "similarity": 0.5764
          },
          {
            "kra_text": "integrate external APIs and data sources",
            "sentence": "Build ETL \u0026 data pipelines to help feed the data into different business facing data products/dashboards",
            "similarity": 0.4577
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 74,
        "score": 0.5385,
        "slug": "flutter-developer",
        "total_count": null
      },
      {
        "display_name": "Svelte Frontend Developer",
        "kra_matches": [
          {
            "kra_text": "backend data integration",
            "sentence": "Build ETL \u0026 data pipelines to help feed the data into different business facing data products/dashboards",
            "similarity": 0.541
          },
          {
            "kra_text": "backend data integration",
            "sentence": "Build algorithms to ingest different 3rd party data sources in the client ecosystem.",
            "similarity": 0.5392
          },
          {
            "kra_text": "backend data integration",
            "sentence": "Liaise with different client stakeholders on ad-hoc analyses related to monitoring the entire data lake",
            "similarity": 0.4617
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 92,
        "score": 0.514,
        "slug": "svelte-frontend-developer",
        "total_count": null
      },
      {
        "display_name": "Engineering Manager",
        "kra_matches": [
          {
            "kra_text": "Set team goals and delivery plans",
            "sentence": "Enjoy collaborating with team members to drive impact",
            "similarity": 0.5032
          },
          {
            "kra_text": "manage stakeholder alignment and tradeoffs",
            "sentence": "Are a strong communicator; you can adjust communication for technical stakeholders and non-technical stakeholders.",
            "similarity": 0.4845
          },
          {
            "kra_text": "manage stakeholder alignment and tradeoffs",
            "sentence": "Proactively reach out to stakeholders to understand data better",
            "similarity": 0.482
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 121,
        "score": 0.4899,
        "slug": "engineering-manager",
        "total_count": null
      },
      {
        "display_name": "MLOps Engineer",
        "kra_matches": [
          {
            "kra_text": "Supports ML platform incidents by diagnosing model serving failures, feature store pipeline breaks, and training environment configuration issues.",
            "sentence": "Work on ML Model based initiatives to enrich the overall data ecosystem",
            "similarity": 0.5558
          },
          {
            "kra_text": "Sets up model monitoring dashboards, data drift detection, prediction performance tracking, and alert routing for production ML systems.",
            "sentence": "Build ETL \u0026 data pipelines to help feed the data into different business facing data products/dashboards",
            "similarity": 0.4633
          },
          {
            "kra_text": "Automates ML platform operations including scheduled retraining triggers, pipeline orchestration, evaluation workflows, and alerting configuration.",
            "sentence": "Explore options to automate processes \u0026 workflows and thus drive efficiencies for client",
            "similarity": 0.4495
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 16,
        "score": 0.4896,
        "slug": "ml-ops-engineer",
        "total_count": null
      }
    ],
    "skill_match_roles": [
      {
        "display_name": "ML Engineer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Machine Learning"
        ],
        "role_id": 3,
        "score": 0.25,
        "slug": "ml-engineer",
        "total_count": 4
      },
      {
        "display_name": "AI Engineer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Machine Learning"
        ],
        "role_id": 13,
        "score": 0.25,
        "slug": "ai-engineer",
        "total_count": 4
      },
      {
        "display_name": "MLOps Engineer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Machine Learning"
        ],
        "role_id": 16,
        "score": 0.25,
        "slug": "ml-ops-engineer",
        "total_count": 4
      }
    ]
  },
  "stage4_decision": {
    "alias_collision_detected": false,
    "case": "DOMAIN",
    "chosen_role": {
      "display_name": "Data Engineer",
      "kra_matches": null,
      "matched_count": null,
      "matched_skills": null,
      "role_id": 2,
      "score": 0.95,
      "slug": "data-engineer",
      "total_count": null
    },
    "confidence": 0.95,
    "is_new_role": false,
    "llm2_fired": false,
    "llm2_reasoning": null,
    "matched_dimensions": [
      "Data Pipeline Engineering",
      "Data Lake Monitoring",
      "Workflow Automation",
      "Third-party Data Ingestion",
      "Data Quality",
      "Stakeholder Collaboration",
      "Data Ecosystem Support",
      "ML Data Enablement"
    ],
    "matched_kras": [
      "Liaise with different client stakeholders on ad-hoc analyses",
      "Build ETL \u0026 data pipelines",
      "Explore options to automate processes \u0026 workflows",
      "Drive efficiencies for client",
      "Work on ML Model based initiatives",
      "Build algorithms to ingest different 3rd party data sources"
    ],
    "matched_skills": [
      "ETL",
      "data pipelines",
      "data lake",
      "automation workflows",
      "ML Model",
      "3rd party data sources",
      "data quality",
      "data engineering projects"
    ],
    "new_role_display_name": null,
    "new_role_slug": null,
    "queued": false,
    "reasoning": "Domain=Data Engineering \u0026 Analytics; The JD centers on building ETL/data pipelines, automating workflows, ingesting third-party data, and supporting data lake and ML-related data engineering work.",
    "sub_role": null
  },
  "stage5_updates": {
    "centroid_n_after": 254,
    "centroid_updated": true,
    "collision_log_id": null,
    "new_kra_attached": null,
    "new_skills_attached": [
      {
        "is_primary": true,
        "queue_id": 12668,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "ETL",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 12669,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Data Pipelines",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 12670,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Data Lake",
        "status": "pending"
      },
      {
        "is_primary": false,
        "queue_id": 12671,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Data Science",
        "status": "pending"
      }
    ],
    "queue_entry_id": null,
    "v3_pipeline_triggered": false,
    "v3_role_slug": null,
    "v3_run_id": null
  }
}
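Each kra_match_roles score in stage3_signals matches, to within the four-decimal rounding of the listed similarities, the mean of that role's three similarities: for Data Engineer, (0.643 + 0.6373 + 0.5502) / 3 ≈ 0.6102, the 0.61 shown under Signals. A sketch assuming a top-k mean with k = 3; the aggregation is inferred from these numbers, not documented, and `kra_role_score` is a hypothetical name.

```python
def kra_role_score(similarities: list[float], k: int = 3) -> float:
    """Mean of a role's top-k KRA sentence similarities (assumed aggregation)."""
    top = sorted(similarities, reverse=True)[:k]
    return sum(top) / len(top) if top else 0.0


kra_similarities = {  # copied from stage3_signals.kra_match_roles
    "data-engineer": [0.643, 0.6373, 0.5502],
    "flutter-developer": [0.5813, 0.5764, 0.4577],
    "svelte-frontend-developer": [0.541, 0.5392, 0.4617],
    "engineering-manager": [0.5032, 0.4845, 0.482],
    "ml-ops-engineer": [0.5558, 0.4633, 0.4495],
}
for slug, sims in kra_similarities.items():
    print(slug, round(kra_role_score(sims), 4))
```

ml-ops-engineer comes out at 0.4895 against a stored 0.4896, a gap consistent with the similarities themselves having been rounded before display.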

API 2: extract-details

{
  "alias_matches": [
    {
      "alias_persist_skipped_reason": "TODO: REMOVE AFTER TESTING \u2014 alias DB write disabled",
      "alias_persisted": false,
      "existing_alias_id": 2017,
      "existing_alias_text": "Data Lakes",
      "input_term": "Data Lake",
      "matched_canonical": {
        "category_id": 1,
        "display_name": "Data Lakes",
        "id": 1358,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PATTERN",
        "slug": "data-lakes",
        "sub_category_id": 1025,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "embedding_alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 2015,
      "existing_alias_text": "Machine Learning",
      "input_term": "Machine Learning",
      "matched_canonical": {
        "category_id": 2,
        "display_name": "Machine Learning",
        "id": 1356,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CONCEPT",
        "slug": "machine-learning",
        "sub_category_id": 1024,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    }
  ],
  "candidate_roles": [
    {
      "display_name": "Cloud Architect",
      "id": 9,
      "rationale": null,
      "role_archetype": null,
      "slug": "cloud-architect",
      "source": "db"
    },
    {
      "display_name": "AI Engineer",
      "id": 13,
      "rationale": null,
      "role_archetype": null,
      "slug": "ai-engineer",
      "source": "db"
    },
    {
      "display_name": "ML Engineer",
      "id": 3,
      "rationale": null,
      "role_archetype": null,
      "slug": "ml-engineer",
      "source": "db"
    },
    {
      "display_name": "MLOps Engineer",
      "id": 16,
      "rationale": null,
      "role_archetype": null,
      "slug": "ml-ops-engineer",
      "source": "db"
    }
  ],
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Domain=Data Engineering \u0026 Analytics; The JD centers on building ETL/data pipelines, automating workflows, ingesting third-party data, and supporting data lake and ML-related data engineering work.",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "dimensions": [
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Storage and Data Services",
        "id": 144,
        "rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
        "slug": "cloud-storage-and-data-services",
        "source": "db"
      },
      "input_skill": "Data Lake",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Cloud Architect",
          "id": 9,
          "rationale": null,
          "role_archetype": null,
          "slug": "cloud-architect",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "React Frontend Development",
        "id": 96,
        "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "Data Lake",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "AI Governance and Model Security",
        "id": 50,
        "rationale": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
        "slug": "ai-governance-and-model-security",
        "source": "db"
      },
      "input_skill": "Machine Learning",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "AI Engineer",
          "id": 13,
          "rationale": null,
          "role_archetype": null,
          "slug": "ai-engineer",
          "source": "db"
        },
        {
          "display_name": "ML Engineer",
          "id": 3,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-engineer",
          "source": "db"
        },
        {
          "display_name": "MLOps Engineer",
          "id": 16,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-ops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "React Frontend Development",
        "id": 96,
        "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "Machine Learning",
      "llm_role": null,
      "roles_from_db": []
    }
  ],
  "input_final_skills": [
    "ETL",
    "Data Pipelines",
    "Data Lake",
    "Machine Learning",
    "Data Science"
  ],
  "input_llm_skills": [
    "ETL",
    "Data Pipelines",
    "Data Lake",
    "Machine Learning",
    "Data Science"
  ],
  "new_aliases_persisted": 0,
  "run_id": "9ca819dc-2f75-4d3f-abdf-4d447fa207ae",
  "skills_detail": [
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "ETL",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Data Engineering Tools",
          "skill_nature": "PRACTICE",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "etl",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Data Pipelines",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Data Engineering Tools",
          "skill_nature": "PRACTICE",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "data-pipelines",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Data Lakes",
          "alias_type": "CANONICAL",
          "id": 2017,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 1,
        "display_name": "Data Lakes",
        "id": 1358,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PATTERN",
        "slug": "data-lakes",
        "sub_category_id": 1025,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Storage and Data Services",
            "id": 144,
            "rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
            "slug": "cloud-storage-and-data-services",
            "source": "db"
          },
          "input_skill": "Data Lake",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Cloud Architect",
              "id": 9,
              "rationale": null,
              "role_archetype": null,
              "slug": "cloud-architect",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "React Frontend Development",
            "id": 96,
            "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "Data Lake",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "Data Lake",
      "matched_via": "embedding_alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Machine Learning",
          "alias_type": "CANONICAL",
          "id": 2015,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 2,
        "display_name": "Machine Learning",
        "id": 1356,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CONCEPT",
        "slug": "machine-learning",
        "sub_category_id": 1024,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "AI Governance and Model Security",
            "id": 50,
            "rationale": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
            "slug": "ai-governance-and-model-security",
            "source": "db"
          },
          "input_skill": "Machine Learning",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "AI Engineer",
              "id": 13,
              "rationale": null,
              "role_archetype": null,
              "slug": "ai-engineer",
              "source": "db"
            },
            {
              "display_name": "ML Engineer",
              "id": 3,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-engineer",
              "source": "db"
            },
            {
              "display_name": "MLOps Engineer",
              "id": 16,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-ops-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "React Frontend Development",
            "id": 96,
            "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "Machine Learning",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "Machine Learning",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Data Science",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Concepts",
          "skill_nature": "CONCEPT",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "data-science",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    }
  ],
  "unmatched_skills": [
    "ETL",
    "Data Pipelines",
    "Data Science"
  ]
}

API 3 — final-role-output

{
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Domain=Data Engineering \u0026 Analytics; The JD centers on building ETL/data pipelines, automating workflows, ingesting third-party data, and supporting data lake and ML-related data engineering work.",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "chosen_role_resolution": "in_db",
  "final_input_skills": [
    {
      "skill": "ETL",
      "tag": "new"
    },
    {
      "skill": "Data Pipelines",
      "tag": "new"
    },
    {
      "skill": "Data Lake",
      "tag": "in_db"
    },
    {
      "skill": "Machine Learning",
      "tag": "in_db"
    },
    {
      "skill": "Data Science",
      "tag": "new"
    }
  ],
  "llm_cost_api1_usd": null,
  "llm_cost_api2_usd": null,
  "llm_cost_api3_usd": null,
  "llm_cost_total_usd": null,
  "persistence": {
    "items": [
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Storage and Data Services",
          "id": 144,
          "rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
          "slug": "cloud-storage-and-data-services",
          "source": "db"
        },
        "dimension_id": 144,
        "input_skill": "Data Lake",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Skipped \u2014 no persistable v3 meta for new skill",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Cloud Architect",
            "id": 9,
            "rationale": null,
            "role_archetype": null,
            "slug": "cloud-architect",
            "source": "db"
          }
        ],
        "skill_dimension_saved": false,
        "skill_id": null,
        "skill_tag": "new",
        "skipped_reason": "skill_not_in_db_v3_proposed"
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "React Frontend Development",
          "id": 96,
          "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 96,
        "input_skill": "Data Lake",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Skipped \u2014 no persistable v3 meta for new skill",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": false,
        "skill_id": null,
        "skill_tag": "new",
        "skipped_reason": "skill_not_in_db_v3_proposed"
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "AI Governance and Model Security",
          "id": 50,
          "rationale": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
          "slug": "ai-governance-and-model-security",
          "source": "db"
        },
        "dimension_id": 50,
        "input_skill": "Machine Learning",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "AI Engineer",
            "id": 13,
            "rationale": null,
            "role_archetype": null,
            "slug": "ai-engineer",
            "source": "db"
          },
          {
            "display_name": "ML Engineer",
            "id": 3,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-engineer",
            "source": "db"
          },
          {
            "display_name": "MLOps Engineer",
            "id": 16,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-ops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 1356,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "React Frontend Development",
          "id": 96,
          "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 96,
        "input_skill": "Machine Learning",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 1356,
        "skill_tag": "in_db",
        "skipped_reason": null
      }
    ],
    "new_skills_created": 0,
    "role_dimension_saved": 0,
    "skill_dimension_saved": 0,
    "skipped": 2
  },
  "planner_output": null,
  "run_id": "9ca819dc-2f75-4d3f-abdf-4d447fa207ae"
}

LLM Calls

Every model call made for this run, in pipeline order.