Pipeline run

9ca819dc-2f75-4d3f-abdf-4d447fa207ae

Pipeline LLM cost (USD)
API 1: $0.0069 · API 2: $0.0002 · API 3: $0.0000 · Total: $0.0070

Client output enrichment

v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description
SPARSE JD role baseline loaded sources · ai_index: jd · nature_of_work: jd · tech_stack_maturity: jd
Nature of work · Data pipeline development
Build ETL/data pipelines and automate workflows to feed dashboards and other data products, while monitoring the data lake, pulling in third-party sources, and supporting ML-related data enrichment with stakeholders.
"Build ETL & data pipelines to help feed the data into different business facing data products/dashboards"
Tech stack maturity
Mainstream Modern cache hit
A data engineer with machine-learning as a primary skill typically works in modern data and ML platforms, but the role alone does not imply cutting-edge AI-native or legacy-only stack characteristics.
AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)
1.70 / 5
· Title match
Has AI skill
AI skill (primary)
· AI skill (secondary)
· On AI team
· Builds AI products
vocab breakdown (legacy)
Assistants (×1):
Frameworks (×2):
Models / concepts (×3): ML, Machine Learning
Evidence — skills matched in JD (5)
ETL · Data Pipelines · Data Lake · Machine Learning · Data Science
Skill cluster (2 dimension groups, role-scoped)
AI Governance and Model Security
Machine Learning
Cross-cutting / unaligned
ETL · Data Pipelines · Data Lake · Data Science
KRA description
Liaise with different client stakeholders on ad-hoc analyses related to monitoring the entire data lake
Build ETL & data pipelines to help feed the data into different business facing data products/dashboards
Explore options to automate processes & workflows and thus drive efficiencies for client
Work on ML Model based initiatives to enrich the overall data ecosystem
Build algorithms to ingest different 3rd party data sources in the client ecosystem.
BA/BS/B.Tech.
Have prior experience in data engineering projects, built automation workflows
Are interested in learning about Data Science
Have a strong attention to detail and care deeply about data quality
Proactively reach out to stakeholders to understand data better
Enjoy collaborating with team members to drive impact
Are a strong communicator; you can adjust communication for technical stakeholders and non-technical stakeholders.

Signals

Skill · ml-engineer · 0.25
Alias
KRA · data-engineer · 0.61
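
The two signal scores above can be reproduced from the `stage3_signals` payload in the API 1 response, under two assumptions that the payload is consistent with but does not state: the skill score is `matched_count / total_count`, and the KRA score is the mean of the listed sentence-to-KRA similarities. A minimal Python sketch (the function names are illustrative, not pipeline code):

```python
from statistics import fmean


def skill_match_score(matched_count: int, total_count: int) -> float:
    """Assumed formula: share of the role's skills that were found in the JD."""
    return matched_count / total_count if total_count else 0.0


def kra_match_score(similarities: list[float]) -> float:
    """Assumed formula: mean of the listed sentence-to-KRA similarities."""
    return fmean(similarities) if similarities else 0.0


# Values copied from stage3_signals in this run.
ml_engineer = skill_match_score(matched_count=1, total_count=4)
data_engineer = kra_match_score([0.643, 0.6373, 0.5502])

print(f"Skill ml-engineer: {ml_engineer:.2f}")
print(f"KRA data-engineer: {data_engineer:.2f}")
```

Because the listed similarities are themselves rounded, a mean recomputed this way can differ from the stored score in the fourth decimal place.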

Post-classification

Centroid updated · n=254
Alias collision log
New-role queue
New skills captured: 4
New KRA captured
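
The run reports only that the role centroid was updated and that it now covers n=254 samples; it does not show how the update is computed. One common way to maintain such a centroid is an incremental mean over embeddings, sketched below in Python. The function name, the list-of-floats representation, and the update rule itself are assumptions, not the pipeline's confirmed method.

```python
def update_centroid(centroid: list[float], embedding: list[float], n_before: int):
    """Fold one new embedding into a running mean (assumed update rule)."""
    n_after = n_before + 1
    updated = [c + (x - c) / n_after for c, x in zip(centroid, embedding)]
    return updated, n_after


# Toy 2-dimensional example: one prior sample, one new sample.
centroid, n = update_centroid([0.0, 1.0], [1.0, 0.0], n_before=1)
print(centroid, n)
```

The appeal of this form is that only the current centroid and the sample count need to be stored, not the individual embeddings.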

Captured for admin review

ETL · primary · Data Engineer · pending
Data Pipelines · primary · Data Engineer · pending
Data Lake · primary · Data Engineer · pending
Data Science · Data Engineer · pending
Status: completed · Created: 2026-05-27T15:06:11.854651Z · Updated: 2026-06-12T16:53:24.675382Z · API 3 duration: 6250 ms

Flow: Current 3-step pipeline

1 POST /skills/extract-from-jd

2 POST /skills/extract-details

3 POST /skills/final-role-output
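
The three steps above run in sequence. The sketch below shows one way a client might chain them in Python. Only the endpoint paths come from this page; the request field `jd_text`, the rule that each response is passed on as the next request body, and the `fake_post` transport are hypothetical placeholders so the example runs without a server.

```python
from typing import Any, Callable

# Endpoint paths are taken from the flow above; everything else is hypothetical.
PIPELINE_STEPS = [
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
]

Transport = Callable[[str, dict[str, Any]], dict[str, Any]]


def run_pipeline(jd_text: str, post: Transport) -> list[dict[str, Any]]:
    """POST to each step in order, passing each response on to the next step."""
    payload: dict[str, Any] = {"jd_text": jd_text}  # assumed request field
    responses = []
    for path in PIPELINE_STEPS:
        response = post(path, payload)
        responses.append(response)
        payload = response  # assumption: the next step consumes this output
    return responses


def fake_post(path: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Stand-in transport that echoes which step ran and what it received."""
    return {"step": path, "received_keys": sorted(payload)}


results = run_pipeline("Build ETL & data pipelines", fake_post)
for result in results:
    print(result["step"])
```

Injecting the transport keeps the ordering logic testable; a real client would replace `fake_post` with an HTTP call against the actual service.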

Role: Chosen role & resolution

Data Engineer

domain · Data Engineering & Analytics · case: DOMAIN

slug: data-engineer · id: 2 · source: db

Domain=Data Engineering & Analytics; The JD centers on building ETL/data pipelines, automating workflows, ingesting third-party data, and supporting data lake and ML-related data engineering work.

Matched skills

ETL · data pipelines · data lake · automation workflows · ML Model · 3rd party data sources · data quality · data engineering projects

Matched dimensions

Data Pipeline Engineering · Data Lake Monitoring · Workflow Automation · Third-party Data Ingestion · Data Quality · Stakeholder Collaboration · Data Ecosystem Support · ML Data Enablement

Matched KRAs

Liaise with different client stakeholders on ad-hoc analyses · Build ETL & data pipelines · Explore options to automate processes & workflows · Drive efficiencies for client · Work on ML Model based initiatives · Build algorithms to ingest different 3rd party data sources

Resolution: in_db — role exists in library; skill↔dim and role↔dim links saved when applicable.

New skills: 0 · Skill↔dim saved: 0 · Role↔dim saved: 0 · Skipped: 2

Job description

Role And Responsibilities

Liaise with different client stakeholders on ad-hoc analyses related to monitoring the entire data lake
Build ETL & data pipelines to help feed the data into different business facing data products/dashboards
Explore options to automate processes & workflows and thus drive efficiencies for client
Work on ML Model based initiatives to enrich the overall data ecosystem
Build algorithms to ingest different 3rd party data sources in the client ecosystem.


Requirement

BA/BS/B.Tech.
Have prior experience in data engineering projects, built automation workflows
Are interested in learning about Data Science
Have a strong attention to detail and care deeply about data quality
Proactively reach out to stakeholders to understand data better
Enjoy collaborating with team members to drive impact
Are a strong communicator; you can adjust communication for technical stakeholders and non-technical stakeholders.


(ref:hirist.com)

Skills from this JD

Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
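
As a rough illustration of that merge, the Python sketch below joins the three sources on skill name. The sample values are trimmed from this run; the dictionary shapes and the output keys (`priority`, `canonical_slug`, `persistence_tag`) are assumptions made for the example, not the page's actual data model.

```python
# Trimmed copies of values from this run; the structures are illustrative.
api1_final_skills = [
    {"skill_name": "ETL", "is_primary": True},
    {"skill_name": "Machine Learning", "is_primary": True},
    {"skill_name": "Data Science", "is_primary": False},
]
api2_canonical_slug = {"ETL": None, "Machine Learning": "machine-learning", "Data Science": None}
api3_tag = {"ETL": "new", "Machine Learning": "in_db", "Data Science": "new"}


def merge_skill_rows(final_skills, canonical_by_skill, tag_by_skill):
    """Build one display row per skill by joining the three sources on name."""
    rows = []
    for skill in final_skills:
        name = skill["skill_name"]
        rows.append({
            "skill": name,
            "priority": "Primary" if skill["is_primary"] else "Secondary",
            "canonical_slug": canonical_by_skill.get(name),  # None if unmatched
            "persistence_tag": tag_by_skill.get(name),
        })
    return rows


rows = merge_skill_rows(api1_final_skills, api2_canonical_slug, api3_tag)
for row in rows:
    print(row)
```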

ETL · Primary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category: Data Engineering Tools
Sub-category: general
Skill nature: PRACTICE
Volatility: MEDIUM
Typical lifespan: MULTI_YEAR
Version strategy: UNVERSIONED

Data Pipelines · Primary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category: Data Engineering Tools
Sub-category: general
Skill nature: PRACTICE
Volatility: MEDIUM
Typical lifespan: MULTI_YEAR
Version strategy: UNVERSIONED

Data Lake · Primary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: Data Lakes id=1358 · data-lakes

Aliases — catalog

  • Data Lakes (CANONICAL)

Context tags (catalog)

AWS Lake Formation · Azure Data Lake · ETL · big data · data catalog · data governance · data ingestion · data lakes vs data warehouses · data modeling · data pipelines · data warehousing · partitioning · real-time analytics · schema evolution · serverless architecture

Stored enrichment (catalog DB)

Category: Architecture
Sub-category: Data Lake Architecture
Confidence: 0.90
Version strategy: NOT_APPLICABLE

Maturity reasoning: Data lakes are widely listed in cloud/data platform job descriptions and are a standard architecture in AWS, Azure, and GCP ecosystems; they’re a common hiring-pipeline staple rather than a niche pattern.

Skill profile (library / DB)

Skill nature: PATTERN
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 1
Sub-category id: 1025
Extractable: True
Also category: False

Dimensions (API 2 worklist)

  • Cloud Storage and Data Services · Catalog dimension · db id 144

    Library dimension (catalog)

    Roles linked in library: Cloud Architect

  • React Frontend Development · Catalog dimension · db id 96

    Library dimension (catalog)

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Cloud Storage and Data Services (cloud-storage-and-data-services) · Skipped — no persistable v3 meta for new skill · skill_not_in_db_v3_proposed
React Frontend Development (d_init_01) · Skipped — no persistable v3 meta for new skill · skill_not_in_db_v3_proposed

Machine Learning · Primary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: Machine Learning id=1356 · machine-learning

Aliases — catalog

  • Machine Learning (CANONICAL)

Context tags (catalog)

Keras · PyTorch · TensorFlow · cross-validation · data preprocessing · ensemble methods · feature engineering · hyperparameter tuning · model evaluation · natural language processing · neural networks · reinforcement learning · scikit-learn · supervised learning · unsupervised learning

Stored enrichment (catalog DB)

Category: Concept
Sub-category: Machine Learning
Confidence: 0.98
Version strategy: NOT_APPLICABLE

Maturity reasoning: Machine Learning appears in large volumes of job descriptions across data, product, and platform roles, and major cloud vendors (AWS, Google Cloud, Azure) offer dedicated ML services and certifications, indicating broad adoption.

Skill profile (library / DB)

Skill nature: CONCEPT
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 2
Sub-category id: 1024
Extractable: True
Also category: False

Dimensions (API 2 worklist)

  • AI Governance and Model Security · Catalog dimension · db id 50

    Library dimension (catalog)

    Roles linked in library: AI Engineer, ML Engineer, MLOps Engineer

  • React Frontend Development · Catalog dimension · db id 96

    Library dimension (catalog)

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
AI Governance and Model Security (ai-governance-and-model-security) · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
React Frontend Development (d_init_01) · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Data Science · Secondary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category: Concepts
Sub-category: general
Skill nature: CONCEPT
Volatility: MEDIUM
Typical lifespan: MULTI_YEAR
Version strategy: UNVERSIONED

All API 3 persistence rows

Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.

Skill Tag Dimension Skill↔dim Role↔dim Outcome Notes
Data Lake · new · Cloud Storage and Data Services (cloud-storage-and-data-services) · Skipped — no persistable v3 meta for new skill · skill_not_in_db_v3_proposed
Data Lake · new · React Frontend Development (d_init_01) · Skipped — no persistable v3 meta for new skill · skill_not_in_db_v3_proposed
Machine Learning · in_db · AI Governance and Model Security (ai-governance-and-model-security) · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Machine Learning · in_db · React Frontend Development (d_init_01) · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
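
The grid above has one row per (skill × dimension) work item, so skills with no dimensions in the API 2 worklist contribute no rows. A minimal Python sketch of that expansion, using the skill-to-dimension mapping from this run (the function name and the dictionary representation are illustrative assumptions):

```python
# Dimensions per skill as listed in the API 2 worklist for this run.
skill_dimensions = {
    "ETL": [],
    "Data Pipelines": [],
    "Data Lake": ["Cloud Storage and Data Services", "React Frontend Development"],
    "Machine Learning": ["AI Governance and Model Security", "React Frontend Development"],
    "Data Science": [],
}


def persistence_work_items(dimensions_by_skill):
    """Expand each skill into one (skill, dimension) pair per linked dimension."""
    return [
        (skill, dimension)
        for skill, dimensions in dimensions_by_skill.items()
        for dimension in dimensions
    ]


work_items = persistence_work_items(skill_dimensions)
for skill, dimension in work_items:
    print(f"{skill} x {dimension}")
```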

Library artifacts (this run)

Kind Detail DB id
canonical_skill_proposed ETL | type=Data Engineering Tools subtype=general nature=PRACTICE lifespan=MULTI_YEAR
canonical_skill_proposed Data Pipelines | type=Data Engineering Tools subtype=general nature=PRACTICE lifespan=MULTI_YEAR
canonical_skill_proposed Data Science | type=Concepts subtype=general nature=CONCEPT lifespan=MULTI_YEAR
dimension_skill_link_proposed Data Lake ↔ Cloud Storage and Data Services
dimension_skill_link_proposed Data Lake ↔ React Frontend Development
nano JD Parser — gpt-4.1-nano
Domain: Other
JD type: pass
{
  "JD_type": "pass",
  "about_company": null,
  "certifications": [],
  "company_name": null,
  "ctc": null,
  "domain": {
    "primary": {
      "aliases": [],
      "domain": "Other"
    },
    "secondary": null
  },
  "education": [
    {
      "level": "Bachelor\u0027s",
      "qualification": "BTECH/BE/BSC - Any Discipline",
      "raw": "BA/BS/B.Tech.",
      "requirement": "required"
    }
  ],
  "experience": null,
  "job_locations": [],
  "role": null,
  "role_aliases": [],
  "role_archetype": "Data",
  "roles_and_responsibilities": [
    {
      "bullet_count": 5,
      "heading": "Role And Responsibilities",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "Liaise with different client stakeholders",
        "last_5_words": "in the client ecosystem."
      },
      "text": "Liaise with different client stakeholders on ad-hoc analyses related to monitoring the entire data lake\nBuild ETL \u0026 data pipelines to help feed the data into different business facing data products/dashboards\nExplore options to automate processes \u0026 workflows and thus drive efficiencies for client\nWork on ML Model based initiatives to enrich the overall data ecosystem\nBuild algorithms to ingest different 3rd party data sources in the client ecosystem.",
      "word_count": 54
    },
    {
      "bullet_count": 7,
      "heading": "Requirement",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "BA/BS/B.Tech. Have prior experience",
        "last_5_words": "technical stakeholders and non-technical stakeholders."
      },
      "text": "BA/BS/B.Tech.\nHave prior experience in data engineering projects, built automation workflows\nAre interested in learning about Data Science\nHave a strong attention to detail and care deeply about data quality\nProactively reach out to stakeholders to understand data better\nEnjoy collaborating with team members to drive impact\nAre a strong communicator; you can adjust communication for technical stakeholders and non-technical stakeholders.",
      "word_count": 66
    }
  ],
  "urls": []
}
API 1 — extract-from-jd
{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "ETL"
    },
    {
      "is_primary": true,
      "skill_name": "Data Pipelines"
    },
    {
      "is_primary": true,
      "skill_name": "Data Lake"
    },
    {
      "is_primary": true,
      "skill_name": "Machine Learning"
    },
    {
      "is_primary": false,
      "skill_name": "Data Science"
    }
  ],
  "jd_role": null,
  "nano_parsed": {
    "JD_type": "pass",
    "about_company": null,
    "certifications": [],
    "company_name": null,
    "ctc": null,
    "domain": {
      "primary": {
        "aliases": [],
        "domain": "Other"
      },
      "secondary": null
    },
    "education": [
      {
        "level": "Bachelor\u0027s",
        "qualification": "BTECH/BE/BSC - Any Discipline",
        "raw": "BA/BS/B.Tech.",
        "requirement": "required"
      }
    ],
    "experience": null,
    "job_locations": [],
    "role": null,
    "role_aliases": [],
    "role_archetype": "Data",
    "roles_and_responsibilities": [
      {
        "bullet_count": 5,
        "heading": "Role And Responsibilities",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "Liaise with different client stakeholders",
          "last_5_words": "in the client ecosystem."
        },
        "text": "Liaise with different client stakeholders on ad-hoc analyses related to monitoring the entire data lake\nBuild ETL \u0026 data pipelines to help feed the data into different business facing data products/dashboards\nExplore options to automate processes \u0026 workflows and thus drive efficiencies for client\nWork on ML Model based initiatives to enrich the overall data ecosystem\nBuild algorithms to ingest different 3rd party data sources in the client ecosystem.",
        "word_count": 54
      },
      {
        "bullet_count": 7,
        "heading": "Requirement",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "BA/BS/B.Tech. Have prior experience",
          "last_5_words": "technical stakeholders and non-technical stakeholders."
        },
        "text": "BA/BS/B.Tech.\nHave prior experience in data engineering projects, built automation workflows\nAre interested in learning about Data Science\nHave a strong attention to detail and care deeply about data quality\nProactively reach out to stakeholders to understand data better\nEnjoy collaborating with team members to drive impact\nAre a strong communicator; you can adjust communication for technical stakeholders and non-technical stakeholders.",
        "word_count": 66
      }
    ],
    "urls": []
  },
  "rejected": false,
  "rejection_reason": null,
  "run_id": "9ca819dc-2f75-4d3f-abdf-4d447fa207ae",
  "stage3_signals": {
    "alias_found": false,
    "alias_match_roles": [],
    "kra_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": [
          {
            "kra_text": "Builds data ingestion pipelines to collect data from transactional databases, third-party APIs, event streams, and file sources into centralized data platforms.",
            "sentence": "Build ETL \u0026 data pipelines to help feed the data into different business facing data products/dashboards",
            "similarity": 0.643
          },
          {
            "kra_text": "Builds data ingestion pipelines to collect data from transactional databases, third-party APIs, event streams, and file sources into centralized data platforms.",
            "sentence": "Build algorithms to ingest different 3rd party data sources in the client ecosystem.",
            "similarity": 0.6373
          },
          {
            "kra_text": "Works with data analysts, data scientists, and business stakeholders to define data models, ingestion schedules, and data delivery requirements.",
            "sentence": "Proactively reach out to stakeholders to understand data better",
            "similarity": 0.5502
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 2,
        "score": 0.6102,
        "slug": "data-engineer",
        "total_count": null
      },
      {
        "display_name": "Flutter Developer",
        "kra_matches": [
          {
            "kra_text": "collaborate with design, product, and backend teams",
            "sentence": "Enjoy collaborating with team members to drive impact",
            "similarity": 0.5813
          },
          {
            "kra_text": "integrate external APIs and data sources",
            "sentence": "Build algorithms to ingest different 3rd party data sources in the client ecosystem.",
            "similarity": 0.5764
          },
          {
            "kra_text": "integrate external APIs and data sources",
            "sentence": "Build ETL \u0026 data pipelines to help feed the data into different business facing data products/dashboards",
            "similarity": 0.4577
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 74,
        "score": 0.5385,
        "slug": "flutter-developer",
        "total_count": null
      },
      {
        "display_name": "Svelte Frontend Developer",
        "kra_matches": [
          {
            "kra_text": "backend data integration",
            "sentence": "Build ETL \u0026 data pipelines to help feed the data into different business facing data products/dashboards",
            "similarity": 0.541
          },
          {
            "kra_text": "backend data integration",
            "sentence": "Build algorithms to ingest different 3rd party data sources in the client ecosystem.",
            "similarity": 0.5392
          },
          {
            "kra_text": "backend data integration",
            "sentence": "Liaise with different client stakeholders on ad-hoc analyses related to monitoring the entire data lake",
            "similarity": 0.4617
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 92,
        "score": 0.514,
        "slug": "svelte-frontend-developer",
        "total_count": null
      },
      {
        "display_name": "Engineering Manager",
        "kra_matches": [
          {
            "kra_text": "Set team goals and delivery plans",
            "sentence": "Enjoy collaborating with team members to drive impact",
            "similarity": 0.5032
          },
          {
            "kra_text": "manage stakeholder alignment and tradeoffs",
            "sentence": "Are a strong communicator; you can adjust communication for technical stakeholders and non-technical stakeholders.",
            "similarity": 0.4845
          },
          {
            "kra_text": "manage stakeholder alignment and tradeoffs",
            "sentence": "Proactively reach out to stakeholders to understand data better",
            "similarity": 0.482
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 121,
        "score": 0.4899,
        "slug": "engineering-manager",
        "total_count": null
      },
      {
        "display_name": "MLOps Engineer",
        "kra_matches": [
          {
            "kra_text": "Supports ML platform incidents by diagnosing model serving failures, feature store pipeline breaks, and training environment configuration issues.",
            "sentence": "Work on ML Model based initiatives to enrich the overall data ecosystem",
            "similarity": 0.5558
          },
          {
            "kra_text": "Sets up model monitoring dashboards, data drift detection, prediction performance tracking, and alert routing for production ML systems.",
            "sentence": "Build ETL \u0026 data pipelines to help feed the data into different business facing data products/dashboards",
            "similarity": 0.4633
          },
          {
            "kra_text": "Automates ML platform operations including scheduled retraining triggers, pipeline orchestration, evaluation workflows, and alerting configuration.",
            "sentence": "Explore options to automate processes \u0026 workflows and thus drive efficiencies for client",
            "similarity": 0.4495
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 16,
        "score": 0.4896,
        "slug": "ml-ops-engineer",
        "total_count": null
      }
    ],
    "skill_match_roles": [
      {
        "display_name": "ML Engineer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Machine Learning"
        ],
        "role_id": 3,
        "score": 0.25,
        "slug": "ml-engineer",
        "total_count": 4
      },
      {
        "display_name": "AI Engineer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Machine Learning"
        ],
        "role_id": 13,
        "score": 0.25,
        "slug": "ai-engineer",
        "total_count": 4
      },
      {
        "display_name": "MLOps Engineer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Machine Learning"
        ],
        "role_id": 16,
        "score": 0.25,
        "slug": "ml-ops-engineer",
        "total_count": 4
      }
    ]
  },
  "stage4_decision": {
    "alias_collision_detected": false,
    "case": "DOMAIN",
    "chosen_role": {
      "display_name": "Data Engineer",
      "kra_matches": null,
      "matched_count": null,
      "matched_skills": null,
      "role_id": 2,
      "score": 0.95,
      "slug": "data-engineer",
      "total_count": null
    },
    "confidence": 0.95,
    "is_new_role": false,
    "llm2_fired": false,
    "llm2_reasoning": null,
    "matched_dimensions": [
      "Data Pipeline Engineering",
      "Data Lake Monitoring",
      "Workflow Automation",
      "Third-party Data Ingestion",
      "Data Quality",
      "Stakeholder Collaboration",
      "Data Ecosystem Support",
      "ML Data Enablement"
    ],
    "matched_kras": [
      "Liaise with different client stakeholders on ad-hoc analyses",
      "Build ETL \u0026 data pipelines",
      "Explore options to automate processes \u0026 workflows",
      "Drive efficiencies for client",
      "Work on ML Model based initiatives",
      "Build algorithms to ingest different 3rd party data sources"
    ],
    "matched_skills": [
      "ETL",
      "data pipelines",
      "data lake",
      "automation workflows",
      "ML Model",
      "3rd party data sources",
      "data quality",
      "data engineering projects"
    ],
    "new_role_display_name": null,
    "new_role_slug": null,
    "queued": false,
    "reasoning": "Domain=Data Engineering \u0026 Analytics; The JD centers on building ETL/data pipelines, automating workflows, ingesting third-party data, and supporting data lake and ML-related data engineering work.",
    "sub_role": null
  },
  "stage5_updates": {
    "centroid_n_after": 254,
    "centroid_updated": true,
    "collision_log_id": null,
    "new_kra_attached": null,
    "new_skills_attached": [
      {
        "is_primary": true,
        "queue_id": 12668,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "ETL",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 12669,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Data Pipelines",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 12670,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Data Lake",
        "status": "pending"
      },
      {
        "is_primary": false,
        "queue_id": 12671,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Data Science",
        "status": "pending"
      }
    ],
    "queue_entry_id": null,
    "v3_pipeline_triggered": false,
    "v3_role_slug": null,
    "v3_run_id": null
  }
}
API 2 — extract-details
{
  "alias_matches": [
    {
      "alias_persist_skipped_reason": "TODO: REMOVE AFTER TESTING \u2014 alias DB write disabled",
      "alias_persisted": false,
      "existing_alias_id": 2017,
      "existing_alias_text": "Data Lakes",
      "input_term": "Data Lake",
      "matched_canonical": {
        "category_id": 1,
        "display_name": "Data Lakes",
        "id": 1358,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PATTERN",
        "slug": "data-lakes",
        "sub_category_id": 1025,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "embedding_alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 2015,
      "existing_alias_text": "Machine Learning",
      "input_term": "Machine Learning",
      "matched_canonical": {
        "category_id": 2,
        "display_name": "Machine Learning",
        "id": 1356,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CONCEPT",
        "slug": "machine-learning",
        "sub_category_id": 1024,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    }
  ],
  "candidate_roles": [
    {
      "display_name": "Cloud Architect",
      "id": 9,
      "rationale": null,
      "role_archetype": null,
      "slug": "cloud-architect",
      "source": "db"
    },
    {
      "display_name": "AI Engineer",
      "id": 13,
      "rationale": null,
      "role_archetype": null,
      "slug": "ai-engineer",
      "source": "db"
    },
    {
      "display_name": "ML Engineer",
      "id": 3,
      "rationale": null,
      "role_archetype": null,
      "slug": "ml-engineer",
      "source": "db"
    },
    {
      "display_name": "MLOps Engineer",
      "id": 16,
      "rationale": null,
      "role_archetype": null,
      "slug": "ml-ops-engineer",
      "source": "db"
    }
  ],
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Domain=Data Engineering \u0026 Analytics; The JD centers on building ETL/data pipelines, automating workflows, ingesting third-party data, and supporting data lake and ML-related data engineering work.",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "dimensions": [
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Storage and Data Services",
        "id": 144,
        "rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
        "slug": "cloud-storage-and-data-services",
        "source": "db"
      },
      "input_skill": "Data Lake",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Cloud Architect",
          "id": 9,
          "rationale": null,
          "role_archetype": null,
          "slug": "cloud-architect",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "React Frontend Development",
        "id": 96,
        "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "Data Lake",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "AI Governance and Model Security",
        "id": 50,
        "rationale": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
        "slug": "ai-governance-and-model-security",
        "source": "db"
      },
      "input_skill": "Machine Learning",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "AI Engineer",
          "id": 13,
          "rationale": null,
          "role_archetype": null,
          "slug": "ai-engineer",
          "source": "db"
        },
        {
          "display_name": "ML Engineer",
          "id": 3,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-engineer",
          "source": "db"
        },
        {
          "display_name": "MLOps Engineer",
          "id": 16,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-ops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "React Frontend Development",
        "id": 96,
        "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "Machine Learning",
      "llm_role": null,
      "roles_from_db": []
    }
  ],
  "input_final_skills": [
    "ETL",
    "Data Pipelines",
    "Data Lake",
    "Machine Learning",
    "Data Science"
  ],
  "input_llm_skills": [
    "ETL",
    "Data Pipelines",
    "Data Lake",
    "Machine Learning",
    "Data Science"
  ],
  "new_aliases_persisted": 0,
  "run_id": "9ca819dc-2f75-4d3f-abdf-4d447fa207ae",
  "skills_detail": [
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "ETL",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Data Engineering Tools",
          "skill_nature": "PRACTICE",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "etl",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Data Pipelines",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Data Engineering Tools",
          "skill_nature": "PRACTICE",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "data-pipelines",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Data Lakes",
          "alias_type": "CANONICAL",
          "id": 2017,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 1,
        "display_name": "Data Lakes",
        "id": 1358,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PATTERN",
        "slug": "data-lakes",
        "sub_category_id": 1025,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Storage and Data Services",
            "id": 144,
            "rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
            "slug": "cloud-storage-and-data-services",
            "source": "db"
          },
          "input_skill": "Data Lake",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Cloud Architect",
              "id": 9,
              "rationale": null,
              "role_archetype": null,
              "slug": "cloud-architect",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "React Frontend Development",
            "id": 96,
            "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "Data Lake",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "Data Lake",
      "matched_via": "embedding_alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Machine Learning",
          "alias_type": "CANONICAL",
          "id": 2015,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 2,
        "display_name": "Machine Learning",
        "id": 1356,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CONCEPT",
        "slug": "machine-learning",
        "sub_category_id": 1024,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "AI Governance and Model Security",
            "id": 50,
            "rationale": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
            "slug": "ai-governance-and-model-security",
            "source": "db"
          },
          "input_skill": "Machine Learning",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "AI Engineer",
              "id": 13,
              "rationale": null,
              "role_archetype": null,
              "slug": "ai-engineer",
              "source": "db"
            },
            {
              "display_name": "ML Engineer",
              "id": 3,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-engineer",
              "source": "db"
            },
            {
              "display_name": "MLOps Engineer",
              "id": 16,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-ops-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "React Frontend Development",
            "id": 96,
            "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "Machine Learning",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "Machine Learning",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Data Science",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Concepts",
          "skill_nature": "CONCEPT",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "data-science",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    }
  ],
  "unmatched_skills": [
    "ETL",
    "Data Pipelines",
    "Data Science"
  ]
}
API 3 — final-role-output
{
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Domain=Data Engineering \u0026 Analytics; The JD centers on building ETL/data pipelines, automating workflows, ingesting third-party data, and supporting data lake and ML-related data engineering work.",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "chosen_role_resolution": "in_db",
  "final_input_skills": [
    {
      "skill": "ETL",
      "tag": "new"
    },
    {
      "skill": "Data Pipelines",
      "tag": "new"
    },
    {
      "skill": "Data Lake",
      "tag": "in_db"
    },
    {
      "skill": "Machine Learning",
      "tag": "in_db"
    },
    {
      "skill": "Data Science",
      "tag": "new"
    }
  ],
  "llm_cost_api1_usd": null,
  "llm_cost_api2_usd": null,
  "llm_cost_api3_usd": null,
  "llm_cost_total_usd": null,
  "persistence": {
    "items": [
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Storage and Data Services",
          "id": 144,
          "rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
          "slug": "cloud-storage-and-data-services",
          "source": "db"
        },
        "dimension_id": 144,
        "input_skill": "Data Lake",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Skipped \u2014 no persistable v3 meta for new skill",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Cloud Architect",
            "id": 9,
            "rationale": null,
            "role_archetype": null,
            "slug": "cloud-architect",
            "source": "db"
          }
        ],
        "skill_dimension_saved": false,
        "skill_id": null,
        "skill_tag": "new",
        "skipped_reason": "skill_not_in_db_v3_proposed"
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "React Frontend Development",
          "id": 96,
          "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 96,
        "input_skill": "Data Lake",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Skipped \u2014 no persistable v3 meta for new skill",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": false,
        "skill_id": null,
        "skill_tag": "new",
        "skipped_reason": "skill_not_in_db_v3_proposed"
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "AI Governance and Model Security",
          "id": 50,
          "rationale": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
          "slug": "ai-governance-and-model-security",
          "source": "db"
        },
        "dimension_id": 50,
        "input_skill": "Machine Learning",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "AI Engineer",
            "id": 13,
            "rationale": null,
            "role_archetype": null,
            "slug": "ai-engineer",
            "source": "db"
          },
          {
            "display_name": "ML Engineer",
            "id": 3,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-engineer",
            "source": "db"
          },
          {
            "display_name": "MLOps Engineer",
            "id": 16,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-ops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 1356,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "React Frontend Development",
          "id": 96,
          "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 96,
        "input_skill": "Machine Learning",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 1356,
        "skill_tag": "in_db",
        "skipped_reason": null
      }
    ],
    "new_skills_created": 0,
    "role_dimension_saved": 0,
    "skill_dimension_saved": 0,
    "skipped": 2
  },
  "planner_output": null,
  "run_id": "9ca819dc-2f75-4d3f-abdf-4d447fa207ae"
}
