Pipeline run
3d064490-6487-4435-a307-5f6eab50f613
Client output enrichment
v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description · vocab breakdown (legacy)
Signals
Post-classification
Captured for admin review
1 POST /skills/extract-from-jd
2 POST /skills/extract-details
3 POST /skills/final-role-output
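The three calls above run in order. The sketch below shows one way to chain them, assuming each response is passed forward as the next request body. That hand-off, the `run_pipeline` and `post` names, and the `jd_text` key are all assumptions for illustration; only the endpoint paths come from this run.

```python
from typing import Any, Callable

# Endpoint paths as listed in this run; everything else here is illustrative.
ENDPOINTS = [
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
]


def run_pipeline(post: Callable[[str, dict], dict], jd_text: str) -> list:
    """Call the three endpoints in order, feeding each response forward.

    `post` is a caller-supplied transport (hypothetical), so the sketch makes
    no claim about the HTTP client, base URL, or authentication in use.
    """
    payload: dict[str, Any] = {"jd_text": jd_text}  # hypothetical request key
    trace = []
    for path in ENDPOINTS:
        payload = post(path, payload)
        trace.append((path, payload))
    return trace


def fake_post(path: str, body: dict) -> dict:
    """Stub transport so the sketch runs without a server."""
    return {"called": path, "received_keys": sorted(body)}


trace = run_pipeline(fake_post, "Job Title: Data Engineer (Databricks)")
```

Injecting the transport keeps the ordering logic testable without network access.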
Data Engineer
CASE A · slug: data-engineer · id: 2 · source: db
Exact alias hit on data-engineer (1.0) — no other alias at this confidence; skill_top absent does not contradict
Resolution: in_db (role exists in library; skill↔dim and role↔dim links saved when applicable).
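The CASE A reasoning above can be read as a simple rule: accept the role when exactly one alias match scores 1.0 and the top skill-based role, if any, does not point elsewhere. The sketch below encodes that reading. The function name, the `skill_top_slug` parameter, and the exact contradiction test are assumptions, not the pipeline's confirmed logic.

```python
from typing import Optional


def resolve_case_a(alias_matches: list, skill_top_slug: Optional[str]) -> Optional[dict]:
    """Return the chosen role for an unambiguous exact alias hit, else None.

    Assumed rule (inferred from the reasoning string in this run):
    - exactly one alias match has score 1.0, and
    - the top skill-matched role is absent or agrees with it.
    """
    exact = [m for m in alias_matches if m["score"] == 1.0]
    if len(exact) != 1:
        return None  # no exact hit, or a collision at the same confidence
    chosen = exact[0]
    if skill_top_slug is not None and skill_top_slug != chosen["slug"]:
        return None  # skill evidence contradicts the alias hit
    return chosen


# Values from this run: one alias match, no skill-based role match.
alias_matches = [{"slug": "data-engineer", "role_id": 2, "score": 1.0}]
decision = resolve_case_a(alias_matches, skill_top_slug=None)
```

Returning `None` leaves room for other cases to handle the ambiguous inputs; how the pipeline actually routes those is not shown in this capture.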
Job description
Job Title: Data Engineer (Databricks)
Experience: 4 - 8 Yrs
Must-Have Skills: Pyspark, Azure, ADF, Databricks, ETL, SQL
Responsibilities:
• Design and build data pipelines using Spark-SQL and PySpark in Azure Databricks
• Design and build ETL pipelines using ADF
• Build and maintain a Lakehouse architecture in ADLS / Databricks.
• Perform data preparation tasks including data cleaning, normalization, deduplication, type conversion etc.
• Work with DevOps team to deploy solutions in production environments.
• Control data processes and take corrective action when errors are identified. Corrective action may include executing a work around process and then identifying the cause and solution for data errors.
• Participate as a full member of the global Analytics team, providing solutions for and insights into data related items.
• Collaborate with your Data Science and Business Intelligence colleagues across the world to share key learnings, leverage ideas and solutions and to propagate best practices.
• You will lead projects that include other team members and participate in projects led by other team members.
• Apply change management tools including training, communication and documentation to manage upgrades, changes and data migrations.
Skills from this JD
Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
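A minimal sketch of that per-skill merge, assuming the three sources are joined on the skill name. The `merge_skill_rows` helper and the output keys are hypothetical; the sample values are trimmed from this run's API 1 and API 2 payloads and the API 3 tags.

```python
def merge_skill_rows(final_skills, skills_detail, persistence_tags):
    """Join API 1 skills, API 2 details, and API 3 tags on the skill name."""
    detail_by_skill = {d["input_skill"]: d for d in skills_detail}
    rows = []
    for skill in final_skills:
        name = skill["skill_name"]
        detail = detail_by_skill.get(name, {})
        canonical = detail.get("canonical") or {}  # null when the skill is not in the library
        rows.append({
            "skill": name,
            "is_primary": skill["is_primary"],
            "canonical": canonical.get("display_name"),
            "matched_via": detail.get("matched_via"),
            "persistence_tag": persistence_tags.get(name),  # None if no API 3 row
        })
    return rows


final_skills = [
    {"is_primary": True, "skill_name": "Spark SQL"},
    {"is_primary": True, "skill_name": "PySpark"},
    {"is_primary": False, "skill_name": "DevOps"},
]
skills_detail = [
    {"input_skill": "Spark SQL", "canonical": None, "matched_via": None},
    {"input_skill": "PySpark", "canonical": {"display_name": "Apache Spark"}, "matched_via": "embedding_alias"},
    {"input_skill": "DevOps", "canonical": {"display_name": "DevOps"}, "matched_via": "alias"},
]
persistence_tags = {"PySpark": "new", "DevOps": "in_db"}

rows = merge_skill_rows(final_skills, skills_detail, persistence_tags)
```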
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Data Engineering Tools
- Sub-category: general
- Skill nature: LANGUAGE
- Volatility: MEDIUM
- Typical lifespan: MULTI_YEAR
- Version strategy: UNVERSIONED
Aliases — catalog
- Apache Spark (CANONICAL)
- apache spark 3 (VERSION)
- spark (VERSION)
- spark 3 (VERSION)
- spark 3.x (VERSION)
- spark3 (VERSION)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Framework
- Sub-category: Distributed Data Processing Framework
- Vendor: Apache Software Foundation
- License: apache_2
- Year introduced: 2010
- Confidence: 0.94
- Version strategy: SEPARATE_ENTITY
- Version tag: 3.x
Maturity reasoning: Apache Spark appears in many data engineering JDs and remains a standard for distributed ETL/ELT; its GitHub and vendor ecosystem activity stay strong, with Databricks and cloud platforms still promoting it.
Skill profile (library / DB)
- Skill nature: FRAMEWORK
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 5
- Sub-category id: 1021
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- ETL and ELT Tooling (catalog dimension, db id 24)
  Library dimension (catalog)
  Roles linked in library: Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| ETL and ELT Tooling (etl-and-elt-tooling) | — | — | Skipped — no persistable v3 meta for new skill · skill_not_in_db_v3_proposed |
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Cloud Platforms
- Sub-category: Data Analytics
- Skill nature: PLATFORM
- Volatility: MEDIUM
- Typical lifespan: MULTI_YEAR
- Version strategy: UNVERSIONED
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Cloud Platforms
- Sub-category: Data Integration
- Skill nature: PLATFORM
- Volatility: MEDIUM
- Typical lifespan: MULTI_YEAR
- Version strategy: UNVERSIONED
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Cloud Platforms
- Sub-category: Data Storage
- Skill nature: PLATFORM
- Volatility: MEDIUM
- Typical lifespan: MULTI_YEAR
- Version strategy: UNVERSIONED
Aliases — catalog
- Lakehouse (CANONICAL)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Architecture
- Sub-category: Data Platform Architecture
- Confidence: 0.90
- Version strategy: NOT_APPLICABLE
Maturity reasoning: Lakehouse is increasingly listed in data-platform JDs and vendor docs (Databricks, Snowflake, Microsoft Fabric), but it is not yet as universal as core warehouse or lake skills.
Skill profile (library / DB)
- Skill nature: PATTERN
- Volatility: EMERGING
- Typical lifespan: EVERGREEN
- Category id: 1
- Sub-category id: 1026
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- React Frontend Development (catalog dimension, db id 96)
  Library dimension (catalog)
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| React Frontend Development (d_init_01) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- DevOps (CANONICAL)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Methodology
- Sub-category: Devops Methodology
- Confidence: 0.97
- Version strategy: NOT_APPLICABLE
Maturity reasoning: DevOps appears in a large share of software and platform engineering job descriptions, often alongside CI/CD, Kubernetes, and cloud tooling; it is a standard hiring-pipeline keyword rather than a niche specialty.
Skill profile (library / DB)
- Skill nature: METHODOLOGY
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 8
- Sub-category id: 922
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- CI/CD Pipeline Platforms (catalog dimension, db id 150)
  Library dimension (catalog)
  Roles linked in library: DevOps Engineer
- Deployment and Release Patterns (catalog dimension, db id 140)
  Library dimension (catalog)
  Roles linked in library: Cloud Architect
- Infrastructure as Code (catalog dimension, db id 132)
  Library dimension (catalog)
  Roles linked in library: Cloud Architect, DevOps Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| CI/CD Pipeline Platforms (ci-cd-pipeline-platforms) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Deployment and Release Patterns (deployment-and-release-patterns) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Infrastructure as Code (infrastructure-as-code) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
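The "dimension not under chosen role" outcome can be illustrated with a membership check: a role↔dimension link is only eligible when the chosen role already appears among the roles the library lists for that dimension. This is an inferred reading of the outcome text, and `role_dimension_outcome` is a hypothetical helper; the ids below are taken from this run (chosen role id 2, DevOps Engineer id 10, Cloud Architect id 9).

```python
def role_dimension_outcome(chosen_role_id, dimension_entry):
    """Decide whether a role-dimension link is eligible for the chosen role.

    Assumption: "under chosen role" means the chosen role id is present in
    the dimension's `roles_from_db` list.
    """
    linked_ids = {role["id"] for role in dimension_entry["roles_from_db"]}
    if chosen_role_id in linked_ids:
        return "eligible"
    return "skipped (dimension not under chosen role)"


CHOSEN_ROLE_ID = 2  # Data Engineer

devops_dimensions = [
    {"slug": "ci-cd-pipeline-platforms", "roles_from_db": [{"id": 10}]},
    {"slug": "deployment-and-release-patterns", "roles_from_db": [{"id": 9}]},
    {"slug": "infrastructure-as-code", "roles_from_db": [{"id": 9}, {"id": 10}]},
]

outcomes = {
    d["slug"]: role_dimension_outcome(CHOSEN_ROLE_ID, d) for d in devops_dimensions
}
```

Under this reading all three DevOps dimensions are skipped for Data Engineer, which matches the rows above. The same check would also skip a dimension whose `roles_from_db` list is empty.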
All API 3 persistence rows
Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.
| Skill | Tag | Dimension | Skill↔dim | Role↔dim | Outcome | Notes |
|---|---|---|---|---|---|---|
| PySpark | new | ETL and ELT Tooling (etl-and-elt-tooling) | — | — | Skipped — no persistable v3 meta for new skill | skill_not_in_db_v3_proposed |
| Lakehouse | in_db | React Frontend Development (d_init_01) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| DevOps | in_db | CI/CD Pipeline Platforms (ci-cd-pipeline-platforms) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| DevOps | in_db | Deployment and Release Patterns (deployment-and-release-patterns) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| DevOps | in_db | Infrastructure as Code (infrastructure-as-code) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
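The "one row per (skill × dimension)" shape of the grid above can be reproduced by flattening the API 2 `skills_detail` list. The sketch below assumes that is how work items are derived; `expand_work_items` is a hypothetical helper and the sample input is trimmed to the fields it needs.

```python
def expand_work_items(skills_detail):
    """Flatten skills into (skill, dimension slug) pairs, one per dimension.

    Skills with an empty `dimensions` list contribute no rows.
    """
    rows = []
    for skill in skills_detail:
        for entry in skill["dimensions"]:
            rows.append((skill["input_skill"], entry["dimension"]["slug"]))
    return rows


# Trimmed from this run's API 2 payload (slugs only).
skills_detail = [
    {"input_skill": "Spark SQL", "dimensions": []},
    {"input_skill": "PySpark", "dimensions": [{"dimension": {"slug": "etl-and-elt-tooling"}}]},
    {"input_skill": "Lakehouse", "dimensions": [{"dimension": {"slug": "d_init_01"}}]},
    {"input_skill": "DevOps", "dimensions": [
        {"dimension": {"slug": "ci-cd-pipeline-platforms"}},
        {"dimension": {"slug": "deployment-and-release-patterns"}},
        {"dimension": {"slug": "infrastructure-as-code"}},
    ]},
]

work_items = expand_work_items(skills_detail)
```

With this input the expansion yields five pairs, the same count as the grid. The four skills with no matched dimension (Spark SQL, Azure Databricks, Azure Data Factory, ADLS) produce none.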
Library artifacts (this run)
| Kind | Detail | DB id |
|---|---|---|
| canonical_skill_proposed | Spark SQL · type=Data Engineering Tools subtype=general nature=LANGUAGE lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Azure Databricks · type=Cloud Platforms subtype=Data Analytics nature=PLATFORM lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Azure Data Factory · type=Cloud Platforms subtype=Data Integration nature=PLATFORM lifespan=MULTI_YEAR | |
| canonical_skill_proposed | ADLS · type=Cloud Platforms subtype=Data Storage nature=PLATFORM lifespan=MULTI_YEAR | |
| dimension_skill_link_proposed | PySpark ↔ ETL and ELT Tooling | |
| role_dimension_link_proposed | Data Engineer ↔ ETL and ELT Tooling | |
nano JD Parser — gpt-4.1-nano
{
"JD_type": "pass",
"about_company": null,
"certifications": [],
"company_name": null,
"ctc": null,
"domain": {
"primary": {
"aliases": [],
"domain": "Other"
},
"secondary": null
},
"education": [],
"experience": {
"max": 8,
"min": 4,
"raw": "4 - 8 Yrs"
},
"job_locations": [],
"role": "Data Engineer (Databricks)",
"role_aliases": [
"Data Engineer",
"Data Pipeline Engineer",
"ETL Engineer"
],
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 10,
"heading": "Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Design and build data",
"last_5_words": "changes and data migrations."
},
"text": "\u2022 Design and build data pipelines using Spark-SQL and PySpark in Azure Databricks\n\u2022 Design and build ETL pipelines using ADF\n\u2022 Build and maintain a Lakehouse architecture in ADLS / Databricks.\n\u2022 Perform data preparation tasks including data cleaning, normalization, deduplication, type conversion etc.\n\u2022 Work with DevOps team to deploy solutions in production environments.\n\u2022 Control data processes and take corrective action when errors are identified. Corrective action may include executing a work around process and then identifying the cause and solution for data errors.\n\u2022 Participate as a full member of the global Analytics team, providing solutions for and insights into data related items.\n\u2022 Collaborate with your Data Science and Business Intelligence colleagues across the world to share key learnings, leverage ideas and solutions and to propagate best practices.\n\u2022 You will lead projects that include other team members and participate in projects led by other team members.\n\u2022 Apply change management tools including training, communication and documentation to manage upgrades, changes and data migrations.",
"word_count": 186
}
],
"urls": []
}
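The `experience` object above pairs the raw string with numeric bounds. The parser in this run is an LLM, so the following is not its method. It is a deterministic regex cross-check one could run against that output, and `parse_experience` is a hypothetical helper that only handles the "N - M" range form.

```python
import re
from typing import Optional

# Matches ranges such as "4 - 8 Yrs" or "4-8 years"; other phrasings are out of scope.
_RANGE = re.compile(r"(\d+)\s*-\s*(\d+)")


def parse_experience(raw: str) -> Optional[dict]:
    """Extract min/max years from a range string, or None if no range is found."""
    match = _RANGE.search(raw)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    return {"min": low, "max": high, "raw": raw}


parsed = parse_experience("4 - 8 Yrs")
```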
API 1 — extract-from-jd
{
"final_skills": [
{
"is_primary": true,
"skill_name": "Spark SQL"
},
{
"is_primary": true,
"skill_name": "PySpark"
},
{
"is_primary": true,
"skill_name": "Azure Databricks"
},
{
"is_primary": true,
"skill_name": "Azure Data Factory"
},
{
"is_primary": true,
"skill_name": "ADLS"
},
{
"is_primary": true,
"skill_name": "Lakehouse"
},
{
"is_primary": false,
"skill_name": "DevOps"
}
],
"jd_role": {
"display_name": "Data Engineer (Databricks)",
"rationale": null,
"role_aliases": [
"Data Engineer",
"Data Pipeline Engineer",
"ETL Engineer"
],
"role_archetype": "Data",
"slug": ""
},
"nano_parsed": {
"JD_type": "pass",
"about_company": null,
"certifications": [],
"company_name": null,
"ctc": null,
"domain": {
"primary": {
"aliases": [],
"domain": "Other"
},
"secondary": null
},
"education": [],
"experience": {
"max": 8,
"min": 4,
"raw": "4 - 8 Yrs"
},
"job_locations": [],
"role": "Data Engineer (Databricks)",
"role_aliases": [
"Data Engineer",
"Data Pipeline Engineer",
"ETL Engineer"
],
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 10,
"heading": "Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Design and build data",
"last_5_words": "changes and data migrations."
},
"text": "\u2022 Design and build data pipelines using Spark-SQL and PySpark in Azure Databricks\n\u2022 Design and build ETL pipelines using ADF\n\u2022 Build and maintain a Lakehouse architecture in ADLS / Databricks.\n\u2022 Perform data preparation tasks including data cleaning, normalization, deduplication, type conversion etc.\n\u2022 Work with DevOps team to deploy solutions in production environments.\n\u2022 Control data processes and take corrective action when errors are identified. Corrective action may include executing a work around process and then identifying the cause and solution for data errors.\n\u2022 Participate as a full member of the global Analytics team, providing solutions for and insights into data related items.\n\u2022 Collaborate with your Data Science and Business Intelligence colleagues across the world to share key learnings, leverage ideas and solutions and to propagate best practices.\n\u2022 You will lead projects that include other team members and participate in projects led by other team members.\n\u2022 Apply change management tools including training, communication and documentation to manage upgrades, changes and data migrations.",
"word_count": 186
}
],
"urls": []
},
"rejected": false,
"rejection_reason": null,
"run_id": "3d064490-6487-4435-a307-5f6eab50f613",
"stage3_signals": {
"alias_found": true,
"alias_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 1.0,
"slug": "data-engineer",
"total_count": null
}
],
"kra_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": [
{
"kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
"sentence": "Design and build data pipelines using Spark-SQL and PySpark in Azure Databricks",
"similarity": 0.6257
},
{
"kra_text": "Implements data transformation, cleansing, deduplication, and enrichment logic to convert raw source data into analytics-ready curated datasets.",
"sentence": "Perform data preparation tasks including data cleaning, normalization, deduplication, type conversion etc. \u2022 Work with DevOps team to deploy solutions in production environments.",
"similarity": 0.6084
},
{
"kra_text": "Builds data ingestion pipelines to collect data from transactional databases, third-party APIs, event streams, and file sources into centralized data platforms.",
"sentence": "Design and build ETL pipelines using ADF",
"similarity": 0.5751
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 0.603,
"slug": "data-engineer",
"total_count": null
},
{
"display_name": "DevOps Engineer",
"kra_matches": [
{
"kra_text": "Manages release management processes including environment promotion gates, deployment approval workflows, change management records, and rollback procedures.",
"sentence": "Apply change management tools including training, communication and documentation to manage upgrades, changes and data migrations.",
"similarity": 0.5327
},
{
"kra_text": "Collaborates with development teams to improve build processes, reduce deployment friction, containerize applications, and adopt DevOps best practices.",
"sentence": "Perform data preparation tasks including data cleaning, normalization, deduplication, type conversion etc. \u2022 Work with DevOps team to deploy solutions in production environments.",
"similarity": 0.5131
},
{
"kra_text": "Responds to deployment failures, infrastructure incidents, and environment misconfiguration issues to restore service availability and prevent recurrence.",
"sentence": "Control data processes and take corrective action when errors are identified.",
"similarity": 0.4714
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 10,
"score": 0.5057,
"slug": "devops-engineer",
"total_count": null
},
{
"display_name": "ML Engineer",
"kra_matches": [
{
"kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
"sentence": "Perform data preparation tasks including data cleaning, normalization, deduplication, type conversion etc. \u2022 Work with DevOps team to deploy solutions in production environments.",
"similarity": 0.5578
},
{
"kra_text": "Designs end-to-end ML training pipelines and model inference workflows using TensorFlow, PyTorch, or scikit-learn on cloud ML platforms.",
"sentence": "Design and build data pipelines using Spark-SQL and PySpark in Azure Databricks",
"similarity": 0.4697
},
{
"kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
"sentence": "Design and build ETL pipelines using ADF",
"similarity": 0.4625
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 3,
"score": 0.4967,
"slug": "ml-engineer",
"total_count": null
},
{
"display_name": "MLOps Engineer",
"kra_matches": [
{
"kra_text": "Coordinates model promotion workflows across development, staging, and production environments including integration testing and data contract validation.",
"sentence": "Perform data preparation tasks including data cleaning, normalization, deduplication, type conversion etc. \u2022 Work with DevOps team to deploy solutions in production environments.",
"similarity": 0.5121
},
{
"kra_text": "Coordinates model promotion workflows across development, staging, and production environments including integration testing and data contract validation.",
"sentence": "Apply change management tools including training, communication and documentation to manage upgrades, changes and data migrations.",
"similarity": 0.4707
},
{
"kra_text": "Automates ML platform operations including scheduled retraining triggers, pipeline orchestration, evaluation workflows, and alerting configuration.",
"sentence": "Design and build ETL pipelines using ADF",
"similarity": 0.4324
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 16,
"score": 0.4717,
"slug": "ml-ops-engineer",
"total_count": null
},
{
"display_name": "Backend Developer",
"kra_matches": [
{
"kra_text": "Investigates and resolves production incidents, API bugs, and service degradation through root cause analysis, hotfixes, and post-mortems.",
"sentence": "Control data processes and take corrective action when errors are identified.",
"similarity": 0.5182
},
{
"kra_text": "Investigates and resolves production incidents, API bugs, and service degradation through root cause analysis, hotfixes, and post-mortems.",
"sentence": "Corrective action may include executing a work around process and then identifying the cause and solution for data errors.",
"similarity": 0.4391
},
{
"kra_text": "Investigates and resolves production incidents, API bugs, and service degradation through root cause analysis, hotfixes, and post-mortems.",
"sentence": "Perform data preparation tasks including data cleaning, normalization, deduplication, type conversion etc. \u2022 Work with DevOps team to deploy solutions in production environments.",
"similarity": 0.4281
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 1,
"score": 0.4618,
"slug": "backend-engineer",
"total_count": null
}
],
"skill_match_roles": []
},
"stage4_decision": {
"alias_collision_detected": false,
"case": "A",
"chosen_role": {
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 1.0,
"slug": "data-engineer",
"total_count": null
},
"confidence": 1.0,
"is_new_role": false,
"llm2_fired": false,
"llm2_reasoning": null,
"matched_dimensions": [],
"matched_kras": [],
"matched_skills": [],
"new_role_display_name": null,
"new_role_slug": null,
"queued": false,
"reasoning": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top absent does not contradict",
"sub_role": null
},
"stage5_updates": {
"centroid_n_after": 346,
"centroid_updated": true,
"collision_log_id": null,
"new_kra_attached": null,
"new_skills_attached": [
{
"is_primary": true,
"queue_id": 16342,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Spark SQL",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 16343,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "PySpark",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 16344,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Azure Databricks",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 16346,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Azure Data Factory",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 16348,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "ADLS",
"status": "pending"
}
],
"queue_entry_id": null,
"v3_pipeline_triggered": false,
"v3_role_slug": null,
"v3_run_id": null
}
}
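In the `kra_match_roles` entries of the payload above, each role-level `score` appears to equal the mean of the three listed `similarity` values, to within rounding of the displayed figures. This is an observation from this single run, not a confirmed formula. The sketch below checks it against the numbers copied from the payload.

```python
from statistics import mean

# slug -> (reported role score, similarities of the listed KRA matches),
# copied from the kra_match_roles entries in this run.
kra_match_roles = {
    "data-engineer": (0.603, [0.6257, 0.6084, 0.5751]),
    "devops-engineer": (0.5057, [0.5327, 0.5131, 0.4714]),
    "ml-engineer": (0.4967, [0.5578, 0.4697, 0.4625]),
    "ml-ops-engineer": (0.4717, [0.5121, 0.4707, 0.4324]),
    "backend-engineer": (0.4618, [0.5182, 0.4391, 0.4281]),
}

# Absolute gap between the mean of the listed similarities and the reported score.
deviations = {
    slug: abs(mean(similarities) - reported)
    for slug, (reported, similarities) in kra_match_roles.items()
}

# Role with the highest reported KRA score.
top_role = max(kra_match_roles, key=lambda slug: kra_match_roles[slug][0])
```

Every gap is below 0.0001, which is what one would expect if the score were averaged from unrounded similarities and then rounded for display. The KRA ranking puts data-engineer first, in agreement with the alias-based decision.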
API 2 — extract-details
{
"alias_matches": [
{
"alias_persist_skipped_reason": "TODO: REMOVE AFTER TESTING \u2014 alias DB write disabled",
"alias_persisted": false,
"existing_alias_id": 2004,
"existing_alias_text": "Apache Spark",
"input_term": "PySpark",
"matched_canonical": {
"category_id": 5,
"display_name": "Apache Spark",
"id": 1350,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "FRAMEWORK",
"slug": "apache-spark",
"sub_category_id": 1021,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "embedding_alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 2018,
"existing_alias_text": "Lakehouse",
"input_term": "Lakehouse",
"matched_canonical": {
"category_id": 1,
"display_name": "Lakehouse",
"id": 1359,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PATTERN",
"slug": "lakehouse",
"sub_category_id": 1026,
"typical_lifespan": "EVERGREEN",
"volatility": "EMERGING"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 1852,
"existing_alias_text": "DevOps",
"input_term": "DevOps",
"matched_canonical": {
"category_id": 8,
"display_name": "DevOps",
"id": 1216,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "METHODOLOGY",
"slug": "devops",
"sub_category_id": 922,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
}
],
"candidate_roles": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
},
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
],
"chosen_role": {
"display_name": "Data Engineer",
"id": 2,
"rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top absent does not contradict",
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "PySpark",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Lakehouse",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "CI/CD Pipeline Platforms",
"id": 150,
"rationale": "Systems used to define, run, and maintain automated build and deployment workflows. This cluster is coherent because the role owns delivery automation end to end, including pipeline reliability and promotion logic.",
"slug": "ci-cd-pipeline-platforms",
"source": "db"
},
"input_skill": "DevOps",
"llm_role": null,
"roles_from_db": [
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Deployment and Release Patterns",
"id": 140,
"rationale": "Patterns for promoting changes safely across environments, including rollout, rollback, and release gating strategies. Cloud Architects define these patterns so teams can deploy consistently across the platform.",
"slug": "deployment-and-release-patterns",
"source": "db"
},
"input_skill": "DevOps",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Infrastructure as Code",
"id": 132,
"rationale": "Declarative provisioning and environment definition tools used to codify cloud infrastructure, repeatable environments, and platform standards. Cloud Architects use these to express reference architectures and guardrails.",
"slug": "infrastructure-as-code",
"source": "db"
},
"input_skill": "DevOps",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
}
]
}
],
"input_final_skills": [
"Spark SQL",
"PySpark",
"Azure Databricks",
"Azure Data Factory",
"ADLS",
"Lakehouse",
"DevOps"
],
"input_llm_skills": [
"Spark SQL",
"PySpark",
"Azure Databricks",
"Azure Data Factory",
"ADLS",
"Lakehouse",
"DevOps"
],
"new_aliases_persisted": 0,
"run_id": "3d064490-6487-4435-a307-5f6eab50f613",
"skills_detail": [
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Spark SQL",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "LANGUAGE",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "spark-sql",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Apache Spark",
"alias_type": "CANONICAL",
"id": 2004,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "apache spark 3",
"alias_type": "VERSION",
"id": 2006,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark",
"alias_type": "VERSION",
"id": 2510,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark 3",
"alias_type": "VERSION",
"id": 2007,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark 3.x",
"alias_type": "VERSION",
"id": 2009,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark3",
"alias_type": "VERSION",
"id": 2008,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 5,
"display_name": "Apache Spark",
"id": 1350,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "FRAMEWORK",
"slug": "apache-spark",
"sub_category_id": 1021,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "PySpark",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "PySpark",
"matched_via": "embedding_alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Azure Databricks",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Cloud Platforms",
"skill_nature": "PLATFORM",
"sub_category": "Data Analytics",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "azure-databricks",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Azure Data Factory",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Cloud Platforms",
"skill_nature": "PLATFORM",
"sub_category": "Data Integration",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "azure-data-factory",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "ADLS",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Cloud Platforms",
"skill_nature": "PLATFORM",
"sub_category": "Data Storage",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "adls",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Lakehouse",
"alias_type": "CANONICAL",
"id": 2018,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 1,
"display_name": "Lakehouse",
"id": 1359,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PATTERN",
"slug": "lakehouse",
"sub_category_id": 1026,
"typical_lifespan": "EVERGREEN",
"volatility": "EMERGING"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Lakehouse",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "Lakehouse",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "DevOps",
"alias_type": "CANONICAL",
"id": 1852,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 8,
"display_name": "DevOps",
"id": 1216,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "METHODOLOGY",
"slug": "devops",
"sub_category_id": 922,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "CI/CD Pipeline Platforms",
"id": 150,
"rationale": "Systems used to define, run, and maintain automated build and deployment workflows. This cluster is coherent because the role owns delivery automation end to end, including pipeline reliability and promotion logic.",
"slug": "ci-cd-pipeline-platforms",
"source": "db"
},
"input_skill": "DevOps",
"llm_role": null,
"roles_from_db": [
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Deployment and Release Patterns",
"id": 140,
"rationale": "Patterns for promoting changes safely across environments, including rollout, rollback, and release gating strategies. Cloud Architects define these patterns so teams can deploy consistently across the platform.",
"slug": "deployment-and-release-patterns",
"source": "db"
},
"input_skill": "DevOps",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Infrastructure as Code",
"id": 132,
"rationale": "Declarative provisioning and environment definition tools used to codify cloud infrastructure, repeatable environments, and platform standards. Cloud Architects use these to express reference architectures and guardrails.",
"slug": "infrastructure-as-code",
"source": "db"
},
"input_skill": "DevOps",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
}
]
}
],
"input_skill": "DevOps",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
}
],
"unmatched_skills": [
"Spark SQL",
"Azure Databricks",
"Azure Data Factory",
"ADLS"
]
}
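In the entries above, every skill listed under `unmatched_skills` has `"canonical": null` and `"matched_via": null`. The sketch below reproduces that list from per-skill entries. It assumes the entries have been parsed into Python dicts and that a null `canonical` is what marks a skill as unmatched, which the output suggests but does not state. The function name `unmatched_inputs` and the abbreviated sample are illustrative only.

```python
from typing import Any


def unmatched_inputs(entries: list[dict[str, Any]]) -> list[str]:
    """Return input_skill for every entry that has no canonical library match."""
    return [e["input_skill"] for e in entries if e.get("canonical") is None]


# Abbreviated entries copied from this run (most fields omitted).
sample_entries = [
    {"input_skill": "ADLS", "canonical": None, "matched_via": None},
    {"input_skill": "Lakehouse", "canonical": {"id": 1359, "slug": "lakehouse"}, "matched_via": "alias"},
    {"input_skill": "DevOps", "canonical": {"id": 1216, "slug": "devops"}, "matched_via": "alias"},
]

print(unmatched_inputs(sample_entries))
```

Run against the three sample entries, only `ADLS` is returned; `Lakehouse` and `DevOps` are excluded because each resolved to a canonical skill through an alias.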
API 3 — final-role-output
{
"chosen_role": {
"display_name": "Data Engineer",
"id": 2,
"rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top absent does not contradict",
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
"chosen_role_resolution": "in_db",
"final_input_skills": [
{
"skill": "Spark SQL",
"tag": "new"
},
{
"skill": "PySpark",
"tag": "in_db"
},
{
"skill": "Azure Databricks",
"tag": "new"
},
{
"skill": "Azure Data Factory",
"tag": "new"
},
{
"skill": "ADLS",
"tag": "new"
},
{
"skill": "Lakehouse",
"tag": "in_db"
},
{
"skill": "DevOps",
"tag": "in_db"
}
],
"llm_cost_api1_usd": null,
"llm_cost_api2_usd": null,
"llm_cost_api3_usd": null,
"llm_cost_total_usd": null,
"persistence": {
"items": [
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"dimension_id": 24,
"input_skill": "PySpark",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Skipped \u2014 no persistable v3 meta for new skill",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": false,
"skill_id": null,
"skill_tag": "new",
"skipped_reason": "skill_not_in_db_v3_proposed"
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"dimension_id": 96,
"input_skill": "Lakehouse",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 1359,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "CI/CD Pipeline Platforms",
"id": 150,
"rationale": "Systems used to define, run, and maintain automated build and deployment workflows. This cluster is coherent because the role owns delivery automation end to end, including pipeline reliability and promotion logic.",
"slug": "ci-cd-pipeline-platforms",
"source": "db"
},
"dimension_id": 150,
"input_skill": "DevOps",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1216,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Deployment and Release Patterns",
"id": 140,
"rationale": "Patterns for promoting changes safely across environments, including rollout, rollback, and release gating strategies. Cloud Architects define these patterns so teams can deploy consistently across the platform.",
"slug": "deployment-and-release-patterns",
"source": "db"
},
"dimension_id": 140,
"input_skill": "DevOps",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1216,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Infrastructure as Code",
"id": 132,
"rationale": "Declarative provisioning and environment definition tools used to codify cloud infrastructure, repeatable environments, and platform standards. Cloud Architects use these to express reference architectures and guardrails.",
"slug": "infrastructure-as-code",
"source": "db"
},
"dimension_id": 132,
"input_skill": "DevOps",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1216,
"skill_tag": "in_db",
"skipped_reason": null
}
],
"new_skills_created": 0,
"role_dimension_saved": 0,
"skill_dimension_saved": 0,
"skipped": 1
},
"planner_output": null,
"run_id": "3d064490-6487-4435-a307-5f6eab50f613"
}
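The per-item flags in `persistence.items` can be recounted and compared with the summary counters at the end of the `persistence` object. This is a minimal sketch, assuming the API 3 response has been parsed into a Python dict. The helper names `tally_persistence` and `diff_against_summary` are hypothetical, and the sample keeps only the fields needed for the count.

```python
from typing import Any


def tally_persistence(persistence: dict[str, Any]) -> dict[str, int]:
    """Recount outcomes from the per-item flags in persistence["items"]."""
    items = persistence.get("items", [])
    return {
        "skill_dimension_saved": sum(1 for it in items if it.get("skill_dimension_saved")),
        "role_dimension_saved": sum(1 for it in items if it.get("role_dimension_saved")),
        "skipped": sum(1 for it in items if it.get("skipped_reason")),
    }


def diff_against_summary(persistence: dict[str, Any]) -> dict[str, tuple[Any, int]]:
    """Map each counter that disagrees to (reported summary value, recounted value)."""
    recount = tally_persistence(persistence)
    return {k: (persistence.get(k), v) for k, v in recount.items() if persistence.get(k) != v}


# Abbreviated copy of this run's persistence block (only the counted fields).
sample = {
    "items": [
        {"input_skill": "PySpark", "skill_dimension_saved": False,
         "role_dimension_saved": False, "skipped_reason": "skill_not_in_db_v3_proposed"},
        {"input_skill": "Lakehouse", "skill_dimension_saved": True,
         "role_dimension_saved": False, "skipped_reason": None},
        {"input_skill": "DevOps", "skill_dimension_saved": True,
         "role_dimension_saved": False, "skipped_reason": None},
        {"input_skill": "DevOps", "skill_dimension_saved": True,
         "role_dimension_saved": False, "skipped_reason": None},
        {"input_skill": "DevOps", "skill_dimension_saved": True,
         "role_dimension_saved": False, "skipped_reason": None},
    ],
    "new_skills_created": 0,
    "role_dimension_saved": 0,
    "skill_dimension_saved": 0,
    "skipped": 1,
}

print(tally_persistence(sample))
print(diff_against_summary(sample))
```

For this run the recount gives four items with `skill_dimension_saved: true`, while the summary reports `"skill_dimension_saved": 0`; the `role_dimension_saved` and `skipped` counters agree. The output does not say whether the summary counts only newly written links or all links present, so the difference may be intentional.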
LLM Calls
Every model call made for this run, in pipeline order.