
Pipeline run

d2ebff95-b8c3-481f-a9e4-af393314fd5f

Pipeline LLM cost (USD)
API 1: $0.0089 · API 2: $0.0003 · API 3: $0.0000 · Total: $0.0092

Client output enrichment

v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description
role baseline loaded sources · ai_index: jd · nature_of_work: jd · tech_stack_maturity: jd
Nature of work · Data pipeline development
Build and optimize big-data ingestion and processing pipelines using Hadoop/HDFS, Spark/Scala, Sqoop and Airflow/Jenkins, with heavy SQL Server/T-SQL work for querying, stored procedures, troubleshooting, and performance tuning.
"Data Ingestion: Collecting and importing data from various sources, such as databases, logs, APIs into the Big Data infrastructure."
Tech stack maturity
Mainstream Legacy
The stack centers on established big-data and enterprise tools like Hadoop, Hive, Spark, Scala, SQL Server, Jenkins, and GitLab, which are widely used but not cloud-native or bleeding-edge.
AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)
0.50 / 5
· Title match
Has AI skill
· AI skill (primary)
AI skill (secondary)
· On AI team
· Builds AI products
vocab breakdown (legacy)
Assistants (×1):
Frameworks (×2):
Models / concepts (×3): Machine Learning
Evidence — skills matched in JD (19)
Hadoop · Unix · HDFS · Hive · Impala · Spark · Scala · Sqoop · Airflow · Jenkins · SQL Server · T-SQL · stored procedures · Jira · Confluence · GitLab · Snowflake · Databricks · machine learning
Skill cluster (5 dimension groups, role-scoped)
ETL and ELT Tooling
Hadoop · Spark
AI Governance and Model Security
machine learning
Cloud Data Warehouses
Snowflake
Programming Languages for Data Work
Scala
Cross-cutting / unaligned
Unix · HDFS · Hive · Impala · Sqoop · Airflow · Jenkins · SQL Server · T-SQL · stored procedures · Jira · Confluence · GitLab · Databricks
KRA description
• Strong experience with big data technologies and associated tools such as Hadoop, Unix, HDFS, Hive, Impala, etc.
• Proficient in using Spark/Scala
• Experience with data Import/Export using Sqoop or similar tools
• Experience using Airflow, Jenkins or similar other automation tools
• Excellent knowledge of SQL Server and database structures
• Demonstrate ability to write and optimize T-SQL queries and stored procedures
• Experience working with Jira/Confluence/GitLab
• Excellent organizational skills and ability to handle multiple activities with changing priorities simultaneously
• Details oriented and strong problem-solving skills
• Team player and able to integrate in an international team
• Data Ingestion: Collecting and importing data from various sources, such as databases, logs, APIs into the Big Data infrastructure.
• Data Processing: Designing data pipelines to clean, transform, and prepare raw data for analysis. This often involves using technologies like Apache Hadoop, Apache Spark.
• Data Storage: Selecting appropriate data storage technologies like Hadoop Distributed File System (HDFS), HIVE, IMPALA, or cloud-based storage solutions (Snowflake, Databricks).
• Data Analysis: Developing algorithms and implementing data processing techniques to extract meaningful insights, conduct statistical analysis (build machine learning models is advantage).
• Performance Optimization: Tuning and optimizing the performance of Big Data applications and infrastructure to ensure efficient data processing and reduced latency.
• Data Security: Implementing security measures to protect sensitive data and ensuring compliance with data protection regulations.
• Integration: Integrating Big Data solutions with existing enterprise systems and applications.
• Monitoring and Troubleshooting: Monitoring data pipelines and processes to identify and resolve issues or bottlenecks in the system.
• Collaboration: Collaborating with data scientists, data engineers, and other stakeholders to understand requirements and deliver valuable data-driven solutions.
• Continuous Learning: Staying updated with the latest Big Data technologies, tools, and industry trends to improve skills and enhance productivity.

Signals

Skill ml-engineer
0.25
Alias data-engineer
1.00
KRA data-engineer
0.69

Post-classification

Centroid updated · n=86
Alias collision log
New-role queue
New skills captured: 8
New KRA captured

Captured for admin review

Unix primary Data Engineer pending
HDFS primary Data Engineer pending
Impala primary Data Engineer pending
Sqoop primary Data Engineer pending
T-SQL primary Data Engineer pending
stored procedures primary Data Engineer pending
Jira primary Data Engineer pending
Confluence primary Data Engineer pending
Status: completed · Created: 2026-05-27T13:52:41.180433Z · Updated: 2026-05-27T13:55:01.792618Z · API 3 duration: 42546 ms
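The timing figures in the status line can be related to each other with a little arithmetic. The sketch below assumes the gap between Created and Updated approximates the run's wall-clock time, which this page does not state; the helper name `parse_utc` is illustrative.

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalise it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


created = parse_utc("2026-05-27T13:52:41.180433Z")
updated = parse_utc("2026-05-27T13:55:01.792618Z")
api3_ms = 42546  # "API 3 duration" from the status line

# Assumption: Created -> Updated spans the whole run.
elapsed_s = (updated - created).total_seconds()
api3_share = (api3_ms / 1000) / elapsed_s

print(f"elapsed: {elapsed_s:.3f} s")       # elapsed: 140.612 s
print(f"API 3 share: {api3_share:.1%}")    # API 3 share: 30.3%
```

Under that assumption, API 3 accounts for roughly a third of the interval between the two timestamps.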
Flow Current 3-step pipeline

1 POST /skills/extract-from-jd

2 POST /skills/extract-details

3 POST /skills/final-role-output
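The three calls above can be chained as in the sketch below. Only the endpoint paths come from this page. The payload keys (`jd_text`, `skills`, `details`), the injected `post` callable, and the idea that each response feeds the next request are assumptions made for illustration.

```python
from typing import Any, Callable, Dict

# Endpoint paths as listed in the flow above.
STEPS = (
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
)

# Hypothetical transport: takes (path, json_body) and returns the decoded JSON response.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_pipeline(jd_text: str, post: Post) -> Dict[str, Any]:
    """Call the three steps in order, passing each response to the next request."""
    # Payload keys are assumptions; the real request schema is not shown on this page.
    extracted = post(STEPS[0], {"jd_text": jd_text})
    details = post(
        STEPS[1],
        {"jd_text": jd_text, "skills": extracted.get("skills", [])},
    )
    return post(STEPS[2], {"jd_text": jd_text, "details": details})
```

Injecting `post` keeps the sketch independent of any HTTP client, so the sequencing can be exercised with a stub before wiring in a real transport.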

Role Chosen role & resolution

Data Engineer

domain · Data Engineering & Analytics CASE DOMAIN

slug: data-engineer · id: 2 · source: db

Domain=Data Engineering & Analytics; The JD is centered on big data engineering, data ingestion/pipeline work, Spark, Hadoop ecosystem tools, and automation/orchestration, which best matches Data Engineer.

Matched skills

Hadoop · Unix · HDFS · Hive · Impala · Spark/Scala · Sqoop · Airflow · Jenkins · SQL Server · T-SQL · Jira · Confluence · GitLab

Matched dimensions

Big Data Engineering · Data Ingestion and Processing · Data Pipeline Development · Data Storage and Query Optimization · Workflow Automation · Performance Optimization · Cross-functional Collaboration

Matched KRAs

• Collecting and importing data from various sources
• Designing data pipelines to clean, transform, and prepare raw data
• Selecting appropriate data storage technologies
• Developing algorithms and implementing data processing techniques
• Tuning and optimizing the performance of Big Data applications
• Monitoring data pipelines and processes
• Integrating Big Data solutions with existing enterprise systems

Resolution: in_db — role exists in library; skill↔dim and role↔dim links saved when applicable.

New skills: 0 · Skill↔dim saved: 0 · Role↔dim saved: 0 · Skipped: 0

Job description

Technical Skills:

• Strong experience with big data technologies and associated tools such as Hadoop, Unix, HDFS, Hive, Impala, etc.
• Proficient in using Spark/Scala 
• Experience with data Import/Export using Sqoop or similar tools
• Experience using Airflow, Jenkins or similar other automation tools
• Excellent knowledge of SQL Server and database structures
• Demonstrate ability to write and optimize T-SQL queries and stored procedures
• Experience working with Jira/Confluence/GitLab
• Excellent organizational skills and ability to handle multiple activities with changing priorities simultaneously
• Details oriented and strong problem-solving skills
• Team player and able to integrate in an international team


The typical tasks and responsibilities of a Big Data Developer include:

• Data Ingestion: Collecting and importing data from various sources, such as databases, logs, APIs into the Big Data infrastructure.
• Data Processing: Designing data pipelines to clean, transform, and prepare raw data for analysis. This often involves using technologies like Apache Hadoop, Apache Spark.
• Data Storage: Selecting appropriate data storage technologies like Hadoop Distributed File System (HDFS), HIVE, IMPALA, or cloud-based storage solutions (Snowflake, Databricks).
• Data Analysis: Developing algorithms and implementing data processing techniques to extract meaningful insights, conduct statistical analysis (build machine learning models is advantage).
• Performance Optimization: Tuning and optimizing the performance of Big Data applications and infrastructure to ensure efficient data processing and reduced latency.
• Data Security: Implementing security measures to protect sensitive data and ensuring compliance with data protection regulations.
• Integration: Integrating Big Data solutions with existing enterprise systems and applications.
• Monitoring and Troubleshooting: Monitoring data pipelines and processes to identify and resolve issues or bottlenecks in the system.
• Collaboration: Collaborating with data scientists, data engineers, and other stakeholders to understand requirements and deliver valuable data-driven solutions.
• Continuous Learning: Staying updated with the latest Big Data technologies, tools, and industry trends to improve skills and enhance productivity.

IQVIA is a leading global provider of advanced analytics, technology solutions and clinical research services to the life sciences industry. We believe in pushing the boundaries of human science and data science to make the biggest impact possible – to help our customers create a healthier world. Learn more at https://jobs.iqvia.com

Skills from this JD

Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.

Hadoop Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Hadoop id=1351 · hadoop

Aliases — catalog

  • Hadoop (CANONICAL)

Context tags (catalog)

Big Data Data Lake Distributed Computing ELT ETL Flume HDFS Hive Kafka MapReduce NoSQL Oozie Pig Spark Sqoop YARN

Stored enrichment (catalog DB)

Category
Framework
Sub-category
Data Processing Framework
Vendor
Apache Software Foundation
License
apache_2
Year introduced
2006
Confidence
0.90
Version strategy
NOT_APPLICABLE

Maturity reasoning: Job postings still mention Hadoop for legacy big-data stacks, but JD volume has fallen as Spark and cloud warehouses replaced MapReduce-era clusters.

Skill profile (library / DB)

Skill nature
FRAMEWORK
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
5
Sub-category id
91
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • ETL and ELT Tooling Catalog dimension db id 24

    Library dimension (catalog)

    Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
ETL and ELT Tooling
etl-and-elt-tooling
Existing dimension (library) · Role↔dimension saved
Unix Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Operating Systems
Sub-category
general
Skill nature
CONCEPT
Volatility
STABLE
Typical lifespan
EVERGREEN
Version strategy
UNVERSIONED
HDFS Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Data Engineering Tools
Sub-category
general
Skill nature
TOOL
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
Hive Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Hive id=2754 · hive

Aliases — catalog

  • Hive (CANONICAL) primary

Context tags (catalog)

Apache Apache Hive Bucketing ETL HQL Hive Metastore Hive SerDe HiveQL MapReduce SQL SQL-on-Hadoop big data bucketing columnar storage data lakes data warehousing integration metadata partitioning schema evolution

Stored enrichment (catalog DB)

Category
Datastore
Sub-category
Local Key Value Store
Vendor
Apache Software Foundation
License
apache_2
Year introduced
2010
Confidence
0.90
Version strategy
NOT_APPLICABLE

Maturity reasoning: Hive appears in Flutter/mobile JDs and package docs, but JD volume is far below SQLite/Realm and it’s mainly used for local key-value storage in Flutter apps.

Skill profile (library / DB)

Skill nature
TOOL
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
3
Sub-category id
2242
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Local Persistence and Offline Behavior Catalog dimension db id 85

    Library dimension (catalog)

    Roles linked in library: Android Developer, Flutter Developer, Hybrid Mobile Developer, Native Mobile Developer, React Native Developer, iOS Developer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Local Persistence and Offline Behavior
local-persistence-and-offline-behavior
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Impala Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Databases
Sub-category
general
Skill nature
TOOL
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
Spark Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Apache Spark id=1350 · apache-spark

Aliases — catalog

  • Apache Spark (CANONICAL)
  • apache spark 3 (VERSION)
  • spark (VERSION)
  • spark 3 (VERSION)
  • spark 3.x (VERSION)
  • spark3 (VERSION)

Context tags (catalog)

Apache Kafka Cluster Manager DAGScheduler Data Lake DataFrame ETL Hadoop MLlib Machine Learning PySpark RDD Scala Spark SQL Spark Streaming SparkSession

Stored enrichment (catalog DB)

Category
Framework
Sub-category
Distributed Data Processing Framework
Vendor
Apache Software Foundation
License
apache_2
Year introduced
2010
Confidence
0.94
Version strategy
SEPARATE_ENTITY
Version tag
3.x

Maturity reasoning: Apache Spark appears in many data engineering JDs and remains a standard for distributed ETL/ELT; its GitHub and vendor ecosystem activity stay strong, with Databricks and cloud platforms still promoting it.

Skill profile (library / DB)

Skill nature
FRAMEWORK
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
5
Sub-category id
1021
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • ETL and ELT Tooling Catalog dimension db id 24

    Library dimension (catalog)

    Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
ETL and ELT Tooling
etl-and-elt-tooling
Existing dimension (library) · Role↔dimension saved
Scala Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Scala id=102 · scala

Aliases — catalog

  • Scala (CANONICAL) primary

Context tags (catalog)

Akka Apache Kafka Cats Flink JVM Monads Play Framework SBT ScalaTest Shapeless Spark Spark SQL ZIO case class for-comprehension functional programming implicit pattern matching typeclass

Stored enrichment (catalog DB)

Category
Language
Sub-category
Programming Language
Vendor
EPFL
License
apache_2
Year introduced
2004
Confidence
0.99
Version strategy
NOT_APPLICABLE

Maturity reasoning: Scala still appears in many backend/data engineering JDs, especially with Spark and Akka, and remains supported by major JVM ecosystems; it’s not a sunset technology.

Skill profile (library / DB)

Skill nature
LANGUAGE
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
6
Sub-category id
96
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Programming Languages for Data Work Catalog dimension db id 21

    Library dimension (catalog)

    Roles linked in library: Data Engineer

  • Programming Languages for ML Systems Catalog dimension db id 39

    Library dimension (catalog)

    Roles linked in library: ML Engineer, MLOps Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Programming Languages for Data Work
programming-languages-for-data-work
Existing dimension (library) · Role↔dimension saved
Programming Languages for ML Systems
programming-languages-for-ml-systems
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Sqoop Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Data Engineering Tools
Sub-category
general
Skill nature
TOOL
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
Airflow Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Airflow id=265 · airflow

Aliases — catalog

  • Airflow (CANONICAL) primary
  • airflow 2 (VERSION)
  • airflow-2 (VERSION)
  • airflow2 (VERSION)
  • airflow2.x (VERSION)
  • apache airflow 2 (VERSION)

Context tags (catalog)

Apache Celery CeleryExecutor DAG ETL Executor Jinja templating Python SLA Sensors UI XCom backfill connections data pipeline executor hooks logging monitoring operators plugins scheduler task dependencies task instance variables

Stored enrichment (catalog DB)

Category
Tool
Sub-category
Workflow Orchestration Tool
Vendor
Apache Software Foundation
License
apache_2
Year introduced
2014
Confidence
0.95
Version strategy
SEPARATE_ENTITY
Version tag
2.x

Maturity reasoning: Apache Airflow appears in many data engineering job postings and is a common orchestration choice in production stacks; its GitHub activity and ecosystem remain strong, with no vendor sunset or clear replacement dominating JDs.

Skill profile (library / DB)

Skill nature
TOOL
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
13
Sub-category id
130
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Workflow Orchestration for ML Pipelines Catalog dimension db id 54

    Library dimension (catalog)

    Roles linked in library: ML Engineer, MLOps Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Workflow Orchestration for ML Pipelines
workflow-orchestration-for-ml-pipelines
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Jenkins Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Jenkins id=283 · jenkins

Aliases — catalog

  • Jenkins (CANONICAL) primary

Context tags (catalog)

Blue Ocean CI/CD Declarative Pipeline Docker Groovy Jenkinsfile Kubernetes agents artifact repository artifacts automation build triggers integration multibranch pipeline pipeline plugins shared libraries stages test automation version control webhooks

Stored enrichment (catalog DB)

Category
Tool
Sub-category
Ci Cd Tool
Vendor
CloudBees
License
mit
Year introduced
2011
Confidence
0.99
Version strategy
NOT_APPLICABLE

Maturity reasoning: Jenkins remains a common CI/CD requirement in job postings and enterprise DevOps stacks, with broad plugin ecosystem and long-running GitHub activity despite newer alternatives.

Skill profile (library / DB)

Skill nature
TOOL
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
13
Sub-category id
184
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • CI/CD Pipeline Platforms Catalog dimension db id 150

    Library dimension (catalog)

    Roles linked in library: DevOps Engineer

  • CI/CD for Machine Learning Catalog dimension db id 56

    Library dimension (catalog)

    Roles linked in library: ML Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
CI/CD Pipeline Platforms
ci-cd-pipeline-platforms
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
CI/CD for Machine Learning
ci-cd-for-machine-learning
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
SQL Server Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: SQL Server id=18 · sql-server

Aliases — catalog

  • SQL Server (CANONICAL) primary
  • SQL Server 2000 (VERSION)
  • SQL Server 2005 (VERSION)
  • SQL Server 2008 (VERSION)
  • SQL Server 2012 (VERSION)
  • SQL Server 2014 (VERSION)
  • SQL Server 2016 (VERSION)
  • SQL Server 2017 (VERSION)
  • SQL Server 2019 (VERSION)
  • SQL Server 2022 (VERSION)
  • SQL Server 6.5 (VERSION)
  • SQL Server 7.0 (VERSION)

Context tags (catalog)

Always On CLR Integration Clustered Index ETL Execution Plan Linked Servers Query Store Replication SQL Agent SQL Server Agent SQL Server Integration Services SQL Server Management Studio SQL Server Reporting Services SSIS SSMS SSRS Stored Procedures T-SQL TempDB backup and recovery backup and restore clustering data migration data warehousing database design database normalization indexing performance tuning query optimization replication stored procedures transaction log transaction logs

Stored enrichment (catalog DB)

Category
Datastore
Sub-category
Relational Database
Vendor
Microsoft
License
proprietary
Year introduced
1989
Confidence
0.99
Version strategy
NOT_APPLICABLE

Maturity reasoning: SQL Server appears in many enterprise job descriptions and remains a major Microsoft-supported RDBMS with active Azure SQL/SQL Server demand; it is a common hiring-pipeline staple, not a sunset technology.

Skill profile (library / DB)

Skill nature
TOOL
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
3
Sub-category id
29
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Relational Database Design Catalog dimension db id 4

    Library dimension (catalog)

    Roles linked in library: .NET Backend Developer, Backend Developer, Kotlin Backend Developer, Node.js Backend Developer, Python Backend Developer, Ruby Backend Developer, Scala Backend Developer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Relational Database Design
relational-database-design
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
T-SQL Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Programming Languages
Sub-category
general
Skill nature
LANGUAGE
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
stored procedures Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Programming Languages
Sub-category
general
Skill nature
CONCEPT
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
Jira Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Project Management Tools
Sub-category
general
Skill nature
TOOL
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
Confluence Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Collaboration Tools
Sub-category
general
Skill nature
TOOL
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
GitLab Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: GitLab id=279 · gitlab

Aliases — catalog

  • GitLab (CANONICAL) primary

Context tags (catalog)

.gitlab-ci.yml CI/CD DevOps DevSecOps GitLab CI GitLab Pages GitLab Runner Kubernetes YAML artifact registry automated testing code review container registry issue boards issues merge requests monitoring pipelines repository management runners security scanning self-hosted version control webhooks

Stored enrichment (catalog DB)

Category
Platform
Sub-category
Devops Platform
Vendor
GitLab Inc.
License
mit
Year introduced
2011
Confidence
0.96
Version strategy
NOT_APPLICABLE

Maturity reasoning: GitLab appears in many DevOps/CI-CD job descriptions and is widely used as an integrated source control and pipeline platform; its GitLab CI/CD and self-managed/SaaS offerings are common hiring signals.

Skill profile (library / DB)

Skill nature
PLATFORM
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
9
Sub-category id
170
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • CI/CD Pipeline Platforms Catalog dimension db id 150

    Library dimension (catalog)

    Roles linked in library: DevOps Engineer

  • CI/CD for Machine Learning Catalog dimension db id 56

    Library dimension (catalog)

    Roles linked in library: ML Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
CI/CD Pipeline Platforms
ci-cd-pipeline-platforms
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
CI/CD for Machine Learning
ci-cd-for-machine-learning
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Snowflake Secondary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Snowflake id=105 · snowflake

Aliases — catalog

  • Snowflake (CANONICAL) primary

Context tags (catalog)

ELT ETL SQL Snowpark Snowpipe Streams Tasks Time Travel VARIANT data sharing data warehouse dbt semi-structured data virtual warehouse zero-copy cloning

Stored enrichment (catalog DB)

Category
Platform
Sub-category
Data Cloud Platform
Vendor
Snowflake Inc.
License
proprietary
Year introduced
2012
Confidence
0.98
Version strategy
NOT_APPLICABLE

Maturity reasoning: Snowflake appears frequently in data/analytics job postings and is a standard cloud data warehouse platform alongside BigQuery and Redshift.

Skill profile (library / DB)

Skill nature
PLATFORM
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
9
Sub-category id
113
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Cloud Data Warehouses Catalog dimension db id 22

    Library dimension (catalog)

    Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Cloud Data Warehouses
cloud-data-warehouses
Existing dimension (library) · Role↔dimension saved
Databricks Secondary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Databricks id=1202 · databricks

Aliases — catalog

  • Databricks (CANONICAL)

Context tags (catalog)

Apache Spark Databricks Runtime Delta Lake MLflow SQL Analytics Spark cloud integration collaborative workspace data engineering data lakes data pipelines data visualization job scheduling machine learning notebooks real-time analytics

Stored enrichment (catalog DB)

Category
Platform
Sub-category
Data Analytics Platform
Vendor
Databricks, Inc.
License
other_open
Year introduced
2013
Confidence
0.97
Version strategy
NOT_APPLICABLE

Maturity reasoning: Databricks appears frequently in data engineering and analytics job postings, especially alongside Spark, Delta Lake, and lakehouse stacks; strong vendor adoption and broad enterprise usage signal mainstream demand.

Skill profile (library / DB)

Skill nature
PLATFORM
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
9
Sub-category id
911
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • React Frontend Development Catalog dimension db id 96

    Library dimension (catalog)

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
React Frontend Development
d_init_01
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
machine learning Secondary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Machine Learning id=1356 · machine-learning

Aliases — catalog

  • Machine Learning (CANONICAL)

Context tags (catalog)

Keras PyTorch TensorFlow cross-validation data preprocessing ensemble methods feature engineering hyperparameter tuning model evaluation natural language processing neural networks reinforcement learning scikit-learn supervised learning unsupervised learning

Stored enrichment (catalog DB)

Category
Concept
Sub-category
Machine Learning
Confidence
0.98
Version strategy
NOT_APPLICABLE

Maturity reasoning: Machine Learning appears in large volumes of job descriptions across data, product, and platform roles, and major cloud vendors (AWS, Google Cloud, Azure) offer dedicated ML services and certifications, indicating broad adoption.

Skill profile (library / DB)

Skill nature
CONCEPT
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
2
Sub-category id
1024
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • AI Governance and Model Security Catalog dimension db id 50

    Library dimension (catalog)

    Roles linked in library: AI Engineer, ML Engineer, MLOps Engineer

  • React Frontend Development Catalog dimension db id 96

    Library dimension (catalog)

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
AI Governance and Model Security
ai-governance-and-model-security
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
React Frontend Development
d_init_01
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

All API 3 persistence rows

Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.

Skill | Tag | Dimension (slug) | Skill↔dim | Role↔dim | Outcome | Notes
Hadoop | in_db | ETL and ELT Tooling (etl-and-elt-tooling) | | | Existing dimension (library) · Role↔dimension saved |
Hive | in_db | Local Persistence and Offline Behavior (local-persistence-and-offline-behavior) | | | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Spark | in_db | ETL and ELT Tooling (etl-and-elt-tooling) | | | Existing dimension (library) · Role↔dimension saved |
Scala | in_db | Programming Languages for Data Work (programming-languages-for-data-work) | | | Existing dimension (library) · Role↔dimension saved |
Scala | in_db | Programming Languages for ML Systems (programming-languages-for-ml-systems) | | | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Airflow | in_db | Workflow Orchestration for ML Pipelines (workflow-orchestration-for-ml-pipelines) | | | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Jenkins | in_db | CI/CD Pipeline Platforms (ci-cd-pipeline-platforms) | | | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Jenkins | in_db | CI/CD for Machine Learning (ci-cd-for-machine-learning) | | | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
SQL Server | in_db | Relational Database Design (relational-database-design) | | | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
GitLab | in_db | CI/CD Pipeline Platforms (ci-cd-pipeline-platforms) | | | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
GitLab | in_db | CI/CD for Machine Learning (ci-cd-for-machine-learning) | | | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Snowflake | in_db | Cloud Data Warehouses (cloud-data-warehouses) | | | Existing dimension (library) · Role↔dimension saved |
Databricks | in_db | React Frontend Development (d_init_01) | | | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
machine learning | in_db | AI Governance and Model Security (ai-governance-and-model-security) | | | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
machine learning | in_db | React Frontend Development (d_init_01) | | | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
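The saved and skipped outcomes in this grid are consistent with a simple rule: a role↔dimension link is saved only when the library already lists the chosen role under that dimension. The sketch below re-derives the outcomes from the "Roles linked in library" entries shown in the per-skill panels. The rule is inferred from this page, and the names `ROLES_BY_DIMENSION`, `WORK_ITEMS`, and `role_dim_outcome` are illustrative rather than the pipeline's own code.

```python
from collections import Counter

CHOSEN_ROLE = "Data Engineer"
ML_ROLES = {"ML Engineer", "MLOps Engineer"}

# Dimension -> roles linked in the library, copied from the per-skill panels.
# React Frontend Development lists no linked roles on this page.
ROLES_BY_DIMENSION = {
    "ETL and ELT Tooling": {"Data Engineer"},
    "Local Persistence and Offline Behavior": {
        "Android Developer", "Flutter Developer", "Hybrid Mobile Developer",
        "Native Mobile Developer", "React Native Developer", "iOS Developer",
    },
    "Programming Languages for Data Work": {"Data Engineer"},
    "Programming Languages for ML Systems": ML_ROLES,
    "Workflow Orchestration for ML Pipelines": ML_ROLES,
    "CI/CD Pipeline Platforms": {"DevOps Engineer"},
    "CI/CD for Machine Learning": {"ML Engineer"},
    "Relational Database Design": {
        ".NET Backend Developer", "Backend Developer", "Kotlin Backend Developer",
        "Node.js Backend Developer", "Python Backend Developer",
        "Ruby Backend Developer", "Scala Backend Developer",
    },
    "Cloud Data Warehouses": {"Data Engineer"},
    "React Frontend Development": set(),
    "AI Governance and Model Security": {"AI Engineer"} | ML_ROLES,
}

# One (skill, dimension) work item per grid row, in grid order.
WORK_ITEMS = [
    ("Hadoop", "ETL and ELT Tooling"),
    ("Hive", "Local Persistence and Offline Behavior"),
    ("Spark", "ETL and ELT Tooling"),
    ("Scala", "Programming Languages for Data Work"),
    ("Scala", "Programming Languages for ML Systems"),
    ("Airflow", "Workflow Orchestration for ML Pipelines"),
    ("Jenkins", "CI/CD Pipeline Platforms"),
    ("Jenkins", "CI/CD for Machine Learning"),
    ("SQL Server", "Relational Database Design"),
    ("GitLab", "CI/CD Pipeline Platforms"),
    ("GitLab", "CI/CD for Machine Learning"),
    ("Snowflake", "Cloud Data Warehouses"),
    ("Databricks", "React Frontend Development"),
    ("machine learning", "AI Governance and Model Security"),
    ("machine learning", "React Frontend Development"),
]


def role_dim_outcome(dimension: str, chosen_role: str = CHOSEN_ROLE) -> str:
    """Return 'saved' when the chosen role is linked to the dimension, else 'skipped'."""
    linked = ROLES_BY_DIMENSION.get(dimension, set())
    return "saved" if chosen_role in linked else "skipped"


outcomes = Counter(role_dim_outcome(dim) for _, dim in WORK_ITEMS)
print(dict(outcomes))  # {'saved': 4, 'skipped': 11}
```

Applied to the 15 rows, the rule reproduces the grid: four saved links (Hadoop, Spark, Scala under Programming Languages for Data Work, Snowflake) and eleven skipped.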

Library artifacts (this run)

Kind Detail DB id
canonical_skill_proposed Unix | type=Operating Systems subtype=general nature=CONCEPT lifespan=EVERGREEN
canonical_skill_proposed HDFS | type=Data Engineering Tools subtype=general nature=TOOL lifespan=MULTI_YEAR
canonical_skill_proposed Impala | type=Databases subtype=general nature=TOOL lifespan=MULTI_YEAR
canonical_skill_proposed Sqoop | type=Data Engineering Tools subtype=general nature=TOOL lifespan=MULTI_YEAR
canonical_skill_proposed T-SQL | type=Programming Languages subtype=general nature=LANGUAGE lifespan=MULTI_YEAR
canonical_skill_proposed stored procedures | type=Programming Languages subtype=general nature=CONCEPT lifespan=MULTI_YEAR
canonical_skill_proposed Jira | type=Project Management Tools subtype=general nature=TOOL lifespan=MULTI_YEAR
canonical_skill_proposed Confluence | type=Collaboration Tools subtype=general nature=TOOL lifespan=MULTI_YEAR
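
The artifact rows above follow a regular shape: a kind, a skill name, then `key=value` attributes after a ` | ` separator, with some values containing spaces (for example `type=Data Engineering Tools`). A minimal sketch of reading one row into a dictionary is shown here. The function name `parse_artifact` is hypothetical, and the row grammar is inferred from the rows listed above rather than taken from any documented format.

```python
import re

# One "key=value" pair; the value runs lazily until the next " key=" or end of line,
# so multi-word values such as "Programming Languages" stay intact.
FIELD_RE = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_artifact(line: str) -> dict:
    """Split an artifact row into kind, skill name, and attribute fields."""
    head, _, attrs = line.partition(" | ")
    kind, _, name = head.partition(" ")  # the kind is the first token, the rest is the name
    fields = dict(FIELD_RE.findall(attrs))
    return {"kind": kind, "name": name, **fields}


row = (
    "canonical_skill_proposed stored procedures | type=Programming Languages "
    "subtype=general nature=CONCEPT lifespan=MULTI_YEAR"
)
print(parse_artifact(row))
```

The DB id column is empty for every row in this run, so the sketch does not try to read it.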
nano JD Parser — gpt-4.1-nano
Role: Big Data Developer
Company: IQVIA
Domain: Healthcare
JD type: pass
{
  "JD_type": "pass",
  "about_company": {
    "source_marker": {
      "first_5_words": "IQVIA is a leading global",
      "last_5_words": "create a healthier world."
    },
    "text": "IQVIA is a leading global provider of advanced analytics, technology solutions and clinical research services to the life sciences industry. We believe in pushing the boundaries of human science and data science to make the biggest impact possible \u2013 to help our customers create a healthier world.",
    "word_count": 47
  },
  "certifications": [],
  "company_name": "IQVIA",
  "ctc": null,
  "domain": {
    "primary": {
      "aliases": [
        "Life Sciences",
        "Clinical Research"
      ],
      "domain": "Healthcare"
    },
    "secondary": null
  },
  "education": [],
  "experience": null,
  "job_locations": [],
  "role": "Big Data Developer",
  "role_aliases": [
    "Big Data Engineer",
    "Data Engineer",
    "Big Data Specialist"
  ],
  "role_archetype": "Data",
  "roles_and_responsibilities": [
    {
      "bullet_count": 10,
      "heading": "Technical Skills",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "\u2022 Strong experience with big",
        "last_5_words": "integrate in an international team"
      },
      "text": "\u2022 Strong experience with big data technologies and associated tools such as Hadoop, Unix, HDFS, Hive, Impala, etc.\n\u2022 Proficient in using Spark/Scala \n\u2022 Experience with data Import/Export using Sqoop or similar tools\n\u2022 Experience using Airflow, Jenkins or similar other automation tools\n\u2022 Excellent knowledge of SQL Server and database structures\n\u2022 Demonstrate ability to write and optimize T-SQL queries and stored procedures\n\u2022 Experience working with Jira/Confluence/GitLab\n\u2022 Excellent organizational skills and ability to handle multiple activities with changing priorities simultaneously\n\u2022 Details oriented and strong problem-solving skills\n\u2022 Team player and able to integrate in an international team",
      "word_count": 118
    },
    {
      "bullet_count": 0,
      "heading": "The typical tasks and responsibilities of a Big Data Developer include",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "Data Ingestion: Collecting and importing",
        "last_5_words": "skills and enhance productivity."
      },
      "text": "Data Ingestion: Collecting and importing data from various sources, such as databases, logs, APIs into the Big Data infrastructure. Data Processing: Designing data pipelines to clean, transform, and prepare raw data for analysis. This often involves using technologies like Apache Hadoop, Apache Spark. Data Storage: Selecting appropriate data storage technologies like Hadoop Distributed File System (HDFS), HIVE, IMPALA, or cloud-based storage solutions (Snowflake, Databricks). Data Analysis: Developing algorithms and implementing data processing techniques to extract meaningful insights, conduct statistical analysis( build machine learning models is advantage). Performance Optimization: Tuning and optimizing the performance of Big Data applications and infrastructure to ensure efficient data processing and reduced latency. Data Security: Implementing security measures to protect sensitive data and ensuring compliance with data protection regulations. Integration: Integrating Big Data solutions with existing enterprise systems and applications. Monitoring and Troubleshooting: Monitoring data pipelines and processes to identify and resolve issues or bottlenecks in the system. Collaboration: Collaborating with data scientists, data engineers, and other stakeholders to understand requirements and deliver valuable data-driven solutions. Continuous Learning: Staying updated with the latest Big Data technologies, tools, and industry trends to improve skills and enhance productivity.",
      "word_count": 265
    }
  ],
  "urls": [
    {
      "type": "careers",
      "url": "https://jobs.iqvia.com"
    }
  ]
}
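
Each parsed section in the JSON above carries a `source_marker` and a `word_count` alongside its `text`, which makes the extraction checkable against itself. A minimal sketch of such a check is shown here. The helper `check_section` is hypothetical, the log does not say whether the pipeline runs any check like it, and whitespace tokenization is an assumption (it does reproduce the logged count of 47 for `about_company`).

```python
def check_section(section: dict) -> dict:
    """Compare a parsed section's self-reported markers and word count with its text."""
    text = section["text"]
    marker = section["source_marker"]
    return {
        # Markers are treated as literal prefix/suffix strings, not re-tokenized.
        "starts_with_marker": text.startswith(marker["first_5_words"]),
        "ends_with_marker": text.endswith(marker["last_5_words"]),
        # Assumption: words are whitespace-separated tokens.
        "word_count_matches": len(text.split()) == section["word_count"],
    }


# The about_company object copied from the parser output above.
about_company = {
    "source_marker": {
        "first_5_words": "IQVIA is a leading global",
        "last_5_words": "create a healthier world.",
    },
    "text": (
        "IQVIA is a leading global provider of advanced analytics, "
        "technology solutions and clinical research services to the "
        "life sciences industry. We believe in pushing the boundaries "
        "of human science and data science to make the biggest impact "
        "possible \u2013 to help our customers create a healthier world."
    ),
    "word_count": 47,
}

print(check_section(about_company))
```

Because the markers are compared as literal strings, the check passes here even though the logged `last_5_words` value contains four words rather than five.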
API 1 — extract-from-jd
{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "Hadoop"
    },
    {
      "is_primary": true,
      "skill_name": "Unix"
    },
    {
      "is_primary": true,
      "skill_name": "HDFS"
    },
    {
      "is_primary": true,
      "skill_name": "Hive"
    },
    {
      "is_primary": true,
      "skill_name": "Impala"
    },
    {
      "is_primary": true,
      "skill_name": "Spark"
    },
    {
      "is_primary": true,
      "skill_name": "Scala"
    },
    {
      "is_primary": true,
      "skill_name": "Sqoop"
    },
    {
      "is_primary": true,
      "skill_name": "Airflow"
    },
    {
      "is_primary": true,
      "skill_name": "Jenkins"
    },
    {
      "is_primary": true,
      "skill_name": "SQL Server"
    },
    {
      "is_primary": true,
      "skill_name": "T-SQL"
    },
    {
      "is_primary": true,
      "skill_name": "stored procedures"
    },
    {
      "is_primary": true,
      "skill_name": "Jira"
    },
    {
      "is_primary": true,
      "skill_name": "Confluence"
    },
    {
      "is_primary": true,
      "skill_name": "GitLab"
    },
    {
      "is_primary": false,
      "skill_name": "Snowflake"
    },
    {
      "is_primary": false,
      "skill_name": "Databricks"
    },
    {
      "is_primary": false,
      "skill_name": "machine learning"
    }
  ],
  "jd_role": {
    "display_name": "Big Data Developer",
    "rationale": null,
    "role_aliases": [
      "Big Data Engineer",
      "Data Engineer",
      "Big Data Specialist"
    ],
    "role_archetype": "Data",
    "slug": ""
  },
  "nano_parsed": {
    "JD_type": "pass",
    "about_company": {
      "source_marker": {
        "first_5_words": "IQVIA is a leading global",
        "last_5_words": "create a healthier world."
      },
      "text": "IQVIA is a leading global provider of advanced analytics, technology solutions and clinical research services to the life sciences industry. We believe in pushing the boundaries of human science and data science to make the biggest impact possible \u2013 to help our customers create a healthier world.",
      "word_count": 47
    },
    "certifications": [],
    "company_name": "IQVIA",
    "ctc": null,
    "domain": {
      "primary": {
        "aliases": [
          "Life Sciences",
          "Clinical Research"
        ],
        "domain": "Healthcare"
      },
      "secondary": null
    },
    "education": [],
    "experience": null,
    "job_locations": [],
    "role": "Big Data Developer",
    "role_aliases": [
      "Big Data Engineer",
      "Data Engineer",
      "Big Data Specialist"
    ],
    "role_archetype": "Data",
    "roles_and_responsibilities": [
      {
        "bullet_count": 10,
        "heading": "Technical Skills",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "\u2022 Strong experience with big",
          "last_5_words": "integrate in an international team"
        },
        "text": "\u2022 Strong experience with big data technologies and associated tools such as Hadoop, Unix, HDFS, Hive, Impala, etc.\n\u2022 Proficient in using Spark/Scala \n\u2022 Experience with data Import/Export using Sqoop or similar tools\n\u2022 Experience using Airflow, Jenkins or similar other automation tools\n\u2022 Excellent knowledge of SQL Server and database structures\n\u2022 Demonstrate ability to write and optimize T-SQL queries and stored procedures\n\u2022 Experience working with Jira/Confluence/GitLab\n\u2022 Excellent organizational skills and ability to handle multiple activities with changing priorities simultaneously\n\u2022 Details oriented and strong problem-solving skills\n\u2022 Team player and able to integrate in an international team",
        "word_count": 118
      },
      {
        "bullet_count": 0,
        "heading": "The typical tasks and responsibilities of a Big Data Developer include",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "Data Ingestion: Collecting and importing",
          "last_5_words": "skills and enhance productivity."
        },
        "text": "Data Ingestion: Collecting and importing data from various sources, such as databases, logs, APIs into the Big Data infrastructure. Data Processing: Designing data pipelines to clean, transform, and prepare raw data for analysis. This often involves using technologies like Apache Hadoop, Apache Spark. Data Storage: Selecting appropriate data storage technologies like Hadoop Distributed File System (HDFS), HIVE, IMPALA, or cloud-based storage solutions (Snowflake, Databricks). Data Analysis: Developing algorithms and implementing data processing techniques to extract meaningful insights, conduct statistical analysis( build machine learning models is advantage). Performance Optimization: Tuning and optimizing the performance of Big Data applications and infrastructure to ensure efficient data processing and reduced latency. Data Security: Implementing security measures to protect sensitive data and ensuring compliance with data protection regulations. Integration: Integrating Big Data solutions with existing enterprise systems and applications. Monitoring and Troubleshooting: Monitoring data pipelines and processes to identify and resolve issues or bottlenecks in the system. Collaboration: Collaborating with data scientists, data engineers, and other stakeholders to understand requirements and deliver valuable data-driven solutions. Continuous Learning: Staying updated with the latest Big Data technologies, tools, and industry trends to improve skills and enhance productivity.",
        "word_count": 265
      }
    ],
    "urls": [
      {
        "type": "careers",
        "url": "https://jobs.iqvia.com"
      }
    ]
  },
  "rejected": false,
  "rejection_reason": null,
  "run_id": "d2ebff95-b8c3-481f-a9e4-af393314fd5f",
  "stage3_signals": {
    "alias_found": true,
    "alias_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": null,
        "matched_count": null,
        "matched_skills": null,
        "role_id": 2,
        "score": 1.0,
        "slug": "data-engineer",
        "total_count": null
      }
    ],
    "kra_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": [
          {
            "kra_text": "Builds data ingestion pipelines to collect data from transactional databases, third-party APIs, event streams, and file sources into centralized data platforms.",
            "sentence": "Data Ingestion: Collecting and importing data from various sources, such as databases, logs, APIs into the Big Data infrastructure.",
            "similarity": 0.7047
          },
          {
            "kra_text": "Monitors pipeline health, SLA breach alerts, and job failure notifications, and performs root cause analysis for data pipeline incidents.",
            "sentence": "Monitoring and Troubleshooting: Monitoring data pipelines and processes to identify and resolve issues or bottlenecks in the system.",
            "similarity": 0.6857
          },
          {
            "kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
            "sentence": "This often involves using technologies like Apache Hadoop, Apache Spark.",
            "similarity": 0.6721
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 2,
        "score": 0.6875,
        "slug": "data-engineer",
        "total_count": null
      },
      {
        "display_name": "DevOps Engineer",
        "kra_matches": [
          {
            "kra_text": "Monitors CI/CD pipeline reliability, identifies bottlenecks in delivery workflows, and improves deployment frequency, lead time, and failure recovery rate.",
            "sentence": "Monitoring and Troubleshooting: Monitoring data pipelines and processes to identify and resolve issues or bottlenecks in the system.",
            "similarity": 0.6455
          },
          {
            "kra_text": "Builds and maintains CI/CD pipelines using Jenkins, GitHub Actions, GitLab CI, or CircleCI to automate build, test, security scanning, and deployment workflows.",
            "sentence": "Experience using Airflow, Jenkins or similar other automation tools",
            "similarity": 0.6117
          },
          {
            "kra_text": "Builds and maintains CI/CD pipelines using Jenkins, GitHub Actions, GitLab CI, or CircleCI to automate build, test, security scanning, and deployment workflows.",
            "sentence": "Experience working with Jira/Confluence/GitLab",
            "similarity": 0.5214
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 10,
        "score": 0.5929,
        "slug": "devops-engineer",
        "total_count": null
      },
      {
        "display_name": "Svelte Frontend Developer",
        "kra_matches": [
          {
            "kra_text": "performance tuning",
            "sentence": "Performance Optimization: Tuning and optimizing the performance of Big Data applications and infrastructure to ensure efficient data processing and reduced latency.",
            "similarity": 0.5799
          },
          {
            "kra_text": "backend data integration",
            "sentence": "Integration: Integrating Big Data solutions with existing enterprise systems and applications.",
            "similarity": 0.5732
          },
          {
            "kra_text": "backend data integration",
            "sentence": "Data Ingestion: Collecting and importing data from various sources, such as databases, logs, APIs into the Big Data infrastructure.",
            "similarity": 0.5289
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 92,
        "score": 0.5607,
        "slug": "svelte-frontend-developer",
        "total_count": null
      },
      {
        "display_name": "Java Backend Developer",
        "kra_matches": [
          {
            "kra_text": "backend performance tuning",
            "sentence": "Performance Optimization: Tuning and optimizing the performance of Big Data applications and infrastructure to ensure efficient data processing and reduced latency.",
            "similarity": 0.5993
          },
          {
            "kra_text": "external system integration",
            "sentence": "Integration: Integrating Big Data solutions with existing enterprise systems and applications.",
            "similarity": 0.528
          },
          {
            "kra_text": "service contract collaboration",
            "sentence": "Collaboration: Collaborating with data scientists, data engineers, and other stakeholders to understand requirements and deliver valuable data-driven solutions.",
            "similarity": 0.4725
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 79,
        "score": 0.5333,
        "slug": "java-backend-developer",
        "total_count": null
      },
      {
        "display_name": "ML Engineer",
        "kra_matches": [
          {
            "kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
            "sentence": "Data Processing: Designing data pipelines to clean, transform, and prepare raw data for analysis.",
            "similarity": 0.5731
          },
          {
            "kra_text": "Monitors production model behavior for data drift, concept drift, and prediction performance degradation using monitoring dashboards and alerting.",
            "sentence": "Monitoring and Troubleshooting: Monitoring data pipelines and processes to identify and resolve issues or bottlenecks in the system.",
            "similarity": 0.5484
          },
          {
            "kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
            "sentence": "Data Analysis: Developing algorithms and implementing data processing techniques to extract meaningful insights, conduct statistical analysis( build machine learning models is advantage).",
            "similarity": 0.4709
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 3,
        "score": 0.5308,
        "slug": "ml-engineer",
        "total_count": null
      }
    ],
    "skill_match_roles": [
      {
        "display_name": "ML Engineer",
        "kra_matches": null,
        "matched_count": 4,
        "matched_skills": [
          "Airflow",
          "GitLab",
          "Jenkins",
          "Scala"
        ],
        "role_id": 3,
        "score": 0.25,
        "slug": "ml-engineer",
        "total_count": 16
      },
      {
        "display_name": "Data Engineer",
        "kra_matches": null,
        "matched_count": 3,
        "matched_skills": [
          "Apache Spark",
          "Hadoop",
          "Scala"
        ],
        "role_id": 2,
        "score": 0.1875,
        "slug": "data-engineer",
        "total_count": 16
      },
      {
        "display_name": "MLOps Engineer",
        "kra_matches": null,
        "matched_count": 2,
        "matched_skills": [
          "Airflow",
          "Scala"
        ],
        "role_id": 16,
        "score": 0.125,
        "slug": "ml-ops-engineer",
        "total_count": 16
      },
      {
        "display_name": "DevOps Engineer",
        "kra_matches": null,
        "matched_count": 2,
        "matched_skills": [
          "GitLab",
          "Jenkins"
        ],
        "role_id": 10,
        "score": 0.125,
        "slug": "devops-engineer",
        "total_count": 16
      },
      {
        "display_name": "iOS Developer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Hive"
        ],
        "role_id": 6,
        "score": 0.0625,
        "slug": "ios-engineer",
        "total_count": 16
      }
    ]
  },
  "stage4_decision": {
    "alias_collision_detected": false,
    "case": "DOMAIN",
    "chosen_role": {
      "display_name": "Data Engineer",
      "kra_matches": null,
      "matched_count": null,
      "matched_skills": null,
      "role_id": 2,
      "score": 0.98,
      "slug": "data-engineer",
      "total_count": null
    },
    "confidence": 0.98,
    "is_new_role": false,
    "llm2_fired": false,
    "llm2_reasoning": null,
    "matched_dimensions": [
      "Big Data Engineering",
      "Data Ingestion and Processing",
      "Data Pipeline Development",
      "Data Storage and Query Optimization",
      "Workflow Automation",
      "Performance Optimization",
      "Cross-functional Collaboration"
    ],
    "matched_kras": [
      "Collecting and importing data from various sources",
      "Designing data pipelines to clean, transform, and prepare raw data",
      "Selecting appropriate data storage technologies",
      "Developing algorithms and implementing data processing techniques",
      "Tuning and optimizing the performance of Big Data applications",
      "Monitoring data pipelines and processes",
      "Integrating Big Data solutions with existing enterprise systems"
    ],
    "matched_skills": [
      "Hadoop",
      "Unix",
      "HDFS",
      "Hive",
      "Impala",
      "Spark/Scala",
      "Sqoop",
      "Airflow",
      "Jenkins",
      "SQL Server",
      "T-SQL",
      "Jira",
      "Confluence",
      "GitLab"
    ],
    "new_role_display_name": null,
    "new_role_slug": null,
    "queued": false,
    "reasoning": "Domain=Data Engineering \u0026 Analytics; The JD is centered on big data engineering, data ingestion/pipeline work, Spark, Hadoop ecosystem tools, and automation/orchestration, which best matches Data Engineer.",
    "sub_role": null
  },
  "stage5_updates": {
    "centroid_n_after": 86,
    "centroid_updated": true,
    "collision_log_id": null,
    "new_kra_attached": null,
    "new_skills_attached": [
      {
        "is_primary": true,
        "queue_id": 5376,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Unix",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 5377,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "HDFS",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 5380,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Impala",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 5383,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Sqoop",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 5384,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "T-SQL",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 5385,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "stored procedures",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 5386,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Jira",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 5387,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Confluence",
        "status": "pending"
      }
    ],
    "queue_entry_id": null,
    "v3_pipeline_triggered": false,
    "v3_role_slug": null,
    "v3_run_id": null
  }
}
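
The `stage3_signals` scores in the JSON above can be reproduced from the other logged fields. Each `skill_match_roles` score equals `matched_count / total_count`, and each `kra_match_roles` score equals the mean of the three listed `similarity` values, rounded to four decimals. The sketch below checks this arithmetic. The formulas are inferred from the logged numbers, not from any stated pipeline definition, and the function names are hypothetical.

```python
from statistics import mean


def skill_score(matched_count: int, total_count: int) -> float:
    """Fraction of skills matched for a role (inferred formula)."""
    return round(matched_count / total_count, 4)


def kra_score(similarities: list[float]) -> float:
    """Mean of the listed KRA sentence similarities (inferred formula)."""
    return round(mean(similarities), 4)


# (matched_count, total_count) pairs copied from skill_match_roles.
skill_rows = {
    "ml-engineer": (4, 16),      # logged score 0.25
    "data-engineer": (3, 16),    # logged score 0.1875
    "ml-ops-engineer": (2, 16),  # logged score 0.125
    "ios-engineer": (1, 16),     # logged score 0.0625
}

# Similarity triples copied from kra_match_roles.
kra_rows = {
    "data-engineer": [0.7047, 0.6857, 0.6721],    # logged score 0.6875
    "devops-engineer": [0.6455, 0.6117, 0.5214],  # logged score 0.5929
    "ml-engineer": [0.5731, 0.5484, 0.4709],      # logged score 0.5308
}

for slug, (matched, total) in skill_rows.items():
    print(slug, skill_score(matched, total))
for slug, sims in kra_rows.items():
    print(slug, kra_score(sims))
```

The denominator of 16 also equals the number of entries marked `is_primary: true` in `final_skills`, though the log does not state that this is where `total_count` comes from. The `stage4_decision` score of 0.98 is not derived from either formula and is left out of the sketch.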
API 2 — extract-details
{
  "alias_matches": [
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 2010,
      "existing_alias_text": "Hadoop",
      "input_term": "Hadoop",
      "matched_canonical": {
        "category_id": 5,
        "display_name": "Hadoop",
        "id": 1351,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "hadoop",
        "sub_category_id": 91,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 4198,
      "existing_alias_text": "Hive",
      "input_term": "Hive",
      "matched_canonical": {
        "category_id": 3,
        "display_name": "Hive",
        "id": 2754,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "hive",
        "sub_category_id": 2242,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 2510,
      "existing_alias_text": "spark",
      "input_term": "Spark",
      "matched_canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 272,
      "existing_alias_text": "Scala",
      "input_term": "Scala",
      "matched_canonical": {
        "category_id": 6,
        "display_name": "Scala",
        "id": 102,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LANGUAGE",
        "slug": "scala",
        "sub_category_id": 96,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 526,
      "existing_alias_text": "Airflow",
      "input_term": "Airflow",
      "matched_canonical": {
        "category_id": 13,
        "display_name": "Airflow",
        "id": 265,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "airflow",
        "sub_category_id": 130,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 544,
      "existing_alias_text": "Jenkins",
      "input_term": "Jenkins",
      "matched_canonical": {
        "category_id": 13,
        "display_name": "Jenkins",
        "id": 283,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "jenkins",
        "sub_category_id": 184,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 135,
      "existing_alias_text": "SQL Server",
      "input_term": "SQL Server",
      "matched_canonical": {
        "category_id": 3,
        "display_name": "SQL Server",
        "id": 18,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "sql-server",
        "sub_category_id": 29,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 540,
      "existing_alias_text": "GitLab",
      "input_term": "GitLab",
      "matched_canonical": {
        "category_id": 9,
        "display_name": "GitLab",
        "id": 279,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "gitlab",
        "sub_category_id": 170,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 299,
      "existing_alias_text": "Snowflake",
      "input_term": "Snowflake",
      "matched_canonical": {
        "category_id": 9,
        "display_name": "Snowflake",
        "id": 105,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "snowflake",
        "sub_category_id": 113,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 1838,
      "existing_alias_text": "Databricks",
      "input_term": "Databricks",
      "matched_canonical": {
        "category_id": 9,
        "display_name": "Databricks",
        "id": 1202,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "databricks",
        "sub_category_id": 911,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 2015,
      "existing_alias_text": "Machine Learning",
      "input_term": "machine learning",
      "matched_canonical": {
        "category_id": 2,
        "display_name": "Machine Learning",
        "id": 1356,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CONCEPT",
        "slug": "machine-learning",
        "sub_category_id": 1024,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    }
  ],
  "candidate_roles": [
    {
      "display_name": "Data Engineer",
      "id": 2,
      "rationale": null,
      "role_archetype": null,
      "slug": "data-engineer",
      "source": "db"
    },
    {
      "display_name": "Android Developer",
      "id": 4,
      "rationale": null,
      "role_archetype": null,
      "slug": "android-engineer",
      "source": "db"
    },
    {
      "display_name": "Flutter Developer",
      "id": 74,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "flutter-developer",
      "source": "db"
    },
    {
      "display_name": "Hybrid Mobile Developer",
      "id": 11,
      "rationale": null,
      "role_archetype": null,
      "slug": "hybrid-mobile-developer",
      "source": "db"
    },
    {
      "display_name": "Native Mobile Developer",
      "id": 75,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "native-mobile-developer",
      "source": "db"
    },
    {
      "display_name": "React Native Developer",
      "id": 73,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "react-native-developer",
      "source": "db"
    },
    {
      "display_name": "iOS Developer",
      "id": 6,
      "rationale": null,
      "role_archetype": null,
      "slug": "ios-engineer",
      "source": "db"
    },
    {
      "display_name": "ML Engineer",
      "id": 3,
      "rationale": null,
      "role_archetype": null,
      "slug": "ml-engineer",
      "source": "db"
    },
    {
      "display_name": "MLOps Engineer",
      "id": 16,
      "rationale": null,
      "role_archetype": null,
      "slug": "ml-ops-engineer",
      "source": "db"
    },
    {
      "display_name": "DevOps Engineer",
      "id": 10,
      "rationale": null,
      "role_archetype": null,
      "slug": "devops-engineer",
      "source": "db"
    },
    {
      "display_name": ".NET Backend Developer",
      "id": 83,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "dotnet-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Backend Developer",
      "id": 1,
      "rationale": null,
      "role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
      "slug": "backend-engineer",
      "source": "db"
    },
    {
      "display_name": "Kotlin Backend Developer",
      "id": 84,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "kotlin-server-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Node.js Backend Developer",
      "id": 82,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "node-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Python Backend Developer",
      "id": 80,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "python-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Ruby Backend Developer",
      "id": 85,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "ruby-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Scala Backend Developer",
      "id": 87,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "scala-backend-developer",
      "source": "db"
    },
    {
      "display_name": "AI Engineer",
      "id": 13,
      "rationale": null,
      "role_archetype": null,
      "slug": "ai-engineer",
      "source": "db"
    }
  ],
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Domain=Data Engineering & Analytics; The JD is centered on big data engineering, data ingestion/pipeline work, Spark, Hadoop ecosystem tools, and automation/orchestration, which best matches Data Engineer.",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "dimensions": [
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "ETL and ELT Tooling",
        "id": 24,
        "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
        "slug": "etl-and-elt-tooling",
        "source": "db"
      },
      "input_skill": "Hadoop",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Local Persistence and Offline Behavior",
        "id": 85,
        "rationale": "On-device storage used for caching, offline support, and durable client state. This cluster is coherent because iOS apps often need to preserve user progress and data when connectivity is limited.",
        "slug": "local-persistence-and-offline-behavior",
        "source": "db"
      },
      "input_skill": "Hive",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Android Developer",
          "id": 4,
          "rationale": null,
          "role_archetype": null,
          "slug": "android-engineer",
          "source": "db"
        },
        {
          "display_name": "Flutter Developer",
          "id": 74,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "flutter-developer",
          "source": "db"
        },
        {
          "display_name": "Hybrid Mobile Developer",
          "id": 11,
          "rationale": null,
          "role_archetype": null,
          "slug": "hybrid-mobile-developer",
          "source": "db"
        },
        {
          "display_name": "Native Mobile Developer",
          "id": 75,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "native-mobile-developer",
          "source": "db"
        },
        {
          "display_name": "React Native Developer",
          "id": 73,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "react-native-developer",
          "source": "db"
        },
        {
          "display_name": "iOS Developer",
          "id": 6,
          "rationale": null,
          "role_archetype": null,
          "slug": "ios-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "ETL and ELT Tooling",
        "id": 24,
        "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
        "slug": "etl-and-elt-tooling",
        "source": "db"
      },
      "input_skill": "Spark",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for Data Work",
        "id": 21,
        "rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
        "slug": "programming-languages-for-data-work",
        "source": "db"
      },
      "input_skill": "Scala",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for ML Systems",
        "id": 39,
        "rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
        "slug": "programming-languages-for-ml-systems",
        "source": "db"
      },
      "input_skill": "Scala",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "ML Engineer",
          "id": 3,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-engineer",
          "source": "db"
        },
        {
          "display_name": "MLOps Engineer",
          "id": 16,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-ops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Workflow Orchestration for ML Pipelines",
        "id": 54,
        "rationale": "Workflow engines used to coordinate training, evaluation, deployment, and retraining jobs. This cluster covers dependencies, retries, scheduling, and pipeline composition for ML lifecycle automation.",
        "slug": "workflow-orchestration-for-ml-pipelines",
        "source": "db"
      },
      "input_skill": "Airflow",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "ML Engineer",
          "id": 3,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-engineer",
          "source": "db"
        },
        {
          "display_name": "MLOps Engineer",
          "id": 16,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-ops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "CI/CD Pipeline Platforms",
        "id": 150,
        "rationale": "Systems used to define, run, and maintain automated build and deployment workflows. This cluster is coherent because the role owns delivery automation end to end, including pipeline reliability and promotion logic.",
        "slug": "ci-cd-pipeline-platforms",
        "source": "db"
      },
      "input_skill": "Jenkins",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "DevOps Engineer",
          "id": 10,
          "rationale": null,
          "role_archetype": null,
          "slug": "devops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "CI/CD for Machine Learning",
        "id": 56,
        "rationale": "Tools and platforms for automating ML model integration, testing, and deployment pipelines.",
        "slug": "ci-cd-for-machine-learning",
        "source": "db"
      },
      "input_skill": "Jenkins",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "ML Engineer",
          "id": 3,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Relational Database Design",
        "id": 4,
        "rationale": "Modeling and operating relational persistence for backend services. Includes schema design, normalization, indexing, transactions, and query tuning for operational data stores.",
        "slug": "relational-database-design",
        "source": "db"
      },
      "input_skill": "SQL Server",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": ".NET Backend Developer",
          "id": 83,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "dotnet-backend-developer",
          "source": "db"
        },
        {
          "display_name": "Backend Developer",
          "id": 1,
          "rationale": null,
          "role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
          "slug": "backend-engineer",
          "source": "db"
        },
        {
          "display_name": "Kotlin Backend Developer",
          "id": 84,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "kotlin-server-backend-developer",
          "source": "db"
        },
        {
          "display_name": "Node.js Backend Developer",
          "id": 82,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "node-backend-developer",
          "source": "db"
        },
        {
          "display_name": "Python Backend Developer",
          "id": 80,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "python-backend-developer",
          "source": "db"
        },
        {
          "display_name": "Ruby Backend Developer",
          "id": 85,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "ruby-backend-developer",
          "source": "db"
        },
        {
          "display_name": "Scala Backend Developer",
          "id": 87,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "scala-backend-developer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "CI/CD Pipeline Platforms",
        "id": 150,
        "rationale": "Systems used to define, run, and maintain automated build and deployment workflows. This cluster is coherent because the role owns delivery automation end to end, including pipeline reliability and promotion logic.",
        "slug": "ci-cd-pipeline-platforms",
        "source": "db"
      },
      "input_skill": "GitLab",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "DevOps Engineer",
          "id": 10,
          "rationale": null,
          "role_archetype": null,
          "slug": "devops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "CI/CD for Machine Learning",
        "id": 56,
        "rationale": "Tools and platforms for automating ML model integration, testing, and deployment pipelines.",
        "slug": "ci-cd-for-machine-learning",
        "source": "db"
      },
      "input_skill": "GitLab",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "ML Engineer",
          "id": 3,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Data Warehouses",
        "id": 22,
        "rationale": "Managed analytical storage and compute platforms used for curated datasets, reporting, and downstream analytics. These systems are central to data modeling, performance tuning, and cost-aware query design.",
        "slug": "cloud-data-warehouses",
        "source": "db"
      },
      "input_skill": "Snowflake",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "React Frontend Development",
        "id": 96,
        "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "Databricks",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "AI Governance and Model Security",
        "id": 50,
        "rationale": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
        "slug": "ai-governance-and-model-security",
        "source": "db"
      },
      "input_skill": "machine learning",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "AI Engineer",
          "id": 13,
          "rationale": null,
          "role_archetype": null,
          "slug": "ai-engineer",
          "source": "db"
        },
        {
          "display_name": "ML Engineer",
          "id": 3,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-engineer",
          "source": "db"
        },
        {
          "display_name": "MLOps Engineer",
          "id": 16,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-ops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "React Frontend Development",
        "id": 96,
        "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "machine learning",
      "llm_role": null,
      "roles_from_db": []
    }
  ],
  "input_final_skills": [
    "Hadoop",
    "Unix",
    "HDFS",
    "Hive",
    "Impala",
    "Spark",
    "Scala",
    "Sqoop",
    "Airflow",
    "Jenkins",
    "SQL Server",
    "T-SQL",
    "stored procedures",
    "Jira",
    "Confluence",
    "GitLab",
    "Snowflake",
    "Databricks",
    "machine learning"
  ],
  "input_llm_skills": [
    "Hadoop",
    "Unix",
    "HDFS",
    "Hive",
    "Impala",
    "Spark",
    "Scala",
    "Sqoop",
    "Airflow",
    "Jenkins",
    "SQL Server",
    "T-SQL",
    "stored procedures",
    "Jira",
    "Confluence",
    "GitLab",
    "Snowflake",
    "Databricks",
    "machine learning"
  ],
  "new_aliases_persisted": 0,
  "run_id": "d2ebff95-b8c3-481f-a9e4-af393314fd5f",
  "skills_detail": [
    {
      "aliases_in_db": [
        {
          "alias_text": "Hadoop",
          "alias_type": "CANONICAL",
          "id": 2010,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 5,
        "display_name": "Hadoop",
        "id": 1351,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "hadoop",
        "sub_category_id": 91,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "ETL and ELT Tooling",
            "id": 24,
            "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
            "slug": "etl-and-elt-tooling",
            "source": "db"
          },
          "input_skill": "Hadoop",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Hadoop",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Unix",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Operating Systems",
          "skill_nature": "CONCEPT",
          "sub_category": "general",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "UNVERSIONED",
          "volatility": "STABLE"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "unix",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "HDFS",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Data Engineering Tools",
          "skill_nature": "TOOL",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "hdfs",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Hive",
          "alias_type": "CANONICAL",
          "id": 4198,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 3,
        "display_name": "Hive",
        "id": 2754,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "hive",
        "sub_category_id": 2242,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Local Persistence and Offline Behavior",
            "id": 85,
            "rationale": "On-device storage used for caching, offline support, and durable client state. This cluster is coherent because iOS apps often need to preserve user progress and data when connectivity is limited.",
            "slug": "local-persistence-and-offline-behavior",
            "source": "db"
          },
          "input_skill": "Hive",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Android Developer",
              "id": 4,
              "rationale": null,
              "role_archetype": null,
              "slug": "android-engineer",
              "source": "db"
            },
            {
              "display_name": "Flutter Developer",
              "id": 74,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "flutter-developer",
              "source": "db"
            },
            {
              "display_name": "Hybrid Mobile Developer",
              "id": 11,
              "rationale": null,
              "role_archetype": null,
              "slug": "hybrid-mobile-developer",
              "source": "db"
            },
            {
              "display_name": "Native Mobile Developer",
              "id": 75,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "native-mobile-developer",
              "source": "db"
            },
            {
              "display_name": "React Native Developer",
              "id": 73,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "react-native-developer",
              "source": "db"
            },
            {
              "display_name": "iOS Developer",
              "id": 6,
              "rationale": null,
              "role_archetype": null,
              "slug": "ios-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Hive",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Impala",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Databases",
          "skill_nature": "TOOL",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "impala",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Apache Spark",
          "alias_type": "CANONICAL",
          "id": 2004,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "apache spark 3",
          "alias_type": "VERSION",
          "id": 2006,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark",
          "alias_type": "VERSION",
          "id": 2510,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3",
          "alias_type": "VERSION",
          "id": 2007,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3.x",
          "alias_type": "VERSION",
          "id": 2009,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark3",
          "alias_type": "VERSION",
          "id": 2008,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "ETL and ELT Tooling",
            "id": 24,
            "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
            "slug": "etl-and-elt-tooling",
            "source": "db"
          },
          "input_skill": "Spark",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Spark",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Scala",
          "alias_type": "CANONICAL",
          "id": 272,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 6,
        "display_name": "Scala",
        "id": 102,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LANGUAGE",
        "slug": "scala",
        "sub_category_id": 96,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for Data Work",
            "id": 21,
            "rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
            "slug": "programming-languages-for-data-work",
            "source": "db"
          },
          "input_skill": "Scala",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for ML Systems",
            "id": 39,
            "rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
            "slug": "programming-languages-for-ml-systems",
            "source": "db"
          },
          "input_skill": "Scala",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "ML Engineer",
              "id": 3,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-engineer",
              "source": "db"
            },
            {
              "display_name": "MLOps Engineer",
              "id": 16,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-ops-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Scala",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Sqoop",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Data Engineering Tools",
          "skill_nature": "TOOL",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "sqoop",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Airflow",
          "alias_type": "CANONICAL",
          "id": 526,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "airflow 2",
          "alias_type": "VERSION",
          "id": 2477,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "airflow-2",
          "alias_type": "VERSION",
          "id": 2478,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "airflow2",
          "alias_type": "VERSION",
          "id": 2476,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "airflow2.x",
          "alias_type": "VERSION",
          "id": 2479,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "apache airflow 2",
          "alias_type": "VERSION",
          "id": 2480,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 13,
        "display_name": "Airflow",
        "id": 265,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "airflow",
        "sub_category_id": 130,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Workflow Orchestration for ML Pipelines",
            "id": 54,
            "rationale": "Workflow engines used to coordinate training, evaluation, deployment, and retraining jobs. This cluster covers dependencies, retries, scheduling, and pipeline composition for ML lifecycle automation.",
            "slug": "workflow-orchestration-for-ml-pipelines",
            "source": "db"
          },
          "input_skill": "Airflow",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "ML Engineer",
              "id": 3,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-engineer",
              "source": "db"
            },
            {
              "display_name": "MLOps Engineer",
              "id": 16,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-ops-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Airflow",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Jenkins",
          "alias_type": "CANONICAL",
          "id": 544,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 13,
        "display_name": "Jenkins",
        "id": 283,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "jenkins",
        "sub_category_id": 184,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "CI/CD Pipeline Platforms",
            "id": 150,
            "rationale": "Systems used to define, run, and maintain automated build and deployment workflows. This cluster is coherent because the role owns delivery automation end to end, including pipeline reliability and promotion logic.",
            "slug": "ci-cd-pipeline-platforms",
            "source": "db"
          },
          "input_skill": "Jenkins",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "DevOps Engineer",
              "id": 10,
              "rationale": null,
              "role_archetype": null,
              "slug": "devops-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "CI/CD for Machine Learning",
            "id": 56,
            "rationale": "Tools and platforms for automating ML model integration, testing, and deployment pipelines.",
            "slug": "ci-cd-for-machine-learning",
            "source": "db"
          },
          "input_skill": "Jenkins",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "ML Engineer",
              "id": 3,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Jenkins",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "SQL Server",
          "alias_type": "CANONICAL",
          "id": 135,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "SQL Server 2000",
          "alias_type": "VERSION",
          "id": 138,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "SQL Server 2005",
          "alias_type": "VERSION",
          "id": 139,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "SQL Server 2008",
          "alias_type": "VERSION",
          "id": 140,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "SQL Server 2012",
          "alias_type": "VERSION",
          "id": 141,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "SQL Server 2014",
          "alias_type": "VERSION",
          "id": 142,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "SQL Server 2016",
          "alias_type": "VERSION",
          "id": 143,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "SQL Server 2017",
          "alias_type": "VERSION",
          "id": 144,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "SQL Server 2019",
          "alias_type": "VERSION",
          "id": 145,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "SQL Server 2022",
          "alias_type": "VERSION",
          "id": 146,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "SQL Server 6.5",
          "alias_type": "VERSION",
          "id": 136,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "SQL Server 7.0",
          "alias_type": "VERSION",
          "id": 137,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 3,
        "display_name": "SQL Server",
        "id": 18,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "sql-server",
        "sub_category_id": 29,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Relational Database Design",
            "id": 4,
            "rationale": "Modeling and operating relational persistence for backend services. Includes schema design, normalization, indexing, transactions, and query tuning for operational data stores.",
            "slug": "relational-database-design",
            "source": "db"
          },
          "input_skill": "SQL Server",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": ".NET Backend Developer",
              "id": 83,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "dotnet-backend-developer",
              "source": "db"
            },
            {
              "display_name": "Backend Developer",
              "id": 1,
              "rationale": null,
              "role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
              "slug": "backend-engineer",
              "source": "db"
            },
            {
              "display_name": "Kotlin Backend Developer",
              "id": 84,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "kotlin-server-backend-developer",
              "source": "db"
            },
            {
              "display_name": "Node.js Backend Developer",
              "id": 82,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "node-backend-developer",
              "source": "db"
            },
            {
              "display_name": "Python Backend Developer",
              "id": 80,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "python-backend-developer",
              "source": "db"
            },
            {
              "display_name": "Ruby Backend Developer",
              "id": 85,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "ruby-backend-developer",
              "source": "db"
            },
            {
              "display_name": "Scala Backend Developer",
              "id": 87,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "scala-backend-developer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "SQL Server",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "T-SQL",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Programming Languages",
          "skill_nature": "LANGUAGE",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "t-sql",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "stored procedures",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Programming Languages",
          "skill_nature": "CONCEPT",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "stored-procedures",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Jira",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Project Management Tools",
          "skill_nature": "TOOL",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "jira",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Confluence",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Collaboration Tools",
          "skill_nature": "TOOL",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "confluence",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "GitLab",
          "alias_type": "CANONICAL",
          "id": 540,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 9,
        "display_name": "GitLab",
        "id": 279,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "gitlab",
        "sub_category_id": 170,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "CI/CD Pipeline Platforms",
            "id": 150,
            "rationale": "Systems used to define, run, and maintain automated build and deployment workflows. This cluster is coherent because the role owns delivery automation end to end, including pipeline reliability and promotion logic.",
            "slug": "ci-cd-pipeline-platforms",
            "source": "db"
          },
          "input_skill": "GitLab",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "DevOps Engineer",
              "id": 10,
              "rationale": null,
              "role_archetype": null,
              "slug": "devops-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "CI/CD for Machine Learning",
            "id": 56,
            "rationale": "Tools and platforms for automating ML model integration, testing, and deployment pipelines.",
            "slug": "ci-cd-for-machine-learning",
            "source": "db"
          },
          "input_skill": "GitLab",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "ML Engineer",
              "id": 3,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "GitLab",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Snowflake",
          "alias_type": "CANONICAL",
          "id": 299,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 9,
        "display_name": "Snowflake",
        "id": 105,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "snowflake",
        "sub_category_id": 113,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Data Warehouses",
            "id": 22,
            "rationale": "Managed analytical storage and compute platforms used for curated datasets, reporting, and downstream analytics. These systems are central to data modeling, performance tuning, and cost-aware query design.",
            "slug": "cloud-data-warehouses",
            "source": "db"
          },
          "input_skill": "Snowflake",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Snowflake",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Databricks",
          "alias_type": "CANONICAL",
          "id": 1838,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 9,
        "display_name": "Databricks",
        "id": 1202,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "databricks",
        "sub_category_id": 911,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "React Frontend Development",
            "id": 96,
            "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "Databricks",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "Databricks",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Machine Learning",
          "alias_type": "CANONICAL",
          "id": 2015,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 2,
        "display_name": "Machine Learning",
        "id": 1356,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CONCEPT",
        "slug": "machine-learning",
        "sub_category_id": 1024,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "AI Governance and Model Security",
            "id": 50,
            "rationale": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
            "slug": "ai-governance-and-model-security",
            "source": "db"
          },
          "input_skill": "machine learning",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "AI Engineer",
              "id": 13,
              "rationale": null,
              "role_archetype": null,
              "slug": "ai-engineer",
              "source": "db"
            },
            {
              "display_name": "ML Engineer",
              "id": 3,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-engineer",
              "source": "db"
            },
            {
              "display_name": "MLOps Engineer",
              "id": 16,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-ops-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "React Frontend Development",
            "id": 96,
            "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "machine learning",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "machine learning",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    }
  ],
  "unmatched_skills": [
    "Unix",
    "HDFS",
    "Impala",
    "Sqoop",
    "T-SQL",
    "stored procedures",
    "Jira",
    "Confluence"
  ]
}
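Note on `unmatched_skills`: in the entries visible above, every skill listed there (for example T-SQL, stored procedures, Jira, Confluence) has `"canonical": null` and `"matched_via": null`, while the skills resolved from the database carry a `canonical` object. A minimal sketch of that apparent relationship follows. It assumes the list is derived purely from a null `canonical`, which this dump does not confirm. The function name `unmatched_skills` and the trimmed `sample` entries are hypothetical and for illustration only.

```python
from typing import Any


def unmatched_skills(entries: list[dict[str, Any]]) -> list[str]:
    """Return input_skill names for entries with no canonical record.

    Assumption: an entry counts as unmatched when its "canonical" field
    is null (None); the real pipeline may apply additional rules.
    """
    return [e["input_skill"] for e in entries if e.get("canonical") is None]


# Hypothetical, trimmed-down entries that mirror the field names shown above.
sample = [
    {"input_skill": "SQL Server", "canonical": {"slug": "sql-server"}, "matched_via": "alias"},
    {"input_skill": "T-SQL", "canonical": None, "matched_via": None},
    {"input_skill": "Jira", "canonical": None, "matched_via": None},
    {"input_skill": "GitLab", "canonical": {"slug": "gitlab"}, "matched_via": "alias"},
]

print(unmatched_skills(sample))
```

Running the same filter over the full response would be one way to cross-check the `unmatched_skills` array against the per-skill entries.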
API 3 — final-role-output
{
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Domain=Data Engineering \u0026 Analytics; The JD is centered on big data engineering, data ingestion/pipeline work, Spark, Hadoop ecosystem tools, and automation/orchestration, which best matches Data Engineer.",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "chosen_role_resolution": "in_db",
  "final_input_skills": [
    {
      "skill": "Hadoop",
      "tag": "in_db"
    },
    {
      "skill": "Unix",
      "tag": "new"
    },
    {
      "skill": "HDFS",
      "tag": "new"
    },
    {
      "skill": "Hive",
      "tag": "in_db"
    },
    {
      "skill": "Impala",
      "tag": "new"
    },
    {
      "skill": "Spark",
      "tag": "in_db"
    },
    {
      "skill": "Scala",
      "tag": "in_db"
    },
    {
      "skill": "Sqoop",
      "tag": "new"
    },
    {
      "skill": "Airflow",
      "tag": "in_db"
    },
    {
      "skill": "Jenkins",
      "tag": "in_db"
    },
    {
      "skill": "SQL Server",
      "tag": "in_db"
    },
    {
      "skill": "T-SQL",
      "tag": "new"
    },
    {
      "skill": "stored procedures",
      "tag": "new"
    },
    {
      "skill": "Jira",
      "tag": "new"
    },
    {
      "skill": "Confluence",
      "tag": "new"
    },
    {
      "skill": "GitLab",
      "tag": "in_db"
    },
    {
      "skill": "Snowflake",
      "tag": "in_db"
    },
    {
      "skill": "Databricks",
      "tag": "in_db"
    },
    {
      "skill": "machine learning",
      "tag": "in_db"
    }
  ],
  "llm_cost_api1_usd": null,
  "llm_cost_api2_usd": null,
  "llm_cost_api3_usd": null,
  "llm_cost_total_usd": null,
  "persistence": {
    "items": [
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "ETL and ELT Tooling",
          "id": 24,
          "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
          "slug": "etl-and-elt-tooling",
          "source": "db"
        },
        "dimension_id": 24,
        "input_skill": "Hadoop",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 1351,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Local Persistence and Offline Behavior",
          "id": 85,
          "rationale": "On-device storage used for caching, offline support, and durable client state. This cluster is coherent because iOS apps often need to preserve user progress and data when connectivity is limited.",
          "slug": "local-persistence-and-offline-behavior",
          "source": "db"
        },
        "dimension_id": 85,
        "input_skill": "Hive",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Android Developer",
            "id": 4,
            "rationale": null,
            "role_archetype": null,
            "slug": "android-engineer",
            "source": "db"
          },
          {
            "display_name": "Flutter Developer",
            "id": 74,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "flutter-developer",
            "source": "db"
          },
          {
            "display_name": "Hybrid Mobile Developer",
            "id": 11,
            "rationale": null,
            "role_archetype": null,
            "slug": "hybrid-mobile-developer",
            "source": "db"
          },
          {
            "display_name": "Native Mobile Developer",
            "id": 75,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "native-mobile-developer",
            "source": "db"
          },
          {
            "display_name": "React Native Developer",
            "id": 73,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "react-native-developer",
            "source": "db"
          },
          {
            "display_name": "iOS Developer",
            "id": 6,
            "rationale": null,
            "role_archetype": null,
            "slug": "ios-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2754,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "ETL and ELT Tooling",
          "id": 24,
          "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
          "slug": "etl-and-elt-tooling",
          "source": "db"
        },
        "dimension_id": 24,
        "input_skill": "Spark",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 1350,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for Data Work",
          "id": 21,
          "rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
          "slug": "programming-languages-for-data-work",
          "source": "db"
        },
        "dimension_id": 21,
        "input_skill": "Scala",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 102,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for ML Systems",
          "id": 39,
          "rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
          "slug": "programming-languages-for-ml-systems",
          "source": "db"
        },
        "dimension_id": 39,
        "input_skill": "Scala",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "ML Engineer",
            "id": 3,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-engineer",
            "source": "db"
          },
          {
            "display_name": "MLOps Engineer",
            "id": 16,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-ops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 102,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Workflow Orchestration for ML Pipelines",
          "id": 54,
          "rationale": "Workflow engines used to coordinate training, evaluation, deployment, and retraining jobs. This cluster covers dependencies, retries, scheduling, and pipeline composition for ML lifecycle automation.",
          "slug": "workflow-orchestration-for-ml-pipelines",
          "source": "db"
        },
        "dimension_id": 54,
        "input_skill": "Airflow",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "ML Engineer",
            "id": 3,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-engineer",
            "source": "db"
          },
          {
            "display_name": "MLOps Engineer",
            "id": 16,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-ops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 265,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "CI/CD Pipeline Platforms",
          "id": 150,
          "rationale": "Systems used to define, run, and maintain automated build and deployment workflows. This cluster is coherent because the role owns delivery automation end to end, including pipeline reliability and promotion logic.",
          "slug": "ci-cd-pipeline-platforms",
          "source": "db"
        },
        "dimension_id": 150,
        "input_skill": "Jenkins",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "DevOps Engineer",
            "id": 10,
            "rationale": null,
            "role_archetype": null,
            "slug": "devops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 283,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "CI/CD for Machine Learning",
          "id": 56,
          "rationale": "Tools and platforms for automating ML model integration, testing, and deployment pipelines.",
          "slug": "ci-cd-for-machine-learning",
          "source": "db"
        },
        "dimension_id": 56,
        "input_skill": "Jenkins",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "ML Engineer",
            "id": 3,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 283,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Relational Database Design",
          "id": 4,
          "rationale": "Modeling and operating relational persistence for backend services. Includes schema design, normalization, indexing, transactions, and query tuning for operational data stores.",
          "slug": "relational-database-design",
          "source": "db"
        },
        "dimension_id": 4,
        "input_skill": "SQL Server",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": ".NET Backend Developer",
            "id": 83,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "dotnet-backend-developer",
            "source": "db"
          },
          {
            "display_name": "Backend Developer",
            "id": 1,
            "rationale": null,
            "role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
            "slug": "backend-engineer",
            "source": "db"
          },
          {
            "display_name": "Kotlin Backend Developer",
            "id": 84,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "kotlin-server-backend-developer",
            "source": "db"
          },
          {
            "display_name": "Node.js Backend Developer",
            "id": 82,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "node-backend-developer",
            "source": "db"
          },
          {
            "display_name": "Python Backend Developer",
            "id": 80,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "python-backend-developer",
            "source": "db"
          },
          {
            "display_name": "Ruby Backend Developer",
            "id": 85,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "ruby-backend-developer",
            "source": "db"
          },
          {
            "display_name": "Scala Backend Developer",
            "id": 87,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "scala-backend-developer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 18,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "CI/CD Pipeline Platforms",
          "id": 150,
          "rationale": "Systems used to define, run, and maintain automated build and deployment workflows. This cluster is coherent because the role owns delivery automation end to end, including pipeline reliability and promotion logic.",
          "slug": "ci-cd-pipeline-platforms",
          "source": "db"
        },
        "dimension_id": 150,
        "input_skill": "GitLab",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "DevOps Engineer",
            "id": 10,
            "rationale": null,
            "role_archetype": null,
            "slug": "devops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 279,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "CI/CD for Machine Learning",
          "id": 56,
          "rationale": "Tools and platforms for automating ML model integration, testing, and deployment pipelines.",
          "slug": "ci-cd-for-machine-learning",
          "source": "db"
        },
        "dimension_id": 56,
        "input_skill": "GitLab",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "ML Engineer",
            "id": 3,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 279,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Data Warehouses",
          "id": 22,
          "rationale": "Managed analytical storage and compute platforms used for curated datasets, reporting, and downstream analytics. These systems are central to data modeling, performance tuning, and cost-aware query design.",
          "slug": "cloud-data-warehouses",
          "source": "db"
        },
        "dimension_id": 22,
        "input_skill": "Snowflake",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 105,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "React Frontend Development",
          "id": 96,
          "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 96,
        "input_skill": "Databricks",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 1202,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "AI Governance and Model Security",
          "id": 50,
          "rationale": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
          "slug": "ai-governance-and-model-security",
          "source": "db"
        },
        "dimension_id": 50,
        "input_skill": "machine learning",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "AI Engineer",
            "id": 13,
            "rationale": null,
            "role_archetype": null,
            "slug": "ai-engineer",
            "source": "db"
          },
          {
            "display_name": "ML Engineer",
            "id": 3,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-engineer",
            "source": "db"
          },
          {
            "display_name": "MLOps Engineer",
            "id": 16,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-ops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 1356,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "React Frontend Development",
          "id": 96,
          "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 96,
        "input_skill": "machine learning",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 1356,
        "skill_tag": "in_db",
        "skipped_reason": null
      }
    ],
    "new_skills_created": 0,
    "role_dimension_saved": 0,
    "skill_dimension_saved": 0,
    "skipped": 0
  },
  "planner_output": null,
  "run_id": "d2ebff95-b8c3-481f-a9e4-af393314fd5f"
}

LLM Calls

Every model call made for this run, in pipeline order.
