Pipeline run

265899c9-6b42-43cb-a0f8-64ac64ac5a98

Pipeline LLM cost (USD)

API 1: $0.0036 · API 2: $0.1370 · API 3: $0.0000 · Total: $0.1406
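As a quick sanity check, the reported total should equal the sum of the three per-API costs. The sketch below uses `decimal` so that the four-decimal figures are not disturbed by float rounding; the name `costs_usd` is illustrative and not a field of the pipeline.

```python
from decimal import Decimal

# Per-API LLM costs in USD, copied from the cost line above.
costs_usd = {
    "API 1": Decimal("0.0036"),
    "API 2": Decimal("0.1370"),
    "API 3": Decimal("0.0000"),
}

# Sum with a Decimal start value so the result stays a Decimal.
total = sum(costs_usd.values(), Decimal("0"))
print(f"Total: ${total}")
```

API 2 accounts for almost all of the spend in this run, and API 3 reports no LLM cost.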

Client output enrichment

v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description

role baseline loaded sources · ai_index: jd · nature_of_work: jd · tech_stack_maturity: jd

Nature of work · API and backend integration

Build and operate Azure-based big data pipelines: ingest, model, and process data with Python, PySpark, SQL, Databricks, Data Factory, and CI/CD; also support MLOps pipelines and work with engineers, scientists, and SMEs.

"Ingest process and model data from heterogeneous data sources to support data science projects"

Tech stack maturity

Mainstream Modern

AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)

1.70 / 5

· Title match

✓ Has AI skill

✓ AI skill (primary)

· AI skill (secondary)

· On AI team

· Builds AI products

vocab breakdown (legacy)

Assistants (×1): —

Frameworks (×2): —

Models / concepts (×3): MLOps, AI, Machine Learning, Artificial Intelligence

Evidence — skills matched in JD (19)

Azure Python PySpark SQL MLflow Azure Machine Learning Azure Data Factory Databricks CI/CD Big Data Data Modeling Data Pipelines Data Ingestion Stream Processing APIs MLOps Notebooks Queueing Testability

Skill cluster (4 dimension groups, role-scoped)

CI/CD for Machine Learning

MLOps

Cloud Provider Platforms

Azure

Python Programming

Python

Cross-cutting / unaligned

PySpark SQL MLflow Azure Machine Learning Azure Data Factory Databricks CI/CD Big Data Data Modeling Data Pipelines Data Ingestion Stream Processing APIs Notebooks Queueing Testability
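In this run, the cross-cutting group is exactly the set of matched skills that were not placed in one of the three role-scoped groups. The sketch below reproduces that relationship from the lists above; whether the pipeline derives the group this way is an assumption, and the variable names are illustrative.

```python
# Skills matched in the JD, in the order listed under "Evidence".
evidence = [
    "Azure", "Python", "PySpark", "SQL", "MLflow", "Azure Machine Learning",
    "Azure Data Factory", "Databricks", "CI/CD", "Big Data", "Data Modeling",
    "Data Pipelines", "Data Ingestion", "Stream Processing", "APIs", "MLOps",
    "Notebooks", "Queueing", "Testability",
]

# Role-scoped groups shown in the skill cluster, with their member skills.
aligned = {
    "CI/CD for Machine Learning": ["MLOps"],
    "Cloud Provider Platforms": ["Azure"],
    "Python Programming": ["Python"],
}

# Everything not placed in a role-scoped group is cross-cutting / unaligned.
placed = {skill for skills in aligned.values() for skill in skills}
cross_cutting = [skill for skill in evidence if skill not in placed]

print(len(evidence), len(placed), len(cross_cutting))
```

The counts are consistent with the headings: 19 matched skills, 3 aligned, 16 cross-cutting, giving the 4 groups named in the cluster title.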

KRA description

Big data design and analysis data modeling development deployment and CICD operations of big data pipelines Collaborate with a team of data engineers data scientists and business subject matter experts to process data and prepare data sources Mentor other data engineers to develop a world class data engineering team Ingest process and model data from heterogeneous data sources to support data science projects Bachelors degree or higher in Computer Science or equivalent degree and 3 to 10 years related working experience In depth experience with a big data cloud platform preferably Azure Strong grasp of programming languages such as Python PySpark or equivalent and willingness to learn new ones Experience writing database heavy services or APIs Experience building and optimizing data pipelines architectures and data sets Working knowledge of queueing stream processing and highly scalable data stores Experience working with and supporting cross functional teams Strong understanding of structuring code for testability Professional experience implementing and maintaining MLOps pipelines in MLflow or AzureML Professional experience implementing data ingestion pipelines using Data Factory Professional experience with Databricks and coding with notebooks Professional experience processing and manipulating data using SQL and Python Professional experience with user training customer support and coordination with cross functional teams

Status: completed · Created: 2026-05-13T12:47:07.675750Z · Updated: 2026-05-13T12:49:06.816662Z · API 3 duration: 8807 ms
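The two timestamps allow a rough estimate of how long the run record was active, which can be compared with the reported API 3 duration. This is a minimal sketch; treating "Updated" as the end of the run is an assumption, and `parse_utc` is a helper defined here, not part of the pipeline.

```python
from datetime import datetime

def parse_utc(stamp: str) -> datetime:
    # Swap the trailing "Z" for an explicit offset so that
    # fromisoformat also accepts the value on Python versions before 3.11.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))

created = parse_utc("2026-05-13T12:47:07.675750Z")
updated = parse_utc("2026-05-13T12:49:06.816662Z")

elapsed_s = (updated - created).total_seconds()
api3_s = 8807 / 1000  # API 3 duration is reported in milliseconds

print(f"created -> updated: {elapsed_s:.3f} s, API 3: {api3_s:.3f} s")
```

Under that assumption, the record spans about 119 seconds, of which API 3 takes under 9 seconds.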

Flow Current 3-step pipeline

1 POST /skills/extract-from-jd

2 POST /skills/extract-details

3 POST /skills/final-role-output
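The three steps run in order, each as a POST to its own endpoint. The sketch below only builds the ordered request plan and sends nothing; the host `example.invalid` is a placeholder, and request bodies are omitted because this report does not show their schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    order: int
    method: str
    path: str

# The three steps exactly as listed in the flow above.
PIPELINE = (
    Step(1, "POST", "/skills/extract-from-jd"),
    Step(2, "POST", "/skills/extract-details"),
    Step(3, "POST", "/skills/final-role-output"),
)

def plan_requests(base_url: str) -> list:
    """Return 'METHOD url' strings in execution order (no request is sent)."""
    base = base_url.rstrip("/")
    ordered = sorted(PIPELINE, key=lambda step: step.order)
    return [f"{step.method} {base}{step.path}" for step in ordered]

# Hypothetical host, used only for illustration.
plan = plan_requests("https://example.invalid/")
for line in plan:
    print(line)
```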

Role Chosen role & resolution

Data Engineer

slug: data-engineer · id: 6 · source: db

The primary skills indicate a strong focus on data processing, SQL, and Azure technologies, aligning well with a Data Engineer's responsibilities.

Resolution: in_db — role exists in library; skill↔dim and role↔dim links saved when applicable.

New skills

Skill↔dim saved

Role↔dim saved

Skipped

Job description

About the job
The corporation is seeking talented and ambitious big data engineers to join the AI Center of Excellence team The team designs develops and deploys industry leading data science and big data engineering solutions using Artificial Intelligence Machine Learning and big data platforms and technologies to increase efficiency in complex work processes enable and empower data driven decision making planning and execution throughout the lifecycle of projects and improve outcomes to the organization and its customers

Job Responsibilities

Big data design and analysis data modeling development deployment and CICD operations of big data pipelines

Collaborate with a team of data engineers data scientists and business subject matter experts to process data and prepare data sources

Mentor other data engineers to develop a world class data engineering team

Ingest process and model data from heterogeneous data sources to support data science projects

Basic Qualifications

Bachelors degree or higher in Computer Science or equivalent degree and 3 to 10 years related working experience

In depth experience with a big data cloud platform preferably Azure

Strong grasp of programming languages such as Python PySpark or equivalent and willingness to learn new ones

Experience writing database heavy services or APIs

Experience building and optimizing data pipelines architectures and data sets

Working knowledge of queueing stream processing and highly scalable data stores

Experience working with and supporting cross functional teams

Strong understanding of structuring code for testability

Preferred Qualifications

Professional experience implementing and maintaining MLOps pipelines in MLflow or AzureML

Professional experience implementing data ingestion pipelines using Data Factory

Professional experience with Databricks and coding with notebooks

Professional experience processing and manipulating data using SQL and Python

Professional experience with user training customer support and coordination with cross functional teams

Skills from this JD

Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
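One way to picture the merge is a join on skill name across the three API outputs. The sketch below is illustrative only: the record layout and the field names (`skill`, `priority`, `match`, `canonical_id`, `resolution`) are hypothetical, with values taken from the Azure row in this report.

```python
def merge_skill_rows(api1, api2, api3):
    """Combine per-skill records from the three APIs, keyed by skill name."""
    rows = {}
    for source, records in (("api1", api1), ("api2", api2), ("api3", api3)):
        for record in records:
            row = rows.setdefault(record["skill"], {"skill": record["skill"]})
            # Keep each API's fields under its own key so provenance stays visible.
            row[source] = {k: v for k, v in record.items() if k != "skill"}
    return list(rows.values())

# Hypothetical records shaped after the Azure row.
api1 = [{"skill": "Azure", "priority": "Primary"}]
api2 = [{"skill": "Azure", "match": "library", "canonical_id": 164}]
api3 = [{"skill": "Azure", "resolution": "in_db"}]

rows = merge_skill_rows(api1, api2, api3)
print(rows[0])
```

Keeping the three sources in separate sub-dictionaries makes it easy to see which stage contributed a given field when a row looks inconsistent.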

Azure Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)

Canonical: Azure id=164 · azure

Aliases — catalog

Compute right-sizing (CANONICAL) primary

Context tags (catalog)

CPU VM sizing autoscaling capacity planning cloud cost optimization instance sizing load testing memory performance profiling reserved instances resource utilization rightsizing spot instances utilization workload analysis

Stored enrichment (catalog DB)

Category: Methodology
Sub-category: Capacity Planning Methodology
Confidence: 0.78
Version strategy: NOT_APPLICABLE

Maturity reasoning: Common cloud/capacity-planning practice; widely referenced in AWS/Azure/GCP cost-optimization docs and frequently appears in FinOps and SRE job descriptions focused on reducing overprovisioning.

Skill profile (library / DB)

Skill nature: PLATFORM
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 13
Sub-category id: 161
Extractable: True
Also category: False

Dimensions (API 2 worklist)

Cloud Platform Operations Catalog dimension db id 26

Library dimension (catalog)

Roles linked in library: DevOps Engineer
Cloud Security Platforms Catalog dimension db id 332

Library dimension (catalog)

Roles linked in library: Cybersecurity Engineer

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Cloud Platform Operations cloud-platform-operations	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Cloud Security Platforms cloud-security-platforms	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Python Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)

Canonical: Python id=393 · python

Aliases — catalog

Cobalt Strike (CANONICAL) primary

Context tags (catalog)

Malleable C2 beacon credential dumping kerberos lateral movement payload phishing post-exploitation privilege escalation psexec red team sleep mask smb stager team server

Stored enrichment (catalog DB)

Category: Tool
Sub-category: Adversary Simulation Tool
Vendor: Fortra
License: proprietary
Year introduced: 2012
Confidence: 0.98
Version strategy: NOT_APPLICABLE

Maturity reasoning: Appears in a limited set of red-team/pentest JDs and security vendor training, but far below mainstream devops tools; market signal is specialized adversary-simulation usage rather than broad hiring demand.

Skill profile (library / DB)

Skill nature: LANGUAGE
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 5
Sub-category id: 54
Extractable: True
Also category: False

Dimensions (API 2 worklist)

Analytical Programming Languages Catalog dimension db id 82

Library dimension (catalog)

Roles linked in library: Data Analyst, Data Scientist
Automation Scripting and CLI Catalog dimension db id 48

Library dimension (catalog)

Roles linked in library: Azure Cloud Engineer, Cloud Engineer
Automation and Scripting for Operations Catalog dimension db id 361

Library dimension (catalog)

Roles linked in library: Virtualization Engineer
Network Automation and Scripting Catalog dimension db id 285

Library dimension (catalog)

Roles linked in library: Network Engineer
Programming Languages for AI Workflows Catalog dimension db id 261

Library dimension (catalog)

Roles linked in library: AI Engineer
Programming Languages for Backend Systems Catalog dimension db id 140

Library dimension (catalog)

Roles linked in library: Backend Engineer
Programming Languages for Data Work Catalog dimension db id 67

Library dimension (catalog)

Roles linked in library: Data Engineer
Programming Languages for ML Systems Catalog dimension db id 113

Library dimension (catalog)

Roles linked in library: Machine Learning Engineer
Programming Languages for Security Work Catalog dimension db id 328

Library dimension (catalog)

Roles linked in library: Cybersecurity Engineer
Programming Languages for Test Automation Catalog dimension db id 193

Library dimension (catalog)

Roles linked in library: Automation Tester
Security Automation and Scripting Catalog dimension db id 258

Library dimension (catalog)

Roles linked in library: Cybersecurity Engineer

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Analytical Programming Languages analytical-programming-languages	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Automation Scripting and CLI automation-scripting-and-cli	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Automation and Scripting for Operations automation-and-scripting-for-operations	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Network Automation and Scripting network-automation-and-scripting	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Programming Languages for AI Workflows programming-languages-for-ai-workflows	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Programming Languages for Backend Systems programming-languages-for-backend-systems	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Programming Languages for Data Work programming-languages-for-data-work	✓	✓	Existing dimension (library) · Role↔dimension saved
Programming Languages for ML Systems programming-languages-for-ml-systems	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Programming Languages for Security Work programming-languages-for-security-work	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Programming Languages for Test Automation programming-languages-for-test-automation	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Security Automation and Scripting security-automation-and-scripting	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
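The outcomes in the table above follow a simple pattern: the skill↔dimension link is saved for every dimension, while the role↔dimension link is saved only when the chosen role (Data Engineer) is among the roles linked to that dimension in the library. The sketch below encodes that observed pattern for a subset of the Python dimensions; it is inferred from this run's output, and the actual API 3 logic may differ.

```python
CHOSEN_ROLE = "Data Engineer"

# Dimension -> roles linked in the library (subset of the Python worklist above).
linked_roles = {
    "Analytical Programming Languages": ["Data Analyst", "Data Scientist"],
    "Programming Languages for Data Work": ["Data Engineer"],
    "Programming Languages for ML Systems": ["Machine Learning Engineer"],
}

def role_dim_outcome(dimension: str, chosen_role: str = CHOSEN_ROLE) -> str:
    """Return the role-link outcome string for one dimension."""
    if chosen_role in linked_roles.get(dimension, []):
        return "Role↔dimension saved"
    return "Role↔dimension skipped (dimension not under chosen role)"

for name in linked_roles:
    print(f"{name}: {role_dim_outcome(name)}")
```

The same pattern holds for the other rows in this report where the role link is saved: Relational Data Modeling (SQL) and Cloud Data Platform Services (Azure Data Factory) both list Data Engineer among their linked roles.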

PySpark Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.93

PySpark appears in many data engineering and analytics job descriptions, especially for Spark-based ETL and ML pipelines; it remains a standard skill alongside Databricks and AWS EMR.

Vendor & license

Apache Software Foundation · apache_2 · since 2010 (0.98)

Context keywords

Spark SQL DataFrame RDD Spark Streaming Structured Streaming Delta Lake Hive Parquet YARN Databricks EMR AWS Glue ETL partitioning broadcast join

Ambiguity low

PySpark is a specific Python API for Apache Spark and is usually named distinctly in JDs. It is unlikely to be reasonably confused with another catalog skill in typical job descriptions.

Versioning

Not versioned

Type assignment

Library · data_processing_library · confidence 0.93

PySpark is best classified as a Library because it is a Python package imported and used from application code, rather than a hosted environment or a framework you build inside.

Derived legacy fields

Category: Library
Sub-category: data_processing_library
Skill nature: LIBRARY
Volatility: STABLE
Typical lifespan: EVERGREEN
Version strategy: NOT_APPLICABLE

Dimensions (API 2 worklist)

Analytical Programming and Notebook Languages Proposed / LLM

Proposed / LLM dimension (no DB id yet)
Version Control Systems Catalog dimension db id 365

Library dimension (catalog)

Locked dimensions (v3 placement)

Analytical Programming and Notebook Languages
Pipeline tentative id

Languages and notebook/script-based coding used to clean, transform, analyze, and prototype data workflows and models. Includes Python, pandas, SQL, PySpark, notebook scripting, dataframe manipulation, exploratory analysis, ETL/data transformation logic, and other reproducible analytical code.
Distributed Data Processing
Pipeline tentative id

Covers writing and optimizing distributed batch or streaming data transformations on large datasets. PySpark belongs here because it is a Spark-based API used to express parallel data processing jobs at scale.

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Analytical Programming and Notebook Languages d_merge_01	✓	—	New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
Version Control Systems d_init_01	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

SQL Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)

Canonical: SQL id=2601 · sql

Aliases — from this run (catalog unavailable)

SQL (CANONICAL)

Skill profile (library / DB)

Skill nature: LANGUAGE
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 5
Sub-category id: 55
Extractable: True
Also category: False

Dimensions (API 2 worklist)

Relational Data Modeling Catalog dimension db id 71

Library dimension (catalog)

Roles linked in library: Backend Engineer, Data Engineer
Version Control Systems Catalog dimension db id 365

Library dimension (catalog)

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Relational Data Modeling relational-data-modeling	✓	✓	Existing dimension (library) · Role↔dimension saved
Version Control Systems d_init_01	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

MLflow Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)

Canonical: MLflow id=2640 · mlflow

Aliases — catalog

effects (CANONICAL) primary

Context tags (catalog)

asynchronous effects context API data flow event handling functional programming immutable state middleware observable pure functions react reactive programming redux side effects state management state transitions

Stored enrichment (catalog DB)

Category: Concept
Sub-category: State Side Effect Concept
Confidence: 0.74
Version strategy: NOT_APPLICABLE

Maturity reasoning: Effects are increasingly listed in modern frontend/state-management JDs and docs (e.g., React/Redux side-effect handling, RxJS, Effector), but there is no single universal standard or dominant hiring staple yet.

Skill profile (library / DB)

Skill nature: TOOL
Volatility: EMERGING
Typical lifespan: EVERGREEN
Category id: 11
Sub-category id: 2151
Extractable: True
Also category: False

Dimensions (API 2 worklist)

Model Serving Deployment and Runtime Packaging Catalog dimension db id 52

Library dimension (catalog)

Roles linked in library: MLOps Engineer, Machine Learning Engineer
Project Delivery and Coordination Catalog dimension db id 366

Library dimension (catalog)
Version Control Systems Catalog dimension db id 365

Library dimension (catalog)

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Model Serving Deployment and Runtime Packaging model-serving-deployment-and-runtime-packaging	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Project Delivery and Coordination d_init_02	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Version Control Systems d_init_01	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Azure Machine Learning Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)

Canonical: Azure Machine Learning id=385 · azure-machine-learning

Aliases — catalog

Forcepoint (CANONICAL) primary

Context tags (catalog)

CASB DLP URL filtering cloud access security broker content inspection data classification data loss prevention encryption endpoint protection incident response insider threat policy enforcement proxy secure web gateway web gateway

Stored enrichment (catalog DB)

Category: Platform
Sub-category: Data Security Platform
Vendor: Forcepoint
License: proprietary
Year introduced: 2016
Confidence: 0.95
Version strategy: NOT_APPLICABLE

Maturity reasoning: Forcepoint appears in some security/data-loss-prevention job postings, but JD volume is far below mainstream platforms like Microsoft Purview or Palo Alto; it’s a specialized enterprise tool rather than a broad hiring staple.

Skill profile (library / DB)

Skill nature: PLATFORM
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 13
Sub-category id: 326
Extractable: True
Also category: False

Dimensions (API 2 worklist)

Cloud ML Platform Operations Catalog dimension db id 65

Library dimension (catalog)

Roles linked in library: MLOps Engineer

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Cloud ML Platform Operations cloud-ml-platform-operations	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Azure Data Factory Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)

Canonical: Azure Data Factory id=467 · azure-data-factory

Aliases — catalog

Context tags (catalog)

App Store Connect Apple ID JWT OAuth 2.0 OpenID Connect Sign in with Google Swift authentication flow authorization code bundle ID client secret iOS macOS nonce redirect URI

Stored enrichment (catalog DB)

Category: Service
Sub-category: Identity Service
Vendor: Apple
License: proprietary
Year introduced: 2019
Confidence: 0.90
Version strategy: NOT_APPLICABLE

Maturity reasoning: Commonly listed in mobile/web auth JDs for iOS apps and Apple ecosystem integrations; Apple’s official docs and App Store requirements keep it a standard identity option rather than a niche add-on.

Skill profile (library / DB)

Skill nature: CLOUD_SERVICE
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 14
Sub-category id: 385
Extractable: True
Also category: False

Dimensions (API 2 worklist)

Cloud Data Platform Services Catalog dimension db id 81

Library dimension (catalog)

Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Cloud Data Platform Services cloud-data-platform-services	✓	✓	Existing dimension (library) · Role↔dimension saved

Databricks Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)

Canonical: Databricks id=386 · databricks

Aliases — catalog

data classification (CANONICAL) primary

Context tags (catalog)

DLP PHI PII access controls categorization classification policy compliance confidentiality data governance data integrity data labeling data lifecycle data lineage data loss prevention data privacy data quality data stewardship data taxonomy information classification labeling machine learning metadata public internal confidential records management retention schedule sensitivity labeling supervised learning taxonomy unsupervised learning

Stored enrichment (catalog DB)

Category: Methodology
Sub-category: Data Governance Methodology
Confidence: 0.88
Version strategy: NOT_APPLICABLE

Maturity reasoning: Common in security/compliance JDs and vendor docs (e.g., Microsoft Purview, AWS Macie) as a core data-governance control for labeling and handling sensitive data.

Skill profile (library / DB)

Skill nature: PLATFORM
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 13
Sub-category id: 323
Extractable: True
Also category: False

Dimensions (API 2 worklist)

Cloud ML Platform Operations Catalog dimension db id 65

Library dimension (catalog)

Roles linked in library: MLOps Engineer

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Cloud ML Platform Operations cloud-ml-platform-operations	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Notebooks Secondary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.93

Notebook environments (e.g., Jupyter) appear in many data science and ML job descriptions and are a standard workflow in major cloud vendors’ managed notebook offerings.

Vendor & license

Project Jupyter · bsd · since 2014 (0.93)

Context keywords

Jupyter JupyterLab IPython nbconvert nbformat kernel Markdown code cells data visualization pandas NumPy Matplotlib interactive analysis reproducible research collaboration

Ambiguity flagged

Could be confused with: jupyter_notebook, colab

“Notebooks” is a generic term and in JDs could mean Jupyter notebooks or Google Colab, both common catalog skills. The standalone name is too broad to be unambiguous.

Versioning

Not versioned

Type assignment

Tool · notebook_environment · confidence 0.90

Notebooks are software you operate to write and run analyses interactively, so by the Tool vs Framework rule they are best classified as a tool rather than a framework or platform.

Derived legacy fields

Category: Tool
Sub-category: notebook_environment
Skill nature: TOOL
Volatility: STABLE
Typical lifespan: EVERGREEN
Version strategy: NOT_APPLICABLE

Dimensions (API 2 worklist)

Analytical Programming and Notebook-Based Data Analysis Proposed / LLM

Proposed / LLM dimension (no DB id yet)

Locked dimensions (v3 placement)

Analytical Programming and Notebook-Based Data Analysis
Pipeline tentative id

Languages and notebook-friendly coding used to clean, transform, analyze, and prototype data and model workflows. This includes Python, R, SQL, and Scala used in notebooks or scripts for data wrangling, exploratory data analysis, statistical logic, feature engineering, and reproducible prototyping. It excludes production orchestration and scheduling, dashboard/report authoring, model deployment packaging, database administration, and UI development.

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Analytical Programming and Notebook-Based Data Analysis d_merge_01	✓	—	New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)

CI/CD Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)

Canonical: CI/CD id=2579 · ci-cd

Aliases — from this run (catalog unavailable)

CI/CD (CANONICAL)

Skill profile (library / DB)

Skill nature: METHODOLOGY
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 7
Sub-category id: 2102
Extractable: True
Also category: False

Dimensions (API 2 worklist)

Version Control Systems Catalog dimension db id 365

Library dimension (catalog)

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Version Control Systems d_init_01	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Big Data Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.92

Common in data/platform job descriptions across industries; JD volume remains high for Hadoop/Spark/streaming stacks, and cloud vendors market managed big-data services as standard offerings.

Vendor & license

(0.99)

Context keywords

Hadoop Spark Hive Kafka HDFS MapReduce ETL data lake data warehouse NoSQL Parquet Airflow Flink Scala YARN

Ambiguity low

“Big Data” is a well-established domain term with a specific meaning in JDs. It is unlikely to be reasonably confused with another catalog skill in typical extraction contexts.

Versioning

Not versioned

Type assignment

Domain · data_intensive_computing · confidence 0.93

Big Data is a vertical/problem-space body of knowledge rather than a tool, framework, or architecture, so it fits the Domain rule.

Derived legacy fields

Category: Domain
Sub-category: data_intensive_computing
Skill nature: CONCEPT
Volatility: STABLE
Typical lifespan: EVERGREEN
Version strategy: NOT_APPLICABLE

Dimensions (API 2 worklist)

Version Control Systems Catalog dimension db id 365

Library dimension (catalog)
Messaging and Event Streaming Catalog dimension db id 146

Library dimension (catalog)

Roles linked in library: Backend Engineer
Messaging and Event Streaming Catalog dimension db id 146

Library dimension (catalog)

Roles linked in library: Backend Engineer

Locked dimensions (v3 placement)

Big Data Processing
Pipeline tentative id

Large-scale data processing systems and techniques for storing, transforming, and analyzing high-volume, high-velocity datasets. Big Data belongs here because the term usually refers to the distributed data engineering stack rather than a single tool.
Messaging and Event Streaming
Reuses catalog slug

Asynchronous data movement and event-driven pipelines used to feed large-scale analytics systems. Big Data often overlaps with this area when the skill is used in streaming ingestion or pipeline orchestration.
Messaging and Event Streaming
Reuses catalog slug

Asynchronous communication patterns and systems for decoupled service interaction and background processing. This is a coherent backend cluster because many server-side workflows depend on queues, topics, and event streams.

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Version Control Systems d_init_01	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Messaging and Event Streaming messaging-and-event-streaming	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Data Modeling Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.93

Data modeling appears in many data engineer, DBA, and analytics JDs, and is a standard prerequisite alongside SQL and database design rather than a niche specialty.

Vendor & license

(0.99)

Context keywords

ER diagrams normalization denormalization star schema snowflake schema fact table dimension table OLTP OLAP entity-relationship schema design data warehouse dimensional modeling primary key foreign key

Ambiguity low

“Data Modeling” is a standard, well-scoped concept in JDs and is unlikely to be confused with a different catalog skill in typical usage.

Versioning

Not versioned

Type assignment

Concept · data_modeling · confidence 0.96

Data Modeling is fundamentally a knowledge unit about how to structure and relate data, so by the Concept vs Methodology rule it is a Concept rather than a process or tool.

Derived legacy fields

Category: Concept
Sub-category: data_modeling
Skill nature: CONCEPT
Volatility: STABLE
Typical lifespan: EVERGREEN
Version strategy: NOT_APPLICABLE

Dimensions (API 2 worklist)

Version Control Systems Catalog dimension db id 365

Library dimension (catalog)

Locked dimensions (v3 placement)

Data Modeling
Pipeline tentative id

Designing the logical and physical structure of data so it is consistent, queryable, and fit for downstream analytics or operational use. This belongs here because the skill centers on defining entities, relationships, keys, and schemas rather than storage tuning or pipeline execution.

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Version Control Systems d_init_01	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Data Pipelines Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.93

Data pipelines are a common requirement in cloud/data engineering JDs, with frequent mentions alongside Airflow, Spark, and ETL/ELT stacks; broad hiring demand signals mainstream adoption.

Vendor & license

(0.99)

Context keywords

ETL ELT Apache Airflow Apache NiFi Kafka Spark dbt orchestration batch processing stream processing data ingestion data warehouse data lake schema evolution data quality

Ambiguity low

“Data Pipelines” is a fairly specific architecture term and is unlikely to be mistaken for a different catalog skill in a typical JD.

Versioning

Not versioned

Type assignment

Architecture · data_pipeline_architecture · confidence 0.90

By the Architecture vs Concept rule, data pipelines describe a system-shape pattern for moving and transforming data across stages rather than a single knowledge unit or tool.

Derived legacy fields

Category: Architecture
Sub-category: data_pipeline_architecture
Skill nature: PATTERN
Volatility: STABLE
Typical lifespan: EVERGREEN
Version strategy: NOT_APPLICABLE

Dimensions (API 2 worklist)

Inference Data Pipelines for Serving and Batch Scoring Proposed / LLM

Proposed / LLM dimension (no DB id yet)
Version Control Systems Catalog dimension db id 365

Library dimension (catalog)

Locked dimensions (v3 placement)

Inference Data Pipelines for Serving and Batch Scoring
Pipeline tentative id

Operational data movement that prepares and delivers timely, reliable data to production inference systems. Includes batch scoring inputs, feature refresh jobs, inference-time preprocessing, scheduled extracts, data validation for serving, and online/offline feature synchronization. Excludes training dataset curation, model training workflows, experimentation-focused feature engineering, model evaluation, and serving infrastructure/routing.
Data Pipeline Orchestration
Pipeline tentative id

Designing, scheduling, and coordinating end-to-end data movement and transformation jobs. This is the best fit when Data Pipelines refers to building reliable multi-step workflows across sources, transforms, and sinks.

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Inference Data Pipelines for Serving and Batch Scoring d_merge_01	✓	—	New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
Version Control Systems d_init_01	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Data Ingestion Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.86

Commonly appears in data/platform job descriptions and cloud vendor docs as a core pipeline capability; often paired with ETL/ELT, Kafka, and Airflow rather than treated as a niche specialty.

Vendor & license

(0.99)

Context keywords

ETL ELT batch processing streaming Kafka Apache NiFi Airflow CDC S3 schema validation data pipeline message queue Parquet JSON API ingestion

Ambiguity low

“Data Ingestion” is a standard, specific concept in data engineering and is unlikely to be mistaken for a different catalog skill in typical job descriptions.

Versioning

Not versioned

Type assignment

Concept · data_ingestion confidence 0.93

Data Ingestion is fundamentally a named knowledge unit about bringing data into systems, so it fits the Concept category rather than a tool, platform, or methodology.

Derived legacy fields

Category: Concept
Sub-category: data_ingestion
Skill nature: CONCEPT
Volatility: STABLE
Typical lifespan: EVERGREEN
Version strategy: NOT_APPLICABLE

Dimensions (API 2 worklist)

Asynchronous Messaging and Event Streaming Proposed / LLM

Proposed / LLM dimension (no DB id yet)
Version Control Systems Catalog dimension db id 365

Library dimension (catalog)

Locked dimensions (v3 placement)

Asynchronous Messaging and Event Streaming
Pipeline tentative id

Covers asynchronous communication and data movement through queues, topics, streams, event buses, and pub/sub systems for decoupled processing, background jobs, and event-driven integration. Includes continuous or event-driven data ingestion and change data capture pipelines, but excludes batch ETL orchestration, warehouse modeling, query optimization, model training data prep, and direct application API calls.
Batch Data Ingestion Pipelines
Pipeline tentative id

Covers scheduled or bulk loading of data from files, databases, and external systems into analytical or operational stores. Data Ingestion fits here when the emphasis is on landing, validating, and loading datasets rather than streaming transport.
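
The landing, validating, and loading steps named in the Batch Data Ingestion Pipelines description can be sketched roughly as below. This is only an illustration under assumptions: the `orders` table, the `id`/`amount` columns, the inline CSV, and the `load_batch` helper are all hypothetical and do not come from the pipeline or the JD. An in-memory SQLite database stands in for the target store.

```python
import csv
import io
import sqlite3
from typing import Tuple

# Hypothetical landed file content; the second data row is deliberately invalid.
RAW_CSV = "id,amount\n1,10.5\n2,not_a_number\n3,7\n"


def load_batch(raw_csv: str, conn: sqlite3.Connection) -> Tuple[int, int]:
    """Validate each landed row, load the valid ones, and count the rejects."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, amount REAL)"
    )
    valid, rejected = [], []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        try:
            # Validation step: both fields must parse to the expected types.
            valid.append((int(row["id"]), float(row["amount"])))
        except (KeyError, TypeError, ValueError):
            rejected.append(row)
    # Load step: insert all valid rows in a single transaction.
    with conn:
        conn.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)", valid)
    return len(valid), len(rejected)


conn = sqlite3.connect(":memory:")
loaded, rejected = load_batch(RAW_CSV, conn)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

Keeping rejected rows separate, rather than failing the whole batch, is one possible design choice; a stricter pipeline might abort the load instead.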

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Asynchronous Messaging and Event Streaming d_merge_01	✓	—	New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
Version Control Systems d_init_01	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Stream Processing Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.93

Common in JDs for Kafka, Flink, and Spark Streaming, and for cloud services such as Kinesis and Pub/Sub; broad market adoption for real-time event pipelines.

Vendor & license

(0.99)

Context keywords

Apache Kafka Apache Flink Apache Spark Streaming Apache Storm event-driven architecture pub/sub message broker consumer group windowing checkpointing exactly-once semantics backpressure event time watermarking CDC

Ambiguity low

The term is fairly specific in JDs and usually refers to event/data stream processing architecture, not a different catalog skill. It is unlikely to be confused with another skill name in typical job descriptions.

Versioning

Not versioned

Type assignment

Architecture · stream_processing_architecture confidence 0.90

Stream Processing is fundamentally a system-shape for handling continuous event flows, so by the Architecture vs Concept rule it fits Architecture rather than a tool or methodology.

Derived legacy fields

Category: Architecture
Sub-category: stream_processing_architecture
Skill nature: PATTERN
Volatility: STABLE
Typical lifespan: EVERGREEN
Version strategy: NOT_APPLICABLE

Dimensions (API 2 worklist)

Messaging and Event Streaming Catalog dimension db id 146

Library dimension (catalog)

Roles linked in library: Backend Engineer

Locked dimensions (v3 placement)

Stream Processing
Reuses catalog slug

Processing continuous event data as it arrives, using stream processors, windows, and stateful operators to transform and route records in near real time. This belongs here because stream processing is the core execution model for event-driven pipelines and low-latency data movement.
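
As a rough illustration of the "windows and stateful operators" mentioned in the description above, the sketch below counts events per key in fixed, non-overlapping (tumbling) event-time windows. The `(event_time, key)` record shape, the `tumbling_window_counts` name, and the sample events are assumptions made for this example. A real stream processor would also handle late data, watermarks, and checkpointing, none of which is modeled here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (event_time_in_seconds, key).
Event = Tuple[float, str]


def tumbling_window_counts(
    events: Iterable[Event], window_size: float
) -> Dict[Tuple[float, str], int]:
    """Count events per (window_start, key) using tumbling event-time windows."""
    counts: Dict[Tuple[float, str], int] = defaultdict(int)  # operator state
    for event_time, key in events:
        # Each event belongs to exactly one window, identified by its start time.
        window_start = (event_time // window_size) * window_size
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0.5, "a"), (1.2, "b"), (9.9, "a"), (10.0, "a"), (14.3, "b")]
result = tumbling_window_counts(events, window_size=10.0)
# Key "a" has two events in the window starting at 0.0 and one in the window
# starting at 10.0; key "b" has one event in each window.
```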

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Messaging and Event Streaming messaging-and-event-streaming	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Queueing Secondary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.86

Queueing theory is a standard CS/ops concept and appears in many systems, SRE, and performance-engineering job descriptions; it is not a sunset technology and remains a common interview topic.

Vendor & license

(0.99)

Context keywords

Little's Law M/M/1 M/M/c Poisson process service rate arrival rate waiting time throughput utilization backlog buffering congestion discrete-event simulation priority queue SLA

Ambiguity low

Queueing is a fairly specific operations-research concept; in typical JDs it is unlikely to be mistaken for a different catalog skill.

Versioning

Not versioned

Type assignment

Concept · queueing_theory confidence 0.93

Queueing is fundamentally a knowledge unit about how waiting lines and work distribution behave, so by the Concept vs Methodology rule it is a Concept rather than an Architecture or Tool.
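
Several of the context keywords listed for this skill (arrival rate, service rate, utilization, waiting time, Little's Law, M/M/1) are tied together by a few standard formulas. The sketch below applies the textbook M/M/1 results: utilization ρ = λ/μ, mean number in system L = ρ/(1 − ρ), and mean time in system W = 1/(μ − λ), with Little's Law stating L = λW. The function and class names and the example rates are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MM1Metrics:
    utilization: float          # rho = lambda / mu
    mean_in_system: float       # L = rho / (1 - rho)
    mean_time_in_system: float  # W = 1 / (mu - lambda)


def mm1_metrics(arrival_rate: float, service_rate: float) -> MM1Metrics:
    """Steady-state metrics for an M/M/1 queue (single server, Poisson arrivals)."""
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable unless arrival_rate < service_rate")
    rho = arrival_rate / service_rate
    return MM1Metrics(
        utilization=rho,
        mean_in_system=rho / (1.0 - rho),
        mean_time_in_system=1.0 / (service_rate - arrival_rate),
    )


# Hypothetical example: 8 jobs/s arriving at a server that completes 10 jobs/s.
metrics = mm1_metrics(arrival_rate=8.0, service_rate=10.0)
# Little's Law check: L should equal lambda * W (up to floating-point error).
littles_law_gap = abs(metrics.mean_in_system - 8.0 * metrics.mean_time_in_system)
```

At 80% utilization this gives roughly four jobs in the system and half a second per job, which shows how quickly backlog grows as utilization approaches 1.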

Derived legacy fields

Category: Concept
Sub-category: queueing_theory
Skill nature: CONCEPT
Volatility: STABLE
Typical lifespan: EVERGREEN
Version strategy: NOT_APPLICABLE

Dimensions (API 2 worklist)

Messaging, Queueing, and Event Streaming Proposed / LLM

Proposed / LLM dimension (no DB id yet)

Locked dimensions (v3 placement)

Messaging, Queueing, and Event Streaming
Pipeline tentative id

Asynchronous communication patterns and systems that decouple producers and consumers, buffer and route work items, and support background processing and service-to-service integration. Includes queueing, message queues, pub/sub, brokers, topics, consumer groups, producers/consumers, dead-letter queues, retry handling, backpressure, and event streaming platforms such as Kafka, RabbitMQ, SQS, and Azure Service Bus.
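
The retry handling and dead-letter queue behavior named in this dimension can be sketched with an in-memory queue, as below. The `consume_with_retries` function, the `max_attempts` parameter, and the sample handler are hypothetical. A real broker such as those listed above would supply its own delivery counts, visibility timeouts, and dead-letter configuration, which this sketch does not attempt to reproduce.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Set, Tuple


def consume_with_retries(
    messages: Iterable[str],
    handler: Callable[[str], None],
    max_attempts: int = 3,
) -> Tuple[List[str], List[str]]:
    """Process messages; requeue failures and dead-letter them after max_attempts."""
    queue: Deque[Tuple[str, int]] = deque((m, 1) for m in messages)
    processed: List[str] = []
    dead_letters: List[str] = []
    while queue:
        message, attempt = queue.popleft()
        try:
            handler(message)
            processed.append(message)
        except Exception:
            if attempt >= max_attempts:
                dead_letters.append(message)  # give up: route to the dead-letter list
            else:
                queue.append((message, attempt + 1))  # retry later, at the back
    return processed, dead_letters


_seen: Set[str] = set()


def sample_handler(message: str) -> None:
    """Hypothetical handler: 'bad' always fails, 'flaky' fails only once."""
    if message == "bad":
        raise ValueError("permanent failure")
    if message == "flaky" and message not in _seen:
        _seen.add(message)
        raise RuntimeError("transient failure")


processed, dead_letters = consume_with_retries(["ok", "flaky", "bad"], sample_handler)
```

Requeueing at the back keeps one failing message from blocking the rest, at the cost of ordering; whether that trade-off is acceptable depends on the workload.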

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Messaging, Queueing, and Event Streaming d_merge_01	✓	—	New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)

APIs Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.98

APIs are a hiring-pipeline staple across backend, mobile, and platform JDs; REST/GraphQL/API design appears in large-volume job postings and cloud vendor docs.

Vendor & license

(0.99)

Context keywords

REST GraphQL OpenAPI Swagger JSON XML OAuth 2.0 API gateway endpoint webhook rate limiting pagination versioning SDK microservices

Ambiguity low

“APIs” is a standard, widely used term in JDs and usually refers unambiguously to application programming interfaces; it is not typically confused with a distinct catalog skill.

Versioning

Not versioned

Type assignment

Protocol · application_programming_interfaces confidence 0.91

APIs are a communication interface standard between systems, so by the Protocol vs Standard rule they fit best as a Protocol rather than a tool or platform.

Derived legacy fields

Category: Protocol
Sub-category: application_programming_interfaces
Skill nature: PROTOCOL
Volatility: STABLE
Typical lifespan: EVERGREEN
Version strategy: NOT_APPLICABLE

Dimensions (API 2 worklist)

API Integration, Request Orchestration, and Data Fetching Proposed / LLM

Proposed / LLM dimension (no DB id yet)
Cloud Service Integration Patterns Catalog dimension db id 188

Library dimension (catalog)

Roles linked in library: Cloud Architect
Version Control Systems Catalog dimension db id 365

Library dimension (catalog)

Locked dimensions (v3 placement)

API Integration, Request Orchestration, and Data Fetching
Pipeline tentative id

Connecting applications to internal or external services through request/response APIs. This includes consuming REST and GraphQL endpoints, orchestrating requests, handling payloads and response parsing, pagination, retries, error handling, and shaping remote data for downstream or UI consumption.
Cloud Service Integration Patterns
Reuses catalog slug

How services connect across boundaries using APIs, events, and shared interfaces. The target skill belongs here when APIs are treated as an integration mechanism between cloud services, pipelines, or platforms.
API Design and Specification
Pipeline tentative id

Defining API contracts, resource models, and request/response semantics for services. This dimension fits the target skill when APIs refers to designing or documenting interfaces rather than merely consuming them.
Cloud Service Integration Patterns
Reuses catalog slug

Covers how cloud services and workloads connect through APIs, events, shared services, and integration boundaries. This cluster is coherent because architects must define interaction patterns that preserve decoupling, security, and operability.

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
API Integration, Request Orchestration, and Data Fetching d_merge_01	✓	—	New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
Cloud Service Integration Patterns cloud-service-integration-patterns	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Version Control Systems d_init_01	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Testability Secondary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.93

Testability is a common requirement in software engineering JDs and interview rubrics, often paired with unit/integration testing, CI, and TDD; it’s a standard quality attribute rather than a niche tool.

Vendor & license

(0.99)

Context keywords

unit tests integration tests test coverage mocking dependency injection assertions test harness automated testing regression testing test doubles stubs fixtures TDD CI/CD code coverage

Ambiguity low

“Testability” is a specific software engineering concept and is unlikely to be mistaken for a different catalog skill in typical job descriptions.

Versioning

Not versioned

Type assignment

Concept · software_testability_concept confidence 0.97

By the Concept vs Methodology rule, testability is a named knowledge unit about how easily software can be tested, not a process or tool.

Derived legacy fields

Category: Concept
Sub-category: software_testability_concept
Skill nature: CONCEPT
Volatility: STABLE
Typical lifespan: EVERGREEN
Version strategy: NOT_APPLICABLE

Dimensions (API 2 worklist)

Testing and Validation Practices Catalog dimension db id 221

Library dimension (catalog)

Roles linked in library: ServiceNOW Developer

Locked dimensions (v3 placement)

Testing and Validation Practices
Reuses catalog slug

Practices for verifying that software changes behave correctly before release, including test design, regression checks, and validation workflows. Testability belongs here because it describes how easily a system can be exercised and verified by tests.
Testing and Validation Practices
Reuses catalog slug

Validating platform changes before release, including functional checks and regression verification. This cluster is coherent because ServiceNow developers must confirm workflows, scripts, and integrations behave as intended.
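
Dependency injection and test doubles, both listed among the context keywords for Testability, are common ways of "structuring code for testability" as the JD puts it. The sketch below injects the clock into a function that builds a date-partitioned path, so a test can supply a fixed time. The `make_partition_path` function, its path layout, and the `landing/orders` base are hypothetical and not taken from the JD or the pipeline.

```python
from datetime import datetime, timezone
from typing import Callable


def _utc_now() -> datetime:
    """Default clock used in production code paths."""
    return datetime.now(timezone.utc)


def make_partition_path(base: str, now: Callable[[], datetime] = _utc_now) -> str:
    """Build a date-partitioned path; the clock is injected so tests can fix it."""
    ts = now()
    return f"{base}/year={ts.year}/month={ts.month:02d}/day={ts.day:02d}"


def fixed_clock() -> datetime:
    """Test double: always returns the same instant."""
    return datetime(2024, 1, 5, tzinfo=timezone.utc)


path = make_partition_path("landing/orders", now=fixed_clock)
```

Without the injected clock, the same function would return a different path every day, and a test could only check its shape rather than its exact value.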

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Testing and Validation Practices testing-and-validation-practices	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

MLOps Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)

Canonical: MLOps id=2643 · mlops

Aliases — catalog

FormBuilder (CANONICAL) primary

Context tags (catalog)

Angular React UI libraries Vue component lifecycle custom components data binding dynamic forms error handling form design form serialization form submission state management user input validation

Stored enrichment (catalog DB)

Category: Library
Sub-category: Forms Helper Library
Vendor: null
License: unknown
Confidence: 0.88
Version strategy: NOT_APPLICABLE

Maturity reasoning: FormBuilder appears in relatively low JD volume compared with mainstream form stacks; market usage is mostly in legacy/admin app codebases rather than broad hiring pipelines.

Skill profile (library / DB)

Skill nature: METHODOLOGY
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 7
Sub-category id: 2156
Extractable: True
Also category: False

Dimensions (API 2 worklist)

Inference Data Pipelines Catalog dimension db id 59

Library dimension (catalog)

Roles linked in library: MLOps Engineer
Model Serving Deployment and Runtime Packaging Catalog dimension db id 52

Library dimension (catalog)

Roles linked in library: MLOps Engineer, Machine Learning Engineer

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Inference Data Pipelines inference-data-pipelines	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Model Serving Deployment and Runtime Packaging model-serving-deployment-and-runtime-packaging	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

All API 3 persistence rows

Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.

Skill	Tag	Dimension	Skill↔dim	Role↔dim	Outcome
Azure	in_db	Cloud Platform Operations cloud-platform-operations	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Azure	in_db	Cloud Security Platforms cloud-security-platforms	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python	in_db	Analytical Programming Languages analytical-programming-languages	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python	in_db	Automation Scripting and CLI automation-scripting-and-cli	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python	in_db	Automation and Scripting for Operations automation-and-scripting-for-operations	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python	in_db	Network Automation and Scripting network-automation-and-scripting	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python	in_db	Programming Languages for AI Workflows programming-languages-for-ai-workflows	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python	in_db	Programming Languages for Backend Systems programming-languages-for-backend-systems	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python	in_db	Programming Languages for Data Work programming-languages-for-data-work	✓	✓	Existing dimension (library) · Role↔dimension saved
Python	in_db	Programming Languages for ML Systems programming-languages-for-ml-systems	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python	in_db	Programming Languages for Security Work programming-languages-for-security-work	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python	in_db	Programming Languages for Test Automation programming-languages-for-test-automation	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python	in_db	Security Automation and Scripting security-automation-and-scripting	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
SQL	in_db	Relational Data Modeling relational-data-modeling	✓	✓	Existing dimension (library) · Role↔dimension saved
SQL	in_db	Version Control Systems d_init_01	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
MLflow	in_db	Model Serving Deployment and Runtime Packaging model-serving-deployment-and-runtime-packaging	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
MLflow	in_db	Project Delivery and Coordination d_init_02	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
MLflow	in_db	Version Control Systems d_init_01	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Azure Machine Learning	in_db	Cloud ML Platform Operations cloud-ml-platform-operations	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Azure Data Factory	in_db	Cloud Data Platform Services cloud-data-platform-services	✓	✓	Existing dimension (library) · Role↔dimension saved
Databricks	in_db	Cloud ML Platform Operations cloud-ml-platform-operations	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
CI/CD	in_db	Version Control Systems d_init_01	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
MLOps	in_db	Inference Data Pipelines inference-data-pipelines	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
MLOps	in_db	Model Serving Deployment and Runtime Packaging model-serving-deployment-and-runtime-packaging	✓	—	Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
PySpark	in_db	Analytical Programming and Notebook Languages d_merge_01	✓	—	New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
PySpark	in_db	Version Control Systems d_init_01	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Notebooks	in_db	Analytical Programming and Notebook-Based Data Analysis d_merge_01	✓	—	New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
Big Data	in_db	Version Control Systems d_init_01	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Big Data	in_db	Messaging and Event Streaming messaging-and-event-streaming	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Data Modeling	in_db	Version Control Systems d_init_01	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Data Pipelines	in_db	Inference Data Pipelines for Serving and Batch Scoring d_merge_01	✓	—	New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
Data Pipelines	in_db	Version Control Systems d_init_01	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Data Ingestion	in_db	Asynchronous Messaging and Event Streaming d_merge_01	✓	—	New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
Data Ingestion	in_db	Version Control Systems d_init_01	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Stream Processing	in_db	Messaging and Event Streaming messaging-and-event-streaming	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Queueing	in_db	Messaging, Queueing, and Event Streaming d_merge_01	✓	—	New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
APIs	in_db	API Integration, Request Orchestration, and Data Fetching d_merge_01	✓	—	New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
APIs	in_db	Cloud Service Integration Patterns cloud-service-integration-patterns	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
APIs	in_db	Version Control Systems d_init_01	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Testability	in_db	Testing and Validation Practices testing-and-validation-practices	✓	—	New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Library artifacts (this run)

Kind	Detail	DB id
canonical_skill_added	PySpark	2684
canonical_skill_added	Notebooks	2685
canonical_skill_added	Big Data	2686
canonical_skill_added	Data Modeling	2687
canonical_skill_added	Data Pipelines	2688
canonical_skill_added	Data Ingestion	2689
canonical_skill_added	Stream Processing	2690
canonical_skill_added	Queueing	2691
canonical_skill_added	APIs	2692
canonical_skill_added	Testability	2693
dimension_skill_link	PySpark ↔ Analytical Programming and Notebook Languages	82
dimension_skill_link	PySpark ↔ Version Control Systems	365
dimension_skill_link	Notebooks ↔ Analytical Programming and Notebook-Based Data Analysis	82
dimension_skill_link	Big Data ↔ Version Control Systems	365
dimension_skill_link	Big Data ↔ Messaging and Event Streaming	146
dimension_skill_link	Data Modeling ↔ Version Control Systems	365
dimension_skill_link	Data Pipelines ↔ Inference Data Pipelines for Serving and Batch Scoring	59
dimension_skill_link	Data Pipelines ↔ Version Control Systems	365
dimension_skill_link	Data Ingestion ↔ Asynchronous Messaging and Event Streaming	146
dimension_skill_link	Data Ingestion ↔ Version Control Systems	365
dimension_skill_link	Stream Processing ↔ Messaging and Event Streaming	146
dimension_skill_link	Queueing ↔ Messaging, Queueing, and Event Streaming	146
dimension_skill_link	APIs ↔ API Integration, Request Orchestration, and Data Fetching	9
dimension_skill_link	APIs ↔ Cloud Service Integration Patterns	188
dimension_skill_link	APIs ↔ Version Control Systems	365
dimension_skill_link	Testability ↔ Testing and Validation Practices	221

nano JD Parser — gpt-4.1-nano

Role: big data engineers

Company: The corporation

Experience: 3 to 10 years related working experience

Domain: Other

JD type: pass

{
  "JD_type": "pass",
  "about_company": null,
  "certifications": [],
  "company_name": "The corporation",
  "ctc": null,
  "domain": {
    "primary": {
      "aliases": [],
      "domain": "Other"
    },
    "secondary": null
  },
  "education": [
    {
      "level": "Bachelor\u0027s",
      "qualification": "BTECH/BE/BSC - Computer Science (or equivalent)",
      "raw": "Bachelors degree or higher in Computer Science or equivalent degree",
      "requirement": "required"
    }
  ],
  "experience": {
    "max": 10,
    "min": 3,
    "raw": "3 to 10 years related working experience"
  },
  "job_locations": [],
  "role": "big data engineers",
  "role_archetype": "Data",
  "roles_and_responsibilities": [
    {
      "bullet_count": 0,
      "heading": "Job Responsibilities",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "Big data design and analysis",
        "last_5_words": "data science projects"
      },
      "text": "Big data design and analysis data modeling development deployment and CICD operations of big data pipelines\n\nCollaborate with a team of data engineers data scientists and business subject matter experts to process data and prepare data sources\n\nMentor other data engineers to develop a world class data engineering team\n\nIngest process and model data from heterogeneous data sources to support data science projects",
      "word_count": 66
    },
    {
      "bullet_count": 0,
      "heading": "Basic Qualifications",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "Bachelors degree or higher in",
        "last_5_words": "structuring code for testability"
      },
      "text": "Bachelors degree or higher in Computer Science or equivalent degree and 3 to 10 years related working experience\n\nIn depth experience with a big data cloud platform preferably Azure\n\nStrong grasp of programming languages such as Python PySpark or equivalent and willingness to learn new ones\n\nExperience writing database heavy services or APIs\n\nExperience building and optimizing data pipelines architectures and data sets\n\nWorking knowledge of queueing stream processing and highly scalable data stores\n\nExperience working with and supporting cross functional teams\n\nStrong understanding of structuring code for testability",
      "word_count": 104
    },
    {
      "bullet_count": 0,
      "heading": "Preferred Qualifications",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "Professional experience implementing and",
        "last_5_words": "with cross functional teams"
      },
      "text": "Professional experience implementing and maintaining MLOps pipelines in MLflow or AzureML\n\nProfessional experience implementing data ingestion pipelines using Data Factory\n\nProfessional experience with Databricks and coding with notebooks\n\nProfessional experience processing and manipulating data using SQL and Python\n\nProfessional experience with user training customer support and coordination with cross functional teams",
      "word_count": 66
    }
  ],
  "urls": []
}

API 1 — extract-from-jd

{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "Azure"
    },
    {
      "is_primary": true,
      "skill_name": "Python"
    },
    {
      "is_primary": true,
      "skill_name": "PySpark"
    },
    {
      "is_primary": true,
      "skill_name": "SQL"
    },
    {
      "is_primary": true,
      "skill_name": "MLflow"
    },
    {
      "is_primary": true,
      "skill_name": "Azure Machine Learning"
    },
    {
      "is_primary": true,
      "skill_name": "Azure Data Factory"
    },
    {
      "is_primary": true,
      "skill_name": "Databricks"
    },
    {
      "is_primary": false,
      "skill_name": "Notebooks"
    },
    {
      "is_primary": true,
      "skill_name": "CI/CD"
    },
    {
      "is_primary": true,
      "skill_name": "Big Data"
    },
    {
      "is_primary": true,
      "skill_name": "Data Modeling"
    },
    {
      "is_primary": true,
      "skill_name": "Data Pipelines"
    },
    {
      "is_primary": true,
      "skill_name": "Data Ingestion"
    },
    {
      "is_primary": true,
      "skill_name": "Stream Processing"
    },
    {
      "is_primary": false,
      "skill_name": "Queueing"
    },
    {
      "is_primary": true,
      "skill_name": "APIs"
    },
    {
      "is_primary": false,
      "skill_name": "Testability"
    },
    {
      "is_primary": true,
      "skill_name": "MLOps"
    }
  ],
  "jd_role": {
    "display_name": "big data engineers",
    "rationale": null,
    "role_archetype": "Data",
    "slug": ""
  },
  "nano_parsed": {
    "JD_type": "pass",
    "about_company": null,
    "certifications": [],
    "company_name": "The corporation",
    "ctc": null,
    "domain": {
      "primary": {
        "aliases": [],
        "domain": "Other"
      },
      "secondary": null
    },
    "education": [
      {
        "level": "Bachelor\u0027s",
        "qualification": "BTECH/BE/BSC - Computer Science (or equivalent)",
        "raw": "Bachelors degree or higher in Computer Science or equivalent degree",
        "requirement": "required"
      }
    ],
    "experience": {
      "max": 10,
      "min": 3,
      "raw": "3 to 10 years related working experience"
    },
    "job_locations": [],
    "role": "big data engineers",
    "role_archetype": "Data",
    "roles_and_responsibilities": [
      {
        "bullet_count": 0,
        "heading": "Job Responsibilities",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "Big data design and analysis",
          "last_5_words": "data science projects"
        },
        "text": "Big data design and analysis data modeling development deployment and CICD operations of big data pipelines\n\nCollaborate with a team of data engineers data scientists and business subject matter experts to process data and prepare data sources\n\nMentor other data engineers to develop a world class data engineering team\n\nIngest process and model data from heterogeneous data sources to support data science projects",
        "word_count": 66
      },
      {
        "bullet_count": 0,
        "heading": "Basic Qualifications",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "Bachelors degree or higher in",
          "last_5_words": "structuring code for testability"
        },
        "text": "Bachelors degree or higher in Computer Science or equivalent degree and 3 to 10 years related working experience\n\nIn depth experience with a big data cloud platform preferably Azure\n\nStrong grasp of programming languages such as Python PySpark or equivalent and willingness to learn new ones\n\nExperience writing database heavy services or APIs\n\nExperience building and optimizing data pipelines architectures and data sets\n\nWorking knowledge of queueing stream processing and highly scalable data stores\n\nExperience working with and supporting cross functional teams\n\nStrong understanding of structuring code for testability",
        "word_count": 104
      },
      {
        "bullet_count": 0,
        "heading": "Preferred Qualifications",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "Professional experience implementing and",
          "last_5_words": "with cross functional teams"
        },
        "text": "Professional experience implementing and maintaining MLOps pipelines in MLflow or AzureML\n\nProfessional experience implementing data ingestion pipelines using Data Factory\n\nProfessional experience with Databricks and coding with notebooks\n\nProfessional experience processing and manipulating data using SQL and Python\n\nProfessional experience with user training customer support and coordination with cross functional teams",
        "word_count": 66
      }
    ],
    "urls": []
  },
  "run_id": null
}

API 2 — extract-details

{
  "alias_matches": [
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 349,
      "existing_alias_text": "Azure",
      "input_term": "Azure",
      "matched_canonical": {
        "category_id": 13,
        "display_name": "Azure",
        "id": 164,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "azure",
        "sub_category_id": 161,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 608,
      "existing_alias_text": "Python",
      "input_term": "Python",
      "matched_canonical": {
        "category_id": 5,
        "display_name": "Python",
        "id": 393,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LANGUAGE",
        "slug": "python",
        "sub_category_id": 54,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 3398,
      "existing_alias_text": "SQL",
      "input_term": "SQL",
      "matched_canonical": {
        "category_id": 5,
        "display_name": "SQL",
        "id": 2601,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LANGUAGE",
        "slug": "sql",
        "sub_category_id": 55,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 3593,
      "existing_alias_text": "MLflow",
      "input_term": "MLflow",
      "matched_canonical": {
        "category_id": 11,
        "display_name": "MLflow",
        "id": 2640,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "mlflow",
        "sub_category_id": 2151,
        "typical_lifespan": "EVERGREEN",
        "volatility": "EMERGING"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 600,
      "existing_alias_text": "Azure Machine Learning",
      "input_term": "Azure Machine Learning",
      "matched_canonical": {
        "category_id": 13,
        "display_name": "Azure Machine Learning",
        "id": 385,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "azure-machine-learning",
        "sub_category_id": 326,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 731,
      "existing_alias_text": "Azure Data Factory",
      "input_term": "Azure Data Factory",
      "matched_canonical": {
        "category_id": 14,
        "display_name": "Azure Data Factory",
        "id": 467,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CLOUD_SERVICE",
        "slug": "azure-data-factory",
        "sub_category_id": 385,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 601,
      "existing_alias_text": "Databricks",
      "input_term": "Databricks",
      "matched_canonical": {
        "category_id": 13,
        "display_name": "Databricks",
        "id": 386,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "databricks",
        "sub_category_id": 323,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 3376,
      "existing_alias_text": "CI/CD",
      "input_term": "CI/CD",
      "matched_canonical": {
        "category_id": 7,
        "display_name": "CI/CD",
        "id": 2579,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "METHODOLOGY",
        "slug": "ci-cd",
        "sub_category_id": 2102,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 3600,
      "existing_alias_text": "MLOps",
      "input_term": "MLOps",
      "matched_canonical": {
        "category_id": 7,
        "display_name": "MLOps",
        "id": 2643,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "METHODOLOGY",
        "slug": "mlops",
        "sub_category_id": 2156,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    }
  ],
  "candidate_roles": [
    {
      "display_name": "DevOps Engineer",
      "id": 1,
      "rationale": null,
      "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
      "slug": "devops-engineer",
      "source": "db"
    },
    {
      "display_name": "Cybersecurity Engineer",
      "id": 9,
      "rationale": null,
      "role_archetype": null,
      "slug": "cybersecurity-engineer",
      "source": "db"
    },
    {
      "display_name": "Data Analyst",
      "id": 20,
      "rationale": null,
      "role_archetype": null,
      "slug": "data-analyst",
      "source": "db"
    },
    {
      "display_name": "Data Scientist",
      "id": 7,
      "rationale": null,
      "role_archetype": null,
      "slug": "data-scientist",
      "source": "db"
    },
    {
      "display_name": "Azure Cloud Engineer",
      "id": 4,
      "rationale": null,
      "role_archetype": null,
      "slug": "azure-cloud-engineer",
      "source": "db"
    },
    {
      "display_name": "Cloud Engineer",
      "id": 18,
      "rationale": null,
      "role_archetype": null,
      "slug": "cloud-engineer",
      "source": "db"
    },
    {
      "display_name": "Virtualization Engineer",
      "id": 26,
      "rationale": null,
      "role_archetype": null,
      "slug": "virtualization-engineer",
      "source": "db"
    },
    {
      "display_name": "Network Engineer",
      "id": 21,
      "rationale": null,
      "role_archetype": null,
      "slug": "network-engineer",
      "source": "db"
    },
    {
      "display_name": "AI Engineer",
      "id": 12,
      "rationale": null,
      "role_archetype": null,
      "slug": "ai-engineer",
      "source": "db"
    },
    {
      "display_name": "Backend Engineer",
      "id": 14,
      "rationale": null,
      "role_archetype": null,
      "slug": "backend-engineer",
      "source": "db"
    },
    {
      "display_name": "Data Engineer",
      "id": 6,
      "rationale": null,
      "role_archetype": null,
      "slug": "data-engineer",
      "source": "db"
    },
    {
      "display_name": "Machine Learning Engineer",
      "id": 10,
      "rationale": null,
      "role_archetype": null,
      "slug": "machine-learning-engineer",
      "source": "db"
    },
    {
      "display_name": "Automation Tester",
      "id": 16,
      "rationale": null,
      "role_archetype": null,
      "slug": "automation-tester",
      "source": "db"
    },
    {
      "display_name": "MLOps Engineer",
      "id": 5,
      "rationale": null,
      "role_archetype": null,
      "slug": "mlops-engineer",
      "source": "db"
    },
    {
      "display_name": "Cloud Architect",
      "id": 11,
      "rationale": null,
      "role_archetype": null,
      "slug": "cloud-architect",
      "source": "db"
    },
    {
      "display_name": "ServiceNOW Developer",
      "id": 24,
      "rationale": null,
      "role_archetype": null,
      "slug": "servicenow-developer",
      "source": "db"
    }
  ],
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 6,
    "rationale": "The primary skills indicate a strong focus on data processing, SQL, and Azure technologies, aligning well with a Data Engineer\u0027s responsibilities.",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "dimensions": [
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Platform Operations",
        "id": 26,
        "rationale": "Uses cloud provider services to support delivery and runtime environments. The focus is on consumer-level operation of cloud services rather than deep cloud architecture ownership.",
        "slug": "cloud-platform-operations",
        "source": "db"
      },
      "input_skill": "Azure",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "DevOps Engineer",
          "id": 1,
          "rationale": null,
          "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
          "slug": "devops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Security Platforms",
        "id": 332,
        "rationale": "Cloud-native security products used to assess posture, detect misconfigurations, and monitor workloads across AWS, Azure, and GCP. This is a distinct product family because the role often works across multiple CNAPP/CSPM/CWPP offerings and cloud-native detectors.",
        "slug": "cloud-security-platforms",
        "source": "db"
      },
      "input_skill": "Azure",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Cybersecurity Engineer",
          "id": 9,
          "rationale": null,
          "role_archetype": null,
          "slug": "cybersecurity-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Analytical Programming Languages",
        "id": 82,
        "rationale": "Languages used to clean, transform, analyze, and prototype models in notebooks and scripts. This is the core coding surface for expressing statistical logic and data manipulation in a reproducible way.",
        "slug": "analytical-programming-languages",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Analyst",
          "id": 20,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-analyst",
          "source": "db"
        },
        {
          "display_name": "Data Scientist",
          "id": 7,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-scientist",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Automation Scripting and CLI",
        "id": 48,
        "rationale": "Uses scripts and command-line tooling to execute repeatable Azure operations and reduce manual work. This is a practical cluster because the role frequently automates provisioning, checks, and remediation tasks.",
        "slug": "automation-scripting-and-cli",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Azure Cloud Engineer",
          "id": 4,
          "rationale": null,
          "role_archetype": null,
          "slug": "azure-cloud-engineer",
          "source": "db"
        },
        {
          "display_name": "Cloud Engineer",
          "id": 18,
          "rationale": null,
          "role_archetype": null,
          "slug": "cloud-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Automation and Scripting for Operations",
        "id": 361,
        "rationale": "Scripts and lightweight automation used to execute repetitive virtualization tasks and enforce operational consistency. This is the practical glue that reduces manual host and VM administration.",
        "slug": "automation-and-scripting-for-operations",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Virtualization Engineer",
          "id": 26,
          "rationale": null,
          "role_archetype": null,
          "slug": "virtualization-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Network Automation and Scripting",
        "id": 285,
        "rationale": "Covers scripts and automation used to configure, validate, and audit network devices and services. This cluster is coherent because repeatable network operations increasingly depend on programmatic changes and checks.",
        "slug": "network-automation-and-scripting",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Network Engineer",
          "id": 21,
          "rationale": null,
          "role_archetype": null,
          "slug": "network-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for AI Workflows",
        "id": 261,
        "rationale": "Languages used to implement AI feature logic, orchestration, and response handling inside product code. This is the core coding surface for turning prompts and model calls into reliable application behavior.",
        "slug": "programming-languages-for-ai-workflows",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "AI Engineer",
          "id": 12,
          "rationale": null,
          "role_archetype": null,
          "slug": "ai-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for Backend Systems",
        "id": 140,
        "rationale": "Languages used to implement server-side business logic, request handlers, workers, and service integrations. This is the core coding surface for backend feature delivery and maintenance.",
        "slug": "programming-languages-for-backend-systems",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Backend Engineer",
          "id": 14,
          "rationale": null,
          "role_archetype": null,
          "slug": "backend-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for Data Work",
        "id": 67,
        "rationale": "Languages used to implement data pipelines, transformations, and operational utilities. This is the code layer for expressing extraction, parsing, validation, and orchestration logic in data engineering workflows.",
        "slug": "programming-languages-for-data-work",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 6,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for ML Systems",
        "id": 113,
        "rationale": "Languages used to implement model integration code, inference services, and feature-processing logic. This is the core coding surface for turning trained models into product-facing software components.",
        "slug": "programming-languages-for-ml-systems",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Machine Learning Engineer",
          "id": 10,
          "rationale": null,
          "role_archetype": null,
          "slug": "machine-learning-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for Security Work",
        "id": 328,
        "rationale": "Languages used to automate security tasks, write detection logic, and build analysis or remediation tooling. This is the core coding surface for a cybersecurity engineer across scripts, queries, and small utilities.",
        "slug": "programming-languages-for-security-work",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Cybersecurity Engineer",
          "id": 9,
          "rationale": null,
          "role_archetype": null,
          "slug": "cybersecurity-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for Test Automation",
        "id": 193,
        "rationale": "Languages used to implement automated checks, helper utilities, and test harness code. This is the core coding surface for turning test ideas into maintainable automation.",
        "slug": "programming-languages-for-test-automation",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Automation Tester",
          "id": 16,
          "rationale": null,
          "role_archetype": null,
          "slug": "automation-tester",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Security Automation and Scripting",
        "id": 258,
        "rationale": "Automating repeatable security checks, enrichment, and remediation workflows. This cluster is coherent because the role often needs lightweight automation to scale analysis and response.",
        "slug": "security-automation-and-scripting",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Cybersecurity Engineer",
          "id": 9,
          "rationale": null,
          "role_archetype": null,
          "slug": "cybersecurity-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Relational Data Modeling",
        "id": 71,
        "rationale": "Designing tables, relationships, constraints, and transactional data shapes for operational backend systems. This cluster is coherent because backend services frequently own the canonical application data model.",
        "slug": "relational-data-modeling",
        "source": "db"
      },
      "input_skill": "SQL",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Backend Engineer",
          "id": 14,
          "rationale": null,
          "role_archetype": null,
          "slug": "backend-engineer",
          "source": "db"
        },
        {
          "display_name": "Data Engineer",
          "id": 6,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Version Control Systems",
        "id": 365,
        "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "SQL",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Model Serving Deployment and Runtime Packaging",
        "id": 52,
        "rationale": "Operational deployment of trained models into online, batch, or streaming serving environments, including packaging models and model servers into containers or managed inference runtimes, coordinating rollout, and handing off to inference systems. Covers serving frameworks and platforms such as TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, and Seldon Core, plus container/runtime concerns like Docker images, GPU-enabled containers, base image selection, container entrypoints, runtime dependencies, and image scanning for model services.",
        "slug": "model-serving-deployment-and-runtime-packaging",
        "source": "db"
      },
      "input_skill": "MLflow",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "MLOps Engineer",
          "id": 5,
          "rationale": null,
          "role_archetype": null,
          "slug": "mlops-engineer",
          "source": "db"
        },
        {
          "display_name": "Machine Learning Engineer",
          "id": 10,
          "rationale": null,
          "role_archetype": null,
          "slug": "machine-learning-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Project Delivery and Coordination",
        "id": 366,
        "rationale": "Coordination practices for organizing work, tracking progress, and aligning stakeholders across a delivery effort. Agile fits here when used as a team execution framework for managing scope, cadence, and collaboration.",
        "slug": "d_init_02",
        "source": "db"
      },
      "input_skill": "MLflow",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Version Control Systems",
        "id": 365,
        "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "MLflow",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud ML Platform Operations",
        "id": 65,
        "rationale": "Consumer-level operation of managed ML services and cloud resources used to train and serve models. This covers the cloud platform surface that MLOps engineers use without owning the underlying cloud platform itself.",
        "slug": "cloud-ml-platform-operations",
        "source": "db"
      },
      "input_skill": "Azure Machine Learning",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "MLOps Engineer",
          "id": 5,
          "rationale": null,
          "role_archetype": null,
          "slug": "mlops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Data Platform Services",
        "id": 81,
        "rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
        "slug": "cloud-data-platform-services",
        "source": "db"
      },
      "input_skill": "Azure Data Factory",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 6,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud ML Platform Operations",
        "id": 65,
        "rationale": "Consumer-level operation of managed ML services and cloud resources used to train and serve models. This covers the cloud platform surface that MLOps engineers use without owning the underlying cloud platform itself.",
        "slug": "cloud-ml-platform-operations",
        "source": "db"
      },
      "input_skill": "Databricks",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "MLOps Engineer",
          "id": 5,
          "rationale": null,
          "role_archetype": null,
          "slug": "mlops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Version Control Systems",
        "id": 365,
        "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "CI/CD",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Inference Data Pipelines",
        "id": 59,
        "rationale": "Operational data movement for batch scoring, feature refresh, and inference-time data preparation. This is separate from model training because it focuses on getting the right data to the serving path reliably.",
        "slug": "inference-data-pipelines",
        "source": "db"
      },
      "input_skill": "MLOps",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "MLOps Engineer",
          "id": 5,
          "rationale": null,
          "role_archetype": null,
          "slug": "mlops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Model Serving Deployment and Runtime Packaging",
        "id": 52,
        "rationale": "Operational deployment of trained models into online, batch, or streaming serving environments, including packaging models and model servers into containers or managed inference runtimes, coordinating rollout, and handing off to inference systems. Covers serving frameworks and platforms such as TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, and Seldon Core, plus container/runtime concerns like Docker images, GPU-enabled containers, base image selection, container entrypoints, runtime dependencies, and image scanning for model services.",
        "slug": "model-serving-deployment-and-runtime-packaging",
        "source": "db"
      },
      "input_skill": "MLOps",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "MLOps Engineer",
          "id": 5,
          "rationale": null,
          "role_archetype": null,
          "slug": "mlops-engineer",
          "source": "db"
        },
        {
          "display_name": "Machine Learning Engineer",
          "id": 10,
          "rationale": null,
          "role_archetype": null,
          "slug": "machine-learning-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "Analytical Programming and Notebook Languages",
        "id": null,
        "rationale": "Languages and notebook/script-based coding used to clean, transform, analyze, and prototype data workflows and models. Includes Python, pandas, SQL, PySpark, notebook scripting, dataframe manipulation, exploratory analysis, ETL/data transformation logic, and other reproducible analytical code.",
        "slug": "d_merge_01",
        "source": "llm"
      },
      "input_skill": "PySpark",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Version Control Systems",
        "id": 365,
        "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "PySpark",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "Analytical Programming and Notebook-Based Data Analysis",
        "id": null,
        "rationale": "Languages and notebook-friendly coding used to clean, transform, analyze, and prototype data and model workflows. This includes Python, R, SQL, and Scala used in notebooks or scripts for data wrangling, exploratory data analysis, statistical logic, feature engineering, and reproducible prototyping. It excludes production orchestration and scheduling, dashboard/report authoring, model deployment packaging, database administration, and UI development.",
        "slug": "d_merge_01",
        "source": "llm"
      },
      "input_skill": "Notebooks",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Version Control Systems",
        "id": 365,
        "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "Big Data",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Messaging and Event Streaming",
        "id": 146,
        "rationale": "Asynchronous communication patterns and systems for decoupled service interaction and background processing. This is a coherent backend cluster because many server-side workflows depend on queues, topics, and event streams.",
        "slug": "messaging-and-event-streaming",
        "source": "db"
      },
      "input_skill": "Big Data",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Backend Engineer",
          "id": 14,
          "rationale": null,
          "role_archetype": null,
          "slug": "backend-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Messaging and Event Streaming",
        "id": 146,
        "rationale": "Asynchronous communication patterns and systems for decoupled service interaction and background processing. This is a coherent backend cluster because many server-side workflows depend on queues, topics, and event streams.",
        "slug": "messaging-and-event-streaming",
        "source": "db"
      },
      "input_skill": "Big Data",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Backend Engineer",
          "id": 14,
          "rationale": null,
          "role_archetype": null,
          "slug": "backend-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Version Control Systems",
        "id": 365,
        "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "Data Modeling",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "Inference Data Pipelines for Serving and Batch Scoring",
        "id": null,
        "rationale": "Operational data movement that prepares and delivers timely, reliable data to production inference systems. Includes batch scoring inputs, feature refresh jobs, inference-time preprocessing, scheduled extracts, data validation for serving, and online/offline feature synchronization. Excludes training dataset curation, model training workflows, experimentation-focused feature engineering, model evaluation, and serving infrastructure/routing.",
        "slug": "d_merge_01",
        "source": "llm"
      },
      "input_skill": "Data Pipelines",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Version Control Systems",
        "id": 365,
        "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "Data Pipelines",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "Asynchronous Messaging and Event Streaming",
        "id": null,
        "rationale": "Covers asynchronous communication and data movement through queues, topics, streams, event buses, and pub/sub systems for decoupled processing, background jobs, and event-driven integration. Includes continuous or event-driven data ingestion and change data capture pipelines, but excludes batch ETL orchestration, warehouse modeling, query optimization, model training data prep, and direct application API calls.",
        "slug": "d_merge_01",
        "source": "llm"
      },
      "input_skill": "Data Ingestion",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Version Control Systems",
        "id": 365,
        "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "Data Ingestion",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Messaging and Event Streaming",
        "id": 146,
        "rationale": "Asynchronous communication patterns and systems for decoupled service interaction and background processing. This is a coherent backend cluster because many server-side workflows depend on queues, topics, and event streams.",
        "slug": "messaging-and-event-streaming",
        "source": "db"
      },
      "input_skill": "Stream Processing",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Backend Engineer",
          "id": 14,
          "rationale": null,
          "role_archetype": null,
          "slug": "backend-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "Messaging, Queueing, and Event Streaming",
        "id": null,
        "rationale": "Asynchronous communication patterns and systems that decouple producers and consumers, buffer and route work items, and support background processing and service-to-service integration. Includes queueing, message queues, pub/sub, brokers, topics, consumer groups, producers/consumers, dead-letter queues, retry handling, backpressure, and event streaming platforms such as Kafka, RabbitMQ, SQS, and Azure Service Bus.",
        "slug": "d_merge_01",
        "source": "llm"
      },
      "input_skill": "Queueing",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "API Integration, Request Orchestration, and Data Fetching",
        "id": null,
        "rationale": "Connecting applications to internal or external services through request/response APIs. This includes consuming REST and GraphQL endpoints, orchestrating requests, handling payloads and response parsing, pagination, retries, error handling, and shaping remote data for downstream or UI consumption.",
        "slug": "d_merge_01",
        "source": "llm"
      },
      "input_skill": "APIs",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Service Integration Patterns",
        "id": 188,
        "rationale": "Covers how cloud services and workloads connect through APIs, events, shared services, and integration boundaries. This cluster is coherent because architects must define interaction patterns that preserve decoupling, security, and operability.",
        "slug": "cloud-service-integration-patterns",
        "source": "db"
      },
      "input_skill": "APIs",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Cloud Architect",
          "id": 11,
          "rationale": null,
          "role_archetype": null,
          "slug": "cloud-architect",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Version Control Systems",
        "id": 365,
        "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "APIs",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Service Integration Patterns",
        "id": 188,
        "rationale": "Covers how cloud services and workloads connect through APIs, events, shared services, and integration boundaries. This cluster is coherent because architects must define interaction patterns that preserve decoupling, security, and operability.",
        "slug": "cloud-service-integration-patterns",
        "source": "db"
      },
      "input_skill": "APIs",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Cloud Architect",
          "id": 11,
          "rationale": null,
          "role_archetype": null,
          "slug": "cloud-architect",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Testing and Validation Practices",
        "id": 221,
        "rationale": "Validating platform changes before release, including functional checks and regression verification. This cluster is coherent because ServiceNow developers must confirm workflows, scripts, and integrations behave as intended.",
        "slug": "testing-and-validation-practices",
        "source": "db"
      },
      "input_skill": "Testability",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "ServiceNOW Developer",
          "id": 24,
          "rationale": null,
          "role_archetype": null,
          "slug": "servicenow-developer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Testing and Validation Practices",
        "id": 221,
        "rationale": "Validating platform changes before release, including functional checks and regression verification. This cluster is coherent because ServiceNow developers must confirm workflows, scripts, and integrations behave as intended.",
        "slug": "testing-and-validation-practices",
        "source": "db"
      },
      "input_skill": "Testability",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "ServiceNOW Developer",
          "id": 24,
          "rationale": null,
          "role_archetype": null,
          "slug": "servicenow-developer",
          "source": "db"
        }
      ]
    }
  ],
  "input_final_skills": [
    "Azure",
    "Python",
    "PySpark",
    "SQL",
    "MLflow",
    "Azure Machine Learning",
    "Azure Data Factory",
    "Databricks",
    "Notebooks",
    "CI/CD",
    "Big Data",
    "Data Modeling",
    "Data Pipelines",
    "Data Ingestion",
    "Stream Processing",
    "Queueing",
    "APIs",
    "Testability",
    "MLOps"
  ],
  "input_llm_skills": [
    "Azure",
    "Python",
    "PySpark",
    "SQL",
    "MLflow",
    "Azure Machine Learning",
    "Azure Data Factory",
    "Databricks",
    "Notebooks",
    "CI/CD",
    "Big Data",
    "Data Modeling",
    "Data Pipelines",
    "Data Ingestion",
    "Stream Processing",
    "Queueing",
    "APIs",
    "Testability",
    "MLOps"
  ],
  "new_aliases_persisted": 0,
  "run_id": "265899c9-6b42-43cb-a0f8-64ac64ac5a98",
  "skills_detail": [
    {
      "aliases_in_db": [
        {
          "alias_text": "Azure",
          "alias_type": "CANONICAL",
          "id": 349,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 13,
        "display_name": "Azure",
        "id": 164,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "azure",
        "sub_category_id": 161,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Platform Operations",
            "id": 26,
            "rationale": "Uses cloud provider services to support delivery and runtime environments. The focus is on consumer-level operation of cloud services rather than deep cloud architecture ownership.",
            "slug": "cloud-platform-operations",
            "source": "db"
          },
          "input_skill": "Azure",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "DevOps Engineer",
              "id": 1,
              "rationale": null,
              "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
              "slug": "devops-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Security Platforms",
            "id": 332,
            "rationale": "Cloud-native security products used to assess posture, detect misconfigurations, and monitor workloads across AWS, Azure, and GCP. This is a distinct product family because the role often works across multiple CNAPP/CSPM/CWPP offerings and cloud-native detectors.",
            "slug": "cloud-security-platforms",
            "source": "db"
          },
          "input_skill": "Azure",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Cybersecurity Engineer",
              "id": 9,
              "rationale": null,
              "role_archetype": null,
              "slug": "cybersecurity-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Azure",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Python",
          "alias_type": "CANONICAL",
          "id": 608,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Python 2",
          "alias_type": "VERSION",
          "id": 611,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Python 2.x",
          "alias_type": "VERSION",
          "id": 613,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Python 3",
          "alias_type": "VERSION",
          "id": 612,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Python 3.10",
          "alias_type": "VERSION",
          "id": 2330,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Python 3.11",
          "alias_type": "VERSION",
          "id": 2331,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Python 3.12",
          "alias_type": "VERSION",
          "id": 2332,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Python 3.x",
          "alias_type": "VERSION",
          "id": 614,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "py2",
          "alias_type": "VERSION",
          "id": 609,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "py3",
          "alias_type": "VERSION",
          "id": 610,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python 2",
          "alias_type": "VERSION",
          "id": 2152,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python 2.x",
          "alias_type": "VERSION",
          "id": 2154,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python 3",
          "alias_type": "VERSION",
          "id": 990,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python 3.10",
          "alias_type": "VERSION",
          "id": 992,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python 3.11",
          "alias_type": "VERSION",
          "id": 993,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python 3.12",
          "alias_type": "VERSION",
          "id": 994,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python 3.x",
          "alias_type": "VERSION",
          "id": 991,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python2",
          "alias_type": "VERSION",
          "id": 2150,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python3",
          "alias_type": "VERSION",
          "id": 989,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 5,
        "display_name": "Python",
        "id": 393,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LANGUAGE",
        "slug": "python",
        "sub_category_id": 54,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Analytical Programming Languages",
            "id": 82,
            "rationale": "Languages used to clean, transform, analyze, and prototype models in notebooks and scripts. This is the core coding surface for expressing statistical logic and data manipulation in a reproducible way.",
            "slug": "analytical-programming-languages",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Analyst",
              "id": 20,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-analyst",
              "source": "db"
            },
            {
              "display_name": "Data Scientist",
              "id": 7,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-scientist",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Automation Scripting and CLI",
            "id": 48,
            "rationale": "Uses scripts and command-line tooling to execute repeatable Azure operations and reduce manual work. This is a practical cluster because the role frequently automates provisioning, checks, and remediation tasks.",
            "slug": "automation-scripting-and-cli",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Azure Cloud Engineer",
              "id": 4,
              "rationale": null,
              "role_archetype": null,
              "slug": "azure-cloud-engineer",
              "source": "db"
            },
            {
              "display_name": "Cloud Engineer",
              "id": 18,
              "rationale": null,
              "role_archetype": null,
              "slug": "cloud-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Automation and Scripting for Operations",
            "id": 361,
            "rationale": "Scripts and lightweight automation used to execute repetitive virtualization tasks and enforce operational consistency. This is the practical glue that reduces manual host and VM administration.",
            "slug": "automation-and-scripting-for-operations",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Virtualization Engineer",
              "id": 26,
              "rationale": null,
              "role_archetype": null,
              "slug": "virtualization-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Network Automation and Scripting",
            "id": 285,
            "rationale": "Covers scripts and automation used to configure, validate, and audit network devices and services. This cluster is coherent because repeatable network operations increasingly depend on programmatic changes and checks.",
            "slug": "network-automation-and-scripting",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Network Engineer",
              "id": 21,
              "rationale": null,
              "role_archetype": null,
              "slug": "network-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for AI Workflows",
            "id": 261,
            "rationale": "Languages used to implement AI feature logic, orchestration, and response handling inside product code. This is the core coding surface for turning prompts and model calls into reliable application behavior.",
            "slug": "programming-languages-for-ai-workflows",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "AI Engineer",
              "id": 12,
              "rationale": null,
              "role_archetype": null,
              "slug": "ai-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for Backend Systems",
            "id": 140,
            "rationale": "Languages used to implement server-side business logic, request handlers, workers, and service integrations. This is the core coding surface for backend feature delivery and maintenance.",
            "slug": "programming-languages-for-backend-systems",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Backend Engineer",
              "id": 14,
              "rationale": null,
              "role_archetype": null,
              "slug": "backend-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for Data Work",
            "id": 67,
            "rationale": "Languages used to implement data pipelines, transformations, and operational utilities. This is the code layer for expressing extraction, parsing, validation, and orchestration logic in data engineering workflows.",
            "slug": "programming-languages-for-data-work",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 6,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for ML Systems",
            "id": 113,
            "rationale": "Languages used to implement model integration code, inference services, and feature-processing logic. This is the core coding surface for turning trained models into product-facing software components.",
            "slug": "programming-languages-for-ml-systems",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Machine Learning Engineer",
              "id": 10,
              "rationale": null,
              "role_archetype": null,
              "slug": "machine-learning-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for Security Work",
            "id": 328,
            "rationale": "Languages used to automate security tasks, write detection logic, and build analysis or remediation tooling. This is the core coding surface for a cybersecurity engineer across scripts, queries, and small utilities.",
            "slug": "programming-languages-for-security-work",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Cybersecurity Engineer",
              "id": 9,
              "rationale": null,
              "role_archetype": null,
              "slug": "cybersecurity-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for Test Automation",
            "id": 193,
            "rationale": "Languages used to implement automated checks, helper utilities, and test harness code. This is the core coding surface for turning test ideas into maintainable automation.",
            "slug": "programming-languages-for-test-automation",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Automation Tester",
              "id": 16,
              "rationale": null,
              "role_archetype": null,
              "slug": "automation-tester",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Security Automation and Scripting",
            "id": 258,
            "rationale": "Automating repeatable security checks, enrichment, and remediation workflows. This cluster is coherent because the role often needs lightweight automation to scale analysis and response.",
            "slug": "security-automation-and-scripting",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Cybersecurity Engineer",
              "id": 9,
              "rationale": null,
              "role_archetype": null,
              "slug": "cybersecurity-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Python",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "Analytical Programming and Notebook Languages",
            "id": null,
            "rationale": "Languages and notebook/script-based coding used to clean, transform, analyze, and prototype data workflows and models. Includes Python, pandas, SQL, PySpark, notebook scripting, dataframe manipulation, exploratory analysis, ETL/data transformation logic, and other reproducible analytical code.",
            "slug": "d_merge_01",
            "source": "llm"
          },
          "input_skill": "PySpark",
          "llm_role": null,
          "roles_from_db": []
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Version Control Systems",
            "id": 365,
            "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "PySpark",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "PySpark",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Library",
          "skill_nature": "LIBRARY",
          "sub_category": "data_processing_library",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "PySpark is a specific Python API for Apache Spark and is usually named distinctly in JDs. It is unlikely to be reasonably confused with another catalog skill in typical job descriptions."
          },
          "context_keywords": {
            "context_keywords": [
              "Spark SQL",
              "DataFrame",
              "RDD",
              "Spark Streaming",
              "Structured Streaming",
              "Delta Lake",
              "Hive",
              "Parquet",
              "YARN",
              "Databricks",
              "EMR",
              "AWS Glue",
              "ETL",
              "partitioning",
              "broadcast join"
            ]
          },
          "maturity": {
            "confidence": 0.93,
            "maturity": "well_known",
            "reasoning": "PySpark appears in many data engineering and analytics job descriptions, especially for Spark-based ETL and ML pipelines; it remains a standard skill alongside Databricks and AWS EMR."
          },
          "skill_id": "pyspark",
          "vendor_license": {
            "confidence": 0.98,
            "license": "apache_2",
            "vendor": "Apache Software Foundation",
            "year_introduced": 2010
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Languages and notebook/script-based coding used to clean, transform, analyze, and prototype data workflows and models. Includes Python, pandas, SQL, PySpark, notebook scripting, dataframe manipulation, exploratory analysis, ETL/data transformation logic, and other reproducible analytical code.",
            "exemplar_skills": [
              "Analytical Programming and Notebook Languages"
            ],
            "in_scope": "Skills, tools, and practices that belong under Analytical Programming and Notebook Languages for the target role, including items implied by the dimension rationale.",
            "name": "Analytical Programming and Notebook Languages",
            "out_of_scope": "Adjacent clusters explicitly not owned by Analytical Programming and Notebook Languages, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_merge_01"
          },
          {
            "description": "Covers writing and optimizing distributed batch or streaming data transformations on large datasets. PySpark belongs here because it is a Spark-based API used to express parallel data processing jobs at scale.",
            "exemplar_skills": [
              "PySpark",
              "Spark DataFrame API",
              "Spark SQL",
              "RDDs",
              "Spark joins",
              "Spark window functions"
            ],
            "in_scope": "PySpark, Spark DataFrame transformations, Spark SQL, RDD operations, joins, aggregations, partitioning, shuffles, window functions, UDFs, batch ETL jobs, distributed data cleansing",
            "name": "Distributed Data Processing",
            "out_of_scope": "Interactive BI dashboards, ad hoc SQL reporting, model training algorithms, low-level cluster administration, message broker configuration, which belong to analytics, ML, platform, or streaming infrastructure dimensions",
            "overlap_flags": [
              {
                "reason": "Spark can consume streams, but this dimension is about distributed computation rather than brokered event transport.",
                "with_dim_id": "messaging-and-event-streaming",
                "with_dim_name": null,
                "with_role": "Backend Engineer"
              },
              {
                "reason": "PySpark is also a programming language surface, but the stronger fit here is distributed data processing on Spark.",
                "with_dim_id": "analytical-programming-languages",
                "with_dim_name": null,
                "with_role": "Data Analyst, Data Scientist"
              }
            ],
            "tentative_id": "d_init_01"
          }
        ],
        "merge_log": [
          {
            "a_dim_id": "analytical-programming-languages",
            "a_name": "Analytical Programming Languages",
            "a_role": "__skill_focal__",
            "b_dim_id": "analytical-programming-languages",
            "b_name": "Analytical Programming Languages",
            "b_role": "Data Scientist",
            "into": "d_merge_01",
            "into_name": "Analytical Programming and Notebook Languages",
            "merged_from": [
              "analytical-programming-languages",
              "analytical-programming-languages"
            ],
            "pair_kind": "cross_role",
            "reasoning": "Both dims describe the same cluster: analytical coding in notebooks/scripts for data cleaning, transformation, analysis, and prototyping. Dim A lists PySpark, Python data wrangling, pandas, SQL for analysis, notebook scripting, ETL logic, and dataframe manipulation. Dim B describes the same core surface as languages used to clean, transform, analyze, and prototype models in notebooks and scripts, i.e. reproducible statistical/data-manipulation code. The cross-role difference is only framing; the underlying skills overlap heavily.",
            "similarity": 0.8286169942979785
          }
        ],
        "placed": {
          "name": "PySpark",
          "placement_confidence": 0.92,
          "primary_dimension": "d_merge_01",
          "reasoning": "Deterministic JD placement: locked_dimensions has 2 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [
            "d_init_01"
          ],
          "skill_id": "pyspark"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [
            "python"
          ],
          "related_to": [
            "sql",
            "postgresql",
            "nosql",
            "amazon-athena",
            "amazon-sagemaker",
            "aws-data-pipeline",
            "aws-lambda",
            "kubeflow",
            "machine-learning",
            "elasticsearch"
          ],
          "requires": [],
          "skill_id": "pyspark",
          "suppress_on_match": []
        },
        "skill_id": "pyspark",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.93,
          "name": "PySpark",
          "reasoning": "PySpark is best classified as a Library because it is a Python package imported and used from application code, rather than a hosted environment or a framework you build inside.",
          "skill_id": "pyspark",
          "subtype": "data_processing_library",
          "type": "Library"
        },
        "warnings": [
          "stage3_post_filter_dropped_catalog_only_locked_dims:41-\u003e2"
        ]
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "SQL",
          "alias_type": "CANONICAL",
          "id": 3398,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 5,
        "display_name": "SQL",
        "id": 2601,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LANGUAGE",
        "slug": "sql",
        "sub_category_id": 55,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Relational Data Modeling",
            "id": 71,
            "rationale": "Designing tables, relationships, constraints, and transactional data shapes for operational backend systems. This cluster is coherent because backend services frequently own the canonical application data model.",
            "slug": "relational-data-modeling",
            "source": "db"
          },
          "input_skill": "SQL",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Backend Engineer",
              "id": 14,
              "rationale": null,
              "role_archetype": null,
              "slug": "backend-engineer",
              "source": "db"
            },
            {
              "display_name": "Data Engineer",
              "id": 6,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Version Control Systems",
            "id": 365,
            "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "SQL",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "SQL",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "MLflow",
          "alias_type": "CANONICAL",
          "id": 3593,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 11,
        "display_name": "MLflow",
        "id": 2640,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "mlflow",
        "sub_category_id": 2151,
        "typical_lifespan": "EVERGREEN",
        "volatility": "EMERGING"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Model Serving Deployment and Runtime Packaging",
            "id": 52,
            "rationale": "Operational deployment of trained models into online, batch, or streaming serving environments, including packaging models and model servers into containers or managed inference runtimes, coordinating rollout, and handing off to inference systems. Covers serving frameworks and platforms such as TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, and Seldon Core, plus container/runtime concerns like Docker images, GPU-enabled containers, base image selection, container entrypoints, runtime dependencies, and image scanning for model services.",
            "slug": "model-serving-deployment-and-runtime-packaging",
            "source": "db"
          },
          "input_skill": "MLflow",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "MLOps Engineer",
              "id": 5,
              "rationale": null,
              "role_archetype": null,
              "slug": "mlops-engineer",
              "source": "db"
            },
            {
              "display_name": "Machine Learning Engineer",
              "id": 10,
              "rationale": null,
              "role_archetype": null,
              "slug": "machine-learning-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Project Delivery and Coordination",
            "id": 366,
            "rationale": "Coordination practices for organizing work, tracking progress, and aligning stakeholders across a delivery effort. Agile fits here when used as a team execution framework for managing scope, cadence, and collaboration.",
            "slug": "d_init_02",
            "source": "db"
          },
          "input_skill": "MLflow",
          "llm_role": null,
          "roles_from_db": []
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Version Control Systems",
            "id": 365,
            "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "MLflow",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "MLflow",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Azure Machine Learning",
          "alias_type": "CANONICAL",
          "id": 600,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 13,
        "display_name": "Azure Machine Learning",
        "id": 385,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "azure-machine-learning",
        "sub_category_id": 326,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud ML Platform Operations",
            "id": 65,
            "rationale": "Consumer-level operation of managed ML services and cloud resources used to train and serve models. This covers the cloud platform surface that MLOps engineers use without owning the underlying cloud platform itself.",
            "slug": "cloud-ml-platform-operations",
            "source": "db"
          },
          "input_skill": "Azure Machine Learning",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "MLOps Engineer",
              "id": 5,
              "rationale": null,
              "role_archetype": null,
              "slug": "mlops-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Azure Machine Learning",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Azure Data Factory",
          "alias_type": "CANONICAL",
          "id": 731,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 14,
        "display_name": "Azure Data Factory",
        "id": 467,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CLOUD_SERVICE",
        "slug": "azure-data-factory",
        "sub_category_id": 385,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Data Platform Services",
            "id": 81,
            "rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
            "slug": "cloud-data-platform-services",
            "source": "db"
          },
          "input_skill": "Azure Data Factory",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 6,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Azure Data Factory",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Databricks",
          "alias_type": "CANONICAL",
          "id": 601,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 13,
        "display_name": "Databricks",
        "id": 386,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "databricks",
        "sub_category_id": 323,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud ML Platform Operations",
            "id": 65,
            "rationale": "Consumer-level operation of managed ML services and cloud resources used to train and serve models. This covers the cloud platform surface that MLOps engineers use without owning the underlying cloud platform itself.",
            "slug": "cloud-ml-platform-operations",
            "source": "db"
          },
          "input_skill": "Databricks",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "MLOps Engineer",
              "id": 5,
              "rationale": null,
              "role_archetype": null,
              "slug": "mlops-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Databricks",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "Analytical Programming and Notebook-Based Data Analysis",
            "id": null,
            "rationale": "Languages and notebook-friendly coding used to clean, transform, analyze, and prototype data and model workflows. This includes Python, R, SQL, and Scala used in notebooks or scripts for data wrangling, exploratory data analysis, statistical logic, feature engineering, and reproducible prototyping. It excludes production orchestration and scheduling, dashboard/report authoring, model deployment packaging, database administration, and UI development.",
            "slug": "d_merge_01",
            "source": "llm"
          },
          "input_skill": "Notebooks",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "Notebooks",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Tool",
          "skill_nature": "TOOL",
          "sub_category": "notebook_environment",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": true,
            "confused_with": [
              "jupyter_notebook",
              "colab"
            ],
            "reasoning": "\u201cNotebooks\u201d is a generic term and in JDs could mean Jupyter notebooks or Google Colab, both common catalog skills. The standalone name is too broad to be unambiguous."
          },
          "context_keywords": {
            "context_keywords": [
              "Jupyter",
              "JupyterLab",
              "IPython",
              "nbconvert",
              "nbformat",
              "kernel",
              "Markdown",
              "code cells",
              "data visualization",
              "pandas",
              "NumPy",
              "Matplotlib",
              "interactive analysis",
              "reproducible research",
              "collaboration"
            ]
          },
          "maturity": {
            "confidence": 0.93,
            "maturity": "well_known",
            "reasoning": "Notebook environments (e.g., Jupyter) appear in many data science and ML job descriptions and are a standard workflow in major cloud vendors\u2019 managed notebook offerings."
          },
          "skill_id": "notebooks",
          "vendor_license": {
            "confidence": 0.93,
            "license": "bsd",
            "vendor": "Project Jupyter",
            "year_introduced": 2014
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Languages and notebook-friendly coding used to clean, transform, analyze, and prototype data and model workflows. This includes Python, R, SQL, and Scala used in notebooks or scripts for data wrangling, exploratory data analysis, statistical logic, feature engineering, and reproducible prototyping. It excludes production orchestration and scheduling, dashboard/report authoring, model deployment packaging, database administration, and UI development.",
            "exemplar_skills": [
              "Analytical Programming and Notebook-Based Data Analysis"
            ],
            "in_scope": "Skills, tools, and practices that belong under Analytical Programming and Notebook-Based Data Analysis for the target role, including items implied by the dimension rationale.",
            "name": "Analytical Programming and Notebook-Based Data Analysis",
            "out_of_scope": "Adjacent clusters explicitly not owned by Analytical Programming and Notebook-Based Data Analysis, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_merge_01"
          }
        ],
        "merge_log": [
          {
            "a_dim_id": "analytical-programming-languages",
            "a_name": "Analytical Programming Languages",
            "a_role": "__skill_focal__",
            "b_dim_id": "analytical-programming-languages",
            "b_name": "Analytical Programming Languages",
            "b_role": "Data Scientist",
            "into": "d_merge_01",
            "into_name": "Analytical Programming and Notebook-Based Data Analysis",
            "merged_from": [
              "analytical-programming-languages",
              "analytical-programming-languages"
            ],
            "pair_kind": "cross_role",
            "reasoning": "Both dims describe the same analytical coding cluster: notebook/script-based use of Python, R, SQL, and Scala to clean, transform, analyze, and prototype data or models. Dim A\u2019s exemplars (Notebooks, Python, R, SQL, Scala, Data wrangling, Exploratory data analysis, Feature engineering) match Dim B\u2019s description of languages for statistical logic and data manipulation in notebooks and scripts. The extra role label does not change the substance; the overlap is real, not just similar wording.",
            "similarity": 0.8752619088294021
          }
        ],
        "placed": {
          "name": "Notebooks",
          "placement_confidence": 0.92,
          "primary_dimension": "d_merge_01",
          "reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [],
          "skill_id": "notebooks"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [],
          "related_to": [
            "runbooks",
            "codex",
            "document-intelligence",
            "document-processing",
            "ocr",
            "ml",
            "python",
            "bash",
            "linux",
            "data-structures"
          ],
          "requires": [],
          "skill_id": "notebooks",
          "suppress_on_match": []
        },
        "skill_id": "notebooks",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.9,
          "name": "Notebooks",
          "reasoning": "Notebooks are software you operate to write and run analyses interactively, so by the Tool vs Framework rule they are best classified as a tool rather than a framework or platform.",
          "skill_id": "notebooks",
          "subtype": "notebook_environment",
          "type": "Tool"
        },
        "warnings": [
          "stage3_post_filter_dropped_catalog_only_locked_dims:40-\u003e1"
        ]
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "CI/CD",
          "alias_type": "CANONICAL",
          "id": 3376,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 7,
        "display_name": "CI/CD",
        "id": 2579,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "METHODOLOGY",
        "slug": "ci-cd",
        "sub_category_id": 2102,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Version Control Systems",
            "id": 365,
            "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "CI/CD",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "CI/CD",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Version Control Systems",
            "id": 365,
            "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "Big Data",
          "llm_role": null,
          "roles_from_db": []
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Messaging and Event Streaming",
            "id": 146,
            "rationale": "Asynchronous communication patterns and systems for decoupled service interaction and background processing. This is a coherent backend cluster because many server-side workflows depend on queues, topics, and event streams.",
            "slug": "messaging-and-event-streaming",
            "source": "db"
          },
          "input_skill": "Big Data",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Backend Engineer",
              "id": 14,
              "rationale": null,
              "role_archetype": null,
              "slug": "backend-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Messaging and Event Streaming",
            "id": 146,
            "rationale": "Asynchronous communication patterns and systems for decoupled service interaction and background processing. This is a coherent backend cluster because many server-side workflows depend on queues, topics, and event streams.",
            "slug": "messaging-and-event-streaming",
            "source": "db"
          },
          "input_skill": "Big Data",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Backend Engineer",
              "id": 14,
              "rationale": null,
              "role_archetype": null,
              "slug": "backend-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Big Data",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Domain",
          "skill_nature": "CONCEPT",
          "sub_category": "data_intensive_computing",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "\u201cBig Data\u201d is a well-established domain term with a specific meaning in JDs. It is unlikely to be reasonably confused with another catalog skill in typical extraction contexts."
          },
          "context_keywords": {
            "context_keywords": [
              "Hadoop",
              "Spark",
              "Hive",
              "Kafka",
              "HDFS",
              "MapReduce",
              "ETL",
              "data lake",
              "data warehouse",
              "NoSQL",
              "Parquet",
              "Airflow",
              "Flink",
              "Scala",
              "YARN"
            ]
          },
          "maturity": {
            "confidence": 0.92,
            "maturity": "well_known",
            "reasoning": "Common in data/platform job descriptions across industries; JD volume remains high for Hadoop/Spark/streaming stacks, and cloud vendors market managed big-data services as standard offerings."
          },
          "skill_id": "big-data",
          "vendor_license": {
            "confidence": 0.99,
            "license": null,
            "vendor": null,
            "year_introduced": null
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [
          {
            "a_dim_id": "messaging-and-event-streaming",
            "a_name": "Messaging and Event Streaming",
            "a_role": "__skill_focal__",
            "b_dim_id": "messaging-and-event-streaming",
            "b_name": "Messaging and Event Streaming",
            "b_role": "Backend Engineer",
            "pair_kind": "cross_role",
            "reasoning": "Dim A is analytics-oriented: its description says it is for \"data movement and event-driven pipelines used to feed large-scale analytics systems,\" with exemplars like Kafka, Spark Streaming, Apache Flink, stream ingestion, and near-real-time analytics. Dim B is backend-oriented: it covers \"asynchronous communication patterns and systems for decoupled service interaction and background processing,\" i.e., queues/topics/event streams for server-side workflows. Same technology words, different conceptual anchors and role usage.",
            "similarity": 0.7224413551673322
          }
        ],
        "locked_dimensions": [
          {
            "description": "Large-scale data processing systems and techniques for storing, transforming, and analyzing high-volume, high-velocity datasets. Big Data belongs here because the term usually refers to the distributed data engineering stack rather than a single tool.",
            "exemplar_skills": [
              "Big Data",
              "Hadoop",
              "Apache Spark",
              "Hive",
              "MapReduce",
              "distributed ETL",
              "data lake processing"
            ],
            "in_scope": "Big Data, Hadoop, Spark, Hive, MapReduce, distributed batch processing, large-scale ETL, data lakes, cluster-based data processing",
            "name": "Big Data Processing",
            "out_of_scope": "Data warehouse modeling and BI reporting, ad hoc SQL optimization, model training and serving, network or storage hardware operations",
            "overlap_flags": [
              {
                "reason": "Big data platforms often ingest from streams, but this dimension is about the processing stack rather than asynchronous messaging itself.",
                "with_dim_id": "messaging-and-event-streaming",
                "with_dim_name": null,
                "with_role": "Backend Engineer"
              },
              {
                "reason": "Big data workloads rely on query tuning, but that catalog dimension is narrower and focused on access-path performance.",
                "with_dim_id": "data-access-and-query-optimization",
                "with_dim_name": null,
                "with_role": "Data Engineer"
              }
            ],
            "tentative_id": "d_init_01"
          },
          {
            "description": "Asynchronous data movement and event-driven pipelines used to feed large-scale analytics systems. Big Data often overlaps with this area when the skill is used in streaming ingestion or pipeline orchestration.",
            "exemplar_skills": [
              "Big Data",
              "Kafka",
              "Spark Streaming",
              "Apache Flink",
              "event-driven pipelines",
              "stream ingestion",
              "real-time analytics"
            ],
            "in_scope": "Big Data, Kafka, Spark Streaming, Flink, event hubs, pub-sub pipelines, stream ingestion, near-real-time analytics",
            "name": "Messaging and Event Streaming",
            "out_of_scope": "Batch-only distributed computation, storage layout tuning, SQL query optimization, dashboarding and reporting, model deployment",
            "overlap_flags": [
              {
                "reason": "Many big data solutions combine batch and streaming processing, so the boundary depends on whether the emphasis is computation or event transport.",
                "with_dim_id": "d_init_01",
                "with_dim_name": null,
                "with_role": null
              }
            ],
            "tentative_id": "messaging-and-event-streaming"
          },
          {
            "description": "Asynchronous communication patterns and systems for decoupled service interaction and background processing. This is a coherent backend cluster because many server-side workflows depend on queues, topics, and event streams.",
            "exemplar_skills": [
              "Messaging and Event Streaming"
            ],
            "in_scope": "Skills, tools, and practices that belong under Messaging and Event Streaming for the target role, including items implied by the dimension rationale.",
            "name": "Messaging and Event Streaming",
            "out_of_scope": "Adjacent clusters explicitly not owned by Messaging and Event Streaming, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "messaging-and-event-streaming"
          }
        ],
        "merge_log": [],
        "placed": {
          "name": "Big Data",
          "placement_confidence": 0.92,
          "primary_dimension": "d_init_01",
          "reasoning": "Deterministic JD placement: locked_dimensions has 3 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [
            "messaging-and-event-streaming"
          ],
          "skill_id": "big-data"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [],
          "related_to": [
            "machine-learning",
            "nosql",
            "ai-ml",
            "devops",
            "amazon-athena",
            "elasticsearch",
            "sql",
            "mysql",
            "postgresql",
            "artificial-intelligence"
          ],
          "requires": [],
          "skill_id": "big-data",
          "suppress_on_match": []
        },
        "skill_id": "big-data",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.93,
          "name": "Big Data",
          "reasoning": "Big Data is a vertical/problem-space body of knowledge rather than a tool, framework, or architecture, so it fits the Domain rule.",
          "skill_id": "big-data",
          "subtype": "data_intensive_computing",
          "type": "Domain"
        },
        "warnings": [
          "stage3_post_filter_dropped_catalog_only_locked_dims:42-\u003e3"
        ]
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Version Control Systems",
            "id": 365,
            "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "Data Modeling",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "Data Modeling",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Concept",
          "skill_nature": "CONCEPT",
          "sub_category": "data_modeling",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "\u201cData Modeling\u201d is a standard, well-scoped concept in JDs and is unlikely to be confused with a different catalog skill in typical usage."
          },
          "context_keywords": {
            "context_keywords": [
              "ER diagrams",
              "normalization",
              "denormalization",
              "star schema",
              "snowflake schema",
              "fact table",
              "dimension table",
              "OLTP",
              "OLAP",
              "entity-relationship",
              "schema design",
              "data warehouse",
              "dimensional modeling",
              "primary key",
              "foreign key"
            ]
          },
          "maturity": {
            "confidence": 0.93,
            "maturity": "well_known",
            "reasoning": "Data modeling appears in many data engineer, DBA, and analytics JDs, and is a standard prerequisite alongside SQL and database design rather than a niche specialty."
          },
          "skill_id": "data-modeling",
          "vendor_license": {
            "confidence": 0.99,
            "license": null,
            "vendor": null,
            "year_introduced": null
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Designing the logical and physical structure of data so it is consistent, queryable, and fit for downstream analytics or operational use. This belongs here because the skill centers on defining entities, relationships, keys, and schemas rather than storage tuning or pipeline execution.",
            "exemplar_skills": [
              "Data Modeling",
              "Schema Design",
              "Dimensional Modeling",
              "Entity-Relationship Modeling",
              "Normalization",
              "Star Schema Design",
              "Snowflake Schema Design"
            ],
            "in_scope": "Data Modeling, conceptual/logical/physical schema design, entities and relationships, normalization and denormalization, primary and foreign keys, dimensional modeling, star and snowflake schemas, fact and dimension tables, schema evolution",
            "name": "Data Modeling",
            "out_of_scope": "Data Access and Query Optimization, file layout and partitioning choices, ETL orchestration and data movement, dashboard/report design, database performance tuning, application API payload shaping",
            "overlap_flags": [
              {
                "reason": "Data models influence query performance, but this dimension owns the structural design rather than access-path tuning.",
                "with_dim_id": "data-access-and-query-optimization",
                "with_dim_name": null,
                "with_role": "Data Engineer"
              },
              {
                "reason": "Event schemas and contracts can be modeled here, but the messaging dimension owns transport and delivery semantics.",
                "with_dim_id": "messaging-and-event-streaming",
                "with_dim_name": null,
                "with_role": "Backend Engineer"
              }
            ],
            "tentative_id": "d_init_01"
          }
        ],
        "merge_log": [],
        "placed": {
          "name": "Data Modeling",
          "placement_confidence": 0.92,
          "primary_dimension": "d_init_01",
          "reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [],
          "skill_id": "data-modeling"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [],
          "related_to": [
            "data-structures",
            "sql",
            "nosql",
            "storage-layout",
            "derived-views",
            "metadata-json",
            "postgresql",
            "document-processing",
            "failure-analysis",
            "capacity-forecasting"
          ],
          "requires": [],
          "skill_id": "data-modeling",
          "suppress_on_match": []
        },
        "skill_id": "data-modeling",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.96,
          "name": "Data Modeling",
          "reasoning": "Data Modeling is fundamentally a knowledge unit about how to structure and relate data, so by the Concept vs Methodology rule it is a Concept rather than a process or tool.",
          "skill_id": "data-modeling",
          "subtype": "data_modeling",
          "type": "Concept"
        },
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "Inference Data Pipelines for Serving and Batch Scoring",
            "id": null,
            "rationale": "Operational data movement that prepares and delivers timely, reliable data to production inference systems. Includes batch scoring inputs, feature refresh jobs, inference-time preprocessing, scheduled extracts, data validation for serving, and online/offline feature synchronization. Excludes training dataset curation, model training workflows, experimentation-focused feature engineering, model evaluation, and serving infrastructure/routing.",
            "slug": "d_merge_01",
            "source": "llm"
          },
          "input_skill": "Data Pipelines",
          "llm_role": null,
          "roles_from_db": []
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Version Control Systems",
            "id": 365,
            "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "Data Pipelines",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "Data Pipelines",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Architecture",
          "skill_nature": "PATTERN",
          "sub_category": "data_pipeline_architecture",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "\u201cData Pipelines\u201d is a fairly specific architecture term and is unlikely to be mistaken for a different catalog skill in a typical JD."
          },
          "context_keywords": {
            "context_keywords": [
              "ETL",
              "ELT",
              "Apache Airflow",
              "Apache NiFi",
              "Kafka",
              "Spark",
              "dbt",
              "orchestration",
              "batch processing",
              "stream processing",
              "data ingestion",
              "data warehouse",
              "data lake",
              "schema evolution",
              "data quality"
            ]
          },
          "maturity": {
            "confidence": 0.93,
            "maturity": "well_known",
            "reasoning": "Data pipelines are a common requirement in cloud/data engineering JDs, with frequent mentions alongside Airflow, Spark, and ETL/ELT stacks; broad hiring demand signals mainstream adoption."
          },
          "skill_id": "data-pipelines",
          "vendor_license": {
            "confidence": 0.99,
            "license": null,
            "vendor": null,
            "year_introduced": null
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Operational data movement that prepares and delivers timely, reliable data to production inference systems. Includes batch scoring inputs, feature refresh jobs, inference-time preprocessing, scheduled extracts, data validation for serving, and online/offline feature synchronization. Excludes training dataset curation, model training workflows, experimentation-focused feature engineering, model evaluation, and serving infrastructure/routing.",
            "exemplar_skills": [
              "Inference Data Pipelines for Serving and Batch Scoring"
            ],
            "in_scope": "Skills, tools, and practices that belong under Inference Data Pipelines for Serving and Batch Scoring for the target role, including items implied by the dimension rationale.",
            "name": "Inference Data Pipelines for Serving and Batch Scoring",
            "out_of_scope": "Adjacent clusters explicitly not owned by Inference Data Pipelines for Serving and Batch Scoring, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_merge_01"
          },
          {
            "description": "Designing, scheduling, and coordinating end-to-end data movement and transformation jobs. This is the best fit when Data Pipelines refers to building reliable multi-step workflows across sources, transforms, and sinks.",
            "exemplar_skills": [
              "Data Pipelines",
              "workflow orchestration",
              "ETL orchestration",
              "DAG scheduling",
              "backfills",
              "Airflow",
              "Dagster",
              "Prefect"
            ],
            "in_scope": "Data Pipelines, workflow scheduling, DAG orchestration, dependency management, retries and backfills, ETL/ELT job coordination, Airflow, Dagster, Prefect, dbt orchestration",
            "name": "Data Pipeline Orchestration",
            "out_of_scope": "Low-level query tuning, storage layout optimization, message broker internals, model serving, dashboard/report generation",
            "overlap_flags": [
              {
                "reason": "Streaming systems can trigger pipeline steps, but this dimension is about orchestration rather than asynchronous messaging semantics.",
                "with_dim_id": "messaging-and-event-streaming",
                "with_dim_name": null,
                "with_role": "Backend Engineer"
              },
              {
                "reason": "Pipeline jobs often include SQL, but query tuning and physical access-path optimization belong to the analytical storage/query dimension.",
                "with_dim_id": "data-access-and-query-optimization",
                "with_dim_name": null,
                "with_role": "Data Engineer"
              }
            ],
            "tentative_id": "d_init_01"
          }
        ],
        "merge_log": [
          {
            "a_dim_id": "inference-data-pipelines",
            "a_name": "Inference Data Pipelines",
            "a_role": "__skill_focal__",
            "b_dim_id": "inference-data-pipelines",
            "b_name": "Inference Data Pipelines",
            "b_role": "MLOps Engineer",
            "into": "d_merge_01",
            "into_name": "Inference Data Pipelines for Serving and Batch Scoring",
            "merged_from": [
              "inference-data-pipelines",
              "inference-data-pipelines"
            ],
            "pair_kind": "cross_role",
            "reasoning": "Both dims describe the same serving-oriented data movement cluster. Dim A covers batch scoring inputs, feature refresh pipelines, inference-time preprocessing, and online/offline feature sync, while Dim B uses the same language and adds that it is separate from model training. Dim A\u2019s out_of_scope already excludes training and serving infra, so the overlap is substantive, not just name similarity.",
            "similarity": 0.9616676657656642
          }
        ],
        "placed": {
          "name": "Data Pipelines",
          "placement_confidence": 0.92,
          "primary_dimension": "d_merge_01",
          "reasoning": "Deterministic JD placement: locked_dimensions has 2 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [
            "d_init_01"
          ],
          "skill_id": "data-pipelines"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [],
          "related_to": [
            "workflow-automation",
            "ci-cd",
            "devops",
            "kubeflow",
            "mlops",
            "ai-ml",
            "rest-apis",
            "sql",
            "document-processing",
            "agentic-workflows"
          ],
          "requires": [],
          "skill_id": "data-pipelines",
          "suppress_on_match": []
        },
        "skill_id": "data-pipelines",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.9,
          "name": "Data Pipelines",
          "reasoning": "By the Architecture vs Concept rule, data pipelines describe a system-shape pattern for moving and transforming data across stages rather than a single knowledge unit or tool.",
          "skill_id": "data-pipelines",
          "subtype": "data_pipeline_architecture",
          "type": "Architecture"
        },
        "warnings": [
          "stage3_post_filter_dropped_catalog_only_locked_dims:41-\u003e2"
        ]
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "Asynchronous Messaging and Event Streaming",
            "id": null,
            "rationale": "Covers asynchronous communication and data movement through queues, topics, streams, event buses, and pub/sub systems for decoupled processing, background jobs, and event-driven integration. Includes continuous or event-driven data ingestion and change data capture pipelines, but excludes batch ETL orchestration, warehouse modeling, query optimization, model training data prep, and direct application API calls.",
            "slug": "d_merge_01",
            "source": "llm"
          },
          "input_skill": "Data Ingestion",
          "llm_role": null,
          "roles_from_db": []
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Version Control Systems",
            "id": 365,
            "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "Data Ingestion",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "Data Ingestion",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Concept",
          "skill_nature": "CONCEPT",
          "sub_category": "data_ingestion",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "\u201cData Ingestion\u201d is a standard, specific concept in data engineering and is unlikely to be mistaken for a different catalog skill in typical job descriptions."
          },
          "context_keywords": {
            "context_keywords": [
              "ETL",
              "ELT",
              "batch processing",
              "streaming",
              "Kafka",
              "Apache NiFi",
              "Airflow",
              "CDC",
              "S3",
              "schema validation",
              "data pipeline",
              "message queue",
              "Parquet",
              "JSON",
              "API ingestion"
            ]
          },
          "maturity": {
            "confidence": 0.86,
            "maturity": "well_known",
            "reasoning": "Commonly appears in data/platform job descriptions and cloud vendor docs as a core pipeline capability; often paired with ETL/ELT, Kafka, and Airflow rather than treated as a niche specialty."
          },
          "skill_id": "data-ingestion",
          "vendor_license": {
            "confidence": 0.99,
            "license": null,
            "vendor": null,
            "year_introduced": null
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Covers asynchronous communication and data movement through queues, topics, streams, event buses, and pub/sub systems for decoupled processing, background jobs, and event-driven integration. Includes continuous or event-driven data ingestion and change data capture pipelines, but excludes batch ETL orchestration, warehouse modeling, query optimization, model training data prep, and direct application API calls.",
            "exemplar_skills": [
              "Asynchronous Messaging and Event Streaming"
            ],
            "in_scope": "Skills, tools, and practices that belong under Asynchronous Messaging and Event Streaming for the target role, including items implied by the dimension rationale.",
            "name": "Asynchronous Messaging and Event Streaming",
            "out_of_scope": "Adjacent clusters explicitly not owned by Asynchronous Messaging and Event Streaming, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_merge_01"
          },
          {
            "description": "Covers scheduled or bulk loading of data from files, databases, and external systems into analytical or operational stores. Data Ingestion fits here when the emphasis is on landing, validating, and loading datasets rather than streaming transport.",
            "exemplar_skills": [
              "Data Ingestion",
              "ETL Ingestion",
              "ELT",
              "Bulk Load",
              "Incremental Load",
              "Schema Validation",
              "File Import Pipelines",
              "JDBC Extracts"
            ],
            "in_scope": "Data Ingestion, ETL ingestion, ELT landing, file imports, S3/GCS/Azure Blob loads, JDBC extracts, bulk loads, schema validation, deduplication, incremental loads",
            "name": "Batch Data Ingestion Pipelines",
            "out_of_scope": "Real-time event streaming, message brokers, API request orchestration, warehouse query tuning, downstream analytics modeling",
            "overlap_flags": [
              {
                "reason": "Some ingestion pipelines pull from cloud services, but this dimension focuses on bulk movement and landing mechanics.",
                "with_dim_id": "cloud-service-integration-patterns",
                "with_dim_name": null,
                "with_role": "Cloud Architect"
              },
              {
                "reason": "Ingested data often lands in analytical stores, but query tuning is about how data is read after ingestion.",
                "with_dim_id": "data-access-and-query-optimization",
                "with_dim_name": null,
                "with_role": "Data Engineer"
              }
            ],
            "tentative_id": "d_init_01"
          }
        ],
        "merge_log": [
          {
            "a_dim_id": "messaging-and-event-streaming",
            "a_name": "Messaging and Event Streaming",
            "a_role": "__skill_focal__",
            "b_dim_id": "messaging-and-event-streaming",
            "b_name": "Messaging and Event Streaming",
            "b_role": "Backend Engineer",
            "into": "d_merge_01",
            "into_name": "Asynchronous Messaging and Event Streaming",
            "merged_from": [
              "messaging-and-event-streaming",
              "messaging-and-event-streaming"
            ],
            "pair_kind": "cross_role",
            "reasoning": "Both dims describe the same cluster: asynchronous, decoupled communication via queues, topics, streams, and event buses. Dim A frames it as moving data through messaging/streaming systems and explicitly includes Kafka, Kinesis, RabbitMQ, Event Streaming, and Change Data Capture. Dim B frames the same mechanisms as backend asynchronous communication and background processing. The role difference is only emphasis, not substance, so the overlap is a true match.",
            "similarity": 0.7307849947371147
          }
        ],
        "placed": {
          "name": "Data Ingestion",
          "placement_confidence": 0.92,
          "primary_dimension": "d_merge_01",
          "reasoning": "Deterministic JD placement: locked_dimensions has 2 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [
            "d_init_01"
          ],
          "skill_id": "data-ingestion"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [],
          "related_to": [
            "document-processing",
            "document-intelligence",
            "retrieval",
            "hybrid-retrieval",
            "aws-data-pipeline",
            "devops",
            "observability",
            "containers",
            "storage-layout",
            "multimodal-document-understanding"
          ],
          "requires": [],
          "skill_id": "data-ingestion",
          "suppress_on_match": []
        },
        "skill_id": "data-ingestion",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.93,
          "name": "Data Ingestion",
          "reasoning": "Data Ingestion is fundamentally a named knowledge unit about bringing data into systems, so it fits the Concept category rather than a tool, platform, or methodology.",
          "skill_id": "data-ingestion",
          "subtype": "data_ingestion",
          "type": "Concept"
        },
        "warnings": [
          "stage3_post_filter_dropped_catalog_only_locked_dims:41-\u003e2"
        ]
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Messaging and Event Streaming",
            "id": 146,
            "rationale": "Asynchronous communication patterns and systems for decoupled service interaction and background processing. This is a coherent backend cluster because many server-side workflows depend on queues, topics, and event streams.",
            "slug": "messaging-and-event-streaming",
            "source": "db"
          },
          "input_skill": "Stream Processing",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Backend Engineer",
              "id": 14,
              "rationale": null,
              "role_archetype": null,
              "slug": "backend-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Stream Processing",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Architecture",
          "skill_nature": "PATTERN",
          "sub_category": "stream_processing_architecture",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "The term is fairly specific in JDs and usually refers to event/data stream processing architecture, not a different catalog skill. It is unlikely to be confused with another skill name in typical job descriptions."
          },
          "context_keywords": {
            "context_keywords": [
              "Apache Kafka",
              "Apache Flink",
              "Apache Spark Streaming",
              "Apache Storm",
              "event-driven architecture",
              "pub/sub",
              "message broker",
              "consumer group",
              "windowing",
              "checkpointing",
              "exactly-once semantics",
              "backpressure",
              "event time",
              "watermarking",
              "CDC"
            ]
          },
          "maturity": {
            "confidence": 0.93,
            "maturity": "well_known",
            "reasoning": "Common in JDs for Kafka/Flink/Spark Streaming and cloud services like Kinesis/Pub/Sub; broad market adoption for real-time event pipelines."
          },
          "skill_id": "stream-processing",
          "vendor_license": {
            "confidence": 0.99,
            "license": null,
            "vendor": null,
            "year_introduced": null
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Processing continuous event data as it arrives, using stream processors, windows, and stateful operators to transform and route records in near real time. This belongs here because stream processing is the core execution model for event-driven pipelines and low-latency data movement.",
            "exemplar_skills": [
              "Stream Processing",
              "Apache Flink",
              "Spark Structured Streaming",
              "Kafka Streams",
              "Apache Beam",
              "event-time processing",
              "windowing",
              "watermarking"
            ],
            "in_scope": "Stream Processing, Kafka Streams, Apache Flink, Spark Structured Streaming, Apache Beam, event-time processing, windowing, watermarking, stateful transforms, joins on streams, exactly-once processing, checkpointing, backpressure handling",
            "name": "Stream Processing",
            "out_of_scope": "Batch ETL jobs, offline warehouse transformations, model training pipelines, message broker administration, low-level network transport, dashboard/reporting logic",
            "overlap_flags": [
              {
                "reason": "Streaming pipelines can feed inference features or scoring jobs, but this dimension is about the real-time processing mechanics rather than model data movement.",
                "with_dim_id": "inference-data-pipelines",
                "with_dim_name": null,
                "with_role": "MLOps Engineer"
              },
              {
                "reason": "Stream processors often use partitioning and state stores, but query tuning is a separate concern from stream execution semantics.",
                "with_dim_id": "data-access-and-query-optimization",
                "with_dim_name": null,
                "with_role": "Data Engineer"
              }
            ],
            "tentative_id": "messaging-and-event-streaming"
          }
        ],
        "merge_log": [],
        "placed": {
          "name": "Stream Processing",
          "placement_confidence": 0.92,
          "primary_dimension": "messaging-and-event-streaming",
          "reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [],
          "skill_id": "stream-processing"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [],
          "related_to": [
            "document-processing",
            "layout-parsing",
            "workflow-automation",
            "aws-data-pipeline",
            "event-logs",
            "snapshot",
            "state-transitions",
            "proxy-patterns",
            "agentic-workflows",
            "event-emission"
          ],
          "requires": [],
          "skill_id": "stream-processing",
          "suppress_on_match": []
        },
        "skill_id": "stream-processing",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.9,
          "name": "Stream Processing",
          "reasoning": "Stream Processing is fundamentally a system-shape for handling continuous event flows, so by the Architecture vs Concept rule it fits Architecture rather than a tool or methodology.",
          "skill_id": "stream-processing",
          "subtype": "stream_processing_architecture",
          "type": "Architecture"
        },
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "Messaging, Queueing, and Event Streaming",
            "id": null,
            "rationale": "Asynchronous communication patterns and systems that decouple producers and consumers, buffer and route work items, and support background processing and service-to-service integration. Includes queueing, message queues, pub/sub, brokers, topics, consumer groups, producers/consumers, dead-letter queues, retry handling, backpressure, and event streaming platforms such as Kafka, RabbitMQ, SQS, and Azure Service Bus.",
            "slug": "d_merge_01",
            "source": "llm"
          },
          "input_skill": "Queueing",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "Queueing",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Concept",
          "skill_nature": "CONCEPT",
          "sub_category": "queueing_theory",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "Queueing is a fairly specific operations-research concept; in typical JDs it is unlikely to be mistaken for a different catalog skill."
          },
          "context_keywords": {
            "context_keywords": [
              "Little\u0027s Law",
              "M/M/1",
              "M/M/c",
              "Poisson process",
              "service rate",
              "arrival rate",
              "waiting time",
              "throughput",
              "utilization",
              "backlog",
              "buffering",
              "congestion",
              "discrete-event simulation",
              "priority queue",
              "SLA"
            ]
          },
          "maturity": {
            "confidence": 0.86,
            "maturity": "well_known",
            "reasoning": "Queueing theory is a standard CS/ops concept and appears in many systems, SRE, and performance-engineering job descriptions; it is not a sunset technology and remains a common interview/topic area."
          },
          "skill_id": "queueing",
          "vendor_license": {
            "confidence": 0.99,
            "license": null,
            "vendor": null,
            "year_introduced": null
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Asynchronous communication patterns and systems that decouple producers and consumers, buffer and route work items, and support background processing and service-to-service integration. Includes queueing, message queues, pub/sub, brokers, topics, consumer groups, producers/consumers, dead-letter queues, retry handling, backpressure, and event streaming platforms such as Kafka, RabbitMQ, SQS, and Azure Service Bus.",
            "exemplar_skills": [
              "Messaging, Queueing, and Event Streaming"
            ],
            "in_scope": "Skills, tools, and practices that belong under Messaging, Queueing, and Event Streaming for the target role, including items implied by the dimension rationale.",
            "name": "Messaging, Queueing, and Event Streaming",
            "out_of_scope": "Adjacent clusters explicitly not owned by Messaging, Queueing, and Event Streaming, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_merge_01"
          }
        ],
        "merge_log": [
          {
            "a_dim_id": "messaging-and-event-streaming",
            "a_name": "Messaging and Event Streaming",
            "a_role": "__skill_focal__",
            "b_dim_id": "messaging-and-event-streaming",
            "b_name": "Messaging and Event Streaming",
            "b_role": "Backend Engineer",
            "into": "d_merge_01",
            "into_name": "Messaging, Queueing, and Event Streaming",
            "merged_from": [
              "messaging-and-event-streaming",
              "messaging-and-event-streaming"
            ],
            "pair_kind": "cross_role",
            "reasoning": "Both dims describe the same backend messaging cluster: asynchronous communication via queues, brokers, pub/sub, and event streams for decoupled service interaction and background processing. Dim A is the more detailed version, explicitly listing queueing, message queues, Kafka, RabbitMQ, SQS, dead-letter queues, retry handling, and backpressure. Dim B states the same substance in broader terms and adds no distinct skill area. Cross-role similarity is expected here; the underlying cluster is identical.",
            "similarity": 0.8270280068284861
          }
        ],
        "placed": {
          "name": "Queueing",
          "placement_confidence": 0.92,
          "primary_dimension": "d_merge_01",
          "reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [],
          "skill_id": "queueing"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [],
          "related_to": [
            "capacity-forecasting",
            "capacity-alerts",
            "workflow-automation",
            "event-emission",
            "retrieval",
            "reranking",
            "context-management",
            "scrum",
            "devops",
            "containers"
          ],
          "requires": [],
          "skill_id": "queueing",
          "suppress_on_match": []
        },
        "skill_id": "queueing",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.93,
          "name": "Queueing",
          "reasoning": "Queueing is fundamentally a knowledge unit about how waiting lines and work distribution behave, so by the Concept vs Methodology rule it is a Concept rather than an Architecture or Tool.",
          "skill_id": "queueing",
          "subtype": "queueing_theory",
          "type": "Concept"
        },
        "warnings": [
          "stage3_post_filter_dropped_catalog_only_locked_dims:40-\u003e1"
        ]
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "API Integration, Request Orchestration, and Data Fetching",
            "id": null,
            "rationale": "Connecting applications to internal or external services through request/response APIs. This includes consuming REST and GraphQL endpoints, orchestrating requests, handling payloads and response parsing, pagination, retries, error handling, and shaping remote data for downstream or UI consumption.",
            "slug": "d_merge_01",
            "source": "llm"
          },
          "input_skill": "APIs",
          "llm_role": null,
          "roles_from_db": []
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Service Integration Patterns",
            "id": 188,
            "rationale": "Covers how cloud services and workloads connect through APIs, events, shared services, and integration boundaries. This cluster is coherent because architects must define interaction patterns that preserve decoupling, security, and operability.",
            "slug": "cloud-service-integration-patterns",
            "source": "db"
          },
          "input_skill": "APIs",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Cloud Architect",
              "id": 11,
              "rationale": null,
              "role_archetype": null,
              "slug": "cloud-architect",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Version Control Systems",
            "id": 365,
            "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "APIs",
          "llm_role": null,
          "roles_from_db": []
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Service Integration Patterns",
            "id": 188,
            "rationale": "Covers how cloud services and workloads connect through APIs, events, shared services, and integration boundaries. This cluster is coherent because architects must define interaction patterns that preserve decoupling, security, and operability.",
            "slug": "cloud-service-integration-patterns",
            "source": "db"
          },
          "input_skill": "APIs",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Cloud Architect",
              "id": 11,
              "rationale": null,
              "role_archetype": null,
              "slug": "cloud-architect",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "APIs",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Protocol",
          "skill_nature": "PROTOCOL",
          "sub_category": "application_programming_interfaces",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "\u201cAPIs\u201d is a standard, widely used term in JDs and usually refers unambiguously to application programming interfaces; it is not typically confused with a distinct catalog skill."
          },
          "context_keywords": {
            "context_keywords": [
              "REST",
              "GraphQL",
              "OpenAPI",
              "Swagger",
              "JSON",
              "XML",
              "OAuth 2.0",
              "API gateway",
              "endpoint",
              "webhook",
              "rate limiting",
              "pagination",
              "versioning",
              "SDK",
              "microservices"
            ]
          },
          "maturity": {
            "confidence": 0.98,
            "maturity": "well_known",
            "reasoning": "APIs are a hiring-pipeline staple across backend, mobile, and platform JDs; REST/GraphQL/API design appears in large-volume job postings and cloud vendor docs."
          },
          "skill_id": "apis",
          "vendor_license": {
            "confidence": 0.99,
            "license": null,
            "vendor": null,
            "year_introduced": null
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [
          {
            "a_dim_id": "cloud-service-integration-patterns",
            "a_name": "Cloud Service Integration Patterns",
            "a_role": "__skill_focal__",
            "b_dim_id": "cloud-service-integration-patterns",
            "b_name": "Cloud Service Integration Patterns",
            "b_role": "Cloud Architect",
            "pair_kind": "cross_role",
            "reasoning": "Same wording, different level. Dim A is implementation-facing: APIs as an integration mechanism between cloud services/pipelines/platforms, with examples like RESTful integration, webhook consumers, shared service contracts, and cross-system orchestration; it excludes client parsing, broker internals, and auth-only concerns. Dim B is architect-facing: defining interaction patterns that preserve decoupling, security, and operability across cloud services and workloads. A covers how to integrate; B covers how to design the integration boundary. Different skills belong under each.",
            "similarity": 0.8418532624733192
          }
        ],
        "locked_dimensions": [
          {
            "description": "Connecting applications to internal or external services through request/response APIs. This includes consuming REST and GraphQL endpoints, orchestrating requests, handling payloads and response parsing, pagination, retries, error handling, and shaping remote data for downstream or UI consumption.",
            "exemplar_skills": [
              "API Integration, Request Orchestration, and Data Fetching"
            ],
            "in_scope": "Skills, tools, and practices that belong under API Integration, Request Orchestration, and Data Fetching for the target role, including items implied by the dimension rationale.",
            "name": "API Integration, Request Orchestration, and Data Fetching",
            "out_of_scope": "Adjacent clusters explicitly not owned by API Integration, Request Orchestration, and Data Fetching, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_merge_01"
          },
          {
            "description": "How services connect across boundaries using APIs, events, and shared interfaces. The target skill belongs here when APIs are treated as an integration mechanism between cloud services, pipelines, or platforms.",
            "exemplar_skills": [
              "APIs",
              "service integration",
              "RESTful services",
              "webhooks",
              "integration patterns",
              "service contracts"
            ],
            "in_scope": "APIs, service-to-service calls, RESTful integration, webhook consumers, integration boundaries, shared service contracts, cross-system orchestration",
            "name": "Cloud Service Integration Patterns",
            "out_of_scope": "API client parsing details, UI data fetching, message broker internals, database access optimization, authentication-only concerns",
            "overlap_flags": [
              {
                "reason": "Both dimensions cover API-based communication, but this one is broader and architecture-oriented rather than client-fetch focused.",
                "with_dim_id": "api-integration-and-data-fetching",
                "with_dim_name": null,
                "with_role": "Frontend Engineer, Full Stack Developer"
              },
              {
                "reason": "Integration architectures often combine APIs with asynchronous messaging, so boundary decisions can overlap.",
                "with_dim_id": "messaging-and-event-streaming",
                "with_dim_name": null,
                "with_role": "Backend Engineer"
              }
            ],
            "tentative_id": "cloud-service-integration-patterns"
          },
          {
            "description": "Defining API contracts, resource models, and request/response semantics for services. This dimension fits the target skill when APIs refers to designing or documenting interfaces rather than merely consuming them.",
            "exemplar_skills": [
              "APIs",
              "REST API design",
              "OpenAPI",
              "Swagger",
              "endpoint design",
              "API versioning"
            ],
            "in_scope": "APIs, REST resource design, endpoint naming, versioning, OpenAPI, Swagger, request/response schemas, status codes, idempotency, contract design",
            "name": "API Design and Specification",
            "out_of_scope": "Consuming APIs from client code, pagination handling in fetch logic, event streaming, database schema design, authentication implementation",
            "overlap_flags": [
              {
                "reason": "API design overlaps with API consumption, but this dimension is about defining contracts rather than calling them.",
                "with_dim_id": "api-integration-and-data-fetching",
                "with_dim_name": null,
                "with_role": "Frontend Engineer, Full Stack Developer"
              },
              {
                "reason": "Well-designed APIs are often part of broader integration patterns across services and platforms.",
                "with_dim_id": "cloud-service-integration-patterns",
                "with_dim_name": null,
                "with_role": "Cloud Architect"
              }
            ],
            "tentative_id": "d_init_01"
          },
          {
            "description": "Covers how cloud services and workloads connect through APIs, events, shared services, and integration boundaries. This cluster is coherent because architects must define interaction patterns that preserve decoupling, security, and operability.",
            "exemplar_skills": [
              "Cloud Service Integration Patterns"
            ],
            "in_scope": "Skills, tools, and practices that belong under Cloud Service Integration Patterns for the target role, including items implied by the dimension rationale.",
            "name": "Cloud Service Integration Patterns",
            "out_of_scope": "Adjacent clusters explicitly not owned by Cloud Service Integration Patterns, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "cloud-service-integration-patterns"
          }
        ],
        "merge_log": [
          {
            "a_dim_id": "api-integration-and-data-fetching",
            "a_name": "API Integration and Data Fetching",
            "a_role": "__skill_focal__",
            "b_dim_id": "api-integration-and-data-fetching",
            "b_name": "API Integration and Data Fetching",
            "b_role": "Full Stack Developer",
            "into": "d_merge_01",
            "into_name": "API Integration, Request Orchestration, and Data Fetching",
            "merged_from": [
              "api-integration-and-data-fetching",
              "api-integration-and-data-fetching"
            ],
            "pair_kind": "cross_role",
            "reasoning": "Both dimensions describe the same skill cluster: consuming APIs and fetching remote data via request/response calls. Dim A covers external/internal services, REST, GraphQL, pagination, retries, payload handling, and contract-aware data shaping; Dim B covers frontend-to-backend and third-party endpoints, request orchestration, error handling, pagination, and shaping remote data for UI use. The exemplar skills in A (REST APIs, GraphQL, HTTP request handling, response parsing) match B\u2019s described work, so this is a wording difference, not a distinct cluster.",
            "similarity": 0.8030311798440094
          }
        ],
        "placed": {
          "name": "APIs",
          "placement_confidence": 0.92,
          "primary_dimension": "d_merge_01",
          "reasoning": "Deterministic JD placement: locked_dimensions has 4 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [
            "cloud-service-integration-patterns",
            "d_init_01"
          ],
          "skill_id": "apis"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [],
          "related_to": [
            "rest-apis",
            "amazon-api-gateway",
            "infura",
            "aws-lambda",
            "proxy-patterns",
            "workflow-automation",
            "agent-tooling",
            "aws-data-pipeline",
            "minio",
            "authentication"
          ],
          "requires": [],
          "skill_id": "apis",
          "suppress_on_match": []
        },
        "skill_id": "apis",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.91,
          "name": "APIs",
          "reasoning": "APIs are a communication interface standard between systems, so by the Protocol vs Standard rule they fit best as a Protocol rather than a tool or platform.",
          "skill_id": "apis",
          "subtype": "application_programming_interfaces",
          "type": "Protocol"
        },
        "warnings": [
          "stage3_post_filter_dropped_catalog_only_locked_dims:42-\u003e4"
        ]
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Testing and Validation Practices",
            "id": 221,
            "rationale": "Validating platform changes before release, including functional checks and regression verification. This cluster is coherent because ServiceNow developers must confirm workflows, scripts, and integrations behave as intended.",
            "slug": "testing-and-validation-practices",
            "source": "db"
          },
          "input_skill": "Testability",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "ServiceNOW Developer",
              "id": 24,
              "rationale": null,
              "role_archetype": null,
              "slug": "servicenow-developer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Testing and Validation Practices",
            "id": 221,
            "rationale": "Validating platform changes before release, including functional checks and regression verification. This cluster is coherent because ServiceNow developers must confirm workflows, scripts, and integrations behave as intended.",
            "slug": "testing-and-validation-practices",
            "source": "db"
          },
          "input_skill": "Testability",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "ServiceNOW Developer",
              "id": 24,
              "rationale": null,
              "role_archetype": null,
              "slug": "servicenow-developer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Testability",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Concept",
          "skill_nature": "CONCEPT",
          "sub_category": "software_testability_concept",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "\u201cTestability\u201d is a specific software engineering concept and is unlikely to be mistaken for a different catalog skill in typical job descriptions."
          },
          "context_keywords": {
            "context_keywords": [
              "unit tests",
              "integration tests",
              "test coverage",
              "mocking",
              "dependency injection",
              "assertions",
              "test harness",
              "automated testing",
              "regression testing",
              "test doubles",
              "stubs",
              "fixtures",
              "TDD",
              "CI/CD",
              "code coverage"
            ]
          },
          "maturity": {
            "confidence": 0.93,
            "maturity": "well_known",
            "reasoning": "Testability is a common requirement in software engineering JDs and interview rubrics, often paired with unit/integration testing, CI, and TDD; it\u2019s a standard quality attribute rather than a niche tool."
          },
          "skill_id": "testability",
          "vendor_license": {
            "confidence": 0.99,
            "license": null,
            "vendor": null,
            "year_introduced": null
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [
          {
            "a_dim_id": "testing-and-validation-practices",
            "a_name": "Testing and Validation Practices",
            "a_role": "__skill_focal__",
            "b_dim_id": "testing-and-validation-practices",
            "b_name": "Testing and Validation Practices",
            "b_role": "ServiceNOW Developer",
            "pair_kind": "cross_role",
            "reasoning": "Both dims share the label, but A is general software testing practice: test design, regression checks, harnesses, mocks/stubs, assertions, fixtures, and testability. B is ServiceNow-specific release validation: checking workflows, scripts, and integrations behave as intended. A\u2019s exemplars (unit/integration testing, mocking and stubbing) are about building test infrastructure; B\u2019s description is about platform change verification. Same umbrella word, different skill clusters.",
            "similarity": 0.6642010668546267
          }
        ],
        "locked_dimensions": [
          {
            "description": "Practices for verifying that software changes behave correctly before release, including test design, regression checks, and validation workflows. Testability belongs here because it describes how easily a system can be exercised and verified by tests.",
            "exemplar_skills": [
              "Testability",
              "unit testing",
              "integration testing",
              "regression testing",
              "test harness design",
              "mocking and stubbing",
              "validation checks"
            ],
            "in_scope": "Testability, unit tests, integration tests, regression testing, test harnesses, mocks and stubs, assertions, test fixtures, validation checks",
            "name": "Testing and Validation Practices",
            "out_of_scope": "Test reporting and defect triage, manual test evidence capture, performance benchmarking, production monitoring, these belong to quality reporting or operations rather than making code easier to test",
            "overlap_flags": [
              {
                "reason": "Testability can influence coverage and pass rates, but that dimension focuses on reporting outcomes rather than designing testable systems.",
                "with_dim_id": "test-reporting-and-quality-metrics",
                "with_dim_name": null,
                "with_role": "Automation Tester"
              }
            ],
            "tentative_id": "testing-and-validation-practices"
          },
          {
            "description": "Validating platform changes before release, including functional checks and regression verification. This cluster is coherent because ServiceNow developers must confirm workflows, scripts, and integrations behave as intended.",
            "exemplar_skills": [
              "Testing and Validation Practices"
            ],
            "in_scope": "Skills, tools, and practices that belong under Testing and Validation Practices for the target role, including items implied by the dimension rationale.",
            "name": "Testing and Validation Practices",
            "out_of_scope": "Adjacent clusters explicitly not owned by Testing and Validation Practices, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "testing-and-validation-practices"
          }
        ],
        "merge_log": [],
        "placed": {
          "name": "Testability",
          "placement_confidence": 0.92,
          "primary_dimension": "testing-and-validation-practices",
          "reasoning": "Deterministic JD placement: locked_dimensions has 2 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [],
          "skill_id": "testability"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [],
          "related_to": [
            "observability",
            "failure-analysis",
            "restore-testing",
            "eval-design",
            "evaluation",
            "evaluation-design",
            "code-review",
            "ci-cd",
            "devops",
            "agile"
          ],
          "requires": [],
          "skill_id": "testability",
          "suppress_on_match": []
        },
        "skill_id": "testability",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.97,
          "name": "Testability",
          "reasoning": "By the Concept vs Methodology rule, testability is a named knowledge unit about how easily software can be tested, not a process or tool.",
          "skill_id": "testability",
          "subtype": "software_testability_concept",
          "type": "Concept"
        },
        "warnings": [
          "stage3_post_filter_dropped_catalog_only_locked_dims:41->2"
        ]
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "MLOps",
          "alias_type": "CANONICAL",
          "id": 3600,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 7,
        "display_name": "MLOps",
        "id": 2643,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "METHODOLOGY",
        "slug": "mlops",
        "sub_category_id": 2156,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Inference Data Pipelines",
            "id": 59,
            "rationale": "Operational data movement for batch scoring, feature refresh, and inference-time data preparation. This is separate from model training because it focuses on getting the right data to the serving path reliably.",
            "slug": "inference-data-pipelines",
            "source": "db"
          },
          "input_skill": "MLOps",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "MLOps Engineer",
              "id": 5,
              "rationale": null,
              "role_archetype": null,
              "slug": "mlops-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Model Serving Deployment and Runtime Packaging",
            "id": 52,
            "rationale": "Operational deployment of trained models into online, batch, or streaming serving environments, including packaging models and model servers into containers or managed inference runtimes, coordinating rollout, and handing off to inference systems. Covers serving frameworks and platforms such as TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, and Seldon Core, plus container/runtime concerns like Docker images, GPU-enabled containers, base image selection, container entrypoints, runtime dependencies, and image scanning for model services.",
            "slug": "model-serving-deployment-and-runtime-packaging",
            "source": "db"
          },
          "input_skill": "MLOps",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "MLOps Engineer",
              "id": 5,
              "rationale": null,
              "role_archetype": null,
              "slug": "mlops-engineer",
              "source": "db"
            },
            {
              "display_name": "Machine Learning Engineer",
              "id": 10,
              "rationale": null,
              "role_archetype": null,
              "slug": "machine-learning-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "MLOps",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    }
  ],
  "unmatched_skills": [
    "PySpark",
    "Notebooks",
    "Big Data",
    "Data Modeling",
    "Data Pipelines",
    "Data Ingestion",
    "Stream Processing",
    "Queueing",
    "APIs",
    "Testability"
  ]
}
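
Note on how this response feeds the next one: in this run, every name listed under "unmatched_skills" above shows up in the API 3 "final_input_skills" list with the tag "new", and every other input skill carries the tag "in_db". The sketch below reproduces that observed mapping. It is only an illustration inferred from this one run; the function name tag_skills and its parameters are hypothetical and are not taken from the pipeline's code.

```python
from typing import Dict, Iterable, List


def tag_skills(input_skills: Iterable[str],
               unmatched_skills: Iterable[str]) -> List[Dict[str, str]]:
    """Label each input skill "new" or "in_db" (hypothetical helper).

    Assumption: a skill is "new" exactly when it appears in the
    "unmatched_skills" list, which is what this run's output suggests.
    """
    unmatched = set(unmatched_skills)
    return [
        {"skill": skill, "tag": "new" if skill in unmatched else "in_db"}
        for skill in input_skills
    ]


if __name__ == "__main__":
    # A small subset of the skills from this run, in their original order.
    skills = ["Azure", "Python", "PySpark", "SQL", "Testability", "MLOps"]
    unmatched = ["PySpark", "Testability"]
    for entry in tag_skills(skills, unmatched):
        print(entry)
```

The input order is preserved so the result can be compared entry by entry against "final_input_skills".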

API 3 — final-role-output

{
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 6,
    "rationale": "The primary skills indicate a strong focus on data processing, SQL, and Azure technologies, aligning well with a Data Engineer's responsibilities.",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "chosen_role_resolution": "in_db",
  "final_input_skills": [
    {
      "skill": "Azure",
      "tag": "in_db"
    },
    {
      "skill": "Python",
      "tag": "in_db"
    },
    {
      "skill": "PySpark",
      "tag": "new"
    },
    {
      "skill": "SQL",
      "tag": "in_db"
    },
    {
      "skill": "MLflow",
      "tag": "in_db"
    },
    {
      "skill": "Azure Machine Learning",
      "tag": "in_db"
    },
    {
      "skill": "Azure Data Factory",
      "tag": "in_db"
    },
    {
      "skill": "Databricks",
      "tag": "in_db"
    },
    {
      "skill": "Notebooks",
      "tag": "new"
    },
    {
      "skill": "CI/CD",
      "tag": "in_db"
    },
    {
      "skill": "Big Data",
      "tag": "new"
    },
    {
      "skill": "Data Modeling",
      "tag": "new"
    },
    {
      "skill": "Data Pipelines",
      "tag": "new"
    },
    {
      "skill": "Data Ingestion",
      "tag": "new"
    },
    {
      "skill": "Stream Processing",
      "tag": "new"
    },
    {
      "skill": "Queueing",
      "tag": "new"
    },
    {
      "skill": "APIs",
      "tag": "new"
    },
    {
      "skill": "Testability",
      "tag": "new"
    },
    {
      "skill": "MLOps",
      "tag": "in_db"
    }
  ],
  "persistence": {
    "items": [
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Platform Operations",
          "id": 26,
          "rationale": "Uses cloud provider services to support delivery and runtime environments. The focus is on consumer-level operation of cloud services rather than deep cloud architecture ownership.",
          "slug": "cloud-platform-operations",
          "source": "db"
        },
        "dimension_id": 26,
        "input_skill": "Azure",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "DevOps Engineer",
            "id": 1,
            "rationale": null,
            "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
            "slug": "devops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 164,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Security Platforms",
          "id": 332,
          "rationale": "Cloud-native security products used to assess posture, detect misconfigurations, and monitor workloads across AWS, Azure, and GCP. This is a distinct product family because the role often works across multiple CNAPP/CSPM/CWPP offerings and cloud-native detectors.",
          "slug": "cloud-security-platforms",
          "source": "db"
        },
        "dimension_id": 332,
        "input_skill": "Azure",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Cybersecurity Engineer",
            "id": 9,
            "rationale": null,
            "role_archetype": null,
            "slug": "cybersecurity-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 164,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Analytical Programming Languages",
          "id": 82,
          "rationale": "Languages used to clean, transform, analyze, and prototype models in notebooks and scripts. This is the core coding surface for expressing statistical logic and data manipulation in a reproducible way.",
          "slug": "analytical-programming-languages",
          "source": "db"
        },
        "dimension_id": 82,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Data Analyst",
            "id": 20,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-analyst",
            "source": "db"
          },
          {
            "display_name": "Data Scientist",
            "id": 7,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-scientist",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Automation Scripting and CLI",
          "id": 48,
          "rationale": "Uses scripts and command-line tooling to execute repeatable Azure operations and reduce manual work. This is a practical cluster because the role frequently automates provisioning, checks, and remediation tasks.",
          "slug": "automation-scripting-and-cli",
          "source": "db"
        },
        "dimension_id": 48,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Azure Cloud Engineer",
            "id": 4,
            "rationale": null,
            "role_archetype": null,
            "slug": "azure-cloud-engineer",
            "source": "db"
          },
          {
            "display_name": "Cloud Engineer",
            "id": 18,
            "rationale": null,
            "role_archetype": null,
            "slug": "cloud-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Automation and Scripting for Operations",
          "id": 361,
          "rationale": "Scripts and lightweight automation used to execute repetitive virtualization tasks and enforce operational consistency. This is the practical glue that reduces manual host and VM administration.",
          "slug": "automation-and-scripting-for-operations",
          "source": "db"
        },
        "dimension_id": 361,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Virtualization Engineer",
            "id": 26,
            "rationale": null,
            "role_archetype": null,
            "slug": "virtualization-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Network Automation and Scripting",
          "id": 285,
          "rationale": "Covers scripts and automation used to configure, validate, and audit network devices and services. This cluster is coherent because repeatable network operations increasingly depend on programmatic changes and checks.",
          "slug": "network-automation-and-scripting",
          "source": "db"
        },
        "dimension_id": 285,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Network Engineer",
            "id": 21,
            "rationale": null,
            "role_archetype": null,
            "slug": "network-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for AI Workflows",
          "id": 261,
          "rationale": "Languages used to implement AI feature logic, orchestration, and response handling inside product code. This is the core coding surface for turning prompts and model calls into reliable application behavior.",
          "slug": "programming-languages-for-ai-workflows",
          "source": "db"
        },
        "dimension_id": 261,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "AI Engineer",
            "id": 12,
            "rationale": null,
            "role_archetype": null,
            "slug": "ai-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for Backend Systems",
          "id": 140,
          "rationale": "Languages used to implement server-side business logic, request handlers, workers, and service integrations. This is the core coding surface for backend feature delivery and maintenance.",
          "slug": "programming-languages-for-backend-systems",
          "source": "db"
        },
        "dimension_id": 140,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Backend Engineer",
            "id": 14,
            "rationale": null,
            "role_archetype": null,
            "slug": "backend-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for Data Work",
          "id": 67,
          "rationale": "Languages used to implement data pipelines, transformations, and operational utilities. This is the code layer for expressing extraction, parsing, validation, and orchestration logic in data engineering workflows.",
          "slug": "programming-languages-for-data-work",
          "source": "db"
        },
        "dimension_id": 67,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 6,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for ML Systems",
          "id": 113,
          "rationale": "Languages used to implement model integration code, inference services, and feature-processing logic. This is the core coding surface for turning trained models into product-facing software components.",
          "slug": "programming-languages-for-ml-systems",
          "source": "db"
        },
        "dimension_id": 113,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Machine Learning Engineer",
            "id": 10,
            "rationale": null,
            "role_archetype": null,
            "slug": "machine-learning-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for Security Work",
          "id": 328,
          "rationale": "Languages used to automate security tasks, write detection logic, and build analysis or remediation tooling. This is the core coding surface for a cybersecurity engineer across scripts, queries, and small utilities.",
          "slug": "programming-languages-for-security-work",
          "source": "db"
        },
        "dimension_id": 328,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Cybersecurity Engineer",
            "id": 9,
            "rationale": null,
            "role_archetype": null,
            "slug": "cybersecurity-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for Test Automation",
          "id": 193,
          "rationale": "Languages used to implement automated checks, helper utilities, and test harness code. This is the core coding surface for turning test ideas into maintainable automation.",
          "slug": "programming-languages-for-test-automation",
          "source": "db"
        },
        "dimension_id": 193,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Automation Tester",
            "id": 16,
            "rationale": null,
            "role_archetype": null,
            "slug": "automation-tester",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Security Automation and Scripting",
          "id": 258,
          "rationale": "Automating repeatable security checks, enrichment, and remediation workflows. This cluster is coherent because the role often needs lightweight automation to scale analysis and response.",
          "slug": "security-automation-and-scripting",
          "source": "db"
        },
        "dimension_id": 258,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Cybersecurity Engineer",
            "id": 9,
            "rationale": null,
            "role_archetype": null,
            "slug": "cybersecurity-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Relational Data Modeling",
          "id": 71,
          "rationale": "Designing tables, relationships, constraints, and transactional data shapes for operational backend systems. This cluster is coherent because backend services frequently own the canonical application data model.",
          "slug": "relational-data-modeling",
          "source": "db"
        },
        "dimension_id": 71,
        "input_skill": "SQL",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Backend Engineer",
            "id": 14,
            "rationale": null,
            "role_archetype": null,
            "slug": "backend-engineer",
            "source": "db"
          },
          {
            "display_name": "Data Engineer",
            "id": 6,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2601,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Version Control Systems",
          "id": 365,
          "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 365,
        "input_skill": "SQL",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2601,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Model Serving Deployment and Runtime Packaging",
          "id": 52,
          "rationale": "Operational deployment of trained models into online, batch, or streaming serving environments, including packaging models and model servers into containers or managed inference runtimes, coordinating rollout, and handing off to inference systems. Covers serving frameworks and platforms such as TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, and Seldon Core, plus container/runtime concerns like Docker images, GPU-enabled containers, base image selection, container entrypoints, runtime dependencies, and image scanning for model services.",
          "slug": "model-serving-deployment-and-runtime-packaging",
          "source": "db"
        },
        "dimension_id": 52,
        "input_skill": "MLflow",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "MLOps Engineer",
            "id": 5,
            "rationale": null,
            "role_archetype": null,
            "slug": "mlops-engineer",
            "source": "db"
          },
          {
            "display_name": "Machine Learning Engineer",
            "id": 10,
            "rationale": null,
            "role_archetype": null,
            "slug": "machine-learning-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2640,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Project Delivery and Coordination",
          "id": 366,
          "rationale": "Coordination practices for organizing work, tracking progress, and aligning stakeholders across a delivery effort. Agile fits here when used as a team execution framework for managing scope, cadence, and collaboration.",
          "slug": "d_init_02",
          "source": "db"
        },
        "dimension_id": 366,
        "input_skill": "MLflow",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2640,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Version Control Systems",
          "id": 365,
          "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 365,
        "input_skill": "MLflow",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2640,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud ML Platform Operations",
          "id": 65,
          "rationale": "Consumer-level operation of managed ML services and cloud resources used to train and serve models. This covers the cloud platform surface that MLOps engineers use without owning the underlying cloud platform itself.",
          "slug": "cloud-ml-platform-operations",
          "source": "db"
        },
        "dimension_id": 65,
        "input_skill": "Azure Machine Learning",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "MLOps Engineer",
            "id": 5,
            "rationale": null,
            "role_archetype": null,
            "slug": "mlops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 385,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Data Platform Services",
          "id": 81,
          "rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
          "slug": "cloud-data-platform-services",
          "source": "db"
        },
        "dimension_id": 81,
        "input_skill": "Azure Data Factory",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 6,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 467,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud ML Platform Operations",
          "id": 65,
          "rationale": "Consumer-level operation of managed ML services and cloud resources used to train and serve models. This covers the cloud platform surface that MLOps engineers use without owning the underlying cloud platform itself.",
          "slug": "cloud-ml-platform-operations",
          "source": "db"
        },
        "dimension_id": 65,
        "input_skill": "Databricks",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "MLOps Engineer",
            "id": 5,
            "rationale": null,
            "role_archetype": null,
            "slug": "mlops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 386,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Version Control Systems",
          "id": 365,
          "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 365,
        "input_skill": "CI/CD",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2579,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Inference Data Pipelines",
          "id": 59,
          "rationale": "Operational data movement for batch scoring, feature refresh, and inference-time data preparation. This is separate from model training because it focuses on getting the right data to the serving path reliably.",
          "slug": "inference-data-pipelines",
          "source": "db"
        },
        "dimension_id": 59,
        "input_skill": "MLOps",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "MLOps Engineer",
            "id": 5,
            "rationale": null,
            "role_archetype": null,
            "slug": "mlops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2643,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Model Serving Deployment and Runtime Packaging",
          "id": 52,
          "rationale": "Operational deployment of trained models into online, batch, or streaming serving environments, including packaging models and model servers into containers or managed inference runtimes, coordinating rollout, and handing off to inference systems. Covers serving frameworks and platforms such as TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, and Seldon Core, plus container/runtime concerns like Docker images, GPU-enabled containers, base image selection, container entrypoints, runtime dependencies, and image scanning for model services.",
          "slug": "model-serving-deployment-and-runtime-packaging",
          "source": "db"
        },
        "dimension_id": 52,
        "input_skill": "MLOps",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "MLOps Engineer",
            "id": 5,
            "rationale": null,
            "role_archetype": null,
            "slug": "mlops-engineer",
            "source": "db"
          },
          {
            "display_name": "Machine Learning Engineer",
            "id": 10,
            "rationale": null,
            "role_archetype": null,
            "slug": "machine-learning-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2643,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "Analytical Programming and Notebook Languages",
          "id": null,
          "rationale": "Languages and notebook/script-based coding used to clean, transform, analyze, and prototype data workflows and models. Includes Python, pandas, SQL, PySpark, notebook scripting, dataframe manipulation, exploratory analysis, ETL/data transformation logic, and other reproducible analytical code.",
          "slug": "d_merge_01",
          "source": "llm"
        },
        "dimension_id": 82,
        "input_skill": "PySpark",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2684,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Version Control Systems",
          "id": 365,
          "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 365,
        "input_skill": "PySpark",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2684,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "Analytical Programming and Notebook-Based Data Analysis",
          "id": null,
          "rationale": "Languages and notebook-friendly coding used to clean, transform, analyze, and prototype data and model workflows. This includes Python, R, SQL, and Scala used in notebooks or scripts for data wrangling, exploratory data analysis, statistical logic, feature engineering, and reproducible prototyping. It excludes production orchestration and scheduling, dashboard/report authoring, model deployment packaging, database administration, and UI development.",
          "slug": "d_merge_01",
          "source": "llm"
        },
        "dimension_id": 82,
        "input_skill": "Notebooks",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2685,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Version Control Systems",
          "id": 365,
          "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 365,
        "input_skill": "Big Data",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2686,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Messaging and Event Streaming",
          "id": 146,
          "rationale": "Asynchronous communication patterns and systems for decoupled service interaction and background processing. This is a coherent backend cluster because many server-side workflows depend on queues, topics, and event streams.",
          "slug": "messaging-and-event-streaming",
          "source": "db"
        },
        "dimension_id": 146,
        "input_skill": "Big Data",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Backend Engineer",
            "id": 14,
            "rationale": null,
            "role_archetype": null,
            "slug": "backend-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2686,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Version Control Systems",
          "id": 365,
          "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 365,
        "input_skill": "Data Modeling",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2687,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "Inference Data Pipelines for Serving and Batch Scoring",
          "id": null,
          "rationale": "Operational data movement that prepares and delivers timely, reliable data to production inference systems. Includes batch scoring inputs, feature refresh jobs, inference-time preprocessing, scheduled extracts, data validation for serving, and online/offline feature synchronization. Excludes training dataset curation, model training workflows, experimentation-focused feature engineering, model evaluation, and serving infrastructure/routing.",
          "slug": "d_merge_01",
          "source": "llm"
        },
        "dimension_id": 59,
        "input_skill": "Data Pipelines",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2688,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Version Control Systems",
          "id": 365,
          "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 365,
        "input_skill": "Data Pipelines",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2688,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "Asynchronous Messaging and Event Streaming",
          "id": null,
          "rationale": "Covers asynchronous communication and data movement through queues, topics, streams, event buses, and pub/sub systems for decoupled processing, background jobs, and event-driven integration. Includes continuous or event-driven data ingestion and change data capture pipelines, but excludes batch ETL orchestration, warehouse modeling, query optimization, model training data prep, and direct application API calls.",
          "slug": "d_merge_01",
          "source": "llm"
        },
        "dimension_id": 146,
        "input_skill": "Data Ingestion",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2689,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Version Control Systems",
          "id": 365,
          "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 365,
        "input_skill": "Data Ingestion",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2689,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Messaging and Event Streaming",
          "id": 146,
          "rationale": "Asynchronous communication patterns and systems for decoupled service interaction and background processing. This is a coherent backend cluster because many server-side workflows depend on queues, topics, and event streams.",
          "slug": "messaging-and-event-streaming",
          "source": "db"
        },
        "dimension_id": 146,
        "input_skill": "Stream Processing",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Backend Engineer",
            "id": 14,
            "rationale": null,
            "role_archetype": null,
            "slug": "backend-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2690,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "Messaging, Queueing, and Event Streaming",
          "id": null,
          "rationale": "Asynchronous communication patterns and systems that decouple producers and consumers, buffer and route work items, and support background processing and service-to-service integration. Includes queueing, message queues, pub/sub, brokers, topics, consumer groups, producers/consumers, dead-letter queues, retry handling, backpressure, and event streaming platforms such as Kafka, RabbitMQ, SQS, and Azure Service Bus.",
          "slug": "d_merge_01",
          "source": "llm"
        },
        "dimension_id": 146,
        "input_skill": "Queueing",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2691,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "API Integration, Request Orchestration, and Data Fetching",
          "id": null,
          "rationale": "Connecting applications to internal or external services through request/response APIs. This includes consuming REST and GraphQL endpoints, orchestrating requests, handling payloads and response parsing, pagination, retries, error handling, and shaping remote data for downstream or UI consumption.",
          "slug": "d_merge_01",
          "source": "llm"
        },
        "dimension_id": 9,
        "input_skill": "APIs",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2692,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Service Integration Patterns",
          "id": 188,
          "rationale": "Covers how cloud services and workloads connect through APIs, events, shared services, and integration boundaries. This cluster is coherent because architects must define interaction patterns that preserve decoupling, security, and operability.",
          "slug": "cloud-service-integration-patterns",
          "source": "db"
        },
        "dimension_id": 188,
        "input_skill": "APIs",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Cloud Architect",
            "id": 11,
            "rationale": null,
            "role_archetype": null,
            "slug": "cloud-architect",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2692,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Version Control Systems",
          "id": 365,
          "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 365,
        "input_skill": "APIs",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2692,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 6,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Testing and Validation Practices",
          "id": 221,
          "rationale": "Validating platform changes before release, including functional checks and regression verification. This cluster is coherent because ServiceNow developers must confirm workflows, scripts, and integrations behave as intended.",
          "slug": "testing-and-validation-practices",
          "source": "db"
        },
        "dimension_id": 221,
        "input_skill": "Testability",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "ServiceNOW Developer",
            "id": 24,
            "rationale": null,
            "role_archetype": null,
            "slug": "servicenow-developer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2693,
        "skill_tag": "in_db",
        "skipped_reason": null
      }
    ],
    "new_skills_created": 10,
    "role_dimension_saved": 0,
    "skill_dimension_saved": 16,
    "skipped": 0
  },
  "planner_output": null,
  "run_id": "265899c9-6b42-43cb-a0f8-64ac64ac5a98"
}

LLM Calls

Every model call made for this run, in pipeline order.
