Pipeline run

20755499-04f6-440f-80a9-bb023fddc1ff

Client output enrichment

v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description
Nature of work: no KRAs
Vague JD — no KRAs present to derive a specific nature of work.
Tech stack maturity
Mainstream Modern
AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)
3.20 / 5
Title match
Has AI skill
AI skill (primary)
· AI skill (secondary)
· On AI team
· Builds AI products
vocab breakdown (legacy)
Assistants (×1):
Frameworks (×2): Bedrock, Pinecone
Models / concepts (×3): RAG, LLM, LLMs, MLOps, AI, ML, AI/ML, GenAI, Generative AI, Machine Learning
Evidence — skills matched in JD (27)
AWS AI/ML Amazon SageMaker Amazon Bedrock AWS Lambda ECS EKS EC2 AWS Glue Amazon Athena Redshift AWS Data Pipeline S3 Kinesis Amazon API Gateway Python TensorFlow PyTorch Scikit-learn GitHub Actions Airflow Terraform Docker Kubernetes Pinecone +2
Skill cluster (9 dimension groups, role-scoped)
Cloud Platforms
AWS AWS Lambda Redshift S3
Container Orchestration Platforms
ECS Kubernetes
ML Frameworks and Libraries
PyTorch FAISS
CI/CD Pipeline Platforms
GitHub Actions
Containerization and Image Builds
Docker
Infrastructure as Code
Terraform
Python Programming
Python
Vector Databases
Pinecone
Cross-cutting / unaligned
AI/ML Amazon SageMaker Amazon Bedrock EKS EC2 AWS Glue Amazon Athena AWS Data Pipeline Kinesis Amazon API Gateway TensorFlow Scikit-learn Airflow OpenSearch
Status: completed · Created: 2026-05-12T06:39:43.003239Z · Updated: 2026-05-12T06:43:04.485872Z · API 3 duration: 73061 ms
Flow Current 3-step pipeline

1 POST /skills/extract-from-jd

2 POST /skills/extract-details

3 POST /skills/final-role-output
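
The three calls listed in this flow run in sequence, each step consuming the previous step's output. A minimal sketch, assuming a JSON-over-HTTP service: the endpoint paths come from this page, but the payload keys (`jd_text`, `skills`, `details`) and the injected `post` callable are hypothetical and not confirmed by the run.

```python
from typing import Any, Callable, Dict

# Endpoint paths as listed in the flow on this page.
STEPS = (
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
)

# Hypothetical transport hook: post(path, payload) -> decoded JSON response.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_pipeline(post: Post, jd_text: str) -> Dict[str, Any]:
    """Call the three endpoints in order, feeding each response forward.

    The payload keys used here are assumptions made for illustration.
    """
    extracted = post(STEPS[0], {"jd_text": jd_text})
    details = post(STEPS[1], {"skills": extracted})
    return post(STEPS[2], {"details": details})
```

Injecting `post` keeps the ordering logic separate from the HTTP client, so the sequence can be exercised with a stub before any real service is involved.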

Role Chosen role & resolution

Machine Learning Engineer

slug: machine-learning-engineer · id: 10 · source: db

The primary skills include a strong focus on AWS and AI/ML technologies, which aligns well with the role of a Machine Learning Engineer.

Resolution: in_db — role exists in library; skill↔dim and role↔dim links saved when applicable.

New skills: 13
Skill↔dim saved: 21
Role↔dim saved: 0
Skipped: 0

Job description

About the job
At Capgemini Invent, we believe difference drives change. As inventive transformation consultants, we blend our strategic, creative and scientific capabilities, collaborating closely with clients to deliver cutting-edge solutions. Join us to drive transformation tailored to our client's challenges of today and tomorrow. Informed and validated by science and data. Superpowered by creativity and design. All underpinned by technology created with purpose.

Your Role

We are seeking a highly skilled Solution Architect – AWS Cloud & AI/ML to design, architect, and implement advanced AI/ML and generative AI solutions on the AWS platform. The ideal candidate will have deep expertise in large-scale distributed systems, modern AI/ML architectures, LLMs, data engineering pipelines, and AWS-native services. This role involves partnering with cross-functional teams, understanding business challenges, and crafting end‑to‑end scalable, secure, and cost‑optimized solutions.

Architect and deliver end‑to‑end AI/ML solutions on AWS, covering data ingestion, training, inference, orchestration, monitoring, and governance.
Design and integrate LLM‑based and Generative AI solutions, including retrieval-augmented generation (RAG), prompt workflows, and production deployment strategies.
Develop feature engineering strategies and scalable data pipelines to support ML training and real-time inference workloads.
Lead technical discussions and provide guidance on AI/ML best practices, model lifecycle, optimization, MLOps, and model governance.
Design highly scalable, secure, and cost-efficient architectures using:
  • Amazon SageMaker (Training Jobs, Inference Endpoints, Pipelines, Feature Store, Model Registry)
  • Amazon Bedrock (Foundation models, Generative AI orchestration, prompt management)
  • AWS Lambda, ECS, EKS, EC2 for building and orchestrating distributed AI workloads.
Architect and optimize data engineering platforms using:
  • AWS Glue, Amazon Athena, Redshift, AWS Data Pipeline, S3, Kinesis, and related services.
Build secure, production-grade API services for AI model inference using Amazon API Gateway and AWS compute services.
Your Profile

8+ years of experience in cloud architecture, with at least 5 years in AWS.
Strong expertise in:
  • Machine Learning, MLOps, and GenAI solution design.
  • Amazon SageMaker (end‑to‑end ML lifecycle).
  • Amazon Bedrock and modern LLM architectures.
  • Data engineering with Glue, Redshift, Athena, and pipeline orchestration.
Experience containerizing and scaling AI workloads on Lambda/ECS/EKS.
Strong coding experience in Python and familiarity with ML frameworks (TensorFlow, PyTorch, Scikit‑learn).
Deep understanding of security, networking, IAM, and compliance best practices for AI systems.
Excellent communication, design thinking, and stakeholder management skills.
AWS certifications (e.g., AWS Certified Solutions Architect – Professional, Machine Learning – Specialty).
Experience with vector databases (e.g., Pinecone, OpenSearch, FAISS).
Experience building RAG pipelines, multi‑agent orchestration frameworks, or custom LLM fine‑tuning workflows.
Familiarity with DevOps/MLOps tools: GitHub Actions, Airflow, Terraform, Docker, Kubernetes.
Capgemini is a global business and technology transformation partner, helping organizations to accelerate their dual transition to a digital and sustainable world, while creating tangible impact for enterprises and society. It is a responsible and diverse group of 340,000 team members in more than 50 countries. With its strong over 55-year heritage, Capgemini is trusted by its clients to unlock the value of technology to address the entire breadth of their business needs. It delivers end-to-end services and solutions leveraging strengths from strategy and design to engineering, all fueled by its market leading capabilities in AI, generative AI, cloud and data, combined with its deep industry expertise and partner ecosystem.

Skills from this JD

Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
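
The merge described here can be pictured as a join on skill name across the three API outputs. A minimal sketch under stated assumptions: the field names (`name`, `extraction`, `match`, `persistence`) and the input shapes are hypothetical, chosen only to illustrate the idea, and are not taken from the pipeline's actual schema.

```python
from typing import Any, Dict, Iterable, List


def merge_skill_rows(
    api1: Iterable[Dict[str, Any]],
    api2: Dict[str, Dict[str, Any]],
    api3: Dict[str, Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Build one row per extracted skill, joined on skill name.

    api1: extraction records, each carrying a "name" key (assumed).
    api2: library match / orchestration data keyed by skill name (assumed).
    api3: persistence tags keyed by skill name (assumed).
    Missing API 2 or API 3 data is left as an empty dict rather than guessed.
    """
    rows = []
    for rec in api1:
        name = rec["name"]
        rows.append({
            "name": name,
            "extraction": rec,
            "match": api2.get(name, {}),
            "persistence": api3.get(name, {}),
        })
    return rows
```

Rows keep the API 1 extraction order, which matches how the skills appear one after another on this page.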

AWS Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: AWS id=163 · aws

Aliases — catalog

  • Compaction (CANONICAL) primary

Context tags (catalog)

Bloom filter LSM tree SSTable checkpointing defragmentation garbage collection leveling log-structured merge policy segment merge storage engine tiered compaction tombstones vacuum write amplification

Stored enrichment (catalog DB)

Category
Concept
Sub-category
Storage Maintenance Concept
Confidence
0.74
Version strategy
NOT_APPLICABLE

Maturity reasoning: Compaction is a standard storage-maintenance concept in widely used systems like LSM databases and Kafka; it appears in many JDs for Cassandra, RocksDB, and Kafka ops roles, indicating broad market demand.

Skill profile (library / DB)

Skill nature
PLATFORM
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
13
Sub-category id
161
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Cloud Platform Operations Catalog dimension db id 26

    Library dimension (catalog)

    Roles linked in library: DevOps Engineer

  • Cloud Security Platforms Catalog dimension db id 332

    Library dimension (catalog)

    Roles linked in library: Cybersecurity Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Cloud Platform Operations
cloud-platform-operations
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Cloud Security Platforms
cloud-security-platforms
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
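
Each Outcome cell in these link-attempt tables is a " · "-separated list of status phrases, some with a parenthesised detail. A minimal sketch of how such a cell could be parsed for tallying; the dictionary keys are illustrative names invented for this example, and only the phrases visible on this page are handled.

```python
import re
from typing import Any, Dict

# Matches "Label (detail)" and captures both parts.
_PAREN = re.compile(r"^(?P<label>.*?)\s*\((?P<detail>.*)\)$")


def parse_outcome(cell: str) -> Dict[str, Any]:
    """Split an Outcome cell on '·' and classify each status phrase."""
    result: Dict[str, Any] = {
        "new_skill_saved": False,
        "dimension": None,         # "existing" or "new"
        "dimension_source": None,  # e.g. "library", "embedding dedup"
        "role_link": None,         # "saved" or "skipped"
        "role_link_reason": None,
    }
    for part in (p.strip() for p in cell.split("·")):
        m = _PAREN.match(part)
        label, detail = (m["label"], m["detail"]) if m else (part, None)
        if label == "New skill saved":
            result["new_skill_saved"] = True
        elif label == "Existing dimension":
            result["dimension"], result["dimension_source"] = "existing", detail
        elif label == "New dimension saved":
            result["dimension"], result["dimension_source"] = "new", detail
        elif label == "Role↔dimension saved":
            result["role_link"] = "saved"
        elif label == "Role↔dimension skipped":
            result["role_link"], result["role_link_reason"] = "skipped", detail
    return result
```

Unrecognised phrases are ignored rather than raising, so a new status string would show up as missing fields instead of a crash.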
AI/ML Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.93

AI/ML appears in a broad share of software and data job postings, with major vendors (AWS, Google, Microsoft) offering mainstream ML platforms and tooling; it’s now a common hiring-pipeline requirement rather than a niche specialty.

Vendor & license

(0.99)

Context keywords
TensorFlow PyTorch scikit-learn deep learning neural networks NLP computer vision model training feature engineering hyperparameter tuning classification regression clustering reinforcement learning MLOps
Ambiguity low

AI/ML is a common combined domain label in JDs and usually clearly means artificial intelligence and machine learning, not a different catalog skill.

Versioning

Not versioned

Type assignment

Domain ·artificial_intelligence_machine_learning confidence 0.98

AI/ML is a vertical body of knowledge and problem-space rather than a tool, framework, or methodology, so it fits the Domain type.

Derived legacy fields
Category
Domain
Sub-category
artificial_intelligence_machine_learning
Skill nature
CONCEPT
Volatility
STABLE
Typical lifespan
EVERGREEN
Version strategy
NOT_APPLICABLE
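
The derived legacy fields appear to follow from the type assignment and the maturity label. A minimal sketch of that derivation, with the mappings inferred only from the values shown for the orchestrated skills in this run (Domain, Concept, Platform, Service; well_known, emerging, deprecated); the pipeline's real rules are not shown on this page and may differ or cover more cases.

```python
from typing import Dict

# Inferred from this run's values only; not the pipeline's confirmed rules.
NATURE_BY_TYPE = {
    "Domain": "CONCEPT",
    "Concept": "CONCEPT",
    "Platform": "PLATFORM",
    "Service": "CLOUD_SERVICE",
}
# maturity -> (volatility, typical lifespan), again as observed in this run.
VOLATILITY_BY_MATURITY = {
    "well_known": ("STABLE", "EVERGREEN"),
    "emerging": ("EMERGING", "EVERGREEN"),
    "deprecated": ("DEPRECATED", "SHORT_LIVED"),
}


def derive_legacy_fields(skill_type: str, subtype: str, maturity: str) -> Dict[str, str]:
    """Map a type assignment plus maturity to the legacy field set."""
    volatility, lifespan = VOLATILITY_BY_MATURITY[maturity]
    return {
        "category": skill_type,
        "sub_category": subtype,
        "skill_nature": NATURE_BY_TYPE[skill_type],
        "volatility": volatility,
        "typical_lifespan": lifespan,
        # Every orchestrated skill in this run is "Not versioned".
        "version_strategy": "NOT_APPLICABLE",
    }
```

An unseen type or maturity raises `KeyError`, which is deliberate in this sketch: it avoids silently inventing a legacy value for a case this run does not show.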

Dimensions (API 2 worklist)

  • Applied Machine Learning Tooling and Frameworks Proposed / LLM

    Proposed / LLM dimension (no DB id yet)

  • AI Service Integration and Orchestration Patterns Proposed / LLM

    Proposed / LLM dimension (no DB id yet)

  • AI Inference Cost, Latency, and Throughput Optimization Catalog dimension db id 260

    Library dimension (catalog)

    Roles linked in library: AI Engineer

Locked dimensions (v3 placement)

  • Applied Machine Learning Tooling and Frameworks

    Pipeline tentative id

    Libraries and frameworks used to prototype, build, train, tune, and compare machine learning models in practice. Includes hands-on AI/ML tooling such as scikit-learn, XGBoost, LightGBM, TensorFlow, and PyTorch, along with model training, feature engineering, hyperparameter tuning, and evaluation workflows. Excludes model serving, deployment packaging, online inference, registry/version promotion, distributed stream processing, and cloud infrastructure provisioning.

  • AI Service Integration and Orchestration Patterns

    Pipeline tentative id

    Patterns for structuring and embedding AI/ML capabilities within products and services, including deciding where AI logic lives and how it interacts with application flows. Covers model-backed APIs, retrieval-augmented generation, agent and workflow orchestration, batch scoring services, online inference integration, feature pipelines, and placement across handlers, workers, gateways, and service boundaries.

  • AI Inference Cost, Latency, and Throughput Optimization

    Pipeline tentative id

    Improving the runtime efficiency of AI/ML-powered features by reducing inference cost and latency while increasing throughput and preserving user experience. Includes token budgeting, prompt compression, batching, caching, quantization, pruning, model selection, async inference, warm starts, streaming UX, timeout tuning, concurrency control, GPU utilization, and profiling. Excludes model training, feature engineering, registry/versioning, infrastructure autoscaling, serving capacity planning, ge

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Applied Machine Learning Tooling and Frameworks
d_merge_01
New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
AI Service Integration and Orchestration Patterns
d_merge_02
New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
AI Inference Cost, Latency, and Throughput Optimization
ai-inference-cost-latency-and-throughput-optimization
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
AI Inference Cost, Latency, and Throughput Optimization
d_merge_03
New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
Amazon SageMaker Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.90

Commonly listed in ML/DS job descriptions and AWS’s managed ML platform is broadly adopted for training, deployment, and MLOps across enterprises.

Vendor & license

Amazon Web Services ·proprietary ·since 2017 (0.98)

Context keywords
MLOps notebooks training jobs hyperparameter tuning model registry endpoint deployment batch transform feature store pipelines ground truth AutoML S3 IAM ECR CloudWatch
Ambiguity low

Amazon SageMaker is a specific AWS ML platform name and is usually unambiguous in job descriptions; it is unlikely to be mistaken for a different catalog skill.

Versioning

Not versioned

Type assignment

Platform ·ml_platform confidence 0.98

By the Platform vs Tool rule, Amazon SageMaker is a hosted multi-tenant AWS environment with APIs and managed machine-learning capabilities, so it is a Platform rather than a Tool or a single Service in this typology.

Derived legacy fields
Category
Platform
Sub-category
ml_platform
Skill nature
PLATFORM
Volatility
STABLE
Typical lifespan
EVERGREEN
Version strategy
NOT_APPLICABLE

Dimensions (API 2 worklist)

  • Managed ML Platform Workflows Proposed / LLM

    Proposed / LLM dimension (no DB id yet)

  • Managed Model Hosting and Endpoints Proposed / LLM

    Proposed / LLM dimension (no DB id yet)

  • Model Serving Runtime Packaging Proposed / LLM

    Proposed / LLM dimension (no DB id yet)

  • Model Serving Frameworks and Platforms Proposed / LLM

    Proposed / LLM dimension (no DB id yet)

Locked dimensions (v3 placement)

  • Managed ML Platform Workflows

    Pipeline tentative id

    Cloud ML platforms for building and operating models end-to-end, including notebooks, experiments, managed training jobs, and pipeline/studio workflows. Examples: SageMaker Studio, SageMaker notebooks, SageMaker Pipelines, managed training jobs.

  • Managed Model Hosting and Endpoints

    Pipeline tentative id

    Cloud-managed services for deploying trained models as online or batch inference endpoints, including endpoint provisioning, batch transform, and rollout coordination. Examples: SageMaker endpoints, SageMaker batch transform.

  • Model Serving Runtime Packaging

    Pipeline tentative id

    Packaging trained models and model servers for deployment, including containers, base images, runtime dependencies, entrypoints, GPU images, and image scanning. Examples: Docker images for model services.

  • Model Serving Frameworks and Platforms

    Pipeline tentative id

    Serving frameworks and platforms used to run deployed models in online, batch, or streaming environments. Examples: TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, Seldon Core.

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Managed ML Platform Workflows
d_split_01_01
New skill saved · New dimension saved (reconciliation separate) · Role↔dimension skipped (dimension not under chosen role)
Managed Model Hosting and Endpoints
d_split_01_02
New skill saved · New dimension saved (reconciliation separate) · Role↔dimension skipped (dimension not under chosen role)
Model Serving Runtime Packaging
d_split_01_03
New skill saved · Existing dimension (embedding dedup) · Role↔dimension skipped (dimension not under chosen role)
Model Serving Frameworks and Platforms
d_split_01_04
New skill saved · Existing dimension (embedding dedup) · Role↔dimension skipped (dimension not under chosen role)
Amazon Bedrock Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity emerging confidence 0.86

Appears increasingly in cloud/ML job descriptions and AWS partner materials, but JD volume is still far below core AWS services like S3 or Lambda.

Vendor & license

Amazon Web Services ·proprietary ·since 2023 (0.98)

Context keywords
foundation models FM prompt engineering RAG vector database embeddings guardrails Agents for Amazon Bedrock Knowledge Bases model invocation fine-tuning inference LLM LangChain Anthropic Claude
Ambiguity low

Amazon Bedrock is a specific AWS managed AI model service with a distinctive name; typical JDs mentioning it are unlikely to mean a different catalog skill.

Versioning

Not versioned

Type assignment

Service ·managed_ai_model_service confidence 0.97

By the Platform vs Service rule, Amazon Bedrock is a specific managed capability inside AWS rather than the whole hosted environment, so it is a Service.

Derived legacy fields
Category
Service
Sub-category
managed_ai_model_service
Skill nature
CLOUD_SERVICE
Volatility
EMERGING
Typical lifespan
EVERGREEN
Version strategy
NOT_APPLICABLE

Dimensions (API 2 worklist)

  • Cloud Model Runtime Services Catalog dimension db id 121

    Library dimension (catalog)

    Roles linked in library: Machine Learning Engineer

  • Cloud Model Runtime Services Catalog dimension db id 121

    Library dimension (catalog)

    Roles linked in library: Machine Learning Engineer

Locked dimensions (v3 placement)

  • Cloud Model Runtime Services

    Reuses catalog slug

    Consumer-facing managed services used to run, invoke, and integrate foundation models and related AI capabilities in cloud applications. Amazon Bedrock belongs here because it provides hosted model access, orchestration features, and runtime APIs for generative AI workloads.

  • Cloud Model Runtime Services

    Reuses catalog slug

    Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Cloud Model Runtime Services
cloud-model-runtime-services
New skill saved · Existing dimension (library) · Role↔dimension saved
AWS Lambda Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.97

Broadly adopted serverless compute; AWS Lambda appears in many cloud/backend job descriptions and is a standard AWS offering with strong ecosystem support.

Vendor & license

Amazon Web Services ·proprietary ·since 2014 (0.99)

Context keywords
serverless event-driven API Gateway CloudWatch IAM role S3 trigger SNS SQS Step Functions DynamoDB Lambda layers cold start Node.js Python VPC
Ambiguity low

AWS Lambda is a specific AWS serverless compute service with a distinctive full name; in typical JDs it is unlikely to be confused with unrelated skills in the catalog.

Versioning

Not versioned

Type assignment

Service ·serverless_compute_service confidence 0.99

By the Service vs Platform rule, AWS Lambda is a specific managed capability inside AWS rather than the whole hosted environment, so it is a Service.

Derived legacy fields
Category
Service
Sub-category
serverless_compute_service
Skill nature
CLOUD_SERVICE
Volatility
STABLE
Typical lifespan
EVERGREEN
Version strategy
NOT_APPLICABLE

Dimensions (API 2 worklist)

  • Managed Cloud Data Platform Services Proposed / LLM

    Proposed / LLM dimension (no DB id yet)

Locked dimensions (v3 placement)

  • Managed Cloud Data Platform Services

    Pipeline tentative id

    Managed cloud services used to run data engineering and related application workloads, including serverless compute, workflow orchestration, storage, event triggers, and adjacent security/platform primitives. This covers services such as AWS Lambda, AWS Step Functions, AWS Glue, Amazon S3 event triggers, and other managed services used to build and operate pipelines and data platforms.

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Managed Cloud Data Platform Services
d_merge_01
New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
ECS Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.78

ECS appears in many game-engine and engine-architecture job descriptions, especially in Unity/DOTS and Rust/C++ gameplay systems, and has strong GitHub/library activity; it’s a common modern architecture pattern rather than a niche tool.

Vendor & license

(0.99)

Context keywords
entity-component-system game engine gameplay architecture component-based architecture systems entities components data-oriented design Unity Unreal Engine rendering pipeline physics engine scheduling serialization scene graph
Ambiguity flagged

Could be confused with: amazon_ecs, elastic_container_service

“ECS” is a common acronym and in JDs often means Amazon Elastic Container Service; it can also be read as the generic entity-component-system architecture concept.
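
One way to act on an ambiguity flag like this is to score each candidate sense by how many of its context keywords occur in the job description. A minimal sketch, not the pipeline's actual method: the `entity_component_system` keywords are taken from the context keywords listed for this skill, while the `amazon_ecs` keyword set is a hypothetical one supplied for illustration.

```python
import re
from typing import Dict, Set


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_sense(jd_text: str, senses: Dict[str, Set[str]]) -> str:
    """Return the sense whose keywords overlap most with the JD tokens.

    Ties resolve to the first sense listed.
    """
    jd = tokens(jd_text)
    return max(senses, key=lambda name: len(senses[name] & jd))


SENSES = {
    # Single-word keywords drawn from this skill's context keywords.
    "entity_component_system": {"unity", "unreal", "gameplay", "rendering", "physics", "entities"},
    # Hypothetical keyword set for the AWS container service sense.
    "amazon_ecs": {"aws", "lambda", "eks", "ec2", "docker", "containerizing"},
}

JD_LINE = "AWS Lambda, ECS, EKS, EC2 for building and orchestrating distributed AI workloads."
chosen = pick_sense(JD_LINE, SENSES)  # "amazon_ecs" for this JD line
```

On the JD line quoted above, the AWS sense matches four keywords and the game-engine sense matches none, so plain keyword overlap is enough to separate the two readings here.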

Versioning

Not versioned

Type assignment

Concept ·entity_component_system confidence 0.93

ECS is fundamentally the Entity-Component-System design pattern, so by the Architecture vs Concept rule it is best typed as a Concept rather than a tool or platform.

Derived legacy fields
Category
Concept
Sub-category
entity_component_system
Skill nature
CONCEPT
Volatility
STABLE
Typical lifespan
EVERGREEN
Version strategy
NOT_APPLICABLE

Dimensions (API 2 worklist)

  • Version Control Systems Catalog dimension db id 365

    Library dimension (catalog)

Locked dimensions (v3 placement)

  • Container Orchestration Services

    Pipeline tentative id

    Managed services for running and scaling containerized workloads. ECS belongs here because it is an orchestration platform for scheduling tasks, managing services, and coordinating container runtime operations.

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Version Control Systems
d_init_01
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
EKS Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: EKS id=725 · eks

Aliases — catalog

  • Ansible playbooks (CANONICAL) primary

Context tags (catalog)

Jinja2 YAML ansible-galaxy ansible-vault collections group_vars handlers host_vars idempotent inventory playbook roles tasks templates vars

Stored enrichment (catalog DB)

Category
Format
Sub-category
Automation Playbook Format
Vendor
Red Hat
License
gpl_v3
Year introduced
2012
Confidence
0.88
Version strategy
NOT_APPLICABLE

Maturity reasoning: Common in DevOps JDs and widely used for infrastructure automation; Red Hat/Ansible remains a standard tool in hiring pipelines, with playbooks the core format.

Skill profile (library / DB)

Skill nature
CLOUD_SERVICE
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
14
Sub-category id
251
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Cloud Model Runtime Services Catalog dimension db id 121

    Library dimension (catalog)

    Roles linked in library: Machine Learning Engineer

  • Orchestration Platforms Catalog dimension db id 25

    Library dimension (catalog)

    Roles linked in library: Cloud Engineer, DevOps Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Cloud Model Runtime Services
cloud-model-runtime-services
Existing dimension (library) · Role↔dimension saved
Orchestration Platforms
orchestration-platforms
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
EC2 Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: EC2 id=1773 · ec2

Aliases — from this run (catalog unavailable)

  • EC2 (CANONICAL) primary

Skill profile (library / DB)

Skill nature
CLOUD_SERVICE
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
14
Sub-category id
1544
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Cloud Provider Core Services Catalog dimension db id 290

    Library dimension (catalog)

    Roles linked in library: Cloud Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Cloud Provider Core Services
cloud-provider-core-services
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
AWS Glue Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: AWS Glue id=466 · aws-glue

Aliases — from this run (catalog unavailable)

  • AWS Glue (CANONICAL) primary

Skill profile (library / DB)

Skill nature
CLOUD_SERVICE
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
14
Sub-category id
385
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Cloud Data Platform Services Catalog dimension db id 81

    Library dimension (catalog)

    Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Cloud Data Platform Services
cloud-data-platform-services
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Amazon Athena Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.91

Commonly listed in cloud/data analytics JDs and AWS’s own docs position Athena as a standard serverless SQL query service for S3 data lakes, indicating broad market adoption.

Vendor & license

Amazon Web Services ·proprietary ·since 2016 (0.99)

Context keywords
AWS Glue S3 Presto Trino SQL CTAS partitioning Parquet ORC Glue Data Catalog Athena Federated Query IAM Lake Formation JDBC serverless analytics
Ambiguity low

Amazon Athena is a specific AWS query service with a distinctive full name; in typical JDs it is unlikely to be confused with another catalog skill.

Versioning

Not versioned

Type assignment

Service ·query_service confidence 0.98

By the Platform vs Tool and Service vs Platform rules, Amazon Athena is a managed capability inside AWS rather than software you run yourself, so it is a Service.

Derived legacy fields
Category
Service
Sub-category
query_service
Skill nature
CLOUD_SERVICE
Volatility
STABLE
Typical lifespan
EVERGREEN
Version strategy
NOT_APPLICABLE

Dimensions (API 2 worklist)

  • Cloud Analytics Query Services Proposed / LLM

    Proposed / LLM dimension (no DB id yet)

  • Cloud Data Pipeline Runtime Proposed / LLM

    Proposed / LLM dimension (no DB id yet)

  • Cloud Data Platform Storage Proposed / LLM

    Proposed / LLM dimension (no DB id yet)

  • Cloud Data Platform Security and Networking Proposed / LLM

    Proposed / LLM dimension (no DB id yet)

Locked dimensions (v3 placement)

  • Cloud Analytics Query Services

    Pipeline tentative id

    Managed cloud services for querying and processing analytical data, including serverless SQL analytics, data lake querying, and federated query services such as Athena, Glue, Redshift Spectrum, and EMR-based analytics.

  • Cloud Data Pipeline Runtime

    Pipeline tentative id

    Managed cloud compute and orchestration services used to run data engineering jobs, transformations, and workflows.

  • Cloud Data Platform Storage

    Pipeline tentative id

    Cloud storage and data access services used as the persistence layer for data engineering platforms and pipelines.

  • Cloud Data Platform Security and Networking

    Pipeline tentative id

    Identity, access, secrets, and networking primitives used to support cloud data platforms and pipelines.

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Cloud Analytics Query Services
d_split_01_01
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Cloud Data Pipeline Runtime
d_split_01_02
New skill saved · Existing dimension (embedding dedup) · Role↔dimension skipped (dimension not under chosen role)
Cloud Data Platform Storage
d_split_01_03
New skill saved · Existing dimension (embedding dedup) · Role↔dimension skipped (dimension not under chosen role)
Cloud Data Platform Security and Networking
d_split_01_04
New skill saved · New dimension saved (reconciliation separate) · Role↔dimension skipped (dimension not under chosen role)
Redshift Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Redshift id=2570 · redshift

Aliases — from this run (catalog unavailable)

  • Redshift (CANONICAL)

Skill profile (library / DB)

Skill nature
PLATFORM
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
13
Sub-category id
2098
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Data Warehousing Platforms Catalog dimension db id 72

    Library dimension (catalog)

    Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Data Warehousing Platforms
data-warehousing-platforms
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
AWS Data Pipeline Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity deprecated confidence 0.96

AWS announced AWS Data Pipeline is in maintenance mode and recommends newer services like Glue/Step Functions; recent JDs rarely list it compared with modern AWS data tooling.

Vendor & license

Amazon Web Services ·proprietary ·since 2012 (0.98)

Context keywords
ETL S3 Redshift EMR Glue RDS EC2 Lambda Step Functions Kinesis Athena Data Lake Apache Spark cron orchestration
Ambiguity low

AWS Data Pipeline is a specific AWS service name and is unlikely to be mistaken for another catalog skill in a typical JD.

Versioning

Not versioned

Type assignment

Service ·data_pipeline_service confidence 0.97

By the Service vs Platform rule, AWS Data Pipeline is a specific managed capability inside AWS rather than the AWS platform itself.

Derived legacy fields
Category
Service
Sub-category
data_pipeline_service
Skill nature
CLOUD_SERVICE
Volatility
DEPRECATED
Typical lifespan
SHORT_LIVED
Version strategy
NOT_APPLICABLE

Dimensions (API 2 worklist)

  • Cloud Data Platform Services Catalog dimension db id 81

    Library dimension (catalog)

    Roles linked in library: Data Engineer

  • Cloud Data Platform Services Catalog dimension db id 81

    Library dimension (catalog)

    Roles linked in library: Data Engineer

Locked dimensions (v3 placement)

  • Cloud Data Platform Services

    Reuses catalog slug

    Managed cloud services used to build and operate data engineering workloads. AWS Data Pipeline fits here because it is an AWS service for orchestrating data movement and scheduled processing across storage and compute services.

  • Cloud Data Platform Services

    Reuses catalog slug

    Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Cloud Data Platform Services
cloud-data-platform-services
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
S3 Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.98

Amazon S3 is a default cloud storage requirement in many job descriptions and is a core AWS service with broad ecosystem support; no sunset or replacement signal exists.

Vendor & license

Amazon Web Services ·proprietary ·since 2006 (0.99)

Context keywords
bucket object storage prefix versioning lifecycle policy bucket policy IAM replication multipart upload presigned URL SSE-S3 SSE-KMS event notifications static website hosting storage class
Ambiguity flagged

Could be confused with: s4

"S3" is a short acronym that in JDs can mean AWS S3, but it could also be read as a generic storage tier or label, or as another S3-named product in the catalog. A reasonable extractor may confuse it with adjacent cloud storage skills.

Versioning

Not versioned

Type assignment

Platform · cloud_storage_platform confidence 0.91

By the Platform vs Service rule, S3 is a hosted multi-tenant AWS capability with APIs rather than software you run yourself, so it fits Platform best.

Derived legacy fields
Category
Platform
Sub-category
cloud_storage_platform
Skill nature
PLATFORM
Volatility
STABLE
Typical lifespan
EVERGREEN
Version strategy
NOT_APPLICABLE

Dimensions (API 2 worklist)

  • Storage Provisioning and Automation Catalog dimension db id 311

    Library dimension (catalog)

    Roles linked in library: Storage Engineer

Locked dimensions (v3 placement)

  • Object Storage Provisioning

    Reuses catalog slug

    Covers creating, configuring, and operating S3-style object storage resources and their access controls. S3 belongs here because it is the canonical AWS object storage service used for buckets, objects, lifecycle, and access policies.

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Storage Provisioning and Automation
storage-provisioning-and-automation
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Kinesis · Primary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.89

AWS Kinesis appears in many cloud/data engineering job postings and is a standard managed streaming service in AWS stacks; the absence of a vendor sunset suggests continued market demand.

Vendor & license

Amazon Web Services · proprietary · since 2013 (0.98)

Context keywords
streaming · event-driven · real-time ingestion · shards · producers · consumers · Kinesis Data Streams · Kinesis Data Firehose · Kinesis Data Analytics · Lambda · S3 · CloudWatch · partition key · checkpointing · throughput
Ambiguity low

In JDs, Kinesis usually clearly refers to AWS Kinesis, a distinct streaming service. The name is not a common overloaded acronym or short token likely to be mistaken for another catalog skill.

Versioning

Not versioned

Type assignment

Service · streaming_data_service confidence 0.93

By the Platform vs Service rule, Kinesis is a specific managed capability within AWS rather than a standalone hosted environment, so it is a Service.

Derived legacy fields
Category
Service
Sub-category
streaming_data_service
Skill nature
CLOUD_SERVICE
Volatility
STABLE
Typical lifespan
EVERGREEN
Version strategy
NOT_APPLICABLE

Dimensions (API 2 worklist)

  • Streaming Data Processing Catalog dimension db id 69

    Library dimension (catalog)

    Roles linked in library: Data Engineer

Locked dimensions (v3 placement)

  • Streaming Data Processing

    Pipeline tentative id

    Tools and patterns for ingesting, buffering, and transforming event streams with low latency. This includes continuous processing, windowing, stateful stream jobs, checkpointing, shard scaling, stream partitioning, and managed streaming services such as Kinesis, Amazon Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics.

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Streaming Data Processing
streaming-data-processing
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Streaming Data Processing
d_merge_01
New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)

Amazon API Gateway · Primary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity well_known confidence 0.95

Broadly listed in cloud/backend JDs and AWS docs; commonly paired with Lambda, IAM, and serverless stacks, indicating staple market demand rather than niche use.

Vendor & license

Amazon Web Services · proprietary · since 2015 (0.98)

Context keywords
REST APIs · HTTP APIs · Lambda proxy · OpenAPI · Swagger · CORS · authorizers · usage plans · throttling · stages · deployments · request validation · mapping templates · VPC Link · CloudWatch
Ambiguity low

Amazon API Gateway is a specific AWS service name with little overlap in typical JDs; it is unlikely to be confused with a different catalog skill.

Versioning

Not versioned

Type assignment

Service · api_management_service confidence 0.98

By the Platform vs Service rule, Amazon API Gateway is a specific managed capability inside AWS rather than the whole hosted environment, so it is a Service.

Derived legacy fields
Category
Service
Sub-category
api_management_service
Skill nature
CLOUD_SERVICE
Volatility
STABLE
Typical lifespan
EVERGREEN
Version strategy
NOT_APPLICABLE

Dimensions (API 2 worklist)

  • HTTP API Frameworks and Gateway Layers Proposed / LLM

    Proposed / LLM dimension (no DB id yet)

Locked dimensions (v3 placement)

  • HTTP API Frameworks and Gateway Layers

    Pipeline tentative id

    Server-side frameworks and gateway layers used to build, route, validate, and manage HTTP APIs. Includes backend web service frameworks and API gateway configuration for endpoints, request/response mapping, routing, input validation, auth integration, throttling, usage plans, stages, custom domains, and backend integration.

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
HTTP API Frameworks and Gateway Layers
d_merge_01
New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)

Python · Primary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: Python id=393 · python

Aliases — catalog

  • Cobalt Strike (CANONICAL) primary

Context tags (catalog)

Malleable C2 · beacon · credential dumping · kerberos · lateral movement · payload · phishing · post-exploitation · privilege escalation · psexec · red team · sleep mask · smb · stager · team server

Stored enrichment (catalog DB)

Category
Tool
Sub-category
Adversary Simulation Tool
Vendor
Fortra
License
proprietary
Year introduced
2012
Confidence
0.98
Version strategy
NOT_APPLICABLE

Maturity reasoning: Appears in a limited set of red-team/pentest JDs and security vendor training, but far below mainstream devops tools; market signal is specialized adversary-simulation usage rather than broad hiring demand.

Skill profile (library / DB)

Skill nature
LANGUAGE
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
5
Sub-category id
54
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Analytical Programming Languages Catalog dimension db id 82

    Library dimension (catalog)

    Roles linked in library: Data Analyst, Data Scientist

  • Automation Scripting and CLI Catalog dimension db id 48

    Library dimension (catalog)

    Roles linked in library: Azure Cloud Engineer, Cloud Engineer

  • Automation and Scripting for Operations Catalog dimension db id 361

    Library dimension (catalog)

    Roles linked in library: Virtualization Engineer

  • Network Automation and Scripting Catalog dimension db id 285

    Library dimension (catalog)

    Roles linked in library: Network Engineer

  • Programming Languages for AI Workflows Catalog dimension db id 261

    Library dimension (catalog)

    Roles linked in library: AI Engineer

  • Programming Languages for Backend Systems Catalog dimension db id 140

    Library dimension (catalog)

    Roles linked in library: Backend Engineer

  • Programming Languages for Data Work Catalog dimension db id 67

    Library dimension (catalog)

    Roles linked in library: Data Engineer

  • Programming Languages for ML Systems Catalog dimension db id 113

    Library dimension (catalog)

    Roles linked in library: Machine Learning Engineer

  • Programming Languages for Security Work Catalog dimension db id 328

    Library dimension (catalog)

    Roles linked in library: Cybersecurity Engineer

  • Programming Languages for Test Automation Catalog dimension db id 193

    Library dimension (catalog)

    Roles linked in library: Automation Tester

  • Security Automation and Scripting Catalog dimension db id 258

    Library dimension (catalog)

    Roles linked in library: Cybersecurity Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Analytical Programming Languages
analytical-programming-languages
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Automation Scripting and CLI
automation-scripting-and-cli
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Automation and Scripting for Operations
automation-and-scripting-for-operations
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Network Automation and Scripting
network-automation-and-scripting
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Programming Languages for AI Workflows
programming-languages-for-ai-workflows
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Programming Languages for Backend Systems
programming-languages-for-backend-systems
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Programming Languages for Data Work
programming-languages-for-data-work
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Programming Languages for ML Systems
programming-languages-for-ml-systems
Existing dimension (library) · Role↔dimension saved
Programming Languages for Security Work
programming-languages-for-security-work
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Programming Languages for Test Automation
programming-languages-for-test-automation
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Security Automation and Scripting
security-automation-and-scripting
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

TensorFlow · Secondary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: TensorFlow id=558 · tensorflow

Aliases — catalog

  • shader graphs (CANONICAL) primary

Context tags (catalog)

GLSL · HLSL · PBR · UV mapping · fragment shader · material editor · node graph · node-based · normal map · procedural texturing · render pipeline · shader compiler · tessellation · vertex shader · visual scripting

Stored enrichment (catalog DB)

Category
Framework
Sub-category
Visual Shader Authoring Framework
Vendor
Unity Technologies
License
proprietary
Year introduced
2018
Confidence
0.74
Version strategy
NOT_APPLICABLE

Maturity reasoning: Shader graphs appear in some Unity/Unreal and VFX job postings, but JD volume is far below core graphics skills like HLSL/GLSL; market use is concentrated in game/real-time rendering teams.

Skill profile (library / DB)

Skill nature
LIBRARY
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
6
Sub-category id
456
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Applied Machine Learning Toolkits Catalog dimension db id 94

    Library dimension (catalog)

    Roles linked in library: Data Scientist

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Applied Machine Learning Toolkits
applied-machine-learning-toolkits
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

PyTorch · Secondary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: PyTorch id=557 · pytorch

Aliases — catalog

  • GLSL (CANONICAL) primary

Context tags (catalog)

GPU · HLSL · OpenGL · SPIR-V · Vulkan · WebGL · compute shader · fragment shader · rendering pipeline · shader · shader pipeline · texture sampling · uniform · varying · vertex shader

Stored enrichment (catalog DB)

Category
Language
Sub-category
Shader Language
Vendor
Khronos Group
License
other_open
Year introduced
2004
Confidence
0.99
Version strategy
NOT_APPLICABLE

Maturity reasoning: GLSL appears in graphics/game-engine JDs but at much lower volume than mainstream languages; it’s specialized for shader programming and often replaced in newer pipelines by HLSL/Metal Shading Language or higher-level abstractions.

Skill profile (library / DB)

Skill nature
LIBRARY
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
6
Sub-category id
456
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Applied Machine Learning Toolkits Catalog dimension db id 94

    Library dimension (catalog)

    Roles linked in library: Data Scientist

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Applied Machine Learning Toolkits
applied-machine-learning-toolkits
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Scikit-learn · Secondary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: scikit-learn id=554 · scikit-learn

Aliases — catalog

  • post-processing (CANONICAL) primary

Context tags (catalog)

GPU · anti-aliasing · bloom · color grading · compositing · depth of field · fragment shader · framebuffer · image filtering · motion blur · render pipeline · render target · screen-space · shader · tone mapping

Stored enrichment (catalog DB)

Category
Concept
Sub-category
Graphics Effect Concept
Confidence
0.86
Version strategy
NOT_APPLICABLE

Maturity reasoning: Job postings rarely list "post-processing" as a standalone skill; it appears mainly in graphics/VFX roles, while broader JDs usually specify tools like Unreal/Unity or Photoshop instead.

Skill profile (library / DB)

Skill nature
LIBRARY
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
6
Sub-category id
458
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Applied Machine Learning Toolkits Catalog dimension db id 94

    Library dimension (catalog)

    Roles linked in library: Data Scientist

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Applied Machine Learning Toolkits
applied-machine-learning-toolkits
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

GitHub Actions · Secondary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: GitHub Actions id=1250 · github-actions

Aliases — catalog

  • E5 (CANONICAL) primary

Context tags (catalog)

cosine similarity · data augmentation · dimensionality reduction · embedding space · feature extraction · fine-tuning · model evaluation · natural language processing · nearest neighbors · pre-trained models · semantic search · similarity scoring · transfer learning · transformer models · vector embeddings

Stored enrichment (catalog DB)

Category
Library
Sub-category
Embedding Model Library
Vendor
OpenAI
License
other_open
Year introduced
2021
Confidence
0.80
Version strategy
NOT_APPLICABLE

Maturity reasoning: E5 is a specific embedding-model library with limited JD volume; market demand is concentrated in AI/ML roles rather than broad software hiring, unlike mainstream libraries.

Skill profile (library / DB)

Skill nature
CLOUD_SERVICE
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
14
Sub-category id
1019
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Continuous Integration Test Integration Catalog dimension db id 207

    Library dimension (catalog)

    Roles linked in library: Automation Tester

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Continuous Integration Test Integration
continuous-integration-test-integration
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Airflow · Secondary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: Airflow id=325 · airflow

Aliases — catalog

  • OpenVAS (CANONICAL) primary

Context tags (catalog)

CVE · CVSS · GVM · Greenbone · NVT · asset discovery · authenticated scan · compliance scan · network scan · port scanning · remediation · reporting · service detection · unauthenticated scan · vulnerability assessment

Stored enrichment (catalog DB)

Category
Tool
Sub-category
Vulnerability Scanner
Vendor
Greenbone Networks
License
gpl_v2
Year introduced
2009
Confidence
0.98
Version strategy
NOT_APPLICABLE

Maturity reasoning: OpenVAS appears in security-focused JDs far less often than mainstream scanners like Nessus or Qualys, and its usage is concentrated in pentest/vuln-management roles rather than general DevOps stacks.

Skill profile (library / DB)

Skill nature
TOOL
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
11
Sub-category id
335
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Workflow Orchestration Systems Catalog dimension db id 64

    Library dimension (catalog)

    Roles linked in library: Data Engineer, MLOps Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Workflow Orchestration Systems
workflow-orchestration-systems
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Terraform · Secondary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: Terraform id=144 · terraform

Aliases — catalog

  • Snapshot loads (CANONICAL) primary

Context tags (catalog)

CDC · ELT · ETL · SCD · backfill · batch ingestion · change data capture · data warehouse · full refresh · historical snapshot · idempotent loads · incremental load · late-arriving data · partition overwrite · point-in-time

Stored enrichment (catalog DB)

Category
Methodology
Sub-category
Data Loading Methodology
Confidence
0.90
Version strategy
NOT_APPLICABLE

Maturity reasoning: Snapshot loads are a specialized data-loading pattern; JD volume is very low compared with mainstream ETL/ELT tools, and market discussion is mostly in niche data-engineering forums rather than broad hiring pipelines.

Skill profile (library / DB)

Skill nature
TOOL
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
11
Sub-category id
171
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Infrastructure Provisioning Templates Catalog dimension db id 291

    Library dimension (catalog)

    Roles linked in library: Cloud Engineer

  • Infrastructure as Code Catalog dimension db id 22

    Library dimension (catalog)

    Roles linked in library: DevOps Engineer

  • Infrastructure as Code and Declarative Provisioning Catalog dimension db id 36

    Library dimension (catalog)

    Roles linked in library: Azure Cloud Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Infrastructure Provisioning Templates
infrastructure-provisioning-templates
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Infrastructure as Code
infrastructure-as-code
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Infrastructure as Code and Declarative Provisioning
infrastructure-as-code-and-declarative-provisioning
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Docker · Secondary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: Docker id=153 · docker

Aliases — catalog

  • Metabase (CANONICAL) primary

Context tags (catalog)

BigQuery · MySQL · PostgreSQL · Redshift · SQL · ad hoc analysis · cards · collections · dashboards · data visualization · embedded analytics · filters · questions · segments · self-service BI

Stored enrichment (catalog DB)

Category
Tool
Sub-category
BI Analytics Tool
Vendor
Metabase, Inc.
License
apache_2
Year introduced
2014
Confidence
0.90
Version strategy
NOT_APPLICABLE

Maturity reasoning: Metabase appears in many BI/analytics job postings and is growing in GitHub usage, but it is still far less universal than Tableau/Power BI in enterprise JDs.

Skill profile (library / DB)

Skill nature
TOOL
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
11
Sub-category id
170
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Containerization and Image Delivery Catalog dimension db id 24

    Library dimension (catalog)

    Roles linked in library: DevOps Engineer

  • Model Serving Deployment and Runtime Packaging Catalog dimension db id 52

    Library dimension (catalog)

    Roles linked in library: MLOps Engineer, Machine Learning Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Containerization and Image Delivery
containerization-and-image-delivery
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Model Serving Deployment and Runtime Packaging
model-serving-deployment-and-runtime-packaging
Existing dimension (library) · Role↔dimension saved

Kubernetes · Secondary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: Kubernetes id=158 · kubernetes

Aliases — catalog

  • Column-level security (CANONICAL) primary

Context tags (catalog)

ABAC · PII · access policies · attribute-based access control · audit logging · data governance · data masking · database permissions · dynamic masking · fine-grained access control · least privilege · policy enforcement · row-level security · sensitive data · static masking

Stored enrichment (catalog DB)

Category
Concept
Sub-category
Access Control Concept
Confidence
0.90
Version strategy
NOT_APPLICABLE

Maturity reasoning: Appears in cloud/data platform JDs and vendor docs for Snowflake, BigQuery, and PostgreSQL RLS/column masking, but is not yet a universal hiring staple like core IAM or RBAC.

Skill profile (library / DB)

Skill nature
PLATFORM
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
13
Sub-category id
1524
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Orchestration Platforms Catalog dimension db id 25

    Library dimension (catalog)

    Roles linked in library: Cloud Engineer, DevOps Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Orchestration Platforms
orchestration-platforms
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Pinecone · Secondary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity emerging confidence 0.86

Pinecone appears in many AI/vector-search job descriptions and vendor docs, but it’s still far less universal than PostgreSQL/AWS; market signal shows growing adoption rather than staple status.

Vendor & license

Pinecone Systems, Inc. · proprietary · since 2019 (0.95)

Context keywords
vector database · embeddings · semantic search · similarity search · ANN · approximate nearest neighbor · RAG · retrieval augmented generation · indexing · namespace · metadata filtering · upsert · vector index · hybrid search · OpenAI
Ambiguity low

Pinecone is a distinctive vector database platform name; in typical JDs it is unlikely to be confused with another catalog skill.

Versioning

Not versioned

Type assignment

Platform · vector_database_platform confidence 0.90

By the Vendor SaaS = Platform rule, Pinecone is a hosted multi-tenant vector database service consumed via APIs rather than software you run yourself.

Derived legacy fields
Category
Platform
Sub-category
vector_database_platform
Skill nature
PLATFORM
Volatility
EMERGING
Typical lifespan
EVERGREEN
Version strategy
NOT_APPLICABLE

Dimensions (API 2 worklist)

  • Cloud Model Runtime Services Catalog dimension db id 121

    Library dimension (catalog)

    Roles linked in library: Machine Learning Engineer

Locked dimensions (v3 placement)

  • Vector Database Services

    Reuses catalog slug

    Managed services used to store, index, and query embeddings for semantic search and retrieval-augmented applications. Pinecone belongs here because it is a purpose-built vector database service rather than a general-purpose datastore.

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Cloud Model Runtime Services
cloud-model-runtime-services
New skill saved · Existing dimension (library) · Role↔dimension saved

OpenSearch · Secondary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity emerging confidence 0.84

OpenSearch appears in growing numbers of JDs for search/log analytics, but Elasticsearch still dominates most postings; AWS also continues to position it as the open-source successor to Elasticsearch.

Vendor & license

OpenSearch Project · apache_2 · since 2021 (0.98)

Context keywords
Elasticsearch · Kibana · Lucene · index mapping · shards · replicas · full-text search · aggregations · query DSL · ingest pipeline · cluster management · index templates · analyzers · vector search · OpenSearch Dashboards
Ambiguity low

OpenSearch is a specific search engine/datastore name with little overlap in typical JDs; it is unlikely to be mistaken for another catalog skill.

Versioning

Not versioned

Type assignment

Datastore · search_engine_datastore confidence 0.93

OpenSearch is fundamentally a persistent search and analytics datastore, and under the Datastore vs Format rule it fits Datastore because it stores and indexes data rather than merely defining a format.

Derived legacy fields
Category
Datastore
Sub-category
search_engine_datastore
Skill nature
TOOL
Volatility
EMERGING
Typical lifespan
EVERGREEN
Version strategy
NOT_APPLICABLE

Dimensions (API 2 worklist)

  • Cloud Data Platform Services Catalog dimension db id 81

    Library dimension (catalog)

    Roles linked in library: Data Engineer

  • Version Control Systems Catalog dimension db id 365

    Library dimension (catalog)

Locked dimensions (v3 placement)

  • Search and Analytics Services

    Reuses catalog slug

    Managed search and indexing services used to store, query, and analyze large document or event datasets. OpenSearch belongs here because it is commonly used as a search engine and analytics backend in cloud data platforms.

  • Search Engine Administration

    Pipeline tentative id

    Operational setup and tuning of search clusters, indexes, and query behavior. This fits OpenSearch when the skill emphasis is on running and configuring the search engine itself rather than integrating it into an application.

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Cloud Data Platform Services
cloud-data-platform-services
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Version Control Systems
d_init_01
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

FAISS · Secondary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

Maturity emerging confidence 0.84

FAISS appears in many ML/vector-search job descriptions and is widely used in RAG stacks, but it’s still less universal than Elasticsearch/PostgreSQL; market demand is growing rather than ubiquitous.

Vendor & license

Meta · mit · since 2017 (0.99)

Context keywords
approximate nearest neighbor · ANN · vector index · similarity search · embeddings · cosine similarity · L2 distance · IVF · HNSW · PQ · flat index · GPU acceleration · k-NN · semantic search · re-ranking
Ambiguity low

FAISS is a distinctive library name for vector similarity search; in typical JDs it is unlikely to be confused with another catalog skill.

Versioning

Not versioned

Type assignment

Library · vector_search_library confidence 0.93

FAISS is fundamentally a code package imported by applications for similarity search, so under the Tool vs Framework rule it fits Library rather than a user-operated tool or hosted platform.

Derived legacy fields
Category
Library
Sub-category
vector_search_library
Skill nature
LIBRARY
Volatility
EMERGING
Typical lifespan
EVERGREEN
Version strategy
NOT_APPLICABLE

Dimensions (API 2 worklist)

  • Applied Machine Learning Toolkits and Frameworks Proposed / LLM

    Proposed / LLM dimension (no DB id yet)

  • Version Control Systems Catalog dimension db id 365

    Library dimension (catalog)

Locked dimensions (v3 placement)

  • Applied Machine Learning Toolkits and Frameworks

    Pipeline tentative id

    Libraries and frameworks used to prototype, compare, index, and evaluate machine learning solutions quickly. Includes applied-ML toolkits such as scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, NumPy, pandas, FAISS, vector search libraries, and embedding utilities. Excludes serving runtimes, deployment containers, online inference infrastructure, streaming systems, and database/storage engines.

  • Vector Search Indexing

    Pipeline tentative id

    Index structures and libraries for approximate nearest-neighbor search over embeddings and feature vectors. FAISS fits strongly here because it is primarily used to build and query high-performance vector indexes for retrieval.

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Applied Machine Learning Toolkits and Frameworks
d_merge_01
New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
Version Control Systems
d_init_01
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

All API 3 persistence rows

Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.
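
The grid can be read as a simple expansion: each skill contributes one row per dimension it is placed in. A minimal Python sketch of that expansion follows. It assumes the role↔dimension link is saved only when the chosen role is already linked to the dimension in the library, which is what the outcomes on this page suggest. `WorkItem` and `build_rows` are hypothetical names used for illustration, not the pipeline's actual code; the sample data is copied from the Python and Docker entries above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    """One persistence row: a (skill, dimension) pair plus its role-link outcome."""
    skill: str
    dimension_slug: str
    role_link: str  # "saved" or "skipped"


def build_rows(skill_dims, dim_roles, chosen_role):
    """Expand every skill into one row per dimension it is placed in."""
    rows = []
    for skill, slugs in skill_dims.items():
        for slug in slugs:
            # Assumed rule: link the role only if the library already lists
            # the chosen role under this dimension.
            linked = chosen_role in dim_roles.get(slug, set())
            rows.append(WorkItem(skill, slug, "saved" if linked else "skipped"))
    return rows


# Sample data taken from the Python and Docker entries on this page.
skill_dims = {
    "Python": [
        "programming-languages-for-data-work",
        "programming-languages-for-ml-systems",
    ],
    "Docker": [
        "containerization-and-image-delivery",
        "model-serving-deployment-and-runtime-packaging",
    ],
}
dim_roles = {
    "programming-languages-for-data-work": {"Data Engineer"},
    "programming-languages-for-ml-systems": {"Machine Learning Engineer"},
    "containerization-and-image-delivery": {"DevOps Engineer"},
    "model-serving-deployment-and-runtime-packaging": {
        "MLOps Engineer",
        "Machine Learning Engineer",
    },
}

rows = build_rows(skill_dims, dim_roles, "Machine Learning Engineer")
for row in rows:
    print(f"{row.skill} | {row.dimension_slug} | Role↔dimension {row.role_link}")
```

With Machine Learning Engineer as the chosen role, only the two dimensions that list that role come out as saved, matching the corresponding rows in the grid.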

Skill Tag Dimension Skill↔dim Role↔dim Outcome Notes
AWS in_db
Cloud Platform Operations
cloud-platform-operations
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
AWS in_db
Cloud Security Platforms
cloud-security-platforms
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
EKS in_db
Cloud Model Runtime Services
cloud-model-runtime-services
Existing dimension (library) · Role↔dimension saved
EKS in_db
Orchestration Platforms
orchestration-platforms
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
EC2 in_db
Cloud Provider Core Services
cloud-provider-core-services
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
AWS Glue in_db
Cloud Data Platform Services
cloud-data-platform-services
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Redshift in_db
Data Warehousing Platforms
data-warehousing-platforms
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python in_db
Analytical Programming Languages
analytical-programming-languages
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python in_db
Automation Scripting and CLI
automation-scripting-and-cli
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python in_db
Automation and Scripting for Operations
automation-and-scripting-for-operations
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python in_db
Network Automation and Scripting
network-automation-and-scripting
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python in_db
Programming Languages for AI Workflows
programming-languages-for-ai-workflows
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python in_db
Programming Languages for Backend Systems
programming-languages-for-backend-systems
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python in_db
Programming Languages for Data Work
programming-languages-for-data-work
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python in_db
Programming Languages for ML Systems
programming-languages-for-ml-systems
Existing dimension (library) · Role↔dimension saved
Python in_db
Programming Languages for Security Work
programming-languages-for-security-work
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python in_db
Programming Languages for Test Automation
programming-languages-for-test-automation
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Python in_db
Security Automation and Scripting
security-automation-and-scripting
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
TensorFlow in_db
Applied Machine Learning Toolkits
applied-machine-learning-toolkits
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
PyTorch in_db
Applied Machine Learning Toolkits
applied-machine-learning-toolkits
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Scikit-learn in_db
Applied Machine Learning Toolkits
applied-machine-learning-toolkits
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
GitHub Actions in_db
Continuous Integration Test Integration
continuous-integration-test-integration
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Airflow in_db
Workflow Orchestration Systems
workflow-orchestration-systems
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Terraform in_db
Infrastructure Provisioning Templates
infrastructure-provisioning-templates
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Terraform in_db
Infrastructure as Code
infrastructure-as-code
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Terraform in_db
Infrastructure as Code and Declarative Provisioning
infrastructure-as-code-and-declarative-provisioning
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Docker in_db
Containerization and Image Delivery
containerization-and-image-delivery
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Docker in_db
Model Serving Deployment and Runtime Packaging
model-serving-deployment-and-runtime-packaging
Existing dimension (library) · Role↔dimension saved
Kubernetes in_db
Orchestration Platforms
orchestration-platforms
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
AI/ML in_db
Applied Machine Learning Tooling and Frameworks
d_merge_01
New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
AI/ML in_db
AI Service Integration and Orchestration Patterns
d_merge_02
New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
AI/ML in_db
AI Inference Cost, Latency, and Throughput Optimization
ai-inference-cost-latency-and-throughput-optimization
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Amazon SageMaker in_db
Managed ML Platform Workflows
d_split_01_01
New skill saved · New dimension saved (reconciliation separate) · Role↔dimension skipped (dimension not under chosen role)
Amazon SageMaker in_db
Managed Model Hosting and Endpoints
d_split_01_02
New skill saved · New dimension saved (reconciliation separate) · Role↔dimension skipped (dimension not under chosen role)
Amazon SageMaker in_db
Model Serving Runtime Packaging
d_split_01_03
New skill saved · Existing dimension (embedding dedup) · Role↔dimension skipped (dimension not under chosen role)
Amazon SageMaker in_db
Model Serving Frameworks and Platforms
d_split_01_04
New skill saved · Existing dimension (embedding dedup) · Role↔dimension skipped (dimension not under chosen role)
Amazon Bedrock in_db
Cloud Model Runtime Services
cloud-model-runtime-services
New skill saved · Existing dimension (library) · Role↔dimension saved
AWS Lambda in_db
Managed Cloud Data Platform Services
d_merge_01
New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
ECS in_db
Version Control Systems
d_init_01
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Amazon Athena in_db
Cloud Analytics Query Services
d_split_01_01
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Amazon Athena in_db
Cloud Data Pipeline Runtime
d_split_01_02
New skill saved · Existing dimension (embedding dedup) · Role↔dimension skipped (dimension not under chosen role)
Amazon Athena in_db
Cloud Data Platform Storage
d_split_01_03
New skill saved · Existing dimension (embedding dedup) · Role↔dimension skipped (dimension not under chosen role)
Amazon Athena in_db
Cloud Data Platform Security and Networking
d_split_01_04
New skill saved · New dimension saved (reconciliation separate) · Role↔dimension skipped (dimension not under chosen role)
AWS Data Pipeline in_db
Cloud Data Platform Services
cloud-data-platform-services
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
S3 in_db
Storage Provisioning and Automation
storage-provisioning-and-automation
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Kinesis in_db
Streaming Data Processing
streaming-data-processing
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Amazon API Gateway in_db
HTTP API Frameworks and Gateway Layers
d_merge_01
New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
Pinecone in_db
Cloud Model Runtime Services
cloud-model-runtime-services
New skill saved · Existing dimension (library) · Role↔dimension saved
OpenSearch in_db
Cloud Data Platform Services
cloud-data-platform-services
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
OpenSearch in_db
Version Control Systems
d_init_01
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
FAISS in_db
Applied Machine Learning Toolkits and Frameworks
d_merge_01
New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
FAISS in_db
Version Control Systems
d_init_01
New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
AI/ML in_db
AI Inference Cost, Latency, and Throughput Optimization
d_merge_03
New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
Kinesis in_db
Streaming Data Processing
d_merge_01
New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role)
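Each entry above ends with a status line whose parts are joined by " · ". A minimal sketch of splitting such a line into flags follows; the function name `parse_status` and the returned keys are hypothetical, chosen for illustration only, and are not part of the pipeline.

```python
def parse_status(line):
    """Split one status line into boolean flags (hypothetical helper)."""
    parts = [p.strip() for p in line.split("·")]
    # The role link part is either "Role↔dimension saved" or
    # "Role↔dimension skipped (...)".
    role_part = next((p for p in parts if p.startswith("Role↔dimension")), "")
    return {
        "new_skill": "New skill saved" in parts,
        "new_dimension": any(p.startswith("New dimension saved") for p in parts),
        "role_link_saved": role_part == "Role↔dimension saved",
    }


# Two status lines copied from the list above.
saved = parse_status(
    "New skill saved · Existing dimension (library) · Role↔dimension saved"
)
skipped = parse_status(
    "Existing dimension (library) · Role↔dimension skipped "
    "(dimension not under chosen role)"
)
```

Counting entries where `role_link_saved` is true would give the number of dimensions actually attached to the chosen role.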

Library artifacts (this run)

Kind | Detail | DB id
canonical_skill_added | AI/ML | 2611
canonical_skill_added | Amazon SageMaker | 2612
canonical_skill_added | Amazon Bedrock | 2613
canonical_skill_added | AWS Lambda | 2614
canonical_skill_added | ECS | 2615
canonical_skill_added | Amazon Athena | 2616
canonical_skill_added | AWS Data Pipeline | 2617
canonical_skill_added | S3 | 2618
canonical_skill_added | Kinesis | 2619
canonical_skill_added | Amazon API Gateway | 2620
canonical_skill_added | Pinecone | 2621
canonical_skill_added | OpenSearch | 2622
canonical_skill_added | FAISS | 2623
dimension_skill_link | AI/ML ↔ Applied Machine Learning Tooling and Frameworks | 94
dimension_skill_link | AI/ML ↔ AI Service Integration and Orchestration Patterns | 270
dimension_skill_link | AI/ML ↔ AI Inference Cost, Latency, and Throughput Optimization | 260
dimension_created | Managed ML Platform Workflows | 367
dimension_skill_link | Amazon SageMaker ↔ Managed ML Platform Workflows | 367
dimension_created | Managed Model Hosting and Endpoints | 368
dimension_skill_link | Amazon SageMaker ↔ Managed Model Hosting and Endpoints | 368
dimension_skill_link | Amazon SageMaker ↔ Model Serving Runtime Packaging | 52
dimension_skill_link | Amazon Bedrock ↔ Cloud Model Runtime Services | 121
dimension_skill_link | AWS Lambda ↔ Managed Cloud Data Platform Services | 81
dimension_skill_link | ECS ↔ Version Control Systems | 365
dimension_skill_link | Amazon Athena ↔ Cloud Analytics Query Services | 367
dimension_skill_link | Amazon Athena ↔ Cloud Data Pipeline Runtime | 81
dimension_created | Cloud Data Platform Security and Networking | 369
dimension_skill_link | Amazon Athena ↔ Cloud Data Platform Security and Networking | 369
dimension_skill_link | AWS Data Pipeline ↔ Cloud Data Platform Services | 81
dimension_skill_link | S3 ↔ Storage Provisioning and Automation | 311
dimension_skill_link | Kinesis ↔ Streaming Data Processing | 69
dimension_skill_link | Amazon API Gateway ↔ HTTP API Frameworks and Gateway Layers | 141
dimension_skill_link | Pinecone ↔ Cloud Model Runtime Services | 121
dimension_skill_link | OpenSearch ↔ Cloud Data Platform Services | 81
dimension_skill_link | OpenSearch ↔ Version Control Systems | 365
dimension_skill_link | FAISS ↔ Applied Machine Learning Toolkits and Frameworks | 94
dimension_skill_link | FAISS ↔ Version Control Systems | 365
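Several dimension_skill_link rows above share a DB id while showing different dimension labels (for example 81 and 367). This may simply reflect a proposed label being merged or deduplicated into an existing dimension, but it is worth checking. The sketch below groups labels by id to surface such cases. It assumes the DB id on a dimension_skill_link row identifies the dimension; the names `LINKS`, `labels_by_id`, and `ambiguous_ids` are hypothetical, and only a subset of rows is copied in.

```python
from collections import defaultdict

# A subset of dimension_skill_link rows from the table above:
# (skill, dimension label, DB id).
LINKS = [
    ("Amazon SageMaker", "Managed ML Platform Workflows", 367),
    ("Amazon Athena", "Cloud Analytics Query Services", 367),
    ("AWS Lambda", "Managed Cloud Data Platform Services", 81),
    ("Amazon Athena", "Cloud Data Pipeline Runtime", 81),
    ("AWS Data Pipeline", "Cloud Data Platform Services", 81),
    ("Amazon Bedrock", "Cloud Model Runtime Services", 121),
    ("Pinecone", "Cloud Model Runtime Services", 121),
]


def labels_by_id(links):
    """Collect the distinct dimension labels seen for each DB id."""
    grouped = defaultdict(set)
    for _skill, label, db_id in links:
        grouped[db_id].add(label)
    return grouped


def ambiguous_ids(links):
    """Return only the ids that appear with more than one label."""
    return {
        db_id: sorted(labels)
        for db_id, labels in labels_by_id(links).items()
        if len(labels) > 1
    }


for db_id, labels in sorted(ambiguous_ids(LINKS).items()):
    print(db_id, labels)
```

Ids that appear with a single label, such as 121 here, are not reported.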
API 1 — extract-from-jd
{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "AWS"
    },
    {
      "is_primary": true,
      "skill_name": "AI/ML"
    },
    {
      "is_primary": true,
      "skill_name": "Amazon SageMaker"
    },
    {
      "is_primary": true,
      "skill_name": "Amazon Bedrock"
    },
    {
      "is_primary": true,
      "skill_name": "AWS Lambda"
    },
    {
      "is_primary": true,
      "skill_name": "ECS"
    },
    {
      "is_primary": true,
      "skill_name": "EKS"
    },
    {
      "is_primary": true,
      "skill_name": "EC2"
    },
    {
      "is_primary": true,
      "skill_name": "AWS Glue"
    },
    {
      "is_primary": true,
      "skill_name": "Amazon Athena"
    },
    {
      "is_primary": true,
      "skill_name": "Redshift"
    },
    {
      "is_primary": true,
      "skill_name": "AWS Data Pipeline"
    },
    {
      "is_primary": true,
      "skill_name": "S3"
    },
    {
      "is_primary": true,
      "skill_name": "Kinesis"
    },
    {
      "is_primary": true,
      "skill_name": "Amazon API Gateway"
    },
    {
      "is_primary": true,
      "skill_name": "Python"
    },
    {
      "is_primary": false,
      "skill_name": "TensorFlow"
    },
    {
      "is_primary": false,
      "skill_name": "PyTorch"
    },
    {
      "is_primary": false,
      "skill_name": "Scikit-learn"
    },
    {
      "is_primary": false,
      "skill_name": "GitHub Actions"
    },
    {
      "is_primary": false,
      "skill_name": "Airflow"
    },
    {
      "is_primary": false,
      "skill_name": "Terraform"
    },
    {
      "is_primary": false,
      "skill_name": "Docker"
    },
    {
      "is_primary": false,
      "skill_name": "Kubernetes"
    },
    {
      "is_primary": false,
      "skill_name": "Pinecone"
    },
    {
      "is_primary": false,
      "skill_name": "OpenSearch"
    },
    {
      "is_primary": false,
      "skill_name": "FAISS"
    }
  ],
  "run_id": null
}
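The response above marks each skill with an `is_primary` flag. As a minimal sketch of how a consumer might split the list, the snippet below partitions an abbreviated copy of the payload (4 of the 27 skills). The helper name `split_skills` is hypothetical and not part of the API.

```python
import json

# Abbreviated copy of the extract-from-jd response shown above.
RESPONSE = json.loads("""
{
  "final_skills": [
    {"is_primary": true, "skill_name": "AWS"},
    {"is_primary": true, "skill_name": "Python"},
    {"is_primary": false, "skill_name": "TensorFlow"},
    {"is_primary": false, "skill_name": "FAISS"}
  ],
  "run_id": null
}
""")


def split_skills(response):
    """Return (primary, secondary) skill names, preserving response order."""
    primary, secondary = [], []
    for item in response.get("final_skills", []):
        target = primary if item.get("is_primary") else secondary
        target.append(item["skill_name"])
    return primary, secondary


primary, secondary = split_skills(RESPONSE)
```

Applied to the full payload above, the same split yields 16 primary and 11 secondary skills.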
API 2 — extract-details
{
  "alias_matches": [
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 348,
      "existing_alias_text": "AWS",
      "input_term": "AWS",
      "matched_canonical": {
        "category_id": 13,
        "display_name": "AWS",
        "id": 163,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "aws",
        "sub_category_id": 161,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 1093,
      "existing_alias_text": "EKS",
      "input_term": "EKS",
      "matched_canonical": {
        "category_id": 14,
        "display_name": "EKS",
        "id": 725,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CLOUD_SERVICE",
        "slug": "eks",
        "sub_category_id": 251,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 2372,
      "existing_alias_text": "EC2",
      "input_term": "EC2",
      "matched_canonical": {
        "category_id": 14,
        "display_name": "EC2",
        "id": 1773,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CLOUD_SERVICE",
        "slug": "ec2",
        "sub_category_id": 1544,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 730,
      "existing_alias_text": "AWS Glue",
      "input_term": "AWS Glue",
      "matched_canonical": {
        "category_id": 14,
        "display_name": "AWS Glue",
        "id": 466,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CLOUD_SERVICE",
        "slug": "aws-glue",
        "sub_category_id": 385,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 3367,
      "existing_alias_text": "Redshift",
      "input_term": "Redshift",
      "matched_canonical": {
        "category_id": 13,
        "display_name": "Redshift",
        "id": 2570,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "redshift",
        "sub_category_id": 2098,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 608,
      "existing_alias_text": "Python",
      "input_term": "Python",
      "matched_canonical": {
        "category_id": 5,
        "display_name": "Python",
        "id": 393,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LANGUAGE",
        "slug": "python",
        "sub_category_id": 54,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 862,
      "existing_alias_text": "TensorFlow",
      "input_term": "TensorFlow",
      "matched_canonical": {
        "category_id": 6,
        "display_name": "TensorFlow",
        "id": 558,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LIBRARY",
        "slug": "tensorflow",
        "sub_category_id": 456,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 861,
      "existing_alias_text": "PyTorch",
      "input_term": "PyTorch",
      "matched_canonical": {
        "category_id": 6,
        "display_name": "PyTorch",
        "id": 557,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LIBRARY",
        "slug": "pytorch",
        "sub_category_id": 456,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 852,
      "existing_alias_text": "scikit-learn",
      "input_term": "Scikit-learn",
      "matched_canonical": {
        "category_id": 6,
        "display_name": "scikit-learn",
        "id": 554,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LIBRARY",
        "slug": "scikit-learn",
        "sub_category_id": 458,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 1800,
      "existing_alias_text": "GitHub Actions",
      "input_term": "GitHub Actions",
      "matched_canonical": {
        "category_id": 14,
        "display_name": "GitHub Actions",
        "id": 1250,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CLOUD_SERVICE",
        "slug": "github-actions",
        "sub_category_id": 1019,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 540,
      "existing_alias_text": "Airflow",
      "input_term": "Airflow",
      "matched_canonical": {
        "category_id": 11,
        "display_name": "Airflow",
        "id": 325,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "airflow",
        "sub_category_id": 335,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 290,
      "existing_alias_text": "Terraform",
      "input_term": "Terraform",
      "matched_canonical": {
        "category_id": 11,
        "display_name": "Terraform",
        "id": 144,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "terraform",
        "sub_category_id": 171,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 299,
      "existing_alias_text": "Docker",
      "input_term": "Docker",
      "matched_canonical": {
        "category_id": 11,
        "display_name": "Docker",
        "id": 153,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "docker",
        "sub_category_id": 170,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 304,
      "existing_alias_text": "Kubernetes",
      "input_term": "Kubernetes",
      "matched_canonical": {
        "category_id": 13,
        "display_name": "Kubernetes",
        "id": 158,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "kubernetes",
        "sub_category_id": 1524,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    }
  ],
  "candidate_roles": [
    {
      "display_name": "DevOps Engineer",
      "id": 1,
      "rationale": null,
      "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
      "slug": "devops-engineer",
      "source": "db"
    },
    {
      "display_name": "Cybersecurity Engineer",
      "id": 9,
      "rationale": null,
      "role_archetype": null,
      "slug": "cybersecurity-engineer",
      "source": "db"
    },
    {
      "display_name": "Machine Learning Engineer",
      "id": 10,
      "rationale": null,
      "role_archetype": null,
      "slug": "machine-learning-engineer",
      "source": "db"
    },
    {
      "display_name": "Cloud Engineer",
      "id": 18,
      "rationale": null,
      "role_archetype": null,
      "slug": "cloud-engineer",
      "source": "db"
    },
    {
      "display_name": "Data Engineer",
      "id": 6,
      "rationale": null,
      "role_archetype": null,
      "slug": "data-engineer",
      "source": "db"
    },
    {
      "display_name": "Data Analyst",
      "id": 20,
      "rationale": null,
      "role_archetype": null,
      "slug": "data-analyst",
      "source": "db"
    },
    {
      "display_name": "Data Scientist",
      "id": 7,
      "rationale": null,
      "role_archetype": null,
      "slug": "data-scientist",
      "source": "db"
    },
    {
      "display_name": "Azure Cloud Engineer",
      "id": 4,
      "rationale": null,
      "role_archetype": null,
      "slug": "azure-cloud-engineer",
      "source": "db"
    },
    {
      "display_name": "Virtualization Engineer",
      "id": 26,
      "rationale": null,
      "role_archetype": null,
      "slug": "virtualization-engineer",
      "source": "db"
    },
    {
      "display_name": "Network Engineer",
      "id": 21,
      "rationale": null,
      "role_archetype": null,
      "slug": "network-engineer",
      "source": "db"
    },
    {
      "display_name": "AI Engineer",
      "id": 12,
      "rationale": null,
      "role_archetype": null,
      "slug": "ai-engineer",
      "source": "db"
    },
    {
      "display_name": "Backend Engineer",
      "id": 14,
      "rationale": null,
      "role_archetype": null,
      "slug": "backend-engineer",
      "source": "db"
    },
    {
      "display_name": "Automation Tester",
      "id": 16,
      "rationale": null,
      "role_archetype": null,
      "slug": "automation-tester",
      "source": "db"
    },
    {
      "display_name": "MLOps Engineer",
      "id": 5,
      "rationale": null,
      "role_archetype": null,
      "slug": "mlops-engineer",
      "source": "db"
    },
    {
      "display_name": "Storage Engineer",
      "id": 22,
      "rationale": null,
      "role_archetype": null,
      "slug": "storage-engineer",
      "source": "db"
    }
  ],
  "chosen_role": {
    "display_name": "Machine Learning Engineer",
    "id": 10,
    "rationale": "The primary skills include a strong focus on AWS and AI/ML technologies, which aligns well with the role of a Machine Learning Engineer.",
    "role_archetype": null,
    "slug": "machine-learning-engineer",
    "source": "db"
  },
  "dimensions": [
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Platform Operations",
        "id": 26,
        "rationale": "Uses cloud provider services to support delivery and runtime environments. The focus is on consumer-level operation of cloud services rather than deep cloud architecture ownership.",
        "slug": "cloud-platform-operations",
        "source": "db"
      },
      "input_skill": "AWS",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "DevOps Engineer",
          "id": 1,
          "rationale": null,
          "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
          "slug": "devops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Security Platforms",
        "id": 332,
        "rationale": "Cloud-native security products used to assess posture, detect misconfigurations, and monitor workloads across AWS, Azure, and GCP. This is a distinct product family because the role often works across multiple CNAPP/CSPM/CWPP offerings and cloud-native detectors.",
        "slug": "cloud-security-platforms",
        "source": "db"
      },
      "input_skill": "AWS",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Cybersecurity Engineer",
          "id": 9,
          "rationale": null,
          "role_archetype": null,
          "slug": "cybersecurity-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Model Runtime Services",
        "id": 121,
        "rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
        "slug": "cloud-model-runtime-services",
        "source": "db"
      },
      "input_skill": "EKS",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Machine Learning Engineer",
          "id": 10,
          "rationale": null,
          "role_archetype": null,
          "slug": "machine-learning-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Orchestration Platforms",
        "id": 25,
        "rationale": "Operates the platforms that schedule and run containerized workloads and related deployment primitives. This is separate from image delivery because it concerns runtime placement and service rollout behavior.",
        "slug": "orchestration-platforms",
        "source": "db"
      },
      "input_skill": "EKS",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Cloud Engineer",
          "id": 18,
          "rationale": null,
          "role_archetype": null,
          "slug": "cloud-engineer",
          "source": "db"
        },
        {
          "display_name": "DevOps Engineer",
          "id": 1,
          "rationale": null,
          "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
          "slug": "devops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Provider Core Services",
        "id": 290,
        "rationale": "Core managed services used to provision and operate cloud environments. This is the base cloud surface for compute, storage, networking, and platform primitives the role configures and maintains.",
        "slug": "cloud-provider-core-services",
        "source": "db"
      },
      "input_skill": "EC2",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Cloud Engineer",
          "id": 18,
          "rationale": null,
          "role_archetype": null,
          "slug": "cloud-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Data Platform Services",
        "id": 81,
        "rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
        "slug": "cloud-data-platform-services",
        "source": "db"
      },
      "input_skill": "AWS Glue",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 6,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Data Warehousing Platforms",
        "id": 72,
        "rationale": "Cloud and on-prem analytical storage systems used to persist curated datasets and serve downstream consumers. This cluster is about the warehouse/lakehouse layer where transformed data is organized for access.",
        "slug": "data-warehousing-platforms",
        "source": "db"
      },
      "input_skill": "Redshift",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 6,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Analytical Programming Languages",
        "id": 82,
        "rationale": "Languages used to clean, transform, analyze, and prototype models in notebooks and scripts. This is the core coding surface for expressing statistical logic and data manipulation in a reproducible way.",
        "slug": "analytical-programming-languages",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Analyst",
          "id": 20,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-analyst",
          "source": "db"
        },
        {
          "display_name": "Data Scientist",
          "id": 7,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-scientist",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Automation Scripting and CLI",
        "id": 48,
        "rationale": "Uses scripts and command-line tooling to execute repeatable Azure operations and reduce manual work. This is a practical cluster because the role frequently automates provisioning, checks, and remediation tasks.",
        "slug": "automation-scripting-and-cli",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Azure Cloud Engineer",
          "id": 4,
          "rationale": null,
          "role_archetype": null,
          "slug": "azure-cloud-engineer",
          "source": "db"
        },
        {
          "display_name": "Cloud Engineer",
          "id": 18,
          "rationale": null,
          "role_archetype": null,
          "slug": "cloud-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Automation and Scripting for Operations",
        "id": 361,
        "rationale": "Scripts and lightweight automation used to execute repetitive virtualization tasks and enforce operational consistency. This is the practical glue that reduces manual host and VM administration.",
        "slug": "automation-and-scripting-for-operations",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Virtualization Engineer",
          "id": 26,
          "rationale": null,
          "role_archetype": null,
          "slug": "virtualization-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Network Automation and Scripting",
        "id": 285,
        "rationale": "Covers scripts and automation used to configure, validate, and audit network devices and services. This cluster is coherent because repeatable network operations increasingly depend on programmatic changes and checks.",
        "slug": "network-automation-and-scripting",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Network Engineer",
          "id": 21,
          "rationale": null,
          "role_archetype": null,
          "slug": "network-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for AI Workflows",
        "id": 261,
        "rationale": "Languages used to implement AI feature logic, orchestration, and response handling inside product code. This is the core coding surface for turning prompts and model calls into reliable application behavior.",
        "slug": "programming-languages-for-ai-workflows",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "AI Engineer",
          "id": 12,
          "rationale": null,
          "role_archetype": null,
          "slug": "ai-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for Backend Systems",
        "id": 140,
        "rationale": "Languages used to implement server-side business logic, request handlers, workers, and service integrations. This is the core coding surface for backend feature delivery and maintenance.",
        "slug": "programming-languages-for-backend-systems",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Backend Engineer",
          "id": 14,
          "rationale": null,
          "role_archetype": null,
          "slug": "backend-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for Data Work",
        "id": 67,
        "rationale": "Languages used to implement data pipelines, transformations, and operational utilities. This is the code layer for expressing extraction, parsing, validation, and orchestration logic in data engineering workflows.",
        "slug": "programming-languages-for-data-work",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 6,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for ML Systems",
        "id": 113,
        "rationale": "Languages used to implement model integration code, inference services, and feature-processing logic. This is the core coding surface for turning trained models into product-facing software components.",
        "slug": "programming-languages-for-ml-systems",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Machine Learning Engineer",
          "id": 10,
          "rationale": null,
          "role_archetype": null,
          "slug": "machine-learning-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for Security Work",
        "id": 328,
        "rationale": "Languages used to automate security tasks, write detection logic, and build analysis or remediation tooling. This is the core coding surface for a cybersecurity engineer across scripts, queries, and small utilities.",
        "slug": "programming-languages-for-security-work",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Cybersecurity Engineer",
          "id": 9,
          "rationale": null,
          "role_archetype": null,
          "slug": "cybersecurity-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for Test Automation",
        "id": 193,
        "rationale": "Languages used to implement automated checks, helper utilities, and test harness code. This is the core coding surface for turning test ideas into maintainable automation.",
        "slug": "programming-languages-for-test-automation",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Automation Tester",
          "id": 16,
          "rationale": null,
          "role_archetype": null,
          "slug": "automation-tester",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Security Automation and Scripting",
        "id": 258,
        "rationale": "Automating repeatable security checks, enrichment, and remediation workflows. This cluster is coherent because the role often needs lightweight automation to scale analysis and response.",
        "slug": "security-automation-and-scripting",
        "source": "db"
      },
      "input_skill": "Python",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Cybersecurity Engineer",
          "id": 9,
          "rationale": null,
          "role_archetype": null,
          "slug": "cybersecurity-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Applied Machine Learning Toolkits",
        "id": 94,
        "rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
        "slug": "applied-machine-learning-toolkits",
        "source": "db"
      },
      "input_skill": "TensorFlow",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Scientist",
          "id": 7,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-scientist",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Applied Machine Learning Toolkits",
        "id": 94,
        "rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
        "slug": "applied-machine-learning-toolkits",
        "source": "db"
      },
      "input_skill": "PyTorch",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Scientist",
          "id": 7,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-scientist",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Applied Machine Learning Toolkits",
        "id": 94,
        "rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
        "slug": "applied-machine-learning-toolkits",
        "source": "db"
      },
      "input_skill": "Scikit-learn",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Scientist",
          "id": 7,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-scientist",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Continuous Integration Test Integration",
        "id": 207,
        "rationale": "Integrating automated checks into shared build and merge workflows so results are repeatable and visible. This cluster is coherent because automation testers commonly configure test execution triggers, artifacts, and reporting hooks.",
        "slug": "continuous-integration-test-integration",
        "source": "db"
      },
      "input_skill": "GitHub Actions",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Automation Tester",
          "id": 16,
          "rationale": null,
          "role_archetype": null,
          "slug": "automation-tester",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Workflow Orchestration Systems",
        "id": 64,
        "rationale": "Operational orchestration of ML jobs, dependencies, and handoffs across training, validation, deployment, and retraining. This is a useful split from training pipelines because it emphasizes the scheduler and control plane.",
        "slug": "workflow-orchestration-systems",
        "source": "db"
      },
      "input_skill": "Airflow",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 6,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        },
        {
          "display_name": "MLOps Engineer",
          "id": 5,
          "rationale": null,
          "role_archetype": null,
          "slug": "mlops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Infrastructure Provisioning Templates",
        "id": 291,
        "rationale": "Declarative templates and modules used to create repeatable cloud resources and environments. This cluster covers the infrastructure definitions the role applies, reviews, and updates to keep environments consistent.",
        "slug": "infrastructure-provisioning-templates",
        "source": "db"
      },
      "input_skill": "Terraform",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Cloud Engineer",
          "id": 18,
          "rationale": null,
          "role_archetype": null,
          "slug": "cloud-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Infrastructure as Code",
        "id": 22,
        "rationale": "Defines infrastructure and platform resources through versioned code so environments are repeatable and reviewable. This is a coherent cluster because it underpins environment consistency and change control.",
        "slug": "infrastructure-as-code",
        "source": "db"
      },
      "input_skill": "Terraform",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "DevOps Engineer",
          "id": 1,
          "rationale": null,
          "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
          "slug": "devops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Infrastructure as Code and Declarative Provisioning",
        "id": 36,
        "rationale": "Defines cloud and platform infrastructure declaratively through versioned code so environments are repeatable, reviewable, and automatable. This includes authoring and maintaining IaC templates/modules, managing parameters and state, and using plan/apply workflows to provision and update resources across Azure and other cloud platforms.",
        "slug": "infrastructure-as-code-and-declarative-provisioning",
        "source": "db"
      },
      "input_skill": "Terraform",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Azure Cloud Engineer",
          "id": 4,
          "rationale": null,
          "role_archetype": null,
          "slug": "azure-cloud-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Containerization and Image Delivery",
        "id": 24,
        "rationale": "Builds, packages, and ships application and support workloads as container images. This cluster covers the artifact format and the mechanics of producing deployable images.",
        "slug": "containerization-and-image-delivery",
        "source": "db"
      },
      "input_skill": "Docker",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "DevOps Engineer",
          "id": 1,
          "rationale": null,
          "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
          "slug": "devops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Model Serving Deployment and Runtime Packaging",
        "id": 52,
        "rationale": "Operational deployment of trained models into online, batch, or streaming serving environments, including packaging models and model servers into containers or managed inference runtimes, coordinating rollout, and handing off to inference systems. Covers serving frameworks and platforms such as TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, and Seldon Core, plus container/runtime concerns like Docker images, GPU-enabled containers, base image selection, container entrypoints, runtime dependencies, and image scanning for model services.",
        "slug": "model-serving-deployment-and-runtime-packaging",
        "source": "db"
      },
      "input_skill": "Docker",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "MLOps Engineer",
          "id": 5,
          "rationale": null,
          "role_archetype": null,
          "slug": "mlops-engineer",
          "source": "db"
        },
        {
          "display_name": "Machine Learning Engineer",
          "id": 10,
          "rationale": null,
          "role_archetype": null,
          "slug": "machine-learning-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Orchestration Platforms",
        "id": 25,
        "rationale": "Operates the platforms that schedule and run containerized workloads and related deployment primitives. This is separate from image delivery because it concerns runtime placement and service rollout behavior.",
        "slug": "orchestration-platforms",
        "source": "db"
      },
      "input_skill": "Kubernetes",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Cloud Engineer",
          "id": 18,
          "rationale": null,
          "role_archetype": null,
          "slug": "cloud-engineer",
          "source": "db"
        },
        {
          "display_name": "DevOps Engineer",
          "id": 1,
          "rationale": null,
          "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
          "slug": "devops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "Applied Machine Learning Tooling and Frameworks",
        "id": null,
        "rationale": "Libraries and frameworks used to prototype, build, train, tune, and compare machine learning models in practice. Includes hands-on AI/ML tooling such as scikit-learn, XGBoost, LightGBM, TensorFlow, and PyTorch, along with model training, feature engineering, hyperparameter tuning, and evaluation workflows. Excludes model serving, deployment packaging, online inference, registry/version promotion, distributed stream processing, and cloud infrastructure provisioning.",
        "slug": "d_merge_01",
        "source": "llm"
      },
      "input_skill": "AI/ML",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "AI Service Integration and Orchestration Patterns",
        "id": null,
        "rationale": "Patterns for structuring and embedding AI/ML capabilities within products and services, including deciding where AI logic lives and how it interacts with application flows. Covers model-backed APIs, retrieval-augmented generation, agent and workflow orchestration, batch scoring services, online inference integration, feature pipelines, and placement across handlers, workers, gateways, and service boundaries.",
        "slug": "d_merge_02",
        "source": "llm"
      },
      "input_skill": "AI/ML",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "AI Inference Cost, Latency, and Throughput Optimization",
        "id": 260,
        "rationale": "Improving the speed, throughput, and cost efficiency of AI and ML-powered product features without sacrificing correctness or user experience. Includes token budgeting, prompt compression, batching, caching, model selection, quantization, pruning, async inference, warm starts, streaming UX, timeout tuning, concurrency control, and profiling. Excludes infrastructure autoscaling, model serving capacity planning, generic backend performance tuning, and unrelated data/warehouse optimization.",
        "slug": "ai-inference-cost-latency-and-throughput-optimization",
        "source": "db"
      },
      "input_skill": "AI/ML",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "AI Engineer",
          "id": 12,
          "rationale": null,
          "role_archetype": null,
          "slug": "ai-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "Managed ML Platform Workflows",
        "id": null,
        "rationale": "Cloud ML platforms for building and operating models end-to-end, including notebooks, experiments, managed training jobs, and pipeline/studio workflows. Examples: SageMaker Studio, SageMaker notebooks, SageMaker Pipelines, managed training jobs.",
        "slug": "d_split_01_01",
        "source": "llm"
      },
      "input_skill": "Amazon SageMaker",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "Managed Model Hosting and Endpoints",
        "id": null,
        "rationale": "Cloud-managed services for deploying trained models as online or batch inference endpoints, including endpoint provisioning, batch transform, and rollout coordination. Examples: SageMaker endpoints, SageMaker batch transform.",
        "slug": "d_split_01_02",
        "source": "llm"
      },
      "input_skill": "Amazon SageMaker",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "Model Serving Runtime Packaging",
        "id": null,
        "rationale": "Packaging trained models and model servers for deployment, including containers, base images, runtime dependencies, entrypoints, GPU images, and image scanning. Examples: Docker images for model services.",
        "slug": "d_split_01_03",
        "source": "llm"
      },
      "input_skill": "Amazon SageMaker",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "Model Serving Frameworks and Platforms",
        "id": null,
        "rationale": "Serving frameworks and platforms used to run deployed models in online, batch, or streaming environments. Examples: TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, Seldon Core.",
        "slug": "d_split_01_04",
        "source": "llm"
      },
      "input_skill": "Amazon SageMaker",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Model Runtime Services",
        "id": 121,
        "rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
        "slug": "cloud-model-runtime-services",
        "source": "db"
      },
      "input_skill": "Amazon Bedrock",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Machine Learning Engineer",
          "id": 10,
          "rationale": null,
          "role_archetype": null,
          "slug": "machine-learning-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Model Runtime Services",
        "id": 121,
        "rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
        "slug": "cloud-model-runtime-services",
        "source": "db"
      },
      "input_skill": "Amazon Bedrock",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Machine Learning Engineer",
          "id": 10,
          "rationale": null,
          "role_archetype": null,
          "slug": "machine-learning-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "Managed Cloud Data Platform Services",
        "id": null,
        "rationale": "Managed cloud services used to run data engineering and related application workloads, including serverless compute, workflow orchestration, storage, event triggers, and adjacent security/platform primitives. This covers services such as AWS Lambda, AWS Step Functions, AWS Glue, Amazon S3 event triggers, and other managed services used to build and operate pipelines and data platforms.",
        "slug": "d_merge_01",
        "source": "llm"
      },
      "input_skill": "AWS Lambda",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Version Control Systems",
        "id": 365,
        "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "ECS",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "Cloud Analytics Query Services",
        "id": null,
        "rationale": "Managed cloud services for querying and processing analytical data, including serverless SQL analytics, data lake querying, and federated query services such as Athena, Glue, Redshift Spectrum, and EMR-based analytics.",
        "slug": "d_split_01_01",
        "source": "llm"
      },
      "input_skill": "Amazon Athena",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "Cloud Data Pipeline Runtime",
        "id": null,
        "rationale": "Managed cloud compute and orchestration services used to run data engineering jobs, transformations, and workflows.",
        "slug": "d_split_01_02",
        "source": "llm"
      },
      "input_skill": "Amazon Athena",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "Cloud Data Platform Storage",
        "id": null,
        "rationale": "Cloud storage and data access services used as the persistence layer for data engineering platforms and pipelines.",
        "slug": "d_split_01_03",
        "source": "llm"
      },
      "input_skill": "Amazon Athena",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "Cloud Data Platform Security and Networking",
        "id": null,
        "rationale": "Identity, access, secrets, and networking primitives used to support cloud data platforms and pipelines.",
        "slug": "d_split_01_04",
        "source": "llm"
      },
      "input_skill": "Amazon Athena",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Data Platform Services",
        "id": 81,
        "rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
        "slug": "cloud-data-platform-services",
        "source": "db"
      },
      "input_skill": "AWS Data Pipeline",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 6,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Data Platform Services",
        "id": 81,
        "rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
        "slug": "cloud-data-platform-services",
        "source": "db"
      },
      "input_skill": "AWS Data Pipeline",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 6,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Storage Provisioning and Automation",
        "id": 311,
        "rationale": "Covers the scripts, APIs, and operational workflows used to create, resize, map, and retire storage resources. This cluster is coherent because storage engineers often automate repetitive provisioning and maintenance tasks.",
        "slug": "storage-provisioning-and-automation",
        "source": "db"
      },
      "input_skill": "S3",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Storage Engineer",
          "id": 22,
          "rationale": null,
          "role_archetype": null,
          "slug": "storage-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Streaming Data Processing",
        "id": 69,
        "rationale": "Tools and patterns for ingesting and transforming event streams with low latency. This cluster covers continuous processing, windowing, and stateful stream jobs used to keep data fresh.",
        "slug": "streaming-data-processing",
        "source": "db"
      },
      "input_skill": "Kinesis",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 6,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "HTTP API Frameworks and Gateway Layers",
        "id": null,
        "rationale": "Server-side frameworks and gateway layers used to build, route, validate, and manage HTTP APIs. Includes backend web service frameworks and API gateway configuration for endpoints, request/response mapping, routing, input validation, auth integration, throttling, usage plans, stages, custom domains, and backend integration.",
        "slug": "d_merge_01",
        "source": "llm"
      },
      "input_skill": "Amazon API Gateway",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Model Runtime Services",
        "id": 121,
        "rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
        "slug": "cloud-model-runtime-services",
        "source": "db"
      },
      "input_skill": "Pinecone",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Machine Learning Engineer",
          "id": 10,
          "rationale": null,
          "role_archetype": null,
          "slug": "machine-learning-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Data Platform Services",
        "id": 81,
        "rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
        "slug": "cloud-data-platform-services",
        "source": "db"
      },
      "input_skill": "OpenSearch",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 6,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Version Control Systems",
        "id": 365,
        "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "OpenSearch",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": null,
        "display_name": "Applied Machine Learning Toolkits and Frameworks",
        "id": null,
        "rationale": "Libraries and frameworks used to prototype, compare, index, and evaluate machine learning solutions quickly. Includes applied-ML toolkits such as scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, NumPy, pandas, FAISS, vector search libraries, and embedding utilities. Excludes serving runtimes, deployment containers, online inference infrastructure, streaming systems, and database/storage engines.",
        "slug": "d_merge_01",
        "source": "llm"
      },
      "input_skill": "FAISS",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Version Control Systems",
        "id": 365,
        "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "FAISS",
      "llm_role": null,
      "roles_from_db": []
    }
  ],
  "input_final_skills": [
    "AWS",
    "AI/ML",
    "Amazon SageMaker",
    "Amazon Bedrock",
    "AWS Lambda",
    "ECS",
    "EKS",
    "EC2",
    "AWS Glue",
    "Amazon Athena",
    "Redshift",
    "AWS Data Pipeline",
    "S3",
    "Kinesis",
    "Amazon API Gateway",
    "Python",
    "TensorFlow",
    "PyTorch",
    "Scikit-learn",
    "GitHub Actions",
    "Airflow",
    "Terraform",
    "Docker",
    "Kubernetes",
    "Pinecone",
    "OpenSearch",
    "FAISS"
  ],
  "input_llm_skills": [
    "AWS",
    "AI/ML",
    "Amazon SageMaker",
    "Amazon Bedrock",
    "AWS Lambda",
    "ECS",
    "EKS",
    "EC2",
    "AWS Glue",
    "Amazon Athena",
    "Redshift",
    "AWS Data Pipeline",
    "S3",
    "Kinesis",
    "Amazon API Gateway",
    "Python",
    "TensorFlow",
    "PyTorch",
    "Scikit-learn",
    "GitHub Actions",
    "Airflow",
    "Terraform",
    "Docker",
    "Kubernetes",
    "Pinecone",
    "OpenSearch",
    "FAISS"
  ],
  "new_aliases_persisted": 0,
  "run_id": "20755499-04f6-440f-80a9-bb023fddc1ff",
  "skills_detail": [
    {
      "aliases_in_db": [
        {
          "alias_text": "AWS",
          "alias_type": "CANONICAL",
          "id": 348,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 13,
        "display_name": "AWS",
        "id": 163,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "aws",
        "sub_category_id": 161,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Platform Operations",
            "id": 26,
            "rationale": "Uses cloud provider services to support delivery and runtime environments. The focus is on consumer-level operation of cloud services rather than deep cloud architecture ownership.",
            "slug": "cloud-platform-operations",
            "source": "db"
          },
          "input_skill": "AWS",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "DevOps Engineer",
              "id": 1,
              "rationale": null,
              "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
              "slug": "devops-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Security Platforms",
            "id": 332,
            "rationale": "Cloud-native security products used to assess posture, detect misconfigurations, and monitor workloads across AWS, Azure, and GCP. This is a distinct product family because the role often works across multiple CNAPP/CSPM/CWPP offerings and cloud-native detectors.",
            "slug": "cloud-security-platforms",
            "source": "db"
          },
          "input_skill": "AWS",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Cybersecurity Engineer",
              "id": 9,
              "rationale": null,
              "role_archetype": null,
              "slug": "cybersecurity-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "AWS",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "Applied Machine Learning Tooling and Frameworks",
            "id": null,
            "rationale": "Libraries and frameworks used to prototype, build, train, tune, and compare machine learning models in practice. Includes hands-on AI/ML tooling such as scikit-learn, XGBoost, LightGBM, TensorFlow, and PyTorch, along with model training, feature engineering, hyperparameter tuning, and evaluation workflows. Excludes model serving, deployment packaging, online inference, registry/version promotion, distributed stream processing, and cloud infrastructure provisioning.",
            "slug": "d_merge_01",
            "source": "llm"
          },
          "input_skill": "AI/ML",
          "llm_role": null,
          "roles_from_db": []
        },
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "AI Service Integration and Orchestration Patterns",
            "id": null,
            "rationale": "Patterns for structuring and embedding AI/ML capabilities within products and services, including deciding where AI logic lives and how it interacts with application flows. Covers model-backed APIs, retrieval-augmented generation, agent and workflow orchestration, batch scoring services, online inference integration, feature pipelines, and placement across handlers, workers, gateways, and service boundaries.",
            "slug": "d_merge_02",
            "source": "llm"
          },
          "input_skill": "AI/ML",
          "llm_role": null,
          "roles_from_db": []
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "AI Inference Cost, Latency, and Throughput Optimization",
            "id": 260,
            "rationale": "Improving the speed, throughput, and cost efficiency of AI and ML-powered product features without sacrificing correctness or user experience. Includes token budgeting, prompt compression, batching, caching, model selection, quantization, pruning, async inference, warm starts, streaming UX, timeout tuning, concurrency control, and profiling. Excludes infrastructure autoscaling, model serving capacity planning, generic backend performance tuning, and unrelated data/warehouse optimization.",
            "slug": "ai-inference-cost-latency-and-throughput-optimization",
            "source": "db"
          },
          "input_skill": "AI/ML",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "AI Engineer",
              "id": 12,
              "rationale": null,
              "role_archetype": null,
              "slug": "ai-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "AI/ML",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Domain",
          "skill_nature": "CONCEPT",
          "sub_category": "artificial_intelligence_machine_learning",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "AI/ML is a common combined domain label in JDs and usually clearly means artificial intelligence and machine learning, not a different catalog skill."
          },
          "context_keywords": {
            "context_keywords": [
              "TensorFlow",
              "PyTorch",
              "scikit-learn",
              "deep learning",
              "neural networks",
              "NLP",
              "computer vision",
              "model training",
              "feature engineering",
              "hyperparameter tuning",
              "classification",
              "regression",
              "clustering",
              "reinforcement learning",
              "MLOps"
            ]
          },
          "maturity": {
            "confidence": 0.93,
            "maturity": "well_known",
            "reasoning": "AI/ML appears in a broad share of software and data job postings, with major vendors (AWS, Google, Microsoft) offering mainstream ML platforms and tooling; it\u2019s now a common hiring-pipeline requirement rather than a niche specialty."
          },
          "skill_id": "ai-ml",
          "vendor_license": {
            "confidence": 0.99,
            "license": null,
            "vendor": null,
            "year_introduced": null
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Libraries and frameworks used to prototype, build, train, tune, and compare machine learning models in practice. Includes hands-on AI/ML tooling such as scikit-learn, XGBoost, LightGBM, TensorFlow, and PyTorch, along with model training, feature engineering, hyperparameter tuning, and evaluation workflows. Excludes model serving, deployment packaging, online inference, registry/version promotion, distributed stream processing, and cloud infrastructure provisioning.",
            "exemplar_skills": [
              "Applied Machine Learning Tooling and Frameworks"
            ],
            "in_scope": "Skills, tools, and practices that belong under Applied Machine Learning Tooling and Frameworks for the target role, including items implied by the dimension rationale.",
            "name": "Applied Machine Learning Tooling and Frameworks",
            "out_of_scope": "Adjacent clusters explicitly not owned by Applied Machine Learning Tooling and Frameworks, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_merge_01"
          },
          {
            "description": "Patterns for structuring and embedding AI/ML capabilities within products and services, including deciding where AI logic lives and how it interacts with application flows. Covers model-backed APIs, retrieval-augmented generation, agent and workflow orchestration, batch scoring services, online inference integration, feature pipelines, and placement across handlers, workers, gateways, and service boundaries.",
            "exemplar_skills": [
              "AI Service Integration and Orchestration Patterns"
            ],
            "in_scope": "Skills, tools, and practices that belong under AI Service Integration and Orchestration Patterns for the target role, including items implied by the dimension rationale.",
            "name": "AI Service Integration and Orchestration Patterns",
            "out_of_scope": "Adjacent clusters explicitly not owned by AI Service Integration and Orchestration Patterns, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_merge_02"
          },
          {
            "description": "Improving the runtime efficiency of AI/ML-powered features by reducing inference cost and latency while increasing throughput and preserving user experience. Includes token budgeting, prompt compression, batching, caching, quantization, pruning, model selection, async inference, warm starts, streaming UX, timeout tuning, concurrency control, GPU utilization, and profiling. Excludes model training, feature engineering, registry/versioning, infrastructure autoscaling, serving capacity planning, generic backend performance tuning, and unrelated data/warehouse optimization.",
            "exemplar_skills": [
              "AI Inference Cost, Latency, and Throughput Optimization"
            ],
            "in_scope": "Skills, tools, and practices that belong under AI Inference Cost, Latency, and Throughput Optimization for the target role, including items implied by the dimension rationale.",
            "name": "AI Inference Cost, Latency, and Throughput Optimization",
            "out_of_scope": "Adjacent clusters explicitly not owned by AI Inference Cost, Latency, and Throughput Optimization, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_merge_03"
          }
        ],
        "merge_log": [
          {
            "a_dim_id": "applied-machine-learning-toolkits",
            "a_name": "Applied Machine Learning Toolkits",
            "a_role": "__skill_focal__",
            "b_dim_id": "applied-machine-learning-toolkits",
            "b_name": "Applied Machine Learning Toolkits",
            "b_role": "Data Scientist",
            "into": "d_merge_01",
            "into_name": "Applied Machine Learning Tooling and Frameworks",
            "merged_from": [
              "applied-machine-learning-toolkits",
              "applied-machine-learning-toolkits"
            ],
            "pair_kind": "cross_role",
            "reasoning": "Dim A and Dim B describe the same conceptual cluster: hands-on machine learning model development using common libraries/frameworks. Dim A explicitly includes scikit-learn, XGBoost, LightGBM, TensorFlow, PyTorch, model training, feature engineering, hyperparameter tuning, and evaluation workflows. Dim B\u2019s description says the same thing in slightly different words: tools to prototype and compare models quickly, capturing the concrete tooling layer beneath modeling methods and evaluation. The overlap is not just naming; the substance is identical, and the cross-role difference does not imply a different skill cluster here because both are about the same applied ML toolkit stack rather than role-specific responsibilities like deployment or infrastructure.",
            "similarity": 0.8079542209553862
          },
          {
            "a_dim_id": "ai-service-architecture-patterns",
            "a_name": "AI Service Architecture Patterns",
            "a_role": "__skill_focal__",
            "b_dim_id": "ai-service-architecture-patterns",
            "b_name": "AI Service Architecture Patterns",
            "b_role": "AI Engineer",
            "into": "d_merge_02",
            "into_name": "AI Service Integration and Orchestration Patterns",
            "merged_from": [
              "ai-service-architecture-patterns",
              "ai-service-architecture-patterns"
            ],
            "pair_kind": "cross_role",
            "reasoning": "Both dims describe the same skill cluster: placing and orchestrating AI capabilities inside product/service architecture. Dim A covers embedding AI/ML into products and services with examples like model-backed APIs, RAG, agent orchestration, and online inference integration. Dim B says the same thing in architectural terms, naming handlers, workers, gateways, and dedicated orchestration services. The overlap is substantive, not just lexical.",
            "similarity": 0.8337425449661009
          },
          {
            "a_dim_id": "ai-inference-cost-latency-and-throughput-optimization",
            "a_name": "AI Inference Performance Optimization",
            "a_role": "__skill_focal__",
            "b_dim_id": "ai-inference-cost-latency-and-throughput-optimization",
            "b_name": "AI Inference Cost, Latency, and Throughput Optimization",
            "b_role": "AI Engineer",
            "into": "d_merge_03",
            "into_name": "AI Inference Cost, Latency, and Throughput Optimization",
            "merged_from": [
              "ai-inference-cost-latency-and-throughput-optimization",
              "ai-inference-cost-latency-and-throughput-optimization"
            ],
            "pair_kind": "cross_role",
            "reasoning": "Both dims target the same AI inference optimization cluster: reducing latency, cost, and improving throughput for AI/ML-powered features at runtime. Dim A includes inference latency, throughput tuning, batching, quantization, caching, GPU utilization, and concurrency control. Dim B covers the same core skills and adds token budgeting, prompt compression, async inference, warm starts, streaming UX, timeout tuning, and profiling. The overlap on batching, caching, quantization, and concurrency control shows they are not distinct clusters; the cross-role difference is only wording.",
            "similarity": 0.8101390953678244
          }
        ],
        "placed": {
          "name": "AI/ML",
          "placement_confidence": 0.92,
          "primary_dimension": "d_merge_01",
          "reasoning": "Deterministic JD placement: locked_dimensions has 3 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [
            "d_merge_02",
            "d_merge_03"
          ],
          "skill_id": "ai-ml"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [],
          "related_to": [
            "anomaly-investigation",
            "missing-data-analysis",
            "capacity-forecasting",
            "bls",
            "rapid7-insightvm",
            "azure-defender-for-cloud",
            "aws-iam-review",
            "mfa",
            "azure-ad",
            "azure-ad-conditional-access"
          ],
          "requires": [],
          "skill_id": "ai-ml",
          "suppress_on_match": []
        },
        "skill_id": "ai-ml",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.98,
          "name": "AI/ML",
          "reasoning": "AI/ML is a vertical body of knowledge and problem-space rather than a tool, framework, or methodology, so it fits the Domain type.",
          "skill_id": "ai-ml",
          "subtype": "artificial_intelligence_machine_learning",
          "type": "Domain"
        },
        "warnings": [
          "stage3_post_filter_dropped_catalog_only_locked_dims:40-\u003e3"
        ]
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "Managed ML Platform Workflows",
            "id": null,
            "rationale": "Cloud ML platforms for building and operating models end-to-end, including notebooks, experiments, managed training jobs, and pipeline/studio workflows. Examples: SageMaker Studio, SageMaker notebooks, SageMaker Pipelines, managed training jobs.",
            "slug": "d_split_01_01",
            "source": "llm"
          },
          "input_skill": "Amazon SageMaker",
          "llm_role": null,
          "roles_from_db": []
        },
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "Managed Model Hosting and Endpoints",
            "id": null,
            "rationale": "Cloud-managed services for deploying trained models as online or batch inference endpoints, including endpoint provisioning, batch transform, and rollout coordination. Examples: SageMaker endpoints, SageMaker batch transform.",
            "slug": "d_split_01_02",
            "source": "llm"
          },
          "input_skill": "Amazon SageMaker",
          "llm_role": null,
          "roles_from_db": []
        },
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "Model Serving Runtime Packaging",
            "id": null,
            "rationale": "Packaging trained models and model servers for deployment, including containers, base images, runtime dependencies, entrypoints, GPU images, and image scanning. Examples: Docker images for model services.",
            "slug": "d_split_01_03",
            "source": "llm"
          },
          "input_skill": "Amazon SageMaker",
          "llm_role": null,
          "roles_from_db": []
        },
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "Model Serving Frameworks and Platforms",
            "id": null,
            "rationale": "Serving frameworks and platforms used to run deployed models in online, batch, or streaming environments. Examples: TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, Seldon Core.",
            "slug": "d_split_01_04",
            "source": "llm"
          },
          "input_skill": "Amazon SageMaker",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "Amazon SageMaker",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Platform",
          "skill_nature": "PLATFORM",
          "sub_category": "ml_platform",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "Amazon SageMaker is a specific AWS ML platform name and is usually unambiguous in job descriptions; it is unlikely to be mistaken for a different catalog skill."
          },
          "context_keywords": {
            "context_keywords": [
              "MLOps",
              "notebooks",
              "training jobs",
              "hyperparameter tuning",
              "model registry",
              "endpoint deployment",
              "batch transform",
              "feature store",
              "pipelines",
              "ground truth",
              "AutoML",
              "S3",
              "IAM",
              "ECR",
              "CloudWatch"
            ]
          },
          "maturity": {
            "confidence": 0.9,
            "maturity": "well_known",
            "reasoning": "Commonly listed in ML/DS job descriptions and AWS\u2019s managed ML platform is broadly adopted for training, deployment, and MLOps across enterprises."
          },
          "skill_id": "amazon-sagemaker",
          "vendor_license": {
            "confidence": 0.98,
            "license": "proprietary",
            "vendor": "Amazon Web Services",
            "year_introduced": 2017
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Cloud ML platforms for building and operating models end-to-end, including notebooks, experiments, managed training jobs, and pipeline/studio workflows. Examples: SageMaker Studio, SageMaker notebooks, SageMaker Pipelines, managed training jobs.",
            "exemplar_skills": [
              "Managed ML Platform Workflows"
            ],
            "in_scope": "Skills, tools, and practices that belong under Managed ML Platform Workflows for the target role, including items implied by the dimension rationale.",
            "name": "Managed ML Platform Workflows",
            "out_of_scope": "Adjacent clusters explicitly not owned by Managed ML Platform Workflows, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_split_01_01"
          },
          {
            "description": "Cloud-managed services for deploying trained models as online or batch inference endpoints, including endpoint provisioning, batch transform, and rollout coordination. Examples: SageMaker endpoints, SageMaker batch transform.",
            "exemplar_skills": [
              "Managed Model Hosting and Endpoints"
            ],
            "in_scope": "Skills, tools, and practices that belong under Managed Model Hosting and Endpoints for the target role, including items implied by the dimension rationale.",
            "name": "Managed Model Hosting and Endpoints",
            "out_of_scope": "Adjacent clusters explicitly not owned by Managed Model Hosting and Endpoints, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_split_01_02"
          },
          {
            "description": "Packaging trained models and model servers for deployment, including containers, base images, runtime dependencies, entrypoints, GPU images, and image scanning. Examples: Docker images for model services.",
            "exemplar_skills": [
              "Model Serving Runtime Packaging"
            ],
            "in_scope": "Skills, tools, and practices that belong under Model Serving Runtime Packaging for the target role, including items implied by the dimension rationale.",
            "name": "Model Serving Runtime Packaging",
            "out_of_scope": "Adjacent clusters explicitly not owned by Model Serving Runtime Packaging, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_split_01_03"
          },
          {
            "description": "Serving frameworks and platforms used to run deployed models in online, batch, or streaming environments. Examples: TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, Seldon Core.",
            "exemplar_skills": [
              "Model Serving Frameworks and Platforms"
            ],
            "in_scope": "Skills, tools, and practices that belong under Model Serving Frameworks and Platforms for the target role, including items implied by the dimension rationale.",
            "name": "Model Serving Frameworks and Platforms",
            "out_of_scope": "Adjacent clusters explicitly not owned by Model Serving Frameworks and Platforms, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_split_01_04"
          }
        ],
        "merge_log": [],
        "placed": {
          "name": "Amazon SageMaker",
          "placement_confidence": 0.92,
          "primary_dimension": "d_split_01_01",
          "reasoning": "Deterministic JD placement: locked_dimensions has 4 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [
            "d_split_01_02",
            "d_split_01_03"
          ],
          "skill_id": "amazon-sagemaker"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [
            "aws"
          ],
          "related_to": [
            "aws-s3",
            "aws-cdk",
            "aws-cloudformation",
            "aws-kms",
            "aws-guardduty",
            "microsoft-sentinel",
            "aws-iam-review",
            "azure-ad",
            "aks",
            "azure"
          ],
          "requires": [],
          "skill_id": "amazon-sagemaker",
          "suppress_on_match": []
        },
        "skill_id": "amazon-sagemaker",
        "split_log": [
          {
            "a_dim_id": "cloud-model-runtime-services",
            "a_name": "Cloud Model Runtime Services",
            "a_role": "__skill_focal__",
            "b_dim_id": "model-serving-deployment-and-runtime-packaging",
            "b_name": "Model Serving Deployment and Runtime Packaging",
            "b_role": "__skill_focal__",
            "into": [
              "d_split_01_01",
              "d_split_01_02",
              "d_split_01_03",
              "d_split_01_04"
            ],
            "into_names": [
              "Managed ML Platform Workflows",
              "Managed Model Hosting and Endpoints",
              "Model Serving Runtime Packaging",
              "Model Serving Frameworks and Platforms"
            ],
            "pair_kind": "intra_role",
            "reasoning": "Dim A is broader: it covers training, notebooks, pipelines, endpoints, and MLOps workflows, with SageMaker Studio/Pipelines/training jobs as exemplars. Dim B is narrower and specifically about packaging trained models for serving runtimes, with TensorFlow Serving, TorchServe, Triton, BentoML, KServe, Seldon Core, plus Docker/GPU container concerns. The overlap is only around deployment/serving; A also includes managed training and platform workflow skills that are not B\u0027s focus. So A should be split into narrower siblings rather than merged.",
            "similarity": 0.7320229250688253,
            "split_from": [
              "cloud-model-runtime-services",
              "model-serving-deployment-and-runtime-packaging"
            ]
          }
        ],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.98,
          "name": "Amazon SageMaker",
          "reasoning": "By the Platform vs Tool rule, Amazon SageMaker is a hosted multi-tenant AWS environment with APIs and managed machine-learning capabilities, so it is a Platform rather than a Tool or a single Service in this typology.",
          "skill_id": "amazon-sagemaker",
          "subtype": "ml_platform",
          "type": "Platform"
        },
        "warnings": [
          "stage3_post_filter_dropped_catalog_only_locked_dims:42-\u003e4"
        ]
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Model Runtime Services",
            "id": 121,
            "rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
            "slug": "cloud-model-runtime-services",
            "source": "db"
          },
          "input_skill": "Amazon Bedrock",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Machine Learning Engineer",
              "id": 10,
              "rationale": null,
              "role_archetype": null,
              "slug": "machine-learning-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Model Runtime Services",
            "id": 121,
            "rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
            "slug": "cloud-model-runtime-services",
            "source": "db"
          },
          "input_skill": "Amazon Bedrock",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Machine Learning Engineer",
              "id": 10,
              "rationale": null,
              "role_archetype": null,
              "slug": "machine-learning-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Amazon Bedrock",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Service",
          "skill_nature": "CLOUD_SERVICE",
          "sub_category": "managed_ai_model_service",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "EMERGING"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "Amazon Bedrock is a specific AWS managed AI model service with a distinctive name; typical JDs mentioning it are unlikely to mean a different catalog skill."
          },
          "context_keywords": {
            "context_keywords": [
              "foundation models",
              "FM",
              "prompt engineering",
              "RAG",
              "vector database",
              "embeddings",
              "guardrails",
              "Agents for Amazon Bedrock",
              "Knowledge Bases",
              "model invocation",
              "fine-tuning",
              "inference",
              "LLM",
              "LangChain",
              "Anthropic Claude"
            ]
          },
          "maturity": {
            "confidence": 0.86,
            "maturity": "emerging",
            "reasoning": "Appears increasingly in cloud/ML job descriptions and AWS partner materials, but JD volume is still far below that of core AWS services like S3 or Lambda."
          },
          "skill_id": "amazon-bedrock",
          "vendor_license": {
            "confidence": 0.98,
            "license": "proprietary",
            "vendor": "Amazon Web Services",
            "year_introduced": 2023
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [
          {
            "a_dim_id": "cloud-model-runtime-services",
            "a_name": "Cloud Model Runtime Services",
            "a_role": "__skill_focal__",
            "b_dim_id": "cloud-model-runtime-services",
            "b_name": "Cloud Model Runtime Services",
            "b_role": "Machine Learning Engineer",
            "pair_kind": "cross_role",
            "reasoning": "Dim A is about managed foundation-model product services like Amazon Bedrock, Bedrock Agents, Bedrock Knowledge Bases, prompt orchestration, and guardrails. Dim B is broader cloud inference/runtime support for MLEs, emphasizing deployment and tuning on cloud compute, networking, and storage primitives. The overlap is only the shared runtime/inference wording; the concrete skills and anchors differ, so they are distinct clusters.",
            "similarity": 0.6605349086129982
          }
        ],
        "locked_dimensions": [
          {
            "description": "Consumer-facing managed services used to run, invoke, and integrate foundation models and related AI capabilities in cloud applications. Amazon Bedrock belongs here because it provides hosted model access, orchestration features, and runtime APIs for generative AI workloads.",
            "exemplar_skills": [
              "Amazon Bedrock",
              "Bedrock Agents",
              "Bedrock Knowledge Bases",
              "foundation model APIs",
              "prompt orchestration",
              "guardrails for generative AI"
            ],
            "in_scope": "Amazon Bedrock, model invocation APIs, foundation model access, prompt orchestration, guardrails, agents, knowledge bases, embeddings, managed inference endpoints",
            "name": "Cloud Model Runtime Services",
            "out_of_scope": "Model training pipelines, offline feature engineering, and model registry workflows, which belong to model development and MLOps dimensions; generic cloud storage or networking, which are covered elsewhere",
            "overlap_flags": [
              {
                "reason": "Bedrock is often used as part of broader AI application architecture, but this dimension focuses on the managed runtime service itself.",
                "with_dim_id": "ai-service-architecture-patterns",
                "with_dim_name": null,
                "with_role": "AI Engineer"
              },
              {
                "reason": "Bedrock usage can involve tuning latency and cost, but that dimension owns optimization concerns rather than service selection.",
                "with_dim_id": "ai-inference-cost-latency-and-throughput-optimization",
                "with_dim_name": null,
                "with_role": "AI Engineer"
              }
            ],
            "tentative_id": "cloud-model-runtime-services"
          },
          {
            "description": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
            "exemplar_skills": [
              "Cloud Model Runtime Services"
            ],
            "in_scope": "Skills, tools, and practices that belong under Cloud Model Runtime Services for the target role, including items implied by the dimension rationale.",
            "name": "Cloud Model Runtime Services",
            "out_of_scope": "Adjacent clusters explicitly not owned by Cloud Model Runtime Services, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "cloud-model-runtime-services"
          }
        ],
        "merge_log": [],
        "placed": {
          "name": "Amazon Bedrock",
          "placement_confidence": 0.92,
          "primary_dimension": "cloud-model-runtime-services",
          "reasoning": "Deterministic JD placement: locked_dimensions has 2 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [],
          "skill_id": "amazon-bedrock"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [
            "aws"
          ],
          "related_to": [
            "azure",
            "azure-ad",
            "azure-defender-for-cloud",
            "azure-key-vault",
            "azure-expressroute",
            "aws-cdk",
            "aws-cloudformation",
            "aws-kms",
            "aws-s3",
            "aws-vpc"
          ],
          "requires": [],
          "skill_id": "amazon-bedrock",
          "suppress_on_match": []
        },
        "skill_id": "amazon-bedrock",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.97,
          "name": "Amazon Bedrock",
          "reasoning": "By the Platform vs Service rule, Amazon Bedrock is a specific managed capability inside AWS rather than the whole hosted environment, so it is a Service.",
          "skill_id": "amazon-bedrock",
          "subtype": "managed_ai_model_service",
          "type": "Service"
        },
        "warnings": [
          "stage3_post_filter_dropped_catalog_only_locked_dims:41-\u003e2"
        ]
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "Managed Cloud Data Platform Services",
            "id": null,
            "rationale": "Managed cloud services used to run data engineering and related application workloads, including serverless compute, workflow orchestration, storage, event triggers, and adjacent security/platform primitives. This covers services such as AWS Lambda, AWS Step Functions, AWS Glue, Amazon S3 event triggers, and other managed services used to build and operate pipelines and data platforms.",
            "slug": "d_merge_01",
            "source": "llm"
          },
          "input_skill": "AWS Lambda",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "AWS Lambda",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Service",
          "skill_nature": "CLOUD_SERVICE",
          "sub_category": "serverless_compute_service",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "AWS Lambda is a specific AWS serverless compute service with a distinctive full name; in typical JDs it is unlikely to be confused with unrelated skills in the catalog."
          },
          "context_keywords": {
            "context_keywords": [
              "serverless",
              "event-driven",
              "API Gateway",
              "CloudWatch",
              "IAM role",
              "S3 trigger",
              "SNS",
              "SQS",
              "Step Functions",
              "DynamoDB",
              "Lambda layers",
              "cold start",
              "Node.js",
              "Python",
              "VPC"
            ]
          },
          "maturity": {
            "confidence": 0.97,
            "maturity": "well_known",
            "reasoning": "Broadly adopted serverless compute; AWS Lambda appears in many cloud/backend job descriptions and is a standard AWS offering with strong ecosystem support."
          },
          "skill_id": "aws-lambda",
          "vendor_license": {
            "confidence": 0.99,
            "license": "proprietary",
            "vendor": "Amazon Web Services",
            "year_introduced": 2014
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Managed cloud services used to run data engineering and related application workloads, including serverless compute, workflow orchestration, storage, event triggers, and adjacent security/platform primitives. This covers services such as AWS Lambda, AWS Step Functions, AWS Glue, Amazon S3 event triggers, and other managed services used to build and operate pipelines and data platforms.",
            "exemplar_skills": [
              "Managed Cloud Data Platform Services"
            ],
            "in_scope": "Skills, tools, and practices that belong under Managed Cloud Data Platform Services for the target role, including items implied by the dimension rationale.",
            "name": "Managed Cloud Data Platform Services",
            "out_of_scope": "Adjacent clusters explicitly not owned by Managed Cloud Data Platform Services, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_merge_01"
          }
        ],
        "merge_log": [
          {
            "a_dim_id": "cloud-data-platform-services",
            "a_name": "Cloud Data Platform Services",
            "a_role": "__skill_focal__",
            "b_dim_id": "cloud-data-platform-services",
            "b_name": "Cloud Data Platform Services",
            "b_role": "Data Engineer",
            "into": "d_merge_01",
            "into_name": "Managed Cloud Data Platform Services",
            "merged_from": [
              "cloud-data-platform-services",
              "cloud-data-platform-services"
            ],
            "pair_kind": "cross_role",
            "reasoning": "Both dims describe the same managed-cloud service cluster for data workloads. Dim A centers on serverless/managed execution and orchestration with concrete examples like AWS Lambda, AWS Step Functions, AWS Glue, and S3 event triggers. Dim B describes cloud services used to run data engineering pipelines, including managed compute, storage, networking-adjacent services, and security primitives. Those are the same skills in practice; B is just broader wording and A gives specific exemplars.",
            "similarity": 0.7804737777237593
          }
        ],
        "placed": {
          "name": "AWS Lambda",
          "placement_confidence": 0.92,
          "primary_dimension": "d_merge_01",
          "reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [],
          "skill_id": "aws-lambda"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [
            "aws"
          ],
          "related_to": [
            "aws-s3",
            "aws-cloudformation",
            "aws-guardduty",
            "aws-kms",
            "aws-cdk",
            "aws-vpc",
            "aws-direct-connect",
            "ec2",
            "azure-expressroute",
            "rest-apis"
          ],
          "requires": [
            "aws-iam-review"
          ],
          "skill_id": "aws-lambda",
          "suppress_on_match": []
        },
        "skill_id": "aws-lambda",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.99,
          "name": "AWS Lambda",
          "reasoning": "By the Service vs Platform rule, AWS Lambda is a specific managed capability inside AWS rather than the whole hosted environment, so it is a Service.",
          "skill_id": "aws-lambda",
          "subtype": "serverless_compute_service",
          "type": "Service"
        },
        "warnings": [
          "stage3_post_filter_dropped_catalog_only_locked_dims:40-\u003e1"
        ]
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Version Control Systems",
            "id": 365,
            "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "ECS",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "ECS",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Concept",
          "skill_nature": "CONCEPT",
          "sub_category": "entity_component_system",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": true,
            "confused_with": [
              "amazon_ecs",
              "elastic_container_service"
            ],
            "reasoning": "\u201cECS\u201d is a common acronym and in JDs often means Amazon Elastic Container Service; it can also be read as the generic entity-component-system architecture concept."
          },
          "context_keywords": {
            "context_keywords": [
              "entity-component-system",
              "game engine",
              "gameplay architecture",
              "component-based architecture",
              "systems",
              "entities",
              "components",
              "data-oriented design",
              "Unity",
              "Unreal Engine",
              "rendering pipeline",
              "physics engine",
              "scheduling",
              "serialization",
              "scene graph"
            ]
          },
          "maturity": {
            "confidence": 0.78,
            "maturity": "well_known",
            "reasoning": "ECS appears in many game-engine and engine-architecture job descriptions, especially in Unity/DOTS and Rust/C++ gameplay systems, and has strong GitHub/library activity; it\u2019s a common modern architecture pattern rather than a niche tool."
          },
          "skill_id": "ecs",
          "vendor_license": {
            "confidence": 0.99,
            "license": null,
            "vendor": null,
            "year_introduced": null
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Managed services for running and scaling containerized workloads. ECS belongs here because it is an orchestration platform for scheduling tasks, managing services, and coordinating container runtime operations.",
            "exemplar_skills": [
              "ECS",
              "Amazon ECS",
              "ECS task definitions",
              "ECS services",
              "ECS clusters",
              "Fargate",
              "capacity providers",
              "service autoscaling"
            ],
            "in_scope": "ECS, Amazon ECS, task definitions, services, clusters, capacity providers, service autoscaling, rolling deployments, Fargate, EC2 launch type, container scheduling",
            "name": "Container Orchestration Services",
            "out_of_scope": "Kubernetes control planes and manifests, image building and registry management, application code inside containers, load balancer design, general cloud networking",
            "overlap_flags": [
              {
                "reason": "ECS capacity and autoscaling decisions often intersect with broader scaling strategy and workload sizing.",
                "with_dim_id": "scalability-and-performance-architecture",
                "with_dim_name": null,
                "with_role": "Cloud Architect"
              },
              {
                "reason": "ECS is sometimes used as a managed compute substrate in cloud data workflows, but the orchestration layer is the primary fit.",
                "with_dim_id": "cloud-data-platform-services",
                "with_dim_name": null,
                "with_role": "Data Engineer"
              }
            ],
            "tentative_id": "d_init_01"
          }
        ],
        "merge_log": [],
        "placed": {
          "name": "ECS",
          "placement_confidence": 0.92,
          "primary_dimension": "d_init_01",
          "reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [],
          "skill_id": "ecs"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [],
          "related_to": [
            "ec2",
            "aks",
            "gke",
            "vmware-esxi",
            "vcenter-server",
            "dex",
            "ethereum",
            "erc-20",
            "erc-1155",
            "ethers-js"
          ],
          "requires": [],
          "skill_id": "ecs",
          "suppress_on_match": []
        },
        "skill_id": "ecs",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.93,
          "name": "ECS",
          "reasoning": "ECS is fundamentally the Entity-Component-System design pattern, so by the Architecture vs Concept rule it is best typed as a Concept rather than a tool or platform.",
          "skill_id": "ecs",
          "subtype": "entity_component_system",
          "type": "Concept"
        },
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "EKS",
          "alias_type": "CANONICAL",
          "id": 1093,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 14,
        "display_name": "EKS",
        "id": 725,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CLOUD_SERVICE",
        "slug": "eks",
        "sub_category_id": 251,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Model Runtime Services",
            "id": 121,
            "rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
            "slug": "cloud-model-runtime-services",
            "source": "db"
          },
          "input_skill": "EKS",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Machine Learning Engineer",
              "id": 10,
              "rationale": null,
              "role_archetype": null,
              "slug": "machine-learning-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Orchestration Platforms",
            "id": 25,
            "rationale": "Operates the platforms that schedule and run containerized workloads and related deployment primitives. This is separate from image delivery because it concerns runtime placement and service rollout behavior.",
            "slug": "orchestration-platforms",
            "source": "db"
          },
          "input_skill": "EKS",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Cloud Engineer",
              "id": 18,
              "rationale": null,
              "role_archetype": null,
              "slug": "cloud-engineer",
              "source": "db"
            },
            {
              "display_name": "DevOps Engineer",
              "id": 1,
              "rationale": null,
              "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
              "slug": "devops-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "EKS",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "EC2",
          "alias_type": "CANONICAL",
          "id": 2372,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 14,
        "display_name": "EC2",
        "id": 1773,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CLOUD_SERVICE",
        "slug": "ec2",
        "sub_category_id": 1544,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Provider Core Services",
            "id": 290,
            "rationale": "Core managed services used to provision and operate cloud environments. This is the base cloud surface for compute, storage, networking, and platform primitives the role configures and maintains.",
            "slug": "cloud-provider-core-services",
            "source": "db"
          },
          "input_skill": "EC2",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Cloud Engineer",
              "id": 18,
              "rationale": null,
              "role_archetype": null,
              "slug": "cloud-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "EC2",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "AWS Glue",
          "alias_type": "CANONICAL",
          "id": 730,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 14,
        "display_name": "AWS Glue",
        "id": 466,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CLOUD_SERVICE",
        "slug": "aws-glue",
        "sub_category_id": 385,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Data Platform Services",
            "id": 81,
            "rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
            "slug": "cloud-data-platform-services",
            "source": "db"
          },
          "input_skill": "AWS Glue",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 6,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "AWS Glue",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "Cloud Analytics Query Services",
            "id": null,
            "rationale": "Managed cloud services for querying and processing analytical data, including serverless SQL analytics, data lake querying, and federated query services such as Athena, Glue, Redshift Spectrum, and EMR-based analytics.",
            "slug": "d_split_01_01",
            "source": "llm"
          },
          "input_skill": "Amazon Athena",
          "llm_role": null,
          "roles_from_db": []
        },
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "Cloud Data Pipeline Runtime",
            "id": null,
            "rationale": "Managed cloud compute and orchestration services used to run data engineering jobs, transformations, and workflows.",
            "slug": "d_split_01_02",
            "source": "llm"
          },
          "input_skill": "Amazon Athena",
          "llm_role": null,
          "roles_from_db": []
        },
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "Cloud Data Platform Storage",
            "id": null,
            "rationale": "Cloud storage and data access services used as the persistence layer for data engineering platforms and pipelines.",
            "slug": "d_split_01_03",
            "source": "llm"
          },
          "input_skill": "Amazon Athena",
          "llm_role": null,
          "roles_from_db": []
        },
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "Cloud Data Platform Security and Networking",
            "id": null,
            "rationale": "Identity, access, secrets, and networking primitives used to support cloud data platforms and pipelines.",
            "slug": "d_split_01_04",
            "source": "llm"
          },
          "input_skill": "Amazon Athena",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "Amazon Athena",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Service",
          "skill_nature": "CLOUD_SERVICE",
          "sub_category": "query_service",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "Amazon Athena is a specific AWS query service with a distinctive full name; in typical JDs it is unlikely to be confused with another catalog skill."
          },
          "context_keywords": {
            "context_keywords": [
              "AWS Glue",
              "S3",
              "Presto",
              "Trino",
              "SQL",
              "CTAS",
              "partitioning",
              "Parquet",
              "ORC",
              "Glue Data Catalog",
              "Athena Federated Query",
              "IAM",
              "Lake Formation",
              "JDBC",
              "serverless analytics"
            ]
          },
          "maturity": {
            "confidence": 0.91,
            "maturity": "well_known",
            "reasoning": "Commonly listed in cloud/data analytics JDs and AWS\u2019s own docs position Athena as a standard serverless SQL query service for S3 data lakes, indicating broad market adoption."
          },
          "skill_id": "amazon-athena",
          "vendor_license": {
            "confidence": 0.99,
            "license": "proprietary",
            "vendor": "Amazon Web Services",
            "year_introduced": 2016
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Managed cloud services for querying and processing analytical data, including serverless SQL analytics, data lake querying, and federated query services such as Athena, Glue, Redshift Spectrum, and EMR-based analytics.",
            "exemplar_skills": [
              "Cloud Analytics Query Services"
            ],
            "in_scope": "Skills, tools, and practices that belong under Cloud Analytics Query Services for the target role, including items implied by the dimension rationale.",
            "name": "Cloud Analytics Query Services",
            "out_of_scope": "Adjacent clusters explicitly not owned by Cloud Analytics Query Services, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_split_01_01"
          },
          {
            "description": "Managed cloud compute and orchestration services used to run data engineering jobs, transformations, and workflows.",
            "exemplar_skills": [
              "Cloud Data Pipeline Runtime"
            ],
            "in_scope": "Skills, tools, and practices that belong under Cloud Data Pipeline Runtime for the target role, including items implied by the dimension rationale.",
            "name": "Cloud Data Pipeline Runtime",
            "out_of_scope": "Adjacent clusters explicitly not owned by Cloud Data Pipeline Runtime, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_split_01_02"
          },
          {
            "description": "Cloud storage and data access services used as the persistence layer for data engineering platforms and pipelines.",
            "exemplar_skills": [
              "Cloud Data Platform Storage"
            ],
            "in_scope": "Skills, tools, and practices that belong under Cloud Data Platform Storage for the target role, including items implied by the dimension rationale.",
            "name": "Cloud Data Platform Storage",
            "out_of_scope": "Adjacent clusters explicitly not owned by Cloud Data Platform Storage, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_split_01_03"
          },
          {
            "description": "Identity, access, secrets, and networking primitives used to support cloud data platforms and pipelines.",
            "exemplar_skills": [
              "Cloud Data Platform Security and Networking"
            ],
            "in_scope": "Skills, tools, and practices that belong under Cloud Data Platform Security and Networking for the target role, including items implied by the dimension rationale.",
            "name": "Cloud Data Platform Security and Networking",
            "out_of_scope": "Adjacent clusters explicitly not owned by Cloud Data Platform Security and Networking, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_split_01_04"
          }
        ],
        "merge_log": [],
        "placed": {
          "name": "Amazon Athena",
          "placement_confidence": 0.92,
          "primary_dimension": "d_split_01_01",
          "reasoning": "Deterministic JD placement: locked_dimensions has 4 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [
            "d_split_01_02",
            "d_split_01_03"
          ],
          "skill_id": "amazon-athena"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [
            "aws"
          ],
          "related_to": [
            "aws-s3",
            "aws-kms",
            "aws-cloudformation",
            "aws-direct-connect",
            "aws-cdk",
            "aws-guardduty",
            "aws-vpc",
            "rest-apis"
          ],
          "requires": [],
          "skill_id": "amazon-athena",
          "suppress_on_match": []
        },
        "skill_id": "amazon-athena",
        "split_log": [
          {
            "a_dim_id": "cloud-data-platform-services",
            "a_name": "Cloud Data Platform Services",
            "a_role": "__skill_focal__",
            "b_dim_id": "cloud-data-platform-services",
            "b_name": "Cloud Data Platform Services",
            "b_role": "Data Engineer",
            "into": [
              "d_split_01_01",
              "d_split_01_02",
              "d_split_01_03",
              "d_split_01_04"
            ],
            "into_names": [
              "Cloud Analytics Query Services",
              "Cloud Data Pipeline Runtime",
              "Cloud Data Platform Storage",
              "Cloud Data Platform Security and Networking"
            ],
            "pair_kind": "cross_role",
            "reasoning": "Dim A is a narrow analytics/query-services cluster (Athena, Glue, Redshift Spectrum, EMR, serverless SQL analytics, data lake querying). Dim B is a broader umbrella for cloud services used in data engineering pipelines, including compute, storage, networking-adjacent, and security primitives. The overlap comes from B being too broad, not from identical substance. Split B into narrower siblings so the analytics-query piece stays separate from pipeline/runtime and platform-infra services.",
            "similarity": 0.7316862613085077,
            "split_from": [
              "cloud-data-platform-services",
              "cloud-data-platform-services"
            ]
          }
        ],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.98,
          "name": "Amazon Athena",
          "reasoning": "By the Platform vs Tool and Service vs Platform rules, Amazon Athena is a managed capability inside AWS rather than software you run yourself, so it is a Service.",
          "skill_id": "amazon-athena",
          "subtype": "query_service",
          "type": "Service"
        },
        "warnings": [
          "stage3_post_filter_dropped_catalog_only_locked_dims:43-\u003e4"
        ]
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Redshift",
          "alias_type": "CANONICAL",
          "id": 3367,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 13,
        "display_name": "Redshift",
        "id": 2570,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "redshift",
        "sub_category_id": 2098,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Data Warehousing Platforms",
            "id": 72,
            "rationale": "Cloud and on-prem analytical storage systems used to persist curated datasets and serve downstream consumers. This cluster is about the warehouse/lakehouse layer where transformed data is organized for access.",
            "slug": "data-warehousing-platforms",
            "source": "db"
          },
          "input_skill": "Redshift",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 6,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Redshift",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Data Platform Services",
            "id": 81,
            "rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
            "slug": "cloud-data-platform-services",
            "source": "db"
          },
          "input_skill": "AWS Data Pipeline",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 6,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Data Platform Services",
            "id": 81,
            "rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
            "slug": "cloud-data-platform-services",
            "source": "db"
          },
          "input_skill": "AWS Data Pipeline",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 6,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "AWS Data Pipeline",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Service",
          "skill_nature": "CLOUD_SERVICE",
          "sub_category": "data_pipeline_service",
          "typical_lifespan": "SHORT_LIVED",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "DEPRECATED"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "AWS Data Pipeline is a specific AWS service name and is unlikely to be mistaken for another catalog skill in a typical JD."
          },
          "context_keywords": {
            "context_keywords": [
              "ETL",
              "S3",
              "Redshift",
              "EMR",
              "Glue",
              "RDS",
              "EC2",
              "Lambda",
              "Step Functions",
              "Kinesis",
              "Athena",
              "Data Lake",
              "Apache Spark",
              "cron",
              "orchestration"
            ]
          },
          "maturity": {
            "confidence": 0.96,
            "maturity": "deprecated",
            "reasoning": "AWS announced AWS Data Pipeline is in maintenance mode and recommends newer services like Glue/Step Functions; recent JDs rarely list it compared with modern AWS data tooling."
          },
          "skill_id": "aws-data-pipeline",
          "vendor_license": {
            "confidence": 0.98,
            "license": "proprietary",
            "vendor": "Amazon Web Services",
            "year_introduced": 2012
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [
          {
            "a_dim_id": "cloud-data-platform-services",
            "a_name": "Cloud Data Platform Services",
            "a_role": "__skill_focal__",
            "b_dim_id": "cloud-data-platform-services",
            "b_name": "Cloud Data Platform Services",
            "b_role": "Data Engineer",
            "pair_kind": "cross_role",
            "reasoning": "Dim A is about managed cloud data-platform products and orchestration services, e.g. AWS Glue, Amazon EMR, Amazon Redshift, and AWS Data Pipeline for ETL and scheduled data movement. Dim B describes consumer use of cloud services that support data engineering workloads, including managed compute, storage, networking-adjacent services, and security primitives. A is service/catalog focused; B is infrastructure/usage focused. Same label, different skill clusters.",
            "similarity": 0.828494432740371
          }
        ],
        "locked_dimensions": [
          {
            "description": "Managed cloud services used to build and operate data engineering workloads. AWS Data Pipeline fits here because it is an AWS service for orchestrating data movement and scheduled processing across storage and compute services.",
            "exemplar_skills": [
              "AWS Data Pipeline",
              "AWS Glue",
              "Amazon EMR",
              "Amazon Redshift",
              "Amazon S3 ETL workflows"
            ],
            "in_scope": "AWS Data Pipeline, AWS Glue, Amazon EMR, Amazon Redshift, Amazon S3 data workflows, managed ETL orchestration, scheduled batch data movement, cloud data ingestion services",
            "name": "Cloud Data Platform Services",
            "out_of_scope": "Streaming engines and low-latency event processing, which belong to streaming-data-processing; model training or inference services, which belong to ML platform dimensions; generic infrastructure provisioning, which belongs to infrastructure-provisioning-templates",
            "overlap_flags": [
              {
                "reason": "Both can move and transform data, but AWS Data Pipeline is primarily batch/scheduled orchestration rather than continuous stream processing.",
                "with_dim_id": "streaming-data-processing",
                "with_dim_name": null,
                "with_role": "Data Engineer"
              },
              {
                "reason": "Pipeline setup may involve infrastructure definitions, but the core skill is data service orchestration rather than declarative resource provisioning.",
                "with_dim_id": "infrastructure-provisioning-templates",
                "with_dim_name": null,
                "with_role": "Cloud Engineer"
              }
            ],
            "tentative_id": "cloud-data-platform-services"
          },
          {
            "description": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
            "exemplar_skills": [
              "Cloud Data Platform Services"
            ],
            "in_scope": "Skills, tools, and practices that belong under Cloud Data Platform Services for the target role, including items implied by the dimension rationale.",
            "name": "Cloud Data Platform Services",
            "out_of_scope": "Adjacent clusters explicitly not owned by Cloud Data Platform Services, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "cloud-data-platform-services"
          }
        ],
        "merge_log": [],
        "placed": {
          "name": "AWS Data Pipeline",
          "placement_confidence": 0.92,
          "primary_dimension": "cloud-data-platform-services",
          "reasoning": "Deterministic JD placement: locked_dimensions has 2 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [],
          "skill_id": "aws-data-pipeline"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [
            "aws"
          ],
          "related_to": [
            "aws-s3",
            "aws-cloudformation",
            "aws-cdk",
            "aws-direct-connect",
            "aws-vpc",
            "azure",
            "azure-expressroute",
            "aks",
            "rest-apis",
            "ec2"
          ],
          "requires": [],
          "skill_id": "aws-data-pipeline",
          "suppress_on_match": []
        },
        "skill_id": "aws-data-pipeline",
        "split_log": [],
        "typed": {
          "alternatives_considered": [
            "Platform: ruled out \u2014 AWS is the platform, while Data Pipeline is one managed capability within it.",
            "Tool: ruled out \u2014 it is consumed as a managed AWS offering, not software you run yourself."
          ],
          "confidence": 0.97,
          "name": "AWS Data Pipeline",
          "reasoning": "By the Service vs Platform rule, AWS Data Pipeline is a specific managed capability inside AWS rather than the AWS platform itself.",
          "skill_id": "aws-data-pipeline",
          "subtype": "data_pipeline_service",
          "type": "Service"
        },
        "warnings": [
          "stage3_post_filter_dropped_catalog_only_locked_dims:41-\u003e2"
        ]
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Storage Provisioning and Automation",
            "id": 311,
            "rationale": "Covers the scripts, APIs, and operational workflows used to create, resize, map, and retire storage resources. This cluster is coherent because storage engineers often automate repetitive provisioning and maintenance tasks.",
            "slug": "storage-provisioning-and-automation",
            "source": "db"
          },
          "input_skill": "S3",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Storage Engineer",
              "id": 22,
              "rationale": null,
              "role_archetype": null,
              "slug": "storage-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "S3",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Platform",
          "skill_nature": "PLATFORM",
          "sub_category": "cloud_storage_platform",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": true,
            "confused_with": [
              "s4"
            ],
            "reasoning": "\"S3\" is a short acronym that in JDs can mean AWS S3, but could also be read as a generic storage tier/label or other S3-named products in the catalog. A reasonable extractor may confuse it with adjacent cloud storage skills."
          },
          "context_keywords": {
            "context_keywords": [
              "bucket",
              "object storage",
              "prefix",
              "versioning",
              "lifecycle policy",
              "bucket policy",
              "IAM",
              "replication",
              "multipart upload",
              "presigned URL",
              "SSE-S3",
              "SSE-KMS",
              "event notifications",
              "static website hosting",
              "storage class"
            ]
          },
          "maturity": {
            "confidence": 0.98,
            "maturity": "well_known",
            "reasoning": "Amazon S3 is a default cloud storage requirement in many job descriptions and is a core AWS service with broad ecosystem support; no sunset or replacement signal exists."
          },
          "skill_id": "s3",
          "vendor_license": {
            "confidence": 0.99,
            "license": "proprietary",
            "vendor": "Amazon Web Services",
            "year_introduced": 2006
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Covers creating, configuring, and operating S3-style object storage resources and their access controls. S3 belongs here because it is the canonical AWS object storage service used for buckets, objects, lifecycle, and access policies.",
            "exemplar_skills": [
              "S3",
              "Amazon S3",
              "S3 bucket policies",
              "S3 lifecycle management",
              "S3 versioning",
              "S3 multipart upload"
            ],
            "in_scope": "S3, S3 buckets, object storage, bucket policies, lifecycle rules, versioning, encryption at rest, access control lists, presigned URLs, multipart upload, object tagging",
            "name": "Object Storage Provisioning",
            "out_of_scope": "Block storage volumes, file shares, database storage, and storage migration planning, which belong to other storage or migration dimensions.",
            "overlap_flags": [
              {
                "reason": "S3 is often used as a managed data lake landing zone, so it can overlap with cloud data platform usage patterns.",
                "with_dim_id": "cloud-data-platform-services",
                "with_dim_name": null,
                "with_role": "Data Engineer"
              },
              {
                "reason": "Model artifacts and inference assets are sometimes stored in S3, creating incidental overlap with ML deployment workflows.",
                "with_dim_id": "cloud-model-runtime-services",
                "with_dim_name": null,
                "with_role": "Machine Learning Engineer"
              }
            ],
            "tentative_id": "storage-provisioning-and-automation"
          }
        ],
        "merge_log": [],
        "placed": {
          "name": "S3",
          "placement_confidence": 0.92,
          "primary_dimension": "storage-provisioning-and-automation",
          "reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [],
          "skill_id": "s3"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [
            "aws-s3"
          ],
          "related_to": [
            "ec2",
            "ebs-snapshots",
            "aks",
            "iscsi",
            "hsm",
            "hsms",
            "avalanche",
            "sui"
          ],
          "requires": [],
          "skill_id": "s3",
          "suppress_on_match": [
            "aws-s3"
          ]
        },
        "skill_id": "s3",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.91,
          "name": "S3",
          "reasoning": "By the Platform vs Service rule, S3 is a hosted multi-tenant AWS capability with APIs rather than software you run yourself, so it fits Platform best.",
          "skill_id": "s3",
          "subtype": "cloud_storage_platform",
          "type": "Platform"
        },
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Streaming Data Processing",
            "id": 69,
            "rationale": "Tools and patterns for ingesting and transforming event streams with low latency. This cluster covers continuous processing, windowing, and stateful stream jobs used to keep data fresh.",
            "slug": "streaming-data-processing",
            "source": "db"
          },
          "input_skill": "Kinesis",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 6,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Kinesis",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Service",
          "skill_nature": "CLOUD_SERVICE",
          "sub_category": "streaming_data_service",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "In JDs, Kinesis usually clearly refers to AWS Kinesis, a distinct streaming service. The name is not a common overloaded acronym or short token likely to be mistaken for another catalog skill."
          },
          "context_keywords": {
            "context_keywords": [
              "streaming",
              "event-driven",
              "real-time ingestion",
              "shards",
              "producers",
              "consumers",
              "Kinesis Data Streams",
              "Kinesis Data Firehose",
              "Kinesis Data Analytics",
              "Lambda",
              "S3",
              "CloudWatch",
              "partition key",
              "checkpointing",
              "throughput"
            ]
          },
          "maturity": {
            "confidence": 0.89,
            "maturity": "well_known",
            "reasoning": "AWS Kinesis appears in many cloud/data engineering job postings and is a standard managed streaming service in AWS stacks; no vendor sunset indicates active market demand."
          },
          "skill_id": "kinesis",
          "vendor_license": {
            "confidence": 0.98,
            "license": "proprietary",
            "vendor": "Amazon Web Services",
            "year_introduced": 2013
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Tools and patterns for ingesting, buffering, and transforming event streams with low latency. This includes continuous processing, windowing, stateful stream jobs, checkpointing, shard scaling, stream partitioning, and managed streaming services such as Kinesis, Amazon Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics.",
            "exemplar_skills": [
              "Streaming Data Processing"
            ],
            "in_scope": "Skills, tools, and practices that belong under Streaming Data Processing for the target role, including items implied by the dimension rationale.",
            "name": "Streaming Data Processing",
            "out_of_scope": "Adjacent clusters explicitly not owned by Streaming Data Processing, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_merge_01"
          }
        ],
        "merge_log": [
          {
            "a_dim_id": "streaming-data-processing",
            "a_name": "Streaming Data Processing",
            "a_role": "__skill_focal__",
            "b_dim_id": "streaming-data-processing",
            "b_name": "Streaming Data Processing",
            "b_role": "Data Engineer",
            "into": "d_merge_01",
            "into_name": "Streaming Data Processing",
            "merged_from": [
              "streaming-data-processing",
              "streaming-data-processing"
            ],
            "pair_kind": "cross_role",
            "reasoning": "Both dims define the same skill cluster: low-latency ingestion and transformation of event streams. A includes Kinesis, shard scaling, checkpointing, and windowed processing; B describes continuous processing, windowing, and stateful stream jobs. The wording differs, but the substance is identical, and Kinesis is clearly part of the same streaming-processing backbone rather than a separate concept.",
            "similarity": 0.7704853050706059
          }
        ],
        "placed": {
          "name": "Kinesis",
          "placement_confidence": 0.92,
          "primary_dimension": "d_merge_01",
          "reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [],
          "skill_id": "kinesis"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [
            "aws"
          ],
          "related_to": [
            "gke",
            "aks",
            "ec2",
            "aws-kms",
            "aws-cdk",
            "quicknode",
            "the-graph",
            "avalanche",
            "event-emission",
            "idempotency"
          ],
          "requires": [],
          "skill_id": "kinesis",
          "suppress_on_match": []
        },
        "skill_id": "kinesis",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.93,
          "name": "Kinesis",
          "reasoning": "By the Platform vs Service rule, Kinesis is a specific managed capability within AWS rather than a standalone hosted environment, so it is a Service.",
          "skill_id": "kinesis",
          "subtype": "streaming_data_service",
          "type": "Service"
        },
        "warnings": [
          "stage3_post_filter_dropped_catalog_only_locked_dims:40-\u003e1"
        ]
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "HTTP API Frameworks and Gateway Layers",
            "id": null,
            "rationale": "Server-side frameworks and gateway layers used to build, route, validate, and manage HTTP APIs. Includes backend web service frameworks and API gateway configuration for endpoints, request/response mapping, routing, input validation, auth integration, throttling, usage plans, stages, custom domains, and backend integration.",
            "slug": "d_merge_01",
            "source": "llm"
          },
          "input_skill": "Amazon API Gateway",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "Amazon API Gateway",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Service",
          "skill_nature": "CLOUD_SERVICE",
          "sub_category": "api_management_service",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "STABLE"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "Amazon API Gateway is a specific AWS service name with little overlap in typical JDs; it is unlikely to be confused with a different catalog skill."
          },
          "context_keywords": {
            "context_keywords": [
              "REST APIs",
              "HTTP APIs",
              "Lambda proxy",
              "OpenAPI",
              "Swagger",
              "CORS",
              "authorizers",
              "usage plans",
              "throttling",
              "stages",
              "deployments",
              "request validation",
              "mapping templates",
              "VPC Link",
              "CloudWatch"
            ]
          },
          "maturity": {
            "confidence": 0.95,
            "maturity": "well_known",
            "reasoning": "Broadly listed in cloud/backend JDs and AWS docs; commonly paired with Lambda, IAM, and serverless stacks, indicating staple market demand rather than niche use."
          },
          "skill_id": "amazon-api-gateway",
          "vendor_license": {
            "confidence": 0.98,
            "license": "proprietary",
            "vendor": "Amazon Web Services",
            "year_introduced": 2015
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Server-side frameworks and gateway layers used to build, route, validate, and manage HTTP APIs. Includes backend web service frameworks and API gateway configuration for endpoints, request/response mapping, routing, input validation, auth integration, throttling, usage plans, stages, custom domains, and backend integration.",
            "exemplar_skills": [
              "HTTP API Frameworks and Gateway Layers"
            ],
            "in_scope": "Skills, tools, and practices that belong under HTTP API Frameworks and Gateway Layers for the target role, including items implied by the dimension rationale.",
            "name": "HTTP API Frameworks and Gateway Layers",
            "out_of_scope": "Adjacent clusters explicitly not owned by HTTP API Frameworks and Gateway Layers, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_merge_01"
          }
        ],
        "merge_log": [
          {
            "a_dim_id": "web-service-frameworks",
            "a_name": "Web Service Frameworks",
            "a_role": "__skill_focal__",
            "b_dim_id": "web-service-frameworks",
            "b_name": "Web Service Frameworks",
            "b_role": "Backend Engineer",
            "into": "d_merge_01",
            "into_name": "HTTP API Frameworks and Gateway Layers",
            "merged_from": [
              "web-service-frameworks",
              "web-service-frameworks"
            ],
            "pair_kind": "cross_role",
            "reasoning": "Both dims target the same backend HTTP API cluster. A focuses on gateway-layer details like Amazon API Gateway, request/response mapping, authorizers, throttling, and backend integration. B describes the same server-side API framework space with routing, input validation, and backend service structure. The exemplar skills in A all fit B\u2019s scope, and there is no separate skill cluster here.",
            "similarity": 0.762365718275353
          }
        ],
        "placed": {
          "name": "Amazon API Gateway",
          "placement_confidence": 0.92,
          "primary_dimension": "d_merge_01",
          "reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [],
          "skill_id": "amazon-api-gateway"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [
            "aws"
          ],
          "related_to": [
            "rest-apis",
            "aws-cloudformation",
            "aws-cdk",
            "aws-s3",
            "aws-kms",
            "aws-vpc",
            "aws-direct-connect",
            "azure-expressroute",
            "azure-ad",
            "azure-key-vault"
          ],
          "requires": [
            "aws-iam-review"
          ],
          "skill_id": "amazon-api-gateway",
          "suppress_on_match": []
        },
        "skill_id": "amazon-api-gateway",
        "split_log": [],
        "typed": {
          "alternatives_considered": [
            "Platform: ruled out \u2014 AWS is the platform, while API Gateway is one managed offering within it.",
            "Tool: ruled out \u2014 it is consumed as a hosted managed service, not software you run yourself."
          ],
          "confidence": 0.98,
          "name": "Amazon API Gateway",
          "reasoning": "By the Platform vs Service rule, Amazon API Gateway is a specific managed capability inside AWS rather than the whole hosted environment, so it is a Service.",
          "skill_id": "amazon-api-gateway",
          "subtype": "api_management_service",
          "type": "Service"
        },
        "warnings": [
          "stage3_post_filter_dropped_catalog_only_locked_dims:40-\u003e1"
        ]
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Python",
          "alias_type": "CANONICAL",
          "id": 608,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Python 2",
          "alias_type": "VERSION",
          "id": 611,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Python 2.x",
          "alias_type": "VERSION",
          "id": 613,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Python 3",
          "alias_type": "VERSION",
          "id": 612,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Python 3.10",
          "alias_type": "VERSION",
          "id": 2330,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Python 3.11",
          "alias_type": "VERSION",
          "id": 2331,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Python 3.12",
          "alias_type": "VERSION",
          "id": 2332,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Python 3.x",
          "alias_type": "VERSION",
          "id": 614,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "py2",
          "alias_type": "VERSION",
          "id": 609,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "py3",
          "alias_type": "VERSION",
          "id": 610,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python 2",
          "alias_type": "VERSION",
          "id": 2152,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python 2.x",
          "alias_type": "VERSION",
          "id": 2154,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python 3",
          "alias_type": "VERSION",
          "id": 990,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python 3.10",
          "alias_type": "VERSION",
          "id": 992,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python 3.11",
          "alias_type": "VERSION",
          "id": 993,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python 3.12",
          "alias_type": "VERSION",
          "id": 994,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python 3.x",
          "alias_type": "VERSION",
          "id": 991,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python2",
          "alias_type": "VERSION",
          "id": 2150,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "python3",
          "alias_type": "VERSION",
          "id": 989,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 5,
        "display_name": "Python",
        "id": 393,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LANGUAGE",
        "slug": "python",
        "sub_category_id": 54,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Analytical Programming Languages",
            "id": 82,
            "rationale": "Languages used to clean, transform, analyze, and prototype models in notebooks and scripts. This is the core coding surface for expressing statistical logic and data manipulation in a reproducible way.",
            "slug": "analytical-programming-languages",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Analyst",
              "id": 20,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-analyst",
              "source": "db"
            },
            {
              "display_name": "Data Scientist",
              "id": 7,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-scientist",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Automation Scripting and CLI",
            "id": 48,
            "rationale": "Uses scripts and command-line tooling to execute repeatable Azure operations and reduce manual work. This is a practical cluster because the role frequently automates provisioning, checks, and remediation tasks.",
            "slug": "automation-scripting-and-cli",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Azure Cloud Engineer",
              "id": 4,
              "rationale": null,
              "role_archetype": null,
              "slug": "azure-cloud-engineer",
              "source": "db"
            },
            {
              "display_name": "Cloud Engineer",
              "id": 18,
              "rationale": null,
              "role_archetype": null,
              "slug": "cloud-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Automation and Scripting for Operations",
            "id": 361,
            "rationale": "Scripts and lightweight automation used to execute repetitive virtualization tasks and enforce operational consistency. This is the practical glue that reduces manual host and VM administration.",
            "slug": "automation-and-scripting-for-operations",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Virtualization Engineer",
              "id": 26,
              "rationale": null,
              "role_archetype": null,
              "slug": "virtualization-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Network Automation and Scripting",
            "id": 285,
            "rationale": "Covers scripts and automation used to configure, validate, and audit network devices and services. This cluster is coherent because repeatable network operations increasingly depend on programmatic changes and checks.",
            "slug": "network-automation-and-scripting",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Network Engineer",
              "id": 21,
              "rationale": null,
              "role_archetype": null,
              "slug": "network-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for AI Workflows",
            "id": 261,
            "rationale": "Languages used to implement AI feature logic, orchestration, and response handling inside product code. This is the core coding surface for turning prompts and model calls into reliable application behavior.",
            "slug": "programming-languages-for-ai-workflows",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "AI Engineer",
              "id": 12,
              "rationale": null,
              "role_archetype": null,
              "slug": "ai-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for Backend Systems",
            "id": 140,
            "rationale": "Languages used to implement server-side business logic, request handlers, workers, and service integrations. This is the core coding surface for backend feature delivery and maintenance.",
            "slug": "programming-languages-for-backend-systems",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Backend Engineer",
              "id": 14,
              "rationale": null,
              "role_archetype": null,
              "slug": "backend-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for Data Work",
            "id": 67,
            "rationale": "Languages used to implement data pipelines, transformations, and operational utilities. This is the code layer for expressing extraction, parsing, validation, and orchestration logic in data engineering workflows.",
            "slug": "programming-languages-for-data-work",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 6,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for ML Systems",
            "id": 113,
            "rationale": "Languages used to implement model integration code, inference services, and feature-processing logic. This is the core coding surface for turning trained models into product-facing software components.",
            "slug": "programming-languages-for-ml-systems",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Machine Learning Engineer",
              "id": 10,
              "rationale": null,
              "role_archetype": null,
              "slug": "machine-learning-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for Security Work",
            "id": 328,
            "rationale": "Languages used to automate security tasks, write detection logic, and build analysis or remediation tooling. This is the core coding surface for a cybersecurity engineer across scripts, queries, and small utilities.",
            "slug": "programming-languages-for-security-work",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Cybersecurity Engineer",
              "id": 9,
              "rationale": null,
              "role_archetype": null,
              "slug": "cybersecurity-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for Test Automation",
            "id": 193,
            "rationale": "Languages used to implement automated checks, helper utilities, and test harness code. This is the core coding surface for turning test ideas into maintainable automation.",
            "slug": "programming-languages-for-test-automation",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Automation Tester",
              "id": 16,
              "rationale": null,
              "role_archetype": null,
              "slug": "automation-tester",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Security Automation and Scripting",
            "id": 258,
            "rationale": "Automating repeatable security checks, enrichment, and remediation workflows. This cluster is coherent because the role often needs lightweight automation to scale analysis and response.",
            "slug": "security-automation-and-scripting",
            "source": "db"
          },
          "input_skill": "Python",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Cybersecurity Engineer",
              "id": 9,
              "rationale": null,
              "role_archetype": null,
              "slug": "cybersecurity-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Python",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "TensorFlow",
          "alias_type": "CANONICAL",
          "id": 862,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "TF1",
          "alias_type": "VERSION",
          "id": 863,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "TF2",
          "alias_type": "VERSION",
          "id": 864,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "TensorFlow 1",
          "alias_type": "VERSION",
          "id": 865,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "TensorFlow 1.x",
          "alias_type": "VERSION",
          "id": 867,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "TensorFlow 2",
          "alias_type": "VERSION",
          "id": 866,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "TensorFlow 2.x",
          "alias_type": "VERSION",
          "id": 868,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 6,
        "display_name": "TensorFlow",
        "id": 558,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LIBRARY",
        "slug": "tensorflow",
        "sub_category_id": 456,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Applied Machine Learning Toolkits",
            "id": 94,
            "rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
            "slug": "applied-machine-learning-toolkits",
            "source": "db"
          },
          "input_skill": "TensorFlow",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Scientist",
              "id": 7,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-scientist",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "TensorFlow",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "PyTorch",
          "alias_type": "CANONICAL",
          "id": 861,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 6,
        "display_name": "PyTorch",
        "id": 557,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LIBRARY",
        "slug": "pytorch",
        "sub_category_id": 456,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Applied Machine Learning Toolkits",
            "id": 94,
            "rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
            "slug": "applied-machine-learning-toolkits",
            "source": "db"
          },
          "input_skill": "PyTorch",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Scientist",
              "id": 7,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-scientist",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "PyTorch",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "scikit-learn",
          "alias_type": "CANONICAL",
          "id": 852,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 6,
        "display_name": "scikit-learn",
        "id": 554,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LIBRARY",
        "slug": "scikit-learn",
        "sub_category_id": 458,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Applied Machine Learning Toolkits",
            "id": 94,
            "rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
            "slug": "applied-machine-learning-toolkits",
            "source": "db"
          },
          "input_skill": "Scikit-learn",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Scientist",
              "id": 7,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-scientist",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Scikit-learn",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "GitHub Actions",
          "alias_type": "CANONICAL",
          "id": 1800,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 14,
        "display_name": "GitHub Actions",
        "id": 1250,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CLOUD_SERVICE",
        "slug": "github-actions",
        "sub_category_id": 1019,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Continuous Integration Test Integration",
            "id": 207,
            "rationale": "Integrating automated checks into shared build and merge workflows so results are repeatable and visible. This cluster is coherent because automation testers commonly configure test execution triggers, artifacts, and reporting hooks.",
            "slug": "continuous-integration-test-integration",
            "source": "db"
          },
          "input_skill": "GitHub Actions",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Automation Tester",
              "id": 16,
              "rationale": null,
              "role_archetype": null,
              "slug": "automation-tester",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "GitHub Actions",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Airflow",
          "alias_type": "CANONICAL",
          "id": 540,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 11,
        "display_name": "Airflow",
        "id": 325,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "airflow",
        "sub_category_id": 335,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Workflow Orchestration Systems",
            "id": 64,
            "rationale": "Operational orchestration of ML jobs, dependencies, and handoffs across training, validation, deployment, and retraining. This is a useful split from training pipelines because it emphasizes the scheduler and control plane.",
            "slug": "workflow-orchestration-systems",
            "source": "db"
          },
          "input_skill": "Airflow",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 6,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            },
            {
              "display_name": "MLOps Engineer",
              "id": 5,
              "rationale": null,
              "role_archetype": null,
              "slug": "mlops-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Airflow",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Terraform",
          "alias_type": "CANONICAL",
          "id": 290,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 11,
        "display_name": "Terraform",
        "id": 144,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "terraform",
        "sub_category_id": 171,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Infrastructure Provisioning Templates",
            "id": 291,
            "rationale": "Declarative templates and modules used to create repeatable cloud resources and environments. This cluster covers the infrastructure definitions the role applies, reviews, and updates to keep environments consistent.",
            "slug": "infrastructure-provisioning-templates",
            "source": "db"
          },
          "input_skill": "Terraform",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Cloud Engineer",
              "id": 18,
              "rationale": null,
              "role_archetype": null,
              "slug": "cloud-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Infrastructure as Code",
            "id": 22,
            "rationale": "Defines infrastructure and platform resources through versioned code so environments are repeatable and reviewable. This is a coherent cluster because it underpins environment consistency and change control.",
            "slug": "infrastructure-as-code",
            "source": "db"
          },
          "input_skill": "Terraform",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "DevOps Engineer",
              "id": 1,
              "rationale": null,
              "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
              "slug": "devops-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Infrastructure as Code and Declarative Provisioning",
            "id": 36,
            "rationale": "Defines cloud and platform infrastructure declaratively through versioned code so environments are repeatable, reviewable, and automatable. This includes authoring and maintaining IaC templates/modules, managing parameters and state, and using plan/apply workflows to provision and update resources across Azure and other cloud platforms.",
            "slug": "infrastructure-as-code-and-declarative-provisioning",
            "source": "db"
          },
          "input_skill": "Terraform",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Azure Cloud Engineer",
              "id": 4,
              "rationale": null,
              "role_archetype": null,
              "slug": "azure-cloud-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Terraform",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Docker",
          "alias_type": "CANONICAL",
          "id": 299,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 11,
        "display_name": "Docker",
        "id": 153,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "docker",
        "sub_category_id": 170,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Containerization and Image Delivery",
            "id": 24,
            "rationale": "Builds, packages, and ships application and support workloads as container images. This cluster covers the artifact format and the mechanics of producing deployable images.",
            "slug": "containerization-and-image-delivery",
            "source": "db"
          },
          "input_skill": "Docker",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "DevOps Engineer",
              "id": 1,
              "rationale": null,
              "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
              "slug": "devops-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Model Serving Deployment and Runtime Packaging",
            "id": 52,
            "rationale": "Operational deployment of trained models into online, batch, or streaming serving environments, including packaging models and model servers into containers or managed inference runtimes, coordinating rollout, and handing off to inference systems. Covers serving frameworks and platforms such as TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, and Seldon Core, plus container/runtime concerns like Docker images, GPU-enabled containers, base image selection, container entrypoints, runtime dependencies, and image scanning for model services.",
            "slug": "model-serving-deployment-and-runtime-packaging",
            "source": "db"
          },
          "input_skill": "Docker",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "MLOps Engineer",
              "id": 5,
              "rationale": null,
              "role_archetype": null,
              "slug": "mlops-engineer",
              "source": "db"
            },
            {
              "display_name": "Machine Learning Engineer",
              "id": 10,
              "rationale": null,
              "role_archetype": null,
              "slug": "machine-learning-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Docker",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Kubernetes",
          "alias_type": "CANONICAL",
          "id": 304,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.0",
          "alias_type": "VERSION",
          "id": 307,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.0+",
          "alias_type": "VERSION",
          "id": 2366,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.1",
          "alias_type": "VERSION",
          "id": 308,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.10",
          "alias_type": "VERSION",
          "id": 318,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.11",
          "alias_type": "VERSION",
          "id": 319,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.12",
          "alias_type": "VERSION",
          "id": 320,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.13",
          "alias_type": "VERSION",
          "id": 321,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.14",
          "alias_type": "VERSION",
          "id": 322,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.15",
          "alias_type": "VERSION",
          "id": 323,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.16",
          "alias_type": "VERSION",
          "id": 324,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.17",
          "alias_type": "VERSION",
          "id": 325,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.18",
          "alias_type": "VERSION",
          "id": 326,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.19",
          "alias_type": "VERSION",
          "id": 327,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.2",
          "alias_type": "VERSION",
          "id": 309,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.20",
          "alias_type": "VERSION",
          "id": 328,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.21",
          "alias_type": "VERSION",
          "id": 329,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.22",
          "alias_type": "VERSION",
          "id": 330,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.23",
          "alias_type": "VERSION",
          "id": 331,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.24",
          "alias_type": "VERSION",
          "id": 332,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.25",
          "alias_type": "VERSION",
          "id": 333,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.26",
          "alias_type": "VERSION",
          "id": 334,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.27",
          "alias_type": "VERSION",
          "id": 335,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.28",
          "alias_type": "VERSION",
          "id": 336,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.29",
          "alias_type": "VERSION",
          "id": 337,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.3",
          "alias_type": "VERSION",
          "id": 310,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.30",
          "alias_type": "VERSION",
          "id": 338,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.4",
          "alias_type": "VERSION",
          "id": 311,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.5",
          "alias_type": "VERSION",
          "id": 312,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.6",
          "alias_type": "VERSION",
          "id": 313,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.7",
          "alias_type": "VERSION",
          "id": 314,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.8",
          "alias_type": "VERSION",
          "id": 315,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.9",
          "alias_type": "VERSION",
          "id": 316,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes 1.x",
          "alias_type": "VERSION",
          "id": 317,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Kubernetes v1",
          "alias_type": "VERSION",
          "id": 306,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "k8s",
          "alias_type": "VERSION",
          "id": 305,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 13,
        "display_name": "Kubernetes",
        "id": 158,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "kubernetes",
        "sub_category_id": 1524,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Orchestration Platforms",
            "id": 25,
            "rationale": "Operates the platforms that schedule and run containerized workloads and related deployment primitives. This is separate from image delivery because it concerns runtime placement and service rollout behavior.",
            "slug": "orchestration-platforms",
            "source": "db"
          },
          "input_skill": "Kubernetes",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Cloud Engineer",
              "id": 18,
              "rationale": null,
              "role_archetype": null,
              "slug": "cloud-engineer",
              "source": "db"
            },
            {
              "display_name": "DevOps Engineer",
              "id": 1,
              "rationale": null,
              "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
              "slug": "devops-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Kubernetes",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Model Runtime Services",
            "id": 121,
            "rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
            "slug": "cloud-model-runtime-services",
            "source": "db"
          },
          "input_skill": "Pinecone",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Machine Learning Engineer",
              "id": 10,
              "rationale": null,
              "role_archetype": null,
              "slug": "machine-learning-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Pinecone",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Platform",
          "skill_nature": "PLATFORM",
          "sub_category": "vector_database_platform",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "EMERGING"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "Pinecone is a distinctive vector database platform name; in typical JDs it is unlikely to be confused with another catalog skill."
          },
          "context_keywords": {
            "context_keywords": [
              "vector database",
              "embeddings",
              "semantic search",
              "similarity search",
              "ANN",
              "approximate nearest neighbor",
              "RAG",
              "retrieval augmented generation",
              "indexing",
              "namespace",
              "metadata filtering",
              "upsert",
              "vector index",
              "hybrid search",
              "OpenAI"
            ]
          },
          "maturity": {
            "confidence": 0.86,
            "maturity": "emerging",
            "reasoning": "Pinecone appears in many AI/vector-search job descriptions and vendor docs, but it\u2019s still far less universal than PostgreSQL/AWS; market signal shows growing adoption rather than staple status."
          },
          "skill_id": "pinecone",
          "vendor_license": {
            "confidence": 0.95,
            "license": "proprietary",
            "vendor": "Pinecone Systems, Inc.",
            "year_introduced": 2019
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Managed services used to store, index, and query embeddings for semantic search and retrieval-augmented applications. Pinecone belongs here because it is a purpose-built vector database service rather than a general-purpose datastore.",
            "exemplar_skills": [
              "Pinecone",
              "vector database",
              "similarity search",
              "embedding index",
              "metadata filtering",
              "approximate nearest neighbor search"
            ],
            "in_scope": "Pinecone, vector indexes, similarity search, embedding storage, metadata filtering, ANN retrieval, namespace partitioning, hybrid search",
            "name": "Vector Database Services",
            "out_of_scope": "traditional relational databases, document stores, cache layers, model training pipelines, prompt engineering, which belong to other dimensions",
            "overlap_flags": [
              {
                "reason": "Vector databases are often consumed as part of broader data platforms, but this dimension focuses specifically on managed vector retrieval services.",
                "with_dim_id": "cloud-data-platform-services",
                "with_dim_name": null,
                "with_role": "Data Engineer"
              },
              {
                "reason": "Pinecone is frequently used inside AI application architectures, though the service itself is the storage/retrieval layer rather than the overall system design.",
                "with_dim_id": "ai-service-architecture-patterns",
                "with_dim_name": null,
                "with_role": "AI Engineer"
              }
            ],
            "tentative_id": "cloud-model-runtime-services"
          }
        ],
        "merge_log": [],
        "placed": {
          "name": "Pinecone",
          "placement_confidence": 0.92,
          "primary_dimension": "cloud-model-runtime-services",
          "reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [],
          "skill_id": "pinecone"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [],
          "related_to": [
            "pandas",
            "quicknode",
            "avalanche",
            "aptos",
            "helm",
            "snapshot",
            "gcp-security-command-center",
            "anchor",
            "gke",
            "hardhat"
          ],
          "requires": [],
          "skill_id": "pinecone",
          "suppress_on_match": []
        },
        "skill_id": "pinecone",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.9,
          "name": "Pinecone",
          "reasoning": "By the Vendor SaaS = Platform rule, Pinecone is a hosted multi-tenant vector database service consumed via APIs rather than software you run yourself.",
          "skill_id": "pinecone",
          "subtype": "vector_database_platform",
          "type": "Platform"
        },
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Data Platform Services",
            "id": 81,
            "rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
            "slug": "cloud-data-platform-services",
            "source": "db"
          },
          "input_skill": "OpenSearch",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 6,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Version Control Systems",
            "id": 365,
            "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "OpenSearch",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "OpenSearch",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Datastore",
          "skill_nature": "TOOL",
          "sub_category": "search_engine_datastore",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "EMERGING"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "OpenSearch is a specific search engine/datastore name with little overlap in typical JDs; it is unlikely to be mistaken for another catalog skill."
          },
          "context_keywords": {
            "context_keywords": [
              "Elasticsearch",
              "Kibana",
              "Lucene",
              "index mapping",
              "shards",
              "replicas",
              "full-text search",
              "aggregations",
              "query DSL",
              "ingest pipeline",
              "cluster management",
              "index templates",
              "analyzers",
              "vector search",
              "OpenSearch Dashboards"
            ]
          },
          "maturity": {
            "confidence": 0.84,
            "maturity": "emerging",
            "reasoning": "OpenSearch appears in growing numbers of JDs for search/log analytics, but Elasticsearch still dominates most postings; AWS also continues to position it as the open-source successor to Elasticsearch."
          },
          "skill_id": "opensearch",
          "vendor_license": {
            "confidence": 0.98,
            "license": "apache_2",
            "vendor": "OpenSearch Project",
            "year_introduced": 2021
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Managed search and indexing services used to store, query, and analyze large document or event datasets. OpenSearch belongs here because it is commonly used as a search engine and analytics backend in cloud data platforms.",
            "exemplar_skills": [
              "OpenSearch",
              "Amazon OpenSearch Service",
              "Elasticsearch",
              "full-text search",
              "indexing",
              "search aggregations",
              "query DSL"
            ],
            "in_scope": "OpenSearch, Amazon OpenSearch Service, Elasticsearch-compatible search clusters, index management, full-text search, faceting, aggregations, query DSL, shard and replica configuration, ingest pipelines, search dashboards",
            "name": "Search and Analytics Services",
            "out_of_scope": "Streaming ingestion logic and event processing, which belong to streaming-data-processing; application-side API calls to search endpoints, which belong to api-integration-and-data-fetching; generic database administration, which belongs to storage-provisioning-and-automation",
            "overlap_flags": [
              {
                "reason": "OpenSearch often stores and queries logs/metrics, so operational observability use cases can overlap with monitoring systems.",
                "with_dim_id": "monitoring-and-alerting",
                "with_dim_name": null,
                "with_role": "Azure Cloud Engineer"
              },
              {
                "reason": "OpenSearch is frequently consumed as a managed cloud data service rather than only as a standalone search engine.",
                "with_dim_id": "cloud-data-platform-services",
                "with_dim_name": null,
                "with_role": "Data Engineer"
              }
            ],
            "tentative_id": "cloud-data-platform-services"
          },
          {
            "description": "Operational setup and tuning of search clusters, indexes, and query behavior. This fits OpenSearch when the skill emphasis is on running and configuring the search engine itself rather than integrating it into an application.",
            "exemplar_skills": [
              "OpenSearch",
              "index mappings",
              "shard allocation",
              "analyzers",
              "reindexing",
              "snapshot and restore",
              "query performance tuning"
            ],
            "in_scope": "OpenSearch, cluster sizing, shard allocation, index templates, analyzers, mappings, refresh intervals, replicas, query performance tuning, reindexing, snapshot and restore",
            "name": "Search Engine Administration",
            "out_of_scope": "Frontend or backend API integration with search results, which belongs to api-integration-and-data-fetching; general cloud provisioning, which belongs to infrastructure-provisioning-templates; log dashboarding and alerting, which belongs to monitoring-and-alerting",
            "overlap_flags": [
              {
                "reason": "Search cluster sizing and query tuning can overlap with broader scalability work when the focus is capacity and throughput.",
                "with_dim_id": "scalability-and-performance-architecture",
                "with_dim_name": null,
                "with_role": "Cloud Architect"
              }
            ],
            "tentative_id": "d_init_01"
          }
        ],
        "merge_log": [],
        "placed": {
          "name": "OpenSearch",
          "placement_confidence": 0.92,
          "primary_dimension": "cloud-data-platform-services",
          "reasoning": "Deterministic JD placement: locked_dimensions has 2 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [
            "d_init_01"
          ],
          "skill_id": "opensearch"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [],
          "related_to": [
            "microsoft-sentinel",
            "rapid7-insightvm",
            "owasp-top-10",
            "tls-internals",
            "rest-apis",
            "aws-cdk",
            "gke",
            "aks",
            "go",
            "javascript"
          ],
          "requires": [],
          "skill_id": "opensearch",
          "suppress_on_match": []
        },
        "skill_id": "opensearch",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.93,
          "name": "OpenSearch",
          "reasoning": "OpenSearch is fundamentally a persistent search and analytics datastore, and under the Datastore vs Format rule it fits Datastore because it stores and indexes data rather than merely defining a format.",
          "skill_id": "opensearch",
          "subtype": "search_engine_datastore",
          "type": "Datastore"
        },
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": null,
            "display_name": "Applied Machine Learning Toolkits and Frameworks",
            "id": null,
            "rationale": "Libraries and frameworks used to prototype, compare, index, and evaluate machine learning solutions quickly. Includes applied-ML toolkits such as scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, NumPy, pandas, FAISS, vector search libraries, and embedding utilities. Excludes serving runtimes, deployment containers, online inference infrastructure, streaming systems, and database/storage engines.",
            "slug": "d_merge_01",
            "source": "llm"
          },
          "input_skill": "FAISS",
          "llm_role": null,
          "roles_from_db": []
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Version Control Systems",
            "id": 365,
            "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "FAISS",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "FAISS",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Library",
          "skill_nature": "LIBRARY",
          "sub_category": "vector_search_library",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "NOT_APPLICABLE",
          "volatility": "EMERGING"
        },
        "enrichment": {
          "ambiguity": {
            "ambiguity_flag": false,
            "confused_with": [],
            "reasoning": "FAISS is a distinctive library name for vector similarity search; in typical JDs it is unlikely to be confused with another catalog skill."
          },
          "context_keywords": {
            "context_keywords": [
              "approximate nearest neighbor",
              "ANN",
              "vector index",
              "similarity search",
              "embeddings",
              "cosine similarity",
              "L2 distance",
              "IVF",
              "HNSW",
              "PQ",
              "flat index",
              "GPU acceleration",
              "k-NN",
              "semantic search",
              "re-ranking"
            ]
          },
          "maturity": {
            "confidence": 0.84,
            "maturity": "emerging",
            "reasoning": "FAISS appears in many ML/vector-search job descriptions and is widely used in RAG stacks, but it\u2019s still less universal than Elasticsearch/PostgreSQL; market demand is growing rather than ubiquitous."
          },
          "skill_id": "faiss",
          "vendor_license": {
            "confidence": 0.99,
            "license": "mit",
            "vendor": "Meta",
            "year_introduced": 2017
          },
          "versioning": {
            "current_version": null,
            "version_aliases": {},
            "versioned": false
          }
        },
        "keep_log": [],
        "locked_dimensions": [
          {
            "description": "Libraries and frameworks used to prototype, compare, index, and evaluate machine learning solutions quickly. Includes applied-ML toolkits such as scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, NumPy, pandas, FAISS, vector search libraries, and embedding utilities. Excludes serving runtimes, deployment containers, online inference infrastructure, streaming systems, and database/storage engines.",
            "exemplar_skills": [
              "Applied Machine Learning Toolkits and Frameworks"
            ],
            "in_scope": "Skills, tools, and practices that belong under Applied Machine Learning Toolkits and Frameworks for the target role, including items implied by the dimension rationale.",
            "name": "Applied Machine Learning Toolkits and Frameworks",
            "out_of_scope": "Adjacent clusters explicitly not owned by Applied Machine Learning Toolkits and Frameworks, including unrelated platforms, roles, and skill families per library policy.",
            "overlap_flags": [],
            "tentative_id": "d_merge_01"
          },
          {
            "description": "Index structures and libraries for approximate nearest-neighbor search over embeddings and feature vectors. FAISS fits strongly here because it is primarily used to build and query high-performance vector indexes for retrieval.",
            "exemplar_skills": [
              "FAISS",
              "approximate nearest neighbor search",
              "vector indexes",
              "embedding retrieval",
              "similarity search",
              "IVF indexing",
              "HNSW",
              "product quantization"
            ],
            "in_scope": "FAISS, approximate nearest neighbor search, vector indexes, embedding retrieval, similarity search, IVF indexes, HNSW indexes, PQ compression, cosine similarity, L2 distance",
            "name": "Vector Search Indexing",
            "out_of_scope": "General machine learning model training and experimentation, which belongs to applied-machine-learning-toolkits; database query planning and relational indexing, which belong to data platform or storage-related dimensions; full-text search engines, which are a separate search dimension",
            "overlap_flags": [
              {
                "reason": "Vector search tooling is often learned alongside ML libraries, but this dimension is specifically about retrieval index structures.",
                "with_dim_id": "applied-machine-learning-toolkits",
                "with_dim_name": null,
                "with_role": "Data Scientist"
              },
              {
                "reason": "Vector search is commonly part of AI application architecture, though that dimension focuses on system design rather than the indexing mechanism.",
                "with_dim_id": "ai-service-architecture-patterns",
                "with_dim_name": null,
                "with_role": "AI Engineer"
              }
            ],
            "tentative_id": "d_init_01"
          }
        ],
        "merge_log": [
          {
            "a_dim_id": "applied-machine-learning-toolkits",
            "a_name": "Applied Machine Learning Toolkits",
            "a_role": "__skill_focal__",
            "b_dim_id": "applied-machine-learning-toolkits",
            "b_name": "Applied Machine Learning Toolkits",
            "b_role": "Data Scientist",
            "into": "d_merge_01",
            "into_name": "Applied Machine Learning Toolkits and Frameworks",
            "merged_from": [
              "applied-machine-learning-toolkits",
              "applied-machine-learning-toolkits"
            ],
            "pair_kind": "cross_role",
            "reasoning": "Both dims describe the same applied-ML toolkit layer: libraries/frameworks for prototyping, comparing, indexing, and evaluating models. Dim A lists concrete examples like scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, NumPy, pandas, and FAISS/vector search; Dim B says the same thing in broader terms and frames it as the tooling beneath modeling and evaluation. No distinct skill cluster appears in B, so this is a merge.",
            "similarity": 0.6779459938694667
          }
        ],
        "placed": {
          "name": "FAISS",
          "placement_confidence": 0.92,
          "primary_dimension": "d_merge_01",
          "reasoning": "Deterministic JD placement: locked_dimensions has 2 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
          "secondary_dimensions": [
            "d_init_01"
          ],
          "skill_id": "faiss"
        },
        "relationships": {
          "child_skills": [],
          "parent_skills": [],
          "related_to": [
            "mfa",
            "amm",
            "spl",
            "iscsi",
            "owasp-top-10",
            "foundry-fuzzing",
            "hsm",
            "bls",
            "dex",
            "aks"
          ],
          "requires": [],
          "skill_id": "faiss",
          "suppress_on_match": []
        },
        "skill_id": "faiss",
        "split_log": [],
        "typed": {
          "alternatives_considered": [],
          "confidence": 0.93,
          "name": "FAISS",
          "reasoning": "FAISS is fundamentally a code package imported by applications for similarity search, so under the Tool vs Framework rule it fits Library rather than a user-operated tool or hosted platform.",
          "skill_id": "faiss",
          "subtype": "vector_search_library",
          "type": "Library"
        },
        "warnings": [
          "stage3_post_filter_dropped_catalog_only_locked_dims:41-\u003e2"
        ]
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    }
  ],
  "unmatched_skills": [
    "AI/ML",
    "Amazon SageMaker",
    "Amazon Bedrock",
    "AWS Lambda",
    "ECS",
    "Amazon Athena",
    "AWS Data Pipeline",
    "S3",
    "Kinesis",
    "Amazon API Gateway",
    "Pinecone",
    "OpenSearch",
    "FAISS"
  ]
}
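Note on the placement rule: the `placed.reasoning` string for FAISS in the response above says that placement is deterministic, with the first entry of `locked_dimensions` becoming `primary_dimension` and the remaining entries becoming `secondary_dimensions`. The sketch below is one way to read that rule, not the pipeline's actual code. The function name `place_skill` and its signature are hypothetical. Only the field names (`tentative_id`, `primary_dimension`, `secondary_dimensions`, `skill_id`, `name`) are taken from the response.

```python
from typing import Any


def place_skill(
    skill_id: str, name: str, locked_dimensions: list[dict[str, Any]]
) -> dict[str, Any]:
    """Hypothetical helper: first locked dimension is primary, the rest are secondary."""
    if not locked_dimensions:
        # Assumption: a skill with no locked dimensions cannot be placed.
        raise ValueError("no locked dimensions to place the skill under")
    ids = [dim["tentative_id"] for dim in locked_dimensions]
    return {
        "skill_id": skill_id,
        "name": name,
        "primary_dimension": ids[0],
        "secondary_dimensions": ids[1:],
    }


# Values copied from the FAISS entry in the response above.
locked = [
    {"tentative_id": "d_merge_01", "name": "Applied Machine Learning Toolkits and Frameworks"},
    {"tentative_id": "d_init_01", "name": "Vector Search Indexing"},
]
placed = place_skill("faiss", "FAISS", locked)
print(placed["primary_dimension"], placed["secondary_dimensions"])
```

Under this reading the result depends only on the order of `locked_dimensions`, so reordering that list would change the primary dimension. How the reported `placement_confidence` of 0.92 is derived is not shown in the response and is not modeled here.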
API 3 — final-role-output
{
  "chosen_role": {
    "display_name": "Machine Learning Engineer",
    "id": 10,
    "rationale": "The primary skills include a strong focus on AWS and AI/ML technologies, which aligns well with the role of a Machine Learning Engineer.",
    "role_archetype": null,
    "slug": "machine-learning-engineer",
    "source": "db"
  },
  "chosen_role_resolution": "in_db",
  "final_input_skills": [
    {
      "skill": "AWS",
      "tag": "in_db"
    },
    {
      "skill": "AI/ML",
      "tag": "new"
    },
    {
      "skill": "Amazon SageMaker",
      "tag": "new"
    },
    {
      "skill": "Amazon Bedrock",
      "tag": "new"
    },
    {
      "skill": "AWS Lambda",
      "tag": "new"
    },
    {
      "skill": "ECS",
      "tag": "new"
    },
    {
      "skill": "EKS",
      "tag": "in_db"
    },
    {
      "skill": "EC2",
      "tag": "in_db"
    },
    {
      "skill": "AWS Glue",
      "tag": "in_db"
    },
    {
      "skill": "Amazon Athena",
      "tag": "new"
    },
    {
      "skill": "Redshift",
      "tag": "in_db"
    },
    {
      "skill": "AWS Data Pipeline",
      "tag": "new"
    },
    {
      "skill": "S3",
      "tag": "new"
    },
    {
      "skill": "Kinesis",
      "tag": "new"
    },
    {
      "skill": "Amazon API Gateway",
      "tag": "new"
    },
    {
      "skill": "Python",
      "tag": "in_db"
    },
    {
      "skill": "TensorFlow",
      "tag": "in_db"
    },
    {
      "skill": "PyTorch",
      "tag": "in_db"
    },
    {
      "skill": "Scikit-learn",
      "tag": "in_db"
    },
    {
      "skill": "GitHub Actions",
      "tag": "in_db"
    },
    {
      "skill": "Airflow",
      "tag": "in_db"
    },
    {
      "skill": "Terraform",
      "tag": "in_db"
    },
    {
      "skill": "Docker",
      "tag": "in_db"
    },
    {
      "skill": "Kubernetes",
      "tag": "in_db"
    },
    {
      "skill": "Pinecone",
      "tag": "new"
    },
    {
      "skill": "OpenSearch",
      "tag": "new"
    },
    {
      "skill": "FAISS",
      "tag": "new"
    }
  ],
  "persistence": {
    "items": [
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Platform Operations",
          "id": 26,
          "rationale": "Uses cloud provider services to support delivery and runtime environments. The focus is on consumer-level operation of cloud services rather than deep cloud architecture ownership.",
          "slug": "cloud-platform-operations",
          "source": "db"
        },
        "dimension_id": 26,
        "input_skill": "AWS",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "DevOps Engineer",
            "id": 1,
            "rationale": null,
            "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
            "slug": "devops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 163,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Security Platforms",
          "id": 332,
          "rationale": "Cloud-native security products used to assess posture, detect misconfigurations, and monitor workloads across AWS, Azure, and GCP. This is a distinct product family because the role often works across multiple CNAPP/CSPM/CWPP offerings and cloud-native detectors.",
          "slug": "cloud-security-platforms",
          "source": "db"
        },
        "dimension_id": 332,
        "input_skill": "AWS",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Cybersecurity Engineer",
            "id": 9,
            "rationale": null,
            "role_archetype": null,
            "slug": "cybersecurity-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 163,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Model Runtime Services",
          "id": 121,
          "rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
          "slug": "cloud-model-runtime-services",
          "source": "db"
        },
        "dimension_id": 121,
        "input_skill": "EKS",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Machine Learning Engineer",
            "id": 10,
            "rationale": null,
            "role_archetype": null,
            "slug": "machine-learning-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 725,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Orchestration Platforms",
          "id": 25,
          "rationale": "Operates the platforms that schedule and run containerized workloads and related deployment primitives. This is separate from image delivery because it concerns runtime placement and service rollout behavior.",
          "slug": "orchestration-platforms",
          "source": "db"
        },
        "dimension_id": 25,
        "input_skill": "EKS",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Cloud Engineer",
            "id": 18,
            "rationale": null,
            "role_archetype": null,
            "slug": "cloud-engineer",
            "source": "db"
          },
          {
            "display_name": "DevOps Engineer",
            "id": 1,
            "rationale": null,
            "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
            "slug": "devops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 725,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Provider Core Services",
          "id": 290,
          "rationale": "Core managed services used to provision and operate cloud environments. This is the base cloud surface for compute, storage, networking, and platform primitives the role configures and maintains.",
          "slug": "cloud-provider-core-services",
          "source": "db"
        },
        "dimension_id": 290,
        "input_skill": "EC2",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Cloud Engineer",
            "id": 18,
            "rationale": null,
            "role_archetype": null,
            "slug": "cloud-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 1773,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Data Platform Services",
          "id": 81,
          "rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
          "slug": "cloud-data-platform-services",
          "source": "db"
        },
        "dimension_id": 81,
        "input_skill": "AWS Glue",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 6,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 466,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Data Warehousing Platforms",
          "id": 72,
          "rationale": "Cloud and on-prem analytical storage systems used to persist curated datasets and serve downstream consumers. This cluster is about the warehouse/lakehouse layer where transformed data is organized for access.",
          "slug": "data-warehousing-platforms",
          "source": "db"
        },
        "dimension_id": 72,
        "input_skill": "Redshift",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 6,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2570,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Analytical Programming Languages",
          "id": 82,
          "rationale": "Languages used to clean, transform, analyze, and prototype models in notebooks and scripts. This is the core coding surface for expressing statistical logic and data manipulation in a reproducible way.",
          "slug": "analytical-programming-languages",
          "source": "db"
        },
        "dimension_id": 82,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Data Analyst",
            "id": 20,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-analyst",
            "source": "db"
          },
          {
            "display_name": "Data Scientist",
            "id": 7,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-scientist",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Automation Scripting and CLI",
          "id": 48,
          "rationale": "Uses scripts and command-line tooling to execute repeatable Azure operations and reduce manual work. This is a practical cluster because the role frequently automates provisioning, checks, and remediation tasks.",
          "slug": "automation-scripting-and-cli",
          "source": "db"
        },
        "dimension_id": 48,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Azure Cloud Engineer",
            "id": 4,
            "rationale": null,
            "role_archetype": null,
            "slug": "azure-cloud-engineer",
            "source": "db"
          },
          {
            "display_name": "Cloud Engineer",
            "id": 18,
            "rationale": null,
            "role_archetype": null,
            "slug": "cloud-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Automation and Scripting for Operations",
          "id": 361,
          "rationale": "Scripts and lightweight automation used to execute repetitive virtualization tasks and enforce operational consistency. This is the practical glue that reduces manual host and VM administration.",
          "slug": "automation-and-scripting-for-operations",
          "source": "db"
        },
        "dimension_id": 361,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Virtualization Engineer",
            "id": 26,
            "rationale": null,
            "role_archetype": null,
            "slug": "virtualization-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Network Automation and Scripting",
          "id": 285,
          "rationale": "Covers scripts and automation used to configure, validate, and audit network devices and services. This cluster is coherent because repeatable network operations increasingly depend on programmatic changes and checks.",
          "slug": "network-automation-and-scripting",
          "source": "db"
        },
        "dimension_id": 285,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Network Engineer",
            "id": 21,
            "rationale": null,
            "role_archetype": null,
            "slug": "network-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for AI Workflows",
          "id": 261,
          "rationale": "Languages used to implement AI feature logic, orchestration, and response handling inside product code. This is the core coding surface for turning prompts and model calls into reliable application behavior.",
          "slug": "programming-languages-for-ai-workflows",
          "source": "db"
        },
        "dimension_id": 261,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "AI Engineer",
            "id": 12,
            "rationale": null,
            "role_archetype": null,
            "slug": "ai-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for Backend Systems",
          "id": 140,
          "rationale": "Languages used to implement server-side business logic, request handlers, workers, and service integrations. This is the core coding surface for backend feature delivery and maintenance.",
          "slug": "programming-languages-for-backend-systems",
          "source": "db"
        },
        "dimension_id": 140,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Backend Engineer",
            "id": 14,
            "rationale": null,
            "role_archetype": null,
            "slug": "backend-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for Data Work",
          "id": 67,
          "rationale": "Languages used to implement data pipelines, transformations, and operational utilities. This is the code layer for expressing extraction, parsing, validation, and orchestration logic in data engineering workflows.",
          "slug": "programming-languages-for-data-work",
          "source": "db"
        },
        "dimension_id": 67,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 6,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for ML Systems",
          "id": 113,
          "rationale": "Languages used to implement model integration code, inference services, and feature-processing logic. This is the core coding surface for turning trained models into product-facing software components.",
          "slug": "programming-languages-for-ml-systems",
          "source": "db"
        },
        "dimension_id": 113,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Machine Learning Engineer",
            "id": 10,
            "rationale": null,
            "role_archetype": null,
            "slug": "machine-learning-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for Security Work",
          "id": 328,
          "rationale": "Languages used to automate security tasks, write detection logic, and build analysis or remediation tooling. This is the core coding surface for a cybersecurity engineer across scripts, queries, and small utilities.",
          "slug": "programming-languages-for-security-work",
          "source": "db"
        },
        "dimension_id": 328,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Cybersecurity Engineer",
            "id": 9,
            "rationale": null,
            "role_archetype": null,
            "slug": "cybersecurity-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for Test Automation",
          "id": 193,
          "rationale": "Languages used to implement automated checks, helper utilities, and test harness code. This is the core coding surface for turning test ideas into maintainable automation.",
          "slug": "programming-languages-for-test-automation",
          "source": "db"
        },
        "dimension_id": 193,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Automation Tester",
            "id": 16,
            "rationale": null,
            "role_archetype": null,
            "slug": "automation-tester",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Security Automation and Scripting",
          "id": 258,
          "rationale": "Automating repeatable security checks, enrichment, and remediation workflows. This cluster is coherent because the role often needs lightweight automation to scale analysis and response.",
          "slug": "security-automation-and-scripting",
          "source": "db"
        },
        "dimension_id": 258,
        "input_skill": "Python",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Cybersecurity Engineer",
            "id": 9,
            "rationale": null,
            "role_archetype": null,
            "slug": "cybersecurity-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 393,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Applied Machine Learning Toolkits",
          "id": 94,
          "rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
          "slug": "applied-machine-learning-toolkits",
          "source": "db"
        },
        "dimension_id": 94,
        "input_skill": "TensorFlow",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Data Scientist",
            "id": 7,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-scientist",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 558,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Applied Machine Learning Toolkits",
          "id": 94,
          "rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
          "slug": "applied-machine-learning-toolkits",
          "source": "db"
        },
        "dimension_id": 94,
        "input_skill": "PyTorch",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Data Scientist",
            "id": 7,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-scientist",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 557,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Applied Machine Learning Toolkits",
          "id": 94,
          "rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
          "slug": "applied-machine-learning-toolkits",
          "source": "db"
        },
        "dimension_id": 94,
        "input_skill": "Scikit-learn",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Data Scientist",
            "id": 7,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-scientist",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 554,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Continuous Integration Test Integration",
          "id": 207,
          "rationale": "Integrating automated checks into shared build and merge workflows so results are repeatable and visible. This cluster is coherent because automation testers commonly configure test execution triggers, artifacts, and reporting hooks.",
          "slug": "continuous-integration-test-integration",
          "source": "db"
        },
        "dimension_id": 207,
        "input_skill": "GitHub Actions",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Automation Tester",
            "id": 16,
            "rationale": null,
            "role_archetype": null,
            "slug": "automation-tester",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 1250,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Workflow Orchestration Systems",
          "id": 64,
          "rationale": "Operational orchestration of ML jobs, dependencies, and handoffs across training, validation, deployment, and retraining. This is a useful split from training pipelines because it emphasizes the scheduler and control plane.",
          "slug": "workflow-orchestration-systems",
          "source": "db"
        },
        "dimension_id": 64,
        "input_skill": "Airflow",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 6,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          },
          {
            "display_name": "MLOps Engineer",
            "id": 5,
            "rationale": null,
            "role_archetype": null,
            "slug": "mlops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 325,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Infrastructure Provisioning Templates",
          "id": 291,
          "rationale": "Declarative templates and modules used to create repeatable cloud resources and environments. This cluster covers the infrastructure definitions the role applies, reviews, and updates to keep environments consistent.",
          "slug": "infrastructure-provisioning-templates",
          "source": "db"
        },
        "dimension_id": 291,
        "input_skill": "Terraform",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Cloud Engineer",
            "id": 18,
            "rationale": null,
            "role_archetype": null,
            "slug": "cloud-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 144,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Infrastructure as Code",
          "id": 22,
          "rationale": "Defines infrastructure and platform resources through versioned code so environments are repeatable and reviewable. This is a coherent cluster because it underpins environment consistency and change control.",
          "slug": "infrastructure-as-code",
          "source": "db"
        },
        "dimension_id": 22,
        "input_skill": "Terraform",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "DevOps Engineer",
            "id": 1,
            "rationale": null,
            "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
            "slug": "devops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 144,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Infrastructure as Code and Declarative Provisioning",
          "id": 36,
          "rationale": "Defines cloud and platform infrastructure declaratively through versioned code so environments are repeatable, reviewable, and automatable. This includes authoring and maintaining IaC templates/modules, managing parameters and state, and using plan/apply workflows to provision and update resources across Azure and other cloud platforms.",
          "slug": "infrastructure-as-code-and-declarative-provisioning",
          "source": "db"
        },
        "dimension_id": 36,
        "input_skill": "Terraform",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Azure Cloud Engineer",
            "id": 4,
            "rationale": null,
            "role_archetype": null,
            "slug": "azure-cloud-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 144,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Containerization and Image Delivery",
          "id": 24,
          "rationale": "Builds, packages, and ships application and support workloads as container images. This cluster covers the artifact format and the mechanics of producing deployable images.",
          "slug": "containerization-and-image-delivery",
          "source": "db"
        },
        "dimension_id": 24,
        "input_skill": "Docker",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "DevOps Engineer",
            "id": 1,
            "rationale": null,
            "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
            "slug": "devops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 153,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Model Serving Deployment and Runtime Packaging",
          "id": 52,
          "rationale": "Operational deployment of trained models into online, batch, or streaming serving environments, including packaging models and model servers into containers or managed inference runtimes, coordinating rollout, and handing off to inference systems. Covers serving frameworks and platforms such as TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, and Seldon Core, plus container/runtime concerns like Docker images, GPU-enabled containers, base image selection, container entrypoints, runtime dependencies, and image scanning for model services.",
          "slug": "model-serving-deployment-and-runtime-packaging",
          "source": "db"
        },
        "dimension_id": 52,
        "input_skill": "Docker",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "MLOps Engineer",
            "id": 5,
            "rationale": null,
            "role_archetype": null,
            "slug": "mlops-engineer",
            "source": "db"
          },
          {
            "display_name": "Machine Learning Engineer",
            "id": 10,
            "rationale": null,
            "role_archetype": null,
            "slug": "machine-learning-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 153,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Orchestration Platforms",
          "id": 25,
          "rationale": "Operates the platforms that schedule and run containerized workloads and related deployment primitives. This is separate from image delivery because it concerns runtime placement and service rollout behavior.",
          "slug": "orchestration-platforms",
          "source": "db"
        },
        "dimension_id": 25,
        "input_skill": "Kubernetes",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Cloud Engineer",
            "id": 18,
            "rationale": null,
            "role_archetype": null,
            "slug": "cloud-engineer",
            "source": "db"
          },
          {
            "display_name": "DevOps Engineer",
            "id": 1,
            "rationale": null,
            "role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
            "slug": "devops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 158,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "Applied Machine Learning Tooling and Frameworks",
          "id": null,
          "rationale": "Libraries and frameworks used to prototype, build, train, tune, and compare machine learning models in practice. Includes hands-on AI/ML tooling such as scikit-learn, XGBoost, LightGBM, TensorFlow, and PyTorch, along with model training, feature engineering, hyperparameter tuning, and evaluation workflows. Excludes model serving, deployment packaging, online inference, registry/version promotion, distributed stream processing, and cloud infrastructure provisioning.",
          "slug": "d_merge_01",
          "source": "llm"
        },
        "dimension_id": 94,
        "input_skill": "AI/ML",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2611,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "AI Service Integration and Orchestration Patterns",
          "id": null,
          "rationale": "Patterns for structuring and embedding AI/ML capabilities within products and services, including deciding where AI logic lives and how it interacts with application flows. Covers model-backed APIs, retrieval-augmented generation, agent and workflow orchestration, batch scoring services, online inference integration, feature pipelines, and placement across handlers, workers, gateways, and service boundaries.",
          "slug": "d_merge_02",
          "source": "llm"
        },
        "dimension_id": 270,
        "input_skill": "AI/ML",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2611,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "AI Inference Cost, Latency, and Throughput Optimization",
          "id": 260,
          "rationale": "Improving the speed, throughput, and cost efficiency of AI and ML-powered product features without sacrificing correctness or user experience. Includes token budgeting, prompt compression, batching, caching, model selection, quantization, pruning, async inference, warm starts, streaming UX, timeout tuning, concurrency control, and profiling. Excludes infrastructure autoscaling, model serving capacity planning, generic backend performance tuning, and unrelated data/warehouse optimization.",
          "slug": "ai-inference-cost-latency-and-throughput-optimization",
          "source": "db"
        },
        "dimension_id": 260,
        "input_skill": "AI/ML",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "AI Engineer",
            "id": 12,
            "rationale": null,
            "role_archetype": null,
            "slug": "ai-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2611,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "Managed ML Platform Workflows",
          "id": null,
          "rationale": "Cloud ML platforms for building and operating models end-to-end, including notebooks, experiments, managed training jobs, and pipeline/studio workflows. Examples: SageMaker Studio, SageMaker notebooks, SageMaker Pipelines, managed training jobs.",
          "slug": "d_split_01_01",
          "source": "llm"
        },
        "dimension_id": 367,
        "input_skill": "Amazon SageMaker",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 New dimension saved (reconciliation separate) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2612,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "Managed Model Hosting and Endpoints",
          "id": null,
          "rationale": "Cloud-managed services for deploying trained models as online or batch inference endpoints, including endpoint provisioning, batch transform, and rollout coordination. Examples: SageMaker endpoints, SageMaker batch transform.",
          "slug": "d_split_01_02",
          "source": "llm"
        },
        "dimension_id": 368,
        "input_skill": "Amazon SageMaker",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 New dimension saved (reconciliation separate) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2612,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "Model Serving Runtime Packaging",
          "id": null,
          "rationale": "Packaging trained models and model servers for deployment, including containers, base images, runtime dependencies, entrypoints, GPU images, and image scanning. Examples: Docker images for model services.",
          "slug": "d_split_01_03",
          "source": "llm"
        },
        "dimension_id": 52,
        "input_skill": "Amazon SageMaker",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (embedding dedup) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2612,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "Model Serving Frameworks and Platforms",
          "id": null,
          "rationale": "Serving frameworks and platforms used to run deployed models in online, batch, or streaming environments. Examples: TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, Seldon Core.",
          "slug": "d_split_01_04",
          "source": "llm"
        },
        "dimension_id": 52,
        "input_skill": "Amazon SageMaker",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (embedding dedup) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2612,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Model Runtime Services",
          "id": 121,
          "rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
          "slug": "cloud-model-runtime-services",
          "source": "db"
        },
        "dimension_id": 121,
        "input_skill": "Amazon Bedrock",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Machine Learning Engineer",
            "id": 10,
            "rationale": null,
            "role_archetype": null,
            "slug": "machine-learning-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2613,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "Managed Cloud Data Platform Services",
          "id": null,
          "rationale": "Managed cloud services used to run data engineering and related application workloads, including serverless compute, workflow orchestration, storage, event triggers, and adjacent security/platform primitives. This covers services such as AWS Lambda, AWS Step Functions, AWS Glue, Amazon S3 event triggers, and other managed services used to build and operate pipelines and data platforms.",
          "slug": "d_merge_01",
          "source": "llm"
        },
        "dimension_id": 81,
        "input_skill": "AWS Lambda",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2614,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Version Control Systems",
          "id": 365,
          "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 365,
        "input_skill": "ECS",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2615,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "Cloud Analytics Query Services",
          "id": null,
          "rationale": "Managed cloud services for querying and processing analytical data, including serverless SQL analytics, data lake querying, and federated query services such as Athena, Glue, Redshift Spectrum, and EMR-based analytics.",
          "slug": "d_split_01_01",
          "source": "llm"
        },
        "dimension_id": 367,
        "input_skill": "Amazon Athena",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2616,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "Cloud Data Pipeline Runtime",
          "id": null,
          "rationale": "Managed cloud compute and orchestration services used to run data engineering jobs, transformations, and workflows.",
          "slug": "d_split_01_02",
          "source": "llm"
        },
        "dimension_id": 81,
        "input_skill": "Amazon Athena",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (embedding dedup) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2616,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "Cloud Data Platform Storage",
          "id": null,
          "rationale": "Cloud storage and data access services used as the persistence layer for data engineering platforms and pipelines.",
          "slug": "d_split_01_03",
          "source": "llm"
        },
        "dimension_id": 81,
        "input_skill": "Amazon Athena",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (embedding dedup) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2616,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "Cloud Data Platform Security and Networking",
          "id": null,
          "rationale": "Identity, access, secrets, and networking primitives used to support cloud data platforms and pipelines.",
          "slug": "d_split_01_04",
          "source": "llm"
        },
        "dimension_id": 369,
        "input_skill": "Amazon Athena",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 New dimension saved (reconciliation separate) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2616,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Data Platform Services",
          "id": 81,
          "rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
          "slug": "cloud-data-platform-services",
          "source": "db"
        },
        "dimension_id": 81,
        "input_skill": "AWS Data Pipeline",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 6,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2617,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Storage Provisioning and Automation",
          "id": 311,
          "rationale": "Covers the scripts, APIs, and operational workflows used to create, resize, map, and retire storage resources. This cluster is coherent because storage engineers often automate repetitive provisioning and maintenance tasks.",
          "slug": "storage-provisioning-and-automation",
          "source": "db"
        },
        "dimension_id": 311,
        "input_skill": "S3",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Storage Engineer",
            "id": 22,
            "rationale": null,
            "role_archetype": null,
            "slug": "storage-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2618,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Streaming Data Processing",
          "id": 69,
          "rationale": "Tools and patterns for ingesting and transforming event streams with low latency. This cluster covers continuous processing, windowing, and stateful stream jobs used to keep data fresh.",
          "slug": "streaming-data-processing",
          "source": "db"
        },
        "dimension_id": 69,
        "input_skill": "Kinesis",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 6,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2619,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "HTTP API Frameworks and Gateway Layers",
          "id": null,
          "rationale": "Server-side frameworks and gateway layers used to build, route, validate, and manage HTTP APIs. Includes backend web service frameworks and API gateway configuration for endpoints, request/response mapping, routing, input validation, auth integration, throttling, usage plans, stages, custom domains, and backend integration.",
          "slug": "d_merge_01",
          "source": "llm"
        },
        "dimension_id": 141,
        "input_skill": "Amazon API Gateway",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2620,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Model Runtime Services",
          "id": 121,
          "rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
          "slug": "cloud-model-runtime-services",
          "source": "db"
        },
        "dimension_id": 121,
        "input_skill": "Pinecone",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Machine Learning Engineer",
            "id": 10,
            "rationale": null,
            "role_archetype": null,
            "slug": "machine-learning-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2621,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Data Platform Services",
          "id": 81,
          "rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
          "slug": "cloud-data-platform-services",
          "source": "db"
        },
        "dimension_id": 81,
        "input_skill": "OpenSearch",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 6,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2622,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Version Control Systems",
          "id": 365,
          "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 365,
        "input_skill": "OpenSearch",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2622,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": null,
          "display_name": "Applied Machine Learning Toolkits and Frameworks",
          "id": null,
          "rationale": "Libraries and frameworks used to prototype, compare, index, and evaluate machine learning solutions quickly. Includes applied-ML toolkits such as scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, NumPy, pandas, FAISS, vector search libraries, and embedding utilities. Excludes serving runtimes, deployment containers, online inference infrastructure, streaming systems, and database/storage engines.",
          "slug": "d_merge_01",
          "source": "llm"
        },
        "dimension_id": 94,
        "input_skill": "FAISS",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2623,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Version Control Systems",
          "id": 365,
          "rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 365,
        "input_skill": "FAISS",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2623,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "AI Inference Cost, Latency, and Throughput Optimization",
          "id": null,
          "rationale": "Improving the runtime efficiency of AI/ML-powered features by reducing inference cost and latency while increasing throughput and preserving user experience. Includes token budgeting, prompt compression, batching, caching, quantization, pruning, model selection, async inference, warm starts, streaming UX, timeout tuning, concurrency control, GPU utilization, and profiling. Excludes model training, feature engineering, registry/versioning, infrastructure autoscaling, serving capacity planning, generic backend performance tuning, and unrelated data/warehouse optimization.",
          "slug": "d_merge_03",
          "source": "llm"
        },
        "dimension_id": 260,
        "input_skill": "AI/ML",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2611,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 10,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Streaming Data Processing",
          "id": null,
          "rationale": "Tools and patterns for ingesting, buffering, and transforming event streams with low latency. This includes continuous processing, windowing, stateful stream jobs, checkpointing, shard scaling, stream partitioning, and managed streaming services such as Kinesis, Amazon Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics.",
          "slug": "d_merge_01",
          "source": "llm"
        },
        "dimension_id": 69,
        "input_skill": "Kinesis",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 2619,
        "skill_tag": "in_db",
        "skipped_reason": null
      }
    ],
    "new_skills_created": 13,
    "role_dimension_saved": 0,
    "skill_dimension_saved": 21,
    "skipped": 0
  },
  "planner_output": null,
  "run_id": "20755499-04f6-440f-80a9-bb023fddc1ff"
}
