Pipeline run
20755499-04f6-440f-80a9-bb023fddc1ff
Client output enrichment
v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description · vocab breakdown (legacy)
1 POST /skills/extract-from-jd
2 POST /skills/extract-details
3 POST /skills/final-role-output
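The three calls above run in the listed order. A minimal sketch of that sequencing follows, assuming each step simply forwards the previous step's JSON response; the real request and response schemas are not shown in this run. The `send` hook and the `job_description` payload key are hypothetical names used only for illustration.

```python
from typing import Callable

# Endpoint paths exactly as listed in this run, in call order.
PIPELINE_STEPS = [
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
]


def run_pipeline(jd_text: str, send: Callable[[str, dict], dict]) -> dict:
    """Call each step in order, passing the previous response forward.

    `send(path, payload)` is a hypothetical transport hook (for example a
    wrapper around an HTTP POST). The "job_description" key is an assumption,
    not the documented request schema.
    """
    payload: dict = {"job_description": jd_text}
    for path in PIPELINE_STEPS:
        payload = send(path, payload)
    return payload
```

Injecting `send` keeps the sketch independent of any particular HTTP client, so the ordering can be exercised with a stub.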
Machine Learning Engineer
slug: machine-learning-engineer · id: 10 · source: db
The primary skills focus strongly on AWS and AI/ML technologies, which aligns well with the Machine Learning Engineer role.
Resolution: in_db (role exists in library; skill↔dim and role↔dim links saved when applicable).
Job description
About the job
At Capgemini Invent, we believe difference drives change. As inventive transformation consultants, we blend our strategic, creative and scientific capabilities, collaborating closely with clients to deliver cutting-edge solutions. Join us to drive transformation tailored to our client's challenges of today and tomorrow. Informed and validated by science and data. Superpowered by creativity and design. All underpinned by technology created with purpose.

Your Role
We are seeking a highly skilled Solution Architect – AWS Cloud & AI/ML to design, architect, and implement advanced AI/ML and generative AI solutions on the AWS platform. The ideal candidate will have deep expertise in large-scale distributed systems, modern AI/ML architectures, LLMs, data engineering pipelines, and AWS-native services. This role involves partnering with cross-functional teams, understanding business challenges, and crafting end‑to‑end scalable, secure, and cost‑optimized solutions.
- Architect and deliver end‑to‑end AI/ML solutions on AWS, covering data ingestion, training, inference, orchestration, monitoring, and governance.
- Design and integrate LLM‑based and Generative AI solutions, including retrieval-augmented generation (RAG), prompt workflows, and production deployment strategies.
- Develop feature engineering strategies and scalable data pipelines to support ML training and real-time inference workloads.
- Lead technical discussions and provide guidance on AI/ML best practices, model lifecycle, optimization, MLOps, and model governance.
- Design highly scalable, secure, and cost-efficient architectures using: Amazon SageMaker (Training Jobs, Inference Endpoints, Pipelines, Feature Store, Model Registry) Amazon Bedrock (Foundation models, Generative AI orchestration, prompt management) AWS Lambda, ECS, EKS, EC2 for building and orchestrating distributed AI workloads.
- Architect and optimize data engineering platforms using: AWS Glue, Amazon Athena, Redshift, AWS Data Pipeline, S3, Kinesis, and related services.
- Build secure, production-grade API services for AI model inference using Amazon API Gateway and AWS compute services.

Your Profile
- 8+ years of experience in cloud architecture, with at least 5 years in AWS.
- Strong expertise in: Machine Learning, MLOps, and GenAI solution design. Amazon SageMaker (end‑to‑end ML lifecycle). Amazon Bedrock and modern LLM architectures. Data engineering with Glue, Redshift, Athena, and pipeline orchestration.
- Experience containerizing and scaling AI workloads on Lambda/ECS/EKS.
- Strong coding experience in Python and familiarity with ML frameworks (TensorFlow, PyTorch, Scikit‑learn).
- Deep understanding of security, networking, IAM, and compliance best practices for AI systems.
- Excellent communication, design thinking, and stakeholder management skills.
- AWS certifications (e.g., AWS Certified Solutions Architect – Professional, Machine Learning – Specialty).
- Experience with vector databases (e.g., Pinecone, OpenSearch, FAISS).
- Experience building RAG pipelines, multi‑agent orchestration frameworks, or custom LLM fine‑tuning workflows.
- Familiarity with DevOps/MLOps tools: GitHub Actions, Airflow, Terraform, Docker, Kubernetes

Capgemini is a global business and technology transformation partner, helping organizations to accelerate their dual transition to a digital and sustainable world, while creating tangible impact for enterprises and society. It is a responsible and diverse group of 340,000 team members in more than 50 countries. With its strong over 55-year heritage, Capgemini is trusted by its clients to unlock the value of technology to address the entire breadth of their business needs. It delivers end-to-end services and solutions leveraging strengths from strategy and design to engineering, all fueled by its market leading capabilities in AI, generative AI, cloud and data, combined with its deep industry expertise and partner ecosystem.
Skills from this JD
Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
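The merge described above can be sketched as a plain dictionary join per skill. This is only an illustration under assumptions: every field name below (`name`, `dimensions`, `locked_dimensions`, `tags`) is hypothetical, since the actual response shapes of the three APIs are not shown in this run.

```python
def merge_skill_row(extraction: dict, library_match: dict, persistence: dict) -> dict:
    """Combine one skill's records from the three stages into a display row.

    extraction    -> API 1 output for the skill (hypothetical key: "name")
    library_match -> API 2 output (hypothetical keys: "dimensions",
                     "locked_dimensions")
    persistence   -> API 3 output (hypothetical key: "tags")
    Missing sections fall back to empty lists so a row always has every field.
    """
    return {
        "skill": extraction.get("name"),
        "dimensions": library_match.get("dimensions", []),
        "locked_dimensions": library_match.get("locked_dimensions", []),
        "persistence_tags": persistence.get("tags", []),
    }
```

Defaulting to empty lists mirrors the rows in this run where a skill has worklist dimensions but no locked dimensions.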
Aliases — catalog
- Compaction (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Concept
- Sub-category: Storage Maintenance Concept
- Confidence: 0.74
- Version strategy: NOT_APPLICABLE
Maturity reasoning: Compaction is a standard storage-maintenance concept in widely used systems like LSM databases and Kafka; it appears in many JDs for Cassandra, RocksDB, and Kafka ops roles, indicating broad market demand.
Skill profile (library / DB)
- Skill nature: PLATFORM
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 13
- Sub-category id: 161
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Cloud Platform Operations · Catalog dimension db id 26
  Library dimension (catalog)
  Roles linked in library: DevOps Engineer
- Cloud Security Platforms · Catalog dimension db id 332
  Library dimension (catalog)
  Roles linked in library: Cybersecurity Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Platform Operations · cloud-platform-operations | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Cloud Security Platforms · cloud-security-platforms | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
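Outcome cells in these link-attempt tables are "·"-separated tags. A minimal sketch of turning one cell into structured fields follows, assuming the tag wordings seen in this run ("New skill saved", "Existing dimension (...)", "New dimension saved (...)", "Role↔dimension saved/skipped (...)") are the only ones; the output field names are hypothetical.

```python
import re

# Patterns for the tag wordings that appear in this run's Outcome cells.
_DIMENSION = re.compile(r"^(Existing dimension|New dimension saved)\s*\((.+)\)$")
_ROLE_LINK = re.compile(r"^Role↔dimension (saved|skipped)(?:\s*\((.+)\))?$")


def parse_outcome(cell: str) -> dict:
    """Split an Outcome cell on '·' and classify each tag.

    Unrecognised tags are ignored; the result keys are illustrative names.
    """
    parsed = {
        "new_skill": False,
        "dimension_status": None,
        "dimension_source": None,
        "role_link": None,
        "role_link_reason": None,
    }
    for tag in (part.strip() for part in cell.split("·")):
        if tag == "New skill saved":
            parsed["new_skill"] = True
        elif (m := _DIMENSION.match(tag)):
            parsed["dimension_status"], parsed["dimension_source"] = m.groups()
        elif (m := _ROLE_LINK.match(tag)):
            parsed["role_link"], parsed["role_link_reason"] = m.groups()
    return parsed
```

Parsed this way, rows can be filtered by `role_link == "skipped"` to list the dimensions that fall outside the chosen role.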
Skill enrichment (orchestrator / LLM)
AI/ML appears in a broad share of software and data job postings, with major vendors (AWS, Google, Microsoft) offering mainstream ML platforms and tooling; it’s now a common hiring-pipeline requirement rather than a niche specialty.
(0.99)
AI/ML is a common combined domain label in JDs and usually clearly means artificial intelligence and machine learning, not a different catalog skill.
Not versioned
Domain · artificial_intelligence_machine_learning · confidence 0.98
AI/ML is a vertical body of knowledge and problem-space rather than a tool, framework, or methodology, so it fits the Domain type.
- Category: Domain
- Sub-category: artificial_intelligence_machine_learning
- Skill nature: CONCEPT
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- Applied Machine Learning Tooling and Frameworks · Proposed / LLM
  Proposed / LLM dimension (no DB id yet)
- AI Service Integration and Orchestration Patterns · Proposed / LLM
  Proposed / LLM dimension (no DB id yet)
- AI Inference Cost, Latency, and Throughput Optimization · Catalog dimension db id 260
  Library dimension (catalog)
  Roles linked in library: AI Engineer
Locked dimensions (v3 placement)
- Applied Machine Learning Tooling and Frameworks
  Pipeline tentative id
  Libraries and frameworks used to prototype, build, train, tune, and compare machine learning models in practice. Includes hands-on AI/ML tooling such as scikit-learn, XGBoost, LightGBM, TensorFlow, and PyTorch, along with model training, feature engineering, hyperparameter tuning, and evaluation workflows. Excludes model serving, deployment packaging, online inference, registry/version promotion, distributed stream processing, and cloud infrastructure provisioning.
- AI Service Integration and Orchestration Patterns
  Pipeline tentative id
  Patterns for structuring and embedding AI/ML capabilities within products and services, including deciding where AI logic lives and how it interacts with application flows. Covers model-backed APIs, retrieval-augmented generation, agent and workflow orchestration, batch scoring services, online inference integration, feature pipelines, and placement across handlers, workers, gateways, and service boundaries.
- AI Inference Cost, Latency, and Throughput Optimization
  Pipeline tentative id
  Improving the runtime efficiency of AI/ML-powered features by reducing inference cost and latency while increasing throughput and preserving user experience. Includes token budgeting, prompt compression, batching, caching, quantization, pruning, model selection, async inference, warm starts, streaming UX, timeout tuning, concurrency control, GPU utilization, and profiling. Excludes model training, feature engineering, registry/versioning, infrastructure autoscaling, serving capacity planning, ge
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Applied Machine Learning Tooling and Frameworks · d_merge_01 | ✓ | — | New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role) |
| AI Service Integration and Orchestration Patterns · d_merge_02 | ✓ | — | New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role) |
| AI Inference Cost, Latency, and Throughput Optimization · ai-inference-cost-latency-and-throughput-optimization | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| AI Inference Cost, Latency, and Throughput Optimization · d_merge_03 | ✓ | — | New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
Commonly listed in ML/DS job descriptions; AWS’s managed ML platform is broadly adopted for training, deployment, and MLOps across enterprises.
Amazon Web Services · proprietary · since 2017 (0.98)
Amazon SageMaker is a specific AWS ML platform name and is usually unambiguous in job descriptions; it is unlikely to be mistaken for a different catalog skill.
Not versioned
Platform · ml_platform · confidence 0.98
By the Platform vs Tool rule, Amazon SageMaker is a hosted multi-tenant AWS environment with APIs and managed machine-learning capabilities, so it is a Platform rather than a Tool or a single Service in this typology.
- Category: Platform
- Sub-category: ml_platform
- Skill nature: PLATFORM
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
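The typing reasoning for this skill cites a Platform vs Tool rule, and other entries in this run cite Platform vs Service. Read together, those notes suggest a simple decision: software you run yourself is a Tool, a hosted environment as a whole is a Platform, and a specific managed capability inside one is a Service. The sketch below encodes that reading only; it is an interpretation of the reasoning text, not the orchestrator's actual rule, and the `SkillTraits` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillTraits:
    hosted: bool             # operated by the vendor rather than run by you
    whole_environment: bool  # the hosted environment itself, not one capability in it


def classify(traits: SkillTraits) -> str:
    """Map traits to a typology label under the assumed reading of the rule."""
    if not traits.hosted:
        return "Tool"  # software you run yourself
    return "Platform" if traits.whole_environment else "Service"
```

Under this reading, a skill typed as a Platform in this run would correspond to `SkillTraits(hosted=True, whole_environment=True)`.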
Dimensions (API 2 worklist)
- Managed ML Platform Workflows · Proposed / LLM
  Proposed / LLM dimension (no DB id yet)
- Managed Model Hosting and Endpoints · Proposed / LLM
  Proposed / LLM dimension (no DB id yet)
- Model Serving Runtime Packaging · Proposed / LLM
  Proposed / LLM dimension (no DB id yet)
- Model Serving Frameworks and Platforms · Proposed / LLM
  Proposed / LLM dimension (no DB id yet)
Locked dimensions (v3 placement)
- Managed ML Platform Workflows
  Pipeline tentative id
  Cloud ML platforms for building and operating models end-to-end, including notebooks, experiments, managed training jobs, and pipeline/studio workflows. Examples: SageMaker Studio, SageMaker notebooks, SageMaker Pipelines, managed training jobs.
- Managed Model Hosting and Endpoints
  Pipeline tentative id
  Cloud-managed services for deploying trained models as online or batch inference endpoints, including endpoint provisioning, batch transform, and rollout coordination. Examples: SageMaker endpoints, SageMaker batch transform.
- Model Serving Runtime Packaging
  Pipeline tentative id
  Packaging trained models and model servers for deployment, including containers, base images, runtime dependencies, entrypoints, GPU images, and image scanning. Examples: Docker images for model services.
- Model Serving Frameworks and Platforms
  Pipeline tentative id
  Serving frameworks and platforms used to run deployed models in online, batch, or streaming environments. Examples: TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, Seldon Core.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Managed ML Platform Workflows · d_split_01_01 | ✓ | — | New skill saved · New dimension saved (reconciliation separate) · Role↔dimension skipped (dimension not under chosen role) |
| Managed Model Hosting and Endpoints · d_split_01_02 | ✓ | — | New skill saved · New dimension saved (reconciliation separate) · Role↔dimension skipped (dimension not under chosen role) |
| Model Serving Runtime Packaging · d_split_01_03 | ✓ | — | New skill saved · Existing dimension (embedding dedup) · Role↔dimension skipped (dimension not under chosen role) |
| Model Serving Frameworks and Platforms · d_split_01_04 | ✓ | — | New skill saved · Existing dimension (embedding dedup) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
Appears increasingly in cloud/ML job descriptions and AWS partner materials, but JD volume is still far below core AWS services like S3 or Lambda.
Amazon Web Services · proprietary · since 2023 (0.98)
Amazon Bedrock is a specific AWS managed AI model service with a distinctive name; typical JDs mentioning it are unlikely to mean a different catalog skill.
Not versioned
Service · managed_ai_model_service · confidence 0.97
By the Platform vs Service rule, Amazon Bedrock is a specific managed capability inside AWS rather than the whole hosted environment, so it is a Service.
- Category: Service
- Sub-category: managed_ai_model_service
- Skill nature: CLOUD_SERVICE
- Volatility: EMERGING
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- Cloud Model Runtime Services · Catalog dimension db id 121
  Library dimension (catalog)
  Roles linked in library: Machine Learning Engineer
- Cloud Model Runtime Services · Catalog dimension db id 121
  Library dimension (catalog)
  Roles linked in library: Machine Learning Engineer
Locked dimensions (v3 placement)
- Cloud Model Runtime Services
  Reuses catalog slug
  Consumer-facing managed services used to run, invoke, and integrate foundation models and related AI capabilities in cloud applications. Amazon Bedrock belongs here because it provides hosted model access, orchestration features, and runtime APIs for generative AI workloads.
- Cloud Model Runtime Services
  Reuses catalog slug
  Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Model Runtime Services · cloud-model-runtime-services | ✓ | ✓ | New skill saved · Existing dimension (library) · Role↔dimension saved |
Skill enrichment (orchestrator / LLM)
Broadly adopted serverless compute; AWS Lambda appears in many cloud/backend job descriptions and is a standard AWS offering with strong ecosystem support.
Amazon Web Services · proprietary · since 2014 (0.99)
AWS Lambda is a specific AWS serverless compute service with a distinctive full name; in typical JDs it is unlikely to be confused with unrelated skills in the catalog.
Not versioned
Service · serverless_compute_service · confidence 0.99
By the Service vs Platform rule, AWS Lambda is a specific managed capability inside AWS rather than the whole hosted environment, so it is a Service.
- Category: Service
- Sub-category: serverless_compute_service
- Skill nature: CLOUD_SERVICE
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- Managed Cloud Data Platform Services · Proposed / LLM
  Proposed / LLM dimension (no DB id yet)
Locked dimensions (v3 placement)
- Managed Cloud Data Platform Services
  Pipeline tentative id
  Managed cloud services used to run data engineering and related application workloads, including serverless compute, workflow orchestration, storage, event triggers, and adjacent security/platform primitives. This covers services such as AWS Lambda, AWS Step Functions, AWS Glue, Amazon S3 event triggers, and other managed services used to build and operate pipelines and data platforms.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Managed Cloud Data Platform Services · d_merge_01 | ✓ | — | New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
ECS appears in many game-engine and engine-architecture job descriptions, especially in Unity/DOTS and Rust/C++ gameplay systems, and has strong GitHub/library activity; it’s a common modern architecture pattern rather than a niche tool.
(0.99)
Could be confused with: amazon_ecs, elastic_container_service
“ECS” is a common acronym and in JDs often means Amazon Elastic Container Service; it can also be read as the generic entity-component-system architecture concept.
Not versioned
Concept · entity_component_system · confidence 0.93
ECS is fundamentally the Entity-Component-System design pattern, so by the Architecture vs Concept rule it is best typed as a Concept rather than a tool or platform.
- Category: Concept
- Sub-category: entity_component_system
- Skill nature: CONCEPT
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
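The ambiguity flagged for this skill ("ECS" as Amazon Elastic Container Service versus the entity-component-system pattern) can in principle be narrowed using the surrounding JD text. The sketch below shows one simple way to do that by counting cue words per reading. The cue lists are hypothetical and hand-picked for illustration, and this is not how the pipeline in this run resolves acronyms.

```python
from typing import Optional

# Hypothetical cue words per candidate reading; illustrative only.
CUES = {
    "amazon_ecs": ("aws", "lambda", "eks", "ec2", "container"),
    "entity_component_system": ("unity", "dots", "game engine", "gameplay"),
}


def disambiguate_ecs(context: str) -> Optional[str]:
    """Pick the reading whose cue words occur most often in the context.

    Uses plain substring matching on lower-cased text. Returns None when no
    cue matches or when the top score is tied, so the caller can fall back
    to manual review.
    """
    text = context.lower()
    scores = {
        reading: sum(cue in text for cue in cues)
        for reading, cues in CUES.items()
    }
    best = max(scores, key=scores.get)
    top = scores[best]
    if top == 0 or list(scores.values()).count(top) > 1:
        return None
    return best
```

For the JD phrase "AWS Lambda, ECS, EKS, EC2 for building and orchestrating distributed AI workloads", only the `amazon_ecs` cues match, so that reading is returned.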
Dimensions (API 2 worklist)
- Version Control Systems · Catalog dimension db id 365
  Library dimension (catalog)
Locked dimensions (v3 placement)
- Container Orchestration Services
  Pipeline tentative id
  Managed services for running and scaling containerized workloads. ECS belongs here because it is an orchestration platform for scheduling tasks, managing services, and coordinating container runtime operations.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Version Control Systems · d_init_01 | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- Ansible playbooks (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Format
- Sub-category: Automation Playbook Format
- Vendor: Red Hat
- License: gpl_v3
- Year introduced: 2012
- Confidence: 0.88
- Version strategy: NOT_APPLICABLE
Maturity reasoning: Common in DevOps JDs and widely used for infrastructure automation; Red Hat/Ansible remains a standard tool in hiring pipelines, with playbooks the core format.
Skill profile (library / DB)
- Skill nature: CLOUD_SERVICE
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 14
- Sub-category id: 251
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Cloud Model Runtime Services · Catalog dimension db id 121
  Library dimension (catalog)
  Roles linked in library: Machine Learning Engineer
- Orchestration Platforms · Catalog dimension db id 25
  Library dimension (catalog)
  Roles linked in library: Cloud Engineer, DevOps Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Model Runtime Services · cloud-model-runtime-services | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
| Orchestration Platforms · orchestration-platforms | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — from this run (catalog unavailable)
- EC2 (CANONICAL) primary
Skill profile (library / DB)
- Skill nature: CLOUD_SERVICE
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 14
- Sub-category id: 1544
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Cloud Provider Core Services · Catalog dimension db id 290
  Library dimension (catalog)
  Roles linked in library: Cloud Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Provider Core Services · cloud-provider-core-services | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — from this run (catalog unavailable)
- AWS Glue (CANONICAL) primary
Skill profile (library / DB)
- Skill nature: CLOUD_SERVICE
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 14
- Sub-category id: 385
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Cloud Data Platform Services · Catalog dimension db id 81
  Library dimension (catalog)
  Roles linked in library: Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Data Platform Services · cloud-data-platform-services | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
Commonly listed in cloud/data analytics JDs; AWS’s own docs position Athena as a standard serverless SQL query service for S3 data lakes, indicating broad market adoption.
Amazon Web Services · proprietary · since 2016 (0.99)
Amazon Athena is a specific AWS query service with a distinctive full name; in typical JDs it is unlikely to be confused with another catalog skill.
Not versioned
Service · query_service · confidence 0.98
By the Platform vs Tool and Service vs Platform rules, Amazon Athena is a managed capability inside AWS rather than software you run yourself, so it is a Service.
- Category: Service
- Sub-category: query_service
- Skill nature: CLOUD_SERVICE
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- Cloud Analytics Query Services · Proposed / LLM
  Proposed / LLM dimension (no DB id yet)
- Cloud Data Pipeline Runtime · Proposed / LLM
  Proposed / LLM dimension (no DB id yet)
- Cloud Data Platform Storage · Proposed / LLM
  Proposed / LLM dimension (no DB id yet)
- Cloud Data Platform Security and Networking · Proposed / LLM
  Proposed / LLM dimension (no DB id yet)
Locked dimensions (v3 placement)
- Cloud Analytics Query Services
  Pipeline tentative id
  Managed cloud services for querying and processing analytical data, including serverless SQL analytics, data lake querying, and federated query services such as Athena, Glue, Redshift Spectrum, and EMR-based analytics.
- Cloud Data Pipeline Runtime
  Pipeline tentative id
  Managed cloud compute and orchestration services used to run data engineering jobs, transformations, and workflows.
- Cloud Data Platform Storage
  Pipeline tentative id
  Cloud storage and data access services used as the persistence layer for data engineering platforms and pipelines.
- Cloud Data Platform Security and Networking
  Pipeline tentative id
  Identity, access, secrets, and networking primitives used to support cloud data platforms and pipelines.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Analytics Query Services · d_split_01_01 | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Cloud Data Pipeline Runtime · d_split_01_02 | ✓ | — | New skill saved · Existing dimension (embedding dedup) · Role↔dimension skipped (dimension not under chosen role) |
| Cloud Data Platform Storage · d_split_01_03 | ✓ | — | New skill saved · Existing dimension (embedding dedup) · Role↔dimension skipped (dimension not under chosen role) |
| Cloud Data Platform Security and Networking · d_split_01_04 | ✓ | — | New skill saved · New dimension saved (reconciliation separate) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — from this run (catalog unavailable)
- Redshift (CANONICAL)
Skill profile (library / DB)
- Skill nature: PLATFORM
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 13
- Sub-category id: 2098
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Data Warehousing Platforms · Catalog dimension db id 72
  Library dimension (catalog)
  Roles linked in library: Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Data Warehousing Platforms · data-warehousing-platforms | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
AWS announced AWS Data Pipeline is in maintenance mode and recommends newer services like Glue/Step Functions; recent JDs rarely list it compared with modern AWS data tooling.
Amazon Web Services · proprietary · since 2012 (0.98)
AWS Data Pipeline is a specific AWS service name and is unlikely to be mistaken for another catalog skill in a typical JD.
Not versioned
Service · data_pipeline_service · confidence 0.97
By the Service vs Platform rule, AWS Data Pipeline is a specific managed capability inside AWS rather than the AWS platform itself.
- Category: Service
- Sub-category: data_pipeline_service
- Skill nature: CLOUD_SERVICE
- Volatility: DEPRECATED
- Typical lifespan: SHORT_LIVED
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- Cloud Data Platform Services · Catalog dimension db id 81
  Library dimension (catalog)
  Roles linked in library: Data Engineer
- Cloud Data Platform Services · Catalog dimension db id 81
  Library dimension (catalog)
  Roles linked in library: Data Engineer
Locked dimensions (v3 placement)
- Cloud Data Platform Services
  Reuses catalog slug
  Managed cloud services used to build and operate data engineering workloads. AWS Data Pipeline fits here because it is an AWS service for orchestrating data movement and scheduled processing across storage and compute services.
- Cloud Data Platform Services
  Reuses catalog slug
  Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Data Platform Services · cloud-data-platform-services | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
Amazon S3 is a default cloud storage requirement in many job descriptions and is a core AWS service with broad ecosystem support; no sunset or replacement signal exists.
Amazon Web Services · proprietary · since 2006 (0.99)
Could be confused with: s4
"S3" is a short acronym that in JDs can mean AWS S3, but could also be read as a generic storage tier/label or other S3-named products in the catalog. A reasonable extractor may confuse it with adjacent cloud storage skills.
Not versioned
Platform · cloud_storage_platform · confidence 0.91
By the Platform vs Service rule, S3 is a hosted multi-tenant AWS capability with APIs rather than software you run yourself, so it fits Platform best.
- Category: Platform
- Sub-category: cloud_storage_platform
- Skill nature: PLATFORM
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- Storage Provisioning and Automation · Catalog dimension db id 311
  Library dimension (catalog)
  Roles linked in library: Storage Engineer
Locked dimensions (v3 placement)
- Object Storage Provisioning
  Reuses catalog slug
  Covers creating, configuring, and operating S3-style object storage resources and their access controls. S3 belongs here because it is the canonical AWS object storage service used for buckets, objects, lifecycle, and access policies.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Storage Provisioning and Automation · storage-provisioning-and-automation | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
AWS Kinesis appears in many cloud/data engineering job postings and is a standard managed streaming service in AWS stacks; no vendor sunset has been announced, which indicates active market demand.
Amazon Web Services · proprietary · since 2013 (0.98)
In JDs, Kinesis usually clearly refers to AWS Kinesis, a distinct streaming service. The name is not a common overloaded acronym or short token likely to be mistaken for another catalog skill.
Not versioned
Service · streaming_data_service · confidence 0.93
By the Platform vs Service rule, Kinesis is a specific managed capability within AWS rather than a standalone hosted environment, so it is a Service.
- Category: Service
- Sub-category: streaming_data_service
- Skill nature: CLOUD_SERVICE
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- Streaming Data Processing · Catalog dimension db id 69
  Library dimension (catalog)
  Roles linked in library: Data Engineer
Locked dimensions (v3 placement)
- Streaming Data Processing
  Pipeline tentative id
  Tools and patterns for ingesting, buffering, and transforming event streams with low latency. This includes continuous processing, windowing, stateful stream jobs, checkpointing, shard scaling, stream partitioning, and managed streaming services such as Kinesis, Amazon Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Streaming Data Processing · streaming-data-processing | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Streaming Data Processing · d_merge_01 | ✓ | — | New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
Broadly listed in cloud/backend JDs and AWS docs; commonly paired with Lambda, IAM, and serverless stacks, indicating staple market demand rather than niche use.
Amazon Web Services · proprietary · since 2015 (0.98)
Amazon API Gateway is a specific AWS service name with little overlap in typical JDs; it is unlikely to be confused with a different catalog skill.
Not versioned
Service · api_management_service · confidence 0.98
By the Platform vs Service rule, Amazon API Gateway is a specific managed capability inside AWS rather than the whole hosted environment, so it is a Service.
- Category: Service
- Sub-category: api_management_service
- Skill nature: CLOUD_SERVICE
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- HTTP API Frameworks and Gateway Layers · Proposed / LLM
  Proposed / LLM dimension (no DB id yet)
Locked dimensions (v3 placement)
- HTTP API Frameworks and Gateway Layers
  Pipeline tentative id
  Server-side frameworks and gateway layers used to build, route, validate, and manage HTTP APIs. Includes backend web service frameworks and API gateway configuration for endpoints, request/response mapping, routing, input validation, auth integration, throttling, usage plans, stages, custom domains, and backend integration.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| HTTP API Frameworks and Gateway Layers · d_merge_01 | ✓ | — | New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- Cobalt Strike (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Tool
- Sub-category: Adversary Simulation Tool
- Vendor: Fortra
- License: proprietary
- Year introduced: 2012
- Confidence: 0.98
- Version strategy: NOT_APPLICABLE
Maturity reasoning: Appears in a limited set of red-team/pentest JDs and security vendor training, but far below mainstream devops tools; market signal is specialized adversary-simulation usage rather than broad hiring demand.
Skill profile (library / DB)
- Skill nature: LANGUAGE
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 5
- Sub-category id: 54
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Analytical Programming Languages · Catalog dimension db id 82
  Library dimension (catalog)
  Roles linked in library: Data Analyst, Data Scientist
- Automation Scripting and CLI · Catalog dimension db id 48
  Library dimension (catalog)
  Roles linked in library: Azure Cloud Engineer, Cloud Engineer
- Automation and Scripting for Operations · Catalog dimension db id 361
  Library dimension (catalog)
  Roles linked in library: Virtualization Engineer
- Network Automation and Scripting · Catalog dimension db id 285
  Library dimension (catalog)
  Roles linked in library: Network Engineer
- Programming Languages for AI Workflows · Catalog dimension db id 261
  Library dimension (catalog)
  Roles linked in library: AI Engineer
- Programming Languages for Backend Systems · Catalog dimension db id 140
  Library dimension (catalog)
  Roles linked in library: Backend Engineer
- Programming Languages for Data Work · Catalog dimension db id 67
  Library dimension (catalog)
  Roles linked in library: Data Engineer
- Programming Languages for ML Systems · Catalog dimension db id 113
  Library dimension (catalog)
  Roles linked in library: Machine Learning Engineer
- Programming Languages for Security Work · Catalog dimension db id 328
  Library dimension (catalog)
  Roles linked in library: Cybersecurity Engineer
- Programming Languages for Test Automation · Catalog dimension db id 193
  Library dimension (catalog)
  Roles linked in library: Automation Tester
- Security Automation and Scripting · Catalog dimension db id 258
  Library dimension (catalog)
  Roles linked in library: Cybersecurity Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Analytical Programming Languages (analytical-programming-languages) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Automation Scripting and CLI (automation-scripting-and-cli) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Automation and Scripting for Operations (automation-and-scripting-for-operations) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Network Automation and Scripting (network-automation-and-scripting) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Programming Languages for AI Workflows (programming-languages-for-ai-workflows) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Programming Languages for Backend Systems (programming-languages-for-backend-systems) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Programming Languages for Data Work (programming-languages-for-data-work) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Programming Languages for ML Systems (programming-languages-for-ml-systems) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
| Programming Languages for Security Work (programming-languages-for-security-work) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Programming Languages for Test Automation (programming-languages-for-test-automation) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Security Automation and Scripting (security-automation-and-scripting) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- shader graphs (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Framework
- Sub-category: Visual Shader Authoring Framework
- Vendor: Unity Technologies
- License: proprietary
- Year introduced: 2018
- Confidence: 0.74
- Version strategy: NOT_APPLICABLE
Maturity reasoning: Shader graphs appear in some Unity/Unreal and VFX job postings, but JD volume is far below core graphics skills like HLSL/GLSL; market use is concentrated in game/real-time rendering teams.
Skill profile (library / DB)
- Skill nature: LIBRARY
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 6
- Sub-category id: 456
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Applied Machine Learning Toolkits · Catalog dimension db id 94
  Library dimension (catalog)
  Roles linked in library: Data Scientist
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Applied Machine Learning Toolkits (applied-machine-learning-toolkits) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- GLSL (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Language
- Sub-category: Shader Language
- Vendor: Khronos Group
- License: other_open
- Year introduced: 2004
- Confidence: 0.99
- Version strategy: NOT_APPLICABLE
Maturity reasoning: GLSL appears in graphics/game-engine JDs but at much lower volume than mainstream languages; it’s specialized for shader programming and often replaced in newer pipelines by HLSL/Metal Shading Language or higher-level abstractions.
Skill profile (library / DB)
- Skill nature: LIBRARY
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 6
- Sub-category id: 456
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Applied Machine Learning Toolkits · Catalog dimension db id 94
  Library dimension (catalog)
  Roles linked in library: Data Scientist
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Applied Machine Learning Toolkits (applied-machine-learning-toolkits) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- post-processing (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Concept
- Sub-category: Graphics Effect Concept
- Confidence: 0.86
- Version strategy: NOT_APPLICABLE
Maturity reasoning: Job postings rarely list "post-processing" as a standalone skill; it appears mainly in graphics/VFX roles, while broader JDs usually specify tools like Unreal/Unity or Photoshop instead.
Skill profile (library / DB)
- Skill nature: LIBRARY
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 6
- Sub-category id: 458
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Applied Machine Learning Toolkits · Catalog dimension db id 94
  Library dimension (catalog)
  Roles linked in library: Data Scientist
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Applied Machine Learning Toolkits (applied-machine-learning-toolkits) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- E5 (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Library
- Sub-category: Embedding Model Library
- Vendor: OpenAI
- License: other_open
- Year introduced: 2021
- Confidence: 0.80
- Version strategy: NOT_APPLICABLE
Maturity reasoning: E5 is a specific embedding-model library with limited JD volume; market demand is concentrated in AI/ML roles rather than broad software hiring, unlike mainstream libraries.
Skill profile (library / DB)
- Skill nature: CLOUD_SERVICE
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 14
- Sub-category id: 1019
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Continuous Integration Test Integration · Catalog dimension db id 207
  Library dimension (catalog)
  Roles linked in library: Automation Tester
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Continuous Integration Test Integration (continuous-integration-test-integration) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- OpenVAS (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Tool
- Sub-category: Vulnerability Scanner
- Vendor: Greenbone Networks
- License: gpl_v2
- Year introduced: 2009
- Confidence: 0.98
- Version strategy: NOT_APPLICABLE
Maturity reasoning: OpenVAS appears in security-focused JDs far less often than mainstream scanners like Nessus or Qualys, and its usage is concentrated in pentest/vuln-management roles rather than general DevOps stacks.
Skill profile (library / DB)
- Skill nature: TOOL
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 11
- Sub-category id: 335
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Workflow Orchestration Systems · Catalog dimension db id 64
  Library dimension (catalog)
  Roles linked in library: Data Engineer, MLOps Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Workflow Orchestration Systems (workflow-orchestration-systems) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- Snapshot loads (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Methodology
- Sub-category: Data Loading Methodology
- Confidence: 0.90
- Version strategy: NOT_APPLICABLE
Maturity reasoning: Snapshot loads are a specialized data-loading pattern; JD volume is very low compared with mainstream ETL/ELT tools, and market discussion is mostly in niche data-engineering forums rather than broad hiring pipelines.
Skill profile (library / DB)
- Skill nature: TOOL
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 11
- Sub-category id: 171
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Infrastructure Provisioning Templates · Catalog dimension db id 291
  Library dimension (catalog)
  Roles linked in library: Cloud Engineer
- Infrastructure as Code · Catalog dimension db id 22
  Library dimension (catalog)
  Roles linked in library: DevOps Engineer
- Infrastructure as Code and Declarative Provisioning · Catalog dimension db id 36
  Library dimension (catalog)
  Roles linked in library: Azure Cloud Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Infrastructure Provisioning Templates (infrastructure-provisioning-templates) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Infrastructure as Code (infrastructure-as-code) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Infrastructure as Code and Declarative Provisioning (infrastructure-as-code-and-declarative-provisioning) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- Metabase (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Tool
- Sub-category: Bi Analytics Tool
- Vendor: Metabase, Inc.
- License: apache_2
- Year introduced: 2014
- Confidence: 0.90
- Version strategy: NOT_APPLICABLE
Maturity reasoning: Metabase appears in many BI/analytics job postings and is growing in GitHub usage, but it is still far less universal than Tableau/Power BI in enterprise JDs.
Skill profile (library / DB)
- Skill nature: TOOL
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 11
- Sub-category id: 170
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Containerization and Image Delivery · Catalog dimension db id 24
  Library dimension (catalog)
  Roles linked in library: DevOps Engineer
- Model Serving Deployment and Runtime Packaging · Catalog dimension db id 52
  Library dimension (catalog)
  Roles linked in library: MLOps Engineer, Machine Learning Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Containerization and Image Delivery (containerization-and-image-delivery) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Model Serving Deployment and Runtime Packaging (model-serving-deployment-and-runtime-packaging) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
Aliases — catalog
- Column-level security (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Concept
- Sub-category: Access Control Concept
- Confidence: 0.90
- Version strategy: NOT_APPLICABLE
Maturity reasoning: Appears in cloud/data platform JDs and vendor docs for Snowflake, BigQuery, and PostgreSQL RLS/column masking, but is not yet a universal hiring staple like core IAM or RBAC.
Skill profile (library / DB)
- Skill nature: PLATFORM
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 13
- Sub-category id: 1524
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Orchestration Platforms · Catalog dimension db id 25
  Library dimension (catalog)
  Roles linked in library: Cloud Engineer, DevOps Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Orchestration Platforms (orchestration-platforms) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
Pinecone appears in many AI/vector-search job descriptions and vendor docs, but it’s still far less universal than PostgreSQL/AWS; market signal shows growing adoption rather than staple status.
Pinecone Systems, Inc. · proprietary · since 2019 (0.95)
Pinecone is a distinctive vector database platform name; in typical JDs it is unlikely to be confused with another catalog skill.
Not versioned
Platform · vector_database_platform · confidence 0.90
By the Vendor SaaS = Platform rule, Pinecone is a hosted multi-tenant vector database service consumed via APIs rather than software you run yourself.
- Category: Platform
- Sub-category: vector_database_platform
- Skill nature: PLATFORM
- Volatility: EMERGING
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- Cloud Model Runtime Services · Catalog dimension db id 121
  Library dimension (catalog)
  Roles linked in library: Machine Learning Engineer
Locked dimensions (v3 placement)
- Vector Database Services · Reuses catalog slug
  Managed services used to store, index, and query embeddings for semantic search and retrieval-augmented applications. Pinecone belongs here because it is a purpose-built vector database service rather than a general-purpose datastore.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Model Runtime Services (cloud-model-runtime-services) | ✓ | ✓ | New skill saved · Existing dimension (library) · Role↔dimension saved |
Skill enrichment (orchestrator / LLM)
OpenSearch appears in growing numbers of JDs for search/log analytics, but Elasticsearch still dominates most postings; AWS also continues to position it as the open-source successor to Elasticsearch.
OpenSearch Project · apache_2 · since 2021 (0.98)
OpenSearch is a specific search engine/datastore name with little overlap in typical JDs; it is unlikely to be mistaken for another catalog skill.
Not versioned
Datastore · search_engine_datastore · confidence 0.93
OpenSearch is fundamentally a persistent search and analytics datastore, and under the Datastore vs Format rule it fits Datastore because it stores and indexes data rather than merely defining a format.
- Category: Datastore
- Sub-category: search_engine_datastore
- Skill nature: TOOL
- Volatility: EMERGING
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- Cloud Data Platform Services · Catalog dimension db id 81
  Library dimension (catalog)
  Roles linked in library: Data Engineer
- Version Control Systems · Catalog dimension db id 365
  Library dimension (catalog)
Locked dimensions (v3 placement)
- Search and Analytics Services · Reuses catalog slug
  Managed search and indexing services used to store, query, and analyze large document or event datasets. OpenSearch belongs here because it is commonly used as a search engine and analytics backend in cloud data platforms.
- Search Engine Administration · Pipeline tentative id
  Operational setup and tuning of search clusters, indexes, and query behavior. This fits OpenSearch when the skill emphasis is on running and configuring the search engine itself rather than integrating it into an application.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Data Platform Services (cloud-data-platform-services) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Version Control Systems (d_init_01) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
FAISS appears in many ML/vector-search job descriptions and is widely used in RAG stacks, but it’s still less universal than Elasticsearch/PostgreSQL; market demand is growing rather than ubiquitous.
Meta · mit · since 2017 (0.99)
FAISS is a distinctive library name for vector similarity search; in typical JDs it is unlikely to be confused with another catalog skill.
Not versioned
Library · vector_search_library · confidence 0.93
FAISS is fundamentally a code package imported by applications for similarity search, so under the Tool vs Framework rule it fits Library rather than a user-operated tool or hosted platform.
- Category: Library
- Sub-category: vector_search_library
- Skill nature: LIBRARY
- Volatility: EMERGING
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- Applied Machine Learning Toolkits and Frameworks · Proposed / LLM dimension (no DB id yet)
- Version Control Systems · Catalog dimension db id 365
  Library dimension (catalog)
Locked dimensions (v3 placement)
- Applied Machine Learning Toolkits and Frameworks · Pipeline tentative id
  Libraries and frameworks used to prototype, compare, index, and evaluate machine learning solutions quickly. Includes applied-ML toolkits such as scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, NumPy, pandas, FAISS, vector search libraries, and embedding utilities. Excludes serving runtimes, deployment containers, online inference infrastructure, streaming systems, and database/storage engines.
- Vector Search Indexing · Pipeline tentative id
  Index structures and libraries for approximate nearest-neighbor search over embeddings and feature vectors. FAISS fits strongly here because it is primarily used to build and query high-performance vector indexes for retrieval.
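As a rough illustration of the question a vector index answers, the sketch below runs an exact, brute-force nearest-neighbor search with cosine similarity in plain Python. It does not use FAISS or its API; libraries such as FAISS build index structures that approximate this result far faster at scale. The function names (`cosine`, `top_k`) and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, vectors, k=2):
    """Return the keys of the k stored vectors most similar to the query.

    This scans every vector (exact search); an ANN index trades a little
    accuracy for avoiding that full scan.
    """
    scored = sorted(
        ((cosine(query, vec), key) for key, vec in vectors.items()),
        reverse=True,
    )
    return [key for _, key in scored[:k]]


# Hypothetical toy embeddings, for illustration only.
docs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
nearest = top_k([1.0, 0.0], docs, k=2)
print(nearest)
```

With these toy vectors the query is closest to `a`, then `b`, while `c` is orthogonal to it.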
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Applied Machine Learning Toolkits and Frameworks (d_merge_01) | ✓ | — | New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role) |
| Version Control Systems (d_init_01) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
All API 3 persistence rows
Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.
| Skill | Tag | Dimension | Skill↔dim | Role↔dim | Outcome | Notes |
|---|---|---|---|---|---|---|
| AWS | in_db | Cloud Platform Operations (cloud-platform-operations) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| AWS | in_db | Cloud Security Platforms (cloud-security-platforms) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| EKS | in_db | Cloud Model Runtime Services (cloud-model-runtime-services) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| EKS | in_db | Orchestration Platforms (orchestration-platforms) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| EC2 | in_db | Cloud Provider Core Services (cloud-provider-core-services) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| AWS Glue | in_db | Cloud Data Platform Services (cloud-data-platform-services) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Redshift | in_db | Data Warehousing Platforms (data-warehousing-platforms) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Python | in_db | Analytical Programming Languages (analytical-programming-languages) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Python | in_db | Automation Scripting and CLI (automation-scripting-and-cli) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Python | in_db | Automation and Scripting for Operations (automation-and-scripting-for-operations) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Python | in_db | Network Automation and Scripting (network-automation-and-scripting) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Python | in_db | Programming Languages for AI Workflows (programming-languages-for-ai-workflows) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Python | in_db | Programming Languages for Backend Systems (programming-languages-for-backend-systems) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Python | in_db | Programming Languages for Data Work (programming-languages-for-data-work) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Python | in_db | Programming Languages for ML Systems (programming-languages-for-ml-systems) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Python | in_db | Programming Languages for Security Work (programming-languages-for-security-work) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Python | in_db | Programming Languages for Test Automation (programming-languages-for-test-automation) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Python | in_db | Security Automation and Scripting (security-automation-and-scripting) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| TensorFlow | in_db | Applied Machine Learning Toolkits (applied-machine-learning-toolkits) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| PyTorch | in_db | Applied Machine Learning Toolkits (applied-machine-learning-toolkits) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Scikit-learn | in_db | Applied Machine Learning Toolkits (applied-machine-learning-toolkits) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| GitHub Actions | in_db | Continuous Integration Test Integration (continuous-integration-test-integration) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Airflow | in_db | Workflow Orchestration Systems (workflow-orchestration-systems) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Terraform | in_db | Infrastructure Provisioning Templates (infrastructure-provisioning-templates) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Terraform | in_db | Infrastructure as Code (infrastructure-as-code) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Terraform | in_db | Infrastructure as Code and Declarative Provisioning (infrastructure-as-code-and-declarative-provisioning) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Docker | in_db | Containerization and Image Delivery (containerization-and-image-delivery) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Docker | in_db | Model Serving Deployment and Runtime Packaging (model-serving-deployment-and-runtime-packaging) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Kubernetes | in_db | Orchestration Platforms (orchestration-platforms) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| AI/ML | in_db | Applied Machine Learning Tooling and Frameworks (d_merge_01) | ✓ | — | New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role) | |
| AI/ML | in_db | AI Service Integration and Orchestration Patterns (d_merge_02) | ✓ | — | New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role) | |
| AI/ML | in_db | AI Inference Cost, Latency, and Throughput Optimization (ai-inference-cost-latency-and-throughput-optimization) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Amazon SageMaker | in_db | Managed ML Platform Workflows (d_split_01_01) | ✓ | — | New skill saved · New dimension saved (reconciliation separate) · Role↔dimension skipped (dimension not under chosen role) | |
| Amazon SageMaker | in_db | Managed Model Hosting and Endpoints (d_split_01_02) | ✓ | — | New skill saved · New dimension saved (reconciliation separate) · Role↔dimension skipped (dimension not under chosen role) | |
| Amazon SageMaker | in_db | Model Serving Runtime Packaging (d_split_01_03) | ✓ | — | New skill saved · Existing dimension (embedding dedup) · Role↔dimension skipped (dimension not under chosen role) | |
| Amazon SageMaker | in_db | Model Serving Frameworks and Platforms (d_split_01_04) | ✓ | — | New skill saved · Existing dimension (embedding dedup) · Role↔dimension skipped (dimension not under chosen role) | |
| Amazon Bedrock | in_db | Cloud Model Runtime Services (cloud-model-runtime-services) | ✓ | ✓ | New skill saved · Existing dimension (library) · Role↔dimension saved | |
| AWS Lambda | in_db | Managed Cloud Data Platform Services (d_merge_01) | ✓ | — | New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role) | |
| ECS | in_db | Version Control Systems (d_init_01) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Amazon Athena | in_db | Cloud Analytics Query Services (d_split_01_01) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Amazon Athena | in_db | Cloud Data Pipeline Runtime (d_split_01_02) | ✓ | — | New skill saved · Existing dimension (embedding dedup) · Role↔dimension skipped (dimension not under chosen role) | |
| Amazon Athena | in_db | Cloud Data Platform Storage (d_split_01_03) | ✓ | — | New skill saved · Existing dimension (embedding dedup) · Role↔dimension skipped (dimension not under chosen role) | |
| Amazon Athena | in_db | Cloud Data Platform Security and Networking (d_split_01_04) | ✓ | — | New skill saved · New dimension saved (reconciliation separate) · Role↔dimension skipped (dimension not under chosen role) | |
| AWS Data Pipeline | in_db | Cloud Data Platform Services (cloud-data-platform-services) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| S3 | in_db | Storage Provisioning and Automation (storage-provisioning-and-automation) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Kinesis | in_db | Streaming Data Processing (streaming-data-processing) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Amazon API Gateway | in_db | HTTP API Frameworks and Gateway Layers (d_merge_01) | ✓ | — | New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role) | |
| Pinecone | in_db | Cloud Model Runtime Services (cloud-model-runtime-services) | ✓ | ✓ | New skill saved · Existing dimension (library) · Role↔dimension saved | |
| OpenSearch | in_db | Cloud Data Platform Services (cloud-data-platform-services) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| OpenSearch | in_db | Version Control Systems (d_init_01) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| FAISS | in_db | Applied Machine Learning Toolkits and Frameworks (d_merge_01) | ✓ | — | New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role) | |
| FAISS | in_db | Version Control Systems (d_init_01) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| AI/ML | in_db | AI Inference Cost, Latency, and Throughput Optimization (d_merge_03) | ✓ | — | New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role) | |
| Kinesis | in_db | Streaming Data Processing (d_merge_01) | ✓ | — | New skill saved · Existing dimension (reconciliation merge) · Role↔dimension skipped (dimension not under chosen role) | |
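The grid above can be read as each skill expanded into one row per linked dimension. The sketch below is a minimal illustration, assuming (from the outcomes shown in this run) that the skill↔dimension link is always saved and that the role↔dimension link is saved only when the dimension is already linked to the chosen role. `PersistenceRow` and `build_rows` are hypothetical names, not the pipeline's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersistenceRow:
    skill: str
    dimension: str
    skill_dim_saved: bool
    role_dim_saved: bool


def build_rows(skill_dims, dim_roles, chosen_role):
    """Expand {skill: [dimensions]} into one row per (skill, dimension) pair.

    Assumption: the role link is saved only if the chosen role is among the
    roles already linked to that dimension in the library.
    """
    rows = []
    for skill, dims in skill_dims.items():
        for dim in dims:
            under_role = chosen_role in dim_roles.get(dim, ())
            rows.append(PersistenceRow(skill, dim, True, under_role))
    return rows


# Dimension names and linked roles as listed for Docker in this run.
skill_dims = {
    "Docker": [
        "Containerization and Image Delivery",
        "Model Serving Deployment and Runtime Packaging",
    ]
}
dim_roles = {
    "Containerization and Image Delivery": ["DevOps Engineer"],
    "Model Serving Deployment and Runtime Packaging": [
        "MLOps Engineer",
        "Machine Learning Engineer",
    ],
}
rows = build_rows(skill_dims, dim_roles, "Machine Learning Engineer")
for row in rows:
    print(row.skill, "|", row.dimension, "|", row.role_dim_saved)
```

Under this assumption the two Docker rows come out as in the grid: the role link is skipped for Containerization and Image Delivery and saved for Model Serving Deployment and Runtime Packaging.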
Library artifacts (this run)
| Kind | Detail | DB id |
|---|---|---|
| canonical_skill_added | AI/ML | 2611 |
| canonical_skill_added | Amazon SageMaker | 2612 |
| canonical_skill_added | Amazon Bedrock | 2613 |
| canonical_skill_added | AWS Lambda | 2614 |
| canonical_skill_added | ECS | 2615 |
| canonical_skill_added | Amazon Athena | 2616 |
| canonical_skill_added | AWS Data Pipeline | 2617 |
| canonical_skill_added | S3 | 2618 |
| canonical_skill_added | Kinesis | 2619 |
| canonical_skill_added | Amazon API Gateway | 2620 |
| canonical_skill_added | Pinecone | 2621 |
| canonical_skill_added | OpenSearch | 2622 |
| canonical_skill_added | FAISS | 2623 |
| dimension_skill_link | AI/ML ↔ Applied Machine Learning Tooling and Frameworks | 94 |
| dimension_skill_link | AI/ML ↔ AI Service Integration and Orchestration Patterns | 270 |
| dimension_skill_link | AI/ML ↔ AI Inference Cost, Latency, and Throughput Optimization | 260 |
| dimension_created | Managed ML Platform Workflows | 367 |
| dimension_skill_link | Amazon SageMaker ↔ Managed ML Platform Workflows | 367 |
| dimension_created | Managed Model Hosting and Endpoints | 368 |
| dimension_skill_link | Amazon SageMaker ↔ Managed Model Hosting and Endpoints | 368 |
| dimension_skill_link | Amazon SageMaker ↔ Model Serving Runtime Packaging | 52 |
| dimension_skill_link | Amazon Bedrock ↔ Cloud Model Runtime Services | 121 |
| dimension_skill_link | AWS Lambda ↔ Managed Cloud Data Platform Services | 81 |
| dimension_skill_link | ECS ↔ Version Control Systems | 365 |
| dimension_skill_link | Amazon Athena ↔ Cloud Analytics Query Services | 367 |
| dimension_skill_link | Amazon Athena ↔ Cloud Data Pipeline Runtime | 81 |
| dimension_created | Cloud Data Platform Security and Networking | 369 |
| dimension_skill_link | Amazon Athena ↔ Cloud Data Platform Security and Networking | 369 |
| dimension_skill_link | AWS Data Pipeline ↔ Cloud Data Platform Services | 81 |
| dimension_skill_link | S3 ↔ Storage Provisioning and Automation | 311 |
| dimension_skill_link | Kinesis ↔ Streaming Data Processing | 69 |
| dimension_skill_link | Amazon API Gateway ↔ HTTP API Frameworks and Gateway Layers | 141 |
| dimension_skill_link | Pinecone ↔ Cloud Model Runtime Services | 121 |
| dimension_skill_link | OpenSearch ↔ Cloud Data Platform Services | 81 |
| dimension_skill_link | OpenSearch ↔ Version Control Systems | 365 |
| dimension_skill_link | FAISS ↔ Applied Machine Learning Toolkits and Frameworks | 94 |
| dimension_skill_link | FAISS ↔ Version Control Systems | 365 |
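When reviewing the artifacts table above, it can help to group the rows by DB id and compare the dimension names attached to each id. The following is a minimal sketch, not part of the pipeline. It assumes that the `DB id` on `dimension_skill_link` and `dimension_created` rows refers to the dimension, and the helper names `parse_rows` and `dimension_names_by_id` are hypothetical. An id that appears under several names may simply reflect a reconciliation merge into an existing dimension, so treat the result as a prompt for review rather than as an error report.

```python
from collections import defaultdict

# A few rows copied from the "Library artifacts" table above.
table_text = """\
| Kind | Detail | DB id |
|---|---|---|
| canonical_skill_added | Amazon Athena | 2616 |
| dimension_created | Managed ML Platform Workflows | 367 |
| dimension_skill_link | Amazon SageMaker ↔ Managed ML Platform Workflows | 367 |
| dimension_skill_link | Amazon Athena ↔ Cloud Analytics Query Services | 367 |
| dimension_skill_link | Kinesis ↔ Streaming Data Processing | 69 |
"""


def parse_rows(text):
    """Parse pipe-delimited rows into (kind, detail, db_id) tuples."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header row, the separator row, and anything malformed.
        if len(cells) != 3 or cells[0] in ("Kind", "---"):
            continue
        rows.append(tuple(cells))
    return rows


def dimension_names_by_id(rows):
    """Map each DB id to the set of dimension names it appears with.

    Assumption: on dimension rows the DB id identifies the dimension.
    """
    names = defaultdict(set)
    for kind, detail, db_id in rows:
        if kind == "dimension_skill_link":
            # Detail has the form "<skill> ↔ <dimension>".
            _, _, dimension = detail.partition(" ↔ ")
            names[db_id].add(dimension)
        elif kind == "dimension_created":
            names[db_id].add(detail)
    return dict(names)


rows = parse_rows(table_text)
for db_id, names in sorted(dimension_names_by_id(rows).items()):
    flag = "  <- more than one name" if len(names) > 1 else ""
    print(db_id, sorted(names), flag)
```

In the sample rows, id 367 is listed with two different dimension names, which is the kind of row this grouping is meant to surface.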
API 1 — extract-from-jd
{
"final_skills": [
{
"is_primary": true,
"skill_name": "AWS"
},
{
"is_primary": true,
"skill_name": "AI/ML"
},
{
"is_primary": true,
"skill_name": "Amazon SageMaker"
},
{
"is_primary": true,
"skill_name": "Amazon Bedrock"
},
{
"is_primary": true,
"skill_name": "AWS Lambda"
},
{
"is_primary": true,
"skill_name": "ECS"
},
{
"is_primary": true,
"skill_name": "EKS"
},
{
"is_primary": true,
"skill_name": "EC2"
},
{
"is_primary": true,
"skill_name": "AWS Glue"
},
{
"is_primary": true,
"skill_name": "Amazon Athena"
},
{
"is_primary": true,
"skill_name": "Redshift"
},
{
"is_primary": true,
"skill_name": "AWS Data Pipeline"
},
{
"is_primary": true,
"skill_name": "S3"
},
{
"is_primary": true,
"skill_name": "Kinesis"
},
{
"is_primary": true,
"skill_name": "Amazon API Gateway"
},
{
"is_primary": true,
"skill_name": "Python"
},
{
"is_primary": false,
"skill_name": "TensorFlow"
},
{
"is_primary": false,
"skill_name": "PyTorch"
},
{
"is_primary": false,
"skill_name": "Scikit-learn"
},
{
"is_primary": false,
"skill_name": "GitHub Actions"
},
{
"is_primary": false,
"skill_name": "Airflow"
},
{
"is_primary": false,
"skill_name": "Terraform"
},
{
"is_primary": false,
"skill_name": "Docker"
},
{
"is_primary": false,
"skill_name": "Kubernetes"
},
{
"is_primary": false,
"skill_name": "Pinecone"
},
{
"is_primary": false,
"skill_name": "OpenSearch"
},
{
"is_primary": false,
"skill_name": "FAISS"
}
],
"run_id": null
}
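A consumer of this response would typically separate primary from secondary skills before passing them on. The sketch below shows one way to do that with the standard `json` module. It uses an abbreviated copy of the payload above, and the function name `split_skills` is hypothetical rather than part of the API.

```python
import json

# Abbreviated copy of the API 1 response shape shown above (three entries).
response_text = """
{
  "final_skills": [
    {"is_primary": true, "skill_name": "AWS"},
    {"is_primary": true, "skill_name": "Python"},
    {"is_primary": false, "skill_name": "FAISS"}
  ],
  "run_id": null
}
"""


def split_skills(payload):
    """Return (primary, secondary) skill-name lists, preserving response order."""
    primary, secondary = [], []
    for entry in payload.get("final_skills", []):
        target = primary if entry.get("is_primary") else secondary
        target.append(entry["skill_name"])
    return primary, secondary


payload = json.loads(response_text)
primary, secondary = split_skills(payload)
print("primary:", primary)
print("secondary:", secondary)
```

Applied to the full response above, the same split yields 16 primary and 11 secondary skills.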
API 2 — extract-details
{
"alias_matches": [
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 348,
"existing_alias_text": "AWS",
"input_term": "AWS",
"matched_canonical": {
"category_id": 13,
"display_name": "AWS",
"id": 163,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "aws",
"sub_category_id": 161,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 1093,
"existing_alias_text": "EKS",
"input_term": "EKS",
"matched_canonical": {
"category_id": 14,
"display_name": "EKS",
"id": 725,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CLOUD_SERVICE",
"slug": "eks",
"sub_category_id": 251,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 2372,
"existing_alias_text": "EC2",
"input_term": "EC2",
"matched_canonical": {
"category_id": 14,
"display_name": "EC2",
"id": 1773,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CLOUD_SERVICE",
"slug": "ec2",
"sub_category_id": 1544,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 730,
"existing_alias_text": "AWS Glue",
"input_term": "AWS Glue",
"matched_canonical": {
"category_id": 14,
"display_name": "AWS Glue",
"id": 466,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CLOUD_SERVICE",
"slug": "aws-glue",
"sub_category_id": 385,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 3367,
"existing_alias_text": "Redshift",
"input_term": "Redshift",
"matched_canonical": {
"category_id": 13,
"display_name": "Redshift",
"id": 2570,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "redshift",
"sub_category_id": 2098,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 608,
"existing_alias_text": "Python",
"input_term": "Python",
"matched_canonical": {
"category_id": 5,
"display_name": "Python",
"id": 393,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LANGUAGE",
"slug": "python",
"sub_category_id": 54,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 862,
"existing_alias_text": "TensorFlow",
"input_term": "TensorFlow",
"matched_canonical": {
"category_id": 6,
"display_name": "TensorFlow",
"id": 558,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LIBRARY",
"slug": "tensorflow",
"sub_category_id": 456,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 861,
"existing_alias_text": "PyTorch",
"input_term": "PyTorch",
"matched_canonical": {
"category_id": 6,
"display_name": "PyTorch",
"id": 557,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LIBRARY",
"slug": "pytorch",
"sub_category_id": 456,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 852,
"existing_alias_text": "scikit-learn",
"input_term": "Scikit-learn",
"matched_canonical": {
"category_id": 6,
"display_name": "scikit-learn",
"id": 554,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LIBRARY",
"slug": "scikit-learn",
"sub_category_id": 458,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 1800,
"existing_alias_text": "GitHub Actions",
"input_term": "GitHub Actions",
"matched_canonical": {
"category_id": 14,
"display_name": "GitHub Actions",
"id": 1250,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CLOUD_SERVICE",
"slug": "github-actions",
"sub_category_id": 1019,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 540,
"existing_alias_text": "Airflow",
"input_term": "Airflow",
"matched_canonical": {
"category_id": 11,
"display_name": "Airflow",
"id": 325,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "airflow",
"sub_category_id": 335,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 290,
"existing_alias_text": "Terraform",
"input_term": "Terraform",
"matched_canonical": {
"category_id": 11,
"display_name": "Terraform",
"id": 144,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "terraform",
"sub_category_id": 171,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 299,
"existing_alias_text": "Docker",
"input_term": "Docker",
"matched_canonical": {
"category_id": 11,
"display_name": "Docker",
"id": 153,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "docker",
"sub_category_id": 170,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 304,
"existing_alias_text": "Kubernetes",
"input_term": "Kubernetes",
"matched_canonical": {
"category_id": 13,
"display_name": "Kubernetes",
"id": 158,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "kubernetes",
"sub_category_id": 1524,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
}
],
"candidate_roles": [
{
"display_name": "DevOps Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
"slug": "devops-engineer",
"source": "db"
},
{
"display_name": "Cybersecurity Engineer",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
},
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
},
{
"display_name": "Cloud Engineer",
"id": 18,
"rationale": null,
"role_archetype": null,
"slug": "cloud-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "Data Analyst",
"id": 20,
"rationale": null,
"role_archetype": null,
"slug": "data-analyst",
"source": "db"
},
{
"display_name": "Data Scientist",
"id": 7,
"rationale": null,
"role_archetype": null,
"slug": "data-scientist",
"source": "db"
},
{
"display_name": "Azure Cloud Engineer",
"id": 4,
"rationale": null,
"role_archetype": null,
"slug": "azure-cloud-engineer",
"source": "db"
},
{
"display_name": "Virtualization Engineer",
"id": 26,
"rationale": null,
"role_archetype": null,
"slug": "virtualization-engineer",
"source": "db"
},
{
"display_name": "Network Engineer",
"id": 21,
"rationale": null,
"role_archetype": null,
"slug": "network-engineer",
"source": "db"
},
{
"display_name": "AI Engineer",
"id": 12,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
},
{
"display_name": "Backend Engineer",
"id": 14,
"rationale": null,
"role_archetype": null,
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Automation Tester",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "automation-tester",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "mlops-engineer",
"source": "db"
},
{
"display_name": "Storage Engineer",
"id": 22,
"rationale": null,
"role_archetype": null,
"slug": "storage-engineer",
"source": "db"
}
],
"chosen_role": {
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": "The primary skills include a strong focus on AWS and AI/ML technologies, which aligns well with the role of a Machine Learning Engineer.",
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platform Operations",
"id": 26,
"rationale": "Uses cloud provider services to support delivery and runtime environments. The focus is on consumer-level operation of cloud services rather than deep cloud architecture ownership.",
"slug": "cloud-platform-operations",
"source": "db"
},
"input_skill": "AWS",
"llm_role": null,
"roles_from_db": [
{
"display_name": "DevOps Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
"slug": "devops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Security Platforms",
"id": 332,
"rationale": "Cloud-native security products used to assess posture, detect misconfigurations, and monitor workloads across AWS, Azure, and GCP. This is a distinct product family because the role often works across multiple CNAPP/CSPM/CWPP offerings and cloud-native detectors.",
"slug": "cloud-security-platforms",
"source": "db"
},
"input_skill": "AWS",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Model Runtime Services",
"id": 121,
"rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
"slug": "cloud-model-runtime-services",
"source": "db"
},
"input_skill": "EKS",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Orchestration Platforms",
"id": 25,
"rationale": "Operates the platforms that schedule and run containerized workloads and related deployment primitives. This is separate from image delivery because it concerns runtime placement and service rollout behavior.",
"slug": "orchestration-platforms",
"source": "db"
},
"input_skill": "EKS",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Engineer",
"id": 18,
"rationale": null,
"role_archetype": null,
"slug": "cloud-engineer",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
"slug": "devops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Provider Core Services",
"id": 290,
"rationale": "Core managed services used to provision and operate cloud environments. This is the base cloud surface for compute, storage, networking, and platform primitives the role configures and maintains.",
"slug": "cloud-provider-core-services",
"source": "db"
},
"input_skill": "EC2",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Engineer",
"id": 18,
"rationale": null,
"role_archetype": null,
"slug": "cloud-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Data Platform Services",
"id": 81,
"rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
"slug": "cloud-data-platform-services",
"source": "db"
},
"input_skill": "AWS Glue",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Data Warehousing Platforms",
"id": 72,
"rationale": "Cloud and on-prem analytical storage systems used to persist curated datasets and serve downstream consumers. This cluster is about the warehouse/lakehouse layer where transformed data is organized for access.",
"slug": "data-warehousing-platforms",
"source": "db"
},
"input_skill": "Redshift",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Analytical Programming Languages",
"id": 82,
"rationale": "Languages used to clean, transform, analyze, and prototype models in notebooks and scripts. This is the core coding surface for expressing statistical logic and data manipulation in a reproducible way.",
"slug": "analytical-programming-languages",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Analyst",
"id": 20,
"rationale": null,
"role_archetype": null,
"slug": "data-analyst",
"source": "db"
},
{
"display_name": "Data Scientist",
"id": 7,
"rationale": null,
"role_archetype": null,
"slug": "data-scientist",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Automation Scripting and CLI",
"id": 48,
"rationale": "Uses scripts and command-line tooling to execute repeatable Azure operations and reduce manual work. This is a practical cluster because the role frequently automates provisioning, checks, and remediation tasks.",
"slug": "automation-scripting-and-cli",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Azure Cloud Engineer",
"id": 4,
"rationale": null,
"role_archetype": null,
"slug": "azure-cloud-engineer",
"source": "db"
},
{
"display_name": "Cloud Engineer",
"id": 18,
"rationale": null,
"role_archetype": null,
"slug": "cloud-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Automation and Scripting for Operations",
"id": 361,
"rationale": "Scripts and lightweight automation used to execute repetitive virtualization tasks and enforce operational consistency. This is the practical glue that reduces manual host and VM administration.",
"slug": "automation-and-scripting-for-operations",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Virtualization Engineer",
"id": 26,
"rationale": null,
"role_archetype": null,
"slug": "virtualization-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Network Automation and Scripting",
"id": 285,
"rationale": "Covers scripts and automation used to configure, validate, and audit network devices and services. This cluster is coherent because repeatable network operations increasingly depend on programmatic changes and checks.",
"slug": "network-automation-and-scripting",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Network Engineer",
"id": 21,
"rationale": null,
"role_archetype": null,
"slug": "network-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for AI Workflows",
"id": 261,
"rationale": "Languages used to implement AI feature logic, orchestration, and response handling inside product code. This is the core coding surface for turning prompts and model calls into reliable application behavior.",
"slug": "programming-languages-for-ai-workflows",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 12,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Backend Systems",
"id": 140,
"rationale": "Languages used to implement server-side business logic, request handlers, workers, and service integrations. This is the core coding surface for backend feature delivery and maintenance.",
"slug": "programming-languages-for-backend-systems",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 14,
"rationale": null,
"role_archetype": null,
"slug": "backend-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 67,
"rationale": "Languages used to implement data pipelines, transformations, and operational utilities. This is the code layer for expressing extraction, parsing, validation, and orchestration logic in data engineering workflows.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for ML Systems",
"id": 113,
"rationale": "Languages used to implement model integration code, inference services, and feature-processing logic. This is the core coding surface for turning trained models into product-facing software components.",
"slug": "programming-languages-for-ml-systems",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Security Work",
"id": 328,
"rationale": "Languages used to automate security tasks, write detection logic, and build analysis or remediation tooling. This is the core coding surface for a cybersecurity engineer across scripts, queries, and small utilities.",
"slug": "programming-languages-for-security-work",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Test Automation",
"id": 193,
"rationale": "Languages used to implement automated checks, helper utilities, and test harness code. This is the core coding surface for turning test ideas into maintainable automation.",
"slug": "programming-languages-for-test-automation",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Automation Tester",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "automation-tester",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Security Automation and Scripting",
"id": 258,
"rationale": "Automating repeatable security checks, enrichment, and remediation workflows. This cluster is coherent because the role often needs lightweight automation to scale analysis and response.",
"slug": "security-automation-and-scripting",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Applied Machine Learning Toolkits",
"id": 94,
"rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
"slug": "applied-machine-learning-toolkits",
"source": "db"
},
"input_skill": "TensorFlow",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Scientist",
"id": 7,
"rationale": null,
"role_archetype": null,
"slug": "data-scientist",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Applied Machine Learning Toolkits",
"id": 94,
"rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
"slug": "applied-machine-learning-toolkits",
"source": "db"
},
"input_skill": "PyTorch",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Scientist",
"id": 7,
"rationale": null,
"role_archetype": null,
"slug": "data-scientist",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Applied Machine Learning Toolkits",
"id": 94,
"rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
"slug": "applied-machine-learning-toolkits",
"source": "db"
},
"input_skill": "Scikit-learn",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Scientist",
"id": 7,
"rationale": null,
"role_archetype": null,
"slug": "data-scientist",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Continuous Integration Test Integration",
"id": 207,
"rationale": "Integrating automated checks into shared build and merge workflows so results are repeatable and visible. This cluster is coherent because automation testers commonly configure test execution triggers, artifacts, and reporting hooks.",
"slug": "continuous-integration-test-integration",
"source": "db"
},
"input_skill": "GitHub Actions",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Automation Tester",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "automation-tester",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Workflow Orchestration Systems",
"id": 64,
"rationale": "Operational orchestration of ML jobs, dependencies, and handoffs across training, validation, deployment, and retraining. This is a useful split from training pipelines because it emphasizes the scheduler and control plane.",
"slug": "workflow-orchestration-systems",
"source": "db"
},
"input_skill": "Airflow",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "mlops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Infrastructure Provisioning Templates",
"id": 291,
"rationale": "Declarative templates and modules used to create repeatable cloud resources and environments. This cluster covers the infrastructure definitions the role applies, reviews, and updates to keep environments consistent.",
"slug": "infrastructure-provisioning-templates",
"source": "db"
},
"input_skill": "Terraform",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Engineer",
"id": 18,
"rationale": null,
"role_archetype": null,
"slug": "cloud-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Infrastructure as Code",
"id": 22,
"rationale": "Defines infrastructure and platform resources through versioned code so environments are repeatable and reviewable. This is a coherent cluster because it underpins environment consistency and change control.",
"slug": "infrastructure-as-code",
"source": "db"
},
"input_skill": "Terraform",
"llm_role": null,
"roles_from_db": [
{
"display_name": "DevOps Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
"slug": "devops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Infrastructure as Code and Declarative Provisioning",
"id": 36,
"rationale": "Defines cloud and platform infrastructure declaratively through versioned code so environments are repeatable, reviewable, and automatable. This includes authoring and maintaining IaC templates/modules, managing parameters and state, and using plan/apply workflows to provision and update resources across Azure and other cloud platforms.",
"slug": "infrastructure-as-code-and-declarative-provisioning",
"source": "db"
},
"input_skill": "Terraform",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Azure Cloud Engineer",
"id": 4,
"rationale": null,
"role_archetype": null,
"slug": "azure-cloud-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Containerization and Image Delivery",
"id": 24,
"rationale": "Builds, packages, and ships application and support workloads as container images. This cluster covers the artifact format and the mechanics of producing deployable images.",
"slug": "containerization-and-image-delivery",
"source": "db"
},
"input_skill": "Docker",
"llm_role": null,
"roles_from_db": [
{
"display_name": "DevOps Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
"slug": "devops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Model Serving Deployment and Runtime Packaging",
"id": 52,
"rationale": "Operational deployment of trained models into online, batch, or streaming serving environments, including packaging models and model servers into containers or managed inference runtimes, coordinating rollout, and handing off to inference systems. Covers serving frameworks and platforms such as TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, and Seldon Core, plus container/runtime concerns like Docker images, GPU-enabled containers, base image selection, container entrypoints, runtime dependencies, and image scanning for model services.",
"slug": "model-serving-deployment-and-runtime-packaging",
"source": "db"
},
"input_skill": "Docker",
"llm_role": null,
"roles_from_db": [
{
"display_name": "MLOps Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "mlops-engineer",
"source": "db"
},
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Orchestration Platforms",
"id": 25,
"rationale": "Operates the platforms that schedule and run containerized workloads and related deployment primitives. This is separate from image delivery because it concerns runtime placement and service rollout behavior.",
"slug": "orchestration-platforms",
"source": "db"
},
"input_skill": "Kubernetes",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Engineer",
"id": 18,
"rationale": null,
"role_archetype": null,
"slug": "cloud-engineer",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
"slug": "devops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "Applied Machine Learning Tooling and Frameworks",
"id": null,
"rationale": "Libraries and frameworks used to prototype, build, train, tune, and compare machine learning models in practice. Includes hands-on AI/ML tooling such as scikit-learn, XGBoost, LightGBM, TensorFlow, and PyTorch, along with model training, feature engineering, hyperparameter tuning, and evaluation workflows. Excludes model serving, deployment packaging, online inference, registry/version promotion, distributed stream processing, and cloud infrastructure provisioning.",
"slug": "d_merge_01",
"source": "llm"
},
"input_skill": "AI/ML",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "AI Service Integration and Orchestration Patterns",
"id": null,
"rationale": "Patterns for structuring and embedding AI/ML capabilities within products and services, including deciding where AI logic lives and how it interacts with application flows. Covers model-backed APIs, retrieval-augmented generation, agent and workflow orchestration, batch scoring services, online inference integration, feature pipelines, and placement across handlers, workers, gateways, and service boundaries.",
"slug": "d_merge_02",
"source": "llm"
},
"input_skill": "AI/ML",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "AI Inference Cost, Latency, and Throughput Optimization",
"id": 260,
"rationale": "Improving the speed, throughput, and cost efficiency of AI and ML-powered product features without sacrificing correctness or user experience. Includes token budgeting, prompt compression, batching, caching, model selection, quantization, pruning, async inference, warm starts, streaming UX, timeout tuning, concurrency control, and profiling. Excludes infrastructure autoscaling, model serving capacity planning, generic backend performance tuning, and unrelated data/warehouse optimization.",
"slug": "ai-inference-cost-latency-and-throughput-optimization",
"source": "db"
},
"input_skill": "AI/ML",
"llm_role": null,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 12,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "Managed ML Platform Workflows",
"id": null,
"rationale": "Cloud ML platforms for building and operating models end-to-end, including notebooks, experiments, managed training jobs, and pipeline/studio workflows. Examples: SageMaker Studio, SageMaker notebooks, SageMaker Pipelines, managed training jobs.",
"slug": "d_split_01_01",
"source": "llm"
},
"input_skill": "Amazon SageMaker",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "Managed Model Hosting and Endpoints",
"id": null,
"rationale": "Cloud-managed services for deploying trained models as online or batch inference endpoints, including endpoint provisioning, batch transform, and rollout coordination. Examples: SageMaker endpoints, SageMaker batch transform.",
"slug": "d_split_01_02",
"source": "llm"
},
"input_skill": "Amazon SageMaker",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "Model Serving Runtime Packaging",
"id": null,
"rationale": "Packaging trained models and model servers for deployment, including containers, base images, runtime dependencies, entrypoints, GPU images, and image scanning. Examples: Docker images for model services.",
"slug": "d_split_01_03",
"source": "llm"
},
"input_skill": "Amazon SageMaker",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "Model Serving Frameworks and Platforms",
"id": null,
"rationale": "Serving frameworks and platforms used to run deployed models in online, batch, or streaming environments. Examples: TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, Seldon Core.",
"slug": "d_split_01_04",
"source": "llm"
},
"input_skill": "Amazon SageMaker",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Model Runtime Services",
"id": 121,
"rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
"slug": "cloud-model-runtime-services",
"source": "db"
},
"input_skill": "Amazon Bedrock",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Model Runtime Services",
"id": 121,
"rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
"slug": "cloud-model-runtime-services",
"source": "db"
},
"input_skill": "Amazon Bedrock",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "Managed Cloud Data Platform Services",
"id": null,
"rationale": "Managed cloud services used to run data engineering and related application workloads, including serverless compute, workflow orchestration, storage, event triggers, and adjacent security/platform primitives. This covers services such as AWS Lambda, AWS Step Functions, AWS Glue, Amazon S3 event triggers, and other managed services used to build and operate pipelines and data platforms.",
"slug": "d_merge_01",
"source": "llm"
},
"input_skill": "AWS Lambda",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Version Control Systems",
"id": 365,
"rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "ECS",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "Cloud Analytics Query Services",
"id": null,
"rationale": "Managed cloud services for querying and processing analytical data, including serverless SQL analytics, data lake querying, and federated query services such as Athena, Glue, Redshift Spectrum, and EMR-based analytics.",
"slug": "d_split_01_01",
"source": "llm"
},
"input_skill": "Amazon Athena",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "Cloud Data Pipeline Runtime",
"id": null,
"rationale": "Managed cloud compute and orchestration services used to run data engineering jobs, transformations, and workflows.",
"slug": "d_split_01_02",
"source": "llm"
},
"input_skill": "Amazon Athena",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "Cloud Data Platform Storage",
"id": null,
"rationale": "Cloud storage and data access services used as the persistence layer for data engineering platforms and pipelines.",
"slug": "d_split_01_03",
"source": "llm"
},
"input_skill": "Amazon Athena",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "Cloud Data Platform Security and Networking",
"id": null,
"rationale": "Identity, access, secrets, and networking primitives used to support cloud data platforms and pipelines.",
"slug": "d_split_01_04",
"source": "llm"
},
"input_skill": "Amazon Athena",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Data Platform Services",
"id": 81,
"rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
"slug": "cloud-data-platform-services",
"source": "db"
},
"input_skill": "AWS Data Pipeline",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Data Platform Services",
"id": 81,
"rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
"slug": "cloud-data-platform-services",
"source": "db"
},
"input_skill": "AWS Data Pipeline",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Storage Provisioning and Automation",
"id": 311,
"rationale": "Covers the scripts, APIs, and operational workflows used to create, resize, map, and retire storage resources. This cluster is coherent because storage engineers often automate repetitive provisioning and maintenance tasks.",
"slug": "storage-provisioning-and-automation",
"source": "db"
},
"input_skill": "S3",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Storage Engineer",
"id": 22,
"rationale": null,
"role_archetype": null,
"slug": "storage-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Streaming Data Processing",
"id": 69,
"rationale": "Tools and patterns for ingesting and transforming event streams with low latency. This cluster covers continuous processing, windowing, and stateful stream jobs used to keep data fresh.",
"slug": "streaming-data-processing",
"source": "db"
},
"input_skill": "Kinesis",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "HTTP API Frameworks and Gateway Layers",
"id": null,
"rationale": "Server-side frameworks and gateway layers used to build, route, validate, and manage HTTP APIs. Includes backend web service frameworks and API gateway configuration for endpoints, request/response mapping, routing, input validation, auth integration, throttling, usage plans, stages, custom domains, and backend integration.",
"slug": "d_merge_01",
"source": "llm"
},
"input_skill": "Amazon API Gateway",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Model Runtime Services",
"id": 121,
"rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
"slug": "cloud-model-runtime-services",
"source": "db"
},
"input_skill": "Pinecone",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Data Platform Services",
"id": 81,
"rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
"slug": "cloud-data-platform-services",
"source": "db"
},
"input_skill": "OpenSearch",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Version Control Systems",
"id": 365,
"rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "OpenSearch",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "Applied Machine Learning Toolkits and Frameworks",
"id": null,
"rationale": "Libraries and frameworks used to prototype, compare, index, and evaluate machine learning solutions quickly. Includes applied-ML toolkits such as scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, NumPy, pandas, FAISS, vector search libraries, and embedding utilities. Excludes serving runtimes, deployment containers, online inference infrastructure, streaming systems, and database/storage engines.",
"slug": "d_merge_01",
"source": "llm"
},
"input_skill": "FAISS",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Version Control Systems",
"id": 365,
"rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "FAISS",
"llm_role": null,
"roles_from_db": []
}
],
"input_final_skills": [
"AWS",
"AI/ML",
"Amazon SageMaker",
"Amazon Bedrock",
"AWS Lambda",
"ECS",
"EKS",
"EC2",
"AWS Glue",
"Amazon Athena",
"Redshift",
"AWS Data Pipeline",
"S3",
"Kinesis",
"Amazon API Gateway",
"Python",
"TensorFlow",
"PyTorch",
"Scikit-learn",
"GitHub Actions",
"Airflow",
"Terraform",
"Docker",
"Kubernetes",
"Pinecone",
"OpenSearch",
"FAISS"
],
"input_llm_skills": [
"AWS",
"AI/ML",
"Amazon SageMaker",
"Amazon Bedrock",
"AWS Lambda",
"ECS",
"EKS",
"EC2",
"AWS Glue",
"Amazon Athena",
"Redshift",
"AWS Data Pipeline",
"S3",
"Kinesis",
"Amazon API Gateway",
"Python",
"TensorFlow",
"PyTorch",
"Scikit-learn",
"GitHub Actions",
"Airflow",
"Terraform",
"Docker",
"Kubernetes",
"Pinecone",
"OpenSearch",
"FAISS"
],
"new_aliases_persisted": 0,
"run_id": "20755499-04f6-440f-80a9-bb023fddc1ff",
"skills_detail": [
{
"aliases_in_db": [
{
"alias_text": "AWS",
"alias_type": "CANONICAL",
"id": 348,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 13,
"display_name": "AWS",
"id": 163,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "aws",
"sub_category_id": 161,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platform Operations",
"id": 26,
"rationale": "Uses cloud provider services to support delivery and runtime environments. The focus is on consumer-level operation of cloud services rather than deep cloud architecture ownership.",
"slug": "cloud-platform-operations",
"source": "db"
},
"input_skill": "AWS",
"llm_role": null,
"roles_from_db": [
{
"display_name": "DevOps Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
"slug": "devops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Security Platforms",
"id": 332,
"rationale": "Cloud-native security products used to assess posture, detect misconfigurations, and monitor workloads across AWS, Azure, and GCP. This is a distinct product family because the role often works across multiple CNAPP/CSPM/CWPP offerings and cloud-native detectors.",
"slug": "cloud-security-platforms",
"source": "db"
},
"input_skill": "AWS",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
]
}
],
"input_skill": "AWS",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": null,
"display_name": "Applied Machine Learning Tooling and Frameworks",
"id": null,
"rationale": "Libraries and frameworks used to prototype, build, train, tune, and compare machine learning models in practice. Includes hands-on AI/ML tooling such as scikit-learn, XGBoost, LightGBM, TensorFlow, and PyTorch, along with model training, feature engineering, hyperparameter tuning, and evaluation workflows. Excludes model serving, deployment packaging, online inference, registry/version promotion, distributed stream processing, and cloud infrastructure provisioning.",
"slug": "d_merge_01",
"source": "llm"
},
"input_skill": "AI/ML",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "AI Service Integration and Orchestration Patterns",
"id": null,
"rationale": "Patterns for structuring and embedding AI/ML capabilities within products and services, including deciding where AI logic lives and how it interacts with application flows. Covers model-backed APIs, retrieval-augmented generation, agent and workflow orchestration, batch scoring services, online inference integration, feature pipelines, and placement across handlers, workers, gateways, and service boundaries.",
"slug": "d_merge_02",
"source": "llm"
},
"input_skill": "AI/ML",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "AI Inference Cost, Latency, and Throughput Optimization",
"id": 260,
"rationale": "Improving the speed, throughput, and cost efficiency of AI and ML-powered product features without sacrificing correctness or user experience. Includes token budgeting, prompt compression, batching, caching, model selection, quantization, pruning, async inference, warm starts, streaming UX, timeout tuning, concurrency control, and profiling. Excludes infrastructure autoscaling, model serving capacity planning, generic backend performance tuning, and unrelated data/warehouse optimization.",
"slug": "ai-inference-cost-latency-and-throughput-optimization",
"source": "db"
},
"input_skill": "AI/ML",
"llm_role": null,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 12,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
}
]
}
],
"input_skill": "AI/ML",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Domain",
"skill_nature": "CONCEPT",
"sub_category": "artificial_intelligence_machine_learning",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "AI/ML is a common combined domain label in JDs and usually clearly means artificial intelligence and machine learning, not a different catalog skill."
},
"context_keywords": {
"context_keywords": [
"TensorFlow",
"PyTorch",
"scikit-learn",
"deep learning",
"neural networks",
"NLP",
"computer vision",
"model training",
"feature engineering",
"hyperparameter tuning",
"classification",
"regression",
"clustering",
"reinforcement learning",
"MLOps"
]
},
"maturity": {
"confidence": 0.93,
"maturity": "well_known",
"reasoning": "AI/ML appears in a broad share of software and data job postings, with major vendors (AWS, Google, Microsoft) offering mainstream ML platforms and tooling; it\u2019s now a common hiring-pipeline requirement rather than a niche specialty."
},
"skill_id": "ai-ml",
"vendor_license": {
"confidence": 0.99,
"license": null,
"vendor": null,
"year_introduced": null
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Libraries and frameworks used to prototype, build, train, tune, and compare machine learning models in practice. Includes hands-on AI/ML tooling such as scikit-learn, XGBoost, LightGBM, TensorFlow, and PyTorch, along with model training, feature engineering, hyperparameter tuning, and evaluation workflows. Excludes model serving, deployment packaging, online inference, registry/version promotion, distributed stream processing, and cloud infrastructure provisioning.",
"exemplar_skills": [
"Applied Machine Learning Tooling and Frameworks"
],
"in_scope": "Skills, tools, and practices that belong under Applied Machine Learning Tooling and Frameworks for the target role, including items implied by the dimension rationale.",
"name": "Applied Machine Learning Tooling and Frameworks",
"out_of_scope": "Adjacent clusters explicitly not owned by Applied Machine Learning Tooling and Frameworks, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "d_merge_01"
},
{
"description": "Patterns for structuring and embedding AI/ML capabilities within products and services, including deciding where AI logic lives and how it interacts with application flows. Covers model-backed APIs, retrieval-augmented generation, agent and workflow orchestration, batch scoring services, online inference integration, feature pipelines, and placement across handlers, workers, gateways, and service boundaries.",
"exemplar_skills": [
"AI Service Integration and Orchestration Patterns"
],
"in_scope": "Skills, tools, and practices that belong under AI Service Integration and Orchestration Patterns for the target role, including items implied by the dimension rationale.",
"name": "AI Service Integration and Orchestration Patterns",
"out_of_scope": "Adjacent clusters explicitly not owned by AI Service Integration and Orchestration Patterns, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "d_merge_02"
},
{
"description": "Improving the runtime efficiency of AI/ML-powered features by reducing inference cost and latency while increasing throughput and preserving user experience. Includes token budgeting, prompt compression, batching, caching, quantization, pruning, model selection, async inference, warm starts, streaming UX, timeout tuning, concurrency control, GPU utilization, and profiling. Excludes model training, feature engineering, registry/versioning, infrastructure autoscaling, serving capacity planning, generic backend performance tuning, and unrelated data/warehouse optimization.",
"exemplar_skills": [
"AI Inference Cost, Latency, and Throughput Optimization"
],
"in_scope": "Skills, tools, and practices that belong under AI Inference Cost, Latency, and Throughput Optimization for the target role, including items implied by the dimension rationale.",
"name": "AI Inference Cost, Latency, and Throughput Optimization",
"out_of_scope": "Adjacent clusters explicitly not owned by AI Inference Cost, Latency, and Throughput Optimization, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "d_merge_03"
}
],
"merge_log": [
{
"a_dim_id": "applied-machine-learning-toolkits",
"a_name": "Applied Machine Learning Toolkits",
"a_role": "__skill_focal__",
"b_dim_id": "applied-machine-learning-toolkits",
"b_name": "Applied Machine Learning Toolkits",
"b_role": "Data Scientist",
"into": "d_merge_01",
"into_name": "Applied Machine Learning Tooling and Frameworks",
"merged_from": [
"applied-machine-learning-toolkits",
"applied-machine-learning-toolkits"
],
"pair_kind": "cross_role",
"reasoning": "Dim A and Dim B describe the same conceptual cluster: hands-on machine learning model development using common libraries/frameworks. Dim A explicitly includes scikit-learn, XGBoost, LightGBM, TensorFlow, PyTorch, model training, feature engineering, hyperparameter tuning, and evaluation workflows. Dim B\u2019s description says the same thing in slightly different words: tools to prototype and compare models quickly, capturing the concrete tooling layer beneath modeling methods and evaluation. The overlap is not just naming; the substance is identical, and the cross-role difference does not imply a different skill cluster here because both are about the same applied ML toolkit stack rather than role-specific responsibilities like deployment or infrastructure.",
"similarity": 0.8079542209553862
},
{
"a_dim_id": "ai-service-architecture-patterns",
"a_name": "AI Service Architecture Patterns",
"a_role": "__skill_focal__",
"b_dim_id": "ai-service-architecture-patterns",
"b_name": "AI Service Architecture Patterns",
"b_role": "AI Engineer",
"into": "d_merge_02",
"into_name": "AI Service Integration and Orchestration Patterns",
"merged_from": [
"ai-service-architecture-patterns",
"ai-service-architecture-patterns"
],
"pair_kind": "cross_role",
"reasoning": "Both dims describe the same skill cluster: placing and orchestrating AI capabilities inside product/service architecture. Dim A covers embedding AI/ML into products and services with examples like model-backed APIs, RAG, agent orchestration, and online inference integration. Dim B says the same thing in architectural terms, naming handlers, workers, gateways, and dedicated orchestration services. The overlap is substantive, not just lexical.",
"similarity": 0.8337425449661009
},
{
"a_dim_id": "ai-inference-cost-latency-and-throughput-optimization",
"a_name": "AI Inference Performance Optimization",
"a_role": "__skill_focal__",
"b_dim_id": "ai-inference-cost-latency-and-throughput-optimization",
"b_name": "AI Inference Cost, Latency, and Throughput Optimization",
"b_role": "AI Engineer",
"into": "d_merge_03",
"into_name": "AI Inference Cost, Latency, and Throughput Optimization",
"merged_from": [
"ai-inference-cost-latency-and-throughput-optimization",
"ai-inference-cost-latency-and-throughput-optimization"
],
"pair_kind": "cross_role",
"reasoning": "Both dims target the same AI inference optimization cluster: reducing latency and cost while improving throughput for AI/ML-powered features at runtime. Dim A includes inference latency, throughput tuning, batching, quantization, caching, GPU utilization, and concurrency control. Dim B covers the same core skills and adds token budgeting, prompt compression, async inference, warm starts, streaming UX, timeout tuning, and profiling. The overlap on batching, caching, quantization, and concurrency control shows they are not distinct clusters; the cross-role difference is only wording.",
"similarity": 0.8101390953678244
}
],
"placed": {
"name": "AI/ML",
"placement_confidence": 0.92,
"primary_dimension": "d_merge_01",
"reasoning": "Deterministic JD placement: locked_dimensions has 3 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [
"d_merge_02",
"d_merge_03"
],
"skill_id": "ai-ml"
},
"relationships": {
"child_skills": [],
"parent_skills": [],
"related_to": [
"anomaly-investigation",
"missing-data-analysis",
"capacity-forecasting",
"bls",
"rapid7-insightvm",
"azure-defender-for-cloud",
"aws-iam-review",
"mfa",
"azure-ad",
"azure-ad-conditional-access"
],
"requires": [],
"skill_id": "ai-ml",
"suppress_on_match": []
},
"skill_id": "ai-ml",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.98,
"name": "AI/ML",
"reasoning": "AI/ML is a vertical body of knowledge and problem-space rather than a tool, framework, or methodology, so it fits the Domain type.",
"skill_id": "ai-ml",
"subtype": "artificial_intelligence_machine_learning",
"type": "Domain"
},
"warnings": [
"stage3_post_filter_dropped_catalog_only_locked_dims:40-\u003e3"
]
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": null,
"display_name": "Managed ML Platform Workflows",
"id": null,
"rationale": "Cloud ML platforms for building and operating models end-to-end, including notebooks, experiments, managed training jobs, and pipeline/studio workflows. Examples: SageMaker Studio, SageMaker notebooks, SageMaker Pipelines, managed training jobs.",
"slug": "d_split_01_01",
"source": "llm"
},
"input_skill": "Amazon SageMaker",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "Managed Model Hosting and Endpoints",
"id": null,
"rationale": "Cloud-managed services for deploying trained models as online or batch inference endpoints, including endpoint provisioning, batch transform, and rollout coordination. Examples: SageMaker endpoints, SageMaker batch transform.",
"slug": "d_split_01_02",
"source": "llm"
},
"input_skill": "Amazon SageMaker",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "Model Serving Runtime Packaging",
"id": null,
"rationale": "Packaging trained models and model servers for deployment, including containers, base images, runtime dependencies, entrypoints, GPU images, and image scanning. Examples: Docker images for model services.",
"slug": "d_split_01_03",
"source": "llm"
},
"input_skill": "Amazon SageMaker",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "Model Serving Frameworks and Platforms",
"id": null,
"rationale": "Serving frameworks and platforms used to run deployed models in online, batch, or streaming environments. Examples: TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, Seldon Core.",
"slug": "d_split_01_04",
"source": "llm"
},
"input_skill": "Amazon SageMaker",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "Amazon SageMaker",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Platform",
"skill_nature": "PLATFORM",
"sub_category": "ml_platform",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "Amazon SageMaker is a specific AWS ML platform name and is usually unambiguous in job descriptions; it is unlikely to be mistaken for a different catalog skill."
},
"context_keywords": {
"context_keywords": [
"MLOps",
"notebooks",
"training jobs",
"hyperparameter tuning",
"model registry",
"endpoint deployment",
"batch transform",
"feature store",
"pipelines",
"ground truth",
"AutoML",
"S3",
"IAM",
"ECR",
"CloudWatch"
]
},
"maturity": {
"confidence": 0.9,
"maturity": "well_known",
"reasoning": "Commonly listed in ML/DS job descriptions; AWS\u2019s managed ML platform is broadly adopted for training, deployment, and MLOps across enterprises."
},
"skill_id": "amazon-sagemaker",
"vendor_license": {
"confidence": 0.98,
"license": "proprietary",
"vendor": "Amazon Web Services",
"year_introduced": 2017
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Cloud ML platforms for building and operating models end-to-end, including notebooks, experiments, managed training jobs, and pipeline/studio workflows. Examples: SageMaker Studio, SageMaker notebooks, SageMaker Pipelines, managed training jobs.",
"exemplar_skills": [
"Managed ML Platform Workflows"
],
"in_scope": "Skills, tools, and practices that belong under Managed ML Platform Workflows for the target role, including items implied by the dimension rationale.",
"name": "Managed ML Platform Workflows",
"out_of_scope": "Adjacent clusters explicitly not owned by Managed ML Platform Workflows, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "d_split_01_01"
},
{
"description": "Cloud-managed services for deploying trained models as online or batch inference endpoints, including endpoint provisioning, batch transform, and rollout coordination. Examples: SageMaker endpoints, SageMaker batch transform.",
"exemplar_skills": [
"Managed Model Hosting and Endpoints"
],
"in_scope": "Skills, tools, and practices that belong under Managed Model Hosting and Endpoints for the target role, including items implied by the dimension rationale.",
"name": "Managed Model Hosting and Endpoints",
"out_of_scope": "Adjacent clusters explicitly not owned by Managed Model Hosting and Endpoints, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "d_split_01_02"
},
{
"description": "Packaging trained models and model servers for deployment, including containers, base images, runtime dependencies, entrypoints, GPU images, and image scanning. Examples: Docker images for model services.",
"exemplar_skills": [
"Model Serving Runtime Packaging"
],
"in_scope": "Skills, tools, and practices that belong under Model Serving Runtime Packaging for the target role, including items implied by the dimension rationale.",
"name": "Model Serving Runtime Packaging",
"out_of_scope": "Adjacent clusters explicitly not owned by Model Serving Runtime Packaging, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "d_split_01_03"
},
{
"description": "Serving frameworks and platforms used to run deployed models in online, batch, or streaming environments. Examples: TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, Seldon Core.",
"exemplar_skills": [
"Model Serving Frameworks and Platforms"
],
"in_scope": "Skills, tools, and practices that belong under Model Serving Frameworks and Platforms for the target role, including items implied by the dimension rationale.",
"name": "Model Serving Frameworks and Platforms",
"out_of_scope": "Adjacent clusters explicitly not owned by Model Serving Frameworks and Platforms, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "d_split_01_04"
}
],
"merge_log": [],
"placed": {
"name": "Amazon SageMaker",
"placement_confidence": 0.92,
"primary_dimension": "d_split_01_01",
"reasoning": "Deterministic JD placement: locked_dimensions has 4 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [
"d_split_01_02",
"d_split_01_03"
],
"skill_id": "amazon-sagemaker"
},
"relationships": {
"child_skills": [],
"parent_skills": [
"aws"
],
"related_to": [
"aws-s3",
"aws-cdk",
"aws-cloudformation",
"aws-kms",
"aws-guardduty",
"microsoft-sentinel",
"aws-iam-review",
"azure-ad",
"aks",
"azure"
],
"requires": [],
"skill_id": "amazon-sagemaker",
"suppress_on_match": []
},
"skill_id": "amazon-sagemaker",
"split_log": [
{
"a_dim_id": "cloud-model-runtime-services",
"a_name": "Cloud Model Runtime Services",
"a_role": "__skill_focal__",
"b_dim_id": "model-serving-deployment-and-runtime-packaging",
"b_name": "Model Serving Deployment and Runtime Packaging",
"b_role": "__skill_focal__",
"into": [
"d_split_01_01",
"d_split_01_02",
"d_split_01_03",
"d_split_01_04"
],
"into_names": [
"Managed ML Platform Workflows",
"Managed Model Hosting and Endpoints",
"Model Serving Runtime Packaging",
"Model Serving Frameworks and Platforms"
],
"pair_kind": "intra_role",
"reasoning": "Dim A is broader: it covers training, notebooks, pipelines, endpoints, and MLOps workflows, with SageMaker Studio/Pipelines/training jobs as exemplars. Dim B is narrower and specifically about packaging trained models for serving runtimes, with TensorFlow Serving, TorchServe, Triton, BentoML, KServe, Seldon Core, plus Docker/GPU container concerns. The overlap is only around deployment/serving; A also includes managed training and platform workflow skills that are not B\u0027s focus. So A should be split into narrower siblings rather than merged.",
"similarity": 0.7320229250688253,
"split_from": [
"cloud-model-runtime-services",
"model-serving-deployment-and-runtime-packaging"
]
}
],
"typed": {
"alternatives_considered": [],
"confidence": 0.98,
"name": "Amazon SageMaker",
"reasoning": "By the Platform vs Tool rule, Amazon SageMaker is a hosted multi-tenant AWS environment with APIs and managed machine-learning capabilities, so it is a Platform rather than a Tool or a single Service in this typology.",
"skill_id": "amazon-sagemaker",
"subtype": "ml_platform",
"type": "Platform"
},
"warnings": [
"stage3_post_filter_dropped_catalog_only_locked_dims:42-\u003e4"
]
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Model Runtime Services",
"id": 121,
"rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
"slug": "cloud-model-runtime-services",
"source": "db"
},
"input_skill": "Amazon Bedrock",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Model Runtime Services",
"id": 121,
"rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
"slug": "cloud-model-runtime-services",
"source": "db"
},
"input_skill": "Amazon Bedrock",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
}
]
}
],
"input_skill": "Amazon Bedrock",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Service",
"skill_nature": "CLOUD_SERVICE",
"sub_category": "managed_ai_model_service",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "EMERGING"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "Amazon Bedrock is a specific AWS managed AI model service with a distinctive name; typical JDs mentioning it are unlikely to mean a different catalog skill."
},
"context_keywords": {
"context_keywords": [
"foundation models",
"FM",
"prompt engineering",
"RAG",
"vector database",
"embeddings",
"guardrails",
"Agents for Amazon Bedrock",
"Knowledge Bases",
"model invocation",
"fine-tuning",
"inference",
"LLM",
"LangChain",
"Anthropic Claude"
]
},
"maturity": {
"confidence": 0.86,
"maturity": "emerging",
"reasoning": "Appears increasingly in cloud/ML job descriptions and AWS partner materials, but JD volume is still far below core AWS services like S3 or Lambda."
},
"skill_id": "amazon-bedrock",
"vendor_license": {
"confidence": 0.98,
"license": "proprietary",
"vendor": "Amazon Web Services",
"year_introduced": 2023
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [
{
"a_dim_id": "cloud-model-runtime-services",
"a_name": "Cloud Model Runtime Services",
"a_role": "__skill_focal__",
"b_dim_id": "cloud-model-runtime-services",
"b_name": "Cloud Model Runtime Services",
"b_role": "Machine Learning Engineer",
"pair_kind": "cross_role",
"reasoning": "Dim A is about managed foundation-model product services like Amazon Bedrock, Bedrock Agents, Bedrock Knowledge Bases, prompt orchestration, and guardrails. Dim B is broader cloud inference/runtime support for MLEs, emphasizing deployment and tuning on cloud compute, networking, and storage primitives. The overlap is only the shared runtime/inference wording; the concrete skills and anchors differ, so they are distinct clusters.",
"similarity": 0.6605349086129982
}
],
"locked_dimensions": [
{
"description": "Consumer-facing managed services used to run, invoke, and integrate foundation models and related AI capabilities in cloud applications. Amazon Bedrock belongs here because it provides hosted model access, orchestration features, and runtime APIs for generative AI workloads.",
"exemplar_skills": [
"Amazon Bedrock",
"Bedrock Agents",
"Bedrock Knowledge Bases",
"foundation model APIs",
"prompt orchestration",
"guardrails for generative AI"
],
"in_scope": "Amazon Bedrock, model invocation APIs, foundation model access, prompt orchestration, guardrails, agents, knowledge bases, embeddings, managed inference endpoints",
"name": "Cloud Model Runtime Services",
"out_of_scope": "Model training pipelines, offline feature engineering, and model registry workflows, which belong to model development and MLOps dimensions; generic cloud storage or networking, which are covered elsewhere",
"overlap_flags": [
{
"reason": "Bedrock is often used as part of broader AI application architecture, but this dimension focuses on the managed runtime service itself.",
"with_dim_id": "ai-service-architecture-patterns",
"with_dim_name": null,
"with_role": "AI Engineer"
},
{
"reason": "Bedrock usage can involve tuning latency and cost, but that dimension owns optimization concerns rather than service selection.",
"with_dim_id": "ai-inference-cost-latency-and-throughput-optimization",
"with_dim_name": null,
"with_role": "AI Engineer"
}
],
"tentative_id": "cloud-model-runtime-services"
},
{
"description": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
"exemplar_skills": [
"Cloud Model Runtime Services"
],
"in_scope": "Skills, tools, and practices that belong under Cloud Model Runtime Services for the target role, including items implied by the dimension rationale.",
"name": "Cloud Model Runtime Services",
"out_of_scope": "Adjacent clusters explicitly not owned by Cloud Model Runtime Services, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "cloud-model-runtime-services"
}
],
"merge_log": [],
"placed": {
"name": "Amazon Bedrock",
"placement_confidence": 0.92,
"primary_dimension": "cloud-model-runtime-services",
"reasoning": "Deterministic JD placement: locked_dimensions has 2 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [],
"skill_id": "amazon-bedrock"
},
"relationships": {
"child_skills": [],
"parent_skills": [
"aws"
],
"related_to": [
"azure",
"azure-ad",
"azure-defender-for-cloud",
"azure-key-vault",
"azure-expressroute",
"aws-cdk",
"aws-cloudformation",
"aws-kms",
"aws-s3",
"aws-vpc"
],
"requires": [],
"skill_id": "amazon-bedrock",
"suppress_on_match": []
},
"skill_id": "amazon-bedrock",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.97,
"name": "Amazon Bedrock",
"reasoning": "By the Platform vs Service rule, Amazon Bedrock is a specific managed capability inside AWS rather than the whole hosted environment, so it is a Service.",
"skill_id": "amazon-bedrock",
"subtype": "managed_ai_model_service",
"type": "Service"
},
"warnings": [
"stage3_post_filter_dropped_catalog_only_locked_dims:41-\u003e2"
]
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": null,
"display_name": "Managed Cloud Data Platform Services",
"id": null,
"rationale": "Managed cloud services used to run data engineering and related application workloads, including serverless compute, workflow orchestration, storage, event triggers, and adjacent security/platform primitives. This covers services such as AWS Lambda, AWS Step Functions, AWS Glue, Amazon S3 event triggers, and other managed services used to build and operate pipelines and data platforms.",
"slug": "d_merge_01",
"source": "llm"
},
"input_skill": "AWS Lambda",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "AWS Lambda",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Service",
"skill_nature": "CLOUD_SERVICE",
"sub_category": "serverless_compute_service",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "AWS Lambda is a specific AWS serverless compute service with a distinctive full name; in typical JDs it is unlikely to be confused with unrelated skills in the catalog."
},
"context_keywords": {
"context_keywords": [
"serverless",
"event-driven",
"API Gateway",
"CloudWatch",
"IAM role",
"S3 trigger",
"SNS",
"SQS",
"Step Functions",
"DynamoDB",
"Lambda layers",
"cold start",
"Node.js",
"Python",
"VPC"
]
},
"maturity": {
"confidence": 0.97,
"maturity": "well_known",
"reasoning": "Broadly adopted serverless compute; AWS Lambda appears in many cloud/backend job descriptions and is a standard AWS offering with strong ecosystem support."
},
"skill_id": "aws-lambda",
"vendor_license": {
"confidence": 0.99,
"license": "proprietary",
"vendor": "Amazon Web Services",
"year_introduced": 2014
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Managed cloud services used to run data engineering and related application workloads, including serverless compute, workflow orchestration, storage, event triggers, and adjacent security/platform primitives. This covers services such as AWS Lambda, AWS Step Functions, AWS Glue, Amazon S3 event triggers, and other managed services used to build and operate pipelines and data platforms.",
"exemplar_skills": [
"Managed Cloud Data Platform Services"
],
"in_scope": "Skills, tools, and practices that belong under Managed Cloud Data Platform Services for the target role, including items implied by the dimension rationale.",
"name": "Managed Cloud Data Platform Services",
"out_of_scope": "Adjacent clusters explicitly not owned by Managed Cloud Data Platform Services, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "d_merge_01"
}
],
"merge_log": [
{
"a_dim_id": "cloud-data-platform-services",
"a_name": "Cloud Data Platform Services",
"a_role": "__skill_focal__",
"b_dim_id": "cloud-data-platform-services",
"b_name": "Cloud Data Platform Services",
"b_role": "Data Engineer",
"into": "d_merge_01",
"into_name": "Managed Cloud Data Platform Services",
"merged_from": [
"cloud-data-platform-services",
"cloud-data-platform-services"
],
"pair_kind": "cross_role",
"reasoning": "Both dims describe the same managed-cloud service cluster for data workloads. Dim A centers on serverless/managed execution and orchestration with concrete examples like AWS Lambda, AWS Step Functions, AWS Glue, and S3 event triggers. Dim B describes cloud services used to run data engineering pipelines, including managed compute, storage, networking-adjacent services, and security primitives. Those are the same skills in practice; B is just broader wording and A gives specific exemplars.",
"similarity": 0.7804737777237593
}
],
"placed": {
"name": "AWS Lambda",
"placement_confidence": 0.92,
"primary_dimension": "d_merge_01",
"reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [],
"skill_id": "aws-lambda"
},
"relationships": {
"child_skills": [],
"parent_skills": [
"aws"
],
"related_to": [
"aws-s3",
"aws-cloudformation",
"aws-guardduty",
"aws-kms",
"aws-cdk",
"aws-vpc",
"aws-direct-connect",
"ec2",
"azure-expressroute",
"rest-apis"
],
"requires": [
"aws-iam-review"
],
"skill_id": "aws-lambda",
"suppress_on_match": []
},
"skill_id": "aws-lambda",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.99,
"name": "AWS Lambda",
"reasoning": "By the Service vs Platform rule, AWS Lambda is a specific managed capability inside AWS rather than the whole hosted environment, so it is a Service.",
"skill_id": "aws-lambda",
"subtype": "serverless_compute_service",
"type": "Service"
},
"warnings": [
"stage3_post_filter_dropped_catalog_only_locked_dims:40-\u003e1"
]
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Version Control Systems",
"id": 365,
"rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "ECS",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "ECS",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Concept",
"skill_nature": "CONCEPT",
"sub_category": "entity_component_system",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": true,
"confused_with": [
"amazon_ecs",
"elastic_container_service"
],
"reasoning": "\u201cECS\u201d is a common acronym and in JDs often means Amazon Elastic Container Service; it can also be read as the generic entity-component-system architecture concept."
},
"context_keywords": {
"context_keywords": [
"entity-component-system",
"game engine",
"gameplay architecture",
"component-based architecture",
"systems",
"entities",
"components",
"data-oriented design",
"Unity",
"Unreal Engine",
"rendering pipeline",
"physics engine",
"scheduling",
"serialization",
"scene graph"
]
},
"maturity": {
"confidence": 0.78,
"maturity": "well_known",
"reasoning": "ECS appears in many game-engine and engine-architecture job descriptions, especially in Unity/DOTS and Rust/C++ gameplay systems, and has strong GitHub/library activity; it\u2019s a common modern architecture pattern rather than a niche tool."
},
"skill_id": "ecs",
"vendor_license": {
"confidence": 0.99,
"license": null,
"vendor": null,
"year_introduced": null
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Managed services for running and scaling containerized workloads. ECS belongs here because it is an orchestration platform for scheduling tasks, managing services, and coordinating container runtime operations.",
"exemplar_skills": [
"ECS",
"Amazon ECS",
"ECS task definitions",
"ECS services",
"ECS clusters",
"Fargate",
"capacity providers",
"service autoscaling"
],
"in_scope": "ECS, Amazon ECS, task definitions, services, clusters, capacity providers, service autoscaling, rolling deployments, Fargate, EC2 launch type, container scheduling",
"name": "Container Orchestration Services",
"out_of_scope": "Kubernetes control planes and manifests, image building and registry management, application code inside containers, load balancer design, general cloud networking",
"overlap_flags": [
{
"reason": "ECS capacity and autoscaling decisions often intersect with broader scaling strategy and workload sizing.",
"with_dim_id": "scalability-and-performance-architecture",
"with_dim_name": null,
"with_role": "Cloud Architect"
},
{
"reason": "ECS is sometimes used as a managed compute substrate in cloud data workflows, but the orchestration layer is the primary fit.",
"with_dim_id": "cloud-data-platform-services",
"with_dim_name": null,
"with_role": "Data Engineer"
}
],
"tentative_id": "d_init_01"
}
],
"merge_log": [],
"placed": {
"name": "ECS",
"placement_confidence": 0.92,
"primary_dimension": "d_init_01",
"reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [],
"skill_id": "ecs"
},
"relationships": {
"child_skills": [],
"parent_skills": [],
"related_to": [
"ec2",
"aks",
"gke",
"vmware-esxi",
"vcenter-server",
"dex",
"ethereum",
"erc-20",
"erc-1155",
"ethers-js"
],
"requires": [],
"skill_id": "ecs",
"suppress_on_match": []
},
"skill_id": "ecs",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.93,
"name": "ECS",
"reasoning": "ECS is fundamentally the Entity-Component-System design pattern, so by the Architecture vs Concept rule it is best typed as a Concept rather than a tool or platform.",
"skill_id": "ecs",
"subtype": "entity_component_system",
"type": "Concept"
},
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "EKS",
"alias_type": "CANONICAL",
"id": 1093,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 14,
"display_name": "EKS",
"id": 725,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CLOUD_SERVICE",
"slug": "eks",
"sub_category_id": 251,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Model Runtime Services",
"id": 121,
"rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
"slug": "cloud-model-runtime-services",
"source": "db"
},
"input_skill": "EKS",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Orchestration Platforms",
"id": 25,
"rationale": "Operates the platforms that schedule and run containerized workloads and related deployment primitives. This is separate from image delivery because it concerns runtime placement and service rollout behavior.",
"slug": "orchestration-platforms",
"source": "db"
},
"input_skill": "EKS",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Engineer",
"id": 18,
"rationale": null,
"role_archetype": null,
"slug": "cloud-engineer",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
"slug": "devops-engineer",
"source": "db"
}
]
}
],
"input_skill": "EKS",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "EC2",
"alias_type": "CANONICAL",
"id": 2372,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 14,
"display_name": "EC2",
"id": 1773,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CLOUD_SERVICE",
"slug": "ec2",
"sub_category_id": 1544,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Provider Core Services",
"id": 290,
"rationale": "Core managed services used to provision and operate cloud environments. This is the base cloud surface for compute, storage, networking, and platform primitives the role configures and maintains.",
"slug": "cloud-provider-core-services",
"source": "db"
},
"input_skill": "EC2",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Engineer",
"id": 18,
"rationale": null,
"role_archetype": null,
"slug": "cloud-engineer",
"source": "db"
}
]
}
],
"input_skill": "EC2",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "AWS Glue",
"alias_type": "CANONICAL",
"id": 730,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 14,
"display_name": "AWS Glue",
"id": 466,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CLOUD_SERVICE",
"slug": "aws-glue",
"sub_category_id": 385,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Data Platform Services",
"id": 81,
"rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
"slug": "cloud-data-platform-services",
"source": "db"
},
"input_skill": "AWS Glue",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "AWS Glue",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": null,
"display_name": "Cloud Analytics Query Services",
"id": null,
"rationale": "Managed cloud services for querying and processing analytical data, including serverless SQL analytics, data lake querying, and federated query services such as Athena, Glue, Redshift Spectrum, and EMR-based analytics.",
"slug": "d_split_01_01",
"source": "llm"
},
"input_skill": "Amazon Athena",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "Cloud Data Pipeline Runtime",
"id": null,
"rationale": "Managed cloud compute and orchestration services used to run data engineering jobs, transformations, and workflows.",
"slug": "d_split_01_02",
"source": "llm"
},
"input_skill": "Amazon Athena",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "Cloud Data Platform Storage",
"id": null,
"rationale": "Cloud storage and data access services used as the persistence layer for data engineering platforms and pipelines.",
"slug": "d_split_01_03",
"source": "llm"
},
"input_skill": "Amazon Athena",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": null,
"display_name": "Cloud Data Platform Security and Networking",
"id": null,
"rationale": "Identity, access, secrets, and networking primitives used to support cloud data platforms and pipelines.",
"slug": "d_split_01_04",
"source": "llm"
},
"input_skill": "Amazon Athena",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "Amazon Athena",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Service",
"skill_nature": "CLOUD_SERVICE",
"sub_category": "query_service",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "Amazon Athena is a specific AWS query service with a distinctive full name; in typical JDs it is unlikely to be confused with another catalog skill."
},
"context_keywords": {
"context_keywords": [
"AWS Glue",
"S3",
"Presto",
"Trino",
"SQL",
"CTAS",
"partitioning",
"Parquet",
"ORC",
"Glue Data Catalog",
"Athena Federated Query",
"IAM",
"Lake Formation",
"JDBC",
"serverless analytics"
]
},
"maturity": {
"confidence": 0.91,
"maturity": "well_known",
"reasoning": "Commonly listed in cloud/data analytics JDs and AWS\u2019s own docs position Athena as a standard serverless SQL query service for S3 data lakes, indicating broad market adoption."
},
"skill_id": "amazon-athena",
"vendor_license": {
"confidence": 0.99,
"license": "proprietary",
"vendor": "Amazon Web Services",
"year_introduced": 2016
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Managed cloud services for querying and processing analytical data, including serverless SQL analytics, data lake querying, and federated query services such as Athena, Glue, Redshift Spectrum, and EMR-based analytics.",
"exemplar_skills": [
"Cloud Analytics Query Services"
],
"in_scope": "Skills, tools, and practices that belong under Cloud Analytics Query Services for the target role, including items implied by the dimension rationale.",
"name": "Cloud Analytics Query Services",
"out_of_scope": "Adjacent clusters explicitly not owned by Cloud Analytics Query Services, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "d_split_01_01"
},
{
"description": "Managed cloud compute and orchestration services used to run data engineering jobs, transformations, and workflows.",
"exemplar_skills": [
"Cloud Data Pipeline Runtime"
],
"in_scope": "Skills, tools, and practices that belong under Cloud Data Pipeline Runtime for the target role, including items implied by the dimension rationale.",
"name": "Cloud Data Pipeline Runtime",
"out_of_scope": "Adjacent clusters explicitly not owned by Cloud Data Pipeline Runtime, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "d_split_01_02"
},
{
"description": "Cloud storage and data access services used as the persistence layer for data engineering platforms and pipelines.",
"exemplar_skills": [
"Cloud Data Platform Storage"
],
"in_scope": "Skills, tools, and practices that belong under Cloud Data Platform Storage for the target role, including items implied by the dimension rationale.",
"name": "Cloud Data Platform Storage",
"out_of_scope": "Adjacent clusters explicitly not owned by Cloud Data Platform Storage, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "d_split_01_03"
},
{
"description": "Identity, access, secrets, and networking primitives used to support cloud data platforms and pipelines.",
"exemplar_skills": [
"Cloud Data Platform Security and Networking"
],
"in_scope": "Skills, tools, and practices that belong under Cloud Data Platform Security and Networking for the target role, including items implied by the dimension rationale.",
"name": "Cloud Data Platform Security and Networking",
"out_of_scope": "Adjacent clusters explicitly not owned by Cloud Data Platform Security and Networking, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "d_split_01_04"
}
],
"merge_log": [],
"placed": {
"name": "Amazon Athena",
"placement_confidence": 0.92,
"primary_dimension": "d_split_01_01",
"reasoning": "Deterministic JD placement: locked_dimensions has 4 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [
"d_split_01_02",
"d_split_01_03"
],
"skill_id": "amazon-athena"
},
"relationships": {
"child_skills": [],
"parent_skills": [
"aws"
],
"related_to": [
"aws-s3",
"aws-kms",
"aws-cloudformation",
"aws-direct-connect",
"aws-cdk",
"aws-guardduty",
"aws-vpc",
"rest-apis"
],
"requires": [],
"skill_id": "amazon-athena",
"suppress_on_match": []
},
"skill_id": "amazon-athena",
"split_log": [
{
"a_dim_id": "cloud-data-platform-services",
"a_name": "Cloud Data Platform Services",
"a_role": "__skill_focal__",
"b_dim_id": "cloud-data-platform-services",
"b_name": "Cloud Data Platform Services",
"b_role": "Data Engineer",
"into": [
"d_split_01_01",
"d_split_01_02",
"d_split_01_03",
"d_split_01_04"
],
"into_names": [
"Cloud Analytics Query Services",
"Cloud Data Pipeline Runtime",
"Cloud Data Platform Storage",
"Cloud Data Platform Security and Networking"
],
"pair_kind": "cross_role",
"reasoning": "Dim A is a narrow analytics/query-services cluster (Athena, Glue, Redshift Spectrum, EMR, serverless SQL analytics, data lake querying). Dim B is a broader umbrella for cloud services used in data engineering pipelines, including compute, storage, networking-adjacent, and security primitives. The overlap comes from B being too broad, not from identical substance. Split B into narrower siblings so the analytics-query piece stays separate from pipeline/runtime and platform-infra services.",
"similarity": 0.7316862613085077,
"split_from": [
"cloud-data-platform-services",
"cloud-data-platform-services"
]
}
],
"typed": {
"alternatives_considered": [],
"confidence": 0.98,
"name": "Amazon Athena",
"reasoning": "By the Platform vs Tool and Service vs Platform rules, Amazon Athena is a managed capability inside AWS rather than software you run yourself, so it is a Service.",
"skill_id": "amazon-athena",
"subtype": "query_service",
"type": "Service"
},
"warnings": [
"stage3_post_filter_dropped_catalog_only_locked_dims:43-\u003e4"
]
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Redshift",
"alias_type": "CANONICAL",
"id": 3367,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 13,
"display_name": "Redshift",
"id": 2570,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "redshift",
"sub_category_id": 2098,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Data Warehousing Platforms",
"id": 72,
"rationale": "Cloud and on-prem analytical storage systems used to persist curated datasets and serve downstream consumers. This cluster is about the warehouse/lakehouse layer where transformed data is organized for access.",
"slug": "data-warehousing-platforms",
"source": "db"
},
"input_skill": "Redshift",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Redshift",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Data Platform Services",
"id": 81,
"rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
"slug": "cloud-data-platform-services",
"source": "db"
},
"input_skill": "AWS Data Pipeline",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Data Platform Services",
"id": 81,
"rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
"slug": "cloud-data-platform-services",
"source": "db"
},
"input_skill": "AWS Data Pipeline",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "AWS Data Pipeline",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Service",
"skill_nature": "CLOUD_SERVICE",
"sub_category": "data_pipeline_service",
"typical_lifespan": "SHORT_LIVED",
"version_strategy": "NOT_APPLICABLE",
"volatility": "DEPRECATED"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "AWS Data Pipeline is a specific AWS service name and is unlikely to be mistaken for another catalog skill in a typical JD."
},
"context_keywords": {
"context_keywords": [
"ETL",
"S3",
"Redshift",
"EMR",
"Glue",
"RDS",
"EC2",
"Lambda",
"Step Functions",
"Kinesis",
"Athena",
"Data Lake",
"Apache Spark",
"cron",
"orchestration"
]
},
"maturity": {
"confidence": 0.96,
"maturity": "deprecated",
"reasoning": "AWS announced AWS Data Pipeline is in maintenance mode and recommends newer services like Glue/Step Functions; recent JDs rarely list it compared with modern AWS data tooling."
},
"skill_id": "aws-data-pipeline",
"vendor_license": {
"confidence": 0.98,
"license": "proprietary",
"vendor": "Amazon Web Services",
"year_introduced": 2012
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [
{
"a_dim_id": "cloud-data-platform-services",
"a_name": "Cloud Data Platform Services",
"a_role": "__skill_focal__",
"b_dim_id": "cloud-data-platform-services",
"b_name": "Cloud Data Platform Services",
"b_role": "Data Engineer",
"pair_kind": "cross_role",
"reasoning": "Dim A is about managed cloud data-platform products and orchestration services, e.g. AWS Glue, Amazon EMR, Amazon Redshift, and AWS Data Pipeline for ETL and scheduled data movement. Dim B describes consumer use of cloud services that support data engineering workloads, including managed compute, storage, networking-adjacent services, and security primitives. A is service/catalog focused; B is infrastructure/usage focused. Same label, different skill clusters.",
"similarity": 0.828494432740371
}
],
"locked_dimensions": [
{
"description": "Managed cloud services used to build and operate data engineering workloads. AWS Data Pipeline fits here because it is an AWS service for orchestrating data movement and scheduled processing across storage and compute services.",
"exemplar_skills": [
"AWS Data Pipeline",
"AWS Glue",
"Amazon EMR",
"Amazon Redshift",
"Amazon S3 ETL workflows"
],
"in_scope": "AWS Data Pipeline, AWS Glue, Amazon EMR, Amazon Redshift, Amazon S3 data workflows, managed ETL orchestration, scheduled batch data movement, cloud data ingestion services",
"name": "Cloud Data Platform Services",
"out_of_scope": "Streaming engines and low-latency event processing, which belong to streaming-data-processing; model training or inference services, which belong to ML platform dimensions; generic infrastructure provisioning, which belongs to infrastructure-provisioning-templates",
"overlap_flags": [
{
"reason": "Both can move and transform data, but AWS Data Pipeline is primarily batch/scheduled orchestration rather than continuous stream processing.",
"with_dim_id": "streaming-data-processing",
"with_dim_name": null,
"with_role": "Data Engineer"
},
{
"reason": "Pipeline setup may involve infrastructure definitions, but the core skill is data service orchestration rather than declarative resource provisioning.",
"with_dim_id": "infrastructure-provisioning-templates",
"with_dim_name": null,
"with_role": "Cloud Engineer"
}
],
"tentative_id": "cloud-data-platform-services"
},
{
"description": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
"exemplar_skills": [
"Cloud Data Platform Services"
],
"in_scope": "Skills, tools, and practices that belong under Cloud Data Platform Services for the target role, including items implied by the dimension rationale.",
"name": "Cloud Data Platform Services",
"out_of_scope": "Adjacent clusters explicitly not owned by Cloud Data Platform Services, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "cloud-data-platform-services"
}
],
"merge_log": [],
"placed": {
"name": "AWS Data Pipeline",
"placement_confidence": 0.92,
"primary_dimension": "cloud-data-platform-services",
"reasoning": "Deterministic JD placement: locked_dimensions has 2 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [],
"skill_id": "aws-data-pipeline"
},
"relationships": {
"child_skills": [],
"parent_skills": [
"aws"
],
"related_to": [
"aws-s3",
"aws-cloudformation",
"aws-cdk",
"aws-direct-connect",
"aws-vpc",
"azure",
"azure-expressroute",
"aks",
"rest-apis",
"ec2"
],
"requires": [],
"skill_id": "aws-data-pipeline",
"suppress_on_match": []
},
"skill_id": "aws-data-pipeline",
"split_log": [],
"typed": {
"alternatives_considered": [
"Platform: ruled out \u2014 AWS is the platform, while Data Pipeline is one managed capability within it.",
"Tool: ruled out \u2014 it is consumed as a managed AWS offering, not software you run yourself."
],
"confidence": 0.97,
"name": "AWS Data Pipeline",
"reasoning": "By the Service vs Platform rule, AWS Data Pipeline is a specific managed capability inside AWS rather than the AWS platform itself.",
"skill_id": "aws-data-pipeline",
"subtype": "data_pipeline_service",
"type": "Service"
},
"warnings": [
"stage3_post_filter_dropped_catalog_only_locked_dims:41-\u003e2"
]
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Storage Provisioning and Automation",
"id": 311,
"rationale": "Covers the scripts, APIs, and operational workflows used to create, resize, map, and retire storage resources. This cluster is coherent because storage engineers often automate repetitive provisioning and maintenance tasks.",
"slug": "storage-provisioning-and-automation",
"source": "db"
},
"input_skill": "S3",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Storage Engineer",
"id": 22,
"rationale": null,
"role_archetype": null,
"slug": "storage-engineer",
"source": "db"
}
]
}
],
"input_skill": "S3",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Platform",
"skill_nature": "PLATFORM",
"sub_category": "cloud_storage_platform",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": true,
"confused_with": [
"s4"
],
"reasoning": "\"S3\" is a short acronym that in JDs can mean AWS S3, but could also be read as a generic storage tier/label or other S3-named products in the catalog. A reasonable extractor may confuse it with adjacent cloud storage skills."
},
"context_keywords": {
"context_keywords": [
"bucket",
"object storage",
"prefix",
"versioning",
"lifecycle policy",
"bucket policy",
"IAM",
"replication",
"multipart upload",
"presigned URL",
"SSE-S3",
"SSE-KMS",
"event notifications",
"static website hosting",
"storage class"
]
},
"maturity": {
"confidence": 0.98,
"maturity": "well_known",
"reasoning": "Amazon S3 is a default cloud storage requirement in many job descriptions and is a core AWS service with broad ecosystem support; no sunset or replacement signal exists."
},
"skill_id": "s3",
"vendor_license": {
"confidence": 0.99,
"license": "proprietary",
"vendor": "Amazon Web Services",
"year_introduced": 2006
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Covers creating, configuring, and operating S3-style object storage resources and their access controls. S3 belongs here because it is the canonical AWS object storage service used for buckets, objects, lifecycle, and access policies.",
"exemplar_skills": [
"S3",
"Amazon S3",
"S3 bucket policies",
"S3 lifecycle management",
"S3 versioning",
"S3 multipart upload"
],
"in_scope": "S3, S3 buckets, object storage, bucket policies, lifecycle rules, versioning, encryption at rest, access control lists, presigned URLs, multipart upload, object tagging",
"name": "Object Storage Provisioning",
"out_of_scope": "Block storage volumes, file shares, database storage, and storage migration planning, which belong to other storage or migration dimensions.",
"overlap_flags": [
{
"reason": "S3 is often used as a managed data lake landing zone, so it can overlap with cloud data platform usage patterns.",
"with_dim_id": "cloud-data-platform-services",
"with_dim_name": null,
"with_role": "Data Engineer"
},
{
"reason": "Model artifacts and inference assets are sometimes stored in S3, creating incidental overlap with ML deployment workflows.",
"with_dim_id": "cloud-model-runtime-services",
"with_dim_name": null,
"with_role": "Machine Learning Engineer"
}
],
"tentative_id": "storage-provisioning-and-automation"
}
],
"merge_log": [],
"placed": {
"name": "S3",
"placement_confidence": 0.92,
"primary_dimension": "storage-provisioning-and-automation",
"reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [],
"skill_id": "s3"
},
"relationships": {
"child_skills": [],
"parent_skills": [
"aws-s3"
],
"related_to": [
"ec2",
"ebs-snapshots",
"aks",
"iscsi",
"hsm",
"hsms",
"avalanche",
"sui"
],
"requires": [],
"skill_id": "s3",
"suppress_on_match": [
"aws-s3"
]
},
"skill_id": "s3",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.91,
"name": "S3",
"reasoning": "By the Platform vs Service rule, S3 is a hosted multi-tenant AWS capability with APIs rather than software you run yourself, so it fits Platform best.",
"skill_id": "s3",
"subtype": "cloud_storage_platform",
"type": "Platform"
},
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Streaming Data Processing",
"id": 69,
"rationale": "Tools and patterns for ingesting and transforming event streams with low latency. This cluster covers continuous processing, windowing, and stateful stream jobs used to keep data fresh.",
"slug": "streaming-data-processing",
"source": "db"
},
"input_skill": "Kinesis",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Kinesis",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Service",
"skill_nature": "CLOUD_SERVICE",
"sub_category": "streaming_data_service",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "In JDs, Kinesis usually clearly refers to AWS Kinesis, a distinct streaming service. The name is not a common overloaded acronym or short token likely to be mistaken for another catalog skill."
},
"context_keywords": {
"context_keywords": [
"streaming",
"event-driven",
"real-time ingestion",
"shards",
"producers",
"consumers",
"Kinesis Data Streams",
"Kinesis Data Firehose",
"Kinesis Data Analytics",
"Lambda",
"S3",
"CloudWatch",
"partition key",
"checkpointing",
"throughput"
]
},
"maturity": {
"confidence": 0.89,
"maturity": "well_known",
"reasoning": "AWS Kinesis appears in many cloud/data engineering job postings and is a standard managed streaming service in AWS stacks; no vendor sunset indicates active market demand."
},
"skill_id": "kinesis",
"vendor_license": {
"confidence": 0.98,
"license": "proprietary",
"vendor": "Amazon Web Services",
"year_introduced": 2013
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Tools and patterns for ingesting, buffering, and transforming event streams with low latency. This includes continuous processing, windowing, stateful stream jobs, checkpointing, shard scaling, stream partitioning, and managed streaming services such as Kinesis, Amazon Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics.",
"exemplar_skills": [
"Streaming Data Processing"
],
"in_scope": "Skills, tools, and practices that belong under Streaming Data Processing for the target role, including items implied by the dimension rationale.",
"name": "Streaming Data Processing",
"out_of_scope": "Adjacent clusters explicitly not owned by Streaming Data Processing, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "d_merge_01"
}
],
"merge_log": [
{
"a_dim_id": "streaming-data-processing",
"a_name": "Streaming Data Processing",
"a_role": "__skill_focal__",
"b_dim_id": "streaming-data-processing",
"b_name": "Streaming Data Processing",
"b_role": "Data Engineer",
"into": "d_merge_01",
"into_name": "Streaming Data Processing",
"merged_from": [
"streaming-data-processing",
"streaming-data-processing"
],
"pair_kind": "cross_role",
"reasoning": "Both dims define the same skill cluster: low-latency ingestion and transformation of event streams. A includes Kinesis, shard scaling, checkpointing, and windowed processing; B describes continuous processing, windowing, and stateful stream jobs. The wording differs, but the substance is identical, and Kinesis is clearly part of the same streaming-processing backbone rather than a separate concept.",
"similarity": 0.7704853050706059
}
],
"placed": {
"name": "Kinesis",
"placement_confidence": 0.92,
"primary_dimension": "d_merge_01",
"reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [],
"skill_id": "kinesis"
},
"relationships": {
"child_skills": [],
"parent_skills": [
"aws"
],
"related_to": [
"gke",
"aks",
"ec2",
"aws-kms",
"aws-cdk",
"quicknode",
"the-graph",
"avalanche",
"event-emission",
"idempotency"
],
"requires": [],
"skill_id": "kinesis",
"suppress_on_match": []
},
"skill_id": "kinesis",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.93,
"name": "Kinesis",
"reasoning": "By the Platform vs Service rule, Kinesis is a specific managed capability within AWS rather than a standalone hosted environment, so it is a Service.",
"skill_id": "kinesis",
"subtype": "streaming_data_service",
"type": "Service"
},
"warnings": [
"stage3_post_filter_dropped_catalog_only_locked_dims:40-\u003e1"
]
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": null,
"display_name": "HTTP API Frameworks and Gateway Layers",
"id": null,
"rationale": "Server-side frameworks and gateway layers used to build, route, validate, and manage HTTP APIs. Includes backend web service frameworks and API gateway configuration for endpoints, request/response mapping, routing, input validation, auth integration, throttling, usage plans, stages, custom domains, and backend integration.",
"slug": "d_merge_01",
"source": "llm"
},
"input_skill": "Amazon API Gateway",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "Amazon API Gateway",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Service",
"skill_nature": "CLOUD_SERVICE",
"sub_category": "api_management_service",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "Amazon API Gateway is a specific AWS service name with little overlap in typical JDs; it is unlikely to be confused with a different catalog skill."
},
"context_keywords": {
"context_keywords": [
"REST APIs",
"HTTP APIs",
"Lambda proxy",
"OpenAPI",
"Swagger",
"CORS",
"authorizers",
"usage plans",
"throttling",
"stages",
"deployments",
"request validation",
"mapping templates",
"VPC Link",
"CloudWatch"
]
},
"maturity": {
"confidence": 0.95,
"maturity": "well_known",
"reasoning": "Broadly listed in cloud/backend JDs and AWS docs; commonly paired with Lambda, IAM, and serverless stacks, indicating staple market demand rather than niche use."
},
"skill_id": "amazon-api-gateway",
"vendor_license": {
"confidence": 0.98,
"license": "proprietary",
"vendor": "Amazon Web Services",
"year_introduced": 2015
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Server-side frameworks and gateway layers used to build, route, validate, and manage HTTP APIs. Includes backend web service frameworks and API gateway configuration for endpoints, request/response mapping, routing, input validation, auth integration, throttling, usage plans, stages, custom domains, and backend integration.",
"exemplar_skills": [
"HTTP API Frameworks and Gateway Layers"
],
"in_scope": "Skills, tools, and practices that belong under HTTP API Frameworks and Gateway Layers for the target role, including items implied by the dimension rationale.",
"name": "HTTP API Frameworks and Gateway Layers",
"out_of_scope": "Adjacent clusters explicitly not owned by HTTP API Frameworks and Gateway Layers, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "d_merge_01"
}
],
"merge_log": [
{
"a_dim_id": "web-service-frameworks",
"a_name": "Web Service Frameworks",
"a_role": "__skill_focal__",
"b_dim_id": "web-service-frameworks",
"b_name": "Web Service Frameworks",
"b_role": "Backend Engineer",
"into": "d_merge_01",
"into_name": "HTTP API Frameworks and Gateway Layers",
"merged_from": [
"web-service-frameworks",
"web-service-frameworks"
],
"pair_kind": "cross_role",
"reasoning": "Both dims target the same backend HTTP API cluster. A focuses on gateway-layer details like Amazon API Gateway, request/response mapping, authorizers, throttling, and backend integration. B describes the same server-side API framework space with routing, input validation, and backend service structure. The exemplar skills in A all fit B\u2019s scope, and there is no separate skill cluster here.",
"similarity": 0.762365718275353
}
],
"placed": {
"name": "Amazon API Gateway",
"placement_confidence": 0.92,
"primary_dimension": "d_merge_01",
"reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [],
"skill_id": "amazon-api-gateway"
},
"relationships": {
"child_skills": [],
"parent_skills": [
"aws"
],
"related_to": [
"rest-apis",
"aws-cloudformation",
"aws-cdk",
"aws-s3",
"aws-kms",
"aws-vpc",
"aws-direct-connect",
"azure-expressroute",
"azure-ad",
"azure-key-vault"
],
"requires": [
"aws-iam-review"
],
"skill_id": "amazon-api-gateway",
"suppress_on_match": []
},
"skill_id": "amazon-api-gateway",
"split_log": [],
"typed": {
"alternatives_considered": [
"Platform: ruled out \u2014 AWS is the platform, while API Gateway is one managed offering within it.",
"Tool: ruled out \u2014 it is consumed as a hosted managed service, not software you run yourself."
],
"confidence": 0.98,
"name": "Amazon API Gateway",
"reasoning": "By the Platform vs Service rule, Amazon API Gateway is a specific managed capability inside AWS rather than the whole hosted environment, so it is a Service.",
"skill_id": "amazon-api-gateway",
"subtype": "api_management_service",
"type": "Service"
},
"warnings": [
"stage3_post_filter_dropped_catalog_only_locked_dims:40-\u003e1"
]
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Python",
"alias_type": "CANONICAL",
"id": 608,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Python 2",
"alias_type": "VERSION",
"id": 611,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Python 2.x",
"alias_type": "VERSION",
"id": 613,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Python 3",
"alias_type": "VERSION",
"id": 612,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Python 3.10",
"alias_type": "VERSION",
"id": 2330,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Python 3.11",
"alias_type": "VERSION",
"id": 2331,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Python 3.12",
"alias_type": "VERSION",
"id": 2332,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Python 3.x",
"alias_type": "VERSION",
"id": 614,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "py2",
"alias_type": "VERSION",
"id": 609,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "py3",
"alias_type": "VERSION",
"id": 610,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "python 2",
"alias_type": "VERSION",
"id": 2152,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "python 2.x",
"alias_type": "VERSION",
"id": 2154,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "python 3",
"alias_type": "VERSION",
"id": 990,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "python 3.10",
"alias_type": "VERSION",
"id": 992,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "python 3.11",
"alias_type": "VERSION",
"id": 993,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "python 3.12",
"alias_type": "VERSION",
"id": 994,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "python 3.x",
"alias_type": "VERSION",
"id": 991,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "python2",
"alias_type": "VERSION",
"id": 2150,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "python3",
"alias_type": "VERSION",
"id": 989,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 5,
"display_name": "Python",
"id": 393,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LANGUAGE",
"slug": "python",
"sub_category_id": 54,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Analytical Programming Languages",
"id": 82,
"rationale": "Languages used to clean, transform, analyze, and prototype models in notebooks and scripts. This is the core coding surface for expressing statistical logic and data manipulation in a reproducible way.",
"slug": "analytical-programming-languages",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Analyst",
"id": 20,
"rationale": null,
"role_archetype": null,
"slug": "data-analyst",
"source": "db"
},
{
"display_name": "Data Scientist",
"id": 7,
"rationale": null,
"role_archetype": null,
"slug": "data-scientist",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Automation Scripting and CLI",
"id": 48,
"rationale": "Uses scripts and command-line tooling to execute repeatable Azure operations and reduce manual work. This is a practical cluster because the role frequently automates provisioning, checks, and remediation tasks.",
"slug": "automation-scripting-and-cli",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Azure Cloud Engineer",
"id": 4,
"rationale": null,
"role_archetype": null,
"slug": "azure-cloud-engineer",
"source": "db"
},
{
"display_name": "Cloud Engineer",
"id": 18,
"rationale": null,
"role_archetype": null,
"slug": "cloud-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Automation and Scripting for Operations",
"id": 361,
"rationale": "Scripts and lightweight automation used to execute repetitive virtualization tasks and enforce operational consistency. This is the practical glue that reduces manual host and VM administration.",
"slug": "automation-and-scripting-for-operations",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Virtualization Engineer",
"id": 26,
"rationale": null,
"role_archetype": null,
"slug": "virtualization-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Network Automation and Scripting",
"id": 285,
"rationale": "Covers scripts and automation used to configure, validate, and audit network devices and services. This cluster is coherent because repeatable network operations increasingly depend on programmatic changes and checks.",
"slug": "network-automation-and-scripting",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Network Engineer",
"id": 21,
"rationale": null,
"role_archetype": null,
"slug": "network-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for AI Workflows",
"id": 261,
"rationale": "Languages used to implement AI feature logic, orchestration, and response handling inside product code. This is the core coding surface for turning prompts and model calls into reliable application behavior.",
"slug": "programming-languages-for-ai-workflows",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 12,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Backend Systems",
"id": 140,
"rationale": "Languages used to implement server-side business logic, request handlers, workers, and service integrations. This is the core coding surface for backend feature delivery and maintenance.",
"slug": "programming-languages-for-backend-systems",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 14,
"rationale": null,
"role_archetype": null,
"slug": "backend-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 67,
"rationale": "Languages used to implement data pipelines, transformations, and operational utilities. This is the code layer for expressing extraction, parsing, validation, and orchestration logic in data engineering workflows.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for ML Systems",
"id": 113,
"rationale": "Languages used to implement model integration code, inference services, and feature-processing logic. This is the core coding surface for turning trained models into product-facing software components.",
"slug": "programming-languages-for-ml-systems",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Security Work",
"id": 328,
"rationale": "Languages used to automate security tasks, write detection logic, and build analysis or remediation tooling. This is the core coding surface for a cybersecurity engineer across scripts, queries, and small utilities.",
"slug": "programming-languages-for-security-work",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Test Automation",
"id": 193,
"rationale": "Languages used to implement automated checks, helper utilities, and test harness code. This is the core coding surface for turning test ideas into maintainable automation.",
"slug": "programming-languages-for-test-automation",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Automation Tester",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "automation-tester",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Security Automation and Scripting",
"id": 258,
"rationale": "Automating repeatable security checks, enrichment, and remediation workflows. This cluster is coherent because the role often needs lightweight automation to scale analysis and response.",
"slug": "security-automation-and-scripting",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
]
}
],
"input_skill": "Python",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "TensorFlow",
"alias_type": "CANONICAL",
"id": 862,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "TF1",
"alias_type": "VERSION",
"id": 863,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "TF2",
"alias_type": "VERSION",
"id": 864,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "TensorFlow 1",
"alias_type": "VERSION",
"id": 865,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "TensorFlow 1.x",
"alias_type": "VERSION",
"id": 867,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "TensorFlow 2",
"alias_type": "VERSION",
"id": 866,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "TensorFlow 2.x",
"alias_type": "VERSION",
"id": 868,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 6,
"display_name": "TensorFlow",
"id": 558,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LIBRARY",
"slug": "tensorflow",
"sub_category_id": 456,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Applied Machine Learning Toolkits",
"id": 94,
"rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
"slug": "applied-machine-learning-toolkits",
"source": "db"
},
"input_skill": "TensorFlow",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Scientist",
"id": 7,
"rationale": null,
"role_archetype": null,
"slug": "data-scientist",
"source": "db"
}
]
}
],
"input_skill": "TensorFlow",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "PyTorch",
"alias_type": "CANONICAL",
"id": 861,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 6,
"display_name": "PyTorch",
"id": 557,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LIBRARY",
"slug": "pytorch",
"sub_category_id": 456,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Applied Machine Learning Toolkits",
"id": 94,
"rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
"slug": "applied-machine-learning-toolkits",
"source": "db"
},
"input_skill": "PyTorch",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Scientist",
"id": 7,
"rationale": null,
"role_archetype": null,
"slug": "data-scientist",
"source": "db"
}
]
}
],
"input_skill": "PyTorch",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "scikit-learn",
"alias_type": "CANONICAL",
"id": 852,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 6,
"display_name": "scikit-learn",
"id": 554,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LIBRARY",
"slug": "scikit-learn",
"sub_category_id": 458,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Applied Machine Learning Toolkits",
"id": 94,
"rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
"slug": "applied-machine-learning-toolkits",
"source": "db"
},
"input_skill": "Scikit-learn",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Scientist",
"id": 7,
"rationale": null,
"role_archetype": null,
"slug": "data-scientist",
"source": "db"
}
]
}
],
"input_skill": "Scikit-learn",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "GitHub Actions",
"alias_type": "CANONICAL",
"id": 1800,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 14,
"display_name": "GitHub Actions",
"id": 1250,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CLOUD_SERVICE",
"slug": "github-actions",
"sub_category_id": 1019,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Continuous Integration Test Integration",
"id": 207,
"rationale": "Integrating automated checks into shared build and merge workflows so results are repeatable and visible. This cluster is coherent because automation testers commonly configure test execution triggers, artifacts, and reporting hooks.",
"slug": "continuous-integration-test-integration",
"source": "db"
},
"input_skill": "GitHub Actions",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Automation Tester",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "automation-tester",
"source": "db"
}
]
}
],
"input_skill": "GitHub Actions",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Airflow",
"alias_type": "CANONICAL",
"id": 540,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 11,
"display_name": "Airflow",
"id": 325,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "airflow",
"sub_category_id": 335,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Workflow Orchestration Systems",
"id": 64,
"rationale": "Operational orchestration of ML jobs, dependencies, and handoffs across training, validation, deployment, and retraining. This is a useful split from training pipelines because it emphasizes the scheduler and control plane.",
"slug": "workflow-orchestration-systems",
"source": "db"
},
"input_skill": "Airflow",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "mlops-engineer",
"source": "db"
}
]
}
],
"input_skill": "Airflow",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Terraform",
"alias_type": "CANONICAL",
"id": 290,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 11,
"display_name": "Terraform",
"id": 144,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "terraform",
"sub_category_id": 171,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Infrastructure Provisioning Templates",
"id": 291,
"rationale": "Declarative templates and modules used to create repeatable cloud resources and environments. This cluster covers the infrastructure definitions the role applies, reviews, and updates to keep environments consistent.",
"slug": "infrastructure-provisioning-templates",
"source": "db"
},
"input_skill": "Terraform",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Engineer",
"id": 18,
"rationale": null,
"role_archetype": null,
"slug": "cloud-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Infrastructure as Code",
"id": 22,
"rationale": "Defines infrastructure and platform resources through versioned code so environments are repeatable and reviewable. This is a coherent cluster because it underpins environment consistency and change control.",
"slug": "infrastructure-as-code",
"source": "db"
},
"input_skill": "Terraform",
"llm_role": null,
"roles_from_db": [
{
"display_name": "DevOps Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
"slug": "devops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Infrastructure as Code and Declarative Provisioning",
"id": 36,
"rationale": "Defines cloud and platform infrastructure declaratively through versioned code so environments are repeatable, reviewable, and automatable. This includes authoring and maintaining IaC templates/modules, managing parameters and state, and using plan/apply workflows to provision and update resources across Azure and other cloud platforms.",
"slug": "infrastructure-as-code-and-declarative-provisioning",
"source": "db"
},
"input_skill": "Terraform",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Azure Cloud Engineer",
"id": 4,
"rationale": null,
"role_archetype": null,
"slug": "azure-cloud-engineer",
"source": "db"
}
]
}
],
"input_skill": "Terraform",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Docker",
"alias_type": "CANONICAL",
"id": 299,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 11,
"display_name": "Docker",
"id": 153,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "docker",
"sub_category_id": 170,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Containerization and Image Delivery",
"id": 24,
"rationale": "Builds, packages, and ships application and support workloads as container images. This cluster covers the artifact format and the mechanics of producing deployable images.",
"slug": "containerization-and-image-delivery",
"source": "db"
},
"input_skill": "Docker",
"llm_role": null,
"roles_from_db": [
{
"display_name": "DevOps Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
"slug": "devops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Model Serving Deployment and Runtime Packaging",
"id": 52,
"rationale": "Operational deployment of trained models into online, batch, or streaming serving environments, including packaging models and model servers into containers or managed inference runtimes, coordinating rollout, and handing off to inference systems. Covers serving frameworks and platforms such as TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, and Seldon Core, plus container/runtime concerns like Docker images, GPU-enabled containers, base image selection, container entrypoints, runtime dependencies, and image scanning for model services.",
"slug": "model-serving-deployment-and-runtime-packaging",
"source": "db"
},
"input_skill": "Docker",
"llm_role": null,
"roles_from_db": [
{
"display_name": "MLOps Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "mlops-engineer",
"source": "db"
},
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
}
]
}
],
"input_skill": "Docker",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Kubernetes",
"alias_type": "CANONICAL",
"id": 304,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.0",
"alias_type": "VERSION",
"id": 307,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.0+",
"alias_type": "VERSION",
"id": 2366,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.1",
"alias_type": "VERSION",
"id": 308,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.10",
"alias_type": "VERSION",
"id": 318,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.11",
"alias_type": "VERSION",
"id": 319,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.12",
"alias_type": "VERSION",
"id": 320,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.13",
"alias_type": "VERSION",
"id": 321,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.14",
"alias_type": "VERSION",
"id": 322,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.15",
"alias_type": "VERSION",
"id": 323,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.16",
"alias_type": "VERSION",
"id": 324,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.17",
"alias_type": "VERSION",
"id": 325,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.18",
"alias_type": "VERSION",
"id": 326,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.19",
"alias_type": "VERSION",
"id": 327,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.2",
"alias_type": "VERSION",
"id": 309,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.20",
"alias_type": "VERSION",
"id": 328,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.21",
"alias_type": "VERSION",
"id": 329,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.22",
"alias_type": "VERSION",
"id": 330,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.23",
"alias_type": "VERSION",
"id": 331,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.24",
"alias_type": "VERSION",
"id": 332,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.25",
"alias_type": "VERSION",
"id": 333,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.26",
"alias_type": "VERSION",
"id": 334,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.27",
"alias_type": "VERSION",
"id": 335,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.28",
"alias_type": "VERSION",
"id": 336,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.29",
"alias_type": "VERSION",
"id": 337,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.3",
"alias_type": "VERSION",
"id": 310,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.30",
"alias_type": "VERSION",
"id": 338,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.4",
"alias_type": "VERSION",
"id": 311,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.5",
"alias_type": "VERSION",
"id": 312,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.6",
"alias_type": "VERSION",
"id": 313,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.7",
"alias_type": "VERSION",
"id": 314,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.8",
"alias_type": "VERSION",
"id": 315,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.9",
"alias_type": "VERSION",
"id": 316,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.x",
"alias_type": "VERSION",
"id": 317,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes v1",
"alias_type": "VERSION",
"id": 306,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "k8s",
"alias_type": "VERSION",
"id": 305,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 13,
"display_name": "Kubernetes",
"id": 158,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "kubernetes",
"sub_category_id": 1524,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Orchestration Platforms",
"id": 25,
"rationale": "Operates the platforms that schedule and run containerized workloads and related deployment primitives. This is separate from image delivery because it concerns runtime placement and service rollout behavior.",
"slug": "orchestration-platforms",
"source": "db"
},
"input_skill": "Kubernetes",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Engineer",
"id": 18,
"rationale": null,
"role_archetype": null,
"slug": "cloud-engineer",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
"slug": "devops-engineer",
"source": "db"
}
]
}
],
"input_skill": "Kubernetes",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Model Runtime Services",
"id": 121,
"rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
"slug": "cloud-model-runtime-services",
"source": "db"
},
"input_skill": "Pinecone",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
}
]
}
],
"input_skill": "Pinecone",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Platform",
"skill_nature": "PLATFORM",
"sub_category": "vector_database_platform",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "EMERGING"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "Pinecone is a distinctive vector database platform name; in typical JDs it is unlikely to be confused with another catalog skill."
},
"context_keywords": {
"context_keywords": [
"vector database",
"embeddings",
"semantic search",
"similarity search",
"ANN",
"approximate nearest neighbor",
"RAG",
"retrieval augmented generation",
"indexing",
"namespace",
"metadata filtering",
"upsert",
"vector index",
"hybrid search",
"OpenAI"
]
},
"maturity": {
"confidence": 0.86,
"maturity": "emerging",
"reasoning": "Pinecone appears in many AI/vector-search job descriptions and vendor docs, but it\u2019s still far less universal than PostgreSQL/AWS; market signal shows growing adoption rather than staple status."
},
"skill_id": "pinecone",
"vendor_license": {
"confidence": 0.95,
"license": "proprietary",
"vendor": "Pinecone Systems, Inc.",
"year_introduced": 2019
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Managed services used to store, index, and query embeddings for semantic search and retrieval-augmented applications. Pinecone belongs here because it is a purpose-built vector database service rather than a general-purpose datastore.",
"exemplar_skills": [
"Pinecone",
"vector database",
"similarity search",
"embedding index",
"metadata filtering",
"approximate nearest neighbor search"
],
"in_scope": "Pinecone, vector indexes, similarity search, embedding storage, metadata filtering, ANN retrieval, namespace partitioning, hybrid search",
"name": "Vector Database Services",
"out_of_scope": "traditional relational databases, document stores, cache layers, model training pipelines, and prompt engineering, all of which belong to other dimensions",
"overlap_flags": [
{
"reason": "Vector databases are often consumed as part of broader data platforms, but this dimension focuses specifically on managed vector retrieval services.",
"with_dim_id": "cloud-data-platform-services",
"with_dim_name": null,
"with_role": "Data Engineer"
},
{
"reason": "Pinecone is frequently used inside AI application architectures, though the service itself is the storage/retrieval layer rather than the overall system design.",
"with_dim_id": "ai-service-architecture-patterns",
"with_dim_name": null,
"with_role": "AI Engineer"
}
],
"tentative_id": "cloud-model-runtime-services"
}
],
"merge_log": [],
"placed": {
"name": "Pinecone",
"placement_confidence": 0.92,
"primary_dimension": "cloud-model-runtime-services",
"reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [],
"skill_id": "pinecone"
},
"relationships": {
"child_skills": [],
"parent_skills": [],
"related_to": [
"pandas",
"quicknode",
"avalanche",
"aptos",
"helm",
"snapshot",
"gcp-security-command-center",
"anchor",
"gke",
"hardhat"
],
"requires": [],
"skill_id": "pinecone",
"suppress_on_match": []
},
"skill_id": "pinecone",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.9,
"name": "Pinecone",
"reasoning": "By the Vendor SaaS = Platform rule, Pinecone is a hosted multi-tenant vector database service consumed via APIs rather than software you run yourself.",
"skill_id": "pinecone",
"subtype": "vector_database_platform",
"type": "Platform"
},
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Data Platform Services",
"id": 81,
"rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
"slug": "cloud-data-platform-services",
"source": "db"
},
"input_skill": "OpenSearch",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Version Control Systems",
"id": 365,
"rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "OpenSearch",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "OpenSearch",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Datastore",
"skill_nature": "TOOL",
"sub_category": "search_engine_datastore",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "EMERGING"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "OpenSearch is a specific search engine/datastore name with little overlap in typical JDs; it is unlikely to be mistaken for another catalog skill."
},
"context_keywords": {
"context_keywords": [
"Elasticsearch",
"Kibana",
"Lucene",
"index mapping",
"shards",
"replicas",
"full-text search",
"aggregations",
"query DSL",
"ingest pipeline",
"cluster management",
"index templates",
"analyzers",
"vector search",
"OpenSearch Dashboards"
]
},
"maturity": {
"confidence": 0.84,
"maturity": "emerging",
"reasoning": "OpenSearch appears in growing numbers of JDs for search/log analytics, but Elasticsearch still dominates most postings; AWS also continues to position it as the open-source successor to Elasticsearch."
},
"skill_id": "opensearch",
"vendor_license": {
"confidence": 0.98,
"license": "apache_2",
"vendor": "OpenSearch Project",
"year_introduced": 2021
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Managed search and indexing services used to store, query, and analyze large document or event datasets. OpenSearch belongs here because it is commonly used as a search engine and analytics backend in cloud data platforms.",
"exemplar_skills": [
"OpenSearch",
"Amazon OpenSearch Service",
"Elasticsearch",
"full-text search",
"indexing",
"search aggregations",
"query DSL"
],
"in_scope": "OpenSearch, Amazon OpenSearch Service, Elasticsearch-compatible search clusters, index management, full-text search, faceting, aggregations, query DSL, shard and replica configuration, ingest pipelines, search dashboards",
"name": "Search and Analytics Services",
"out_of_scope": "Streaming ingestion logic and event processing, which belong to streaming-data-processing; application-side API calls to search endpoints, which belong to api-integration-and-data-fetching; generic database administration, which belongs to storage-provisioning-and-automation",
"overlap_flags": [
{
"reason": "OpenSearch often stores and queries logs/metrics, so operational observability use cases can overlap with monitoring systems.",
"with_dim_id": "monitoring-and-alerting",
"with_dim_name": null,
"with_role": "Azure Cloud Engineer"
},
{
"reason": "OpenSearch is frequently consumed as a managed cloud data service rather than only as a standalone search engine.",
"with_dim_id": "cloud-data-platform-services",
"with_dim_name": null,
"with_role": "Data Engineer"
}
],
"tentative_id": "cloud-data-platform-services"
},
{
"description": "Operational setup and tuning of search clusters, indexes, and query behavior. This fits OpenSearch when the skill emphasis is on running and configuring the search engine itself rather than integrating it into an application.",
"exemplar_skills": [
"OpenSearch",
"index mappings",
"shard allocation",
"analyzers",
"reindexing",
"snapshot and restore",
"query performance tuning"
],
"in_scope": "OpenSearch, cluster sizing, shard allocation, index templates, analyzers, mappings, refresh intervals, replicas, query performance tuning, reindexing, snapshot and restore",
"name": "Search Engine Administration",
"out_of_scope": "Frontend or backend API integration with search results, which belongs to api-integration-and-data-fetching; general cloud provisioning, which belongs to infrastructure-provisioning-templates; log dashboarding and alerting, which belongs to monitoring-and-alerting",
"overlap_flags": [
{
"reason": "Search cluster sizing and query tuning can overlap with broader scalability work when the focus is capacity and throughput.",
"with_dim_id": "scalability-and-performance-architecture",
"with_dim_name": null,
"with_role": "Cloud Architect"
}
],
"tentative_id": "d_init_01"
}
],
"merge_log": [],
"placed": {
"name": "OpenSearch",
"placement_confidence": 0.92,
"primary_dimension": "cloud-data-platform-services",
"reasoning": "Deterministic JD placement: locked_dimensions has 2 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [
"d_init_01"
],
"skill_id": "opensearch"
},
"relationships": {
"child_skills": [],
"parent_skills": [],
"related_to": [
"microsoft-sentinel",
"rapid7-insightvm",
"owasp-top-10",
"tls-internals",
"rest-apis",
"aws-cdk",
"gke",
"aks",
"go",
"javascript"
],
"requires": [],
"skill_id": "opensearch",
"suppress_on_match": []
},
"skill_id": "opensearch",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.93,
"name": "OpenSearch",
"reasoning": "OpenSearch is fundamentally a persistent search and analytics datastore, and under the Datastore vs Format rule it fits Datastore because it stores and indexes data rather than merely defining a format.",
"skill_id": "opensearch",
"subtype": "search_engine_datastore",
"type": "Datastore"
},
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": null,
"display_name": "Applied Machine Learning Toolkits and Frameworks",
"id": null,
"rationale": "Libraries and frameworks used to prototype, compare, index, and evaluate machine learning solutions quickly. Includes applied-ML toolkits such as scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, NumPy, pandas, FAISS, vector search libraries, and embedding utilities. Excludes serving runtimes, deployment containers, online inference infrastructure, streaming systems, and database/storage engines.",
"slug": "d_merge_01",
"source": "llm"
},
"input_skill": "FAISS",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Version Control Systems",
"id": 365,
"rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "FAISS",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "FAISS",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Library",
"skill_nature": "LIBRARY",
"sub_category": "vector_search_library",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "EMERGING"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "FAISS is a distinctive library name for vector similarity search; in typical JDs it is unlikely to be confused with another catalog skill."
},
"context_keywords": {
"context_keywords": [
"approximate nearest neighbor",
"ANN",
"vector index",
"similarity search",
"embeddings",
"cosine similarity",
"L2 distance",
"IVF",
"HNSW",
"PQ",
"flat index",
"GPU acceleration",
"k-NN",
"semantic search",
"re-ranking"
]
},
"maturity": {
"confidence": 0.84,
"maturity": "emerging",
"reasoning": "FAISS appears in many ML/vector-search job descriptions and is widely used in RAG stacks, but it\u2019s still less universal than Elasticsearch/PostgreSQL; market demand is growing rather than ubiquitous."
},
"skill_id": "faiss",
"vendor_license": {
"confidence": 0.99,
"license": "mit",
"vendor": "Meta",
"year_introduced": 2017
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Libraries and frameworks used to prototype, compare, index, and evaluate machine learning solutions quickly. Includes applied-ML toolkits such as scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, NumPy, pandas, FAISS, vector search libraries, and embedding utilities. Excludes serving runtimes, deployment containers, online inference infrastructure, streaming systems, and database/storage engines.",
"exemplar_skills": [
"Applied Machine Learning Toolkits and Frameworks"
],
"in_scope": "Skills, tools, and practices that belong under Applied Machine Learning Toolkits and Frameworks for the target role, including items implied by the dimension rationale.",
"name": "Applied Machine Learning Toolkits and Frameworks",
"out_of_scope": "Adjacent clusters explicitly not owned by Applied Machine Learning Toolkits and Frameworks, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "d_merge_01"
},
{
"description": "Index structures and libraries for approximate nearest-neighbor search over embeddings and feature vectors. FAISS fits strongly here because it is primarily used to build and query high-performance vector indexes for retrieval.",
"exemplar_skills": [
"FAISS",
"approximate nearest neighbor search",
"vector indexes",
"embedding retrieval",
"similarity search",
"IVF indexing",
"HNSW",
"product quantization"
],
"in_scope": "FAISS, approximate nearest neighbor search, vector indexes, embedding retrieval, similarity search, IVF indexes, HNSW indexes, PQ compression, cosine similarity, L2 distance",
"name": "Vector Search Indexing",
"out_of_scope": "General machine learning model training and experimentation, which belongs to applied-machine-learning-toolkits; database query planning and relational indexing, which belong to data platform or storage-related dimensions; full-text search engines, which are a separate search dimension",
"overlap_flags": [
{
"reason": "Vector search tooling is often learned alongside ML libraries, but this dimension is specifically about retrieval index structures.",
"with_dim_id": "applied-machine-learning-toolkits",
"with_dim_name": null,
"with_role": "Data Scientist"
},
{
"reason": "Vector search is commonly part of AI application architecture, though that dimension focuses on system design rather than the indexing mechanism.",
"with_dim_id": "ai-service-architecture-patterns",
"with_dim_name": null,
"with_role": "AI Engineer"
}
],
"tentative_id": "d_init_01"
}
],
"merge_log": [
{
"a_dim_id": "applied-machine-learning-toolkits",
"a_name": "Applied Machine Learning Toolkits",
"a_role": "__skill_focal__",
"b_dim_id": "applied-machine-learning-toolkits",
"b_name": "Applied Machine Learning Toolkits",
"b_role": "Data Scientist",
"into": "d_merge_01",
"into_name": "Applied Machine Learning Toolkits and Frameworks",
"merged_from": [
"applied-machine-learning-toolkits",
"applied-machine-learning-toolkits"
],
"pair_kind": "cross_role",
"reasoning": "Both dims describe the same applied-ML toolkit layer: libraries/frameworks for prototyping, comparing, indexing, and evaluating models. Dim A lists concrete examples like scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, NumPy, pandas, and FAISS/vector search; Dim B says the same thing in broader terms and frames it as the tooling beneath modeling and evaluation. No distinct skill cluster appears in B, so this is a merge.",
"similarity": 0.6779459938694667
}
],
"placed": {
"name": "FAISS",
"placement_confidence": 0.92,
"primary_dimension": "d_merge_01",
"reasoning": "Deterministic JD placement: locked_dimensions has 2 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [
"d_init_01"
],
"skill_id": "faiss"
},
"relationships": {
"child_skills": [],
"parent_skills": [],
"related_to": [
"mfa",
"amm",
"spl",
"iscsi",
"owasp-top-10",
"foundry-fuzzing",
"hsm",
"bls",
"dex",
"aks"
],
"requires": [],
"skill_id": "faiss",
"suppress_on_match": []
},
"skill_id": "faiss",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.93,
"name": "FAISS",
"reasoning": "FAISS is fundamentally a code package imported by applications for similarity search, so under the Tool vs Framework rule it fits Library rather than a user-operated tool or hosted platform.",
"skill_id": "faiss",
"subtype": "vector_search_library",
"type": "Library"
},
"warnings": [
"stage3_post_filter_dropped_catalog_only_locked_dims:41-\u003e2"
]
},
"source_tag": "llm",
"was_in_llm_skills": true
}
],
"unmatched_skills": [
"AI/ML",
"Amazon SageMaker",
"Amazon Bedrock",
"AWS Lambda",
"ECS",
"Amazon Athena",
"AWS Data Pipeline",
"S3",
"Kinesis",
"Amazon API Gateway",
"Pinecone",
"OpenSearch",
"FAISS"
]
}
API 3 — final-role-output
{
"chosen_role": {
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": "The primary skills include a strong focus on AWS and AI/ML technologies, which aligns well with the role of a Machine Learning Engineer.",
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
},
"chosen_role_resolution": "in_db",
"final_input_skills": [
{
"skill": "AWS",
"tag": "in_db"
},
{
"skill": "AI/ML",
"tag": "new"
},
{
"skill": "Amazon SageMaker",
"tag": "new"
},
{
"skill": "Amazon Bedrock",
"tag": "new"
},
{
"skill": "AWS Lambda",
"tag": "new"
},
{
"skill": "ECS",
"tag": "new"
},
{
"skill": "EKS",
"tag": "in_db"
},
{
"skill": "EC2",
"tag": "in_db"
},
{
"skill": "AWS Glue",
"tag": "in_db"
},
{
"skill": "Amazon Athena",
"tag": "new"
},
{
"skill": "Redshift",
"tag": "in_db"
},
{
"skill": "AWS Data Pipeline",
"tag": "new"
},
{
"skill": "S3",
"tag": "new"
},
{
"skill": "Kinesis",
"tag": "new"
},
{
"skill": "Amazon API Gateway",
"tag": "new"
},
{
"skill": "Python",
"tag": "in_db"
},
{
"skill": "TensorFlow",
"tag": "in_db"
},
{
"skill": "PyTorch",
"tag": "in_db"
},
{
"skill": "Scikit-learn",
"tag": "in_db"
},
{
"skill": "GitHub Actions",
"tag": "in_db"
},
{
"skill": "Airflow",
"tag": "in_db"
},
{
"skill": "Terraform",
"tag": "in_db"
},
{
"skill": "Docker",
"tag": "in_db"
},
{
"skill": "Kubernetes",
"tag": "in_db"
},
{
"skill": "Pinecone",
"tag": "new"
},
{
"skill": "OpenSearch",
"tag": "new"
},
{
"skill": "FAISS",
"tag": "new"
}
],
"persistence": {
"items": [
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platform Operations",
"id": 26,
"rationale": "Uses cloud provider services to support delivery and runtime environments. The focus is on consumer-level operation of cloud services rather than deep cloud architecture ownership.",
"slug": "cloud-platform-operations",
"source": "db"
},
"dimension_id": 26,
"input_skill": "AWS",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "DevOps Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
"slug": "devops-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 163,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Security Platforms",
"id": 332,
"rationale": "Cloud-native security products used to assess posture, detect misconfigurations, and monitor workloads across AWS, Azure, and GCP. This is a distinct product family because the role often works across multiple CNAPP/CSPM/CWPP offerings and cloud-native detectors.",
"slug": "cloud-security-platforms",
"source": "db"
},
"dimension_id": 332,
"input_skill": "AWS",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 163,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Model Runtime Services",
"id": 121,
"rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
"slug": "cloud-model-runtime-services",
"source": "db"
},
"dimension_id": 121,
"input_skill": "EKS",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 725,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Orchestration Platforms",
"id": 25,
"rationale": "Operates the platforms that schedule and run containerized workloads and related deployment primitives. This is separate from image delivery because it concerns runtime placement and service rollout behavior.",
"slug": "orchestration-platforms",
"source": "db"
},
"dimension_id": 25,
"input_skill": "EKS",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cloud Engineer",
"id": 18,
"rationale": null,
"role_archetype": null,
"slug": "cloud-engineer",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
"slug": "devops-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 725,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Provider Core Services",
"id": 290,
"rationale": "Core managed services used to provision and operate cloud environments. This is the base cloud surface for compute, storage, networking, and platform primitives the role configures and maintains.",
"slug": "cloud-provider-core-services",
"source": "db"
},
"dimension_id": 290,
"input_skill": "EC2",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cloud Engineer",
"id": 18,
"rationale": null,
"role_archetype": null,
"slug": "cloud-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1773,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Data Platform Services",
"id": 81,
"rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
"slug": "cloud-data-platform-services",
"source": "db"
},
"dimension_id": 81,
"input_skill": "AWS Glue",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 466,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Data Warehousing Platforms",
"id": 72,
"rationale": "Cloud and on-prem analytical storage systems used to persist curated datasets and serve downstream consumers. This cluster is about the warehouse/lakehouse layer where transformed data is organized for access.",
"slug": "data-warehousing-platforms",
"source": "db"
},
"dimension_id": 72,
"input_skill": "Redshift",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 2570,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Analytical Programming Languages",
"id": 82,
"rationale": "Languages used to clean, transform, analyze, and prototype models in notebooks and scripts. This is the core coding surface for expressing statistical logic and data manipulation in a reproducible way.",
"slug": "analytical-programming-languages",
"source": "db"
},
"dimension_id": 82,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Data Analyst",
"id": 20,
"rationale": null,
"role_archetype": null,
"slug": "data-analyst",
"source": "db"
},
{
"display_name": "Data Scientist",
"id": 7,
"rationale": null,
"role_archetype": null,
"slug": "data-scientist",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 393,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Automation Scripting and CLI",
"id": 48,
"rationale": "Uses scripts and command-line tooling to execute repeatable Azure operations and reduce manual work. This is a practical cluster because the role frequently automates provisioning, checks, and remediation tasks.",
"slug": "automation-scripting-and-cli",
"source": "db"
},
"dimension_id": 48,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Azure Cloud Engineer",
"id": 4,
"rationale": null,
"role_archetype": null,
"slug": "azure-cloud-engineer",
"source": "db"
},
{
"display_name": "Cloud Engineer",
"id": 18,
"rationale": null,
"role_archetype": null,
"slug": "cloud-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 393,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Automation and Scripting for Operations",
"id": 361,
"rationale": "Scripts and lightweight automation used to execute repetitive virtualization tasks and enforce operational consistency. This is the practical glue that reduces manual host and VM administration.",
"slug": "automation-and-scripting-for-operations",
"source": "db"
},
"dimension_id": 361,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Virtualization Engineer",
"id": 26,
"rationale": null,
"role_archetype": null,
"slug": "virtualization-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 393,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Network Automation and Scripting",
"id": 285,
"rationale": "Covers scripts and automation used to configure, validate, and audit network devices and services. This cluster is coherent because repeatable network operations increasingly depend on programmatic changes and checks.",
"slug": "network-automation-and-scripting",
"source": "db"
},
"dimension_id": 285,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Network Engineer",
"id": 21,
"rationale": null,
"role_archetype": null,
"slug": "network-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 393,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for AI Workflows",
"id": 261,
"rationale": "Languages used to implement AI feature logic, orchestration, and response handling inside product code. This is the core coding surface for turning prompts and model calls into reliable application behavior.",
"slug": "programming-languages-for-ai-workflows",
"source": "db"
},
"dimension_id": 261,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 12,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 393,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Backend Systems",
"id": 140,
"rationale": "Languages used to implement server-side business logic, request handlers, workers, and service integrations. This is the core coding surface for backend feature delivery and maintenance.",
"slug": "programming-languages-for-backend-systems",
"source": "db"
},
"dimension_id": 140,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 14,
"rationale": null,
"role_archetype": null,
"slug": "backend-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 393,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 67,
"rationale": "Languages used to implement data pipelines, transformations, and operational utilities. This is the code layer for expressing extraction, parsing, validation, and orchestration logic in data engineering workflows.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"dimension_id": 67,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 393,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for ML Systems",
"id": 113,
"rationale": "Languages used to implement model integration code, inference services, and feature-processing logic. This is the core coding surface for turning trained models into product-facing software components.",
"slug": "programming-languages-for-ml-systems",
"source": "db"
},
"dimension_id": 113,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 393,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Security Work",
"id": 328,
"rationale": "Languages used to automate security tasks, write detection logic, and build analysis or remediation tooling. This is the core coding surface for a cybersecurity engineer across scripts, queries, and small utilities.",
"slug": "programming-languages-for-security-work",
"source": "db"
},
"dimension_id": 328,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 393,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Test Automation",
"id": 193,
"rationale": "Languages used to implement automated checks, helper utilities, and test harness code. This is the core coding surface for turning test ideas into maintainable automation.",
"slug": "programming-languages-for-test-automation",
"source": "db"
},
"dimension_id": 193,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Automation Tester",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "automation-tester",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 393,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Security Automation and Scripting",
"id": 258,
"rationale": "Automating repeatable security checks, enrichment, and remediation workflows. This cluster is coherent because the role often needs lightweight automation to scale analysis and response.",
"slug": "security-automation-and-scripting",
"source": "db"
},
"dimension_id": 258,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 393,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Applied Machine Learning Toolkits",
"id": 94,
"rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
"slug": "applied-machine-learning-toolkits",
"source": "db"
},
"dimension_id": 94,
"input_skill": "TensorFlow",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Data Scientist",
"id": 7,
"rationale": null,
"role_archetype": null,
"slug": "data-scientist",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 558,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Applied Machine Learning Toolkits",
"id": 94,
"rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
"slug": "applied-machine-learning-toolkits",
"source": "db"
},
"dimension_id": 94,
"input_skill": "PyTorch",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Data Scientist",
"id": 7,
"rationale": null,
"role_archetype": null,
"slug": "data-scientist",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 557,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Applied Machine Learning Toolkits",
"id": 94,
"rationale": "Libraries and frameworks used to prototype and compare models quickly. This dimension captures the concrete tooling layer beneath modeling methods and evaluation.",
"slug": "applied-machine-learning-toolkits",
"source": "db"
},
"dimension_id": 94,
"input_skill": "Scikit-learn",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Data Scientist",
"id": 7,
"rationale": null,
"role_archetype": null,
"slug": "data-scientist",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 554,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Continuous Integration Test Integration",
"id": 207,
"rationale": "Integrating automated checks into shared build and merge workflows so results are repeatable and visible. This cluster is coherent because automation testers commonly configure test execution triggers, artifacts, and reporting hooks.",
"slug": "continuous-integration-test-integration",
"source": "db"
},
"dimension_id": 207,
"input_skill": "GitHub Actions",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Automation Tester",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "automation-tester",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1250,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Workflow Orchestration Systems",
"id": 64,
"rationale": "Operational orchestration of ML jobs, dependencies, and handoffs across training, validation, deployment, and retraining. This is a useful split from training pipelines because it emphasizes the scheduler and control plane.",
"slug": "workflow-orchestration-systems",
"source": "db"
},
"dimension_id": 64,
"input_skill": "Airflow",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "mlops-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 325,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Infrastructure Provisioning Templates",
"id": 291,
"rationale": "Declarative templates and modules used to create repeatable cloud resources and environments. This cluster covers the infrastructure definitions the role applies, reviews, and updates to keep environments consistent.",
"slug": "infrastructure-provisioning-templates",
"source": "db"
},
"dimension_id": 291,
"input_skill": "Terraform",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cloud Engineer",
"id": 18,
"rationale": null,
"role_archetype": null,
"slug": "cloud-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 144,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Infrastructure as Code",
"id": 22,
"rationale": "Defines infrastructure and platform resources through versioned code so environments are repeatable and reviewable. This is a coherent cluster because it underpins environment consistency and change control.",
"slug": "infrastructure-as-code",
"source": "db"
},
"dimension_id": 22,
"input_skill": "Terraform",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "DevOps Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
"slug": "devops-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 144,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Infrastructure as Code and Declarative Provisioning",
"id": 36,
"rationale": "Defines cloud and platform infrastructure declaratively through versioned code so environments are repeatable, reviewable, and automatable. This includes authoring and maintaining IaC templates/modules, managing parameters and state, and using plan/apply workflows to provision and update resources across Azure and other cloud platforms.",
"slug": "infrastructure-as-code-and-declarative-provisioning",
"source": "db"
},
"dimension_id": 36,
"input_skill": "Terraform",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Azure Cloud Engineer",
"id": 4,
"rationale": null,
"role_archetype": null,
"slug": "azure-cloud-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 144,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Containerization and Image Delivery",
"id": 24,
"rationale": "Builds, packages, and ships application and support workloads as container images. This cluster covers the artifact format and the mechanics of producing deployable images.",
"slug": "containerization-and-image-delivery",
"source": "db"
},
"dimension_id": 24,
"input_skill": "Docker",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "DevOps Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
"slug": "devops-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 153,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Model Serving Deployment and Runtime Packaging",
"id": 52,
"rationale": "Operational deployment of trained models into online, batch, or streaming serving environments, including packaging models and model servers into containers or managed inference runtimes, coordinating rollout, and handing off to inference systems. Covers serving frameworks and platforms such as TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, and Seldon Core, plus container/runtime concerns like Docker images, GPU-enabled containers, base image selection, container entrypoints, runtime dependencies, and image scanning for model services.",
"slug": "model-serving-deployment-and-runtime-packaging",
"source": "db"
},
"dimension_id": 52,
"input_skill": "Docker",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "MLOps Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "mlops-engineer",
"source": "db"
},
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 153,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Orchestration Platforms",
"id": 25,
"rationale": "Operates the platforms that schedule and run containerized workloads and related deployment primitives. This is separate from image delivery because it concerns runtime placement and service rollout behavior.",
"slug": "orchestration-platforms",
"source": "db"
},
"dimension_id": 25,
"input_skill": "Kubernetes",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cloud Engineer",
"id": 18,
"rationale": null,
"role_archetype": null,
"slug": "cloud-engineer",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A DevOps Engineer enables reliable, repeatable delivery of software by designing and operating the processes that connect development and production. They focus on improving deployment flow, operational stability, and collaboration between teams through automation, standardization, and monitoring of delivery and runtime practices.",
"slug": "devops-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 158,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": null,
"display_name": "Applied Machine Learning Tooling and Frameworks",
"id": null,
"rationale": "Libraries and frameworks used to prototype, build, train, tune, and compare machine learning models in practice. Includes hands-on AI/ML tooling such as scikit-learn, XGBoost, LightGBM, TensorFlow, and PyTorch, along with model training, feature engineering, hyperparameter tuning, and evaluation workflows. Excludes model serving, deployment packaging, online inference, registry/version promotion, distributed stream processing, and cloud infrastructure provisioning.",
"slug": "d_merge_01",
"source": "llm"
},
"dimension_id": 94,
"input_skill": "AI/ML",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2611,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": null,
"display_name": "AI Service Integration and Orchestration Patterns",
"id": null,
"rationale": "Patterns for structuring and embedding AI/ML capabilities within products and services, including deciding where AI logic lives and how it interacts with application flows. Covers model-backed APIs, retrieval-augmented generation, agent and workflow orchestration, batch scoring services, online inference integration, feature pipelines, and placement across handlers, workers, gateways, and service boundaries.",
"slug": "d_merge_02",
"source": "llm"
},
"dimension_id": 270,
"input_skill": "AI/ML",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2611,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "AI Inference Cost, Latency, and Throughput Optimization",
"id": 260,
"rationale": "Improving the speed, throughput, and cost efficiency of AI and ML-powered product features without sacrificing correctness or user experience. Includes token budgeting, prompt compression, batching, caching, model selection, quantization, pruning, async inference, warm starts, streaming UX, timeout tuning, concurrency control, and profiling. Excludes infrastructure autoscaling, model serving capacity planning, generic backend performance tuning, and unrelated data/warehouse optimization.",
"slug": "ai-inference-cost-latency-and-throughput-optimization",
"source": "db"
},
"dimension_id": 260,
"input_skill": "AI/ML",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 12,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 2611,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": null,
"display_name": "Managed ML Platform Workflows",
"id": null,
"rationale": "Cloud ML platforms for building and operating models end-to-end, including notebooks, experiments, managed training jobs, and pipeline/studio workflows. Examples: SageMaker Studio, SageMaker notebooks, SageMaker Pipelines, managed training jobs.",
"slug": "d_split_01_01",
"source": "llm"
},
"dimension_id": 367,
"input_skill": "Amazon SageMaker",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 New dimension saved (reconciliation separate) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2612,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": null,
"display_name": "Managed Model Hosting and Endpoints",
"id": null,
"rationale": "Cloud-managed services for deploying trained models as online or batch inference endpoints, including endpoint provisioning, batch transform, and rollout coordination. Examples: SageMaker endpoints, SageMaker batch transform.",
"slug": "d_split_01_02",
"source": "llm"
},
"dimension_id": 368,
"input_skill": "Amazon SageMaker",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 New dimension saved (reconciliation separate) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2612,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": null,
"display_name": "Model Serving Runtime Packaging",
"id": null,
"rationale": "Packaging trained models and model servers for deployment, including containers, base images, runtime dependencies, entrypoints, GPU images, and image scanning. Examples: Docker images for model services.",
"slug": "d_split_01_03",
"source": "llm"
},
"dimension_id": 52,
"input_skill": "Amazon SageMaker",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (embedding dedup) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2612,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": null,
"display_name": "Model Serving Frameworks and Platforms",
"id": null,
"rationale": "Serving frameworks and platforms used to run deployed models in online, batch, or streaming environments. Examples: TensorFlow Serving, TorchServe, Triton Inference Server, BentoML, KServe, Seldon Core.",
"slug": "d_split_01_04",
"source": "llm"
},
"dimension_id": 52,
"input_skill": "Amazon SageMaker",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (embedding dedup) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2612,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Model Runtime Services",
"id": 121,
"rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
"slug": "cloud-model-runtime-services",
"source": "db"
},
"dimension_id": 121,
"input_skill": "Amazon Bedrock",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 2613,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": null,
"display_name": "Managed Cloud Data Platform Services",
"id": null,
"rationale": "Managed cloud services used to run data engineering and related application workloads, including serverless compute, workflow orchestration, storage, event triggers, and adjacent security/platform primitives. This covers services such as AWS Lambda, AWS Step Functions, AWS Glue, Amazon S3 event triggers, and other managed services used to build and operate pipelines and data platforms.",
"slug": "d_merge_01",
"source": "llm"
},
"dimension_id": 81,
"input_skill": "AWS Lambda",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2614,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Version Control Systems",
"id": 365,
"rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
"slug": "d_init_01",
"source": "db"
},
"dimension_id": 365,
"input_skill": "ECS",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2615,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": null,
"display_name": "Cloud Analytics Query Services",
"id": null,
"rationale": "Managed cloud services for querying and processing analytical data, including serverless SQL analytics, data lake querying, and federated query services such as Athena, Glue, Redshift Spectrum, and EMR-based analytics.",
"slug": "d_split_01_01",
"source": "llm"
},
"dimension_id": 367,
"input_skill": "Amazon Athena",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2616,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": null,
"display_name": "Cloud Data Pipeline Runtime",
"id": null,
"rationale": "Managed cloud compute and orchestration services used to run data engineering jobs, transformations, and workflows.",
"slug": "d_split_01_02",
"source": "llm"
},
"dimension_id": 81,
"input_skill": "Amazon Athena",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (embedding dedup) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2616,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": null,
"display_name": "Cloud Data Platform Storage",
"id": null,
"rationale": "Cloud storage and data access services used as the persistence layer for data engineering platforms and pipelines.",
"slug": "d_split_01_03",
"source": "llm"
},
"dimension_id": 81,
"input_skill": "Amazon Athena",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (embedding dedup) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2616,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": null,
"display_name": "Cloud Data Platform Security and Networking",
"id": null,
"rationale": "Identity, access, secrets, and networking primitives used to support cloud data platforms and pipelines.",
"slug": "d_split_01_04",
"source": "llm"
},
"dimension_id": 369,
"input_skill": "Amazon Athena",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 New dimension saved (reconciliation separate) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2616,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Data Platform Services",
"id": 81,
"rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
"slug": "cloud-data-platform-services",
"source": "db"
},
"dimension_id": 81,
"input_skill": "AWS Data Pipeline",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 2617,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Storage Provisioning and Automation",
"id": 311,
"rationale": "Covers the scripts, APIs, and operational workflows used to create, resize, map, and retire storage resources. This cluster is coherent because storage engineers often automate repetitive provisioning and maintenance tasks.",
"slug": "storage-provisioning-and-automation",
"source": "db"
},
"dimension_id": 311,
"input_skill": "S3",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Storage Engineer",
"id": 22,
"rationale": null,
"role_archetype": null,
"slug": "storage-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 2618,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Streaming Data Processing",
"id": 69,
"rationale": "Tools and patterns for ingesting and transforming event streams with low latency. This cluster covers continuous processing, windowing, and stateful stream jobs used to keep data fresh.",
"slug": "streaming-data-processing",
"source": "db"
},
"dimension_id": 69,
"input_skill": "Kinesis",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 2619,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": null,
"display_name": "HTTP API Frameworks and Gateway Layers",
"id": null,
"rationale": "Server-side frameworks and gateway layers used to build, route, validate, and manage HTTP APIs. Includes backend web service frameworks and API gateway configuration for endpoints, request/response mapping, routing, input validation, auth integration, throttling, usage plans, stages, custom domains, and backend integration.",
"slug": "d_merge_01",
"source": "llm"
},
"dimension_id": 141,
"input_skill": "Amazon API Gateway",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2620,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Model Runtime Services",
"id": 121,
"rationale": "Consumer-level use of cloud services that host or support model inference applications. This cluster is coherent because MLEs often deploy and tune services on managed cloud compute, networking, and storage primitives.",
"slug": "cloud-model-runtime-services",
"source": "db"
},
"dimension_id": 121,
"input_skill": "Pinecone",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Machine Learning Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "machine-learning-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 2621,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Data Platform Services",
"id": 81,
"rationale": "Consumer-level use of cloud services that support data engineering workloads. This includes managed compute, storage, networking-adjacent services, and security primitives used to run pipelines and data platforms.",
"slug": "cloud-data-platform-services",
"source": "db"
},
"dimension_id": 81,
"input_skill": "OpenSearch",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 2622,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Version Control Systems",
"id": 365,
"rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
"slug": "d_init_01",
"source": "db"
},
"dimension_id": 365,
"input_skill": "OpenSearch",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2622,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": null,
"display_name": "Applied Machine Learning Toolkits and Frameworks",
"id": null,
"rationale": "Libraries and frameworks used to prototype, compare, index, and evaluate machine learning solutions quickly. Includes applied-ML toolkits such as scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, NumPy, pandas, FAISS, vector search libraries, and embedding utilities. Excludes serving runtimes, deployment containers, online inference infrastructure, streaming systems, and database/storage engines.",
"slug": "d_merge_01",
"source": "llm"
},
"dimension_id": 94,
"input_skill": "FAISS",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2623,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Version Control Systems",
"id": 365,
"rationale": "Tools and workflows for tracking source changes, branching, merging, and collaborating on code history. Git belongs here because it is the canonical distributed version control system used to manage revisions and coordinate team development.",
"slug": "d_init_01",
"source": "db"
},
"dimension_id": 365,
"input_skill": "FAISS",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2623,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "AI Inference Cost, Latency, and Throughput Optimization",
"id": null,
"rationale": "Improving the runtime efficiency of AI/ML-powered features by reducing inference cost and latency while increasing throughput and preserving user experience. Includes token budgeting, prompt compression, batching, caching, quantization, pruning, model selection, async inference, warm starts, streaming UX, timeout tuning, concurrency control, GPU utilization, and profiling. Excludes model training, feature engineering, registry/versioning, infrastructure autoscaling, serving capacity planning, generic backend performance tuning, and unrelated data/warehouse optimization.",
"slug": "d_merge_03",
"source": "llm"
},
"dimension_id": 260,
"input_skill": "AI/ML",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2611,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 10,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Streaming Data Processing",
"id": null,
"rationale": "Tools and patterns for ingesting, buffering, and transforming event streams with low latency. This includes continuous processing, windowing, stateful stream jobs, checkpointing, shard scaling, stream partitioning, and managed streaming services such as Kinesis, Amazon Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics.",
"slug": "d_merge_01",
"source": "llm"
},
"dimension_id": 69,
"input_skill": "Kinesis",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (reconciliation merge) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 2619,
"skill_tag": "in_db",
"skipped_reason": null
}
],
"new_skills_created": 13,
"role_dimension_saved": 0,
"skill_dimension_saved": 21,
"skipped": 0
},
"planner_output": null,
"run_id": "20755499-04f6-440f-80a9-bb023fddc1ff"
}
LLM Calls
Every model call made for this run, in pipeline order.