Pipeline run
1f106d71-338e-40ee-a69a-09957abcd98f
Client output enrichment
v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description · Vocab breakdown (legacy)
Signals
Post-classification
Captured for admin review
1 POST /skills/extract-from-jd
2 POST /skills/extract-details
3 POST /skills/final-role-output
Data Engineer
CASE F · slug: data-engineer · id: 2 · source: db
The primary skills indicate a strong focus on data processing technologies and cloud platforms, aligning well with a Data Engineer role.
Resolution: in_db (the role exists in the library; skill↔dim and role↔dim links are saved when applicable).
Job description
Key Responsibilities
• Design and build scalable, distributed data systems for real-time and batch processing
• Develop high-throughput data pipelines processing 10B+ events per day
• Contribute to and own key components in the technical design and implementation of core AdCloud platform components
• Work with technologies such as Apache Spark, Kafka, Hadoop ecosystem, and modern data platforms
• Ensure performance, scalability, reliability, and cost efficiency of data systems
• Collaborate with Product, Engineering, and Data Science teams to deliver end-to-end solutions
• Participate in design and code reviews to maintain high engineering standards
• Continuously improve system robustness, scalability, and developer productivity
Qualifications
• 10+ years of experience in designing and developing large-scale data-driven systems
• Strong experience with distributed data processing frameworks (Spark, Kafka, Hadoop, etc.)
• Experience with NoSQL systems (HBase, Aerospike, Cassandra) and RDBMS
• Strong programming skills in Java, Scala, or similar languages
• Solid understanding of data structures, algorithms, and system design
• Experience building scalable systems on cloud platforms (AWS/GCP/Azure)
• Strong focus on performance optimization and cost efficiency
• Excellent communication and collaboration skills
Nice to Have
• Experience in AdTech or high-scale event-driven systems
• Exposure to data governance, data quality, and metadata systems
• Experience supporting ML/AI data pipelines
• Familiarity with modern data architectures (data lakes, lakehouse, etc.)
• Contributions to open-source or technical communities
Skills from this JD
Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
Skill enrichment (orchestrator / LLM)
Apache Spark appears in many data engineering JDs and remains a standard for distributed ETL/ELT; its GitHub and vendor ecosystem activity stay strong, with Databricks and cloud platforms still promoting it.
Apache Software Foundation · apache_2 · since 2010 (0.95)
“Apache Spark” is a specific, widely recognized distributed data processing framework; typical JDs won’t confuse it with other distinct ETL/streaming tools.
Versioned 3.x
{
"apache spark 3": "3.x",
"spark": "3.x",
"spark 3": "3.x",
"spark 3.x": "3.x",
"spark3": "3.x"
}
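The alias map above sends every listed Spark mention, including bare "spark", to the 3.x version label. A minimal sketch of a lookup against it, assuming mentions are matched after lowercasing and collapsing whitespace; the normalization rule and the name `resolve_spark_version` are hypothetical, not the pipeline's own code:

```python
import json
from typing import Optional

# Version alias map copied verbatim from the enrichment output above.
SPARK_VERSION_ALIASES = json.loads("""
{
  "apache spark 3": "3.x",
  "spark": "3.x",
  "spark 3": "3.x",
  "spark 3.x": "3.x",
  "spark3": "3.x"
}
""")


def resolve_spark_version(mention: str) -> Optional[str]:
    """Return the version label for a JD mention, or None if it is not mapped.

    Hypothetical normalization: lowercase, then collapse runs of whitespace.
    """
    key = " ".join(mention.lower().split())
    return SPARK_VERSION_ALIASES.get(key)


if __name__ == "__main__":
    for mention in ["Apache Spark 3", "Spark3", "  SPARK   3.x ", "Spark 2.4"]:
        print(f"{mention!r} -> {resolve_spark_version(mention)}")
```

A mention outside the map, such as "Spark 2.4", returns None rather than being forced onto 3.x.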
Framework · distributed_data_processing_framework · confidence 0.94
Apache Spark is a structured codebase that users build data applications and pipelines inside, so by the Tool vs Framework rule it is a Framework rather than a Tool.
- Category: Framework
- Sub-category: distributed_data_processing_framework
- Skill nature: FRAMEWORK
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: SEPARATE_ENTITY
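The category calls in this run cite four pairwise rules: Tool vs Framework (Spark, Hadoop), Datastore vs Format (HBase, Aerospike, Cassandra, RDBMS), Concept vs Methodology (Machine Learning, Artificial Intelligence), and Architecture vs Concept (Data Lakes, Lakehouse, Event-Driven Architecture). A minimal sketch of applying them as ordered checks, assuming each skill has already been tagged with boolean traits; the trait names, the check order, and the `Tool` fallback are hypothetical paraphrases of the reasoning strings, not the orchestrator's actual rules:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillTraits:
    # Hypothetical traits paraphrasing the reasoning strings in this run.
    persists_data: bool = False         # "persists and serves data"
    built_inside: bool = False          # "structured codebase users build ... inside"
    system_shape: bool = False          # "system-shape pattern"
    named_knowledge_unit: bool = False  # "named knowledge unit"


def classify(traits: SkillTraits) -> str:
    """Apply the pairwise rules as ordered checks (the order is an assumption)."""
    if traits.persists_data:
        return "Datastore"     # Datastore vs Format
    if traits.built_inside:
        return "Framework"     # Tool vs Framework
    if traits.system_shape:
        return "Architecture"  # Architecture vs Concept
    if traits.named_knowledge_unit:
        return "Concept"       # Concept vs Methodology
    return "Tool"              # hypothetical fallback when no rule fires


if __name__ == "__main__":
    print(classify(SkillTraits(built_inside=True)))  # Apache Spark-style tagging
```

Check order is the main design choice here: when a skill carries overlapping traits, whichever check runs first wins, so a real implementation would need an explicit precedence between the rules.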
Dimensions (API 2 worklist)
- ETL and ELT Tooling · Catalog dimension · db id 24 · Library dimension (catalog) · Roles linked in library: Data Engineer
Locked dimensions (v3 placement)
- Distributed Data Processing Frameworks (reuses catalog slug): Frameworks used to process large datasets in batch or streaming pipelines, often as part of ETL/ELT workflows. Apache Spark belongs here because it is a core engine for distributed transformation, aggregation, and data movement.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| ETL and ELT Tooling (etl-and-elt-tooling) | ✓ | ✓ | New skill saved · Existing dimension (library) · Role↔dimension saved |
Aliases — catalog
- Kafka (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Datastore
- Sub-category: Event Stream Store
- Vendor: Confluent
- License: apache_2
- Year introduced: 2011
- Confidence: 0.90
- Version strategy: NOT_APPLICABLE
Maturity reasoning: Kafka appears in many production JDs for event streaming and data pipelines, and remains a standard platform in cloud/vendor offerings (e.g., Confluent, AWS MSK), indicating broad hiring demand.
Skill profile (library / DB)
- Skill nature: PLATFORM
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 9
- Sub-category id: 47
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Messaging and Event Streaming · Catalog dimension · db id 8 · Library dimension (catalog) · Roles linked in library: Backend Engineer, Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Messaging and Event Streaming (messaging-and-event-streaming) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
Skill enrichment (orchestrator / LLM)
Job postings still mention Hadoop for legacy big-data stacks, but JD volume has fallen as Spark and cloud warehouses replaced MapReduce-era clusters.
Apache Software Foundation · apache_2 · since 2006 (0.95)
“Hadoop” is a specific data processing framework; typical JDs distinguish it from other big data tools like Spark or Hive.
Not versioned
Framework · data_processing_framework · confidence 0.90
Hadoop is fundamentally a structured software stack that users build distributed data applications and jobs within, so it fits the Framework category rather than a Tool or Platform.
- Category: Framework
- Sub-category: data_processing_framework
- Skill nature: FRAMEWORK
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- ETL and ELT Tooling · Catalog dimension · db id 24 · Library dimension (catalog) · Roles linked in library: Data Engineer
Locked dimensions (v3 placement)
- Distributed Data Processing Platforms (reuses catalog slug): Tools and frameworks used to ingest, store, and process large-scale batch data across clusters. Hadoop belongs here because it is a foundational platform for distributed storage and computation in data engineering workflows.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| ETL and ELT Tooling (etl-and-elt-tooling) | ✓ | ✓ | New skill saved · Existing dimension (library) · Role↔dimension saved |
Skill enrichment (orchestrator / LLM)
HBase appears in a limited set of big-data/legacy Hadoop job postings, while newer JDs more often specify DynamoDB, Bigtable, or Cassandra; its market demand is specialized rather than broad.
Apache Software Foundation · apache_2 · since 2010 (0.95)
HBase is a specific Apache wide-column NoSQL datastore; JDs typically distinguish it from other datastores.
Not versioned
Datastore · wide_column_store · confidence 0.98
HBase is fundamentally a persistent data system, so by the Datastore vs Format rule it is a datastore rather than a tool or framework.
- Category: Datastore
- Sub-category: wide_column_store
- Skill nature: TOOL
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- Cloud Storage and Data Services · Catalog dimension · db id 144 · Library dimension (catalog) · Roles linked in library: Cloud Architect
Locked dimensions (v3 placement)
- Distributed Data Storage Systems (reuses catalog slug): Storage systems used to persist large-scale application and analytical data with low-latency access and horizontal scaling. HBase fits here as a distributed NoSQL store built on top of Hadoop storage infrastructure.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Storage and Data Services (cloud-storage-and-data-services) | ✓ | ✗ | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
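The skipped role↔dimension link here follows from the library data for this skill: Cloud Storage and Data Services is linked only to Cloud Architect, while the chosen role for this run is Data Engineer. Across this run, skill↔dim is always ✓, and role↔dim is ✓ only where the dimension's linked roles include Data Engineer. A minimal sketch of that decision and of how the outcome text is assembled; the `WorkItem` type and `link_outcome` function are hypothetical, inferred from the rows in this run rather than taken from API 3:

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

SKIPPED = "Role↔dimension skipped (dimension not under chosen role)"


@dataclass(frozen=True)
class WorkItem:
    skill: str
    dimension: str
    skill_is_new: bool               # True when the skill was not yet in the catalog
    dimension_roles: FrozenSet[str]  # roles linked to the dimension in the library


def link_outcome(item: WorkItem, chosen_role: str) -> Tuple[bool, bool, str]:
    """Return (skill_dim_saved, role_dim_saved, outcome_text) for one work item."""
    parts = []
    if item.skill_is_new:
        parts.append("New skill saved")
    parts.append("Existing dimension (library)")
    # Assumption: the role link is written only if the dimension already sits under the role.
    role_dim_saved = chosen_role in item.dimension_roles
    parts.append("Role↔dimension saved" if role_dim_saved else SKIPPED)
    # Assumption: the skill link is always written (every row in this run shows ✓).
    return True, role_dim_saved, " · ".join(parts)


if __name__ == "__main__":
    hbase = WorkItem("HBase", "Cloud Storage and Data Services", True,
                     frozenset({"Cloud Architect"}))
    print(link_outcome(hbase, "Data Engineer"))
```

Under this sketch, Kafka's Messaging and Event Streaming row (linked to Backend Engineer and Data Engineer) comes out as saved, matching its table.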
Skill enrichment (orchestrator / LLM)
Aerospike appears in a limited set of high-scale datastore JDs and vendor case studies, but it is far less common than PostgreSQL, Redis, or MongoDB in general hiring pipelines.
Aerospike Inc. · apache_2 · since 2012 (0.95)
Aerospike is a specific distributed NoSQL database name; unlikely to be confused with other catalog datastore skills.
Not versioned
Datastore · distributed_nosql_datastore · confidence 0.98
Aerospike is fundamentally a system that persists and serves data, so by the Datastore vs Format rule it is a Datastore rather than a tool or platform.
- Category: Datastore
- Sub-category: distributed_nosql_datastore
- Skill nature: TOOL
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- React Frontend Development · Catalog dimension · db id 96 · Library dimension (catalog)
Locked dimensions (v3 placement)
- Distributed NoSQL Databases (pipeline tentative id): Distributed NoSQL databases used for low-latency key-value access, horizontal scaling, and high availability. Aerospike belongs here because it is a distributed database platform rather than a general storage or cloud service.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| React Frontend Development (d_init_01) | ✓ | ✗ | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
Apache Cassandra appears in many production data-platform JDs and is a common choice for high-write, distributed workloads; GitHub and vendor docs show sustained activity rather than sunset signals.
Apache Software Foundation · apache_2 · since 2008 (0.95)
“Cassandra” is a specific wide-column NoSQL database name; unlikely to be confused with other catalog datastore skills.
Not versioned
Datastore · wide_column_store · confidence 0.99
Cassandra is fundamentally a distributed database that persists data, so by the Datastore vs Format rule it is a Datastore.
- Category: Datastore
- Sub-category: wide_column_store
- Skill nature: TOOL
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- Cloud Storage and Data Services · Catalog dimension · db id 144 · Library dimension (catalog) · Roles linked in library: Cloud Architect
Locked dimensions (v3 placement)
- Distributed Data Storage Systems (reuses catalog slug): Managed and self-hosted data stores used to persist application data with high availability and horizontal scale. Cassandra belongs here because it is a distributed wide-column database chosen for partitioning, replication, and fault-tolerant storage.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Storage and Data Services (cloud-storage-and-data-services) | ✓ | ✗ | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- Java (CANONICAL) primary
- JDK (VERSION)
- JDK 10 (VERSION)
- JDK 11 (VERSION)
- JDK 12 (VERSION)
- JDK 13 (VERSION)
- JDK 14 (VERSION)
- JDK 15 (VERSION)
- JDK 16 (VERSION)
- JDK 17 (VERSION)
- JDK 18 (VERSION)
- JDK 19 (VERSION)
- JDK 20 (VERSION)
- JDK 21 (VERSION)
- JDK 5 (VERSION)
- JDK 6 (VERSION)
- JDK 7 (VERSION)
- JDK 8 (VERSION)
- JDK 9 (VERSION)
- Java 1.0 (VERSION)
- Java 1.1 (VERSION)
- Java 1.2 (VERSION)
- Java 1.3 (VERSION)
- Java 1.4 (VERSION)
- Java 1.5 (VERSION)
- Java 1.6 (VERSION)
- Java 1.7 (VERSION)
- Java 1.8 (VERSION)
- Java 10 (VERSION)
- Java 11 (VERSION)
- Java 12 (VERSION)
- Java 13 (VERSION)
- Java 14 (VERSION)
- Java 15 (VERSION)
- Java 16 (VERSION)
- Java 17 (VERSION)
- Java 18 (VERSION)
- Java 19 (VERSION)
- Java 20 (VERSION)
- Java 21 (VERSION)
- Java 5 (VERSION)
- Java 6 (VERSION)
- Java 7 (VERSION)
- Java 8 (VERSION)
- Java 9 (VERSION)
- Java11 (VERSION)
- Java17 (VERSION)
- Java21 (VERSION)
- Java8 (VERSION)
- OpenJDK 11 (VERSION)
- OpenJDK 17 (VERSION)
- OpenJDK 21 (VERSION)
- OpenJDK 8 (VERSION)
- java 11 (VERSION)
- java 17 (VERSION)
- java 21 (VERSION)
- java 4 (VERSION)
- java 5 (VERSION)
- java 6 (VERSION)
- java 7 (VERSION)
- java 8 (VERSION)
- java lts (VERSION)
- java-11 (VERSION)
- java-17 (VERSION)
- java-21 (VERSION)
- java-4 (VERSION)
- java-5 (VERSION)
- java-6 (VERSION)
- java-7 (VERSION)
- java-8 (VERSION)
- java11 (VERSION)
- java17 (VERSION)
- java21 (VERSION)
- java4 (VERSION)
- java5 (VERSION)
- java6 (VERSION)
- java7 (VERSION)
- java8 (VERSION)
- jdk 11 (VERSION)
- jdk 17 (VERSION)
- jdk 21 (VERSION)
- jdk 4 (VERSION)
- jdk 5 (VERSION)
- jdk 6 (VERSION)
- jdk 7 (VERSION)
- jdk 8 (VERSION)
- jdk11 (VERSION)
- jdk17 (VERSION)
- jdk21 (VERSION)
- jdk4 (VERSION)
- jdk5 (VERSION)
- jdk6 (VERSION)
- jdk7 (VERSION)
- jdk8 (VERSION)
- jvm21 (VERSION)
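Most of these aliases differ only in case, spacing, hyphenation, or prefix (Java 11, java 11, java-11, java11, jdk11, OpenJDK 11). A minimal sketch of a normalizer that collapses such variants to the canonical skill plus a version string; the regex, the treatment of OpenJDK/JDK/JVM prefixes as Java, and the legacy-name rule are assumptions for illustration, not the catalog's matching logic. The legacy rule reflects that Java 1.5 through 1.8 are the releases also known as Java 5 through 8:

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a Java-family prefix, optional separators, optional version.
_JAVA_ALIAS = re.compile(r"^(?:openjdk|java|jdk|jvm)(?:[\s\-]*(\d+)(?:\.(\d+))?)?$")


def normalize_java_alias(alias: str) -> Optional[Tuple[str, Optional[str]]]:
    """Map a Java alias to ("Java", version), or None if the form is not handled.

    Legacy "1.N" names become "N" from 1.5 onward (Java 1.8 -> "8"); 1.0-1.4 keep
    their "1.N" form. Non-numeric aliases such as "java lts" are not handled.
    """
    match = _JAVA_ALIAS.match(alias.strip().lower())
    if match is None:
        return None
    major, minor = match.group(1), match.group(2)
    if major == "1" and minor is not None:
        version = minor if int(minor) >= 5 else f"1.{minor}"
    else:
        version = major  # None for a bare "Java", "JDK", etc.
    return "Java", version


if __name__ == "__main__":
    for alias in ["Java 1.8", "java-8", "jdk8", "OpenJDK 8", "java lts"]:
        print(alias, "->", normalize_java_alias(alias))
```

Under this sketch "Java 1.8", "java-8", "jdk8", and "OpenJDK 8" all resolve to ("Java", "8"), while "java lts" falls outside the pattern and returns None.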
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Language
- Sub-category: Programming Language
- Vendor: Oracle
- License: other_open
- Year introduced: 1995
- Confidence: 0.99
- Version strategy: SEPARATE_ENTITY
- Version tag: 21
Maturity reasoning: Java is a hiring-pipeline staple with very high JD volume across enterprise backend, Android, and cloud roles; it remains widely supported by major vendors and frameworks like Spring.
Skill profile (library / DB)
- Skill nature: LANGUAGE
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 6
- Sub-category id: 96
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Kotlin and Java · Catalog dimension · db id 161 · Library dimension (catalog) · Roles linked in library: Android Engineer
- Programming Languages · Catalog dimension · db id 1 · Library dimension (catalog) · Roles linked in library: Backend Engineer
- Programming Languages for Data Work · Catalog dimension · db id 21 · Library dimension (catalog) · Roles linked in library: Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Kotlin and Java (kotlin-and-java) | ✓ | ✗ | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Programming Languages (programming-languages) | ✓ | ✗ | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Programming Languages for Data Work (programming-languages-for-data-work) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
Aliases — catalog
- Scala (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Language
- Sub-category: Programming Language
- Vendor: EPFL
- License: apache_2
- Year introduced: 2004
- Confidence: 0.99
- Version strategy: NOT_APPLICABLE
Maturity reasoning: Scala still appears in many backend/data engineering JDs, especially with Spark and Akka, and remains supported by major JVM ecosystems; it’s not a sunset technology.
Skill profile (library / DB)
- Skill nature: LANGUAGE
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 6
- Sub-category id: 96
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Programming Languages for Data Work · Catalog dimension · db id 21 · Library dimension (catalog) · Roles linked in library: Data Engineer
- Programming Languages for ML Systems · Catalog dimension · db id 39 · Library dimension (catalog) · Roles linked in library: ML Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Programming Languages for Data Work (programming-languages-for-data-work) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
| Programming Languages for ML Systems (programming-languages-for-ml-systems) | ✓ | ✗ | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- AWS (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Platform
- Sub-category: Cloud Platform
- Vendor: Amazon
- License: other_open
- Year introduced: 2006
- Confidence: 0.99
- Version strategy: NOT_APPLICABLE
Maturity reasoning: AWS is a hiring-pipeline staple: it appears in a large share of cloud/DevOps job descriptions and dominates public cloud market share, with broad certification and vendor ecosystem support.
Skill profile (library / DB)
- Skill nature: PLATFORM
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 9
- Sub-category id: 46
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Cloud Platforms · Catalog dimension · db id 20 · Library dimension (catalog) · Roles linked in library: Backend Engineer, Cybersecurity Engineer, Data Engineer, DevOps Engineer, ML Engineer
- Cloud Platforms for AI Deployment · Catalog dimension · db id 211 · Library dimension (catalog) · Roles linked in library: AI Engineer
- Cloud Provider Platforms · Catalog dimension · db id 131 · Library dimension (catalog) · Roles linked in library: Cloud Architect
- Cloud Security Posture Tools · Catalog dimension · db id 64 · Library dimension (catalog) · Roles linked in library: Cybersecurity Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Platforms (cloud-platforms) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
| Cloud Platforms for AI Deployment (cloud-platforms-for-ai-deployment) | ✓ | ✗ | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Cloud Provider Platforms (cloud-provider-platforms) | ✓ | ✗ | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Cloud Security Posture Tools (cloud-security-posture-tools) | ✓ | ✗ | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- GCP (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Platform
- Sub-category: Cloud Platform
- Vendor: (empty)
- License: other_open
- Year introduced: 2011
- Confidence: 0.99
- Version strategy: NOT_APPLICABLE
Maturity reasoning: GCP appears frequently in cloud/platform job descriptions and is a major hyperscaler alongside AWS/Azure, with broad enterprise adoption and active vendor investment.
Skill profile (library / DB)
- Skill nature: PLATFORM
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 9
- Sub-category id: 46
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Cloud Platforms · Catalog dimension · db id 20 · Library dimension (catalog) · Roles linked in library: Backend Engineer, Cybersecurity Engineer, Data Engineer, DevOps Engineer, ML Engineer
- Cloud Platforms for AI Deployment · Catalog dimension · db id 211 · Library dimension (catalog) · Roles linked in library: AI Engineer
- Cloud Security Posture Tools · Catalog dimension · db id 64 · Library dimension (catalog) · Roles linked in library: Cybersecurity Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Platforms (cloud-platforms) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
| Cloud Platforms for AI Deployment (cloud-platforms-for-ai-deployment) | ✓ | ✗ | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Cloud Security Posture Tools (cloud-security-posture-tools) | ✓ | ✗ | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- Azure (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Platform
- Sub-category: Cloud Platform
- Vendor: Microsoft
- License: proprietary
- Year introduced: 2010
- Confidence: 0.99
- Version strategy: NOT_APPLICABLE
Maturity reasoning: Azure is broadly adopted and frequently appears in cloud/platform job descriptions alongside AWS and GCP; Microsoft’s ongoing enterprise investment and Azure certification demand signal strong hiring-pipeline relevance.
Skill profile (library / DB)
- Skill nature: PLATFORM
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 9
- Sub-category id: 46
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Cloud Platforms · Catalog dimension · db id 20 · Library dimension (catalog) · Roles linked in library: Backend Engineer, Cybersecurity Engineer, Data Engineer, DevOps Engineer, ML Engineer
- Cloud Platforms for AI Deployment · Catalog dimension · db id 211 · Library dimension (catalog) · Roles linked in library: AI Engineer
- Cloud Provider Platforms · Catalog dimension · db id 131 · Library dimension (catalog) · Roles linked in library: Cloud Architect
- Cloud Security Posture Tools · Catalog dimension · db id 64 · Library dimension (catalog) · Roles linked in library: Cybersecurity Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Platforms (cloud-platforms) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
| Cloud Platforms for AI Deployment (cloud-platforms-for-ai-deployment) | ✓ | ✗ | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Cloud Provider Platforms (cloud-provider-platforms) | ✓ | ✗ | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Cloud Security Posture Tools (cloud-security-posture-tools) | ✓ | ✗ | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
RDBMS is a core requirement in many job descriptions across backend, data, and DBA roles; PostgreSQL, MySQL, and SQL Server remain standard enterprise stacks.
(0.90)
RDBMS is a standard, specific datastore category (relational DBMS) with little overlap with other distinct skills in typical JDs.
Not versioned
Datastore · relational_database_management_system · confidence 0.98
RDBMS is fundamentally a system that persists and manages data, so under the Datastore vs Format rule it is a Datastore rather than a tool or concept.
- Category: Datastore
- Sub-category: relational_database_management_system
- Skill nature: TOOL
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- React Frontend Development · Catalog dimension · db id 96 · Library dimension (catalog)
Locked dimensions (v3 placement)
- Relational Database Systems (pipeline tentative id): Relational database management systems used to store, query, and maintain structured data with tables, keys, constraints, and SQL. RDBMS fits here because it names the core database engine category rather than a specific vendor or data workflow tool.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| React Frontend Development (d_init_01) | ✓ | ✗ | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- NoSQL (CANONICAL)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Concept
- Sub-category: Database Paradigm
- Confidence: 0.93
- Version strategy: NOT_APPLICABLE
Maturity reasoning: NoSQL is broadly listed in job descriptions across backend/data roles, with MongoDB, DynamoDB, and Cassandra appearing as common market signals; it remains a hiring-pipeline staple rather than a niche or sunset tech.
Skill profile (library / DB)
- Skill nature: CONCEPT
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 2
- Sub-category id: 1019
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- NoSQL Databases · Catalog dimension · db id 19 · Library dimension (catalog) · Roles linked in library: Backend Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| NoSQL Databases (nosql-databases) | ✓ | ✗ | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
Machine Learning appears in large volumes of job descriptions across data, product, and platform roles, and major cloud vendors (AWS, Google Cloud, Azure) offer dedicated ML services and certifications, indicating broad adoption.
(0.95)
“Machine Learning” is a standard, specific concept and is unlikely to be confused with other distinct catalog skills in typical job descriptions.
Not versioned
Concept · machine_learning · confidence 0.98
Machine Learning is a named knowledge unit about building models that learn from data, so by the Concept vs Methodology rule it is a Concept rather than an Architecture or Methodology.
- Category: Concept
- Sub-category: machine_learning
- Skill nature: CONCEPT
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- React Frontend Development · Catalog dimension · db id 96 · Library dimension (catalog)
- AI Governance and Model Security · Catalog dimension · db id 50 · Library dimension (catalog) · Roles linked in library: AI Engineer, ML Engineer
- AI Governance and Model Security · Catalog dimension · db id 50 · Library dimension (catalog) · Roles linked in library: AI Engineer, ML Engineer
Locked dimensions (v3 placement)
- Machine Learning Fundamentals (pipeline tentative id): Core concepts, methods, and workflows for building predictive models from data. This fits the target skill because machine learning is the umbrella discipline covering model selection, training, validation, and deployment-oriented thinking.
- AI Governance and Model Security (reuses catalog slug): Controls and documentation used to make models safer, auditable, and compliant. Machine learning practitioners may need this when training or deploying models in regulated or risk-sensitive environments.
- AI Governance and Model Security (reuses catalog slug): Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| React Frontend Development (d_init_01) | ✓ | ✗ | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| AI Governance and Model Security (ai-governance-and-model-security) | ✓ | ✗ | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
AI appears in a large and growing share of job descriptions across software, data, and product roles, and major vendors (Microsoft, Google, AWS) have standardized AI offerings, signaling broad market adoption.
(1.00)
“Artificial Intelligence” is a broad, standard concept and is unlikely to be confused with a different catalog skill in typical job descriptions.
Not versioned
Concept · artificial_intelligence · confidence 0.98
Artificial Intelligence is a named knowledge unit about a field of techniques and theory, so by the Concept vs Methodology rule it is a Concept rather than a tool, platform, or methodology.
- Category: Concept
- Sub-category: artificial_intelligence
- Skill nature: CONCEPT
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- React Frontend Development · Catalog dimension · db id 96 · Library dimension (catalog)
Locked dimensions (v3 placement)
- Artificial Intelligence Concepts (pipeline tentative id): Core concepts, methods, and terminology for building AI systems across symbolic, statistical, and machine-learning approaches. This skill is broad enough to stand as a top-level conceptual dimension when the intent is general AI literacy rather than a specific subdomain.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| React Frontend Development (d_init_01) | ✓ | ✗ | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
Data lakes are widely listed in cloud/data platform job descriptions and are a standard architecture in AWS, Azure, and GCP ecosystems; they’re a common hiring-pipeline staple rather than a niche pattern.
(0.80)
“Data Lakes” is a specific architecture pattern (data lake storage/processing) and is unlikely to be confused with other distinct catalog skills.
Not versioned
Architecture · data_lake_architecture · confidence 0.90
By the Architecture vs Concept rule, data lakes describe a system-shape for organizing and storing data rather than a specific knowledge unit or product.
- Category: Architecture
- Sub-category: data_lake_architecture
- Skill nature: PATTERN
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- Cloud Storage and Data Services · Catalog dimension · db id 144 · Library dimension (catalog) · Roles linked in library: Cloud Architect
- React Frontend Development · Catalog dimension · db id 96 · Library dimension (catalog)
- Cloud Storage and Data Services · Catalog dimension · db id 144 · Library dimension (catalog) · Roles linked in library: Cloud Architect
Locked dimensions (v3 placement)
- Cloud Storage and Data Services (reuses catalog slug): Cloud-native storage and managed data services used to store large analytical datasets, define retention, and support lake-style architectures. Data Lakes fit here because they are typically built on object storage and adjacent managed services for durable, scalable data storage.
- Lakehouse Data Architecture (pipeline tentative id): Architectural patterns for organizing analytical data across raw, curated, and consumption-ready layers in a lake or lakehouse. This fits Data Lakes when the skill is used to design how data is structured, governed, and accessed for analytics.
- Cloud Storage and Data Services (reuses catalog slug): Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Storage and Data Services (cloud-storage-and-data-services) | ✓ | ✗ | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| React Frontend Development (d_init_01) | ✓ | ✗ | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
Lakehouse is increasingly listed in data-platform JDs and vendor docs (Databricks, Snowflake, Microsoft Fabric), but it is not yet as universal as core warehouse or lake skills.
(0.80)
“Lakehouse” is a specific data platform architecture term and is unlikely to be confused with other catalog skills.
Not versioned
Architecture ·data_platform_architecture confidence 0.90
Lakehouse is fundamentally a system-shape pattern that combines data lake and warehouse characteristics, so by the Architecture vs Concept rule it fits Architecture rather than a tool or datastore.
- Category
- Architecture
- Sub-category
- data_platform_architecture
- Skill nature
- PATTERN
- Volatility
- EMERGING
- Typical lifespan
- EVERGREEN
- Version strategy
- NOT_APPLICABLE
Dimensions (API 2 worklist)
-
React Frontend Development Catalog dimension db id 96
Library dimension (catalog)
Locked dimensions (v3 placement)
-
Lakehouse Architecture
Pipeline tentative id
Unified data platform patterns that combine data lake storage with warehouse-style management, governance, and analytics. Lakehouse belongs here because it refers to the architectural approach and platform capabilities used to store, process, and serve analytical data.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| React Frontend Development (d_init_01) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
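The enrichment panel reports each skill as a fixed set of label/value pairs plus a classification confidence. A minimal sketch of that record as a typed structure might look like the following; the class and enum names are hypothetical, and the enum members are limited to the values that actually appear in this run (other values the orchestrator can emit are not shown here).

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical enums: members cover only the values visible in this run's output.
class SkillNature(Enum):
    PATTERN = "PATTERN"
    PLATFORM = "PLATFORM"
    LANGUAGE = "LANGUAGE"
    CONCEPT = "CONCEPT"


class Volatility(Enum):
    STABLE = "STABLE"
    EMERGING = "EMERGING"


class Lifespan(Enum):
    EVERGREEN = "EVERGREEN"


class VersionStrategy(Enum):
    NOT_APPLICABLE = "NOT_APPLICABLE"


@dataclass(frozen=True)
class SkillEnrichment:
    category: str
    sub_category: str
    skill_nature: SkillNature
    volatility: Volatility
    typical_lifespan: Lifespan
    version_strategy: VersionStrategy
    confidence: float

    def __post_init__(self) -> None:
        # Classification confidence is reported on a 0..1 scale (e.g. 0.90, 0.99).
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")

    @classmethod
    def from_labels(cls, labels: dict[str, str], confidence: float) -> "SkillEnrichment":
        # Maps the label/value pairs shown in the panel onto typed fields;
        # an unknown enum value raises ValueError.
        return cls(
            category=labels["Category"],
            sub_category=labels["Sub-category"],
            skill_nature=SkillNature(labels["Skill nature"]),
            volatility=Volatility(labels["Volatility"]),
            typical_lifespan=Lifespan(labels["Typical lifespan"]),
            version_strategy=VersionStrategy(labels["Version strategy"]),
            confidence=confidence,
        )


# The Lakehouse card from this run, expressed as a record.
lakehouse = SkillEnrichment.from_labels(
    {
        "Category": "Architecture",
        "Sub-category": "data_platform_architecture",
        "Skill nature": "PATTERN",
        "Volatility": "EMERGING",
        "Typical lifespan": "EVERGREEN",
        "Version strategy": "NOT_APPLICABLE",
    },
    confidence=0.90,
)
```

Using enums rather than free strings means a typo such as `EMERGNG` fails at parse time instead of silently creating a new bucket.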
Skill enrichment (orchestrator / LLM)
Common in cloud-native JDs and vendor docs; AWS, Azure, and Confluent all promote event-driven patterns built on Kafka/PubSub, which suggests broad hiring demand.
(0.90)
Event-Driven Architecture is a specific architecture pattern; typical JDs won’t confuse it with other distinct architecture skills.
Not versioned
Architecture · event_driven_architecture · confidence 0.99
By the Architecture vs Concept rule, Event-Driven Architecture is a system-shape pattern that influences how systems are built, not just a knowledge unit.
- Category: Architecture
- Sub-category: event_driven_architecture
- Skill nature: PATTERN
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: NOT_APPLICABLE
Dimensions (API 2 worklist)
- React Frontend Development · catalog dimension, db id 96 · Library dimension (catalog)
Locked dimensions (v3 placement)
- Event-Driven Architecture · pipeline tentative id
Architectural patterns for building systems around events, asynchronous messaging, and decoupled producers and consumers. This fits the target skill because it covers how services publish, route, process, and react to domain and integration events.
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| React Frontend Development (d_init_01) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
All API 3 persistence rows
Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.
| Skill | Tag | Dimension | Skill↔dim | Role↔dim | Outcome | Notes |
|---|---|---|---|---|---|---|
| Kafka | in_db | Messaging and Event Streaming (messaging-and-event-streaming) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Java | in_db | Kotlin and Java (kotlin-and-java) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Java | in_db | Programming Languages (programming-languages) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Java | in_db | Programming Languages for Data Work (programming-languages-for-data-work) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Scala | in_db | Programming Languages for Data Work (programming-languages-for-data-work) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Scala | in_db | Programming Languages for ML Systems (programming-languages-for-ml-systems) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| AWS | in_db | Cloud Platforms (cloud-platforms) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| AWS | in_db | Cloud Platforms for AI Deployment (cloud-platforms-for-ai-deployment) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| AWS | in_db | Cloud Provider Platforms (cloud-provider-platforms) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| AWS | in_db | Cloud Security Posture Tools (cloud-security-posture-tools) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| GCP | in_db | Cloud Platforms (cloud-platforms) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| GCP | in_db | Cloud Platforms for AI Deployment (cloud-platforms-for-ai-deployment) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| GCP | in_db | Cloud Security Posture Tools (cloud-security-posture-tools) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Azure | in_db | Cloud Platforms (cloud-platforms) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Azure | in_db | Cloud Platforms for AI Deployment (cloud-platforms-for-ai-deployment) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Azure | in_db | Cloud Provider Platforms (cloud-provider-platforms) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Azure | in_db | Cloud Security Posture Tools (cloud-security-posture-tools) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| NoSQL | in_db | NoSQL Databases (nosql-databases) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Apache Spark | in_db | ETL and ELT Tooling (etl-and-elt-tooling) | ✓ | ✓ | New skill saved · Existing dimension (library) · Role↔dimension saved | |
| Hadoop | in_db | ETL and ELT Tooling (etl-and-elt-tooling) | ✓ | ✓ | New skill saved · Existing dimension (library) · Role↔dimension saved | |
| HBase | in_db | Cloud Storage and Data Services (cloud-storage-and-data-services) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Aerospike | in_db | React Frontend Development (d_init_01) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Cassandra | in_db | Cloud Storage and Data Services (cloud-storage-and-data-services) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| RDBMS | in_db | React Frontend Development (d_init_01) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Machine Learning | in_db | React Frontend Development (d_init_01) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Machine Learning | in_db | AI Governance and Model Security (ai-governance-and-model-security) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Artificial Intelligence | in_db | React Frontend Development (d_init_01) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Data Lakes | in_db | Cloud Storage and Data Services (cloud-storage-and-data-services) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Data Lakes | in_db | React Frontend Development (d_init_01) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Lakehouse | in_db | React Frontend Development (d_init_01) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Event-Driven Architecture | in_db | React Frontend Development (d_init_01) | ✓ | — | New skill saved · Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
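Every row in the persistence grid saves the skill↔dimension link, but the role↔dimension link is saved only in some rows. The rows in this run are consistent with a simple membership rule: the role link is written when the chosen role (data-engineer) already appears in the dimension's `roles_from_db` list from API 2, and skipped otherwise. The sketch below encodes that inferred rule; `WorkItem` and `role_dim_outcome` are hypothetical names, and the rule itself is an assumption read off the outputs, not a confirmed description of the API 3 code.

```python
from dataclasses import dataclass

SAVED = "Role↔dimension saved"
SKIPPED = "Role↔dimension skipped (dimension not under chosen role)"


@dataclass(frozen=True)
class WorkItem:
    """One (skill x dimension) row of the persistence grid (hypothetical type)."""
    skill: str
    dimension_slug: str
    # Role slugs linked to the dimension, as listed under "roles_from_db" in API 2.
    dimension_roles: frozenset[str]


def role_dim_outcome(item: WorkItem, chosen_role: str) -> str:
    # Assumed rule: link role to dimension only if the dimension already
    # belongs to the chosen role; otherwise record a skip.
    return SAVED if chosen_role in item.dimension_roles else SKIPPED


# Rows taken from this run's API 2 dimensions and the grid above.
rows = [
    WorkItem("Kafka", "messaging-and-event-streaming",
             frozenset({"backend-engineer", "data-engineer"})),
    WorkItem("Java", "kotlin-and-java", frozenset({"android-engineer"})),
    WorkItem("Java", "programming-languages", frozenset({"backend-engineer"})),
    WorkItem("Java", "programming-languages-for-data-work", frozenset({"data-engineer"})),
    WorkItem("AWS", "cloud-provider-platforms", frozenset({"cloud-architect"})),
]

outcomes = {(r.skill, r.dimension_slug): role_dim_outcome(r, "data-engineer") for r in rows}
```

Under this rule, the many "skipped" rows for Java, Scala, and the cloud providers are expected: those dimensions are owned by other roles (Android Engineer, ML Engineer, Cloud Architect, and so on).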
Library artifacts (this run)
| Kind | Detail | DB id |
|---|---|---|
| canonical_skill_added | Apache Spark | 1350 |
| canonical_skill_added | Hadoop | 1351 |
| canonical_skill_added | HBase | 1352 |
| canonical_skill_added | Aerospike | 1353 |
| canonical_skill_added | Cassandra | 1354 |
| canonical_skill_added | RDBMS | 1355 |
| canonical_skill_added | Machine Learning | 1356 |
| canonical_skill_added | Artificial Intelligence | 1357 |
| canonical_skill_added | Data Lakes | 1358 |
| canonical_skill_added | Lakehouse | 1359 |
| canonical_skill_added | Event-Driven Architecture | 1360 |
| dimension_skill_link | Apache Spark ↔ ETL and ELT Tooling | 24 |
| dimension_skill_link | Hadoop ↔ ETL and ELT Tooling | 24 |
| dimension_skill_link | HBase ↔ Cloud Storage and Data Services | 144 |
| dimension_skill_link | Aerospike ↔ React Frontend Development | 96 |
| dimension_skill_link | Cassandra ↔ Cloud Storage and Data Services | 144 |
| dimension_skill_link | RDBMS ↔ React Frontend Development | 96 |
| dimension_skill_link | Machine Learning ↔ React Frontend Development | 96 |
| dimension_skill_link | Machine Learning ↔ AI Governance and Model Security | 50 |
| dimension_skill_link | Artificial Intelligence ↔ React Frontend Development | 96 |
| dimension_skill_link | Data Lakes ↔ Cloud Storage and Data Services | 144 |
| dimension_skill_link | Data Lakes ↔ React Frontend Development | 96 |
| dimension_skill_link | Lakehouse ↔ React Frontend Development | 96 |
| dimension_skill_link | Event-Driven Architecture ↔ React Frontend Development | 96 |
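The 11 `canonical_skill_added` rows line up with the 18 skills returned by API 1 minus the 7 that API 2 resolved through existing aliases (Kafka, Java, Scala, AWS, GCP, Azure, NoSQL). The sketch below reproduces that split and, assuming ids are assigned sequentially in API 1 order starting from the first free id (an assumption; 1350 is simply the first id seen in this run), yields the same DB ids as the table.

```python
# Skills in the order returned by API 1 (final_skills).
final_skills = [
    "Apache Spark", "Kafka", "Hadoop", "HBase", "Aerospike", "Cassandra",
    "Java", "Scala", "AWS", "GCP", "Azure", "RDBMS", "NoSQL",
    "Machine Learning", "Artificial Intelligence", "Data Lakes",
    "Lakehouse", "Event-Driven Architecture",
]

# Canonical ids of skills that API 2 matched via an existing alias.
alias_matched = {"Kafka": 36, "Java": 1, "Scala": 102, "AWS": 187,
                 "GCP": 186, "Azure": 188, "NoSQL": 1346}


def split_new_skills(skills: list[str], matched: dict[str, int]) -> tuple[list[str], list[str]]:
    """Partition skills into (already in library, needs a new canonical row)."""
    existing = [s for s in skills if s in matched]
    new = [s for s in skills if s not in matched]
    return existing, new


existing, new = split_new_skills(final_skills, alias_matched)

# Hypothetical sequential id assignment, preserving API 1 order.
FIRST_FREE_ID = 1350
new_ids = {name: FIRST_FREE_ID + i for i, name in enumerate(new)}
```

The same ordering also matches the Stage 5 queue ids (1088 to 1098) attached to Data Engineer.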
nano JD Parser — gpt-4.1-nano
{
"JD_type": "pass",
"about_company": null,
"certifications": [],
"company_name": null,
"ctc": null,
"domain": {
"primary": {
"aliases": [],
"domain": "Other"
},
"secondary": null
},
"education": [],
"experience": {
"max": null,
"min": 10,
"raw": "10+ years of experience in designing and developing large-scale data-driven systems"
},
"job_locations": [],
"role": null,
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 8,
"heading": "Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Design and build scalable,",
"last_5_words": "robustness, scalability, and developer productivity"
},
"text": "\u2022 Design and build scalable, distributed data systems for real-time and batch processing\n\u2022 Develop high-throughput data pipelines processing 10B+ events per day\n\u2022 Contribute to and own key components in the technical design and implementation of core AdCloud platform components\n\u2022 Work with technologies such as Apache Spark, Kafka, Hadoop ecosystem, and modern data platforms\n\u2022 Ensure performance, scalability, reliability, and cost efficiency of data systems\n\u2022 Collaborate with Product, Engineering, and Data Science teams to deliver end-to-end solutions\n\u2022 Participate in design and code reviews to maintain high engineering standards\n\u2022 Continuously improve system robustness, scalability, and developer productivity",
"word_count": 108
},
{
"bullet_count": 8,
"heading": "Qualifications",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 10+ years of experience in",
"last_5_words": "communication and collaboration skills"
},
"text": "\u2022 10+ years of experience in designing and developing large-scale data-driven systems\n\u2022 Strong experience with distributed data processing frameworks (Spark, Kafka, Hadoop, etc.)\n\u2022 Experience with NoSQL systems (HBase, Aerospike, Cassandra) and RDBMS\n\u2022 Strong programming skills in Java, Scala, or similar languages\n\u2022 Solid understanding of data structures, algorithms, and system design\n\u2022 Experience building scalable systems on cloud platforms (AWS/GCP/Azure)\n\u2022 Strong focus on performance optimization and cost efficiency\n\u2022 Excellent communication and collaboration skills",
"word_count": 108
},
{
"bullet_count": 5,
"heading": "Nice to Have",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Experience in AdTech or",
"last_5_words": "or technical communities"
},
"text": "\u2022 Experience in AdTech or high-scale event-driven systems\n\u2022 Exposure to data governance, data quality, and metadata systems\n\u2022 Experience supporting ML/AI data pipelines\n\u2022 Familiarity with modern data architectures (data lakes, lakehouse, etc.)\n\u2022 Contributions to open-source or technical communities",
"word_count": 56
}
],
"urls": []
}
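The parser output pairs each section's `text` with a `bullet_count`, and turns the raw experience line into `min`/`max` fields. The parser itself is an LLM (gpt-4.1-nano), so the sketch below is not how those fields are produced; it is a hypothetical deterministic cross-check, assuming that a bullet is a line starting with "•" (U+2022) and that experience phrases follow an "N+ years" or "N-M years" shape.

```python
import re

# "Nice to Have" section text, copied from the parser output above.
NICE_TO_HAVE = (
    "\u2022 Experience in AdTech or high-scale event-driven systems\n"
    "\u2022 Exposure to data governance, data quality, and metadata systems\n"
    "\u2022 Experience supporting ML/AI data pipelines\n"
    "\u2022 Familiarity with modern data architectures (data lakes, lakehouse, etc.)\n"
    "\u2022 Contributions to open-source or technical communities"
)


def bullet_count(text: str) -> int:
    # Counts lines that begin with the bullet marker U+2022.
    return sum(1 for line in text.splitlines() if line.lstrip().startswith("\u2022"))


# Matches "10+ years", "3-5 years", or "5 years" (hypothetical pattern).
_YEARS = re.compile(r"(\d+)\s*(?:\+|-\s*(\d+))?\s*years?")


def parse_experience(raw: str) -> dict:
    m = _YEARS.search(raw)
    if m is None:
        return {"max": None, "min": None, "raw": raw}
    low = int(m.group(1))
    high = int(m.group(2)) if m.group(2) else None  # "N+" leaves the upper bound open
    return {"max": high, "min": low, "raw": raw}
```

A check like this can flag LLM output whose `bullet_count` or experience bounds disagree with the text it quotes.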
API 1 — extract-from-jd
{
"final_skills": [
{
"is_primary": true,
"skill_name": "Apache Spark"
},
{
"is_primary": true,
"skill_name": "Kafka"
},
{
"is_primary": true,
"skill_name": "Hadoop"
},
{
"is_primary": true,
"skill_name": "HBase"
},
{
"is_primary": true,
"skill_name": "Aerospike"
},
{
"is_primary": true,
"skill_name": "Cassandra"
},
{
"is_primary": true,
"skill_name": "Java"
},
{
"is_primary": true,
"skill_name": "Scala"
},
{
"is_primary": true,
"skill_name": "AWS"
},
{
"is_primary": true,
"skill_name": "GCP"
},
{
"is_primary": true,
"skill_name": "Azure"
},
{
"is_primary": true,
"skill_name": "RDBMS"
},
{
"is_primary": true,
"skill_name": "NoSQL"
},
{
"is_primary": false,
"skill_name": "Machine Learning"
},
{
"is_primary": false,
"skill_name": "Artificial Intelligence"
},
{
"is_primary": false,
"skill_name": "Data Lakes"
},
{
"is_primary": false,
"skill_name": "Lakehouse"
},
{
"is_primary": false,
"skill_name": "Event-Driven Architecture"
}
],
"jd_role": null,
"nano_parsed": {
"JD_type": "pass",
"about_company": null,
"certifications": [],
"company_name": null,
"ctc": null,
"domain": {
"primary": {
"aliases": [],
"domain": "Other"
},
"secondary": null
},
"education": [],
"experience": {
"max": null,
"min": 10,
"raw": "10+ years of experience in designing and developing large-scale data-driven systems"
},
"job_locations": [],
"role": null,
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 8,
"heading": "Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Design and build scalable,",
"last_5_words": "robustness, scalability, and developer productivity"
},
"text": "\u2022 Design and build scalable, distributed data systems for real-time and batch processing\n\u2022 Develop high-throughput data pipelines processing 10B+ events per day\n\u2022 Contribute to and own key components in the technical design and implementation of core AdCloud platform components\n\u2022 Work with technologies such as Apache Spark, Kafka, Hadoop ecosystem, and modern data platforms\n\u2022 Ensure performance, scalability, reliability, and cost efficiency of data systems\n\u2022 Collaborate with Product, Engineering, and Data Science teams to deliver end-to-end solutions\n\u2022 Participate in design and code reviews to maintain high engineering standards\n\u2022 Continuously improve system robustness, scalability, and developer productivity",
"word_count": 108
},
{
"bullet_count": 8,
"heading": "Qualifications",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 10+ years of experience in",
"last_5_words": "communication and collaboration skills"
},
"text": "\u2022 10+ years of experience in designing and developing large-scale data-driven systems\n\u2022 Strong experience with distributed data processing frameworks (Spark, Kafka, Hadoop, etc.)\n\u2022 Experience with NoSQL systems (HBase, Aerospike, Cassandra) and RDBMS\n\u2022 Strong programming skills in Java, Scala, or similar languages\n\u2022 Solid understanding of data structures, algorithms, and system design\n\u2022 Experience building scalable systems on cloud platforms (AWS/GCP/Azure)\n\u2022 Strong focus on performance optimization and cost efficiency\n\u2022 Excellent communication and collaboration skills",
"word_count": 108
},
{
"bullet_count": 5,
"heading": "Nice to Have",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Experience in AdTech or",
"last_5_words": "or technical communities"
},
"text": "\u2022 Experience in AdTech or high-scale event-driven systems\n\u2022 Exposure to data governance, data quality, and metadata systems\n\u2022 Experience supporting ML/AI data pipelines\n\u2022 Familiarity with modern data architectures (data lakes, lakehouse, etc.)\n\u2022 Contributions to open-source or technical communities",
"word_count": 56
}
],
"urls": []
},
"rejected": false,
"rejection_reason": null,
"run_id": "1f106d71-338e-40ee-a69a-09957abcd98f",
"stage3_signals": {
"alias_match_roles": [],
"kra_match_roles": [
{
"display_name": "Cloud Architect",
"matched_count": null,
"role_id": 9,
"score": 0.429,
"slug": "cloud-architect",
"total_count": null
},
{
"display_name": "Data Engineer",
"matched_count": null,
"role_id": 2,
"score": 0.4131,
"slug": "data-engineer",
"total_count": null
},
{
"display_name": "AR/VR Engineer",
"matched_count": null,
"role_id": 8,
"score": 0.4098,
"slug": "ar-vr-engineer",
"total_count": null
},
{
"display_name": "Android Engineer",
"matched_count": null,
"role_id": 4,
"score": 0.3901,
"slug": "android-engineer",
"total_count": null
},
{
"display_name": "Backend Engineer",
"matched_count": null,
"role_id": 1,
"score": 0.3778,
"slug": "backend-engineer",
"total_count": null
}
],
"skill_match_roles": [
{
"display_name": "Backend Engineer",
"matched_count": 6,
"role_id": 1,
"score": 0.3333,
"slug": "backend-engineer",
"total_count": 18
},
{
"display_name": "Data Engineer",
"matched_count": 6,
"role_id": 2,
"score": 0.3333,
"slug": "data-engineer",
"total_count": 18
},
{
"display_name": "ML Engineer",
"matched_count": 4,
"role_id": 3,
"score": 0.2222,
"slug": "ml-engineer",
"total_count": 18
},
{
"display_name": "AI Engineer",
"matched_count": 3,
"role_id": 13,
"score": 0.1667,
"slug": "ai-engineer",
"total_count": 18
},
{
"display_name": "Cybersecurity Engineer",
"matched_count": 3,
"role_id": 5,
"score": 0.1667,
"slug": "cybersecurity-engineer",
"total_count": 18
}
],
"stage35_ran": false
},
"stage4_decision": {
"alias_collision_detected": false,
"case": "F",
"chosen_role": {
"display_name": "Data Engineer",
"matched_count": null,
"role_id": 2,
"score": 0.4131,
"slug": "data-engineer",
"total_count": null
},
"confidence": 0.95,
"llm2_fired": true,
"llm2_reasoning": "The JD focuses heavily on building large-scale, distributed data pipelines with Spark, Kafka, and Hadoop, which aligns directly with typical Data Engineer responsibilities.",
"queued": false,
"reasoning": "LLM2 picked data-engineer (confidence 0.95)"
},
"stage5_updates": {
"centroid_n_after": 16,
"centroid_updated": true,
"collision_log_id": null,
"new_kra_attached": null,
"new_skills_attached": [
{
"is_primary": true,
"queue_id": 1088,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Apache Spark",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 1089,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Hadoop",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 1090,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "HBase",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 1091,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Aerospike",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 1092,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Cassandra",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 1093,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "RDBMS",
"status": "pending"
},
{
"is_primary": false,
"queue_id": 1094,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Machine Learning",
"status": "pending"
},
{
"is_primary": false,
"queue_id": 1095,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Artificial Intelligence",
"status": "pending"
},
{
"is_primary": false,
"queue_id": 1096,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Lakes",
"status": "pending"
},
{
"is_primary": false,
"queue_id": 1097,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Lakehouse",
"status": "pending"
},
{
"is_primary": false,
"queue_id": 1098,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Event-Driven Architecture",
"status": "pending"
}
],
"queue_entry_id": null,
"v3_pipeline_triggered": false,
"v3_role_slug": null,
"v3_run_id": null
}
}
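In `stage3_signals.skill_match_roles`, each score equals `matched_count / total_count` rounded to four decimals, where `total_count` (18) is the number of skills in `final_skills`. The sketch below reproduces those scores from the counts in this run; the function name is hypothetical, and the `kra_match_roles` scores cannot be derived this way because their counts are null.

```python
# (role slug, matched_count) pairs from stage3_signals.skill_match_roles.
skill_match = [
    ("backend-engineer", 6),
    ("data-engineer", 6),
    ("ml-engineer", 4),
    ("ai-engineer", 3),
    ("cybersecurity-engineer", 3),
]
TOTAL = 18  # len(final_skills) in the API 1 response


def skill_match_score(matched: int, total: int) -> float:
    """Fraction of the JD's skills already linked to the role, to 4 decimals."""
    return round(matched / total, 4)


scores = {slug: skill_match_score(n, TOTAL) for slug, n in skill_match}

# Rank by score; sorted() is stable, so ties keep their original order.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Backend Engineer and Data Engineer tie on this signal (6 of 18), so the skill signal alone does not separate them; the final pick in this run came from the LLM2 decision in `stage4_decision`.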
API 2 — extract-details
{
"alias_matches": [
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 173,
"existing_alias_text": "Kafka",
"input_term": "Kafka",
"matched_canonical": {
"category_id": 9,
"display_name": "Kafka",
"id": 36,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "kafka",
"sub_category_id": 47,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 1,
"existing_alias_text": "Java",
"input_term": "Java",
"matched_canonical": {
"category_id": 6,
"display_name": "Java",
"id": 1,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LANGUAGE",
"slug": "java",
"sub_category_id": 96,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 272,
"existing_alias_text": "Scala",
"input_term": "Scala",
"matched_canonical": {
"category_id": 6,
"display_name": "Scala",
"id": 102,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LANGUAGE",
"slug": "scala",
"sub_category_id": 96,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 406,
"existing_alias_text": "AWS",
"input_term": "AWS",
"matched_canonical": {
"category_id": 9,
"display_name": "AWS",
"id": 187,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "aws",
"sub_category_id": 46,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 405,
"existing_alias_text": "GCP",
"input_term": "GCP",
"matched_canonical": {
"category_id": 9,
"display_name": "GCP",
"id": 186,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "gcp",
"sub_category_id": 46,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 407,
"existing_alias_text": "Azure",
"input_term": "Azure",
"matched_canonical": {
"category_id": 9,
"display_name": "Azure",
"id": 188,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "azure",
"sub_category_id": 46,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 1989,
"existing_alias_text": "NoSQL",
"input_term": "NoSQL",
"matched_canonical": {
"category_id": 2,
"display_name": "NoSQL",
"id": 1346,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CONCEPT",
"slug": "nosql",
"sub_category_id": 1019,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
}
],
"candidate_roles": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "Android Engineer",
"id": 4,
"rationale": null,
"role_archetype": null,
"slug": "android-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
},
{
"display_name": "AI Engineer",
"id": 13,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
},
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
],
"chosen_role": {
"display_name": "Data Engineer",
"id": 2,
"rationale": "The primary skills indicate a strong focus on data processing technologies and cloud platforms, aligning well with a Data Engineer role.",
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Event Streaming",
"id": 8,
"rationale": "Transport-layer systems used to move events and decouple producers from consumers. Data engineers use these systems to ingest, buffer, and distribute event data before downstream processing.",
"slug": "messaging-and-event-streaming",
"source": "db"
},
"input_skill": "Kafka",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Kotlin and Java",
"id": 161,
"rationale": "Primary implementation languages for Android app features, platform integration, and client-side business logic. Android engineers use these languages to build screens, state flows, service adapters, and device-aware behavior.",
"slug": "kotlin-and-java",
"source": "db"
},
"input_skill": "Java",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Android Engineer",
"id": 4,
"rationale": null,
"role_archetype": null,
"slug": "android-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages",
"id": 1,
"rationale": "Core server-side languages used to implement backend business logic, integrations, and service internals. This is the primary coding surface for the role across application layers.",
"slug": "programming-languages",
"source": "db"
},
"input_skill": "Java",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 21,
"rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"input_skill": "Java",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 21,
"rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"input_skill": "Scala",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for ML Systems",
"id": 39,
"rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
"slug": "programming-languages-for-ml-systems",
"source": "db"
},
"input_skill": "Scala",
"llm_role": null,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms",
"id": 20,
"rationale": "Proficiency in major cloud service provider platforms and their core services.",
"slug": "cloud-platforms",
"source": "db"
},
"input_skill": "AWS",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms for AI Deployment",
"id": 211,
"rationale": "Major cloud services that provide infrastructure and managed services for AI workloads.",
"slug": "cloud-platforms-for-ai-deployment",
"source": "db"
},
"input_skill": "AWS",
"llm_role": null,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 13,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Provider Platforms",
"id": 131,
"rationale": "Major cloud platforms and their core service ecosystems used to design target-state architectures, choose deployment boundaries, and evaluate managed capabilities. This is the primary substrate for cloud architecture decisions.",
"slug": "cloud-provider-platforms",
"source": "db"
},
"input_skill": "AWS",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Security Posture Tools",
"id": 64,
"rationale": "Cloud-native security platforms used to assess misconfiguration, workload exposure, and cloud control coverage. This dimension includes the major CNAPP/CSPM/CWPP vendors and cloud security services the role reviews and tunes.",
"slug": "cloud-security-posture-tools",
"source": "db"
},
"input_skill": "AWS",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms",
"id": 20,
"rationale": "Proficiency in major cloud service provider platforms and their core services.",
"slug": "cloud-platforms",
"source": "db"
},
"input_skill": "GCP",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms for AI Deployment",
"id": 211,
"rationale": "Major cloud services that provide infrastructure and managed services for AI workloads.",
"slug": "cloud-platforms-for-ai-deployment",
"source": "db"
},
"input_skill": "GCP",
"llm_role": null,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 13,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Security Posture Tools",
"id": 64,
"rationale": "Cloud-native security platforms used to assess misconfiguration, workload exposure, and cloud control coverage. This dimension includes the major CNAPP/CSPM/CWPP vendors and cloud security services the role reviews and tunes.",
"slug": "cloud-security-posture-tools",
"source": "db"
},
"input_skill": "GCP",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms",
"id": 20,
"rationale": "Proficiency in major cloud service provider platforms and their core services.",
"slug": "cloud-platforms",
"source": "db"
},
"input_skill": "Azure",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms for AI Deployment",
"id": 211,
"rationale": "Major cloud services that provide infrastructure and managed services for AI workloads.",
"slug": "cloud-platforms-for-ai-deployment",
"source": "db"
},
"input_skill": "Azure",
"llm_role": null,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 13,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Provider Platforms",
"id": 131,
"rationale": "Major cloud platforms and their core service ecosystems used to design target-state architectures, choose deployment boundaries, and evaluate managed capabilities. This is the primary substrate for cloud architecture decisions.",
"slug": "cloud-provider-platforms",
"source": "db"
},
"input_skill": "Azure",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Security Posture Tools",
"id": 64,
"rationale": "Cloud-native security platforms used to assess misconfiguration, workload exposure, and cloud control coverage. This dimension includes the major CNAPP/CSPM/CWPP vendors and cloud security services the role reviews and tunes.",
"slug": "cloud-security-posture-tools",
"source": "db"
},
"input_skill": "Azure",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "NoSQL Databases",
"id": 19,
"rationale": "Models and manages data using non-relational database systems.",
"slug": "nosql-databases",
"source": "db"
},
"input_skill": "NoSQL",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "Apache Spark",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "Hadoop",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"input_skill": "HBase",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Aerospike",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"input_skill": "Cassandra",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "RDBMS",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Machine Learning",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "AI Governance and Model Security",
"id": 50,
"rationale": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
"slug": "ai-governance-and-model-security",
"source": "db"
},
"input_skill": "Machine Learning",
"llm_role": null,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 13,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "AI Governance and Model Security",
"id": 50,
"rationale": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
"slug": "ai-governance-and-model-security",
"source": "db"
},
"input_skill": "Machine Learning",
"llm_role": null,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 13,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Artificial Intelligence",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"input_skill": "Data Lakes",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Data Lakes",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"input_skill": "Data Lakes",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Lakehouse",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Event-Driven Architecture",
"llm_role": null,
"roles_from_db": []
}
],
"input_final_skills": [
"Apache Spark",
"Kafka",
"Hadoop",
"HBase",
"Aerospike",
"Cassandra",
"Java",
"Scala",
"AWS",
"GCP",
"Azure",
"RDBMS",
"NoSQL",
"Machine Learning",
"Artificial Intelligence",
"Data Lakes",
"Lakehouse",
"Event-Driven Architecture"
],
"input_llm_skills": [
"Apache Spark",
"Kafka",
"Hadoop",
"HBase",
"Aerospike",
"Cassandra",
"Java",
"Scala",
"AWS",
"GCP",
"Azure",
"RDBMS",
"NoSQL",
"Machine Learning",
"Artificial Intelligence",
"Data Lakes",
"Lakehouse",
"Event-Driven Architecture"
],
"new_aliases_persisted": 0,
"run_id": "1f106d71-338e-40ee-a69a-09957abcd98f",
"skills_detail": [
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "Apache Spark",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Apache Spark",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Framework",
"skill_nature": "FRAMEWORK",
"sub_category": "distributed_data_processing_framework",
"typical_lifespan": "EVERGREEN",
"version_strategy": "SEPARATE_ENTITY",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "\u201cApache Spark\u201d is a specific, widely recognized distributed data processing framework; typical JDs won\u2019t confuse it with other distinct ETL/streaming tools."
},
"context_keywords": {
"context_keywords": [
"RDD",
"DataFrame",
"Spark SQL",
"MLlib",
"Spark Streaming",
"DAGScheduler",
"Cluster Manager",
"Apache Kafka",
"Hadoop",
"PySpark",
"Scala",
"SparkSession",
"ETL",
"Data Lake",
"Machine Learning"
]
},
"maturity": {
"confidence": 0.95,
"maturity": "well_known",
"reasoning": "Apache Spark appears in many data engineering JDs and remains a standard for distributed ETL/ELT; its GitHub and vendor ecosystem activity stay strong, with Databricks and cloud platforms still promoting it."
},
"skill_id": "apache-spark",
"vendor_license": {
"confidence": 0.95,
"license": "apache_2",
"vendor": "Apache Software Foundation",
"year_introduced": 2010
},
"versioning": {
"current_version": "3.x",
"version_aliases": {
"apache spark 3": "3.x",
"spark": "3.x",
"spark 3": "3.x",
"spark 3.x": "3.x",
"spark3": "3.x"
},
"versioned": true
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Frameworks used to process large datasets in batch or streaming pipelines, often as part of ETL/ELT workflows. Apache Spark belongs here because it is a core engine for distributed transformation, aggregation, and data movement.",
"exemplar_skills": [
"Apache Spark",
"Spark SQL",
"Spark Structured Streaming",
"Spark DataFrame API",
"Spark Dataset API",
"distributed batch processing",
"distributed joins",
"windowed aggregations"
],
"in_scope": "Apache Spark, Spark SQL, Spark Structured Streaming, DataFrame and Dataset APIs, batch ETL jobs, distributed joins and aggregations, window functions, partitioning strategies, cluster execution on YARN/Kubernetes/Databricks",
"name": "Distributed Data Processing Frameworks",
"out_of_scope": "BI dashboards and semantic layers, storage systems like data lakes and warehouses, low-level programming language syntax, workflow orchestration platforms, model training frameworks",
"overlap_flags": [
{
"reason": "Spark commonly reads from and writes to cloud object stores and managed data services, but the processing engine itself is the primary fit here.",
"with_dim_id": "cloud-storage-and-data-services",
"with_dim_name": null,
"with_role": "Cloud Architect"
},
{
"reason": "Spark jobs are written in languages like Scala, Python, and Java, but language fluency is a separate dimension from the Spark framework itself.",
"with_dim_id": "programming-languages-and-scripting",
"with_dim_name": null,
"with_role": "Cybersecurity Engineer"
}
],
"tentative_id": "etl-and-elt-tooling"
}
],
"merge_log": [],
"placed": {
"name": "Apache Spark",
"placement_confidence": 0.92,
"primary_dimension": "etl-and-elt-tooling",
"reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [],
"skill_id": "apache-spark"
},
"relationships": {
"child_skills": [],
"parent_skills": [
"spark"
],
"related_to": [
"flink",
"databricks",
"kubernetes",
"aws",
"azure"
],
"requires": [],
"skill_id": "apache-spark",
"suppress_on_match": []
},
"skill_id": "apache-spark",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.94,
"name": "Apache Spark",
"reasoning": "Apache Spark is a structured codebase that users build data applications and pipelines inside, so by the Tool vs Framework rule it is a Framework rather than a Tool.",
"skill_id": "apache-spark",
"subtype": "distributed_data_processing_framework",
"type": "Framework"
},
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Kafka",
"alias_type": "CANONICAL",
"id": 173,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 9,
"display_name": "Kafka",
"id": 36,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "kafka",
"sub_category_id": 47,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Event Streaming",
"id": 8,
"rationale": "Transport-layer systems used to move events and decouple producers from consumers. Data engineers use these systems to ingest, buffer, and distribute event data before downstream processing.",
"slug": "messaging-and-event-streaming",
"source": "db"
},
"input_skill": "Kafka",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Kafka",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "Hadoop",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Hadoop",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Framework",
"skill_nature": "FRAMEWORK",
"sub_category": "data_processing_framework",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "\u201cHadoop\u201d is a specific data processing framework; typical JDs distinguish it from other big data tools like Spark or Hive."
},
"context_keywords": {
"context_keywords": [
"MapReduce",
"HDFS",
"YARN",
"Hive",
"Pig",
"Spark",
"Sqoop",
"Flume",
"Oozie",
"Kafka",
"NoSQL",
"Big Data",
"Data Lake",
"ETL",
"ELT",
"Distributed Computing"
]
},
"maturity": {
"confidence": 0.91,
"maturity": "niche",
"reasoning": "Job postings still mention Hadoop for legacy big-data stacks, but JD volume has fallen as Spark and cloud warehouses replaced MapReduce-era clusters."
},
"skill_id": "hadoop",
"vendor_license": {
"confidence": 0.95,
"license": "apache_2",
"vendor": "Apache Software Foundation",
"year_introduced": 2006
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Tools and frameworks used to ingest, store, and process large-scale batch data across clusters. Hadoop belongs here because it is a foundational platform for distributed storage and computation in data engineering workflows.",
"exemplar_skills": [
"Hadoop",
"HDFS",
"MapReduce",
"YARN",
"Hive",
"Spark on Hadoop"
],
"in_scope": "Hadoop, HDFS, MapReduce, YARN, Hive on Hadoop, Spark on Hadoop clusters, cluster-based batch ingestion, distributed file storage, job scheduling on Hadoop",
"name": "Distributed Data Processing Platforms",
"out_of_scope": "Cloud object storage and managed warehouses, streaming-only systems like Kafka and Flink, BI dashboards and reporting, application backend APIs and services",
"overlap_flags": [
{
"reason": "Hadoop includes distributed storage concepts via HDFS, which can overlap with broader storage platform skills.",
"with_dim_id": "cloud-storage-and-data-services",
"with_dim_name": null,
"with_role": "Cloud Architect"
},
{
"reason": "MapReduce and cluster execution rely on parallel processing concepts, though the dimension is primarily data-platform oriented.",
"with_dim_id": "concurrency-and-parallel-processing",
"with_dim_name": null,
"with_role": "Backend Engineer"
}
],
"tentative_id": "etl-and-elt-tooling"
}
],
"merge_log": [],
"placed": {
"name": "Hadoop",
"placement_confidence": 0.92,
"primary_dimension": "etl-and-elt-tooling",
"reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [],
"skill_id": "hadoop"
},
"relationships": {
"child_skills": [],
"parent_skills": [],
"related_to": [
"spark",
"databricks",
"jvm",
"nosql",
"kubernetes",
"jenkins",
"git",
"gradle"
],
"requires": [],
"skill_id": "hadoop",
"suppress_on_match": []
},
"skill_id": "hadoop",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.9,
"name": "Hadoop",
"reasoning": "Hadoop is fundamentally a structured software stack that users build distributed data applications and jobs within, so it fits the Framework category rather than a Tool or Platform.",
"skill_id": "hadoop",
"subtype": "data_processing_framework",
"type": "Framework"
},
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"input_skill": "HBase",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
}
],
"input_skill": "HBase",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Datastore",
"skill_nature": "TOOL",
"sub_category": "wide_column_store",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "HBase is a specific Apache wide-column NoSQL datastore; JDs typically distinguish it from other datastores."
},
"context_keywords": {
"context_keywords": [
"Hadoop",
"NoSQL",
"Bigtable",
"MapReduce",
"Apache",
"column family",
"scalability",
"distributed",
"data model",
"real-time",
"table design",
"region server",
"Thrift",
"REST API",
"data replication"
]
},
"maturity": {
"confidence": 0.86,
"maturity": "niche",
"reasoning": "HBase appears in a limited set of big-data/legacy Hadoop job postings, while newer JDs more often specify DynamoDB, Bigtable, or Cassandra; its market demand is specialized rather than broad."
},
"skill_id": "hbase",
"vendor_license": {
"confidence": 0.95,
"license": "apache_2",
"vendor": "Apache Software Foundation",
"year_introduced": 2010
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Storage systems used to persist large-scale application and analytical data with low-latency access and horizontal scaling. HBase fits here as a distributed NoSQL store built on top of Hadoop storage infrastructure.",
"exemplar_skills": [
"HBase",
"HBase data modeling",
"row-key design in HBase",
"column families",
"region server management",
"HBase replication"
],
"in_scope": "HBase, distributed NoSQL tables, column-family storage, row-key design, region splitting and balancing, replication, compactions, scans and gets, Hadoop ecosystem storage services",
"name": "Distributed Data Storage Systems",
"out_of_scope": "Relational databases and SQL schema design, stream processing engines, object storage buckets, in-memory caches, query orchestration and BI tools",
"overlap_flags": [
{
"reason": "HBase is often deployed on cloud infrastructure, but this dimension is about the storage system itself rather than the hosting platform.",
"with_dim_id": "cloud-platforms",
"with_dim_name": null,
"with_role": "Backend Engineer, Cybersecurity Engineer, Data Engineer, DevOps Engineer, ML Engineer"
},
{
"reason": "HBase is frequently used in ingestion pipelines, but pipeline orchestration belongs to ETL/ELT tooling rather than the database dimension.",
"with_dim_id": "etl-and-elt-tooling",
"with_dim_name": null,
"with_role": "Data Engineer"
}
],
"tentative_id": "cloud-storage-and-data-services"
}
],
"merge_log": [],
"placed": {
"name": "HBase",
"placement_confidence": 0.92,
"primary_dimension": "cloud-storage-and-data-services",
"reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [],
"skill_id": "hbase"
},
"relationships": {
"child_skills": [],
"parent_skills": [],
"related_to": [
"nosql",
"sqlite",
"rds",
"vector-db",
"databricks",
"flink",
"kubernetes",
"jvm"
],
"requires": [],
"skill_id": "hbase",
"suppress_on_match": []
},
"skill_id": "hbase",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.98,
"name": "HBase",
"reasoning": "HBase is fundamentally a persistent data system, so by the Datastore vs Format rule it is a datastore rather than a tool or framework.",
"skill_id": "hbase",
"subtype": "wide_column_store",
"type": "Datastore"
},
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Aerospike",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "Aerospike",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Datastore",
"skill_nature": "TOOL",
"sub_category": "distributed_nosql_datastore",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "Aerospike is a specific distributed NoSQL database name; unlikely to be confused with other catalog datastore skills."
},
"context_keywords": {
"context_keywords": [
"NoSQL",
"distributed database",
"high availability",
"scalability",
"data model",
"key-value store",
"latency",
"replication",
"cluster management",
"data partitioning",
"Aerospike client",
"TTL",
"secondary indexes",
"real-time analytics",
"data persistence"
]
},
"maturity": {
"confidence": 0.86,
"maturity": "niche",
"reasoning": "Aerospike appears in a limited set of high-scale datastore JDs and vendor case studies, but it is far less common than PostgreSQL, Redis, or MongoDB in general hiring pipelines."
},
"skill_id": "aerospike",
"vendor_license": {
"confidence": 0.95,
"license": "apache_2",
"vendor": "Aerospike Inc.",
"year_introduced": 2012
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Distributed NoSQL databases used for low-latency key-value access, horizontal scaling, and high availability. Aerospike belongs here because it is a distributed database platform rather than a general storage or cloud service.",
"exemplar_skills": [
"Aerospike",
"NoSQL databases",
"key-value stores",
"distributed database clustering",
"secondary indexing",
"data replication",
"partitioning and sharding"
],
"in_scope": "Aerospike, key-value data modeling, secondary indexes, TTL and record expiration, replication and partitioning, cluster management, strong consistency options, low-latency reads and writes",
"name": "Distributed NoSQL Databases",
"out_of_scope": "Vector search and embedding stores, relational SQL databases, object storage, cache-only systems like Redis when used purely as cache, application-level data serialization",
"overlap_flags": [
{
"reason": "Some teams deploy Aerospike as a managed data service, but the core skill is database operation and modeling rather than cloud storage selection.",
"with_dim_id": "cloud-storage-and-data-services",
"with_dim_name": null,
"with_role": "Cloud Architect"
},
{
"reason": "Aerospike work often involves latency and throughput tuning, but that dimension is broader and not database-specific.",
"with_dim_id": "performance-and-scalability-tuning",
"with_dim_name": null,
"with_role": "Backend Engineer"
}
],
"tentative_id": "d_init_01"
}
],
"merge_log": [],
"placed": {
"name": "Aerospike",
"placement_confidence": 0.92,
"primary_dimension": "d_init_01",
"reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [],
"skill_id": "aerospike"
},
"relationships": {
"child_skills": [],
"parent_skills": [],
"related_to": [
"nosql",
"sqlite",
"rds",
"spark",
"kubernetes",
"databricks",
"aws",
"aks"
],
"requires": [],
"skill_id": "aerospike",
"suppress_on_match": []
},
"skill_id": "aerospike",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.98,
"name": "Aerospike",
"reasoning": "Aerospike is fundamentally a system that persists and serves data, so by the Datastore vs Format rule it is a Datastore rather than a tool or platform.",
"skill_id": "aerospike",
"subtype": "distributed_nosql_datastore",
"type": "Datastore"
},
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"input_skill": "Cassandra",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
}
],
"input_skill": "Cassandra",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Datastore",
"skill_nature": "TOOL",
"sub_category": "wide_column_store",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "\u201cCassandra\u201d is a specific wide-column NoSQL database name; unlikely to be confused with other catalog datastore skills."
},
"context_keywords": {
"context_keywords": [
"CQL",
"DataStax",
"TinkerPop",
"Spark",
"ScyllaDB",
"Replication",
"Partitioning",
"Cluster",
"NoSQL",
"Wide Column",
"Consistency",
"Data Modeling",
"DSE",
"Thrift",
"Eventual Consistency"
]
},
"maturity": {
"confidence": 0.84,
"maturity": "well_known",
"reasoning": "Apache Cassandra appears in many production data-platform JDs and is a common choice for high-write, distributed workloads; GitHub and vendor docs show sustained activity rather than sunset signals."
},
"skill_id": "cassandra",
"vendor_license": {
"confidence": 0.95,
"license": "apache_2",
"vendor": "Apache Software Foundation",
"year_introduced": 2008
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Managed and self-hosted data stores used to persist application data with high availability and horizontal scale. Cassandra belongs here because it is a distributed wide-column database chosen for partitioning, replication, and fault-tolerant storage.",
"exemplar_skills": [
"Cassandra",
"Apache Cassandra",
"CQL",
"partition keys",
"clustering keys",
"replication",
"consistency levels",
"compaction",
"tombstones"
],
"in_scope": "Cassandra, Apache Cassandra, wide-column data modeling, partition keys, clustering keys, replication, consistency levels, compaction, tombstones, secondary indexes, CQL, data distribution, sharding, multi-datacenter replication",
"name": "Distributed Data Storage Systems",
"out_of_scope": "Relational schema design and SQL tuning, which belong to database/relational data modeling; vector similarity search, which belongs to vector-databases; cache-only systems like Redis, which are a separate in-memory data store cluster",
"overlap_flags": [
{
"reason": "Cassandra is often selected and tuned for latency, throughput, and scale, so operational performance concerns can overlap.",
"with_dim_id": "performance-and-scalability-tuning",
"with_dim_name": null,
"with_role": "Backend Engineer"
},
{
"reason": "Cassandra is frequently deployed on cloud infrastructure or as a managed service, but the core skill is the database system itself.",
"with_dim_id": "cloud-platforms",
"with_dim_name": null,
"with_role": "Backend Engineer, Cybersecurity Engineer, Data Engineer, DevOps Engineer, ML Engineer"
}
],
"tentative_id": "cloud-storage-and-data-services"
}
],
"merge_log": [],
"placed": {
"name": "Cassandra",
"placement_confidence": 0.92,
"primary_dimension": "cloud-storage-and-data-services",
"reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [],
"skill_id": "cassandra"
},
"relationships": {
"child_skills": [],
"parent_skills": [
"nosql",
"aws"
],
"related_to": [
"rds",
"sqlite",
"relational-databases",
"firebase-firestore",
"databricks",
"chromadb",
"kubernetes",
"ibm-cloud"
],
"requires": [],
"skill_id": "cassandra",
"suppress_on_match": []
},
"skill_id": "cassandra",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.99,
"name": "Cassandra",
"reasoning": "Cassandra is fundamentally a distributed database that persists data, so by the Datastore vs Format rule it is a Datastore.",
"skill_id": "cassandra",
"subtype": "wide_column_store",
"type": "Datastore"
},
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Java",
"alias_type": "CANONICAL",
"id": 1,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "JDK 11",
"alias_type": "VERSION",
"id": 4,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "JDK 17",
"alias_type": "VERSION",
"id": 5,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "JDK 21",
"alias_type": "VERSION",
"id": 6,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "JDK 8",
"alias_type": "VERSION",
"id": 3,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Java 1.0",
"alias_type": "VERSION",
"id": 11,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Java 1.1",
"alias_type": "VERSION",
"id": 12,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Java 1.2",
"alias_type": "VERSION",
"id": 13,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Java 1.3",
"alias_type": "VERSION",
"id": 14,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Java 1.4",
"alias_type": "VERSION",
"id": 15,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Java 1.5",
"alias_type": "VERSION",
"id": 16,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Java 1.6",
"alias_type": "VERSION",
"id": 17,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Java 1.7",
"alias_type": "VERSION",
"id": 18,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Java 1.8",
"alias_type": "VERSION",
"id": 19,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Java 11",
"alias_type": "VERSION",
"id": 8,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Java 17",
"alias_type": "VERSION",
"id": 9,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Java 21",
"alias_type": "VERSION",
"id": 10,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Java 5",
"alias_type": "VERSION",
"id": 288,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Java 6",
"alias_type": "VERSION",
"id": 289,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Java 7",
"alias_type": "VERSION",
"id": 290,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Java 8",
"alias_type": "VERSION",
"id": 7,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "OpenJDK 11",
"alias_type": "VERSION",
"id": 21,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "OpenJDK 17",
"alias_type": "VERSION",
"id": 22,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "OpenJDK 21",
"alias_type": "VERSION",
"id": 23,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "OpenJDK 8",
"alias_type": "VERSION",
"id": 20,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java 11",
"alias_type": "VERSION",
"id": 1512,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java 17",
"alias_type": "VERSION",
"id": 1513,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java 21",
"alias_type": "VERSION",
"id": 1514,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java 4",
"alias_type": "VERSION",
"id": 1496,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java 5",
"alias_type": "VERSION",
"id": 1497,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java 6",
"alias_type": "VERSION",
"id": 1498,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java 7",
"alias_type": "VERSION",
"id": 1499,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java 8",
"alias_type": "VERSION",
"id": 1500,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java-11",
"alias_type": "VERSION",
"id": 1515,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java-17",
"alias_type": "VERSION",
"id": 1516,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java-21",
"alias_type": "VERSION",
"id": 1517,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java-4",
"alias_type": "VERSION",
"id": 1501,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java-5",
"alias_type": "VERSION",
"id": 1502,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java-6",
"alias_type": "VERSION",
"id": 1503,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java-7",
"alias_type": "VERSION",
"id": 1504,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java-8",
"alias_type": "VERSION",
"id": 1505,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java11",
"alias_type": "VERSION",
"id": 1506,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java17",
"alias_type": "VERSION",
"id": 1507,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java21",
"alias_type": "VERSION",
"id": 1508,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java4",
"alias_type": "VERSION",
"id": 1482,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java5",
"alias_type": "VERSION",
"id": 1483,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java6",
"alias_type": "VERSION",
"id": 1484,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java7",
"alias_type": "VERSION",
"id": 1485,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "java8",
"alias_type": "VERSION",
"id": 1486,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "jdk 11",
"alias_type": "VERSION",
"id": 1509,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "jdk 17",
"alias_type": "VERSION",
"id": 1510,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "jdk 21",
"alias_type": "VERSION",
"id": 1511,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "jdk 4",
"alias_type": "VERSION",
"id": 1487,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "jdk 5",
"alias_type": "VERSION",
"id": 1488,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "jdk 6",
"alias_type": "VERSION",
"id": 1489,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "jdk 7",
"alias_type": "VERSION",
"id": 1490,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "jdk 8",
"alias_type": "VERSION",
"id": 1491,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "jdk11",
"alias_type": "VERSION",
"id": 1492,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "jdk17",
"alias_type": "VERSION",
"id": 1493,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "jdk21",
"alias_type": "VERSION",
"id": 1494,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "jdk4",
"alias_type": "VERSION",
"id": 1477,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "jdk5",
"alias_type": "VERSION",
"id": 1478,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "jdk6",
"alias_type": "VERSION",
"id": 1479,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "jdk7",
"alias_type": "VERSION",
"id": 1480,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "jdk8",
"alias_type": "VERSION",
"id": 1481,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "jvm21",
"alias_type": "VERSION",
"id": 1495,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 6,
"display_name": "Java",
"id": 1,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LANGUAGE",
"slug": "java",
"sub_category_id": 96,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Kotlin and Java",
"id": 161,
"rationale": "Primary implementation languages for Android app features, platform integration, and client-side business logic. Android engineers use these languages to build screens, state flows, service adapters, and device-aware behavior.",
"slug": "kotlin-and-java",
"source": "db"
},
"input_skill": "Java",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Android Engineer",
"id": 4,
"rationale": null,
"role_archetype": null,
"slug": "android-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages",
"id": 1,
"rationale": "Core server-side languages used to implement backend business logic, integrations, and service internals. This is the primary coding surface for the role across application layers.",
"slug": "programming-languages",
"source": "db"
},
"input_skill": "Java",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 21,
"rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"input_skill": "Java",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Java",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Scala",
"alias_type": "CANONICAL",
"id": 272,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 6,
"display_name": "Scala",
"id": 102,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LANGUAGE",
"slug": "scala",
"sub_category_id": 96,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 21,
"rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"input_skill": "Scala",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for ML Systems",
"id": 39,
"rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
"slug": "programming-languages-for-ml-systems",
"source": "db"
},
"input_skill": "Scala",
"llm_role": null,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
}
]
}
],
"input_skill": "Scala",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "AWS",
"alias_type": "CANONICAL",
"id": 406,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 9,
"display_name": "AWS",
"id": 187,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "aws",
"sub_category_id": 46,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms",
"id": 20,
"rationale": "Proficiency in major cloud service provider platforms and their core services.",
"slug": "cloud-platforms",
"source": "db"
},
"input_skill": "AWS",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms for AI Deployment",
"id": 211,
"rationale": "Major cloud services that provide infrastructure and managed services for AI workloads.",
"slug": "cloud-platforms-for-ai-deployment",
"source": "db"
},
"input_skill": "AWS",
"llm_role": null,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 13,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Provider Platforms",
"id": 131,
"rationale": "Major cloud platforms and their core service ecosystems used to design target-state architectures, choose deployment boundaries, and evaluate managed capabilities. This is the primary substrate for cloud architecture decisions.",
"slug": "cloud-provider-platforms",
"source": "db"
},
"input_skill": "AWS",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Security Posture Tools",
"id": 64,
"rationale": "Cloud-native security platforms used to assess misconfiguration, workload exposure, and cloud control coverage. This dimension includes the major CNAPP/CSPM/CWPP vendors and cloud security services the role reviews and tunes.",
"slug": "cloud-security-posture-tools",
"source": "db"
},
"input_skill": "AWS",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
]
}
],
"input_skill": "AWS",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "GCP",
"alias_type": "CANONICAL",
"id": 405,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 9,
"display_name": "GCP",
"id": 186,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "gcp",
"sub_category_id": 46,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms",
"id": 20,
"rationale": "Proficiency in major cloud service provider platforms and their core services.",
"slug": "cloud-platforms",
"source": "db"
},
"input_skill": "GCP",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms for AI Deployment",
"id": 211,
"rationale": "Major cloud services that provide infrastructure and managed services for AI workloads.",
"slug": "cloud-platforms-for-ai-deployment",
"source": "db"
},
"input_skill": "GCP",
"llm_role": null,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 13,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Security Posture Tools",
"id": 64,
"rationale": "Cloud-native security platforms used to assess misconfiguration, workload exposure, and cloud control coverage. This dimension includes the major CNAPP/CSPM/CWPP vendors and cloud security services the role reviews and tunes.",
"slug": "cloud-security-posture-tools",
"source": "db"
},
"input_skill": "GCP",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
]
}
],
"input_skill": "GCP",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Azure",
"alias_type": "CANONICAL",
"id": 407,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 9,
"display_name": "Azure",
"id": 188,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "azure",
"sub_category_id": 46,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms",
"id": 20,
"rationale": "Proficiency in major cloud service provider platforms and their core services.",
"slug": "cloud-platforms",
"source": "db"
},
"input_skill": "Azure",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms for AI Deployment",
"id": 211,
"rationale": "Major cloud services that provide infrastructure and managed services for AI workloads.",
"slug": "cloud-platforms-for-ai-deployment",
"source": "db"
},
"input_skill": "Azure",
"llm_role": null,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 13,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Provider Platforms",
"id": 131,
"rationale": "Major cloud platforms and their core service ecosystems used to design target-state architectures, choose deployment boundaries, and evaluate managed capabilities. This is the primary substrate for cloud architecture decisions.",
"slug": "cloud-provider-platforms",
"source": "db"
},
"input_skill": "Azure",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Security Posture Tools",
"id": 64,
"rationale": "Cloud-native security platforms used to assess misconfiguration, workload exposure, and cloud control coverage. This dimension includes the major CNAPP/CSPM/CWPP vendors and cloud security services the role reviews and tunes.",
"slug": "cloud-security-posture-tools",
"source": "db"
},
"input_skill": "Azure",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
]
}
],
"input_skill": "Azure",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "RDBMS",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "RDBMS",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Datastore",
"skill_nature": "TOOL",
"sub_category": "relational_database_management_system",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "RDBMS is a standard, specific datastore category (relational DBMS) with little overlap with other distinct skills in typical JDs."
},
"context_keywords": {
"context_keywords": [
"SQL",
"ACID",
"normalization",
"indexes",
"transactions",
"joins",
"stored procedures",
"views",
"foreign keys",
"data integrity",
"schema design",
"ER diagrams",
"database tuning",
"backup and recovery",
"query optimization",
"data modeling"
]
},
"maturity": {
"confidence": 0.98,
"maturity": "well_known",
"reasoning": "RDBMS is a core requirement in many job descriptions across backend, data, and DBA roles; PostgreSQL, MySQL, and SQL Server remain standard enterprise stacks."
},
"skill_id": "rdbms",
"vendor_license": {
"confidence": 0.9,
"license": null,
"vendor": null,
"year_introduced": null
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Relational database management systems used to store, query, and maintain structured data with tables, keys, constraints, and SQL. RDBMS fits here because it names the core database engine category rather than a specific vendor or data workflow tool.",
"exemplar_skills": [
"RDBMS",
"SQL",
"PostgreSQL",
"MySQL",
"Oracle Database",
"Microsoft SQL Server",
"schema design",
"indexing",
"transactions"
],
"in_scope": "RDBMS, relational database management systems, SQL query execution, tables and schemas, primary and foreign keys, indexes, transactions, ACID properties, normalization, joins, stored procedures",
"name": "Relational Database Systems",
"out_of_scope": "NoSQL document or key-value stores, data warehouse modeling, ETL orchestration, vector databases, cloud storage services, application ORM usage",
"overlap_flags": [
{
"reason": "Managed database services may be discussed alongside storage platforms, but this skill is specifically about relational database engines and SQL data modeling.",
"with_dim_id": "cloud-storage-and-data-services",
"with_dim_name": null,
"with_role": "Cloud Architect"
},
{
"reason": "Database tuning overlaps when discussing query plans and indexing, but that dimension is broader and centered on system performance rather than database technology itself.",
"with_dim_id": "performance-and-scalability-tuning",
"with_dim_name": null,
"with_role": "Backend Engineer"
}
],
"tentative_id": "d_init_01"
}
],
"merge_log": [],
"placed": {
"name": "RDBMS",
"placement_confidence": 0.92,
"primary_dimension": "d_init_01",
"reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [],
"skill_id": "rdbms"
},
"relationships": {
"child_skills": [],
"parent_skills": [],
"related_to": [
"relational-databases",
"rds",
"sqlite",
"nosql",
"vector-db",
"rag",
"rollback-procedures",
"data-structures"
],
"requires": [],
"skill_id": "rdbms",
"suppress_on_match": []
},
"skill_id": "rdbms",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.98,
"name": "RDBMS",
"reasoning": "RDBMS is fundamentally a system that persists and manages data, so under the Datastore vs Format rule it is a Datastore rather than a tool or concept.",
"skill_id": "rdbms",
"subtype": "relational_database_management_system",
"type": "Datastore"
},
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "NoSQL",
"alias_type": "CANONICAL",
"id": 1989,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 2,
"display_name": "NoSQL",
"id": 1346,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CONCEPT",
"slug": "nosql",
"sub_category_id": 1019,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "NoSQL Databases",
"id": 19,
"rationale": "Models and manages data using non-relational database systems.",
"slug": "nosql-databases",
"source": "db"
},
"input_skill": "NoSQL",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
}
]
}
],
"input_skill": "NoSQL",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Machine Learning",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "AI Governance and Model Security",
"id": 50,
"rationale": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
"slug": "ai-governance-and-model-security",
"source": "db"
},
"input_skill": "Machine Learning",
"llm_role": null,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 13,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "AI Governance and Model Security",
"id": 50,
"rationale": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
"slug": "ai-governance-and-model-security",
"source": "db"
},
"input_skill": "Machine Learning",
"llm_role": null,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 13,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
}
]
}
],
"input_skill": "Machine Learning",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Concept",
"skill_nature": "CONCEPT",
"sub_category": "machine_learning",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "\u201cMachine Learning\u201d is a standard, specific concept and is unlikely to be confused with other distinct catalog skills in typical job descriptions."
},
"context_keywords": {
"context_keywords": [
"TensorFlow",
"scikit-learn",
"Keras",
"PyTorch",
"neural networks",
"supervised learning",
"unsupervised learning",
"reinforcement learning",
"feature engineering",
"model evaluation",
"hyperparameter tuning",
"data preprocessing",
"cross-validation",
"ensemble methods",
"natural language processing"
]
},
"maturity": {
"confidence": 0.97,
"maturity": "well_known",
"reasoning": "Machine Learning appears in large volumes of job descriptions across data, product, and platform roles, and major cloud vendors (AWS, Google Cloud, Azure) offer dedicated ML services and certifications, indicating broad adoption."
},
"skill_id": "machine-learning",
"vendor_license": {
"confidence": 0.95,
"license": null,
"vendor": null,
"year_introduced": null
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [
{
"a_dim_id": "ai-governance-and-model-security",
"a_name": "AI Governance and Model Security",
"a_role": "__skill_focal__",
"b_dim_id": "ai-training-and-deployment-controls",
"b_name": "AI Training and Deployment Controls",
"b_role": "AI Compliance Officer",
"pair_kind": "cross_role",
"reasoning": "Dim A covers model governance/security work for ML practitioners: training data provenance, model approvals, safety controls, auditability, policy compliance, and model supply chain integrity. Dim B is about an AI Compliance Officer reviewing lifecycle checkpoints around training, release, and deployment to ensure approvals and safeguards exist. A senior practitioner in A would not naturally be a senior practitioner in B: career-track: no, because governance/security engineering and compliance-officer checkpoint review are adjacent but distinct.",
"similarity": 0.6770906625953983
}
],
"locked_dimensions": [
{
"description": "Core concepts, methods, and workflows for building predictive models from data. This fits the target skill because machine learning is the umbrella discipline covering model selection, training, validation, and deployment-oriented thinking.",
"exemplar_skills": [
"Machine Learning",
"supervised learning",
"unsupervised learning",
"feature engineering",
"cross-validation",
"hyperparameter tuning"
],
"in_scope": "Machine Learning, supervised learning, unsupervised learning, feature engineering, model training, classification, regression, clustering, cross-validation, bias-variance tradeoff, hyperparameter tuning",
"name": "Machine Learning Fundamentals",
"out_of_scope": "Deep learning architecture design, transformer fine-tuning, and neural network implementation, which belong to specialized model architecture dimensions; experiment logging and run comparison, which belong to experiment tracking and evaluation",
"overlap_flags": [
{
"reason": "ML work often uses experiment tracking, but that dimension covers the tooling and evaluation workflow rather than the core modeling concepts.",
"with_dim_id": "experiment-tracking-and-evaluation",
"with_dim_name": null,
"with_role": "ML Engineer"
},
{
"reason": "Some ML roles optimize models, but that dimension is specifically about latency, throughput, and efficiency tuning.",
"with_dim_id": "model-optimization-and-acceleration",
"with_dim_name": null,
"with_role": "ML Engineer"
}
],
"tentative_id": "d_init_01"
},
{
"description": "Controls and documentation used to make models safer, auditable, and compliant. Machine learning practitioners may need this when training or deploying models in regulated or risk-sensitive environments.",
"exemplar_skills": [
"Machine Learning",
"model risk review",
"training data provenance",
"model approvals",
"safety controls",
"auditability"
],
"in_scope": "Machine Learning, model risk review, training data provenance, model approvals, safety controls, auditability, policy compliance, model supply chain integrity",
"name": "AI Governance and Model Security",
"out_of_scope": "General model building, feature engineering, and algorithm selection, which belong to core machine learning practice; release pipeline mechanics, which belong to deployment and CI/CD dimensions",
"overlap_flags": [
{
"reason": "Both can touch model release stages, but this dimension is about governance and safeguards while that one focuses on training/deployment gates.",
"with_dim_id": "ai-training-and-deployment-controls",
"with_dim_name": null,
"with_role": "AI Compliance Officer"
}
],
"tentative_id": "ai-governance-and-model-security"
},
{
"description": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
"exemplar_skills": [
"AI Governance and Model Security"
],
"in_scope": "Skills, tools, and practices that belong under AI Governance and Model Security for the target role, including items implied by the dimension rationale.",
"name": "AI Governance and Model Security",
"out_of_scope": "Adjacent clusters explicitly not owned by AI Governance and Model Security, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "ai-governance-and-model-security"
}
],
"merge_log": [],
"placed": {
"name": "Machine Learning",
"placement_confidence": 0.92,
"primary_dimension": "d_init_01",
"reasoning": "Deterministic JD placement: locked_dimensions has 3 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [
"ai-governance-and-model-security"
],
"skill_id": "machine-learning"
},
"relationships": {
"child_skills": [],
"parent_skills": [
"ai"
],
"related_to": [
"mlops",
"intelligent-automation",
"embeddings",
"chatbots",
"pytorch",
"openai"
],
"requires": [
"algorithms",
"data-structures"
],
"skill_id": "machine-learning",
"suppress_on_match": []
},
"skill_id": "machine-learning",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.98,
"name": "Machine Learning",
"reasoning": "Machine Learning is a named knowledge unit about building models that learn from data, so by the Concept vs Methodology rule it is a Concept rather than an Architecture or Methodology.",
"skill_id": "machine-learning",
"subtype": "machine_learning",
"type": "Concept"
},
"warnings": [
"stage3_post_filter_dropped_catalog_only_locked_dims:42-\u003e3"
]
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Artificial Intelligence",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "Artificial Intelligence",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Concept",
"skill_nature": "CONCEPT",
"sub_category": "artificial_intelligence",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "\u201cArtificial Intelligence\u201d is a broad, standard concept and is unlikely to be confused with a different catalog skill in typical job descriptions."
},
"context_keywords": {
"context_keywords": [
"machine learning",
"neural networks",
"deep learning",
"natural language processing",
"computer vision",
"reinforcement learning",
"TensorFlow",
"PyTorch",
"data mining",
"predictive analytics",
"algorithm optimization",
"AI ethics",
"supervised learning",
"unsupervised learning",
"model training"
]
},
"maturity": {
"confidence": 0.96,
"maturity": "well_known",
"reasoning": "AI appears in a large and growing share of job descriptions across software, data, and product roles, and major vendors (Microsoft, Google, AWS) have standardized AI offerings, signaling broad market adoption."
},
"skill_id": "artificial-intelligence",
"vendor_license": {
"confidence": 1.0,
"license": null,
"vendor": null,
"year_introduced": null
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Core concepts, methods, and terminology for building AI systems across symbolic, statistical, and machine-learning approaches. This skill is broad enough to stand as a top-level conceptual dimension when the intent is general AI literacy rather than a specific subdomain.",
"exemplar_skills": [
"Artificial Intelligence",
"machine learning",
"neural networks",
"generative AI",
"reinforcement learning"
],
"in_scope": "Artificial Intelligence, machine learning basics, neural networks, supervised learning, unsupervised learning, reinforcement learning, generative AI concepts, model evaluation fundamentals",
"name": "Artificial Intelligence Concepts",
"out_of_scope": "AI governance and compliance controls, prompt management, vector databases, and model deployment operations; these belong to more specialized AI or platform dimensions",
"overlap_flags": [
{
"reason": "AI systems often require governance and security controls, but this dimension is about the core AI concept itself rather than risk management.",
"with_dim_id": "ai-governance-and-model-security",
"with_dim_name": null,
"with_role": "AI Engineer, ML Engineer"
},
{
"reason": "Optimization is a downstream specialization for AI models, not the general AI concept.",
"with_dim_id": "model-optimization-and-acceleration",
"with_dim_name": null,
"with_role": "ML Engineer"
},
{
"reason": "Evaluation is commonly part of AI work, but this dimension focuses on the broader AI domain rather than experiment tooling and measurement.",
"with_dim_id": "experiment-tracking-and-evaluation",
"with_dim_name": null,
"with_role": "ML Engineer"
}
],
"tentative_id": "d_init_01"
}
],
"merge_log": [],
"placed": {
"name": "Artificial Intelligence",
"placement_confidence": 0.92,
"primary_dimension": "d_init_01",
"reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [],
"skill_id": "artificial-intelligence"
},
"relationships": {
"child_skills": [],
"parent_skills": [],
"related_to": [
"ai",
"intelligent-automation",
"algorithms",
"chatbots",
"virtual-assistants",
"agentic-workflows",
"apis",
"openai",
"anthropic",
"openai-embeddings"
],
"requires": [],
"skill_id": "artificial-intelligence",
"suppress_on_match": []
},
"skill_id": "artificial-intelligence",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.98,
"name": "Artificial Intelligence",
"reasoning": "Artificial Intelligence is a named knowledge unit about a field of techniques and theory, so by the Concept vs Methodology rule it is a Concept rather than a tool, platform, or methodology.",
"skill_id": "artificial-intelligence",
"subtype": "artificial_intelligence",
"type": "Concept"
},
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"input_skill": "Data Lakes",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Data Lakes",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"input_skill": "Data Lakes",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
}
],
"input_skill": "Data Lakes",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Architecture",
"skill_nature": "PATTERN",
"sub_category": "data_lake_architecture",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "\u201cData Lakes\u201d is a specific architecture pattern (data lake storage/processing) and is unlikely to be confused with other distinct catalog skills."
},
"context_keywords": {
"context_keywords": [
"AWS Lake Formation",
"Azure Data Lake",
"data ingestion",
"ETL",
"data governance",
"schema evolution",
"data catalog",
"big data",
"data warehousing",
"real-time analytics",
"data pipelines",
"data modeling",
"partitioning",
"data lakes vs data warehouses",
"serverless architecture"
]
},
"maturity": {
"confidence": 0.93,
"maturity": "well_known",
"reasoning": "Data lakes are widely listed in cloud/data platform job descriptions and are a standard architecture in AWS, Azure, and GCP ecosystems; they\u2019re a common hiring-pipeline staple rather than a niche pattern."
},
"skill_id": "data-lakes",
"vendor_license": {
"confidence": 0.8,
"license": null,
"vendor": null,
"year_introduced": null
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [
{
"a_dim_id": "cloud-storage-and-data-services",
"a_name": "Cloud Storage and Data Services",
"a_role": "__skill_focal__",
"b_dim_id": "cloud-storage-and-data-services",
"b_name": "Cloud Storage and Data Services",
"b_role": "Cloud Architect",
"pair_kind": "cross_role",
"reasoning": "Dim A is analytical data-lake storage: object storage, bucket/prefix design, lifecycle policies, raw/bronze and curated/silver/gold zones, and lakehouse storage design. Dim B is cloud-architecture storage placement: choosing durability tiers, placing workloads, and evaluating managed service tradeoffs. A senior in A (e.g., S3 data lake architecture, Azure Data Lake Storage) is not automatically a senior in B\u2019s broader platform-boundary work. career-track: no, because these are related but distinct specialties.",
"similarity": 0.853807177782391
}
],
"locked_dimensions": [
{
"description": "Cloud-native storage and managed data services used to store large analytical datasets, define retention, and support lake-style architectures. Data Lakes fit here because they are typically built on object storage and adjacent managed services for durable, scalable data storage.",
"exemplar_skills": [
"Data Lakes",
"object storage",
"lakehouse storage design",
"S3 data lake architecture",
"Azure Data Lake Storage",
"Google Cloud Storage",
"data retention policies"
],
"in_scope": "Data Lakes, object storage, data lake storage layouts, bucket and prefix design, lifecycle policies, retention tiers, cloud-native analytical storage, raw/bronze data zones, curated/silver/gold zones",
"name": "Cloud Storage and Data Services",
"out_of_scope": "Data warehouse modeling, BI dashboards, ETL job orchestration, streaming ingestion pipelines, which belong to analytics modeling, BI, or ETL dimensions",
"overlap_flags": [
{
"reason": "Data lakes often receive data from ETL/ELT pipelines, but this dimension is about the storage layer rather than ingestion/transformation workflows.",
"with_dim_id": "etl-and-elt-tooling",
"with_dim_name": null,
"with_role": "Data Engineer"
},
{
"reason": "Curated lake data may feed BI tools, but dashboarding and semantic reporting are separate concerns.",
"with_dim_id": "bi-and-visualization-tools",
"with_dim_name": null,
"with_role": "Data Engineer"
}
],
"tentative_id": "cloud-storage-and-data-services"
},
{
"description": "Architectural patterns for organizing analytical data across raw, curated, and consumption-ready layers in a lake or lakehouse. This fits Data Lakes when the skill is used to design how data is structured, governed, and accessed for analytics.",
"exemplar_skills": [
"Data Lakes",
"lakehouse architecture",
"medallion architecture",
"schema-on-read",
"partitioning strategy",
"bronze-silver-gold layers",
"analytical data platform design"
],
"in_scope": "Data Lakes, lakehouse architecture, medallion architecture, bronze-silver-gold layering, schema-on-read, partitioning strategy, table formats for lakes, analytical data organization",
"name": "Lakehouse Data Architecture",
"out_of_scope": "Physical cloud storage primitives, ETL connector setup, warehouse SQL modeling, and BI consumption layers, which are owned by storage, ETL, or analytics dimensions",
"overlap_flags": [
{
"reason": "Lakehouse design depends on underlying object storage, but the architectural pattern is broader than storage configuration alone.",
"with_dim_id": "cloud-storage-and-data-services",
"with_dim_name": null,
"with_role": "Cloud Architect"
},
{
"reason": "Lakehouse architectures are commonly populated by ETL/ELT pipelines, though pipeline implementation is not the core of this dimension.",
"with_dim_id": "etl-and-elt-tooling",
"with_dim_name": null,
"with_role": "Data Engineer"
}
],
"tentative_id": "d_init_01"
},
{
"description": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"exemplar_skills": [
"Cloud Storage and Data Services"
],
"in_scope": "Skills, tools, and practices that belong under Cloud Storage and Data Services for the target role, including items implied by the dimension rationale.",
"name": "Cloud Storage and Data Services",
"out_of_scope": "Adjacent clusters explicitly not owned by Cloud Storage and Data Services, including unrelated platforms, roles, and skill families per library policy.",
"overlap_flags": [],
"tentative_id": "cloud-storage-and-data-services"
}
],
"merge_log": [],
"placed": {
"name": "Data Lakes",
"placement_confidence": 0.92,
"primary_dimension": "cloud-storage-and-data-services",
"reasoning": "Deterministic JD placement: locked_dimensions has 3 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [
"d_init_01"
],
"skill_id": "data-lakes"
},
"relationships": {
"child_skills": [],
"parent_skills": [],
"related_to": [
"nosql",
"relational-databases",
"rds",
"databricks",
"splunk",
"kubernetes",
"mlops",
"devops"
],
"requires": [],
"skill_id": "data-lakes",
"suppress_on_match": []
},
"skill_id": "data-lakes",
"split_log": [],
"typed": {
"alternatives_considered": [
"Concept: ruled out \u2014 although it is a known data-management idea, the term primarily denotes an architectural pattern.",
"Datastore: ruled out \u2014 a data lake is typically an architectural approach built on one or more storage systems, not a single datastore product."
],
"confidence": 0.9,
"name": "Data Lakes",
"reasoning": "By the Architecture vs Concept rule, data lakes describe a system-shape for organizing and storing data rather than a specific knowledge unit or product.",
"skill_id": "data-lakes",
"subtype": "data_lake_architecture",
"type": "Architecture"
},
"warnings": [
"stage3_post_filter_dropped_catalog_only_locked_dims:42-\u003e3"
]
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Lakehouse",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "Lakehouse",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Architecture",
"skill_nature": "PATTERN",
"sub_category": "data_platform_architecture",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "EMERGING"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "\u201cLakehouse\u201d is a specific data platform architecture term and is unlikely to be confused with other catalog skills."
},
"context_keywords": {
"context_keywords": [
"Delta Lake",
"Apache Spark",
"data warehouse",
"data lake",
"ETL",
"streaming analytics",
"data governance",
"cloud storage",
"SQL",
"data modeling",
"real-time processing",
"data integration",
"analytics",
"data pipeline",
"metadata management"
]
},
"maturity": {
"confidence": 0.86,
"maturity": "emerging",
"reasoning": "Lakehouse is increasingly listed in data-platform JDs and vendor docs (Databricks, Snowflake, Microsoft Fabric), but it is not yet as universal as core warehouse or lake skills."
},
"skill_id": "lakehouse",
"vendor_license": {
"confidence": 0.8,
"license": null,
"vendor": null,
"year_introduced": null
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Unified data platform patterns that combine data lake storage with warehouse-style management, governance, and analytics. Lakehouse belongs here because it refers to the architectural approach and platform capabilities used to store, process, and serve analytical data.",
"exemplar_skills": [
"Lakehouse",
"Delta Lake",
"Apache Iceberg",
"Apache Hudi",
"medallion architecture",
"ACID table formats",
"schema evolution",
"time travel queries"
],
"in_scope": "Lakehouse, Delta Lake, Apache Iceberg, Apache Hudi, table formats, ACID tables, schema evolution, time travel, medallion architecture, unified batch and streaming analytics",
"name": "Lakehouse Architecture",
"out_of_scope": "Traditional data warehouse modeling and BI semantic layers, covered by BI and Visualization Tools; generic cloud storage primitives without table management, covered by Cloud Storage and Data Services; ETL job orchestration, covered by ETL and ELT Tooling",
"overlap_flags": [
{
"reason": "Lakehouse platforms rely on object storage, but this dimension is about the table and governance layer rather than raw storage services.",
"with_dim_id": "cloud-storage-and-data-services",
"with_dim_name": null,
"with_role": "Cloud Architect"
},
{
"reason": "Lakehouse implementations often use ETL/ELT pipelines, but pipeline tooling is a separate concern from the lakehouse architecture itself.",
"with_dim_id": "etl-and-elt-tooling",
"with_dim_name": null,
"with_role": "Data Engineer"
},
{
"reason": "Lakehouse data is frequently consumed by BI tools, but BI/semantic consumption is downstream of the storage and processing architecture.",
"with_dim_id": "bi-and-visualization-tools",
"with_dim_name": null,
"with_role": "Data Engineer"
}
],
"tentative_id": "d_init_01"
}
],
"merge_log": [],
"placed": {
"name": "Lakehouse",
"placement_confidence": 0.92,
"primary_dimension": "d_init_01",
"reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [],
"skill_id": "lakehouse"
},
"relationships": {
"child_skills": [],
"parent_skills": [],
"related_to": [
"databricks",
"spark",
"flink",
"rds",
"sqlite",
"room",
"jenkins",
"gradle"
],
"requires": [],
"skill_id": "lakehouse",
"suppress_on_match": []
},
"skill_id": "lakehouse",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.9,
"name": "Lakehouse",
"reasoning": "Lakehouse is fundamentally a system-shape pattern that combines data lake and warehouse characteristics, so by the Architecture vs Concept rule it fits Architecture rather than a tool or datastore.",
"skill_id": "lakehouse",
"subtype": "data_platform_architecture",
"type": "Architecture"
},
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Event-Driven Architecture",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "Event-Driven Architecture",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Architecture",
"skill_nature": "PATTERN",
"sub_category": "event_driven_architecture",
"typical_lifespan": "EVERGREEN",
"version_strategy": "NOT_APPLICABLE",
"volatility": "STABLE"
},
"enrichment": {
"ambiguity": {
"ambiguity_flag": false,
"confused_with": [],
"reasoning": "Event-Driven Architecture is a specific architecture pattern; typical JDs won\u2019t confuse it with other distinct architecture skills."
},
"context_keywords": {
"context_keywords": [
"microservices",
"Kafka",
"RabbitMQ",
"event sourcing",
"CQRS",
"asynchronous messaging",
"publish-subscribe",
"stream processing",
"event bus",
"serverless",
"event-driven programming",
"message broker",
"real-time data",
"data pipeline",
"event schema"
]
},
"maturity": {
"confidence": 0.92,
"maturity": "well_known",
"reasoning": "Common in cloud-native JDs and vendor docs; AWS, Azure, and Confluent all market event-driven patterns with Kafka/PubSub, showing broad hiring demand."
},
"skill_id": "event-driven-architecture",
"vendor_license": {
"confidence": 0.9,
"license": null,
"vendor": null,
"year_introduced": null
},
"versioning": {
"current_version": null,
"version_aliases": {},
"versioned": false
}
},
"keep_log": [],
"locked_dimensions": [
{
"description": "Architectural patterns for building systems around events, asynchronous messaging, and decoupled producers and consumers. This fits the target skill because it covers how services publish, route, process, and react to domain and integration events.",
"exemplar_skills": [
"Event-Driven Architecture",
"Event Sourcing",
"Pub/Sub Design",
"Kafka",
"RabbitMQ",
"Asynchronous Messaging",
"Domain Events",
"Idempotent Consumer Design"
],
"in_scope": "Event-Driven Architecture, event sourcing, pub/sub, message-driven workflows, domain events, integration events, asynchronous processing, event contracts, idempotent consumers, eventual consistency, Kafka, RabbitMQ, SNS/SQS, Google Pub/Sub",
"name": "Event-Driven Architecture",
"out_of_scope": "Synchronous REST API design, client-side HTTP calls, database schema design, low-level serialization formats, which belong to networking, data modeling, or serialization dimensions",
"overlap_flags": [
{
"reason": "Event payloads often rely on schemas and wire formats, but this dimension is about the architectural pattern rather than serialization mechanics.",
"with_dim_id": "data-serialization-standards-protocols",
"with_dim_name": null,
"with_role": "Data Engineer"
},
{
"reason": "Event-driven systems frequently use async execution and coordination, but this dimension focuses on system architecture and messaging topology.",
"with_dim_id": "concurrency-and-parallel-processing",
"with_dim_name": null,
"with_role": "Backend Engineer"
},
{
"reason": "EDA is often chosen for scale and throughput, but performance tuning is a separate concern from the architectural style itself.",
"with_dim_id": "performance-and-scalability-tuning",
"with_dim_name": null,
"with_role": "Backend Engineer"
}
],
"tentative_id": "d_init_01"
}
],
"merge_log": [],
"placed": {
"name": "Event-Driven Architecture",
"placement_confidence": 0.92,
"primary_dimension": "d_init_01",
"reasoning": "Deterministic JD placement: locked_dimensions has 1 dimension(s) from skill-driven dimension generation after reconciliation; primary_dimension is the first locked dim.",
"secondary_dimensions": [],
"skill_id": "event-driven-architecture"
},
"relationships": {
"child_skills": [],
"parent_skills": [],
"related_to": [
"redux",
"repository-pattern",
"mvvm",
"apis",
"ci-cd",
"devops",
"scrum",
"nosql"
],
"requires": [],
"skill_id": "event-driven-architecture",
"suppress_on_match": []
},
"skill_id": "event-driven-architecture",
"split_log": [],
"typed": {
"alternatives_considered": [],
"confidence": 0.99,
"name": "Event-Driven Architecture",
"reasoning": "By the Architecture vs Concept rule, Event-Driven Architecture is a system-shape pattern that influences how systems are built, not just a knowledge unit.",
"skill_id": "event-driven-architecture",
"subtype": "event_driven_architecture",
"type": "Architecture"
},
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
}
],
"unmatched_skills": [
"Apache Spark",
"Hadoop",
"HBase",
"Aerospike",
"Cassandra",
"RDBMS",
"Machine Learning",
"Artificial Intelligence",
"Data Lakes",
"Lakehouse",
"Event-Driven Architecture"
]
}
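In the API 3 persistence items below, `role_dimension_saved` and `matched_chosen_role` appear to be true exactly when the chosen role (id 2, Data Engineer) is listed in that dimension's `roles_from_db`. Otherwise the `outcome_line` reports the role↔dimension link as skipped. The Python sketch below restates that rule as it is inferred from this run's log. The helper `role_dimension_outcome` and the trimmed role dictionaries are hypothetical and are not part of the pipeline's API. The role ids are copied from the Kafka, Java and Aerospike items.

```python
from typing import Any, Dict, List, Tuple

# Outcome suffixes as they appear in this run's outcome_line values.
SAVED = "Role\u2194dimension saved"
SKIPPED = "Role\u2194dimension skipped (dimension not under chosen role)"


def role_dimension_outcome(
    chosen_role_id: int, roles_from_db: List[Dict[str, Any]]
) -> Tuple[bool, str]:
    """Hypothetical helper that decides whether the role<->dimension link is saved.

    This rule is inferred from the log, not taken from pipeline source: the link
    is saved only when the chosen role already owns the dimension in the library.
    """
    saved = any(role["id"] == chosen_role_id for role in roles_from_db)
    return saved, (SAVED if saved else SKIPPED)


# Kafka -> "Messaging and Event Streaming" (id 8): Backend Engineer and Data Engineer.
kafka_roles = [{"id": 1, "slug": "backend-engineer"}, {"id": 2, "slug": "data-engineer"}]
# Java -> "Kotlin and Java" (id 161): Android Engineer only.
java_roles = [{"id": 4, "slug": "android-engineer"}]
# Aerospike -> dimension id 96: roles_from_db is empty.
aerospike_roles: List[Dict[str, Any]] = []

kafka_outcome = role_dimension_outcome(2, kafka_roles)          # saved
java_outcome = role_dimension_outcome(2, java_roles)            # skipped
aerospike_outcome = role_dimension_outcome(2, aerospike_roles)  # skipped
```

Under this reading, the skill↔dimension link (`skill_dimension_saved`) is recorded in every case, and only the role↔dimension link depends on role ownership. The log is consistent with that: every item shows `skill_dimension_saved: true`.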
API 3 — final-role-output
{
"chosen_role": {
"display_name": "Data Engineer",
"id": 2,
"rationale": "The primary skills indicate a strong focus on data processing technologies and cloud platforms, aligning well with a Data Engineer role.",
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
"chosen_role_resolution": "in_db",
"final_input_skills": [
{
"skill": "Apache Spark",
"tag": "new"
},
{
"skill": "Kafka",
"tag": "in_db"
},
{
"skill": "Hadoop",
"tag": "new"
},
{
"skill": "HBase",
"tag": "new"
},
{
"skill": "Aerospike",
"tag": "new"
},
{
"skill": "Cassandra",
"tag": "new"
},
{
"skill": "Java",
"tag": "in_db"
},
{
"skill": "Scala",
"tag": "in_db"
},
{
"skill": "AWS",
"tag": "in_db"
},
{
"skill": "GCP",
"tag": "in_db"
},
{
"skill": "Azure",
"tag": "in_db"
},
{
"skill": "RDBMS",
"tag": "new"
},
{
"skill": "NoSQL",
"tag": "in_db"
},
{
"skill": "Machine Learning",
"tag": "new"
},
{
"skill": "Artificial Intelligence",
"tag": "new"
},
{
"skill": "Data Lakes",
"tag": "new"
},
{
"skill": "Lakehouse",
"tag": "new"
},
{
"skill": "Event-Driven Architecture",
"tag": "new"
}
],
"persistence": {
"items": [
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Event Streaming",
"id": 8,
"rationale": "Transport-layer systems used to move events and decouple producers from consumers. Data engineers use these systems to ingest, buffer, and distribute event data before downstream processing.",
"slug": "messaging-and-event-streaming",
"source": "db"
},
"dimension_id": 8,
"input_skill": "Kafka",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 36,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Kotlin and Java",
"id": 161,
"rationale": "Primary implementation languages for Android app features, platform integration, and client-side business logic. Android engineers use these languages to build screens, state flows, service adapters, and device-aware behavior.",
"slug": "kotlin-and-java",
"source": "db"
},
"dimension_id": 161,
"input_skill": "Java",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Android Engineer",
"id": 4,
"rationale": null,
"role_archetype": null,
"slug": "android-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages",
"id": 1,
"rationale": "Core server-side languages used to implement backend business logic, integrations, and service internals. This is the primary coding surface for the role across application layers.",
"slug": "programming-languages",
"source": "db"
},
"dimension_id": 1,
"input_skill": "Java",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 21,
"rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"dimension_id": 21,
"input_skill": "Java",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 21,
"rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"dimension_id": 21,
"input_skill": "Scala",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 102,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for ML Systems",
"id": 39,
"rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
"slug": "programming-languages-for-ml-systems",
"source": "db"
},
"dimension_id": 39,
"input_skill": "Scala",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 102,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms",
"id": 20,
"rationale": "Proficiency in major cloud service provider platforms and their core services.",
"slug": "cloud-platforms",
"source": "db"
},
"dimension_id": 20,
"input_skill": "AWS",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 187,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms for AI Deployment",
"id": 211,
"rationale": "Major cloud services that provide infrastructure and managed services for AI workloads.",
"slug": "cloud-platforms-for-ai-deployment",
"source": "db"
},
"dimension_id": 211,
"input_skill": "AWS",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 13,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 187,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Provider Platforms",
"id": 131,
"rationale": "Major cloud platforms and their core service ecosystems used to design target-state architectures, choose deployment boundaries, and evaluate managed capabilities. This is the primary substrate for cloud architecture decisions.",
"slug": "cloud-provider-platforms",
"source": "db"
},
"dimension_id": 131,
"input_skill": "AWS",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 187,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Security Posture Tools",
"id": 64,
"rationale": "Cloud-native security platforms used to assess misconfiguration, workload exposure, and cloud control coverage. This dimension includes the major CNAPP/CSPM/CWPP vendors and cloud security services the role reviews and tunes.",
"slug": "cloud-security-posture-tools",
"source": "db"
},
"dimension_id": 64,
"input_skill": "AWS",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 187,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms",
"id": 20,
"rationale": "Proficiency in major cloud service provider platforms and their core services.",
"slug": "cloud-platforms",
"source": "db"
},
"dimension_id": 20,
"input_skill": "GCP",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 186,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms for AI Deployment",
"id": 211,
"rationale": "Major cloud services that provide infrastructure and managed services for AI workloads.",
"slug": "cloud-platforms-for-ai-deployment",
"source": "db"
},
"dimension_id": 211,
"input_skill": "GCP",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 13,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 186,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Security Posture Tools",
"id": 64,
"rationale": "Cloud-native security platforms used to assess misconfiguration, workload exposure, and cloud control coverage. This dimension includes the major CNAPP/CSPM/CWPP vendors and cloud security services the role reviews and tunes.",
"slug": "cloud-security-posture-tools",
"source": "db"
},
"dimension_id": 64,
"input_skill": "GCP",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 186,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms",
"id": 20,
"rationale": "Proficiency in major cloud service provider platforms and their core services.",
"slug": "cloud-platforms",
"source": "db"
},
"dimension_id": 20,
"input_skill": "Azure",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 188,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms for AI Deployment",
"id": 211,
"rationale": "Major cloud services that provide infrastructure and managed services for AI workloads.",
"slug": "cloud-platforms-for-ai-deployment",
"source": "db"
},
"dimension_id": 211,
"input_skill": "Azure",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 13,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 188,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Provider Platforms",
"id": 131,
"rationale": "Major cloud platforms and their core service ecosystems used to design target-state architectures, choose deployment boundaries, and evaluate managed capabilities. This is the primary substrate for cloud architecture decisions.",
"slug": "cloud-provider-platforms",
"source": "db"
},
"dimension_id": 131,
"input_skill": "Azure",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 188,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Security Posture Tools",
"id": 64,
"rationale": "Cloud-native security platforms used to assess misconfiguration, workload exposure, and cloud control coverage. This dimension includes the major CNAPP/CSPM/CWPP vendors and cloud security services the role reviews and tunes.",
"slug": "cloud-security-posture-tools",
"source": "db"
},
"dimension_id": 64,
"input_skill": "Azure",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cybersecurity Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 188,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "NoSQL Databases",
"id": 19,
"rationale": "Models and manages data using non-relational database systems.",
"slug": "nosql-databases",
"source": "db"
},
"dimension_id": 19,
"input_skill": "NoSQL",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Backend Engineer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1346,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"dimension_id": 24,
"input_skill": "Apache Spark",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1350,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"dimension_id": 24,
"input_skill": "Hadoop",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1351,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"dimension_id": 144,
"input_skill": "HBase",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1352,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"dimension_id": 96,
"input_skill": "Aerospike",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 1353,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"dimension_id": 144,
"input_skill": "Cassandra",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1354,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"dimension_id": 96,
"input_skill": "RDBMS",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 1355,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"dimension_id": 96,
"input_skill": "Machine Learning",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 1356,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "AI Governance and Model Security",
"id": 50,
"rationale": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
"slug": "ai-governance-and-model-security",
"source": "db"
},
"dimension_id": 50,
"input_skill": "Machine Learning",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "AI Engineer",
"id": 13,
"rationale": null,
"role_archetype": null,
"slug": "ai-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1356,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"dimension_id": 96,
"input_skill": "Artificial Intelligence",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 1357,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"dimension_id": 144,
"input_skill": "Data Lakes",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1358,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"dimension_id": 96,
"input_skill": "Data Lakes",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 1358,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"dimension_id": 96,
"input_skill": "Lakehouse",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 1359,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"dimension_id": 96,
"input_skill": "Event-Driven Architecture",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "New skill saved \u00b7 Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 1360,
"skill_tag": "in_db",
"skipped_reason": null
}
],
"new_skills_created": 11,
"role_dimension_saved": 0,
"skill_dimension_saved": 13,
"skipped": 0
},
"planner_output": null,
"run_id": "1f106d71-338e-40ee-a69a-09957abcd98f"
}
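The run-level counters at the end of the log (`new_skills_created`, `skill_dimension_saved`, `role_dimension_saved`) can be cross-checked against the per-skill records. The minimal Python sketch below makes two assumptions: each record contributes one skill↔dimension link, and `new_skills_created` counts distinct `skill_id` values. The helper names `rec` and `summarize` and the extra `role_dimension_skipped` counter are hypothetical and are not part of the pipeline. The sample uses only the five records that are fully visible in this excerpt, with their fields trimmed.

```python
from typing import Any


def rec(skill: str, skill_id: int, dimension_id: int) -> dict[str, Any]:
    # Hypothetical helper: the five visible records share these flag values.
    return {
        "input_skill": skill,
        "skill_id": skill_id,
        "dimension_id": dimension_id,
        "skill_dimension_saved": True,
        "role_dimension_saved": False,
        "matched_chosen_role": False,
    }


def summarize(records: list[dict[str, Any]]) -> dict[str, int]:
    """Recompute run-level counters from per-skill records (assumed semantics)."""
    # Assumption: one new skill per distinct skill_id; a skill may map to several dimensions.
    distinct_skills = {r["skill_id"] for r in records}
    return {
        "new_skills_created": len(distinct_skills),
        # One link per record whose skill<->dimension write succeeded.
        "skill_dimension_saved": sum(1 for r in records if r["skill_dimension_saved"]),
        "role_dimension_saved": sum(1 for r in records if r["role_dimension_saved"]),
        # Records whose dimension was not under the chosen role (hypothetical counter).
        "role_dimension_skipped": sum(1 for r in records if not r["matched_chosen_role"]),
    }


# The five per-skill records fully visible in this excerpt.
records = [
    rec("Artificial Intelligence", 1357, 96),
    rec("Data Lakes", 1358, 144),
    rec("Data Lakes", 1358, 96),
    rec("Lakehouse", 1359, 96),
    rec("Event-Driven Architecture", 1360, 96),
]

print(summarize(records))
# {'new_skills_created': 4, 'skill_dimension_saved': 5, 'role_dimension_saved': 0, 'role_dimension_skipped': 5}
```

Under these assumptions, the gap between 11 new skills and 13 saved skill↔dimension links for the full run is consistent with a skill such as "Data Lakes" (`skill_id` 1358) being linked to more than one dimension.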
LLM Calls
Every model call made for this run, in pipeline order.