Pipeline run
bae69e34-7cea-4d40-93b7-a7eea576fb78
Client output enrichment
v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA descriptionvocab breakdown (legacy)
Signals
Post-classification
Captured for admin review
1 POST /skills/extract-from-jd
2 POST /skills/extract-details
3 POST /skills/final-role-output
Data Engineer
domain · Data Engineering & Analytics CASE DOMAINslug: data-engineer · id: 2 · source: db
Domain=Data Engineering & Analytics; The JD centers on Python-based data onboarding and collection with Kafka, Spark, Dataflow, Airflow, NoSQL databases, and data cleansing/normalization, which best fits a Data Engineer role.
Matched skills
Matched dimensions
Matched KRAs
Resolution:
in_db
— role exists in library; skill↔dim and role↔dim links saved when applicable.
Job description
Requriment 1 Sr. Engineer – Data Onboarding: (Priority 1 – Multiple positions)6+ years of development experience in development Python.Data streaming / Data processing technologies like Kafka, Spark, Google Dataflow, AirflowExperience with multiple NoSQL databases (BigTable, MongoDB, DynamoDB, HBaseExperience working with data – data cleansing, normalization, translation or similar concepts.Requriment 2 Software Engineer – Data Collection: (Priority 2 – Multiple positions)3+ years of experience in programming Language: PythonWeb Crawling or Web Scraping Tools: Scrapy, Selenium, and Beautiful soup.Recent work experience in web scrapping / crawling.
Skills from this JD
Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
Aliases — catalog
- Python (CANONICAL) primary
- Python 2 (VERSION)
- Python 2.x (VERSION)
- Python 3 (VERSION)
- Python 3.10 (VERSION)
- Python 3.11 (VERSION)
- Python 3.12 (VERSION)
- Python 3.x (VERSION)
- py (VERSION)
- py2 (VERSION)
- py3 (VERSION)
- python 3 (VERSION)
- python 3.x (VERSION)
- python2 (VERSION)
- python3 (VERSION)
- python3.x (VERSION)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Language
- Sub-category
- Programming Language
- Vendor
- PSF
- License
- mit
- Year introduced
- 1991
- Confidence
- 0.99
- Version strategy
- SEPARATE_ENTITY
- Version tag
- 3
Maturity reasoning: Python appears in a very high volume of job descriptions across data, backend, automation, and ML roles, and remains a default hiring-pipeline language on major job boards and tech stacks.
Skill profile (library / DB)
- Skill nature
- LANGUAGE
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 6
- Sub-category id
- 96
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
-
Cloud Security Scripting & DSL Languages Catalog dimension db id 248
Library dimension (catalog)
Roles linked in library: Cloud Security Engineer
-
Programming Languages Catalog dimension db id 1
Library dimension (catalog)
Roles linked in library: Backend Developer, Fullstack Developer
-
Programming Languages and Scripting Catalog dimension db id 59
Library dimension (catalog)
Roles linked in library: Cyber Security Engineer
-
Programming Languages for Data Work Catalog dimension db id 21
Library dimension (catalog)
Roles linked in library: Data Engineer
-
Programming Languages for ML Systems Catalog dimension db id 39
Library dimension (catalog)
Roles linked in library: ML Engineer, MLOps Engineer
-
Programming Languages for XR Catalog dimension db id 97
Library dimension (catalog)
Roles linked in library: AR/VR Engineer
-
Python Programming Catalog dimension db id 290
Library dimension (catalog)
Roles linked in library: Python Backend Developer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
|
Cloud Security Scripting & DSL Languages
cloud-security-scripting-dsl-languages
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
|
Programming Languages
programming-languages
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
|
Programming Languages and Scripting
programming-languages-and-scripting
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
|
Programming Languages for Data Work
programming-languages-for-data-work
|
✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
|
Programming Languages for ML Systems
programming-languages-for-ml-systems
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
|
Programming Languages for XR
programming-languages-for-xr
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
|
Python Programming
python-programming
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- Kafka (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Datastore
- Sub-category
- Event Stream Store
- Vendor
- Confluent
- License
- apache_2
- Year introduced
- 2011
- Confidence
- 0.90
- Version strategy
- NOT_APPLICABLE
Maturity reasoning: Kafka appears in many production JDs for event streaming and data pipelines, and remains a standard platform in cloud/vendor offerings (e.g., Confluent, AWS MSK), indicating broad hiring demand.
Skill profile (library / DB)
- Skill nature
- TOOL
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 3
- Sub-category id
- 3533
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
-
Asynchronous Messaging and Event Streaming Catalog dimension db id 297
Library dimension (catalog)
Roles linked in library: .NET Backend Developer, Go Backend Developer, Kotlin Backend Developer, Node.js Backend Developer, Scala Backend Developer
-
Messaging and Background Jobs Catalog dimension db id 291
Library dimension (catalog)
Roles linked in library: PHP Backend Developer, Python Backend Developer, Ruby Backend Developer
-
Messaging and Event Streaming Catalog dimension db id 8
Library dimension (catalog)
Roles linked in library: Backend Developer, Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
|
Asynchronous Messaging and Event Streaming
asynchronous-messaging-and-event-streaming
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
|
Messaging and Background Jobs
messaging-and-background-jobs
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
|
Messaging and Event Streaming
messaging-and-event-streaming
|
✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
Aliases — catalog
- Apache Spark (CANONICAL)
- apache spark 3 (VERSION)
- spark (VERSION)
- spark 3 (VERSION)
- spark 3.x (VERSION)
- spark3 (VERSION)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Framework
- Sub-category
- Distributed Data Processing Framework
- Vendor
- Apache Software Foundation
- License
- apache_2
- Year introduced
- 2010
- Confidence
- 0.94
- Version strategy
- SEPARATE_ENTITY
- Version tag
- 3.x
Maturity reasoning: Apache Spark appears in many data engineering JDs and remains a standard for distributed ETL/ELT; its GitHub and vendor ecosystem activity stay strong, with Databricks and cloud platforms still promoting it.
Skill profile (library / DB)
- Skill nature
- FRAMEWORK
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 5
- Sub-category id
- 1021
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
-
ETL and ELT Tooling Catalog dimension db id 24
Library dimension (catalog)
Roles linked in library: Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
|
ETL and ELT Tooling
etl-and-elt-tooling
|
✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- PLATFORM
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
Aliases — catalog
- Airflow (CANONICAL) primary
- airflow 2 (VERSION)
- airflow-2 (VERSION)
- airflow2 (VERSION)
- airflow2.x (VERSION)
- apache airflow 2 (VERSION)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Tool
- Sub-category
- Workflow Orchestration Tool
- Vendor
- Apache Software Foundation
- License
- apache_2
- Year introduced
- 2014
- Confidence
- 0.95
- Version strategy
- SEPARATE_ENTITY
- Version tag
- 2.x
Maturity reasoning: Apache Airflow appears in many data engineering job postings and is a common orchestration choice in production stacks; its GitHub activity and ecosystem remain strong, with no vendor sunset or clear replacement dominating JDs.
Skill profile (library / DB)
- Skill nature
- TOOL
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 13
- Sub-category id
- 130
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
-
Workflow Orchestration for ML Pipelines Catalog dimension db id 54
Library dimension (catalog)
Roles linked in library: ML Engineer, MLOps Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
|
Workflow Orchestration for ML Pipelines
workflow-orchestration-for-ml-pipelines
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Databases
- Sub-category
- general
- Skill nature
- TOOL
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
Aliases — catalog
- MongoDB (CANONICAL) primary
- MongoDB 2.0 (VERSION)
- MongoDB 2.2 (VERSION)
- MongoDB 2.4 (VERSION)
- MongoDB 2.6 (VERSION)
- MongoDB 3.0 (VERSION)
- MongoDB 3.2 (VERSION)
- MongoDB 3.4 (VERSION)
- MongoDB 3.6 (VERSION)
- MongoDB 4 (VERSION)
- MongoDB 4.0 (VERSION)
- MongoDB 4.2 (VERSION)
- MongoDB 4.4 (VERSION)
- MongoDB 5 (VERSION)
- MongoDB 5.0 (VERSION)
- MongoDB 6 (VERSION)
- MongoDB 6.0 (VERSION)
- MongoDB 7 (VERSION)
- MongoDB 7.0 (VERSION)
- MongoDB 8 (VERSION)
- MongoDB 8.0 (VERSION)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Datastore
- Sub-category
- Document Database
- Vendor
- MongoDB, Inc.
- License
- other_open
- Year introduced
- 2009
- Confidence
- 0.99
- Version strategy
- SEPARATE_ENTITY
- Version tag
- 8.0
Maturity reasoning: MongoDB appears in many job descriptions across backend/data roles and is a standard document database in modern stacks; strong GitHub/community activity and broad cloud vendor support indicate mainstream adoption.
Skill profile (library / DB)
- Skill nature
- TOOL
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 3
- Sub-category id
- 27
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
-
NoSQL Databases Catalog dimension db id 19
Library dimension (catalog)
Roles linked in library: Backend Developer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
|
NoSQL Databases
nosql-databases
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- Amazon DynamoDB (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Service
- Sub-category
- Managed Nosql Database Service
- Vendor
- Amazon Web Services
- License
- proprietary
- Year introduced
- 2012
- Confidence
- 0.98
- Version strategy
- NOT_APPLICABLE
Maturity reasoning: Commonly listed in cloud/backend job descriptions and widely used on AWS; strong vendor adoption and active ecosystem signal broad market demand.
Skill profile (library / DB)
- Skill nature
- CLOUD_SERVICE
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 11
- Sub-category id
- 55
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
-
NoSQL Databases Catalog dimension db id 19
Library dimension (catalog)
Roles linked in library: Backend Developer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
|
NoSQL Databases
nosql-databases
|
— | — |
Skipped — no persistable v3 meta for new skill
skill_not_in_db_v3_proposed
|
Aliases — catalog
- HBase (CANONICAL)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Datastore
- Sub-category
- Wide Column Store
- Vendor
- Apache Software Foundation
- License
- apache_2
- Year introduced
- 2010
- Confidence
- 0.98
- Version strategy
- NOT_APPLICABLE
Maturity reasoning: HBase appears in a limited set of big-data/legacy Hadoop job postings, while newer JDs more often specify DynamoDB, Bigtable, or Cassandra; its market demand is specialized rather than broad.
Skill profile (library / DB)
- Skill nature
- TOOL
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 3
- Sub-category id
- 31
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
-
Cloud Storage and Data Services Catalog dimension db id 144
Library dimension (catalog)
Roles linked in library: Cloud Architect
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
|
Cloud Storage and Data Services
cloud-storage-and-data-services
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Web Frameworks
- Sub-category
- general
- Skill nature
- TOOL
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Testing Tools
- Sub-category
- general
- Skill nature
- TOOL
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Web Frameworks
- Sub-category
- general
- Skill nature
- TOOL
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
All API 3 persistence rows
Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.
| Skill | Tag | Dimension | Skill↔dim | Role↔dim | Outcome | Notes |
|---|---|---|---|---|---|---|
| Python | in_db |
Cloud Security Scripting & DSL Languages
cloud-security-scripting-dsl-languages
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Python | in_db |
Programming Languages
programming-languages
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Python | in_db |
Programming Languages and Scripting
programming-languages-and-scripting
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Python | in_db |
Programming Languages for Data Work
programming-languages-for-data-work
|
✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Python | in_db |
Programming Languages for ML Systems
programming-languages-for-ml-systems
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Python | in_db |
Programming Languages for XR
programming-languages-for-xr
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Python | in_db |
Python Programming
python-programming
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Kafka | in_db |
Asynchronous Messaging and Event Streaming
asynchronous-messaging-and-event-streaming
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Kafka | in_db |
Messaging and Background Jobs
messaging-and-background-jobs
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Kafka | in_db |
Messaging and Event Streaming
messaging-and-event-streaming
|
✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Apache Spark | in_db |
ETL and ELT Tooling
etl-and-elt-tooling
|
✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Airflow | in_db |
Workflow Orchestration for ML Pipelines
workflow-orchestration-for-ml-pipelines
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| MongoDB | in_db |
NoSQL Databases
nosql-databases
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| DynamoDB | new |
NoSQL Databases
nosql-databases
|
— | — | Skipped — no persistable v3 meta for new skill | skill_not_in_db_v3_proposed |
| HBase | in_db |
Cloud Storage and Data Services
cloud-storage-and-data-services
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Library artifacts (this run)
| Kind | Detail | DB id |
|---|---|---|
| canonical_skill_proposed | Google Dataflow | type=Data Engineering Tools subtype=general nature=PLATFORM lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Bigtable | type=Databases subtype=general nature=TOOL lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Scrapy | type=Web Frameworks subtype=general nature=TOOL lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Selenium | type=Testing Tools subtype=general nature=TOOL lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Beautiful Soup | type=Web Frameworks subtype=general nature=TOOL lifespan=MULTI_YEAR | |
| dimension_skill_link_proposed | DynamoDB ↔ NoSQL Databases |
nano JD Parser — gpt-4.1-nano click to toggle
Show raw JSON
{
"JD_type": "fail",
"archetype_override_applied": true,
"archetype_override_matched_skills": [
"Python",
"MongoDB",
"NoSQL",
"normalization",
"Kafka"
],
"role_archetype": "Engineering"
}
API 1 — extract-from-jd click to toggle
{
"final_skills": [
{
"is_primary": true,
"skill_name": "Python"
},
{
"is_primary": true,
"skill_name": "Kafka"
},
{
"is_primary": true,
"skill_name": "Apache Spark"
},
{
"is_primary": true,
"skill_name": "Google Dataflow"
},
{
"is_primary": true,
"skill_name": "Airflow"
},
{
"is_primary": true,
"skill_name": "Bigtable"
},
{
"is_primary": true,
"skill_name": "MongoDB"
},
{
"is_primary": true,
"skill_name": "DynamoDB"
},
{
"is_primary": true,
"skill_name": "HBase"
},
{
"is_primary": true,
"skill_name": "Scrapy"
},
{
"is_primary": true,
"skill_name": "Selenium"
},
{
"is_primary": true,
"skill_name": "Beautiful Soup"
}
],
"jd_role": null,
"nano_parsed": {
"JD_type": "fail",
"archetype_override_applied": true,
"archetype_override_matched_skills": [
"Python",
"MongoDB",
"NoSQL",
"normalization",
"Kafka"
],
"role_archetype": "Engineering"
},
"rejected": false,
"rejection_reason": null,
"run_id": "bae69e34-7cea-4d40-93b7-a7eea576fb78",
"stage3_signals": {
"alias_found": false,
"alias_match_roles": [],
"kra_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": [
{
"kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
"sentence": "Engineer \u2013 Data Onboarding:\u00a0(Priority 1\u00a0\u2013 Multiple positions)6+ years of development experience in\u00a0development\u00a0Python.Data streaming / Data processing\u00a0technologies like\u00a0Kafka, Spark, Google Dataflow, AirflowExperience with multiple\u00a0NoSQL databases (BigTable, MongoDB, DynamoDB, HBaseExperience working with data \u2013\u00a0data cleansing, normalization, translation\u00a0or similar concepts.Requriment 2",
"similarity": 0.5721
},
{
"kra_text": "Works with data analysts, data scientists, and business stakeholders to define data models, ingestion schedules, and data delivery requirements.",
"sentence": "Software Engineer \u2013 Data Collection:\u00a0(Priority 2\u00a0\u2013 Multiple positions)3+ years of experience in programming Language:\u00a0PythonWeb Crawling or Web\u00a0Scraping Tools: Scrapy, Selenium, and Beautiful soup.Recent work experience in\u00a0web scrapping / crawling.",
"similarity": 0.3462
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 0.4592,
"slug": "data-engineer",
"total_count": null
},
{
"display_name": "ML Engineer",
"kra_matches": [
{
"kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
"sentence": "Engineer \u2013 Data Onboarding:\u00a0(Priority 1\u00a0\u2013 Multiple positions)6+ years of development experience in\u00a0development\u00a0Python.Data streaming / Data processing\u00a0technologies like\u00a0Kafka, Spark, Google Dataflow, AirflowExperience with multiple\u00a0NoSQL databases (BigTable, MongoDB, DynamoDB, HBaseExperience working with data \u2013\u00a0data cleansing, normalization, translation\u00a0or similar concepts.Requriment 2",
"similarity": 0.4177
},
{
"kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
"sentence": "Software Engineer \u2013 Data Collection:\u00a0(Priority 2\u00a0\u2013 Multiple positions)3+ years of experience in programming Language:\u00a0PythonWeb Crawling or Web\u00a0Scraping Tools: Scrapy, Selenium, and Beautiful soup.Recent work experience in\u00a0web scrapping / crawling.",
"similarity": 0.2909
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 3,
"score": 0.3543,
"slug": "ml-engineer",
"total_count": null
},
{
"display_name": "Svelte Frontend Developer",
"kra_matches": [
{
"kra_text": "backend data integration",
"sentence": "Engineer \u2013 Data Onboarding:\u00a0(Priority 1\u00a0\u2013 Multiple positions)6+ years of development experience in\u00a0development\u00a0Python.Data streaming / Data processing\u00a0technologies like\u00a0Kafka, Spark, Google Dataflow, AirflowExperience with multiple\u00a0NoSQL databases (BigTable, MongoDB, DynamoDB, HBaseExperience working with data \u2013\u00a0data cleansing, normalization, translation\u00a0or similar concepts.Requriment 2",
"similarity": 0.4174
},
{
"kra_text": "backend data integration",
"sentence": "Software Engineer \u2013 Data Collection:\u00a0(Priority 2\u00a0\u2013 Multiple positions)3+ years of experience in programming Language:\u00a0PythonWeb Crawling or Web\u00a0Scraping Tools: Scrapy, Selenium, and Beautiful soup.Recent work experience in\u00a0web scrapping / crawling.",
"similarity": 0.2793
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 92,
"score": 0.3483,
"slug": "svelte-frontend-developer",
"total_count": null
},
{
"display_name": "Fullstack Developer",
"kra_matches": [
{
"kra_text": "Designs and queries relational databases like PostgreSQL and document stores like MongoDB, writing migrations, indexes, and optimized queries.",
"sentence": "Engineer \u2013 Data Onboarding:\u00a0(Priority 1\u00a0\u2013 Multiple positions)6+ years of development experience in\u00a0development\u00a0Python.Data streaming / Data processing\u00a0technologies like\u00a0Kafka, Spark, Google Dataflow, AirflowExperience with multiple\u00a0NoSQL databases (BigTable, MongoDB, DynamoDB, HBaseExperience working with data \u2013\u00a0data cleansing, normalization, translation\u00a0or similar concepts.Requriment 2",
"similarity": 0.3934
},
{
"kra_text": "Implements complete product features end-to-end from database schema design through backend API to frontend UI using JavaScript, TypeScript, Python, or Ruby on Rails.",
"sentence": "Software Engineer \u2013 Data Collection:\u00a0(Priority 2\u00a0\u2013 Multiple positions)3+ years of experience in programming Language:\u00a0PythonWeb Crawling or Web\u00a0Scraping Tools: Scrapy, Selenium, and Beautiful soup.Recent work experience in\u00a0web scrapping / crawling.",
"similarity": 0.295
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 15,
"score": 0.3442,
"slug": "full-stack-engineer",
"total_count": null
},
{
"display_name": "AI Engineer",
"kra_matches": [
{
"kra_text": "Designs and implements prompt engineering workflows, few-shot examples, chain-of-thought patterns, and structured output parsing for AI feature pipelines.",
"sentence": "Engineer \u2013 Data Onboarding:\u00a0(Priority 1\u00a0\u2013 Multiple positions)6+ years of development experience in\u00a0development\u00a0Python.Data streaming / Data processing\u00a0technologies like\u00a0Kafka, Spark, Google Dataflow, AirflowExperience with multiple\u00a0NoSQL databases (BigTable, MongoDB, DynamoDB, HBaseExperience working with data \u2013\u00a0data cleansing, normalization, translation\u00a0or similar concepts.Requriment 2",
"similarity": 0.3705
},
{
"kra_text": "Designs and implements prompt engineering workflows, few-shot examples, chain-of-thought patterns, and structured output parsing for AI feature pipelines.",
"sentence": "Software Engineer \u2013 Data Collection:\u00a0(Priority 2\u00a0\u2013 Multiple positions)3+ years of experience in programming Language:\u00a0PythonWeb Crawling or Web\u00a0Scraping Tools: Scrapy, Selenium, and Beautiful soup.Recent work experience in\u00a0web scrapping / crawling.",
"similarity": 0.3146
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 13,
"score": 0.3425,
"slug": "ai-engineer",
"total_count": null
}
],
"skill_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": 3,
"matched_skills": [
"Apache Spark",
"Kafka",
"Python"
],
"role_id": 2,
"score": 0.25,
"slug": "data-engineer",
"total_count": 12
},
{
"display_name": "Backend Developer",
"kra_matches": null,
"matched_count": 3,
"matched_skills": [
"Kafka",
"MongoDB",
"Python"
],
"role_id": 1,
"score": 0.25,
"slug": "backend-engineer",
"total_count": 12
},
{
"display_name": "MLOps Engineer",
"kra_matches": null,
"matched_count": 2,
"matched_skills": [
"Airflow",
"Python"
],
"role_id": 16,
"score": 0.1667,
"slug": "ml-ops-engineer",
"total_count": 12
},
{
"display_name": "ML Engineer",
"kra_matches": null,
"matched_count": 2,
"matched_skills": [
"Airflow",
"Python"
],
"role_id": 3,
"score": 0.1667,
"slug": "ml-engineer",
"total_count": 12
},
{
"display_name": "Python Backend Developer",
"kra_matches": null,
"matched_count": 2,
"matched_skills": [
"Kafka",
"Python"
],
"role_id": 80,
"score": 0.1667,
"slug": "python-backend-developer",
"total_count": 12
}
]
},
"stage4_decision": {
"alias_collision_detected": false,
"case": "DOMAIN",
"chosen_role": {
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 0.92,
"slug": "data-engineer",
"total_count": null
},
"confidence": 0.92,
"is_new_role": false,
"llm2_fired": false,
"llm2_reasoning": null,
"matched_dimensions": [
"Data Onboarding",
"Data Streaming and Processing",
"NoSQL Data Engineering",
"Data Collection and Web Scraping",
"Data Cleansing and Normalization"
],
"matched_kras": [
"Develop data onboarding pipelines",
"Build data streaming / data processing technologies",
"Work with multiple NoSQL databases",
"Perform data cleansing, normalization, translation",
"Develop web crawling / web scraping tools"
],
"matched_skills": [
"Python",
"Kafka",
"Spark",
"Google Dataflow",
"Airflow",
"BigTable",
"MongoDB",
"DynamoDB",
"HBase",
"Scrapy",
"Selenium",
"beautiful soup",
"web scraping",
"web crawling",
"data cleansing"
],
"new_role_display_name": null,
"new_role_slug": null,
"queued": false,
"reasoning": "Domain=Data Engineering \u0026 Analytics; The JD centers on Python-based data onboarding and collection with Kafka, Spark, Dataflow, Airflow, NoSQL databases, and data cleansing/normalization, which best fits a Data Engineer role.",
"sub_role": null
},
"stage5_updates": {
"centroid_n_after": 81,
"centroid_updated": true,
"collision_log_id": null,
"new_kra_attached": null,
"new_skills_attached": [
{
"is_primary": true,
"queue_id": 5240,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Google Dataflow",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5241,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Bigtable",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5242,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "DynamoDB",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5243,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Scrapy",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5244,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Selenium",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5245,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Beautiful Soup",
"status": "pending"
}
],
"queue_entry_id": null,
"v3_pipeline_triggered": false,
"v3_role_slug": null,
"v3_run_id": null
}
}
API 2 — extract-details
{
"alias_matches": [
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 67,
"existing_alias_text": "Python",
"input_term": "Python",
"matched_canonical": {
"category_id": 6,
"display_name": "Python",
"id": 5,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LANGUAGE",
"slug": "python",
"sub_category_id": 96,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 173,
"existing_alias_text": "Kafka",
"input_term": "Kafka",
"matched_canonical": {
"category_id": 3,
"display_name": "Kafka",
"id": 36,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "kafka",
"sub_category_id": 3533,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 2004,
"existing_alias_text": "Apache Spark",
"input_term": "Apache Spark",
"matched_canonical": {
"category_id": 5,
"display_name": "Apache Spark",
"id": 1350,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "FRAMEWORK",
"slug": "apache-spark",
"sub_category_id": 1021,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 526,
"existing_alias_text": "Airflow",
"input_term": "Airflow",
"matched_canonical": {
"category_id": 13,
"display_name": "Airflow",
"id": 265,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "airflow",
"sub_category_id": 130,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 232,
"existing_alias_text": "MongoDB",
"input_term": "MongoDB",
"matched_canonical": {
"category_id": 3,
"display_name": "MongoDB",
"id": 91,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "mongodb",
"sub_category_id": 27,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "TODO: REMOVE AFTER TESTING \u2014 alias DB write disabled",
"alias_persisted": false,
"existing_alias_id": null,
"existing_alias_text": null,
"input_term": "DynamoDB",
"matched_canonical": {
"category_id": 11,
"display_name": "Amazon DynamoDB",
"id": 93,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CLOUD_SERVICE",
"slug": "amazon-dynamodb",
"sub_category_id": 55,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "embedding_display_name"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 2011,
"existing_alias_text": "HBase",
"input_term": "HBase",
"matched_canonical": {
"category_id": 3,
"display_name": "HBase",
"id": 1352,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "hbase",
"sub_category_id": 31,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
}
],
"candidate_roles": [
{
"display_name": "Cloud Security Engineer",
"id": 23,
"rationale": null,
"role_archetype": null,
"slug": "cloud-security-engineer",
"source": "db"
},
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Fullstack Developer",
"id": 15,
"rationale": null,
"role_archetype": null,
"slug": "full-stack-engineer",
"source": "db"
},
{
"display_name": "Cyber Security Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
},
{
"display_name": "AR/VR Engineer",
"id": 8,
"rationale": null,
"role_archetype": null,
"slug": "ar-vr-engineer",
"source": "db"
},
{
"display_name": "Python Backend Developer",
"id": 80,
"rationale": null,
"role_archetype": "Engineering",
"slug": "python-backend-developer",
"source": "db"
},
{
"display_name": ".NET Backend Developer",
"id": 83,
"rationale": null,
"role_archetype": "Engineering",
"slug": "dotnet-backend-developer",
"source": "db"
},
{
"display_name": "Go Backend Developer",
"id": 81,
"rationale": null,
"role_archetype": "Engineering",
"slug": "go-backend-developer",
"source": "db"
},
{
"display_name": "Kotlin Backend Developer",
"id": 84,
"rationale": null,
"role_archetype": "Engineering",
"slug": "kotlin-server-backend-developer",
"source": "db"
},
{
"display_name": "Node.js Backend Developer",
"id": 82,
"rationale": null,
"role_archetype": "Engineering",
"slug": "node-backend-developer",
"source": "db"
},
{
"display_name": "Scala Backend Developer",
"id": 87,
"rationale": null,
"role_archetype": "Engineering",
"slug": "scala-backend-developer",
"source": "db"
},
{
"display_name": "PHP Backend Developer",
"id": 86,
"rationale": null,
"role_archetype": "Engineering",
"slug": "php-backend-developer",
"source": "db"
},
{
"display_name": "Ruby Backend Developer",
"id": 85,
"rationale": null,
"role_archetype": "Engineering",
"slug": "ruby-backend-developer",
"source": "db"
},
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
],
"chosen_role": {
"display_name": "Data Engineer",
"id": 2,
"rationale": "Domain=Data Engineering \u0026 Analytics; The JD centers on Python-based data onboarding and collection with Kafka, Spark, Dataflow, Airflow, NoSQL databases, and data cleansing/normalization, which best fits a Data Engineer role.",
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Security Scripting \u0026 DSL Languages",
"id": 248,
"rationale": "Proficiency in programming and domain-specific languages used to automate and script cloud security controls.",
"slug": "cloud-security-scripting-dsl-languages",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Security Engineer",
"id": 23,
"rationale": null,
"role_archetype": null,
"slug": "cloud-security-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages",
"id": 1,
"rationale": "Primary implementation languages used to build client and server feature code. Full stack engineers need enough fluency to move across layers and implement product behavior end to end.",
"slug": "programming-languages",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Fullstack Developer",
"id": 15,
"rationale": null,
"role_archetype": null,
"slug": "full-stack-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages and Scripting",
"id": 59,
"rationale": "Languages used to write security automation, analysis scripts, detection logic, and remediation helpers. This is the primary implementation surface for a cybersecurity engineer across tooling and response workflows.",
"slug": "programming-languages-and-scripting",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cyber Security Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 21,
"rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for ML Systems",
"id": 39,
"rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
"slug": "programming-languages-for-ml-systems",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for XR",
"id": 97,
"rationale": "Primary implementation languages used to build immersive client features, interaction logic, and device-specific runtime behavior. This is the core coding surface for AR/VR experiences.",
"slug": "programming-languages-for-xr",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "AR/VR Engineer",
"id": 8,
"rationale": null,
"role_archetype": null,
"slug": "ar-vr-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Python Programming",
"id": 290,
"rationale": "Core Python language skills used to implement backend business logic, request handlers, integrations, and service internals. This is the primary coding surface for the role.",
"slug": "python-programming",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Python Backend Developer",
"id": 80,
"rationale": null,
"role_archetype": "Engineering",
"slug": "python-backend-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Asynchronous Messaging and Event Streaming",
"id": 297,
"rationale": "Asynchronous communication patterns and broker technologies used to decouple backend services and move work off the request path. Includes queues, pub/sub, event streams, consumer groups, dead-letter queues, and delivery semantics across systems such as Kafka, RabbitMQ, NATS, SQS/SNS, Pulsar, and ActiveMQ.",
"slug": "asynchronous-messaging-and-event-streaming",
"source": "db"
},
"input_skill": "Kafka",
"llm_role": null,
"roles_from_db": [
{
"display_name": ".NET Backend Developer",
"id": 83,
"rationale": null,
"role_archetype": "Engineering",
"slug": "dotnet-backend-developer",
"source": "db"
},
{
"display_name": "Go Backend Developer",
"id": 81,
"rationale": null,
"role_archetype": "Engineering",
"slug": "go-backend-developer",
"source": "db"
},
{
"display_name": "Kotlin Backend Developer",
"id": 84,
"rationale": null,
"role_archetype": "Engineering",
"slug": "kotlin-server-backend-developer",
"source": "db"
},
{
"display_name": "Node.js Backend Developer",
"id": 82,
"rationale": null,
"role_archetype": "Engineering",
"slug": "node-backend-developer",
"source": "db"
},
{
"display_name": "Scala Backend Developer",
"id": 87,
"rationale": null,
"role_archetype": "Engineering",
"slug": "scala-backend-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Background Jobs",
"id": 291,
"rationale": "Asynchronous processing patterns and worker systems used to decouple backend work from request handling. This is a coherent cluster because the role supports background jobs, retries, and deferred processing.",
"slug": "messaging-and-background-jobs",
"source": "db"
},
"input_skill": "Kafka",
"llm_role": null,
"roles_from_db": [
{
"display_name": "PHP Backend Developer",
"id": 86,
"rationale": null,
"role_archetype": "Engineering",
"slug": "php-backend-developer",
"source": "db"
},
{
"display_name": "Python Backend Developer",
"id": 80,
"rationale": null,
"role_archetype": "Engineering",
"slug": "python-backend-developer",
"source": "db"
},
{
"display_name": "Ruby Backend Developer",
"id": 85,
"rationale": null,
"role_archetype": "Engineering",
"slug": "ruby-backend-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Event Streaming",
"id": 8,
"rationale": "Transport-layer systems used to move events and decouple producers from consumers. Data engineers use these systems to ingest, buffer, and distribute event data before downstream processing.",
"slug": "messaging-and-event-streaming",
"source": "db"
},
"input_skill": "Kafka",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "Apache Spark",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Workflow Orchestration for ML Pipelines",
"id": 54,
"rationale": "Workflow engines used to coordinate training, evaluation, deployment, and retraining jobs. This cluster covers dependencies, retries, scheduling, and pipeline composition for ML lifecycle automation.",
"slug": "workflow-orchestration-for-ml-pipelines",
"source": "db"
},
"input_skill": "Airflow",
"llm_role": null,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "NoSQL Databases",
"id": 19,
"rationale": "Models and manages data using non-relational database systems.",
"slug": "nosql-databases",
"source": "db"
},
"input_skill": "MongoDB",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "NoSQL Databases",
"id": 19,
"rationale": "Models and manages data using non-relational database systems.",
"slug": "nosql-databases",
"source": "db"
},
"input_skill": "DynamoDB",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"input_skill": "HBase",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
}
],
"input_final_skills": [
"Python",
"Kafka",
"Apache Spark",
"Google Dataflow",
"Airflow",
"Bigtable",
"MongoDB",
"DynamoDB",
"HBase",
"Scrapy",
"Selenium",
"Beautiful Soup"
],
"input_llm_skills": [
"Python",
"Kafka",
"Apache Spark",
"Google Dataflow",
"Airflow",
"Bigtable",
"MongoDB",
"DynamoDB",
"HBase",
"Scrapy",
"Selenium",
"Beautiful Soup"
],
"new_aliases_persisted": 0,
"run_id": "bae69e34-7cea-4d40-93b7-a7eea576fb78",
"skills_detail": [
{
"aliases_in_db": [
{
"alias_text": "Python",
"alias_type": "CANONICAL",
"id": 67,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Python 2",
"alias_type": "VERSION",
"id": 72,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Python 2.x",
"alias_type": "VERSION",
"id": 74,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Python 3",
"alias_type": "VERSION",
"id": 73,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Python 3.10",
"alias_type": "VERSION",
"id": 76,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Python 3.11",
"alias_type": "VERSION",
"id": 77,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Python 3.12",
"alias_type": "VERSION",
"id": 78,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Python 3.x",
"alias_type": "VERSION",
"id": 75,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "py",
"alias_type": "VERSION",
"id": 2183,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "py2",
"alias_type": "VERSION",
"id": 68,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "py3",
"alias_type": "VERSION",
"id": 69,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "python 3",
"alias_type": "VERSION",
"id": 2186,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "python 3.x",
"alias_type": "VERSION",
"id": 2849,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "python2",
"alias_type": "VERSION",
"id": 70,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "python3",
"alias_type": "VERSION",
"id": 71,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "python3.x",
"alias_type": "VERSION",
"id": 2848,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 6,
"display_name": "Python",
"id": 5,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LANGUAGE",
"slug": "python",
"sub_category_id": 96,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Security Scripting \u0026 DSL Languages",
"id": 248,
"rationale": "Proficiency in programming and domain-specific languages used to automate and script cloud security controls.",
"slug": "cloud-security-scripting-dsl-languages",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Security Engineer",
"id": 23,
"rationale": null,
"role_archetype": null,
"slug": "cloud-security-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages",
"id": 1,
"rationale": "Primary implementation languages used to build client and server feature code. Full stack engineers need enough fluency to move across layers and implement product behavior end to end.",
"slug": "programming-languages",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Fullstack Developer",
"id": 15,
"rationale": null,
"role_archetype": null,
"slug": "full-stack-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages and Scripting",
"id": 59,
"rationale": "Languages used to write security automation, analysis scripts, detection logic, and remediation helpers. This is the primary implementation surface for a cybersecurity engineer across tooling and response workflows.",
"slug": "programming-languages-and-scripting",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cyber Security Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 21,
"rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for ML Systems",
"id": 39,
"rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
"slug": "programming-languages-for-ml-systems",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for XR",
"id": 97,
"rationale": "Primary implementation languages used to build immersive client features, interaction logic, and device-specific runtime behavior. This is the core coding surface for AR/VR experiences.",
"slug": "programming-languages-for-xr",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "AR/VR Engineer",
"id": 8,
"rationale": null,
"role_archetype": null,
"slug": "ar-vr-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Python Programming",
"id": 290,
"rationale": "Core Python language skills used to implement backend business logic, request handlers, integrations, and service internals. This is the primary coding surface for the role.",
"slug": "python-programming",
"source": "db"
},
"input_skill": "Python",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Python Backend Developer",
"id": 80,
"rationale": null,
"role_archetype": "Engineering",
"slug": "python-backend-developer",
"source": "db"
}
]
}
],
"input_skill": "Python",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Kafka",
"alias_type": "CANONICAL",
"id": 173,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 3,
"display_name": "Kafka",
"id": 36,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "kafka",
"sub_category_id": 3533,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Asynchronous Messaging and Event Streaming",
"id": 297,
"rationale": "Asynchronous communication patterns and broker technologies used to decouple backend services and move work off the request path. Includes queues, pub/sub, event streams, consumer groups, dead-letter queues, and delivery semantics across systems such as Kafka, RabbitMQ, NATS, SQS/SNS, Pulsar, and ActiveMQ.",
"slug": "asynchronous-messaging-and-event-streaming",
"source": "db"
},
"input_skill": "Kafka",
"llm_role": null,
"roles_from_db": [
{
"display_name": ".NET Backend Developer",
"id": 83,
"rationale": null,
"role_archetype": "Engineering",
"slug": "dotnet-backend-developer",
"source": "db"
},
{
"display_name": "Go Backend Developer",
"id": 81,
"rationale": null,
"role_archetype": "Engineering",
"slug": "go-backend-developer",
"source": "db"
},
{
"display_name": "Kotlin Backend Developer",
"id": 84,
"rationale": null,
"role_archetype": "Engineering",
"slug": "kotlin-server-backend-developer",
"source": "db"
},
{
"display_name": "Node.js Backend Developer",
"id": 82,
"rationale": null,
"role_archetype": "Engineering",
"slug": "node-backend-developer",
"source": "db"
},
{
"display_name": "Scala Backend Developer",
"id": 87,
"rationale": null,
"role_archetype": "Engineering",
"slug": "scala-backend-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Background Jobs",
"id": 291,
"rationale": "Asynchronous processing patterns and worker systems used to decouple backend work from request handling. This is a coherent cluster because the role supports background jobs, retries, and deferred processing.",
"slug": "messaging-and-background-jobs",
"source": "db"
},
"input_skill": "Kafka",
"llm_role": null,
"roles_from_db": [
{
"display_name": "PHP Backend Developer",
"id": 86,
"rationale": null,
"role_archetype": "Engineering",
"slug": "php-backend-developer",
"source": "db"
},
{
"display_name": "Python Backend Developer",
"id": 80,
"rationale": null,
"role_archetype": "Engineering",
"slug": "python-backend-developer",
"source": "db"
},
{
"display_name": "Ruby Backend Developer",
"id": 85,
"rationale": null,
"role_archetype": "Engineering",
"slug": "ruby-backend-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Event Streaming",
"id": 8,
"rationale": "Transport-layer systems used to move events and decouple producers from consumers. Data engineers use these systems to ingest, buffer, and distribute event data before downstream processing.",
"slug": "messaging-and-event-streaming",
"source": "db"
},
"input_skill": "Kafka",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Kafka",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Apache Spark",
"alias_type": "CANONICAL",
"id": 2004,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "apache spark 3",
"alias_type": "VERSION",
"id": 2006,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark",
"alias_type": "VERSION",
"id": 2510,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark 3",
"alias_type": "VERSION",
"id": 2007,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark 3.x",
"alias_type": "VERSION",
"id": 2009,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark3",
"alias_type": "VERSION",
"id": 2008,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 5,
"display_name": "Apache Spark",
"id": 1350,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "FRAMEWORK",
"slug": "apache-spark",
"sub_category_id": 1021,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "Apache Spark",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Apache Spark",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Google Dataflow",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "PLATFORM",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "google-dataflow",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Airflow",
"alias_type": "CANONICAL",
"id": 526,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "airflow 2",
"alias_type": "VERSION",
"id": 2477,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "airflow-2",
"alias_type": "VERSION",
"id": 2478,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "airflow2",
"alias_type": "VERSION",
"id": 2476,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "airflow2.x",
"alias_type": "VERSION",
"id": 2479,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "apache airflow 2",
"alias_type": "VERSION",
"id": 2480,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 13,
"display_name": "Airflow",
"id": 265,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "airflow",
"sub_category_id": 130,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Workflow Orchestration for ML Pipelines",
"id": 54,
"rationale": "Workflow engines used to coordinate training, evaluation, deployment, and retraining jobs. This cluster covers dependencies, retries, scheduling, and pipeline composition for ML lifecycle automation.",
"slug": "workflow-orchestration-for-ml-pipelines",
"source": "db"
},
"input_skill": "Airflow",
"llm_role": null,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
}
]
}
],
"input_skill": "Airflow",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Bigtable",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Databases",
"skill_nature": "TOOL",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "bigtable",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "MongoDB",
"alias_type": "CANONICAL",
"id": 232,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 2.0",
"alias_type": "VERSION",
"id": 238,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 2.2",
"alias_type": "VERSION",
"id": 239,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 2.4",
"alias_type": "VERSION",
"id": 240,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 2.6",
"alias_type": "VERSION",
"id": 241,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 3.0",
"alias_type": "VERSION",
"id": 242,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 3.2",
"alias_type": "VERSION",
"id": 243,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 3.4",
"alias_type": "VERSION",
"id": 244,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 3.6",
"alias_type": "VERSION",
"id": 245,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 4",
"alias_type": "VERSION",
"id": 233,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 4.0",
"alias_type": "VERSION",
"id": 246,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 4.2",
"alias_type": "VERSION",
"id": 247,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 4.4",
"alias_type": "VERSION",
"id": 248,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 5",
"alias_type": "VERSION",
"id": 234,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 5.0",
"alias_type": "VERSION",
"id": 249,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 6",
"alias_type": "VERSION",
"id": 235,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 6.0",
"alias_type": "VERSION",
"id": 250,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 7",
"alias_type": "VERSION",
"id": 236,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 7.0",
"alias_type": "VERSION",
"id": 251,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 8",
"alias_type": "VERSION",
"id": 237,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 8.0",
"alias_type": "VERSION",
"id": 252,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 3,
"display_name": "MongoDB",
"id": 91,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "mongodb",
"sub_category_id": 27,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "NoSQL Databases",
"id": 19,
"rationale": "Models and manages data using non-relational database systems.",
"slug": "nosql-databases",
"source": "db"
},
"input_skill": "MongoDB",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
}
]
}
],
"input_skill": "MongoDB",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Amazon DynamoDB",
"alias_type": "CANONICAL",
"id": 254,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 11,
"display_name": "Amazon DynamoDB",
"id": 93,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CLOUD_SERVICE",
"slug": "amazon-dynamodb",
"sub_category_id": 55,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "NoSQL Databases",
"id": 19,
"rationale": "Models and manages data using non-relational database systems.",
"slug": "nosql-databases",
"source": "db"
},
"input_skill": "DynamoDB",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
}
]
}
],
"input_skill": "DynamoDB",
"matched_via": "embedding_display_name",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "HBase",
"alias_type": "CANONICAL",
"id": 2011,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 3,
"display_name": "HBase",
"id": 1352,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "hbase",
"sub_category_id": 31,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"input_skill": "HBase",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
}
],
"input_skill": "HBase",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Scrapy",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Web Frameworks",
"skill_nature": "TOOL",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "scrapy",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Selenium",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Testing Tools",
"skill_nature": "TOOL",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "selenium",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Beautiful Soup",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Web Frameworks",
"skill_nature": "TOOL",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "beautiful-soup",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
}
],
"unmatched_skills": [
"Google Dataflow",
"Bigtable",
"Scrapy",
"Selenium",
"Beautiful Soup"
]
}
API 3 — final-role-output
{
"chosen_role": {
"display_name": "Data Engineer",
"id": 2,
"rationale": "Domain=Data Engineering \u0026 Analytics; The JD centers on Python-based data onboarding and collection with Kafka, Spark, Dataflow, Airflow, NoSQL databases, and data cleansing/normalization, which best fits a Data Engineer role.",
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
"chosen_role_resolution": "in_db",
"final_input_skills": [
{
"skill": "Python",
"tag": "in_db"
},
{
"skill": "Kafka",
"tag": "in_db"
},
{
"skill": "Apache Spark",
"tag": "in_db"
},
{
"skill": "Google Dataflow",
"tag": "new"
},
{
"skill": "Airflow",
"tag": "in_db"
},
{
"skill": "Bigtable",
"tag": "new"
},
{
"skill": "MongoDB",
"tag": "in_db"
},
{
"skill": "DynamoDB",
"tag": "in_db"
},
{
"skill": "HBase",
"tag": "in_db"
},
{
"skill": "Scrapy",
"tag": "new"
},
{
"skill": "Selenium",
"tag": "new"
},
{
"skill": "Beautiful Soup",
"tag": "new"
}
],
"llm_cost_api1_usd": null,
"llm_cost_api2_usd": null,
"llm_cost_api3_usd": null,
"llm_cost_total_usd": null,
"persistence": {
"items": [
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Security Scripting \u0026 DSL Languages",
"id": 248,
"rationale": "Proficiency in programming and domain-specific languages used to automate and script cloud security controls.",
"slug": "cloud-security-scripting-dsl-languages",
"source": "db"
},
"dimension_id": 248,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cloud Security Engineer",
"id": 23,
"rationale": null,
"role_archetype": null,
"slug": "cloud-security-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 5,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages",
"id": 1,
"rationale": "Primary implementation languages used to build client and server feature code. Full stack engineers need enough fluency to move across layers and implement product behavior end to end.",
"slug": "programming-languages",
"source": "db"
},
"dimension_id": 1,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Fullstack Developer",
"id": 15,
"rationale": null,
"role_archetype": null,
"slug": "full-stack-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 5,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages and Scripting",
"id": 59,
"rationale": "Languages used to write security automation, analysis scripts, detection logic, and remediation helpers. This is the primary implementation surface for a cybersecurity engineer across tooling and response workflows.",
"slug": "programming-languages-and-scripting",
"source": "db"
},
"dimension_id": 59,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cyber Security Engineer",
"id": 5,
"rationale": null,
"role_archetype": null,
"slug": "cybersecurity-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 5,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 21,
"rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"dimension_id": 21,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 5,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for ML Systems",
"id": 39,
"rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
"slug": "programming-languages-for-ml-systems",
"source": "db"
},
"dimension_id": 39,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 5,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for XR",
"id": 97,
"rationale": "Primary implementation languages used to build immersive client features, interaction logic, and device-specific runtime behavior. This is the core coding surface for AR/VR experiences.",
"slug": "programming-languages-for-xr",
"source": "db"
},
"dimension_id": 97,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "AR/VR Engineer",
"id": 8,
"rationale": null,
"role_archetype": null,
"slug": "ar-vr-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 5,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Python Programming",
"id": 290,
"rationale": "Core Python language skills used to implement backend business logic, request handlers, integrations, and service internals. This is the primary coding surface for the role.",
"slug": "python-programming",
"source": "db"
},
"dimension_id": 290,
"input_skill": "Python",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Python Backend Developer",
"id": 80,
"rationale": null,
"role_archetype": "Engineering",
"slug": "python-backend-developer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 5,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Asynchronous Messaging and Event Streaming",
"id": 297,
"rationale": "Asynchronous communication patterns and broker technologies used to decouple backend services and move work off the request path. Includes queues, pub/sub, event streams, consumer groups, dead-letter queues, and delivery semantics across systems such as Kafka, RabbitMQ, NATS, SQS/SNS, Pulsar, and ActiveMQ.",
"slug": "asynchronous-messaging-and-event-streaming",
"source": "db"
},
"dimension_id": 297,
"input_skill": "Kafka",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": ".NET Backend Developer",
"id": 83,
"rationale": null,
"role_archetype": "Engineering",
"slug": "dotnet-backend-developer",
"source": "db"
},
{
"display_name": "Go Backend Developer",
"id": 81,
"rationale": null,
"role_archetype": "Engineering",
"slug": "go-backend-developer",
"source": "db"
},
{
"display_name": "Kotlin Backend Developer",
"id": 84,
"rationale": null,
"role_archetype": "Engineering",
"slug": "kotlin-server-backend-developer",
"source": "db"
},
{
"display_name": "Node.js Backend Developer",
"id": 82,
"rationale": null,
"role_archetype": "Engineering",
"slug": "node-backend-developer",
"source": "db"
},
{
"display_name": "Scala Backend Developer",
"id": 87,
"rationale": null,
"role_archetype": "Engineering",
"slug": "scala-backend-developer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 36,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Background Jobs",
"id": 291,
"rationale": "Asynchronous processing patterns and worker systems used to decouple backend work from request handling. This is a coherent cluster because the role supports background jobs, retries, and deferred processing.",
"slug": "messaging-and-background-jobs",
"source": "db"
},
"dimension_id": 291,
"input_skill": "Kafka",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "PHP Backend Developer",
"id": 86,
"rationale": null,
"role_archetype": "Engineering",
"slug": "php-backend-developer",
"source": "db"
},
{
"display_name": "Python Backend Developer",
"id": 80,
"rationale": null,
"role_archetype": "Engineering",
"slug": "python-backend-developer",
"source": "db"
},
{
"display_name": "Ruby Backend Developer",
"id": 85,
"rationale": null,
"role_archetype": "Engineering",
"slug": "ruby-backend-developer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 36,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Event Streaming",
"id": 8,
"rationale": "Transport-layer systems used to move events and decouple producers from consumers. Data engineers use these systems to ingest, buffer, and distribute event data before downstream processing.",
"slug": "messaging-and-event-streaming",
"source": "db"
},
"dimension_id": 8,
"input_skill": "Kafka",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 36,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"dimension_id": 24,
"input_skill": "Apache Spark",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1350,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Workflow Orchestration for ML Pipelines",
"id": 54,
"rationale": "Workflow engines used to coordinate training, evaluation, deployment, and retraining jobs. This cluster covers dependencies, retries, scheduling, and pipeline composition for ML lifecycle automation.",
"slug": "workflow-orchestration-for-ml-pipelines",
"source": "db"
},
"dimension_id": 54,
"input_skill": "Airflow",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 265,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "NoSQL Databases",
"id": 19,
"rationale": "Models and manages data using non-relational database systems.",
"slug": "nosql-databases",
"source": "db"
},
"dimension_id": 19,
"input_skill": "MongoDB",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 91,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "NoSQL Databases",
"id": 19,
"rationale": "Models and manages data using non-relational database systems.",
"slug": "nosql-databases",
"source": "db"
},
"dimension_id": 19,
"input_skill": "DynamoDB",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Skipped \u2014 no persistable v3 meta for new skill",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
}
],
"skill_dimension_saved": false,
"skill_id": null,
"skill_tag": "new",
"skipped_reason": "skill_not_in_db_v3_proposed"
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"dimension_id": 144,
"input_skill": "HBase",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1352,
"skill_tag": "in_db",
"skipped_reason": null
}
],
"new_skills_created": 0,
"role_dimension_saved": 0,
"skill_dimension_saved": 0,
"skipped": 1
},
"planner_output": null,
"run_id": "bae69e34-7cea-4d40-93b7-a7eea576fb78"
}
LLM Calls
Every model call made for this run, in pipeline order. Click a card to see the model's response.