Pipeline run
de82ff01-d921-48f8-bf90-5ded046d9f4d
Client output enrichment
v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description · vocab breakdown (legacy)
Signals
Post-classification
Captured for admin review
1 POST /skills/extract-from-jd
2 POST /skills/extract-details
3 POST /skills/final-role-output
Data Engineer
CASE A · slug: data-engineer · id: 2 · source: db
Exact alias hit on data-engineer (1.0) — no other alias at this confidence; skill_top data-engineer 0.30 does not contradict
Resolution: in_db (role exists in library; skill↔dim and role↔dim links saved when applicable).
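The alias-based resolution reported above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the `RoleCandidate` type, the `resolve_by_alias` function, and the rule that a skill-based top candidate only "contradicts" when it names a different role are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RoleCandidate:
    slug: str
    score: float


def resolve_by_alias(alias_hits, skill_top=None, exact=1.0):
    """Return the slug of a unique exact alias hit, else None (hypothetical rule)."""
    exact_hits = [c for c in alias_hits if c.score >= exact]
    if len(exact_hits) != 1:
        return None  # no exact hit, or several aliases tie at this confidence
    winner = exact_hits[0]
    # Assumption: the skill-based top role contradicts only if it is a different role.
    if skill_top is not None and skill_top.slug != winner.slug:
        return None
    return winner.slug


# Values taken from this run: alias hit 1.0, skill_top data-engineer 0.30.
alias_hits = [RoleCandidate("data-engineer", 1.0)]
skill_top = RoleCandidate("data-engineer", 0.30)
print(resolve_by_alias(alias_hits, skill_top))  # data-engineer
```

Under these assumptions a second alias at 1.0, or a skill_top pointing at another role, would leave the case unresolved and fall through to whatever later stage handles ambiguity.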
Job description
Exp: 6-9 years
Big Data Engineer, 6 years of Experience
We are looking for a Big Data Engineer that will work on collecting, storing, processing, and analyzing huge sets of data. The primary focus will be on choosing optimal solutions to use for these purposes, then maintaining, implementing, and monitoring them. You will also be responsible for integrating them with the architecture used across the company.
Bangalore
Experience
- 6 years' experience in Big Data Hadoop, MapReduce, Hive, Sqoop and Spark with hands-on expertise in design and implementation of high data volume solutions (ETL & Streaming).
- Experience with building stream-processing systems, using solutions such as Storm, Spark-Streaming, Kafka streams.
- Extensive experience in working with Big Data tools like Pig, Hive, Athena, Glue, Snowflake and EMR.
- Experience with NoSQL databases, such as HBase, Cassandra, MongoDB.
- Knowledge of various ETL techniques and frameworks, such as Flume.
- Experience with various messaging systems, such as Kafka or RabbitMQ.
- Good understanding of Lambda Architecture, along with its advantages and drawbacks.
Skills from this JD
Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- CONCEPT
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Version strategy
- UNVERSIONED
Aliases — catalog
- Hadoop (CANONICAL)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Framework
- Sub-category
- Data Processing Framework
- Vendor
- Apache Software Foundation
- License
- apache_2
- Year introduced
- 2006
- Confidence
- 0.90
- Version strategy
- NOT_APPLICABLE
Maturity reasoning: Job postings still mention Hadoop for legacy big-data stacks, but JD volume has fallen as Spark and cloud warehouses replaced MapReduce-era clusters.
Skill profile (library / DB)
- Skill nature
- FRAMEWORK
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 5
- Sub-category id
- 91
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
- ETL and ELT Tooling · Catalog dimension db id 24 · Library dimension (catalog) · Roles linked in library: Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| ETL and ELT Tooling (etl-and-elt-tooling) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- CONCEPT
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
Aliases — catalog
- Hive (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Datastore
- Sub-category
- Local Key Value Store
- Vendor
- Apache Software Foundation
- License
- apache_2
- Year introduced
- 2010
- Confidence
- 0.90
- Version strategy
- NOT_APPLICABLE
Maturity reasoning: Hive appears in Flutter/mobile JDs and package docs, but JD volume is far below SQLite/Realm and it’s mainly used for local key-value storage in Flutter apps.
Skill profile (library / DB)
- Skill nature
- TOOL
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 3
- Sub-category id
- 2242
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
- Local Persistence and Offline Behavior · Catalog dimension db id 85 · Library dimension (catalog) · Roles linked in library: Android Developer, Flutter Developer, Hybrid Mobile Developer, Native Mobile Developer, React Native Developer, iOS Developer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Local Persistence and Offline Behavior (local-persistence-and-offline-behavior) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- TOOL
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
Aliases — catalog
- Apache Spark (CANONICAL)
- apache spark 3 (VERSION)
- spark (VERSION)
- spark 3 (VERSION)
- spark 3.x (VERSION)
- spark3 (VERSION)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Framework
- Sub-category
- Distributed Data Processing Framework
- Vendor
- Apache Software Foundation
- License
- apache_2
- Year introduced
- 2010
- Confidence
- 0.94
- Version strategy
- SEPARATE_ENTITY
- Version tag
- 3.x
Maturity reasoning: Apache Spark appears in many data engineering JDs and remains a standard for distributed ETL/ELT; its GitHub and vendor ecosystem activity stay strong, with Databricks and cloud platforms still promoting it.
Skill profile (library / DB)
- Skill nature
- FRAMEWORK
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 5
- Sub-category id
- 1021
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
- ETL and ELT Tooling · Catalog dimension db id 24 · Library dimension (catalog) · Roles linked in library: Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| ETL and ELT Tooling (etl-and-elt-tooling) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- PRACTICE
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- CONCEPT
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- TOOL
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
Aliases — catalog
- DStreams (VERSION)
- Spark 2.x (VERSION)
- Spark 3.x (VERSION)
- Spark Streaming (VERSION)
- Spark Structured Streaming (VERSION)
- Structured Streaming (VERSION)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Framework
- Sub-category
- Stream Processing Framework
- Vendor
- Apache Software Foundation
- License
- apache_2
- Year introduced
- 2013
- Confidence
- 0.90
- Version strategy
- SEPARATE_ENTITY
- Version tag
- Structured Streaming (Spark 2.0+)
Maturity reasoning: JD volume is far lower than Structured Streaming; most Spark streaming roles now specify Structured Streaming or Kafka/Flink, and Spark docs position Spark Streaming as the older API.
Skill profile (library / DB)
- Skill nature
- FRAMEWORK
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 5
- Sub-category id
- 94
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
- Stream Processing Systems · Catalog dimension db id 25 · Library dimension (catalog) · Roles linked in library: Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Stream Processing Systems (stream-processing-systems) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
Aliases — catalog
- Kafka Streams (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Library
- Sub-category
- Stream Processing Library
- Vendor
- Apache Software Foundation
- License
- apache_2
- Year introduced
- 2016
- Confidence
- 0.90
- Version strategy
- NOT_APPLICABLE
Maturity reasoning: Common in JVM streaming JDs and Kafka ecosystem roles; widely used for stateful stream processing alongside Apache Kafka, with steady GitHub activity and no sunset signal.
Skill profile (library / DB)
- Skill nature
- LIBRARY
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 7
- Sub-category id
- 99
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
- Stream Processing Systems · Catalog dimension db id 25 · Library dimension (catalog) · Roles linked in library: Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Stream Processing Systems (stream-processing-systems) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- TOOL
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- PLATFORM
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- PLATFORM
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
Aliases — catalog
- Snowflake (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Platform
- Sub-category
- Data Cloud Platform
- Vendor
- Snowflake Inc.
- License
- proprietary
- Year introduced
- 2012
- Confidence
- 0.98
- Version strategy
- NOT_APPLICABLE
Maturity reasoning: Snowflake appears frequently in data/analytics job postings and is a standard cloud data warehouse platform alongside BigQuery and Redshift.
Skill profile (library / DB)
- Skill nature
- PLATFORM
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 9
- Sub-category id
- 113
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
- Cloud Data Warehouses · Catalog dimension db id 22 · Library dimension (catalog) · Roles linked in library: Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Data Warehouses (cloud-data-warehouses) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- PLATFORM
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
Aliases — catalog
- HBase (CANONICAL)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Datastore
- Sub-category
- Wide Column Store
- Vendor
- Apache Software Foundation
- License
- apache_2
- Year introduced
- 2010
- Confidence
- 0.98
- Version strategy
- NOT_APPLICABLE
Maturity reasoning: HBase appears in a limited set of big-data/legacy Hadoop job postings, while newer JDs more often specify DynamoDB, Bigtable, or Cassandra; its market demand is specialized rather than broad.
Skill profile (library / DB)
- Skill nature
- TOOL
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 3
- Sub-category id
- 31
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
- Cloud Storage and Data Services · Catalog dimension db id 144 · Library dimension (catalog) · Roles linked in library: Cloud Architect
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Storage and Data Services (cloud-storage-and-data-services) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- Cassandra (CANONICAL)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Datastore
- Sub-category
- Wide Column Store
- Vendor
- Apache Software Foundation
- License
- apache_2
- Year introduced
- 2008
- Confidence
- 0.99
- Version strategy
- NOT_APPLICABLE
Maturity reasoning: Apache Cassandra appears in many production data-platform JDs and is a common choice for high-write, distributed workloads; GitHub and vendor docs show sustained activity rather than sunset signals.
Skill profile (library / DB)
- Skill nature
- TOOL
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 3
- Sub-category id
- 31
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
- Cloud Storage and Data Services · Catalog dimension db id 144 · Library dimension (catalog) · Roles linked in library: Cloud Architect
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Storage and Data Services (cloud-storage-and-data-services) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- MongoDB (CANONICAL) primary
- MongoDB 2.0 (VERSION)
- MongoDB 2.2 (VERSION)
- MongoDB 2.4 (VERSION)
- MongoDB 2.6 (VERSION)
- MongoDB 3.0 (VERSION)
- MongoDB 3.2 (VERSION)
- MongoDB 3.4 (VERSION)
- MongoDB 3.6 (VERSION)
- MongoDB 4 (VERSION)
- MongoDB 4.0 (VERSION)
- MongoDB 4.2 (VERSION)
- MongoDB 4.4 (VERSION)
- MongoDB 5 (VERSION)
- MongoDB 5.0 (VERSION)
- MongoDB 6 (VERSION)
- MongoDB 6.0 (VERSION)
- MongoDB 7 (VERSION)
- MongoDB 7.0 (VERSION)
- MongoDB 8 (VERSION)
- MongoDB 8.0 (VERSION)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Datastore
- Sub-category
- Document Database
- Vendor
- MongoDB, Inc.
- License
- other_open
- Year introduced
- 2009
- Confidence
- 0.99
- Version strategy
- SEPARATE_ENTITY
- Version tag
- 8.0
Maturity reasoning: MongoDB appears in many job descriptions across backend/data roles and is a standard document database in modern stacks; strong GitHub/community activity and broad cloud vendor support indicate mainstream adoption.
Skill profile (library / DB)
- Skill nature
- TOOL
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 3
- Sub-category id
- 27
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
- NoSQL Databases · Catalog dimension db id 19 · Library dimension (catalog) · Roles linked in library: Backend Developer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| NoSQL Databases (nosql-databases) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- TOOL
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
Aliases — catalog
- Kafka (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Datastore
- Sub-category
- Event Stream Store
- Vendor
- Confluent
- License
- apache_2
- Year introduced
- 2011
- Confidence
- 0.90
- Version strategy
- NOT_APPLICABLE
Maturity reasoning: Kafka appears in many production JDs for event streaming and data pipelines, and remains a standard platform in cloud/vendor offerings (e.g., Confluent, AWS MSK), indicating broad hiring demand.
Skill profile (library / DB)
- Skill nature
- TOOL
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 3
- Sub-category id
- 3533
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
- Asynchronous Messaging and Event Streaming · Catalog dimension db id 297 · Library dimension (catalog) · Roles linked in library: .NET Backend Developer, Go Backend Developer, Kotlin Backend Developer, Node.js Backend Developer, Scala Backend Developer
- Messaging and Background Jobs · Catalog dimension db id 291 · Library dimension (catalog) · Roles linked in library: PHP Backend Developer, Python Backend Developer, Ruby Backend Developer
- Messaging and Event Streaming · Catalog dimension db id 8 · Library dimension (catalog) · Roles linked in library: Backend Developer, Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Asynchronous Messaging and Event Streaming (asynchronous-messaging-and-event-streaming) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Messaging and Background Jobs (messaging-and-background-jobs) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Messaging and Event Streaming (messaging-and-event-streaming) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
Aliases — catalog
- RabbitMQ (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Tool
- Sub-category
- Message Broker Tool
- Vendor
- Pivotal Software
- License
- apache_2
- Year introduced
- 2007
- Confidence
- 0.90
- Version strategy
- NOT_APPLICABLE
Maturity reasoning: RabbitMQ appears in many backend/platform job descriptions and is a common message broker in production stacks; it remains actively maintained with broad ecosystem support, unlike niche brokers.
Skill profile (library / DB)
- Skill nature
- TOOL
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 13
- Sub-category id
- 1880
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
- Asynchronous Messaging and Event Streaming · Catalog dimension db id 297 · Library dimension (catalog) · Roles linked in library: .NET Backend Developer, Go Backend Developer, Kotlin Backend Developer, Node.js Backend Developer, Scala Backend Developer
- Messaging and Background Jobs · Catalog dimension db id 291 · Library dimension (catalog) · Roles linked in library: PHP Backend Developer, Python Backend Developer, Ruby Backend Developer
- Messaging and Event Streaming · Catalog dimension db id 8 · Library dimension (catalog) · Roles linked in library: Backend Developer, Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Asynchronous Messaging and Event Streaming (asynchronous-messaging-and-event-streaming) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Messaging and Background Jobs (messaging-and-background-jobs) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Messaging and Event Streaming (messaging-and-event-streaming) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- CONCEPT
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
All API 3 persistence rows
Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.
| Skill | Tag | Dimension | Skill↔dim | Role↔dim | Outcome | Notes |
|---|---|---|---|---|---|---|
| Hadoop | in_db | ETL and ELT Tooling (etl-and-elt-tooling) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Hive | in_db | Local Persistence and Offline Behavior (local-persistence-and-offline-behavior) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Spark | in_db | ETL and ELT Tooling (etl-and-elt-tooling) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Spark Streaming | in_db | Stream Processing Systems (stream-processing-systems) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Kafka Streams | in_db | Stream Processing Systems (stream-processing-systems) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Snowflake | in_db | Cloud Data Warehouses (cloud-data-warehouses) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| HBase | in_db | Cloud Storage and Data Services (cloud-storage-and-data-services) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Cassandra | in_db | Cloud Storage and Data Services (cloud-storage-and-data-services) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| MongoDB | in_db | NoSQL Databases (nosql-databases) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Kafka | in_db | Asynchronous Messaging and Event Streaming (asynchronous-messaging-and-event-streaming) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Kafka | in_db | Messaging and Background Jobs (messaging-and-background-jobs) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Kafka | in_db | Messaging and Event Streaming (messaging-and-event-streaming) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| RabbitMQ | in_db | Asynchronous Messaging and Event Streaming (asynchronous-messaging-and-event-streaming) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| RabbitMQ | in_db | Messaging and Background Jobs (messaging-and-background-jobs) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| RabbitMQ | in_db | Messaging and Event Streaming (messaging-and-event-streaming) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
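The "one row per (skill × dimension)" expansion in the persistence grid can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the service's code: `WorkItem` and `build_work_items` are hypothetical names, the skill↔dimension link is assumed to be saved for every row (as every row in this run shows ✓), and the role↔dimension link is assumed to be saved only when the dimension belongs to the chosen role. The dimension set is the one this run shows linked to Data Engineer.

```python
from dataclasses import dataclass

# Dimension slugs that this run shows linked to the chosen role (Data Engineer).
DATA_ENGINEER_DIMS = {
    "etl-and-elt-tooling",
    "stream-processing-systems",
    "cloud-data-warehouses",
    "messaging-and-event-streaming",
}


@dataclass(frozen=True)
class WorkItem:
    skill: str
    dimension: str
    skill_dim: bool  # skill↔dimension link saved
    role_dim: bool   # role↔dimension link saved


def build_work_items(skill_dims, role_dims):
    """Expand {skill: [dimension slugs]} into one WorkItem per (skill, dimension)."""
    return [
        # Assumption: skill↔dim is always saved; role↔dim only for the role's dimensions.
        WorkItem(skill, dim, True, dim in role_dims)
        for skill, dims in skill_dims.items()
        for dim in dims
    ]


skill_dims = {
    "Hive": ["local-persistence-and-offline-behavior"],
    "Kafka": [
        "asynchronous-messaging-and-event-streaming",
        "messaging-and-background-jobs",
        "messaging-and-event-streaming",
    ],
}
items = build_work_items(skill_dims, DATA_ENGINEER_DIMS)
for item in items:
    print(item.skill, item.dimension, "saved" if item.role_dim else "skipped")
```

With this input, only the Kafka row for messaging-and-event-streaming gets a role↔dimension link, which matches the pattern in the grid.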
Library artifacts (this run)
| Kind | Skill | Detail | DB id |
|---|---|---|---|
| canonical_skill_proposed | Big Data | type=Data Engineering Tools subtype=general nature=CONCEPT lifespan=EVERGREEN | |
| canonical_skill_proposed | MapReduce | type=Data Engineering Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Sqoop | type=Data Engineering Tools subtype=general nature=TOOL lifespan=MULTI_YEAR | |
| canonical_skill_proposed | ETL | type=Data Engineering Tools subtype=general nature=PRACTICE lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Streaming | type=Data Engineering Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Storm | type=Data Engineering Tools subtype=general nature=TOOL lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Pig | type=Data Engineering Tools subtype=general nature=TOOL lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Athena | type=Data Engineering Tools subtype=general nature=PLATFORM lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Glue | type=Data Engineering Tools subtype=general nature=PLATFORM lifespan=MULTI_YEAR | |
| canonical_skill_proposed | EMR | type=Data Engineering Tools subtype=general nature=PLATFORM lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Flume | type=Data Engineering Tools subtype=general nature=TOOL lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Lambda Architecture | type=Data Engineering Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR | |
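For downstream review it can help to split a proposed-skill detail string into its name and attributes. The sketch below assumes the detail text has the shape `Name | key=value key=value ...`, as in the rows above, and that values may contain spaces; `parse_detail` is a hypothetical helper, not part of the pipeline.

```python
import re


def parse_detail(detail):
    """Split 'Name | k=v k=v ...' into (name, {k: v}); values may contain spaces."""
    name, _, attrs = detail.partition(" | ")
    # Each value runs lazily until the next 'key=' token or the end of the string.
    pairs = re.findall(r"(\w+)=(.*?)(?=\s+\w+=|$)", attrs)
    return name.strip(), {key: value.strip() for key, value in pairs}


name, attrs = parse_detail(
    "Big Data | type=Data Engineering Tools subtype=general "
    "nature=CONCEPT lifespan=EVERGREEN"
)
print(name)   # Big Data
print(attrs)  # {'type': 'Data Engineering Tools', 'subtype': 'general', 'nature': 'CONCEPT', 'lifespan': 'EVERGREEN'}
```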
nano JD Parser (gpt-4.1-nano)
{
"JD_type": "pass",
"about_company": null,
"certifications": [],
"company_name": null,
"ctc": null,
"domain": {
"primary": {
"aliases": [],
"domain": "Other"
},
"secondary": null
},
"education": [],
"experience": {
"max": 9,
"min": 6,
"raw": "6-9 years"
},
"job_locations": [
{
"aliases": [
"Bengaluru"
],
"city": "Bangalore",
"country": "India",
"state": null,
"work_mode": null
}
],
"role": "Big Data Engineer",
"role_aliases": [
"Big Data Developer",
"Data Engineer",
"Big Data Specialist"
],
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 0,
"heading": "Role Overview",
"heading_was_present": false,
"source_marker": {
"first_5_words": "We are looking for a",
"last_5_words": "architecture used across the company."
},
"text": "We are looking for a Big Data Engineer that will work on collecting, storing, processing, and analyzing huge sets of data. The primary focus will be on choosing optimal solutions to use for these purposes, then maintaining, implementing, and monitoring them. You will also be responsible for integrating them with the architecture used across the company.",
"word_count": 52
},
{
"bullet_count": 6,
"heading": "Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "Experience 6 years\u0027 experience in",
"last_5_words": "its advantages and drawbacks."
},
"text": "Experience 6 years\u0027 experience in Big Data Hadoop, MapReduce, Hive, Sqoop and Spark with hands-on expertise in design and implementation of high data volume solutions (ETL \u0026 Streaming). Experience with building stream-processing systems, using solutions such as Storm, Spark-Streaming, Kafka streams. Extensive experience in working with Big Data tools like Pig, Hive, Athena, Glue, Snowflake and EMR. Experience with NoSQL databases, such as HBase, Cassandra, MongoDB. Knowledge of various ETL techniques and frameworks, such as Flume. Experience with various messaging systems, such as Kafka or RabbitMQ. Good understanding of Lambda Architecture, along with its advantages and drawbacks.",
"word_count": 104
}
],
"urls": []
}
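Each parsed section above carries a `source_marker` with its leading and trailing words, which suggests a simple consistency check against the section text. The sketch below is one possible check, assuming whitespace tokenization; `marker_matches` is a hypothetical helper, and the sample text is abbreviated with `...` for brevity rather than copied in full.

```python
def marker_matches(text, first_words, last_words):
    """True if text starts with first_words and ends with last_words (whitespace tokens)."""
    words = text.split()
    first = first_words.split()
    last = last_words.split()
    if not first or not last:
        return False  # an empty marker cannot confirm anything
    return words[:len(first)] == first and words[-len(last):] == last


# Abbreviated version of the "Role Overview" text from the payload above.
text = "We are looking for a Big Data Engineer ... architecture used across the company."
print(marker_matches(text, "We are looking for a", "architecture used across the company."))
```

A check like this compares as many words as the marker actually contains, so it does not depend on the marker holding exactly five words.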
API 1: extract-from-jd
{
"final_skills": [
{
"is_primary": true,
"skill_name": "Big Data"
},
{
"is_primary": true,
"skill_name": "Hadoop"
},
{
"is_primary": true,
"skill_name": "MapReduce"
},
{
"is_primary": true,
"skill_name": "Hive"
},
{
"is_primary": true,
"skill_name": "Sqoop"
},
{
"is_primary": true,
"skill_name": "Spark"
},
{
"is_primary": true,
"skill_name": "ETL"
},
{
"is_primary": true,
"skill_name": "Streaming"
},
{
"is_primary": true,
"skill_name": "Storm"
},
{
"is_primary": true,
"skill_name": "Spark Streaming"
},
{
"is_primary": true,
"skill_name": "Kafka Streams"
},
{
"is_primary": true,
"skill_name": "Pig"
},
{
"is_primary": true,
"skill_name": "Athena"
},
{
"is_primary": true,
"skill_name": "Glue"
},
{
"is_primary": true,
"skill_name": "Snowflake"
},
{
"is_primary": true,
"skill_name": "EMR"
},
{
"is_primary": true,
"skill_name": "HBase"
},
{
"is_primary": true,
"skill_name": "Cassandra"
},
{
"is_primary": true,
"skill_name": "MongoDB"
},
{
"is_primary": true,
"skill_name": "Flume"
},
{
"is_primary": true,
"skill_name": "Kafka"
},
{
"is_primary": true,
"skill_name": "RabbitMQ"
},
{
"is_primary": true,
"skill_name": "Lambda Architecture"
}
],
"jd_role": {
"display_name": "Big Data Engineer",
"rationale": null,
"role_aliases": [
"Big Data Developer",
"Data Engineer",
"Big Data Specialist"
],
"role_archetype": "Data",
"slug": ""
},
"nano_parsed": {
"JD_type": "pass",
"about_company": null,
"certifications": [],
"company_name": null,
"ctc": null,
"domain": {
"primary": {
"aliases": [],
"domain": "Other"
},
"secondary": null
},
"education": [],
"experience": {
"max": 9,
"min": 6,
"raw": "6-9 years"
},
"job_locations": [
{
"aliases": [
"Bengaluru"
],
"city": "Bangalore",
"country": "India",
"state": null,
"work_mode": null
}
],
"role": "Big Data Engineer",
"role_aliases": [
"Big Data Developer",
"Data Engineer",
"Big Data Specialist"
],
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 0,
"heading": "Role Overview",
"heading_was_present": false,
"source_marker": {
"first_5_words": "We are looking for a",
"last_5_words": "architecture used across the company."
},
"text": "We are looking for a Big Data Engineer that will work on collecting, storing, processing, and analyzing huge sets of data. The primary focus will be on choosing optimal solutions to use for these purposes, then maintaining, implementing, and monitoring them. You will also be responsible for integrating them with the architecture used across the company.",
"word_count": 52
},
{
"bullet_count": 6,
"heading": "Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "Experience 6 years\u0027 experience in",
"last_5_words": "its advantages and drawbacks."
},
"text": "Experience 6 years\u0027 experience in Big Data Hadoop, MapReduce, Hive, Sqoop and Spark with hands-on expertise in design and implementation of high data volume solutions (ETL \u0026 Streaming). Experience with building stream-processing systems, using solutions such as Storm, Spark-Streaming, Kafka streams. Extensive experience in working with Big Data tools like Pig, Hive, Athena, Glue, Snowflake and EMR. Experience with NoSQL databases, such as HBase, Cassandra, MongoDB. Knowledge of various ETL techniques and frameworks, such as Flume. Experience with various messaging systems, such as Kafka or RabbitMQ. Good understanding of Lambda Architecture, along with its advantages and drawbacks.",
"word_count": 104
}
],
"urls": []
},
"rejected": false,
"rejection_reason": null,
"run_id": "de82ff01-d921-48f8-bf90-5ded046d9f4d",
"stage3_signals": {
"alias_found": true,
"alias_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 1.0,
"slug": "data-engineer",
"total_count": null
}
],
"kra_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": [
{
"kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
"sentence": "Experience 6 years\u0027 experience in Big Data Hadoop, MapReduce, Hive, Sqoop and Spark with hands-on expertise in design and implementation of high data volume solutions (ETL \u0026 Streaming).",
"similarity": 0.5528
},
{
"kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
"sentence": "Extensive experience in working with Big Data tools like Pig, Hive, Athena, Glue, Snowflake and EMR.",
"similarity": 0.543
},
{
"kra_text": "Works with data analysts, data scientists, and business stakeholders to define data models, ingestion schedules, and data delivery requirements.",
"sentence": "We are looking for a Big Data Engineer that will work on collecting, storing, processing, and analyzing huge sets of data.",
"similarity": 0.4996
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 0.5318,
"slug": "data-engineer",
"total_count": null
},
{
"display_name": "ML Engineer",
"kra_matches": [
{
"kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
"sentence": "Extensive experience in working with Big Data tools like Pig, Hive, Athena, Glue, Snowflake and EMR.",
"similarity": 0.4301
},
{
"kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
"sentence": "We are looking for a Big Data Engineer that will work on collecting, storing, processing, and analyzing huge sets of data.",
"similarity": 0.4174
},
{
"kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
"sentence": "Experience 6 years\u0027 experience in Big Data Hadoop, MapReduce, Hive, Sqoop and Spark with hands-on expertise in design and implementation of high data volume solutions (ETL \u0026 Streaming).",
"similarity": 0.4097
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 3,
"score": 0.4191,
"slug": "ml-engineer",
"total_count": null
},
{
"display_name": "Flutter Developer",
"kra_matches": [
{
"kra_text": "collaborate with design, product, and backend teams",
"sentence": "You will also be responsible for integrating them with the architecture used across the company.",
"similarity": 0.4626
},
{
"kra_text": "optimize responsiveness and performance",
"sentence": "The primary focus will be on choosing optimal solutions to use for these purposes, then maintaining, implementing, and monitoring them.",
"similarity": 0.4222
},
{
"kra_text": "integrate external APIs and data sources",
"sentence": "Extensive experience in working with Big Data tools like Pig, Hive, Athena, Glue, Snowflake and EMR.",
"similarity": 0.3325
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 74,
"score": 0.4058,
"slug": "flutter-developer",
"total_count": null
},
{
"display_name": "Fullstack Developer",
"kra_matches": [
{
"kra_text": "Designs and queries relational databases like PostgreSQL and document stores like MongoDB, writing migrations, indexes, and optimized queries.",
"sentence": "Extensive experience in working with Big Data tools like Pig, Hive, Athena, Glue, Snowflake and EMR.",
"similarity": 0.4226
},
{
"kra_text": "Works closely with product managers and UX designers to translate requirements and wireframes into working software features through iterative development.",
"sentence": "You will also be responsible for integrating them with the architecture used across the company.",
"similarity": 0.3965
},
{
"kra_text": "Designs and queries relational databases like PostgreSQL and document stores like MongoDB, writing migrations, indexes, and optimized queries.",
"sentence": "We are looking for a Big Data Engineer that will work on collecting, storing, processing, and analyzing huge sets of data.",
"similarity": 0.3855
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 15,
"score": 0.4016,
"slug": "full-stack-engineer",
"total_count": null
},
{
"display_name": "Cloud Architect",
"kra_matches": [
{
"kra_text": "Conducts architecture reviews, approves technical design documents, and guides engineering teams through cloud migration and modernization projects.",
"sentence": "You will also be responsible for integrating them with the architecture used across the company.",
"similarity": 0.4463
},
{
"kra_text": "Defines cloud adoption roadmaps, lift-and-shift vs. refactor migration strategies, and landing zone architectures for workloads moving to AWS, Azure, or GCP.",
"sentence": "The primary focus will be on choosing optimal solutions to use for these purposes, then maintaining, implementing, and monitoring them.",
"similarity": 0.3707
},
{
"kra_text": "Designs multi-region and multi-availability-zone cloud infrastructure architectures for high availability, fault tolerance, and horizontal scalability.",
"sentence": "Experience 6 years\u0027 experience in Big Data Hadoop, MapReduce, Hive, Sqoop and Spark with hands-on expertise in design and implementation of high data volume solutions (ETL \u0026 Streaming).",
"similarity": 0.3524
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 9,
"score": 0.3898,
"slug": "cloud-architect",
"total_count": null
}
],
"skill_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": 7,
"matched_skills": [
"Apache Spark",
"Hadoop",
"Kafka",
"Kafka Streams",
"RabbitMQ",
"Snowflake",
"Spark Streaming"
],
"role_id": 2,
"score": 0.3043,
"slug": "data-engineer",
"total_count": 23
},
{
"display_name": "Backend Developer",
"kra_matches": null,
"matched_count": 3,
"matched_skills": [
"Kafka",
"MongoDB",
"RabbitMQ"
],
"role_id": 1,
"score": 0.1304,
"slug": "backend-engineer",
"total_count": 23
},
{
"display_name": "Python Backend Developer",
"kra_matches": null,
"matched_count": 2,
"matched_skills": [
"Kafka",
"RabbitMQ"
],
"role_id": 80,
"score": 0.087,
"slug": "python-backend-developer",
"total_count": 23
},
{
"display_name": "Cloud Architect",
"kra_matches": null,
"matched_count": 2,
"matched_skills": [
"Cassandra",
"HBase"
],
"role_id": 9,
"score": 0.087,
"slug": "cloud-architect",
"total_count": 23
},
{
"display_name": "Go Backend Developer",
"kra_matches": null,
"matched_count": 2,
"matched_skills": [
"Kafka",
"RabbitMQ"
],
"role_id": 81,
"score": 0.087,
"slug": "go-backend-developer",
"total_count": 23
}
]
},
"stage4_decision": {
"alias_collision_detected": false,
"case": "A",
"chosen_role": {
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 1.0,
"slug": "data-engineer",
"total_count": null
},
"confidence": 1.0,
"is_new_role": false,
"llm2_fired": false,
"llm2_reasoning": null,
"matched_dimensions": [],
"matched_kras": [],
"matched_skills": [],
"new_role_display_name": null,
"new_role_slug": null,
"queued": false,
"reasoning": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.30 does not contradict",
"sub_role": null
},
"stage5_updates": {
"centroid_n_after": 414,
"centroid_updated": true,
"collision_log_id": null,
"new_kra_attached": null,
"new_skills_attached": [
{
"is_primary": true,
"queue_id": 19147,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Big Data",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 19148,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "MapReduce",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 19149,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Sqoop",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 19150,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "ETL",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 19151,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Streaming",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 19152,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Storm",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 19153,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Pig",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 19154,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Athena",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 19155,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Glue",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 19156,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "EMR",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 19157,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Flume",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 19158,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Lambda Architecture",
"status": "pending"
}
],
"queue_entry_id": null,
"v3_pipeline_triggered": false,
"v3_role_slug": null,
"v3_run_id": null
}
}
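Note on the scores above: the `score` values in this response look consistent with two simple formulas, though the dump does not confirm either one. In `skill_match_roles`, `score` matches `matched_count / total_count` rounded to four places (7/23 gives 0.3043, 3/23 gives 0.1304, 2/23 gives 0.087). In the KRA-matched role list, `score` matches the mean of the three listed `similarity` values (0.5528, 0.543, 0.4996 gives 0.5318). The Fullstack Developer row differs by 0.0001, presumably because the displayed similarities are already rounded. The sketch below is an inference from the numbers only; the function names are hypothetical and are not the pipeline's actual code.

```python
from statistics import mean
from typing import Sequence


def skill_match_score(matched_count: int, total_count: int) -> float:
    """Assumed formula: share of the JD's skills already linked to the role."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return round(matched_count / total_count, 4)


def kra_match_score(similarities: Sequence[float]) -> float:
    """Assumed formula: mean of the top KRA/sentence similarities for a role."""
    if not similarities:
        raise ValueError("at least one similarity is required")
    return round(mean(similarities), 4)


if __name__ == "__main__":
    # Values copied from the response above.
    print(skill_match_score(7, 23))                   # Data Engineer (skill match)
    print(skill_match_score(3, 23))                   # Backend Developer
    print(kra_match_score([0.5528, 0.543, 0.4996]))   # Data Engineer (KRA match)
    print(kra_match_score([0.4301, 0.4174, 0.4097]))  # ML Engineer
```

If these formulas hold, the 0.30 quoted in the stage 4 `reasoning` string ("skill_top data-engineer 0.30") is the 7-of-23 skill overlap, not a similarity value.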
API 2 — extract-details
{
"alias_matches": [
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 2010,
"existing_alias_text": "Hadoop",
"input_term": "Hadoop",
"matched_canonical": {
"category_id": 5,
"display_name": "Hadoop",
"id": 1351,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "FRAMEWORK",
"slug": "hadoop",
"sub_category_id": 91,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 4198,
"existing_alias_text": "Hive",
"input_term": "Hive",
"matched_canonical": {
"category_id": 3,
"display_name": "Hive",
"id": 2754,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "hive",
"sub_category_id": 2242,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 2510,
"existing_alias_text": "spark",
"input_term": "Spark",
"matched_canonical": {
"category_id": 5,
"display_name": "Apache Spark",
"id": 1350,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "FRAMEWORK",
"slug": "apache-spark",
"sub_category_id": 1021,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 319,
"existing_alias_text": "Spark Streaming",
"input_term": "Spark Streaming",
"matched_canonical": {
"category_id": 5,
"display_name": "Spark Streaming",
"id": 121,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "FRAMEWORK",
"slug": "spark-streaming",
"sub_category_id": 94,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 326,
"existing_alias_text": "Kafka Streams",
"input_term": "Kafka Streams",
"matched_canonical": {
"category_id": 7,
"display_name": "Kafka Streams",
"id": 122,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LIBRARY",
"slug": "kafka-streams",
"sub_category_id": 99,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 299,
"existing_alias_text": "Snowflake",
"input_term": "Snowflake",
"matched_canonical": {
"category_id": 9,
"display_name": "Snowflake",
"id": 105,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "snowflake",
"sub_category_id": 113,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 2011,
"existing_alias_text": "HBase",
"input_term": "HBase",
"matched_canonical": {
"category_id": 3,
"display_name": "HBase",
"id": 1352,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "hbase",
"sub_category_id": 31,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 2013,
"existing_alias_text": "Cassandra",
"input_term": "Cassandra",
"matched_canonical": {
"category_id": 3,
"display_name": "Cassandra",
"id": 1354,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "cassandra",
"sub_category_id": 31,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 232,
"existing_alias_text": "MongoDB",
"input_term": "MongoDB",
"matched_canonical": {
"category_id": 3,
"display_name": "MongoDB",
"id": 91,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "mongodb",
"sub_category_id": 27,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 173,
"existing_alias_text": "Kafka",
"input_term": "Kafka",
"matched_canonical": {
"category_id": 3,
"display_name": "Kafka",
"id": 36,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "kafka",
"sub_category_id": 3533,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 174,
"existing_alias_text": "RabbitMQ",
"input_term": "RabbitMQ",
"matched_canonical": {
"category_id": 13,
"display_name": "RabbitMQ",
"id": 37,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "rabbitmq",
"sub_category_id": 1880,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
}
],
"candidate_roles": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "Android Developer",
"id": 4,
"rationale": null,
"role_archetype": null,
"slug": "android-engineer",
"source": "db"
},
{
"display_name": "Flutter Developer",
"id": 74,
"rationale": null,
"role_archetype": "Engineering",
"slug": "flutter-developer",
"source": "db"
},
{
"display_name": "Hybrid Mobile Developer",
"id": 11,
"rationale": null,
"role_archetype": null,
"slug": "hybrid-mobile-developer",
"source": "db"
},
{
"display_name": "Native Mobile Developer",
"id": 75,
"rationale": null,
"role_archetype": "Engineering",
"slug": "native-mobile-developer",
"source": "db"
},
{
"display_name": "React Native Developer",
"id": 73,
"rationale": null,
"role_archetype": "Engineering",
"slug": "react-native-developer",
"source": "db"
},
{
"display_name": "iOS Developer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "ios-engineer",
"source": "db"
},
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
},
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": ".NET Backend Developer",
"id": 83,
"rationale": null,
"role_archetype": "Engineering",
"slug": "dotnet-backend-developer",
"source": "db"
},
{
"display_name": "Go Backend Developer",
"id": 81,
"rationale": null,
"role_archetype": "Engineering",
"slug": "go-backend-developer",
"source": "db"
},
{
"display_name": "Kotlin Backend Developer",
"id": 84,
"rationale": null,
"role_archetype": "Engineering",
"slug": "kotlin-server-backend-developer",
"source": "db"
},
{
"display_name": "Node.js Backend Developer",
"id": 82,
"rationale": null,
"role_archetype": "Engineering",
"slug": "node-backend-developer",
"source": "db"
},
{
"display_name": "Scala Backend Developer",
"id": 87,
"rationale": null,
"role_archetype": "Engineering",
"slug": "scala-backend-developer",
"source": "db"
},
{
"display_name": "PHP Backend Developer",
"id": 86,
"rationale": null,
"role_archetype": "Engineering",
"slug": "php-backend-developer",
"source": "db"
},
{
"display_name": "Python Backend Developer",
"id": 80,
"rationale": null,
"role_archetype": "Engineering",
"slug": "python-backend-developer",
"source": "db"
},
{
"display_name": "Ruby Backend Developer",
"id": 85,
"rationale": null,
"role_archetype": "Engineering",
"slug": "ruby-backend-developer",
"source": "db"
}
],
"chosen_role": {
"display_name": "Data Engineer",
"id": 2,
"rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.30 does not contradict",
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "Hadoop",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Local Persistence and Offline Behavior",
"id": 85,
"rationale": "On-device storage used for caching, offline support, and durable client state. This cluster is coherent because iOS apps often need to preserve user progress and data when connectivity is limited.",
"slug": "local-persistence-and-offline-behavior",
"source": "db"
},
"input_skill": "Hive",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Android Developer",
"id": 4,
"rationale": null,
"role_archetype": null,
"slug": "android-engineer",
"source": "db"
},
{
"display_name": "Flutter Developer",
"id": 74,
"rationale": null,
"role_archetype": "Engineering",
"slug": "flutter-developer",
"source": "db"
},
{
"display_name": "Hybrid Mobile Developer",
"id": 11,
"rationale": null,
"role_archetype": null,
"slug": "hybrid-mobile-developer",
"source": "db"
},
{
"display_name": "Native Mobile Developer",
"id": 75,
"rationale": null,
"role_archetype": "Engineering",
"slug": "native-mobile-developer",
"source": "db"
},
{
"display_name": "React Native Developer",
"id": 73,
"rationale": null,
"role_archetype": "Engineering",
"slug": "react-native-developer",
"source": "db"
},
{
"display_name": "iOS Developer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "ios-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "Spark",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Stream Processing Systems",
"id": 25,
"rationale": "Technologies for processing event streams and near-real-time data flows. This includes stream transformations, windowing, stateful processing, and stream-to-warehouse delivery patterns.",
"slug": "stream-processing-systems",
"source": "db"
},
"input_skill": "Spark Streaming",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Stream Processing Systems",
"id": 25,
"rationale": "Technologies for processing event streams and near-real-time data flows. This includes stream transformations, windowing, stateful processing, and stream-to-warehouse delivery patterns.",
"slug": "stream-processing-systems",
"source": "db"
},
"input_skill": "Kafka Streams",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Data Warehouses",
"id": 22,
"rationale": "Managed analytical storage and compute platforms used for curated datasets, reporting, and downstream analytics. These systems are central to data modeling, performance tuning, and cost-aware query design.",
"slug": "cloud-data-warehouses",
"source": "db"
},
"input_skill": "Snowflake",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"input_skill": "HBase",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"input_skill": "Cassandra",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "NoSQL Databases",
"id": 19,
"rationale": "Models and manages data using non-relational database systems.",
"slug": "nosql-databases",
"source": "db"
},
"input_skill": "MongoDB",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Asynchronous Messaging and Event Streaming",
"id": 297,
"rationale": "Asynchronous communication patterns and broker technologies used to decouple backend services and move work off the request path. Includes queues, pub/sub, event streams, consumer groups, dead-letter queues, and delivery semantics across systems such as Kafka, RabbitMQ, NATS, SQS/SNS, Pulsar, and ActiveMQ.",
"slug": "asynchronous-messaging-and-event-streaming",
"source": "db"
},
"input_skill": "Kafka",
"llm_role": null,
"roles_from_db": [
{
"display_name": ".NET Backend Developer",
"id": 83,
"rationale": null,
"role_archetype": "Engineering",
"slug": "dotnet-backend-developer",
"source": "db"
},
{
"display_name": "Go Backend Developer",
"id": 81,
"rationale": null,
"role_archetype": "Engineering",
"slug": "go-backend-developer",
"source": "db"
},
{
"display_name": "Kotlin Backend Developer",
"id": 84,
"rationale": null,
"role_archetype": "Engineering",
"slug": "kotlin-server-backend-developer",
"source": "db"
},
{
"display_name": "Node.js Backend Developer",
"id": 82,
"rationale": null,
"role_archetype": "Engineering",
"slug": "node-backend-developer",
"source": "db"
},
{
"display_name": "Scala Backend Developer",
"id": 87,
"rationale": null,
"role_archetype": "Engineering",
"slug": "scala-backend-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Background Jobs",
"id": 291,
"rationale": "Asynchronous processing patterns and worker systems used to decouple backend work from request handling. This is a coherent cluster because the role supports background jobs, retries, and deferred processing.",
"slug": "messaging-and-background-jobs",
"source": "db"
},
"input_skill": "Kafka",
"llm_role": null,
"roles_from_db": [
{
"display_name": "PHP Backend Developer",
"id": 86,
"rationale": null,
"role_archetype": "Engineering",
"slug": "php-backend-developer",
"source": "db"
},
{
"display_name": "Python Backend Developer",
"id": 80,
"rationale": null,
"role_archetype": "Engineering",
"slug": "python-backend-developer",
"source": "db"
},
{
"display_name": "Ruby Backend Developer",
"id": 85,
"rationale": null,
"role_archetype": "Engineering",
"slug": "ruby-backend-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Event Streaming",
"id": 8,
"rationale": "Transport-layer systems used to move events and decouple producers from consumers. Data engineers use these systems to ingest, buffer, and distribute event data before downstream processing.",
"slug": "messaging-and-event-streaming",
"source": "db"
},
"input_skill": "Kafka",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Asynchronous Messaging and Event Streaming",
"id": 297,
"rationale": "Asynchronous communication patterns and broker technologies used to decouple backend services and move work off the request path. Includes queues, pub/sub, event streams, consumer groups, dead-letter queues, and delivery semantics across systems such as Kafka, RabbitMQ, NATS, SQS/SNS, Pulsar, and ActiveMQ.",
"slug": "asynchronous-messaging-and-event-streaming",
"source": "db"
},
"input_skill": "RabbitMQ",
"llm_role": null,
"roles_from_db": [
{
"display_name": ".NET Backend Developer",
"id": 83,
"rationale": null,
"role_archetype": "Engineering",
"slug": "dotnet-backend-developer",
"source": "db"
},
{
"display_name": "Go Backend Developer",
"id": 81,
"rationale": null,
"role_archetype": "Engineering",
"slug": "go-backend-developer",
"source": "db"
},
{
"display_name": "Kotlin Backend Developer",
"id": 84,
"rationale": null,
"role_archetype": "Engineering",
"slug": "kotlin-server-backend-developer",
"source": "db"
},
{
"display_name": "Node.js Backend Developer",
"id": 82,
"rationale": null,
"role_archetype": "Engineering",
"slug": "node-backend-developer",
"source": "db"
},
{
"display_name": "Scala Backend Developer",
"id": 87,
"rationale": null,
"role_archetype": "Engineering",
"slug": "scala-backend-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Background Jobs",
"id": 291,
"rationale": "Asynchronous processing patterns and worker systems used to decouple backend work from request handling. This is a coherent cluster because the role supports background jobs, retries, and deferred processing.",
"slug": "messaging-and-background-jobs",
"source": "db"
},
"input_skill": "RabbitMQ",
"llm_role": null,
"roles_from_db": [
{
"display_name": "PHP Backend Developer",
"id": 86,
"rationale": null,
"role_archetype": "Engineering",
"slug": "php-backend-developer",
"source": "db"
},
{
"display_name": "Python Backend Developer",
"id": 80,
"rationale": null,
"role_archetype": "Engineering",
"slug": "python-backend-developer",
"source": "db"
},
{
"display_name": "Ruby Backend Developer",
"id": 85,
"rationale": null,
"role_archetype": "Engineering",
"slug": "ruby-backend-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Event Streaming",
"id": 8,
"rationale": "Transport-layer systems used to move events and decouple producers from consumers. Data engineers use these systems to ingest, buffer, and distribute event data before downstream processing.",
"slug": "messaging-and-event-streaming",
"source": "db"
},
"input_skill": "RabbitMQ",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_final_skills": [
"Big Data",
"Hadoop",
"MapReduce",
"Hive",
"Sqoop",
"Spark",
"ETL",
"Streaming",
"Storm",
"Spark Streaming",
"Kafka Streams",
"Pig",
"Athena",
"Glue",
"Snowflake",
"EMR",
"HBase",
"Cassandra",
"MongoDB",
"Flume",
"Kafka",
"RabbitMQ",
"Lambda Architecture"
],
"input_llm_skills": [
"Big Data",
"Hadoop",
"MapReduce",
"Hive",
"Sqoop",
"Spark",
"ETL",
"Streaming",
"Storm",
"Spark Streaming",
"Kafka Streams",
"Pig",
"Athena",
"Glue",
"Snowflake",
"EMR",
"HBase",
"Cassandra",
"MongoDB",
"Flume",
"Kafka",
"RabbitMQ",
"Lambda Architecture"
],
"new_aliases_persisted": 0,
"run_id": "de82ff01-d921-48f8-bf90-5ded046d9f4d",
"skills_detail": [
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Big Data",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "CONCEPT",
"sub_category": "general",
"typical_lifespan": "EVERGREEN",
"version_strategy": "UNVERSIONED",
"volatility": "STABLE"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "big-data",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Hadoop",
"alias_type": "CANONICAL",
"id": 2010,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 5,
"display_name": "Hadoop",
"id": 1351,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "FRAMEWORK",
"slug": "hadoop",
"sub_category_id": 91,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "Hadoop",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Hadoop",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "MapReduce",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "CONCEPT",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "mapreduce",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Hive",
"alias_type": "CANONICAL",
"id": 4198,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 3,
"display_name": "Hive",
"id": 2754,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "hive",
"sub_category_id": 2242,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Local Persistence and Offline Behavior",
"id": 85,
"rationale": "On-device storage used for caching, offline support, and durable client state. This cluster is coherent because iOS apps often need to preserve user progress and data when connectivity is limited.",
"slug": "local-persistence-and-offline-behavior",
"source": "db"
},
"input_skill": "Hive",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Android Developer",
"id": 4,
"rationale": null,
"role_archetype": null,
"slug": "android-engineer",
"source": "db"
},
{
"display_name": "Flutter Developer",
"id": 74,
"rationale": null,
"role_archetype": "Engineering",
"slug": "flutter-developer",
"source": "db"
},
{
"display_name": "Hybrid Mobile Developer",
"id": 11,
"rationale": null,
"role_archetype": null,
"slug": "hybrid-mobile-developer",
"source": "db"
},
{
"display_name": "Native Mobile Developer",
"id": 75,
"rationale": null,
"role_archetype": "Engineering",
"slug": "native-mobile-developer",
"source": "db"
},
{
"display_name": "React Native Developer",
"id": 73,
"rationale": null,
"role_archetype": "Engineering",
"slug": "react-native-developer",
"source": "db"
},
{
"display_name": "iOS Developer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "ios-engineer",
"source": "db"
}
]
}
],
"input_skill": "Hive",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Sqoop",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "TOOL",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "sqoop",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Apache Spark",
"alias_type": "CANONICAL",
"id": 2004,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "apache spark 3",
"alias_type": "VERSION",
"id": 2006,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark",
"alias_type": "VERSION",
"id": 2510,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark 3",
"alias_type": "VERSION",
"id": 2007,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark 3.x",
"alias_type": "VERSION",
"id": 2009,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark3",
"alias_type": "VERSION",
"id": 2008,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 5,
"display_name": "Apache Spark",
"id": 1350,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "FRAMEWORK",
"slug": "apache-spark",
"sub_category_id": 1021,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "Spark",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Spark",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "ETL",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "PRACTICE",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "etl",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Streaming",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "CONCEPT",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "streaming",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Storm",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "TOOL",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "storm",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "DStreams",
"alias_type": "VERSION",
"id": 320,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Spark 2.x",
"alias_type": "VERSION",
"id": 321,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Spark 3.x",
"alias_type": "VERSION",
"id": 322,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Spark Streaming",
"alias_type": "VERSION",
"id": 319,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Spark Structured Streaming",
"alias_type": "VERSION",
"id": 325,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Structured Streaming",
"alias_type": "VERSION",
"id": 324,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 5,
"display_name": "Spark Streaming",
"id": 121,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "FRAMEWORK",
"slug": "spark-streaming",
"sub_category_id": 94,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Stream Processing Systems",
"id": 25,
"rationale": "Technologies for processing event streams and near-real-time data flows. This includes stream transformations, windowing, stateful processing, and stream-to-warehouse delivery patterns.",
"slug": "stream-processing-systems",
"source": "db"
},
"input_skill": "Spark Streaming",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Spark Streaming",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Kafka Streams",
"alias_type": "CANONICAL",
"id": 326,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 7,
"display_name": "Kafka Streams",
"id": 122,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LIBRARY",
"slug": "kafka-streams",
"sub_category_id": 99,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Stream Processing Systems",
"id": 25,
"rationale": "Technologies for processing event streams and near-real-time data flows. This includes stream transformations, windowing, stateful processing, and stream-to-warehouse delivery patterns.",
"slug": "stream-processing-systems",
"source": "db"
},
"input_skill": "Kafka Streams",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Kafka Streams",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Pig",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "TOOL",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "pig",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Athena",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "PLATFORM",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "athena",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Glue",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "PLATFORM",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "glue",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Snowflake",
"alias_type": "CANONICAL",
"id": 299,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 9,
"display_name": "Snowflake",
"id": 105,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "snowflake",
"sub_category_id": 113,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Data Warehouses",
"id": 22,
"rationale": "Managed analytical storage and compute platforms used for curated datasets, reporting, and downstream analytics. These systems are central to data modeling, performance tuning, and cost-aware query design.",
"slug": "cloud-data-warehouses",
"source": "db"
},
"input_skill": "Snowflake",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Snowflake",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "EMR",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "PLATFORM",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "emr",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "HBase",
"alias_type": "CANONICAL",
"id": 2011,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 3,
"display_name": "HBase",
"id": 1352,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "hbase",
"sub_category_id": 31,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"input_skill": "HBase",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
}
],
"input_skill": "HBase",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Cassandra",
"alias_type": "CANONICAL",
"id": 2013,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 3,
"display_name": "Cassandra",
"id": 1354,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "cassandra",
"sub_category_id": 31,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"input_skill": "Cassandra",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
}
],
"input_skill": "Cassandra",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "MongoDB",
"alias_type": "CANONICAL",
"id": 232,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 2.0",
"alias_type": "VERSION",
"id": 238,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 2.2",
"alias_type": "VERSION",
"id": 239,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 2.4",
"alias_type": "VERSION",
"id": 240,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 2.6",
"alias_type": "VERSION",
"id": 241,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 3.0",
"alias_type": "VERSION",
"id": 242,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 3.2",
"alias_type": "VERSION",
"id": 243,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 3.4",
"alias_type": "VERSION",
"id": 244,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 3.6",
"alias_type": "VERSION",
"id": 245,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 4",
"alias_type": "VERSION",
"id": 233,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 4.0",
"alias_type": "VERSION",
"id": 246,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 4.2",
"alias_type": "VERSION",
"id": 247,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 4.4",
"alias_type": "VERSION",
"id": 248,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 5",
"alias_type": "VERSION",
"id": 234,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 5.0",
"alias_type": "VERSION",
"id": 249,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 6",
"alias_type": "VERSION",
"id": 235,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 6.0",
"alias_type": "VERSION",
"id": 250,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 7",
"alias_type": "VERSION",
"id": 236,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 7.0",
"alias_type": "VERSION",
"id": 251,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 8",
"alias_type": "VERSION",
"id": 237,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "MongoDB 8.0",
"alias_type": "VERSION",
"id": 252,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 3,
"display_name": "MongoDB",
"id": 91,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "mongodb",
"sub_category_id": 27,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "NoSQL Databases",
"id": 19,
"rationale": "Models and manages data using non-relational database systems.",
"slug": "nosql-databases",
"source": "db"
},
"input_skill": "MongoDB",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
}
]
}
],
"input_skill": "MongoDB",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Flume",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "TOOL",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "flume",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Kafka",
"alias_type": "CANONICAL",
"id": 173,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 3,
"display_name": "Kafka",
"id": 36,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "kafka",
"sub_category_id": 3533,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Asynchronous Messaging and Event Streaming",
"id": 297,
"rationale": "Asynchronous communication patterns and broker technologies used to decouple backend services and move work off the request path. Includes queues, pub/sub, event streams, consumer groups, dead-letter queues, and delivery semantics across systems such as Kafka, RabbitMQ, NATS, SQS/SNS, Pulsar, and ActiveMQ.",
"slug": "asynchronous-messaging-and-event-streaming",
"source": "db"
},
"input_skill": "Kafka",
"llm_role": null,
"roles_from_db": [
{
"display_name": ".NET Backend Developer",
"id": 83,
"rationale": null,
"role_archetype": "Engineering",
"slug": "dotnet-backend-developer",
"source": "db"
},
{
"display_name": "Go Backend Developer",
"id": 81,
"rationale": null,
"role_archetype": "Engineering",
"slug": "go-backend-developer",
"source": "db"
},
{
"display_name": "Kotlin Backend Developer",
"id": 84,
"rationale": null,
"role_archetype": "Engineering",
"slug": "kotlin-server-backend-developer",
"source": "db"
},
{
"display_name": "Node.js Backend Developer",
"id": 82,
"rationale": null,
"role_archetype": "Engineering",
"slug": "node-backend-developer",
"source": "db"
},
{
"display_name": "Scala Backend Developer",
"id": 87,
"rationale": null,
"role_archetype": "Engineering",
"slug": "scala-backend-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Background Jobs",
"id": 291,
"rationale": "Asynchronous processing patterns and worker systems used to decouple backend work from request handling. This is a coherent cluster because the role supports background jobs, retries, and deferred processing.",
"slug": "messaging-and-background-jobs",
"source": "db"
},
"input_skill": "Kafka",
"llm_role": null,
"roles_from_db": [
{
"display_name": "PHP Backend Developer",
"id": 86,
"rationale": null,
"role_archetype": "Engineering",
"slug": "php-backend-developer",
"source": "db"
},
{
"display_name": "Python Backend Developer",
"id": 80,
"rationale": null,
"role_archetype": "Engineering",
"slug": "python-backend-developer",
"source": "db"
},
{
"display_name": "Ruby Backend Developer",
"id": 85,
"rationale": null,
"role_archetype": "Engineering",
"slug": "ruby-backend-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Event Streaming",
"id": 8,
"rationale": "Transport-layer systems used to move events and decouple producers from consumers. Data engineers use these systems to ingest, buffer, and distribute event data before downstream processing.",
"slug": "messaging-and-event-streaming",
"source": "db"
},
"input_skill": "Kafka",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Kafka",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "RabbitMQ",
"alias_type": "CANONICAL",
"id": 174,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 13,
"display_name": "RabbitMQ",
"id": 37,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "rabbitmq",
"sub_category_id": 1880,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Asynchronous Messaging and Event Streaming",
"id": 297,
"rationale": "Asynchronous communication patterns and broker technologies used to decouple backend services and move work off the request path. Includes queues, pub/sub, event streams, consumer groups, dead-letter queues, and delivery semantics across systems such as Kafka, RabbitMQ, NATS, SQS/SNS, Pulsar, and ActiveMQ.",
"slug": "asynchronous-messaging-and-event-streaming",
"source": "db"
},
"input_skill": "RabbitMQ",
"llm_role": null,
"roles_from_db": [
{
"display_name": ".NET Backend Developer",
"id": 83,
"rationale": null,
"role_archetype": "Engineering",
"slug": "dotnet-backend-developer",
"source": "db"
},
{
"display_name": "Go Backend Developer",
"id": 81,
"rationale": null,
"role_archetype": "Engineering",
"slug": "go-backend-developer",
"source": "db"
},
{
"display_name": "Kotlin Backend Developer",
"id": 84,
"rationale": null,
"role_archetype": "Engineering",
"slug": "kotlin-server-backend-developer",
"source": "db"
},
{
"display_name": "Node.js Backend Developer",
"id": 82,
"rationale": null,
"role_archetype": "Engineering",
"slug": "node-backend-developer",
"source": "db"
},
{
"display_name": "Scala Backend Developer",
"id": 87,
"rationale": null,
"role_archetype": "Engineering",
"slug": "scala-backend-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Background Jobs",
"id": 291,
"rationale": "Asynchronous processing patterns and worker systems used to decouple backend work from request handling. This is a coherent cluster because the role supports background jobs, retries, and deferred processing.",
"slug": "messaging-and-background-jobs",
"source": "db"
},
"input_skill": "RabbitMQ",
"llm_role": null,
"roles_from_db": [
{
"display_name": "PHP Backend Developer",
"id": 86,
"rationale": null,
"role_archetype": "Engineering",
"slug": "php-backend-developer",
"source": "db"
},
{
"display_name": "Python Backend Developer",
"id": 80,
"rationale": null,
"role_archetype": "Engineering",
"slug": "python-backend-developer",
"source": "db"
},
{
"display_name": "Ruby Backend Developer",
"id": 85,
"rationale": null,
"role_archetype": "Engineering",
"slug": "ruby-backend-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Event Streaming",
"id": 8,
"rationale": "Transport-layer systems used to move events and decouple producers from consumers. Data engineers use these systems to ingest, buffer, and distribute event data before downstream processing.",
"slug": "messaging-and-event-streaming",
"source": "db"
},
"input_skill": "RabbitMQ",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "RabbitMQ",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Lambda Architecture",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "CONCEPT",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "lambda-architecture",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
}
],
"unmatched_skills": [
"Big Data",
"MapReduce",
"Sqoop",
"ETL",
"Streaming",
"Storm",
"Pig",
"Athena",
"Glue",
"EMR",
"Flume",
"Lambda Architecture"
]
}
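Reading note on the API 2 response above: in the entries visible here, every skill listed under `unmatched_skills` is one whose entry has `"canonical": null` and `"source_tag": "llm"`. The sketch below reproduces that relationship on a trimmed sample. It is an inference from this payload, not the pipeline's confirmed logic. The function name `unmatched_skills` and the `sample` list are hypothetical, and the sample keeps only the fields needed for the illustration.

```python
from typing import Any


def unmatched_skills(entries: list[dict[str, Any]]) -> list[str]:
    """Return input skills that have no canonical library match, in input order.

    Assumption: an entry counts as unmatched when its "canonical" field is null,
    which is what the payload above suggests.
    """
    return [e["input_skill"] for e in entries if e.get("canonical") is None]


# Trimmed-down entries mirroring the shape of the response above (hypothetical sample).
sample = [
    {"input_skill": "Hadoop", "canonical": {"slug": "hadoop"}, "source_tag": "db"},
    {"input_skill": "MapReduce", "canonical": None, "source_tag": "llm"},
    {"input_skill": "Hive", "canonical": {"slug": "hive"}, "source_tag": "db"},
    {"input_skill": "Sqoop", "canonical": None, "source_tag": "llm"},
]

print(unmatched_skills(sample))
```

Running the same filter over the full response is a quick way to check that `unmatched_skills` and the per-skill entries agree when reviewing a run.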
API 3 — final-role-output
{
"chosen_role": {
"display_name": "Data Engineer",
"id": 2,
"rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.30 does not contradict",
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
"chosen_role_resolution": "in_db",
"final_input_skills": [
{
"skill": "Big Data",
"tag": "new"
},
{
"skill": "Hadoop",
"tag": "in_db"
},
{
"skill": "MapReduce",
"tag": "new"
},
{
"skill": "Hive",
"tag": "in_db"
},
{
"skill": "Sqoop",
"tag": "new"
},
{
"skill": "Spark",
"tag": "in_db"
},
{
"skill": "ETL",
"tag": "new"
},
{
"skill": "Streaming",
"tag": "new"
},
{
"skill": "Storm",
"tag": "new"
},
{
"skill": "Spark Streaming",
"tag": "in_db"
},
{
"skill": "Kafka Streams",
"tag": "in_db"
},
{
"skill": "Pig",
"tag": "new"
},
{
"skill": "Athena",
"tag": "new"
},
{
"skill": "Glue",
"tag": "new"
},
{
"skill": "Snowflake",
"tag": "in_db"
},
{
"skill": "EMR",
"tag": "new"
},
{
"skill": "HBase",
"tag": "in_db"
},
{
"skill": "Cassandra",
"tag": "in_db"
},
{
"skill": "MongoDB",
"tag": "in_db"
},
{
"skill": "Flume",
"tag": "new"
},
{
"skill": "Kafka",
"tag": "in_db"
},
{
"skill": "RabbitMQ",
"tag": "in_db"
},
{
"skill": "Lambda Architecture",
"tag": "new"
}
],
"llm_cost_api1_usd": null,
"llm_cost_api2_usd": null,
"llm_cost_api3_usd": null,
"llm_cost_total_usd": null,
"persistence": {
"items": [
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"dimension_id": 24,
"input_skill": "Hadoop",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1351,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Local Persistence and Offline Behavior",
"id": 85,
"rationale": "On-device storage used for caching, offline support, and durable client state. This cluster is coherent because iOS apps often need to preserve user progress and data when connectivity is limited.",
"slug": "local-persistence-and-offline-behavior",
"source": "db"
},
"dimension_id": 85,
"input_skill": "Hive",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Android Developer",
"id": 4,
"rationale": null,
"role_archetype": null,
"slug": "android-engineer",
"source": "db"
},
{
"display_name": "Flutter Developer",
"id": 74,
"rationale": null,
"role_archetype": "Engineering",
"slug": "flutter-developer",
"source": "db"
},
{
"display_name": "Hybrid Mobile Developer",
"id": 11,
"rationale": null,
"role_archetype": null,
"slug": "hybrid-mobile-developer",
"source": "db"
},
{
"display_name": "Native Mobile Developer",
"id": 75,
"rationale": null,
"role_archetype": "Engineering",
"slug": "native-mobile-developer",
"source": "db"
},
{
"display_name": "React Native Developer",
"id": 73,
"rationale": null,
"role_archetype": "Engineering",
"slug": "react-native-developer",
"source": "db"
},
{
"display_name": "iOS Developer",
"id": 6,
"rationale": null,
"role_archetype": null,
"slug": "ios-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 2754,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"dimension_id": 24,
"input_skill": "Spark",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1350,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Stream Processing Systems",
"id": 25,
"rationale": "Technologies for processing event streams and near-real-time data flows. This includes stream transformations, windowing, stateful processing, and stream-to-warehouse delivery patterns.",
"slug": "stream-processing-systems",
"source": "db"
},
"dimension_id": 25,
"input_skill": "Spark Streaming",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 121,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Stream Processing Systems",
"id": 25,
"rationale": "Technologies for processing event streams and near-real-time data flows. This includes stream transformations, windowing, stateful processing, and stream-to-warehouse delivery patterns.",
"slug": "stream-processing-systems",
"source": "db"
},
"dimension_id": 25,
"input_skill": "Kafka Streams",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 122,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Data Warehouses",
"id": 22,
"rationale": "Managed analytical storage and compute platforms used for curated datasets, reporting, and downstream analytics. These systems are central to data modeling, performance tuning, and cost-aware query design.",
"slug": "cloud-data-warehouses",
"source": "db"
},
"dimension_id": 22,
"input_skill": "Snowflake",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 105,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"dimension_id": 144,
"input_skill": "HBase",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1352,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"dimension_id": 144,
"input_skill": "Cassandra",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1354,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "NoSQL Databases",
"id": 19,
"rationale": "Models and manages data using non-relational database systems.",
"slug": "nosql-databases",
"source": "db"
},
"dimension_id": 19,
"input_skill": "MongoDB",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 91,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Asynchronous Messaging and Event Streaming",
"id": 297,
"rationale": "Asynchronous communication patterns and broker technologies used to decouple backend services and move work off the request path. Includes queues, pub/sub, event streams, consumer groups, dead-letter queues, and delivery semantics across systems such as Kafka, RabbitMQ, NATS, SQS/SNS, Pulsar, and ActiveMQ.",
"slug": "asynchronous-messaging-and-event-streaming",
"source": "db"
},
"dimension_id": 297,
"input_skill": "Kafka",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": ".NET Backend Developer",
"id": 83,
"rationale": null,
"role_archetype": "Engineering",
"slug": "dotnet-backend-developer",
"source": "db"
},
{
"display_name": "Go Backend Developer",
"id": 81,
"rationale": null,
"role_archetype": "Engineering",
"slug": "go-backend-developer",
"source": "db"
},
{
"display_name": "Kotlin Backend Developer",
"id": 84,
"rationale": null,
"role_archetype": "Engineering",
"slug": "kotlin-server-backend-developer",
"source": "db"
},
{
"display_name": "Node.js Backend Developer",
"id": 82,
"rationale": null,
"role_archetype": "Engineering",
"slug": "node-backend-developer",
"source": "db"
},
{
"display_name": "Scala Backend Developer",
"id": 87,
"rationale": null,
"role_archetype": "Engineering",
"slug": "scala-backend-developer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 36,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Background Jobs",
"id": 291,
"rationale": "Asynchronous processing patterns and worker systems used to decouple backend work from request handling. This is a coherent cluster because the role supports background jobs, retries, and deferred processing.",
"slug": "messaging-and-background-jobs",
"source": "db"
},
"dimension_id": 291,
"input_skill": "Kafka",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "PHP Backend Developer",
"id": 86,
"rationale": null,
"role_archetype": "Engineering",
"slug": "php-backend-developer",
"source": "db"
},
{
"display_name": "Python Backend Developer",
"id": 80,
"rationale": null,
"role_archetype": "Engineering",
"slug": "python-backend-developer",
"source": "db"
},
{
"display_name": "Ruby Backend Developer",
"id": 85,
"rationale": null,
"role_archetype": "Engineering",
"slug": "ruby-backend-developer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 36,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Event Streaming",
"id": 8,
"rationale": "Transport-layer systems used to move events and decouple producers from consumers. Data engineers use these systems to ingest, buffer, and distribute event data before downstream processing.",
"slug": "messaging-and-event-streaming",
"source": "db"
},
"dimension_id": 8,
"input_skill": "Kafka",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 36,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Asynchronous Messaging and Event Streaming",
"id": 297,
"rationale": "Asynchronous communication patterns and broker technologies used to decouple backend services and move work off the request path. Includes queues, pub/sub, event streams, consumer groups, dead-letter queues, and delivery semantics across systems such as Kafka, RabbitMQ, NATS, SQS/SNS, Pulsar, and ActiveMQ.",
"slug": "asynchronous-messaging-and-event-streaming",
"source": "db"
},
"dimension_id": 297,
"input_skill": "RabbitMQ",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": ".NET Backend Developer",
"id": 83,
"rationale": null,
"role_archetype": "Engineering",
"slug": "dotnet-backend-developer",
"source": "db"
},
{
"display_name": "Go Backend Developer",
"id": 81,
"rationale": null,
"role_archetype": "Engineering",
"slug": "go-backend-developer",
"source": "db"
},
{
"display_name": "Kotlin Backend Developer",
"id": 84,
"rationale": null,
"role_archetype": "Engineering",
"slug": "kotlin-server-backend-developer",
"source": "db"
},
{
"display_name": "Node.js Backend Developer",
"id": 82,
"rationale": null,
"role_archetype": "Engineering",
"slug": "node-backend-developer",
"source": "db"
},
{
"display_name": "Scala Backend Developer",
"id": 87,
"rationale": null,
"role_archetype": "Engineering",
"slug": "scala-backend-developer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 37,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Background Jobs",
"id": 291,
"rationale": "Asynchronous processing patterns and worker systems used to decouple backend work from request handling. This is a coherent cluster because the role supports background jobs, retries, and deferred processing.",
"slug": "messaging-and-background-jobs",
"source": "db"
},
"dimension_id": 291,
"input_skill": "RabbitMQ",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "PHP Backend Developer",
"id": 86,
"rationale": null,
"role_archetype": "Engineering",
"slug": "php-backend-developer",
"source": "db"
},
{
"display_name": "Python Backend Developer",
"id": 80,
"rationale": null,
"role_archetype": "Engineering",
"slug": "python-backend-developer",
"source": "db"
},
{
"display_name": "Ruby Backend Developer",
"id": 85,
"rationale": null,
"role_archetype": "Engineering",
"slug": "ruby-backend-developer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 37,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Messaging and Event Streaming",
"id": 8,
"rationale": "Transport-layer systems used to move events and decouple producers from consumers. Data engineers use these systems to ingest, buffer, and distribute event data before downstream processing.",
"slug": "messaging-and-event-streaming",
"source": "db"
},
"dimension_id": 8,
"input_skill": "RabbitMQ",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Backend Developer",
"id": 1,
"rationale": null,
"role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
"slug": "backend-engineer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 37,
"skill_tag": "in_db",
"skipped_reason": null
}
],
"new_skills_created": 0,
"role_dimension_saved": 0,
"skill_dimension_saved": 0,
"skipped": 0
},
"planner_output": null,
"run_id": "de82ff01-d921-48f8-bf90-5ded046d9f4d"
}