Pipeline run
c4d0c034-56f3-41e7-a91c-05f8d00fc94b
Client output enrichment
v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA descriptionvocab breakdown (legacy)
Signals
Post-classification
Captured for admin review
1 POST /skills/extract-from-jd
2 POST /skills/extract-details
3 POST /skills/final-role-output
Data Engineer
CASE Aslug: data-engineer · id: 2 · source: db
Exact alias hit on data-engineer (1.0) — no other alias at this confidence; skill_top data-engineer 0.25 does not contradict
Resolution:
in_db
— role exists in library; skill↔dim and role↔dim links saved when applicable.
Job description
Job Description Title:Big Data Developer Must Have : Pyspark, Scala Good to Have: Agile • Ingest data from disparate sources and Develop ETL jobs using Above Skills. • Do impact analysis and come up with estimates. • Take responsibility for end-to-end deliverable. Skills: scala,pyspark,agile,big data
Skills from this JD
Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
Aliases — catalog
- Apache Spark (CANONICAL)
- apache spark 3 (VERSION)
- spark (VERSION)
- spark 3 (VERSION)
- spark 3.x (VERSION)
- spark3 (VERSION)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Framework
- Sub-category
- Distributed Data Processing Framework
- Vendor
- Apache Software Foundation
- License
- apache_2
- Year introduced
- 2010
- Confidence
- 0.94
- Version strategy
- SEPARATE_ENTITY
- Version tag
- 3.x
Maturity reasoning: Apache Spark appears in many data engineering JDs and remains a standard for distributed ETL/ELT; its GitHub and vendor ecosystem activity stay strong, with Databricks and cloud platforms still promoting it.
Skill profile (library / DB)
- Skill nature
- FRAMEWORK
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 5
- Sub-category id
- 1021
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
-
ETL and ELT Tooling Catalog dimension db id 24
Library dimension (catalog)
Roles linked in library: Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
|
ETL and ELT Tooling
etl-and-elt-tooling
|
— | — |
Skipped — no persistable v3 meta for new skill
skill_not_in_db_v3_proposed
|
Aliases — catalog
- Scala (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Language
- Sub-category
- Programming Language
- Vendor
- EPFL
- License
- apache_2
- Year introduced
- 2004
- Confidence
- 0.99
- Version strategy
- NOT_APPLICABLE
Maturity reasoning: Scala still appears in many backend/data engineering JDs, especially with Spark and Akka, and remains supported by major JVM ecosystems; it’s not a sunset technology.
Skill profile (library / DB)
- Skill nature
- LANGUAGE
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 6
- Sub-category id
- 96
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
-
Programming Languages for Data Work Catalog dimension db id 21
Library dimension (catalog)
Roles linked in library: Data Engineer
-
Programming Languages for ML Systems Catalog dimension db id 39
Library dimension (catalog)
Roles linked in library: ML Engineer, MLOps Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
|
Programming Languages for Data Work
programming-languages-for-data-work
|
✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
|
Programming Languages for ML Systems
programming-languages-for-ml-systems
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- Agile (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Methodology
- Sub-category
- Agile
- Confidence
- 0.99
- Version strategy
- NOT_APPLICABLE
Maturity reasoning: Agile appears in a large share of software job descriptions and is a standard hiring-pipeline requirement; Scrum/Kanban are commonly listed alongside it, showing broad market adoption.
Skill profile (library / DB)
- Skill nature
- METHODOLOGY
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 8
- Sub-category id
- 367
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
-
React Frontend Development Catalog dimension db id 96
Library dimension (catalog)
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
|
React Frontend Development
d_init_01
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- CONCEPT
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
All API 3 persistence rows
Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.
| Skill | Tag | Dimension | Skill↔dim | Role↔dim | Outcome | Notes |
|---|---|---|---|---|---|---|
| PySpark | new |
ETL and ELT Tooling
etl-and-elt-tooling
|
— | — | Skipped — no persistable v3 meta for new skill | skill_not_in_db_v3_proposed |
| Scala | in_db |
Programming Languages for Data Work
programming-languages-for-data-work
|
✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Scala | in_db |
Programming Languages for ML Systems
programming-languages-for-ml-systems
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Agile | in_db |
React Frontend Development
d_init_01
|
✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Library artifacts (this run)
| Kind | Detail | DB id |
|---|---|---|
| canonical_skill_proposed | Big Data | type=Data Engineering Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR | |
| dimension_skill_link_proposed | PySpark ↔ ETL and ELT Tooling | |
| role_dimension_link_proposed | Data Engineer ↔ ETL and ELT Tooling |
nano JD Parser — gpt-4.1-nano click to toggle
Show raw JSON
{
"JD_type": "pass",
"about_company": null,
"certifications": [],
"company_name": null,
"ctc": null,
"domain": {
"primary": {
"aliases": [],
"domain": "Other"
},
"secondary": null
},
"education": [],
"experience": {
"max": null,
"min": null,
"raw": null
},
"job_locations": [],
"role": "Big Data Developer",
"role_aliases": [
"Big Data Engineer",
"Data Engineer"
],
"role_archetype": "Engineering",
"roles_and_responsibilities": [
{
"bullet_count": 0,
"heading": "Must Have",
"heading_was_present": true,
"source_marker": {
"first_5_words": "Must Have : Pyspark, Scala",
"last_5_words": "Pyspark, Scala"
},
"text": "Pyspark, Scala",
"word_count": 2
},
{
"bullet_count": 0,
"heading": "Good to Have",
"heading_was_present": true,
"source_marker": {
"first_5_words": "Good to Have:",
"last_5_words": "Agile"
},
"text": "Agile",
"word_count": 1
},
{
"bullet_count": 3,
"heading": "Responsibilities",
"heading_was_present": false,
"source_marker": {
"first_5_words": "\u2022 Ingest data from disparate",
"last_5_words": "end-to-end deliverable."
},
"text": "\u2022 Ingest data from disparate sources and Develop ETL jobs using Above Skills.\n\u2022 Do impact analysis and come up with estimates.\n\u2022 Take responsibility for end-to-end deliverable.",
"word_count": 33
},
{
"bullet_count": 0,
"heading": "Skills",
"heading_was_present": true,
"source_marker": {
"first_5_words": "Skills: scala,pyspark,agile,big",
"last_5_words": "scala,pyspark,agile,big data"
},
"text": "scala,pyspark,agile,big data",
"word_count": 4
}
],
"urls": []
}
API 1 — extract-from-jd click to toggle
{
"final_skills": [
{
"is_primary": true,
"skill_name": "PySpark"
},
{
"is_primary": true,
"skill_name": "Scala"
},
{
"is_primary": true,
"skill_name": "Agile"
},
{
"is_primary": true,
"skill_name": "Big Data"
}
],
"jd_role": {
"display_name": "Big Data Developer",
"rationale": null,
"role_aliases": [
"Big Data Engineer",
"Data Engineer"
],
"role_archetype": "Engineering",
"slug": ""
},
"nano_parsed": {
"JD_type": "pass",
"about_company": null,
"certifications": [],
"company_name": null,
"ctc": null,
"domain": {
"primary": {
"aliases": [],
"domain": "Other"
},
"secondary": null
},
"education": [],
"experience": {
"max": null,
"min": null,
"raw": null
},
"job_locations": [],
"role": "Big Data Developer",
"role_aliases": [
"Big Data Engineer",
"Data Engineer"
],
"role_archetype": "Engineering",
"roles_and_responsibilities": [
{
"bullet_count": 0,
"heading": "Must Have",
"heading_was_present": true,
"source_marker": {
"first_5_words": "Must Have : Pyspark, Scala",
"last_5_words": "Pyspark, Scala"
},
"text": "Pyspark, Scala",
"word_count": 2
},
{
"bullet_count": 0,
"heading": "Good to Have",
"heading_was_present": true,
"source_marker": {
"first_5_words": "Good to Have:",
"last_5_words": "Agile"
},
"text": "Agile",
"word_count": 1
},
{
"bullet_count": 3,
"heading": "Responsibilities",
"heading_was_present": false,
"source_marker": {
"first_5_words": "\u2022 Ingest data from disparate",
"last_5_words": "end-to-end deliverable."
},
"text": "\u2022 Ingest data from disparate sources and Develop ETL jobs using Above Skills.\n\u2022 Do impact analysis and come up with estimates.\n\u2022 Take responsibility for end-to-end deliverable.",
"word_count": 33
},
{
"bullet_count": 0,
"heading": "Skills",
"heading_was_present": true,
"source_marker": {
"first_5_words": "Skills: scala,pyspark,agile,big",
"last_5_words": "scala,pyspark,agile,big data"
},
"text": "scala,pyspark,agile,big data",
"word_count": 4
}
],
"urls": []
},
"rejected": false,
"rejection_reason": null,
"run_id": "c4d0c034-56f3-41e7-a91c-05f8d00fc94b",
"stage3_signals": {
"alias_found": true,
"alias_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 1.0,
"slug": "data-engineer",
"total_count": null
}
],
"kra_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": [
{
"kra_text": "Builds data ingestion pipelines to collect data from transactional databases, third-party APIs, event streams, and file sources into centralized data platforms.",
"sentence": "Ingest data from disparate sources and Develop ETL jobs using Above Skills.",
"similarity": 0.6053
},
{
"kra_text": "Works with data analysts, data scientists, and business stakeholders to define data models, ingestion schedules, and data delivery requirements.",
"sentence": "Do impact analysis and come up with estimates.",
"similarity": 0.3341
},
{
"kra_text": "Works with data analysts, data scientists, and business stakeholders to define data models, ingestion schedules, and data delivery requirements.",
"sentence": "Take responsibility for end-to-end deliverable.",
"similarity": 0.288
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 0.4091,
"slug": "data-engineer",
"total_count": null
},
{
"display_name": "Flutter Developer",
"kra_matches": [
{
"kra_text": "integrate external APIs and data sources",
"sentence": "Ingest data from disparate sources and Develop ETL jobs using Above Skills.",
"similarity": 0.4563
},
{
"kra_text": "collaborate with design, product, and backend teams",
"sentence": "Take responsibility for end-to-end deliverable.",
"similarity": 0.4092
},
{
"kra_text": "translate product and design requirements",
"sentence": "Do impact analysis and come up with estimates.",
"similarity": 0.3431
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 74,
"score": 0.4029,
"slug": "flutter-developer",
"total_count": null
},
{
"display_name": "Pega Developer",
"kra_matches": [
{
"kra_text": "Requirements analysis and process translation",
"sentence": "Do impact analysis and come up with estimates.",
"similarity": 0.4317
},
{
"kra_text": "defect troubleshooting and resolution",
"sentence": "Take responsibility for end-to-end deliverable.",
"similarity": 0.3534
},
{
"kra_text": "Requirements analysis and process translation",
"sentence": "Ingest data from disparate sources and Develop ETL jobs using Above Skills.",
"similarity": 0.34
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 24,
"score": 0.375,
"slug": "pega-developer",
"total_count": null
},
{
"display_name": "Cyber Security Engineer",
"kra_matches": [
{
"kra_text": "Performs threat modeling, security architecture reviews, and quantitative risk analysis for new product features and infrastructure changes.",
"sentence": "Do impact analysis and come up with estimates.",
"similarity": 0.4527
},
{
"kra_text": "Builds SIEM detection rules, correlation queries, and alerts to monitor for threat indicators and suspicious activity across systems.",
"sentence": "Ingest data from disparate sources and Develop ETL jobs using Above Skills.",
"similarity": 0.3436
},
{
"kra_text": "Defines secure engineering standards, secure coding guidelines, threat intelligence feeds, and compliance requirements for the organization.",
"sentence": "Take responsibility for end-to-end deliverable.",
"similarity": 0.3271
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 5,
"score": 0.3745,
"slug": "cybersecurity-engineer",
"total_count": null
},
{
"display_name": "MLOps Engineer",
"kra_matches": [
{
"kra_text": "Manages the end-to-end ML model release lifecycle from training job completion through validation gates to production deployment approval.",
"sentence": "Take responsibility for end-to-end deliverable.",
"similarity": 0.3997
},
{
"kra_text": "Automates ML platform operations including scheduled retraining triggers, pipeline orchestration, evaluation workflows, and alerting configuration.",
"sentence": "Ingest data from disparate sources and Develop ETL jobs using Above Skills.",
"similarity": 0.3729
},
{
"kra_text": "Defines and executes model rollback procedures including traffic shifting, shadow deployment cutover, and incident-triggered rollback automation.",
"sentence": "Do impact analysis and come up with estimates.",
"similarity": 0.3259
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 16,
"score": 0.3662,
"slug": "ml-ops-engineer",
"total_count": null
}
],
"skill_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": 1,
"matched_skills": [
"Scala"
],
"role_id": 2,
"score": 0.25,
"slug": "data-engineer",
"total_count": 4
},
{
"display_name": "ML Engineer",
"kra_matches": null,
"matched_count": 1,
"matched_skills": [
"Scala"
],
"role_id": 3,
"score": 0.25,
"slug": "ml-engineer",
"total_count": 4
},
{
"display_name": "MLOps Engineer",
"kra_matches": null,
"matched_count": 1,
"matched_skills": [
"Scala"
],
"role_id": 16,
"score": 0.25,
"slug": "ml-ops-engineer",
"total_count": 4
}
]
},
"stage4_decision": {
"alias_collision_detected": false,
"case": "A",
"chosen_role": {
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 1.0,
"slug": "data-engineer",
"total_count": null
},
"confidence": 1.0,
"is_new_role": false,
"llm2_fired": false,
"llm2_reasoning": null,
"matched_dimensions": [],
"matched_kras": [],
"matched_skills": [],
"new_role_display_name": null,
"new_role_slug": null,
"queued": false,
"reasoning": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.25 does not contradict",
"sub_role": null
},
"stage5_updates": {
"centroid_n_after": 203,
"centroid_updated": true,
"collision_log_id": null,
"new_kra_attached": null,
"new_skills_attached": [
{
"is_primary": true,
"queue_id": 10635,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "PySpark",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 10636,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Big Data",
"status": "pending"
}
],
"queue_entry_id": null,
"v3_pipeline_triggered": false,
"v3_role_slug": null,
"v3_run_id": null
}
}
API 2 — extract-details
{
"alias_matches": [
{
"alias_persist_skipped_reason": "TODO: REMOVE AFTER TESTING \u2014 alias DB write disabled",
"alias_persisted": false,
"existing_alias_id": 2004,
"existing_alias_text": "Apache Spark",
"input_term": "PySpark",
"matched_canonical": {
"category_id": 5,
"display_name": "Apache Spark",
"id": 1350,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "FRAMEWORK",
"slug": "apache-spark",
"sub_category_id": 1021,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "embedding_alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 272,
"existing_alias_text": "Scala",
"input_term": "Scala",
"matched_canonical": {
"category_id": 6,
"display_name": "Scala",
"id": 102,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LANGUAGE",
"slug": "scala",
"sub_category_id": 96,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 868,
"existing_alias_text": "Agile",
"input_term": "Agile",
"matched_canonical": {
"category_id": 8,
"display_name": "Agile",
"id": 520,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "METHODOLOGY",
"slug": "agile",
"sub_category_id": 367,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
}
],
"candidate_roles": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
}
],
"chosen_role": {
"display_name": "Data Engineer",
"id": 2,
"rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.25 does not contradict",
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "PySpark",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 21,
"rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"input_skill": "Scala",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for ML Systems",
"id": 39,
"rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
"slug": "programming-languages-for-ml-systems",
"source": "db"
},
"input_skill": "Scala",
"llm_role": null,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Agile",
"llm_role": null,
"roles_from_db": []
}
],
"input_final_skills": [
"PySpark",
"Scala",
"Agile",
"Big Data"
],
"input_llm_skills": [
"PySpark",
"Scala",
"Agile",
"Big Data"
],
"new_aliases_persisted": 0,
"run_id": "c4d0c034-56f3-41e7-a91c-05f8d00fc94b",
"skills_detail": [
{
"aliases_in_db": [
{
"alias_text": "Apache Spark",
"alias_type": "CANONICAL",
"id": 2004,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "apache spark 3",
"alias_type": "VERSION",
"id": 2006,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark",
"alias_type": "VERSION",
"id": 2510,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark 3",
"alias_type": "VERSION",
"id": 2007,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark 3.x",
"alias_type": "VERSION",
"id": 2009,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark3",
"alias_type": "VERSION",
"id": 2008,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 5,
"display_name": "Apache Spark",
"id": 1350,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "FRAMEWORK",
"slug": "apache-spark",
"sub_category_id": 1021,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "PySpark",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "PySpark",
"matched_via": "embedding_alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Scala",
"alias_type": "CANONICAL",
"id": 272,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 6,
"display_name": "Scala",
"id": 102,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LANGUAGE",
"slug": "scala",
"sub_category_id": 96,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 21,
"rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"input_skill": "Scala",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for ML Systems",
"id": 39,
"rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
"slug": "programming-languages-for-ml-systems",
"source": "db"
},
"input_skill": "Scala",
"llm_role": null,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
}
]
}
],
"input_skill": "Scala",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Agile",
"alias_type": "CANONICAL",
"id": 868,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 8,
"display_name": "Agile",
"id": 520,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "METHODOLOGY",
"slug": "agile",
"sub_category_id": 367,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Agile",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "Agile",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Big Data",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "CONCEPT",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "big-data",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
}
],
"unmatched_skills": [
"Big Data"
]
}
API 3 — final-role-output
{
"chosen_role": {
"display_name": "Data Engineer",
"id": 2,
"rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.25 does not contradict",
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
"chosen_role_resolution": "in_db",
"final_input_skills": [
{
"skill": "PySpark",
"tag": "in_db"
},
{
"skill": "Scala",
"tag": "in_db"
},
{
"skill": "Agile",
"tag": "in_db"
},
{
"skill": "Big Data",
"tag": "new"
}
],
"llm_cost_api1_usd": null,
"llm_cost_api2_usd": null,
"llm_cost_api3_usd": null,
"llm_cost_total_usd": null,
"persistence": {
"items": [
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"dimension_id": 24,
"input_skill": "PySpark",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Skipped \u2014 no persistable v3 meta for new skill",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": false,
"skill_id": null,
"skill_tag": "new",
"skipped_reason": "skill_not_in_db_v3_proposed"
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 21,
"rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"dimension_id": 21,
"input_skill": "Scala",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 102,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for ML Systems",
"id": 39,
"rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
"slug": "programming-languages-for-ml-systems",
"source": "db"
},
"dimension_id": 39,
"input_skill": "Scala",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 102,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"dimension_id": 96,
"input_skill": "Agile",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 520,
"skill_tag": "in_db",
"skipped_reason": null
}
],
"new_skills_created": 0,
"role_dimension_saved": 0,
"skill_dimension_saved": 0,
"skipped": 1
},
"planner_output": null,
"run_id": "c4d0c034-56f3-41e7-a91c-05f8d00fc94b"
}
LLM Calls
Every model call made for this run, in pipeline order. Click a card to see the model's response.