← Back to history

Pipeline run

9bd37876-38fa-4481-9ba5-d79d3dc251b3

Pipeline LLM cost (USD)
API 1: $0.0034 API 2: $0.0003 API 3: $0.0000 Total: $0.0037

Client output enrichment

v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description
role baseline loaded sources · ai_index: jd · nature_of_work: jd · tech_stack_maturity: jd
Nature of work · Data transformation and modeling
Develop Scala/Spark data transformation jobs in Maven, using Hive/SQL and HDFS/Unix scripting to handle large-volume data, and integrate changes through CI/CD while meeting business design standards.
"Develop and maintain highly scalable, high performance Data transformation applications using Apache Spark framework"
Tech stack maturity
Mainstream Legacy
The stack centers on Spark, Scala, Hive, and HBase with Maven and CI/CD, which is common in established data platforms but not strongly cloud-native or bleeding-edge.
AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)
0.00 / 5
· Title match
· Has AI skill
· AI skill (primary)
· AI skill (secondary)
· On AI team
· Builds AI products
vocab breakdown (legacy)
Assistants (×1):
Frameworks (×2):
Models / concepts (×3):
Evidence — skills matched in JD (13)
Scala Maven Apache Spark Unix Shell Scripting Unix HDFS Oracle SQL Hive SQL Spark SQL Impala HBase CI/CD Hive
Skill cluster (4 dimension groups, role-scoped)
Build and Dependency Management
Maven
ETL and ELT Tooling
Apache Spark
Programming Languages for Data Work
Scala
Cross-cutting / unaligned
Unix Shell Scripting Unix HDFS Oracle SQL Hive SQL Spark SQL Impala HBase CI/CD Hive
Show KRA description ↓
• Hands on development experience in programming languages such as SCALA using Maven, Apache Spark Frameworks and Unix Shell scripting • Should be comfortable with Unix File system as well as HDFS commands • Should have worked on query languages such as Oracle SQL, Hive SQL, Spark SQL, Impala, HBase DB • Should be flexible • Should have good communication and customer management skills • Design high quality deliverables adhering to business requirement with defined standards and design principles, patterns • Develop and maintain highly scalable, high performance Data transformation applications using Apache Spark framework • Develop/Integrate the code adhering to CI/CD, using Spark Framework in Scala • Provide solutions to Big data problems dealing with huge volumes of data using Spark based data transformation solutions , Hive

Signals

Skill devops-engineer
0.15
Alias data-engineer
1.00
KRA data-engineer
0.61

Post-classification

Centroidupdated · n=454
Alias collision log
New-role queue
New skills captured7
New KRA captured

Captured for admin review

Unix Shell Scripting primary Data Engineer pending
Unix primary Data Engineer pending
HDFS primary Data Engineer pending
Oracle SQL primary Data Engineer pending
Hive SQL primary Data Engineer pending
Spark SQL primary Data Engineer pending
Impala primary Data Engineer pending
Status: completed Created: 2026-05-27T16:38:04.082865Z Updated: 2026-05-27T16:39:31.064507Z API 3 duration: 17282 ms
Flow Current 3-step pipeline

1 POST /skills/extract-from-jd

2 POST /skills/extract-details

3 POST /skills/final-role-output

Role Chosen role & resolution

Data Engineer

CASE A

slug: data-engineer · id: 2 · source: db

Exact alias hit on data-engineer (1.0) — no other alias at this confidence; skill_top devops-engineer 0.15 does not contradict

Resolution: in_db — role exists in library; skill↔dim and role↔dim links saved when applicable.

0
New skills
0
Skill↔dim saved
0
Role↔dim saved
2
Skipped

Job description

Greetings from TCS !!


Role : BigData Developer
Experience Range : 3 to 8 years
Location : Chennai/ Hyderabad/ Bangalore/ Pune/ Kolkata
Skills : BigData, Pyspark, Scala, Hadoop, Hive


Must Have :
• Hands on development experience in programming languages such as SCALA using Maven, Apache Spark Frameworks and Unix Shell scripting 
• Should be comfortable with Unix File system as well as HDFS commands 
• Should have worked on query languages such as Oracle SQL, Hive SQL, Spark SQL, Impala, HBase DB 
• Should be flexible 
• Should have good communication and customer management skills


Roles and Responsibilities :
• Design high quality deliverables adhering to business requirement with defined standards and design principles, patterns
• Develop and maintain highly scalable, high performance Data transformation applications using Apache Spark framework
• Develop/Integrate the code adhering to CI/CD, using Spark Framework in Scala
• Provide solutions to Big data problems dealing with huge volumes of data using Spark based data transformation solutions , Hive

Skills from this JD

Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.

Scala Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Scala id=102 · scala

Aliases — catalog

  • Scala (CANONICAL) primary

Context tags (catalog)

Akka Apache Kafka Cats Flink JVM Monads Play Framework SBT ScalaTest Shapeless Spark Spark SQL ZIO case class for-comprehension functional programming implicit pattern matching typeclass

Stored enrichment (catalog DB)

Category
Language
Sub-category
Programming Language
Vendor
EPFL
License
apache_2
Year introduced
2004
Confidence
0.99
Version strategy
NOT_APPLICABLE

Maturity reasoning: Scala still appears in many backend/data engineering JDs, especially with Spark and Akka, and remains supported by major JVM ecosystems; it’s not a sunset technology.

Skill profile (library / DB)

Skill nature
LANGUAGE
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
6
Sub-category id
96
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Programming Languages for Data Work Catalog dimension db id 21

    Library dimension (catalog)

    Roles linked in library: Data Engineer

  • Programming Languages for ML Systems Catalog dimension db id 39

    Library dimension (catalog)

    Roles linked in library: ML Engineer, MLOps Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Programming Languages for Data Work
programming-languages-for-data-work
Existing dimension (library) · Role↔dimension saved
Programming Languages for ML Systems
programming-languages-for-ml-systems
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Maven Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Maven id=815 · maven

Aliases — catalog

  • Maven (CANONICAL) primary

Context tags (catalog)

CI/CD JAR Java Maven Central artifact build automation build lifecycle dependency management integration testing multi-module plugins pom.xml project object model release management repository site generation versioning

Stored enrichment (catalog DB)

Category
Tool
Sub-category
Build Tool
Vendor
Apache Software Foundation
License
apache_2
Year introduced
2004
Confidence
0.95
Version strategy
NOT_APPLICABLE

Maturity reasoning: Maven remains a standard Java build tool, appearing in many Java/JVM job descriptions and widely used in enterprise CI/CD pipelines alongside Gradle.

Skill profile (library / DB)

Skill nature
TOOL
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
13
Sub-category id
358
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Build and Dependency Management Catalog dimension db id 286

    Library dimension (catalog)

    Roles linked in library: Java Backend Developer

  • Build and Packaging Tooling Catalog dimension db id 149

    Library dimension (catalog)

    Roles linked in library: DevOps Engineer, Ionic Developer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Build and Dependency Management
build-and-dependency-management
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Build and Packaging Tooling
build-and-packaging-tooling
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Apache Spark Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Apache Spark id=1350 · apache-spark

Aliases — catalog

  • Apache Spark (CANONICAL)
  • apache spark 3 (VERSION)
  • spark (VERSION)
  • spark 3 (VERSION)
  • spark 3.x (VERSION)
  • spark3 (VERSION)

Context tags (catalog)

Apache Kafka Cluster Manager DAGScheduler Data Lake DataFrame ETL Hadoop MLlib Machine Learning PySpark RDD Scala Spark SQL Spark Streaming SparkSession

Stored enrichment (catalog DB)

Category
Framework
Sub-category
Distributed Data Processing Framework
Vendor
Apache Software Foundation
License
apache_2
Year introduced
2010
Confidence
0.94
Version strategy
SEPARATE_ENTITY
Version tag
3.x

Maturity reasoning: Apache Spark appears in many data engineering JDs and remains a standard for distributed ETL/ELT; its GitHub and vendor ecosystem activity stay strong, with Databricks and cloud platforms still promoting it.

Skill profile (library / DB)

Skill nature
FRAMEWORK
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
5
Sub-category id
1021
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • ETL and ELT Tooling Catalog dimension db id 24

    Library dimension (catalog)

    Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
ETL and ELT Tooling
etl-and-elt-tooling
Existing dimension (library) · Role↔dimension saved
Unix Shell Scripting Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Programming Languages
Sub-category
general
Skill nature
LANGUAGE
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
Unix Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Operating Systems
Sub-category
general
Skill nature
PLATFORM
Volatility
STABLE
Typical lifespan
EVERGREEN
Version strategy
UNVERSIONED
HDFS Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Data Engineering Tools
Sub-category
general
Skill nature
TOOL
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
Oracle SQL Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Oracle Database id=1566 · oracle-database

Aliases — catalog

  • Oracle Database (CANONICAL) primary

Context tags (catalog)

Backup and Recovery Data Guard Data Warehousing Indexing Oracle APEX Oracle Cloud Oracle Enterprise Manager Oracle Exadata Oracle Forms Oracle Fusion Middleware Oracle RAC Oracle SQL Oracle SQL*Plus PL/SQL Partitioning Performance Tuning RMAN SQL Developer

Stored enrichment (catalog DB)

Category
Datastore
Sub-category
Relational Database
Vendor
Oracle Corporation
License
proprietary
Year introduced
1979
Confidence
0.98
Version strategy
NOT_APPLICABLE

Maturity reasoning: Oracle Database appears in many enterprise job postings and remains a standard RDBMS in large regulated environments; Oracle continues active vendor support and releases, indicating broad market demand.

Skill profile (library / DB)

Skill nature
TOOL
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
3
Sub-category id
29
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • React Frontend Development Catalog dimension db id 96

    Library dimension (catalog)

  • Relational Database Design Catalog dimension db id 4

    Library dimension (catalog)

    Roles linked in library: .NET Backend Developer, Backend Developer, Kotlin Backend Developer, Node.js Backend Developer, Python Backend Developer, Ruby Backend Developer, Scala Backend Developer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
React Frontend Development
d_init_01
Skipped — no persistable v3 meta for new skill
skill_not_in_db_v3_proposed
Relational Database Design
relational-database-design
Skipped — no persistable v3 meta for new skill
skill_not_in_db_v3_proposed
Hive SQL Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Query Languages
Sub-category
general
Skill nature
LANGUAGE
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
Spark SQL Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Query Languages
Sub-category
general
Skill nature
LANGUAGE
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
Impala Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Data Engineering Tools
Sub-category
general
Skill nature
TOOL
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
HBase Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: HBase id=1352 · hbase

Aliases — catalog

  • HBase (CANONICAL)

Context tags (catalog)

Apache Bigtable Hadoop MapReduce NoSQL REST API Thrift column family data model data replication distributed real-time region server scalability table design

Stored enrichment (catalog DB)

Category
Datastore
Sub-category
Wide Column Store
Vendor
Apache Software Foundation
License
apache_2
Year introduced
2010
Confidence
0.98
Version strategy
NOT_APPLICABLE

Maturity reasoning: HBase appears in a limited set of big-data/legacy Hadoop job postings, while newer JDs more often specify DynamoDB, Bigtable, or Cassandra; its market demand is specialized rather than broad.

Skill profile (library / DB)

Skill nature
TOOL
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
3
Sub-category id
31
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Cloud Storage and Data Services Catalog dimension db id 144

    Library dimension (catalog)

    Roles linked in library: Cloud Architect

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Cloud Storage and Data Services
cloud-storage-and-data-services
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
CI/CD Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: CI/CD id=1190 · ci-cd

Aliases — catalog

  • CI/CD (CANONICAL)

Context tags (catalog)

Ansible CircleCI Docker GitLab CI Jenkins Kubernetes Terraform Travis CI automated testing build automation continuous deployment continuous integration deployment pipelines monitoring version control

Stored enrichment (catalog DB)

Category
Methodology
Sub-category
Ci Cd Process
Confidence
0.93
Version strategy
NOT_APPLICABLE

Maturity reasoning: CI/CD appears in a large share of software engineering JDs and is a standard requirement across DevOps, platform, and backend roles; major vendors like GitHub, GitLab, and AWS all center product roadmaps on CI/CD pipelines.

Skill profile (library / DB)

Skill nature
METHODOLOGY
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
8
Sub-category id
900
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • CI/CD Pipeline Platforms Catalog dimension db id 150

    Library dimension (catalog)

    Roles linked in library: DevOps Engineer

  • CI/CD for Machine Learning Catalog dimension db id 56

    Library dimension (catalog)

    Roles linked in library: ML Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
CI/CD Pipeline Platforms
ci-cd-pipeline-platforms
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
CI/CD for Machine Learning
ci-cd-for-machine-learning
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Hive Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Hive id=2754 · hive

Aliases — catalog

  • Hive (CANONICAL) primary

Context tags (catalog)

Apache Apache Hive Bucketing ETL HQL Hive Metastore Hive SerDe HiveQL MapReduce SQL SQL-on-Hadoop big data bucketing columnar storage data lakes data warehousing integration metadata partitioning schema evolution

Stored enrichment (catalog DB)

Category
Datastore
Sub-category
Local Key Value Store
Vendor
Apache Software Foundation
License
apache_2
Year introduced
2010
Confidence
0.90
Version strategy
NOT_APPLICABLE

Maturity reasoning: Hive appears in Flutter/mobile JDs and package docs, but JD volume is far below SQLite/Realm and it’s mainly used for local key-value storage in Flutter apps.

Skill profile (library / DB)

Skill nature
TOOL
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
3
Sub-category id
2242
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Local Persistence and Offline Behavior Catalog dimension db id 85

    Library dimension (catalog)

    Roles linked in library: Android Developer, Flutter Developer, Hybrid Mobile Developer, Native Mobile Developer, React Native Developer, iOS Developer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Local Persistence and Offline Behavior
local-persistence-and-offline-behavior
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

All API 3 persistence rows

Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.

Skill Tag Dimension Skill↔dim Role↔dim Outcome Notes
Scala in_db
Programming Languages for Data Work
programming-languages-for-data-work
Existing dimension (library) · Role↔dimension saved
Scala in_db
Programming Languages for ML Systems
programming-languages-for-ml-systems
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Maven in_db
Build and Dependency Management
build-and-dependency-management
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Maven in_db
Build and Packaging Tooling
build-and-packaging-tooling
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Apache Spark in_db
ETL and ELT Tooling
etl-and-elt-tooling
Existing dimension (library) · Role↔dimension saved
Oracle SQL new
React Frontend Development
d_init_01
Skipped — no persistable v3 meta for new skill skill_not_in_db_v3_proposed
Oracle SQL new
Relational Database Design
relational-database-design
Skipped — no persistable v3 meta for new skill skill_not_in_db_v3_proposed
HBase in_db
Cloud Storage and Data Services
cloud-storage-and-data-services
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
CI/CD in_db
CI/CD Pipeline Platforms
ci-cd-pipeline-platforms
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
CI/CD in_db
CI/CD for Machine Learning
ci-cd-for-machine-learning
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Hive in_db
Local Persistence and Offline Behavior
local-persistence-and-offline-behavior
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Library artifacts (this run)

Kind Detail DB id
canonical_skill_proposed Unix Shell Scripting | type=Programming Languages subtype=general nature=LANGUAGE lifespan=MULTI_YEAR
canonical_skill_proposed Unix | type=Operating Systems subtype=general nature=PLATFORM lifespan=EVERGREEN
canonical_skill_proposed HDFS | type=Data Engineering Tools subtype=general nature=TOOL lifespan=MULTI_YEAR
canonical_skill_proposed Hive SQL | type=Query Languages subtype=general nature=LANGUAGE lifespan=MULTI_YEAR
canonical_skill_proposed Spark SQL | type=Query Languages subtype=general nature=LANGUAGE lifespan=MULTI_YEAR
canonical_skill_proposed Impala | type=Data Engineering Tools subtype=general nature=TOOL lifespan=MULTI_YEAR
dimension_skill_link_proposed Oracle SQL ↔ React Frontend Development
dimension_skill_link_proposed Oracle SQL ↔ Relational Database Design
nano JD Parser — gpt-4.1-nano click to toggle
RoleBigData Developer
CompanyTCS
Experience3 to 8 years
DomainIT Services & Consulting
Location Chennai
JD type pass
Show raw JSON
{
  "JD_type": "pass",
  "about_company": null,
  "certifications": [],
  "company_name": "TCS",
  "ctc": null,
  "domain": {
    "primary": {
      "aliases": [
        "ITES",
        "BPO",
        "Tech Consulting"
      ],
      "domain": "IT Services \u0026 Consulting"
    },
    "secondary": null
  },
  "education": [],
  "experience": {
    "max": 8,
    "min": 3,
    "raw": "3 to 8 years"
  },
  "job_locations": [
    {
      "aliases": [],
      "city": "Chennai",
      "country": null,
      "state": null,
      "work_mode": null
    },
    {
      "aliases": [],
      "city": "Hyderabad",
      "country": null,
      "state": null,
      "work_mode": null
    },
    {
      "aliases": [
        "Bengaluru"
      ],
      "city": "Bangalore",
      "country": null,
      "state": null,
      "work_mode": null
    },
    {
      "aliases": [],
      "city": "Pune",
      "country": null,
      "state": null,
      "work_mode": null
    },
    {
      "aliases": [
        "Calcutta"
      ],
      "city": "Kolkata",
      "country": null,
      "state": null,
      "work_mode": null
    }
  ],
  "role": "BigData Developer",
  "role_aliases": [
    "Big Data Engineer",
    "Big Data Developer",
    "Data Engineer"
  ],
  "role_archetype": "Engineering",
  "roles_and_responsibilities": [
    {
      "bullet_count": 5,
      "heading": "Must Have",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "\u2022 Hands on development experience",
        "last_5_words": "communication and customer management skills"
      },
      "text": "\u2022 Hands on development experience in programming languages such as SCALA using Maven, Apache Spark Frameworks and Unix Shell scripting \n\u2022 Should be comfortable with Unix File system as well as HDFS commands \n\u2022 Should have worked on query languages such as Oracle SQL, Hive SQL, Spark SQL, Impala, HBase DB \n\u2022 Should be flexible \n\u2022 Should have good communication and customer management skills",
      "word_count": 51
    },
    {
      "bullet_count": 4,
      "heading": "Roles and Responsibilities",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "\u2022 Design high quality deliverables",
        "last_5_words": "data transformation solutions , Hive"
      },
      "text": "\u2022 Design high quality deliverables adhering to business requirement with defined standards and design principles, patterns\n\u2022 Develop and maintain highly scalable, high performance Data transformation applications using Apache Spark framework\n\u2022 Develop/Integrate the code adhering to CI/CD, using Spark Framework in Scala\n\u2022 Provide solutions to Big data problems dealing with huge volumes of data using Spark based data transformation solutions , Hive",
      "word_count": 64
    }
  ],
  "urls": []
}
API 1 — extract-from-jd click to toggle
{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "Scala"
    },
    {
      "is_primary": true,
      "skill_name": "Maven"
    },
    {
      "is_primary": true,
      "skill_name": "Apache Spark"
    },
    {
      "is_primary": true,
      "skill_name": "Unix Shell Scripting"
    },
    {
      "is_primary": true,
      "skill_name": "Unix"
    },
    {
      "is_primary": true,
      "skill_name": "HDFS"
    },
    {
      "is_primary": true,
      "skill_name": "Oracle SQL"
    },
    {
      "is_primary": true,
      "skill_name": "Hive SQL"
    },
    {
      "is_primary": true,
      "skill_name": "Spark SQL"
    },
    {
      "is_primary": true,
      "skill_name": "Impala"
    },
    {
      "is_primary": true,
      "skill_name": "HBase"
    },
    {
      "is_primary": true,
      "skill_name": "CI/CD"
    },
    {
      "is_primary": true,
      "skill_name": "Hive"
    }
  ],
  "jd_role": {
    "display_name": "BigData Developer",
    "rationale": null,
    "role_aliases": [
      "Big Data Engineer",
      "Big Data Developer",
      "Data Engineer"
    ],
    "role_archetype": "Engineering",
    "slug": ""
  },
  "nano_parsed": {
    "JD_type": "pass",
    "about_company": null,
    "certifications": [],
    "company_name": "TCS",
    "ctc": null,
    "domain": {
      "primary": {
        "aliases": [
          "ITES",
          "BPO",
          "Tech Consulting"
        ],
        "domain": "IT Services \u0026 Consulting"
      },
      "secondary": null
    },
    "education": [],
    "experience": {
      "max": 8,
      "min": 3,
      "raw": "3 to 8 years"
    },
    "job_locations": [
      {
        "aliases": [],
        "city": "Chennai",
        "country": null,
        "state": null,
        "work_mode": null
      },
      {
        "aliases": [],
        "city": "Hyderabad",
        "country": null,
        "state": null,
        "work_mode": null
      },
      {
        "aliases": [
          "Bengaluru"
        ],
        "city": "Bangalore",
        "country": null,
        "state": null,
        "work_mode": null
      },
      {
        "aliases": [],
        "city": "Pune",
        "country": null,
        "state": null,
        "work_mode": null
      },
      {
        "aliases": [
          "Calcutta"
        ],
        "city": "Kolkata",
        "country": null,
        "state": null,
        "work_mode": null
      }
    ],
    "role": "BigData Developer",
    "role_aliases": [
      "Big Data Engineer",
      "Big Data Developer",
      "Data Engineer"
    ],
    "role_archetype": "Engineering",
    "roles_and_responsibilities": [
      {
        "bullet_count": 5,
        "heading": "Must Have",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "\u2022 Hands on development experience",
          "last_5_words": "communication and customer management skills"
        },
        "text": "\u2022 Hands on development experience in programming languages such as SCALA using Maven, Apache Spark Frameworks and Unix Shell scripting \n\u2022 Should be comfortable with Unix File system as well as HDFS commands \n\u2022 Should have worked on query languages such as Oracle SQL, Hive SQL, Spark SQL, Impala, HBase DB \n\u2022 Should be flexible \n\u2022 Should have good communication and customer management skills",
        "word_count": 51
      },
      {
        "bullet_count": 4,
        "heading": "Roles and Responsibilities",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "\u2022 Design high quality deliverables",
          "last_5_words": "data transformation solutions , Hive"
        },
        "text": "\u2022 Design high quality deliverables adhering to business requirement with defined standards and design principles, patterns\n\u2022 Develop and maintain highly scalable, high performance Data transformation applications using Apache Spark framework\n\u2022 Develop/Integrate the code adhering to CI/CD, using Spark Framework in Scala\n\u2022 Provide solutions to Big data problems dealing with huge volumes of data using Spark based data transformation solutions , Hive",
        "word_count": 64
      }
    ],
    "urls": []
  },
  "rejected": false,
  "rejection_reason": null,
  "run_id": "9bd37876-38fa-4481-9ba5-d79d3dc251b3",
  "stage3_signals": {
    "alias_found": true,
    "alias_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": null,
        "matched_count": null,
        "matched_skills": null,
        "role_id": 2,
        "score": 1.0,
        "slug": "data-engineer",
        "total_count": null
      }
    ],
    "kra_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": [
          {
            "kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
            "sentence": "Develop and maintain highly scalable, high performance Data transformation applications using Apache Spark framework",
            "similarity": 0.6959
          },
          {
            "kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
            "sentence": "Provide solutions to Big data problems dealing with huge volumes of data using Spark based data transformation solutions , Hive",
            "similarity": 0.5986
          },
          {
            "kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
            "sentence": "Develop/Integrate the code adhering to CI/CD, using Spark Framework in Scala",
            "similarity": 0.5424
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 2,
        "score": 0.6123,
        "slug": "data-engineer",
        "total_count": null
      },
      {
        "display_name": "Fullstack Developer",
        "kra_matches": [
          {
            "kra_text": "Designs and queries relational databases like PostgreSQL and document stores like MongoDB, writing migrations, indexes, and optimized queries.",
            "sentence": "Should have worked on query languages such as Oracle SQL, Hive SQL, Spark SQL, Impala, HBase DB",
            "similarity": 0.5044
          },
          {
            "kra_text": "Works closely with product managers and UX designers to translate requirements and wireframes into working software features through iterative development.",
            "sentence": "Design high quality deliverables adhering to business requirement with defined standards and design principles, patterns",
            "similarity": 0.4794
          },
          {
            "kra_text": "Designs and queries relational databases like PostgreSQL and document stores like MongoDB, writing migrations, indexes, and optimized queries.",
            "sentence": "Develop and maintain highly scalable, high performance Data transformation applications using Apache Spark framework",
            "similarity": 0.4336
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 15,
        "score": 0.4725,
        "slug": "full-stack-engineer",
        "total_count": null
      },
      {
        "display_name": "ML Engineer",
        "kra_matches": [
          {
            "kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
            "sentence": "Develop and maintain highly scalable, high performance Data transformation applications using Apache Spark framework",
            "similarity": 0.4859
          },
          {
            "kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
            "sentence": "Provide solutions to Big data problems dealing with huge volumes of data using Spark based data transformation solutions , Hive",
            "similarity": 0.4503
          },
          {
            "kra_text": "Translates product requirements into machine learning system specifications including feature definitions, model architecture choices, and success metric definitions.",
            "sentence": "Design high quality deliverables adhering to business requirement with defined standards and design principles, patterns",
            "similarity": 0.3963
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 3,
        "score": 0.4442,
        "slug": "ml-engineer",
        "total_count": null
      },
      {
        "display_name": "Angular Frontend Developer",
        "kra_matches": [
          {
            "kra_text": "collaboration with design and QA",
            "sentence": "Design high quality deliverables adhering to business requirement with defined standards and design principles, patterns",
            "similarity": 0.5011
          },
          {
            "kra_text": "code review and refactoring",
            "sentence": "Develop/Integrate the code adhering to CI/CD, using Spark Framework in Scala",
            "similarity": 0.4346
          },
          {
            "kra_text": "code review and refactoring",
            "sentence": "Hands on development experience in programming languages such as SCALA using Maven, Apache Spark Frameworks and Unix Shell scripting",
            "similarity": 0.3626
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 90,
        "score": 0.4328,
        "slug": "angular-frontend-developer",
        "total_count": null
      },
      {
        "display_name": "Backend Developer",
        "kra_matches": [
          {
            "kra_text": "Writes database access logic including SQL queries, ORM mappings, stored procedures, and migration scripts for relational databases like PostgreSQL and MySQL.",
            "sentence": "Should have worked on query languages such as Oracle SQL, Hive SQL, Spark SQL, Impala, HBase DB",
            "similarity": 0.4447
          },
          {
            "kra_text": "Identifies and resolves backend performance bottlenecks through query optimization, indexing strategies, connection pooling, and distributed caching with Redis.",
            "sentence": "Develop and maintain highly scalable, high performance Data transformation applications using Apache Spark framework",
            "similarity": 0.4193
          },
          {
            "kra_text": "Integrates with third-party services, payment gateways, messaging queues like Kafka or RabbitMQ, and internal microservices via HTTP and event-driven patterns.",
            "sentence": "Develop/Integrate the code adhering to CI/CD, using Spark Framework in Scala",
            "similarity": 0.4159
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 1,
        "score": 0.4266,
        "slug": "backend-engineer",
        "total_count": null
      }
    ],
    "skill_match_roles": [
      {
        "display_name": "DevOps Engineer",
        "kra_matches": null,
        "matched_count": 2,
        "matched_skills": [
          "CI/CD",
          "Maven"
        ],
        "role_id": 10,
        "score": 0.1538,
        "slug": "devops-engineer",
        "total_count": 13
      },
      {
        "display_name": "Data Engineer",
        "kra_matches": null,
        "matched_count": 2,
        "matched_skills": [
          "Apache Spark",
          "Scala"
        ],
        "role_id": 2,
        "score": 0.1538,
        "slug": "data-engineer",
        "total_count": 13
      },
      {
        "display_name": "ML Engineer",
        "kra_matches": null,
        "matched_count": 2,
        "matched_skills": [
          "CI/CD",
          "Scala"
        ],
        "role_id": 3,
        "score": 0.1538,
        "slug": "ml-engineer",
        "total_count": 13
      },
      {
        "display_name": "Cloud Architect",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "HBase"
        ],
        "role_id": 9,
        "score": 0.0769,
        "slug": "cloud-architect",
        "total_count": 13
      },
      {
        "display_name": "iOS Developer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Hive"
        ],
        "role_id": 6,
        "score": 0.0769,
        "slug": "ios-engineer",
        "total_count": 13
      }
    ]
  },
  "stage4_decision": {
    "alias_collision_detected": false,
    "case": "A",
    "chosen_role": {
      "display_name": "Data Engineer",
      "kra_matches": null,
      "matched_count": null,
      "matched_skills": null,
      "role_id": 2,
      "score": 1.0,
      "slug": "data-engineer",
      "total_count": null
    },
    "confidence": 1.0,
    "is_new_role": false,
    "llm2_fired": false,
    "llm2_reasoning": null,
    "matched_dimensions": [],
    "matched_kras": [],
    "matched_skills": [],
    "new_role_display_name": null,
    "new_role_slug": null,
    "queued": false,
    "reasoning": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top devops-engineer 0.15 does not contradict",
    "sub_role": null
  },
  "stage5_updates": {
    "centroid_n_after": 454,
    "centroid_updated": true,
    "collision_log_id": null,
    "new_kra_attached": null,
    "new_skills_attached": [
      {
        "is_primary": true,
        "queue_id": 21060,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Unix Shell Scripting",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 21061,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Unix",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 21062,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "HDFS",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 21063,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Oracle SQL",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 21064,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Hive SQL",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 21065,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Spark SQL",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 21066,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Impala",
        "status": "pending"
      }
    ],
    "queue_entry_id": null,
    "v3_pipeline_triggered": false,
    "v3_role_slug": null,
    "v3_run_id": null
  }
}
API 2 — extract-details
{
  "alias_matches": [
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 272,
      "existing_alias_text": "Scala",
      "input_term": "Scala",
      "matched_canonical": {
        "category_id": 6,
        "display_name": "Scala",
        "id": 102,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LANGUAGE",
        "slug": "scala",
        "sub_category_id": 96,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 1360,
      "existing_alias_text": "Maven",
      "input_term": "Maven",
      "matched_canonical": {
        "category_id": 13,
        "display_name": "Maven",
        "id": 815,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "maven",
        "sub_category_id": 358,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 2004,
      "existing_alias_text": "Apache Spark",
      "input_term": "Apache Spark",
      "matched_canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "TODO: REMOVE AFTER TESTING \u2014 alias DB write disabled",
      "alias_persisted": false,
      "existing_alias_id": 2512,
      "existing_alias_text": "Oracle Database",
      "input_term": "Oracle SQL",
      "matched_canonical": {
        "category_id": 3,
        "display_name": "Oracle Database",
        "id": 1566,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "oracle-database",
        "sub_category_id": 29,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "embedding_alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 2011,
      "existing_alias_text": "HBase",
      "input_term": "HBase",
      "matched_canonical": {
        "category_id": 3,
        "display_name": "HBase",
        "id": 1352,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "hbase",
        "sub_category_id": 31,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 1826,
      "existing_alias_text": "CI/CD",
      "input_term": "CI/CD",
      "matched_canonical": {
        "category_id": 8,
        "display_name": "CI/CD",
        "id": 1190,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "METHODOLOGY",
        "slug": "ci-cd",
        "sub_category_id": 900,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 4198,
      "existing_alias_text": "Hive",
      "input_term": "Hive",
      "matched_canonical": {
        "category_id": 3,
        "display_name": "Hive",
        "id": 2754,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "hive",
        "sub_category_id": 2242,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    }
  ],
  "candidate_roles": [
    {
      "display_name": "Data Engineer",
      "id": 2,
      "rationale": null,
      "role_archetype": null,
      "slug": "data-engineer",
      "source": "db"
    },
    {
      "display_name": "ML Engineer",
      "id": 3,
      "rationale": null,
      "role_archetype": null,
      "slug": "ml-engineer",
      "source": "db"
    },
    {
      "display_name": "MLOps Engineer",
      "id": 16,
      "rationale": null,
      "role_archetype": null,
      "slug": "ml-ops-engineer",
      "source": "db"
    },
    {
      "display_name": "Java Backend Developer",
      "id": 79,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "java-backend-developer",
      "source": "db"
    },
    {
      "display_name": "DevOps Engineer",
      "id": 10,
      "rationale": null,
      "role_archetype": null,
      "slug": "devops-engineer",
      "source": "db"
    },
    {
      "display_name": "Ionic Developer",
      "id": 434,
      "rationale": null,
      "role_archetype": null,
      "slug": "ionic-developer",
      "source": "db"
    },
    {
      "display_name": ".NET Backend Developer",
      "id": 83,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "dotnet-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Backend Developer",
      "id": 1,
      "rationale": null,
      "role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
      "slug": "backend-engineer",
      "source": "db"
    },
    {
      "display_name": "Kotlin Backend Developer",
      "id": 84,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "kotlin-server-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Node.js Backend Developer",
      "id": 82,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "node-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Python Backend Developer",
      "id": 80,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "python-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Ruby Backend Developer",
      "id": 85,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "ruby-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Scala Backend Developer",
      "id": 87,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "scala-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Cloud Architect",
      "id": 9,
      "rationale": null,
      "role_archetype": null,
      "slug": "cloud-architect",
      "source": "db"
    },
    {
      "display_name": "Android Developer",
      "id": 4,
      "rationale": null,
      "role_archetype": null,
      "slug": "android-engineer",
      "source": "db"
    },
    {
      "display_name": "Flutter Developer",
      "id": 74,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "flutter-developer",
      "source": "db"
    },
    {
      "display_name": "Hybrid Mobile Developer",
      "id": 11,
      "rationale": null,
      "role_archetype": null,
      "slug": "hybrid-mobile-developer",
      "source": "db"
    },
    {
      "display_name": "Native Mobile Developer",
      "id": 75,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "native-mobile-developer",
      "source": "db"
    },
    {
      "display_name": "React Native Developer",
      "id": 73,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "react-native-developer",
      "source": "db"
    },
    {
      "display_name": "iOS Developer",
      "id": 6,
      "rationale": null,
      "role_archetype": null,
      "slug": "ios-engineer",
      "source": "db"
    }
  ],
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top devops-engineer 0.15 does not contradict",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "dimensions": [
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for Data Work",
        "id": 21,
        "rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
        "slug": "programming-languages-for-data-work",
        "source": "db"
      },
      "input_skill": "Scala",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for ML Systems",
        "id": 39,
        "rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
        "slug": "programming-languages-for-ml-systems",
        "source": "db"
      },
      "input_skill": "Scala",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "ML Engineer",
          "id": 3,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-engineer",
          "source": "db"
        },
        {
          "display_name": "MLOps Engineer",
          "id": 16,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-ops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Build and Dependency Management",
        "id": 286,
        "rationale": "Project build, packaging, and dependency resolution skills used to compile and assemble Java services. This cluster matters because backend delivery depends on reproducible builds and controlled library versions.",
        "slug": "build-and-dependency-management",
        "source": "db"
      },
      "input_skill": "Maven",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Java Backend Developer",
          "id": 79,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "java-backend-developer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Build and Packaging Tooling",
        "id": 149,
        "rationale": "Tooling used to compile, package, and prepare Ionic apps for development and release. This cluster is coherent because Ionic developers often manage web assets, native wrappers, and environment-specific build outputs.",
        "slug": "build-and-packaging-tooling",
        "source": "db"
      },
      "input_skill": "Maven",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "DevOps Engineer",
          "id": 10,
          "rationale": null,
          "role_archetype": null,
          "slug": "devops-engineer",
          "source": "db"
        },
        {
          "display_name": "Ionic Developer",
          "id": 434,
          "rationale": null,
          "role_archetype": null,
          "slug": "ionic-developer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "ETL and ELT Tooling",
        "id": 24,
        "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
        "slug": "etl-and-elt-tooling",
        "source": "db"
      },
      "input_skill": "Apache Spark",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "React Frontend Development",
        "id": 96,
        "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "Oracle SQL",
      "llm_role": null,
      "roles_from_db": []
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Relational Database Design",
        "id": 4,
        "rationale": "Modeling and operating relational persistence for backend services. Includes schema design, normalization, indexing, transactions, and query tuning for operational data stores.",
        "slug": "relational-database-design",
        "source": "db"
      },
      "input_skill": "Oracle SQL",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": ".NET Backend Developer",
          "id": 83,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "dotnet-backend-developer",
          "source": "db"
        },
        {
          "display_name": "Backend Developer",
          "id": 1,
          "rationale": null,
          "role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
          "slug": "backend-engineer",
          "source": "db"
        },
        {
          "display_name": "Kotlin Backend Developer",
          "id": 84,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "kotlin-server-backend-developer",
          "source": "db"
        },
        {
          "display_name": "Node.js Backend Developer",
          "id": 82,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "node-backend-developer",
          "source": "db"
        },
        {
          "display_name": "Python Backend Developer",
          "id": 80,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "python-backend-developer",
          "source": "db"
        },
        {
          "display_name": "Ruby Backend Developer",
          "id": 85,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "ruby-backend-developer",
          "source": "db"
        },
        {
          "display_name": "Scala Backend Developer",
          "id": 87,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "scala-backend-developer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Storage and Data Services",
        "id": 144,
        "rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
        "slug": "cloud-storage-and-data-services",
        "source": "db"
      },
      "input_skill": "HBase",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Cloud Architect",
          "id": 9,
          "rationale": null,
          "role_archetype": null,
          "slug": "cloud-architect",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "CI/CD Pipeline Platforms",
        "id": 150,
        "rationale": "Systems used to define, run, and maintain automated build and deployment workflows. This cluster is coherent because the role owns delivery automation end to end, including pipeline reliability and promotion logic.",
        "slug": "ci-cd-pipeline-platforms",
        "source": "db"
      },
      "input_skill": "CI/CD",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "DevOps Engineer",
          "id": 10,
          "rationale": null,
          "role_archetype": null,
          "slug": "devops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "CI/CD for Machine Learning",
        "id": 56,
        "rationale": "Tools and platforms for automating ML model integration, testing, and deployment pipelines.",
        "slug": "ci-cd-for-machine-learning",
        "source": "db"
      },
      "input_skill": "CI/CD",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "ML Engineer",
          "id": 3,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Local Persistence and Offline Behavior",
        "id": 85,
        "rationale": "On-device storage used for caching, offline support, and durable client state. This cluster is coherent because iOS apps often need to preserve user progress and data when connectivity is limited.",
        "slug": "local-persistence-and-offline-behavior",
        "source": "db"
      },
      "input_skill": "Hive",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Android Developer",
          "id": 4,
          "rationale": null,
          "role_archetype": null,
          "slug": "android-engineer",
          "source": "db"
        },
        {
          "display_name": "Flutter Developer",
          "id": 74,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "flutter-developer",
          "source": "db"
        },
        {
          "display_name": "Hybrid Mobile Developer",
          "id": 11,
          "rationale": null,
          "role_archetype": null,
          "slug": "hybrid-mobile-developer",
          "source": "db"
        },
        {
          "display_name": "Native Mobile Developer",
          "id": 75,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "native-mobile-developer",
          "source": "db"
        },
        {
          "display_name": "React Native Developer",
          "id": 73,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "react-native-developer",
          "source": "db"
        },
        {
          "display_name": "iOS Developer",
          "id": 6,
          "rationale": null,
          "role_archetype": null,
          "slug": "ios-engineer",
          "source": "db"
        }
      ]
    }
  ],
  "input_final_skills": [
    "Scala",
    "Maven",
    "Apache Spark",
    "Unix Shell Scripting",
    "Unix",
    "HDFS",
    "Oracle SQL",
    "Hive SQL",
    "Spark SQL",
    "Impala",
    "HBase",
    "CI/CD",
    "Hive"
  ],
  "input_llm_skills": [
    "Scala",
    "Maven",
    "Apache Spark",
    "Unix Shell Scripting",
    "Unix",
    "HDFS",
    "Oracle SQL",
    "Hive SQL",
    "Spark SQL",
    "Impala",
    "HBase",
    "CI/CD",
    "Hive"
  ],
  "new_aliases_persisted": 0,
  "run_id": "9bd37876-38fa-4481-9ba5-d79d3dc251b3",
  "skills_detail": [
    {
      "aliases_in_db": [
        {
          "alias_text": "Scala",
          "alias_type": "CANONICAL",
          "id": 272,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 6,
        "display_name": "Scala",
        "id": 102,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LANGUAGE",
        "slug": "scala",
        "sub_category_id": 96,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for Data Work",
            "id": 21,
            "rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
            "slug": "programming-languages-for-data-work",
            "source": "db"
          },
          "input_skill": "Scala",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for ML Systems",
            "id": 39,
            "rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
            "slug": "programming-languages-for-ml-systems",
            "source": "db"
          },
          "input_skill": "Scala",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "ML Engineer",
              "id": 3,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-engineer",
              "source": "db"
            },
            {
              "display_name": "MLOps Engineer",
              "id": 16,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-ops-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Scala",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Maven",
          "alias_type": "CANONICAL",
          "id": 1360,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 13,
        "display_name": "Maven",
        "id": 815,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "maven",
        "sub_category_id": 358,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Build and Dependency Management",
            "id": 286,
            "rationale": "Project build, packaging, and dependency resolution skills used to compile and assemble Java services. This cluster matters because backend delivery depends on reproducible builds and controlled library versions.",
            "slug": "build-and-dependency-management",
            "source": "db"
          },
          "input_skill": "Maven",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Java Backend Developer",
              "id": 79,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "java-backend-developer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Build and Packaging Tooling",
            "id": 149,
            "rationale": "Tooling used to compile, package, and prepare Ionic apps for development and release. This cluster is coherent because Ionic developers often manage web assets, native wrappers, and environment-specific build outputs.",
            "slug": "build-and-packaging-tooling",
            "source": "db"
          },
          "input_skill": "Maven",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "DevOps Engineer",
              "id": 10,
              "rationale": null,
              "role_archetype": null,
              "slug": "devops-engineer",
              "source": "db"
            },
            {
              "display_name": "Ionic Developer",
              "id": 434,
              "rationale": null,
              "role_archetype": null,
              "slug": "ionic-developer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Maven",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Apache Spark",
          "alias_type": "CANONICAL",
          "id": 2004,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "apache spark 3",
          "alias_type": "VERSION",
          "id": 2006,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark",
          "alias_type": "VERSION",
          "id": 2510,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3",
          "alias_type": "VERSION",
          "id": 2007,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3.x",
          "alias_type": "VERSION",
          "id": 2009,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark3",
          "alias_type": "VERSION",
          "id": 2008,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "ETL and ELT Tooling",
            "id": 24,
            "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
            "slug": "etl-and-elt-tooling",
            "source": "db"
          },
          "input_skill": "Apache Spark",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Apache Spark",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Unix Shell Scripting",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Programming Languages",
          "skill_nature": "LANGUAGE",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "unix-shell-scripting",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Unix",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Operating Systems",
          "skill_nature": "PLATFORM",
          "sub_category": "general",
          "typical_lifespan": "EVERGREEN",
          "version_strategy": "UNVERSIONED",
          "volatility": "STABLE"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "unix",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "HDFS",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Data Engineering Tools",
          "skill_nature": "TOOL",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "hdfs",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Oracle Database",
          "alias_type": "CANONICAL",
          "id": 2512,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 3,
        "display_name": "Oracle Database",
        "id": 1566,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "oracle-database",
        "sub_category_id": 29,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "React Frontend Development",
            "id": 96,
            "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "Oracle SQL",
          "llm_role": null,
          "roles_from_db": []
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Relational Database Design",
            "id": 4,
            "rationale": "Modeling and operating relational persistence for backend services. Includes schema design, normalization, indexing, transactions, and query tuning for operational data stores.",
            "slug": "relational-database-design",
            "source": "db"
          },
          "input_skill": "Oracle SQL",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": ".NET Backend Developer",
              "id": 83,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "dotnet-backend-developer",
              "source": "db"
            },
            {
              "display_name": "Backend Developer",
              "id": 1,
              "rationale": null,
              "role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
              "slug": "backend-engineer",
              "source": "db"
            },
            {
              "display_name": "Kotlin Backend Developer",
              "id": 84,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "kotlin-server-backend-developer",
              "source": "db"
            },
            {
              "display_name": "Node.js Backend Developer",
              "id": 82,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "node-backend-developer",
              "source": "db"
            },
            {
              "display_name": "Python Backend Developer",
              "id": 80,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "python-backend-developer",
              "source": "db"
            },
            {
              "display_name": "Ruby Backend Developer",
              "id": 85,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "ruby-backend-developer",
              "source": "db"
            },
            {
              "display_name": "Scala Backend Developer",
              "id": 87,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "scala-backend-developer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Oracle SQL",
      "matched_via": "embedding_alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Hive SQL",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Query Languages",
          "skill_nature": "LANGUAGE",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "hive-sql",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Spark SQL",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Query Languages",
          "skill_nature": "LANGUAGE",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "spark-sql",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Impala",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Data Engineering Tools",
          "skill_nature": "TOOL",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "impala",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "HBase",
          "alias_type": "CANONICAL",
          "id": 2011,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 3,
        "display_name": "HBase",
        "id": 1352,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "hbase",
        "sub_category_id": 31,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Storage and Data Services",
            "id": 144,
            "rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
            "slug": "cloud-storage-and-data-services",
            "source": "db"
          },
          "input_skill": "HBase",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Cloud Architect",
              "id": 9,
              "rationale": null,
              "role_archetype": null,
              "slug": "cloud-architect",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "HBase",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "CI/CD",
          "alias_type": "CANONICAL",
          "id": 1826,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 8,
        "display_name": "CI/CD",
        "id": 1190,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "METHODOLOGY",
        "slug": "ci-cd",
        "sub_category_id": 900,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "CI/CD Pipeline Platforms",
            "id": 150,
            "rationale": "Systems used to define, run, and maintain automated build and deployment workflows. This cluster is coherent because the role owns delivery automation end to end, including pipeline reliability and promotion logic.",
            "slug": "ci-cd-pipeline-platforms",
            "source": "db"
          },
          "input_skill": "CI/CD",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "DevOps Engineer",
              "id": 10,
              "rationale": null,
              "role_archetype": null,
              "slug": "devops-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "CI/CD for Machine Learning",
            "id": 56,
            "rationale": "Tools and platforms for automating ML model integration, testing, and deployment pipelines.",
            "slug": "ci-cd-for-machine-learning",
            "source": "db"
          },
          "input_skill": "CI/CD",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "ML Engineer",
              "id": 3,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "CI/CD",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Hive",
          "alias_type": "CANONICAL",
          "id": 4198,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 3,
        "display_name": "Hive",
        "id": 2754,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "hive",
        "sub_category_id": 2242,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Local Persistence and Offline Behavior",
            "id": 85,
            "rationale": "On-device storage used for caching, offline support, and durable client state. This cluster is coherent because iOS apps often need to preserve user progress and data when connectivity is limited.",
            "slug": "local-persistence-and-offline-behavior",
            "source": "db"
          },
          "input_skill": "Hive",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Android Developer",
              "id": 4,
              "rationale": null,
              "role_archetype": null,
              "slug": "android-engineer",
              "source": "db"
            },
            {
              "display_name": "Flutter Developer",
              "id": 74,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "flutter-developer",
              "source": "db"
            },
            {
              "display_name": "Hybrid Mobile Developer",
              "id": 11,
              "rationale": null,
              "role_archetype": null,
              "slug": "hybrid-mobile-developer",
              "source": "db"
            },
            {
              "display_name": "Native Mobile Developer",
              "id": 75,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "native-mobile-developer",
              "source": "db"
            },
            {
              "display_name": "React Native Developer",
              "id": 73,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "react-native-developer",
              "source": "db"
            },
            {
              "display_name": "iOS Developer",
              "id": 6,
              "rationale": null,
              "role_archetype": null,
              "slug": "ios-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Hive",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    }
  ],
  "unmatched_skills": [
    "Unix Shell Scripting",
    "Unix",
    "HDFS",
    "Hive SQL",
    "Spark SQL",
    "Impala"
  ]
}
API 3 — final-role-output
{
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top devops-engineer 0.15 does not contradict",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "chosen_role_resolution": "in_db",
  "final_input_skills": [
    {
      "skill": "Scala",
      "tag": "in_db"
    },
    {
      "skill": "Maven",
      "tag": "in_db"
    },
    {
      "skill": "Apache Spark",
      "tag": "in_db"
    },
    {
      "skill": "Unix Shell Scripting",
      "tag": "new"
    },
    {
      "skill": "Unix",
      "tag": "new"
    },
    {
      "skill": "HDFS",
      "tag": "new"
    },
    {
      "skill": "Oracle SQL",
      "tag": "in_db"
    },
    {
      "skill": "Hive SQL",
      "tag": "new"
    },
    {
      "skill": "Spark SQL",
      "tag": "new"
    },
    {
      "skill": "Impala",
      "tag": "new"
    },
    {
      "skill": "HBase",
      "tag": "in_db"
    },
    {
      "skill": "CI/CD",
      "tag": "in_db"
    },
    {
      "skill": "Hive",
      "tag": "in_db"
    }
  ],
  "llm_cost_api1_usd": null,
  "llm_cost_api2_usd": null,
  "llm_cost_api3_usd": null,
  "llm_cost_total_usd": null,
  "persistence": {
    "items": [
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for Data Work",
          "id": 21,
          "rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
          "slug": "programming-languages-for-data-work",
          "source": "db"
        },
        "dimension_id": 21,
        "input_skill": "Scala",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 102,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for ML Systems",
          "id": 39,
          "rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
          "slug": "programming-languages-for-ml-systems",
          "source": "db"
        },
        "dimension_id": 39,
        "input_skill": "Scala",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "ML Engineer",
            "id": 3,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-engineer",
            "source": "db"
          },
          {
            "display_name": "MLOps Engineer",
            "id": 16,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-ops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 102,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Build and Dependency Management",
          "id": 286,
          "rationale": "Project build, packaging, and dependency resolution skills used to compile and assemble Java services. This cluster matters because backend delivery depends on reproducible builds and controlled library versions.",
          "slug": "build-and-dependency-management",
          "source": "db"
        },
        "dimension_id": 286,
        "input_skill": "Maven",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Java Backend Developer",
            "id": 79,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "java-backend-developer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 815,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Build and Packaging Tooling",
          "id": 149,
          "rationale": "Tooling used to compile, package, and prepare Ionic apps for development and release. This cluster is coherent because Ionic developers often manage web assets, native wrappers, and environment-specific build outputs.",
          "slug": "build-and-packaging-tooling",
          "source": "db"
        },
        "dimension_id": 149,
        "input_skill": "Maven",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "DevOps Engineer",
            "id": 10,
            "rationale": null,
            "role_archetype": null,
            "slug": "devops-engineer",
            "source": "db"
          },
          {
            "display_name": "Ionic Developer",
            "id": 434,
            "rationale": null,
            "role_archetype": null,
            "slug": "ionic-developer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 815,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "ETL and ELT Tooling",
          "id": 24,
          "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
          "slug": "etl-and-elt-tooling",
          "source": "db"
        },
        "dimension_id": 24,
        "input_skill": "Apache Spark",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 1350,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "React Frontend Development",
          "id": 96,
          "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 96,
        "input_skill": "Oracle SQL",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Skipped \u2014 no persistable v3 meta for new skill",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": false,
        "skill_id": null,
        "skill_tag": "new",
        "skipped_reason": "skill_not_in_db_v3_proposed"
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Relational Database Design",
          "id": 4,
          "rationale": "Modeling and operating relational persistence for backend services. Includes schema design, normalization, indexing, transactions, and query tuning for operational data stores.",
          "slug": "relational-database-design",
          "source": "db"
        },
        "dimension_id": 4,
        "input_skill": "Oracle SQL",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Skipped \u2014 no persistable v3 meta for new skill",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": ".NET Backend Developer",
            "id": 83,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "dotnet-backend-developer",
            "source": "db"
          },
          {
            "display_name": "Backend Developer",
            "id": 1,
            "rationale": null,
            "role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
            "slug": "backend-engineer",
            "source": "db"
          },
          {
            "display_name": "Kotlin Backend Developer",
            "id": 84,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "kotlin-server-backend-developer",
            "source": "db"
          },
          {
            "display_name": "Node.js Backend Developer",
            "id": 82,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "node-backend-developer",
            "source": "db"
          },
          {
            "display_name": "Python Backend Developer",
            "id": 80,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "python-backend-developer",
            "source": "db"
          },
          {
            "display_name": "Ruby Backend Developer",
            "id": 85,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "ruby-backend-developer",
            "source": "db"
          },
          {
            "display_name": "Scala Backend Developer",
            "id": 87,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "scala-backend-developer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": false,
        "skill_id": null,
        "skill_tag": "new",
        "skipped_reason": "skill_not_in_db_v3_proposed"
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Storage and Data Services",
          "id": 144,
          "rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
          "slug": "cloud-storage-and-data-services",
          "source": "db"
        },
        "dimension_id": 144,
        "input_skill": "HBase",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Cloud Architect",
            "id": 9,
            "rationale": null,
            "role_archetype": null,
            "slug": "cloud-architect",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 1352,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "CI/CD Pipeline Platforms",
          "id": 150,
          "rationale": "Systems used to define, run, and maintain automated build and deployment workflows. This cluster is coherent because the role owns delivery automation end to end, including pipeline reliability and promotion logic.",
          "slug": "ci-cd-pipeline-platforms",
          "source": "db"
        },
        "dimension_id": 150,
        "input_skill": "CI/CD",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "DevOps Engineer",
            "id": 10,
            "rationale": null,
            "role_archetype": null,
            "slug": "devops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 1190,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "CI/CD for Machine Learning",
          "id": 56,
          "rationale": "Tools and platforms for automating ML model integration, testing, and deployment pipelines.",
          "slug": "ci-cd-for-machine-learning",
          "source": "db"
        },
        "dimension_id": 56,
        "input_skill": "CI/CD",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "ML Engineer",
            "id": 3,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 1190,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Local Persistence and Offline Behavior",
          "id": 85,
          "rationale": "On-device storage used for caching, offline support, and durable client state. This cluster is coherent because iOS apps often need to preserve user progress and data when connectivity is limited.",
          "slug": "local-persistence-and-offline-behavior",
          "source": "db"
        },
        "dimension_id": 85,
        "input_skill": "Hive",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Android Developer",
            "id": 4,
            "rationale": null,
            "role_archetype": null,
            "slug": "android-engineer",
            "source": "db"
          },
          {
            "display_name": "Flutter Developer",
            "id": 74,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "flutter-developer",
            "source": "db"
          },
          {
            "display_name": "Hybrid Mobile Developer",
            "id": 11,
            "rationale": null,
            "role_archetype": null,
            "slug": "hybrid-mobile-developer",
            "source": "db"
          },
          {
            "display_name": "Native Mobile Developer",
            "id": 75,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "native-mobile-developer",
            "source": "db"
          },
          {
            "display_name": "React Native Developer",
            "id": 73,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "react-native-developer",
            "source": "db"
          },
          {
            "display_name": "iOS Developer",
            "id": 6,
            "rationale": null,
            "role_archetype": null,
            "slug": "ios-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 2754,
        "skill_tag": "in_db",
        "skipped_reason": null
      }
    ],
    "new_skills_created": 0,
    "role_dimension_saved": 0,
    "skill_dimension_saved": 0,
    "skipped": 2
  },
  "planner_output": null,
  "run_id": "9bd37876-38fa-4481-9ba5-d79d3dc251b3"
}