← Back to history

Pipeline run

ed0d70d6-8ed9-4c78-8000-b857d0c839e3

Pipeline LLM cost (USD)
API 1: $0.0075 API 2: $0.0001 API 3: $0.0000 Total: $0.0076

Client output enrichment

v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description
role baseline loaded sources · ai_index: jd · nature_of_work: jd · tech_stack_maturity: jd
Nature of work · Data pipeline development
Build and tune Spark batch/streaming jobs in Scala using RDDs, Spark SQL, and Kafka, with ongoing performance monitoring and debugging. Also gather requirements, handle gap analysis, and work with the team to meet SLA targets.
"Develop spark applications using RDD"
Tech stack maturity
Mainstream Modern
Apache Spark, Kafka, Scala, and Spark Streaming are established, widely adopted data engineering technologies that fit a mainstream modern stack rather than legacy or bleeding-edge categories.
AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)
0.00 / 5
· Title match
· Has AI skill
· AI skill (primary)
· AI skill (secondary)
· On AI team
· Builds AI products
vocab breakdown (legacy)
Assistants (×1):
Frameworks (×2):
Models / concepts (×3):
Evidence — skills matched in JD (6)
Apache Spark RDD Spark SQL Spark Streaming Kafka Scala
Skill cluster (5 dimension groups, role-scoped)
ETL and ELT Tooling
Apache Spark
Messaging and Event Streaming
Kafka
Programming Languages for Data Work
Scala
Stream Processing Systems
Spark Streaming
Cross-cutting / unaligned
RDD Spark SQL
Show KRA description ↓
1 Ability to work as a part of a team/work independently with minimal guidance 2 US Healthcare domain knowledge will be added advantage 1 Good knowledge in spark architecture 2 Develop spark applications using RDD 3 Good in writing Spark SQL 4 Spark Streaming with Kafka 5 Spark Programming with scalaBasics 6 Spark Performance Monitoring and Debugging 1 Experience in requirements analysis, gap analysis, requirements elicitation, requirements management and communication 2 Manages individual and team quality of work toward meeting or exceeding SLA targets

Signals

Skill data-engineer
0.67
Alias backend-engineer
1.00
KRA engineering-manager
0.47

Post-classification

Centroidupdated · n=377
Alias collision log
New-role queue
New skills captured2
New KRA capturedyes

Captured for admin review

RDD primary Data Engineer pending
Spark SQL primary Data Engineer pending
R&R fragment (sim 0.00) Data Engineer pending

1 Ability to work as a part of a team/work independently with minimal guidance 2 US Healthcare domain knowledge will be added advantage 1 Good knowledge in spark architecture 2 Develop spark applicat…

Status: completed Created: 2026-05-27T16:02:04.361014Z Updated: 2026-05-27T16:03:08.937239Z API 3 duration: 17467 ms
Flow Current 3-step pipeline

1 POST /skills/extract-from-jd

2 POST /skills/extract-details

3 POST /skills/final-role-output

Role Chosen role & resolution

Data Engineer

domain · Data Engineering & Analytics CASE DOMAIN

slug: data-engineer · id: 2 · source: db

Domain=Data Engineering & Analytics; The JD is centered on Spark application development, Spark SQL, Kafka streaming, and performance/debugging, which best matches a Data Engineer focused on big data and streaming work.

Matched skills

spark architectureSpark applicationsRDDSpark SQLSpark StreamingKafkaSpark Programming with scalaBasicsSpark Performance Monitoring and Debuggingrequirements analysisgap analysisrequirements elicitationrequirements managementSLA targetsUS Healthcare domain

Matched dimensions

Big Data ProcessingStreaming Data EngineeringSpark DevelopmentRequirements Analysis and ManagementPerformance Monitoring and DebuggingHealthcare Domain KnowledgeTeam Collaboration

Matched KRAs

Develop spark applications using RDDGood in writing Spark SQLSpark Streaming with KafkaSpark Performance Monitoring and DebuggingExperience in requirements analysis, gap analysisrequirements elicitation, requirements management and communicationManages individual and team quality of workmeeting or exceeding SLA targets

Resolution: in_db — role exists in library; skill↔dim and role↔dim links saved when applicable.

0
New skills
0
Skill↔dim saved
0
Role↔dim saved
0
Skipped

Job description

About Accenture:  Accenture is a leading global professional services company, providing a broad range of services in strategy and consulting, interactive, technology and operations, with digital capabilities across all of these services. We combine unmatched experience and specialized capabilities across more than 40 industries — powered by the world’s largest network of Advanced Technology and Intelligent Operations centers. With 506,000 people serving clients in more than 120 countries, Accenture brings continuous innovation to help clients improve their performance and create lasting value across their enterprises. Visit us at www.accenture.com Project Role : Application Developer Project Role Description : Design, build and configure applications to meet business process and application requirements. Management Level : 10 Work Experience : 4-6 years Work location : Chennai Must Have Skills : Apache Spark Good To Have Skills : No Technology Specialization Job Requirements :  Key Responsibilities :  1 Ability to work as a part of a team/work independently with minimal guidance 2 US Healthcare domain knowledge will be added advantage Technical Experience :  1 Good knowledge in spark architecture 2 Develop spark applications using RDD 3 Good in writing Spark SQL 4 Spark Streaming with Kafka 5 Spark Programming with scalaBasics 6 Spark Performance Monitoring and Debugging Professional Attributes :  1 Experience in requirements analysis, gap analysis, requirements elicitation, requirements management and communication 2 Manages individual and team quality of work toward meeting or exceeding SLA targets 15 years of full time education

Skills from this JD

Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.

Apache Spark Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Apache Spark id=1350 · apache-spark

Aliases — catalog

  • Apache Spark (CANONICAL)
  • apache spark 3 (VERSION)
  • spark (VERSION)
  • spark 3 (VERSION)
  • spark 3.x (VERSION)
  • spark3 (VERSION)

Context tags (catalog)

Apache Kafka Cluster Manager DAGScheduler Data Lake DataFrame ETL Hadoop MLlib Machine Learning PySpark RDD Scala Spark SQL Spark Streaming SparkSession

Stored enrichment (catalog DB)

Category
Framework
Sub-category
Distributed Data Processing Framework
Vendor
Apache Software Foundation
License
apache_2
Year introduced
2010
Confidence
0.94
Version strategy
SEPARATE_ENTITY
Version tag
3.x

Maturity reasoning: Apache Spark appears in many data engineering JDs and remains a standard for distributed ETL/ELT; its GitHub and vendor ecosystem activity stay strong, with Databricks and cloud platforms still promoting it.

Skill profile (library / DB)

Skill nature
FRAMEWORK
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
5
Sub-category id
1021
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • ETL and ELT Tooling Catalog dimension db id 24

    Library dimension (catalog)

    Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
ETL and ELT Tooling
etl-and-elt-tooling
Existing dimension (library) · Role↔dimension saved
RDD Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Big Data Frameworks
Sub-category
general
Skill nature
CONCEPT
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
Spark SQL Primary New / orchestrated API 3: new canonical path (new) New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Big Data Frameworks
Sub-category
general
Skill nature
TOOL
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
Spark Streaming Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Spark Streaming id=121 · spark-streaming

Aliases — catalog

  • DStreams (VERSION)
  • Spark 2.x (VERSION)
  • Spark 3.x (VERSION)
  • Spark Streaming (VERSION)
  • Spark Structured Streaming (VERSION)
  • Structured Streaming (VERSION)

Context tags (catalog)

DStreams Kafka Kinesis Structured Streaming backpressure checkpointing event time exactly-once micro-batch stateful processing streaming ETL trigger intervals watermarking window functions windowing

Stored enrichment (catalog DB)

Category
Framework
Sub-category
Stream Processing Framework
Vendor
Apache Software Foundation
License
apache_2
Year introduced
2013
Confidence
0.90
Version strategy
SEPARATE_ENTITY
Version tag
Structured Streaming (Spark 2.0+)

Maturity reasoning: JD volume is far lower than Structured Streaming; most Spark streaming roles now specify Structured Streaming or Kafka/Flink, and Spark docs position Spark Streaming as the older API.

Skill profile (library / DB)

Skill nature
FRAMEWORK
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
5
Sub-category id
94
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Stream Processing Systems Catalog dimension db id 25

    Library dimension (catalog)

    Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Stream Processing Systems
stream-processing-systems
Existing dimension (library) · Role↔dimension saved
Kafka Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Kafka id=36 · kafka

Aliases — catalog

  • Kafka (CANONICAL) primary

Context tags (catalog)

Apache Flink Apache Kafka Apache Pulsar Apache Spark Avro KSQL Kafka API Kafka Connect Kafka Streams ZooKeeper Zookeeper backpressure brokers consumer consumer group consumer groups event sourcing event-driven architecture exactly-once semantics fault tolerance high throughput log compaction message broker message queue microservices offsets partition partitioning partitions producer producer API real-time analytics real-time data replication schema registry stream processing topic topic partitioning topics

Stored enrichment (catalog DB)

Category
Datastore
Sub-category
Event Stream Store
Vendor
Confluent
License
apache_2
Year introduced
2011
Confidence
0.90
Version strategy
NOT_APPLICABLE

Maturity reasoning: Kafka appears in many production JDs for event streaming and data pipelines, and remains a standard platform in cloud/vendor offerings (e.g., Confluent, AWS MSK), indicating broad hiring demand.

Skill profile (library / DB)

Skill nature
TOOL
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
3
Sub-category id
3533
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Asynchronous Messaging and Event Streaming Catalog dimension db id 297

    Library dimension (catalog)

    Roles linked in library: .NET Backend Developer, Go Backend Developer, Kotlin Backend Developer, Node.js Backend Developer, Scala Backend Developer

  • Messaging and Background Jobs Catalog dimension db id 291

    Library dimension (catalog)

    Roles linked in library: PHP Backend Developer, Python Backend Developer, Ruby Backend Developer

  • Messaging and Event Streaming Catalog dimension db id 8

    Library dimension (catalog)

    Roles linked in library: Backend Developer, Data Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Asynchronous Messaging and Event Streaming
asynchronous-messaging-and-event-streaming
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Messaging and Background Jobs
messaging-and-background-jobs
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Messaging and Event Streaming
messaging-and-event-streaming
Existing dimension (library) · Role↔dimension saved
Scala Primary Library skill API 3: existing canonical (in_db) Existing skill (matched library)
Canonical: Scala id=102 · scala

Aliases — catalog

  • Scala (CANONICAL) primary

Context tags (catalog)

Akka Apache Kafka Cats Flink JVM Monads Play Framework SBT ScalaTest Shapeless Spark Spark SQL ZIO case class for-comprehension functional programming implicit pattern matching typeclass

Stored enrichment (catalog DB)

Category
Language
Sub-category
Programming Language
Vendor
EPFL
License
apache_2
Year introduced
2004
Confidence
0.99
Version strategy
NOT_APPLICABLE

Maturity reasoning: Scala still appears in many backend/data engineering JDs, especially with Spark and Akka, and remains supported by major JVM ecosystems; it’s not a sunset technology.

Skill profile (library / DB)

Skill nature
LANGUAGE
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
6
Sub-category id
96
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • Programming Languages for Data Work Catalog dimension db id 21

    Library dimension (catalog)

    Roles linked in library: Data Engineer

  • Programming Languages for ML Systems Catalog dimension db id 39

    Library dimension (catalog)

    Roles linked in library: ML Engineer, MLOps Engineer

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
Programming Languages for Data Work
programming-languages-for-data-work
Existing dimension (library) · Role↔dimension saved
Programming Languages for ML Systems
programming-languages-for-ml-systems
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

All API 3 persistence rows

Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.

Skill Tag Dimension Skill↔dim Role↔dim Outcome Notes
Apache Spark in_db
ETL and ELT Tooling
etl-and-elt-tooling
Existing dimension (library) · Role↔dimension saved
Spark Streaming in_db
Stream Processing Systems
stream-processing-systems
Existing dimension (library) · Role↔dimension saved
Kafka in_db
Asynchronous Messaging and Event Streaming
asynchronous-messaging-and-event-streaming
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Kafka in_db
Messaging and Background Jobs
messaging-and-background-jobs
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Kafka in_db
Messaging and Event Streaming
messaging-and-event-streaming
Existing dimension (library) · Role↔dimension saved
Scala in_db
Programming Languages for Data Work
programming-languages-for-data-work
Existing dimension (library) · Role↔dimension saved
Scala in_db
Programming Languages for ML Systems
programming-languages-for-ml-systems
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

Library artifacts (this run)

Kind Detail DB id
canonical_skill_proposed RDD | type=Big Data Frameworks subtype=general nature=CONCEPT lifespan=MULTI_YEAR
canonical_skill_proposed Spark SQL | type=Big Data Frameworks subtype=general nature=TOOL lifespan=MULTI_YEAR
nano JD Parser — gpt-4.1-nano click to toggle
RoleApplication Developer
CompanyAccenture
Experience4-6 years
DomainIT Services & Consulting
Location Chennai, India (null)
JD type pass
Show raw JSON
{
  "JD_type": "pass",
  "about_company": {
    "source_marker": {
      "first_5_words": "Accenture is a leading global",
      "last_5_words": "value across their enterprises."
    },
    "text": "Accenture is a leading global professional services company, providing a broad range of services in strategy and consulting, interactive, technology and operations, with digital capabilities across all of these services. We combine unmatched experience and specialized capabilities across more than 40 industries \u2014 powered by the world\u2019s largest network of Advanced Technology and Intelligent Operations centers. With 506,000 people serving clients in more than 120 countries, Accenture brings continuous innovation to help clients improve their performance and create lasting value across their enterprises.",
    "word_count": 84
  },
  "certifications": [],
  "company_name": "Accenture",
  "ctc": null,
  "domain": {
    "primary": {
      "aliases": [
        "ITES",
        "BPO",
        "Tech Consulting"
      ],
      "domain": "IT Services \u0026 Consulting"
    },
    "secondary": null
  },
  "education": [
    {
      "level": "Bachelor\u0027s",
      "qualification": "Bachelor\u0027s - Any Discipline",
      "raw": "15 years of full time education",
      "requirement": "required"
    }
  ],
  "experience": {
    "max": 6,
    "min": 4,
    "raw": "4-6 years"
  },
  "job_locations": [
    {
      "aliases": [
        "Madras"
      ],
      "city": "Chennai",
      "country": "India",
      "state": null,
      "work_mode": "null"
    }
  ],
  "role": "Application Developer",
  "role_aliases": [
    "App Developer",
    "Software Developer",
    "Application Engineer"
  ],
  "role_archetype": "Engineering",
  "roles_and_responsibilities": [
    {
      "bullet_count": 2,
      "heading": "Key Responsibilities",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "1 Ability to work as",
        "last_5_words": "added advantage"
      },
      "text": "1 Ability to work as a part of a team/work independently with minimal guidance\n2 US Healthcare domain knowledge will be added advantage",
      "word_count": 20
    },
    {
      "bullet_count": 6,
      "heading": "Technical Experience",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "1 Good knowledge in spark",
        "last_5_words": "Monitoring and Debugging"
      },
      "text": "1 Good knowledge in spark architecture\n2 Develop spark applications using RDD\n3 Good in writing Spark SQL\n4 Spark Streaming with Kafka\n5 Spark Programming with scalaBasics\n6 Spark Performance Monitoring and Debugging",
      "word_count": 36
    },
    {
      "bullet_count": 2,
      "heading": "Professional Attributes",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "1 Experience in requirements analysis,",
        "last_5_words": "exceeding SLA targets"
      },
      "text": "1 Experience in requirements analysis, gap analysis, requirements elicitation, requirements management and communication\n2 Manages individual and team quality of work toward meeting or exceeding SLA targets",
      "word_count": 32
    }
  ],
  "urls": [
    {
      "type": "website",
      "url": "http://www.accenture.com"
    }
  ]
}
API 1 — extract-from-jd click to toggle
{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "Apache Spark"
    },
    {
      "is_primary": true,
      "skill_name": "RDD"
    },
    {
      "is_primary": true,
      "skill_name": "Spark SQL"
    },
    {
      "is_primary": true,
      "skill_name": "Spark Streaming"
    },
    {
      "is_primary": true,
      "skill_name": "Kafka"
    },
    {
      "is_primary": true,
      "skill_name": "Scala"
    }
  ],
  "jd_role": {
    "display_name": "Application Developer",
    "rationale": null,
    "role_aliases": [
      "App Developer",
      "Software Developer",
      "Application Engineer"
    ],
    "role_archetype": "Engineering",
    "slug": ""
  },
  "nano_parsed": {
    "JD_type": "pass",
    "about_company": {
      "source_marker": {
        "first_5_words": "Accenture is a leading global",
        "last_5_words": "value across their enterprises."
      },
      "text": "Accenture is a leading global professional services company, providing a broad range of services in strategy and consulting, interactive, technology and operations, with digital capabilities across all of these services. We combine unmatched experience and specialized capabilities across more than 40 industries \u2014 powered by the world\u2019s largest network of Advanced Technology and Intelligent Operations centers. With 506,000 people serving clients in more than 120 countries, Accenture brings continuous innovation to help clients improve their performance and create lasting value across their enterprises.",
      "word_count": 84
    },
    "certifications": [],
    "company_name": "Accenture",
    "ctc": null,
    "domain": {
      "primary": {
        "aliases": [
          "ITES",
          "BPO",
          "Tech Consulting"
        ],
        "domain": "IT Services \u0026 Consulting"
      },
      "secondary": null
    },
    "education": [
      {
        "level": "Bachelor\u0027s",
        "qualification": "Bachelor\u0027s - Any Discipline",
        "raw": "15 years of full time education",
        "requirement": "required"
      }
    ],
    "experience": {
      "max": 6,
      "min": 4,
      "raw": "4-6 years"
    },
    "job_locations": [
      {
        "aliases": [
          "Madras"
        ],
        "city": "Chennai",
        "country": "India",
        "state": null,
        "work_mode": "null"
      }
    ],
    "role": "Application Developer",
    "role_aliases": [
      "App Developer",
      "Software Developer",
      "Application Engineer"
    ],
    "role_archetype": "Engineering",
    "roles_and_responsibilities": [
      {
        "bullet_count": 2,
        "heading": "Key Responsibilities",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "1 Ability to work as",
          "last_5_words": "added advantage"
        },
        "text": "1 Ability to work as a part of a team/work independently with minimal guidance\n2 US Healthcare domain knowledge will be added advantage",
        "word_count": 20
      },
      {
        "bullet_count": 6,
        "heading": "Technical Experience",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "1 Good knowledge in spark",
          "last_5_words": "Monitoring and Debugging"
        },
        "text": "1 Good knowledge in spark architecture\n2 Develop spark applications using RDD\n3 Good in writing Spark SQL\n4 Spark Streaming with Kafka\n5 Spark Programming with scalaBasics\n6 Spark Performance Monitoring and Debugging",
        "word_count": 36
      },
      {
        "bullet_count": 2,
        "heading": "Professional Attributes",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "1 Experience in requirements analysis,",
          "last_5_words": "exceeding SLA targets"
        },
        "text": "1 Experience in requirements analysis, gap analysis, requirements elicitation, requirements management and communication\n2 Manages individual and team quality of work toward meeting or exceeding SLA targets",
        "word_count": 32
      }
    ],
    "urls": [
      {
        "type": "website",
        "url": "http://www.accenture.com"
      }
    ]
  },
  "rejected": false,
  "rejection_reason": null,
  "run_id": "ed0d70d6-8ed9-4c78-8000-b857d0c839e3",
  "stage3_signals": {
    "alias_found": true,
    "alias_match_roles": [
      {
        "display_name": "Backend Developer",
        "kra_matches": null,
        "matched_count": null,
        "matched_skills": null,
        "role_id": 1,
        "score": 1.0,
        "slug": "backend-engineer",
        "total_count": null
      }
    ],
    "kra_match_roles": [
      {
        "display_name": "Engineering Manager",
        "kra_matches": [
          {
            "kra_text": "Set team goals and delivery plans",
            "sentence": "2 Manages individual and team quality of work toward meeting or exceeding SLA targets",
            "similarity": 0.5773
          },
          {
            "kra_text": "monitor risks and dependencies",
            "sentence": "6 Spark Performance Monitoring and Debugging",
            "similarity": 0.4392
          },
          {
            "kra_text": "Set team goals and delivery plans",
            "sentence": "1 Ability to work as a part of a team/work independently with minimal guidance",
            "similarity": 0.3864
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 121,
        "score": 0.4676,
        "slug": "engineering-manager",
        "total_count": null
      },
      {
        "display_name": "Java Backend Developer",
        "kra_matches": [
          {
            "kra_text": "backend performance tuning",
            "sentence": "6 Spark Performance Monitoring and Debugging",
            "similarity": 0.6103
          },
          {
            "kra_text": "service contract collaboration",
            "sentence": "2 Manages individual and team quality of work toward meeting or exceeding SLA targets",
            "similarity": 0.4207
          },
          {
            "kra_text": "service contract collaboration",
            "sentence": "1 Experience in requirements analysis, gap analysis, requirements elicitation, requirements management and communication",
            "similarity": 0.3172
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 79,
        "score": 0.4494,
        "slug": "java-backend-developer",
        "total_count": null
      },
      {
        "display_name": "Kotlin Backend Developer",
        "kra_matches": [
          {
            "kra_text": "performance and reliability tuning",
            "sentence": "6 Spark Performance Monitoring and Debugging",
            "similarity": 0.6052
          },
          {
            "kra_text": "service contract design",
            "sentence": "2 Manages individual and team quality of work toward meeting or exceeding SLA targets",
            "similarity": 0.3797
          },
          {
            "kra_text": "request handling and validation",
            "sentence": "1 Experience in requirements analysis, gap analysis, requirements elicitation, requirements management and communication",
            "similarity": 0.3111
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 84,
        "score": 0.432,
        "slug": "kotlin-server-backend-developer",
        "total_count": null
      },
      {
        "display_name": "Pega Developer",
        "kra_matches": [
          {
            "kra_text": "Requirements analysis and process translation",
            "sentence": "1 Experience in requirements analysis, gap analysis, requirements elicitation, requirements management and communication",
            "similarity": 0.5528
          },
          {
            "kra_text": "defect troubleshooting and resolution",
            "sentence": "6 Spark Performance Monitoring and Debugging",
            "similarity": 0.3886
          },
          {
            "kra_text": "exception, escalation, and assignment handling",
            "sentence": "2 Manages individual and team quality of work toward meeting or exceeding SLA targets",
            "similarity": 0.3506
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 24,
        "score": 0.4307,
        "slug": "pega-developer",
        "total_count": null
      },
      {
        "display_name": "PHP Backend Developer",
        "kra_matches": [
          {
            "kra_text": "performance and reliability tuning",
            "sentence": "6 Spark Performance Monitoring and Debugging",
            "similarity": 0.6052
          },
          {
            "kra_text": "performance and reliability tuning",
            "sentence": "2 Manages individual and team quality of work toward meeting or exceeding SLA targets",
            "similarity": 0.3603
          },
          {
            "kra_text": "external system integration",
            "sentence": "1 Experience in requirements analysis, gap analysis, requirements elicitation, requirements management and communication",
            "similarity": 0.3089
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 86,
        "score": 0.4248,
        "slug": "php-backend-developer",
        "total_count": null
      }
    ],
    "skill_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": null,
        "matched_count": 4,
        "matched_skills": [
          "Apache Spark",
          "Kafka",
          "Scala",
          "Spark Streaming"
        ],
        "role_id": 2,
        "score": 0.6667,
        "slug": "data-engineer",
        "total_count": 6
      },
      {
        "display_name": "ML Engineer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Scala"
        ],
        "role_id": 3,
        "score": 0.1667,
        "slug": "ml-engineer",
        "total_count": 6
      },
      {
        "display_name": "MLOps Engineer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Scala"
        ],
        "role_id": 16,
        "score": 0.1667,
        "slug": "ml-ops-engineer",
        "total_count": 6
      },
      {
        "display_name": "Python Backend Developer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Kafka"
        ],
        "role_id": 80,
        "score": 0.1667,
        "slug": "python-backend-developer",
        "total_count": 6
      },
      {
        "display_name": "Backend Developer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Kafka"
        ],
        "role_id": 1,
        "score": 0.1667,
        "slug": "backend-engineer",
        "total_count": 6
      }
    ]
  },
  "stage4_decision": {
    "alias_collision_detected": false,
    "case": "DOMAIN",
    "chosen_role": {
      "display_name": "Data Engineer",
      "kra_matches": null,
      "matched_count": null,
      "matched_skills": null,
      "role_id": 2,
      "score": 0.96,
      "slug": "data-engineer",
      "total_count": null
    },
    "confidence": 0.96,
    "is_new_role": false,
    "llm2_fired": false,
    "llm2_reasoning": null,
    "matched_dimensions": [
      "Big Data Processing",
      "Streaming Data Engineering",
      "Spark Development",
      "Requirements Analysis and Management",
      "Performance Monitoring and Debugging",
      "Healthcare Domain Knowledge",
      "Team Collaboration"
    ],
    "matched_kras": [
      "Develop spark applications using RDD",
      "Good in writing Spark SQL",
      "Spark Streaming with Kafka",
      "Spark Performance Monitoring and Debugging",
      "Experience in requirements analysis, gap analysis",
      "requirements elicitation, requirements management and communication",
      "Manages individual and team quality of work",
      "meeting or exceeding SLA targets"
    ],
    "matched_skills": [
      "spark architecture",
      "Spark applications",
      "RDD",
      "Spark SQL",
      "Spark Streaming",
      "Kafka",
      "Spark Programming with scalaBasics",
      "Spark Performance Monitoring and Debugging",
      "requirements analysis",
      "gap analysis",
      "requirements elicitation",
      "requirements management",
      "SLA targets",
      "US Healthcare domain"
    ],
    "new_role_display_name": null,
    "new_role_slug": null,
    "queued": false,
    "reasoning": "Domain=Data Engineering \u0026 Analytics; The JD is centered on Spark application development, Spark SQL, Kafka streaming, and performance/debugging, which best matches a Data Engineer focused on big data and streaming work.",
    "sub_role": null
  },
  "stage5_updates": {
    "centroid_n_after": 377,
    "centroid_updated": true,
    "collision_log_id": null,
    "new_kra_attached": {
      "best_kra_similarity": 0.0,
      "queue_id": 1302,
      "r_and_r_preview": "1 Ability to work as a part of a team/work independently with minimal guidance\n2 US Healthcare domain knowledge will be added advantage\n\n1 Good knowledge in spark architecture\n2 Develop spark applicat",
      "role_display_name": "Data Engineer",
      "role_slug": "data-engineer",
      "status": "pending"
    },
    "new_skills_attached": [
      {
        "is_primary": true,
        "queue_id": 17890,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "RDD",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 17891,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Spark SQL",
        "status": "pending"
      }
    ],
    "queue_entry_id": null,
    "v3_pipeline_triggered": false,
    "v3_role_slug": null,
    "v3_run_id": null
  }
}
API 2 — extract-details
{
  "alias_matches": [
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 2004,
      "existing_alias_text": "Apache Spark",
      "input_term": "Apache Spark",
      "matched_canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 319,
      "existing_alias_text": "Spark Streaming",
      "input_term": "Spark Streaming",
      "matched_canonical": {
        "category_id": 5,
        "display_name": "Spark Streaming",
        "id": 121,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "spark-streaming",
        "sub_category_id": 94,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 173,
      "existing_alias_text": "Kafka",
      "input_term": "Kafka",
      "matched_canonical": {
        "category_id": 3,
        "display_name": "Kafka",
        "id": 36,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "kafka",
        "sub_category_id": 3533,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 272,
      "existing_alias_text": "Scala",
      "input_term": "Scala",
      "matched_canonical": {
        "category_id": 6,
        "display_name": "Scala",
        "id": 102,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LANGUAGE",
        "slug": "scala",
        "sub_category_id": 96,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    }
  ],
  "candidate_roles": [
    {
      "display_name": "Data Engineer",
      "id": 2,
      "rationale": null,
      "role_archetype": null,
      "slug": "data-engineer",
      "source": "db"
    },
    {
      "display_name": ".NET Backend Developer",
      "id": 83,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "dotnet-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Go Backend Developer",
      "id": 81,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "go-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Kotlin Backend Developer",
      "id": 84,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "kotlin-server-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Node.js Backend Developer",
      "id": 82,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "node-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Scala Backend Developer",
      "id": 87,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "scala-backend-developer",
      "source": "db"
    },
    {
      "display_name": "PHP Backend Developer",
      "id": 86,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "php-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Python Backend Developer",
      "id": 80,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "python-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Ruby Backend Developer",
      "id": 85,
      "rationale": null,
      "role_archetype": "Engineering",
      "slug": "ruby-backend-developer",
      "source": "db"
    },
    {
      "display_name": "Backend Developer",
      "id": 1,
      "rationale": null,
      "role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
      "slug": "backend-engineer",
      "source": "db"
    },
    {
      "display_name": "ML Engineer",
      "id": 3,
      "rationale": null,
      "role_archetype": null,
      "slug": "ml-engineer",
      "source": "db"
    },
    {
      "display_name": "MLOps Engineer",
      "id": 16,
      "rationale": null,
      "role_archetype": null,
      "slug": "ml-ops-engineer",
      "source": "db"
    }
  ],
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Domain=Data Engineering \u0026 Analytics; The JD is centered on Spark application development, Spark SQL, Kafka streaming, and performance/debugging, which best matches a Data Engineer focused on big data and streaming work.",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "dimensions": [
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "ETL and ELT Tooling",
        "id": 24,
        "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
        "slug": "etl-and-elt-tooling",
        "source": "db"
      },
      "input_skill": "Apache Spark",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Stream Processing Systems",
        "id": 25,
        "rationale": "Technologies for processing event streams and near-real-time data flows. This includes stream transformations, windowing, stateful processing, and stream-to-warehouse delivery patterns.",
        "slug": "stream-processing-systems",
        "source": "db"
      },
      "input_skill": "Spark Streaming",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Asynchronous Messaging and Event Streaming",
        "id": 297,
        "rationale": "Asynchronous communication patterns and broker technologies used to decouple backend services and move work off the request path. Includes queues, pub/sub, event streams, consumer groups, dead-letter queues, and delivery semantics across systems such as Kafka, RabbitMQ, NATS, SQS/SNS, Pulsar, and ActiveMQ.",
        "slug": "asynchronous-messaging-and-event-streaming",
        "source": "db"
      },
      "input_skill": "Kafka",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": ".NET Backend Developer",
          "id": 83,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "dotnet-backend-developer",
          "source": "db"
        },
        {
          "display_name": "Go Backend Developer",
          "id": 81,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "go-backend-developer",
          "source": "db"
        },
        {
          "display_name": "Kotlin Backend Developer",
          "id": 84,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "kotlin-server-backend-developer",
          "source": "db"
        },
        {
          "display_name": "Node.js Backend Developer",
          "id": 82,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "node-backend-developer",
          "source": "db"
        },
        {
          "display_name": "Scala Backend Developer",
          "id": 87,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "scala-backend-developer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Messaging and Background Jobs",
        "id": 291,
        "rationale": "Asynchronous processing patterns and worker systems used to decouple backend work from request handling. This is a coherent cluster because the role supports background jobs, retries, and deferred processing.",
        "slug": "messaging-and-background-jobs",
        "source": "db"
      },
      "input_skill": "Kafka",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "PHP Backend Developer",
          "id": 86,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "php-backend-developer",
          "source": "db"
        },
        {
          "display_name": "Python Backend Developer",
          "id": 80,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "python-backend-developer",
          "source": "db"
        },
        {
          "display_name": "Ruby Backend Developer",
          "id": 85,
          "rationale": null,
          "role_archetype": "Engineering",
          "slug": "ruby-backend-developer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Messaging and Event Streaming",
        "id": 8,
        "rationale": "Transport-layer systems used to move events and decouple producers from consumers. Data engineers use these systems to ingest, buffer, and distribute event data before downstream processing.",
        "slug": "messaging-and-event-streaming",
        "source": "db"
      },
      "input_skill": "Kafka",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Backend Developer",
          "id": 1,
          "rationale": null,
          "role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
          "slug": "backend-engineer",
          "source": "db"
        },
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for Data Work",
        "id": 21,
        "rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
        "slug": "programming-languages-for-data-work",
        "source": "db"
      },
      "input_skill": "Scala",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Programming Languages for ML Systems",
        "id": 39,
        "rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
        "slug": "programming-languages-for-ml-systems",
        "source": "db"
      },
      "input_skill": "Scala",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "ML Engineer",
          "id": 3,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-engineer",
          "source": "db"
        },
        {
          "display_name": "MLOps Engineer",
          "id": 16,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-ops-engineer",
          "source": "db"
        }
      ]
    }
  ],
  "input_final_skills": [
    "Apache Spark",
    "RDD",
    "Spark SQL",
    "Spark Streaming",
    "Kafka",
    "Scala"
  ],
  "input_llm_skills": [
    "Apache Spark",
    "RDD",
    "Spark SQL",
    "Spark Streaming",
    "Kafka",
    "Scala"
  ],
  "new_aliases_persisted": 0,
  "run_id": "ed0d70d6-8ed9-4c78-8000-b857d0c839e3",
  "skills_detail": [
    {
      "aliases_in_db": [
        {
          "alias_text": "Apache Spark",
          "alias_type": "CANONICAL",
          "id": 2004,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "apache spark 3",
          "alias_type": "VERSION",
          "id": 2006,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark",
          "alias_type": "VERSION",
          "id": 2510,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3",
          "alias_type": "VERSION",
          "id": 2007,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3.x",
          "alias_type": "VERSION",
          "id": 2009,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark3",
          "alias_type": "VERSION",
          "id": 2008,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "ETL and ELT Tooling",
            "id": 24,
            "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
            "slug": "etl-and-elt-tooling",
            "source": "db"
          },
          "input_skill": "Apache Spark",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Apache Spark",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "RDD",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Big Data Frameworks",
          "skill_nature": "CONCEPT",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "rdd",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Spark SQL",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Big Data Frameworks",
          "skill_nature": "TOOL",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "spark-sql",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "DStreams",
          "alias_type": "VERSION",
          "id": 320,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Spark 2.x",
          "alias_type": "VERSION",
          "id": 321,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Spark 3.x",
          "alias_type": "VERSION",
          "id": 322,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Spark Streaming",
          "alias_type": "VERSION",
          "id": 319,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Spark Structured Streaming",
          "alias_type": "VERSION",
          "id": 325,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "Structured Streaming",
          "alias_type": "VERSION",
          "id": 324,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 5,
        "display_name": "Spark Streaming",
        "id": 121,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "spark-streaming",
        "sub_category_id": 94,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Stream Processing Systems",
            "id": 25,
            "rationale": "Technologies for processing event streams and near-real-time data flows. This includes stream transformations, windowing, stateful processing, and stream-to-warehouse delivery patterns.",
            "slug": "stream-processing-systems",
            "source": "db"
          },
          "input_skill": "Spark Streaming",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Spark Streaming",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Kafka",
          "alias_type": "CANONICAL",
          "id": 173,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 3,
        "display_name": "Kafka",
        "id": 36,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "TOOL",
        "slug": "kafka",
        "sub_category_id": 3533,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Asynchronous Messaging and Event Streaming",
            "id": 297,
            "rationale": "Asynchronous communication patterns and broker technologies used to decouple backend services and move work off the request path. Includes queues, pub/sub, event streams, consumer groups, dead-letter queues, and delivery semantics across systems such as Kafka, RabbitMQ, NATS, SQS/SNS, Pulsar, and ActiveMQ.",
            "slug": "asynchronous-messaging-and-event-streaming",
            "source": "db"
          },
          "input_skill": "Kafka",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": ".NET Backend Developer",
              "id": 83,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "dotnet-backend-developer",
              "source": "db"
            },
            {
              "display_name": "Go Backend Developer",
              "id": 81,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "go-backend-developer",
              "source": "db"
            },
            {
              "display_name": "Kotlin Backend Developer",
              "id": 84,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "kotlin-server-backend-developer",
              "source": "db"
            },
            {
              "display_name": "Node.js Backend Developer",
              "id": 82,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "node-backend-developer",
              "source": "db"
            },
            {
              "display_name": "Scala Backend Developer",
              "id": 87,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "scala-backend-developer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Messaging and Background Jobs",
            "id": 291,
            "rationale": "Asynchronous processing patterns and worker systems used to decouple backend work from request handling. This is a coherent cluster because the role supports background jobs, retries, and deferred processing.",
            "slug": "messaging-and-background-jobs",
            "source": "db"
          },
          "input_skill": "Kafka",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "PHP Backend Developer",
              "id": 86,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "php-backend-developer",
              "source": "db"
            },
            {
              "display_name": "Python Backend Developer",
              "id": 80,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "python-backend-developer",
              "source": "db"
            },
            {
              "display_name": "Ruby Backend Developer",
              "id": 85,
              "rationale": null,
              "role_archetype": "Engineering",
              "slug": "ruby-backend-developer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Messaging and Event Streaming",
            "id": 8,
            "rationale": "Transport-layer systems used to move events and decouple producers from consumers. Data engineers use these systems to ingest, buffer, and distribute event data before downstream processing.",
            "slug": "messaging-and-event-streaming",
            "source": "db"
          },
          "input_skill": "Kafka",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Backend Developer",
              "id": 1,
              "rationale": null,
              "role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
              "slug": "backend-engineer",
              "source": "db"
            },
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Kafka",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Scala",
          "alias_type": "CANONICAL",
          "id": 272,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 6,
        "display_name": "Scala",
        "id": 102,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "LANGUAGE",
        "slug": "scala",
        "sub_category_id": 96,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for Data Work",
            "id": 21,
            "rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
            "slug": "programming-languages-for-data-work",
            "source": "db"
          },
          "input_skill": "Scala",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Programming Languages for ML Systems",
            "id": 39,
            "rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
            "slug": "programming-languages-for-ml-systems",
            "source": "db"
          },
          "input_skill": "Scala",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "ML Engineer",
              "id": 3,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-engineer",
              "source": "db"
            },
            {
              "display_name": "MLOps Engineer",
              "id": 16,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-ops-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Scala",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    }
  ],
  "unmatched_skills": [
    "RDD",
    "Spark SQL"
  ]
}
API 3 — final-role-output
{
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Domain=Data Engineering \u0026 Analytics; The JD is centered on Spark application development, Spark SQL, Kafka streaming, and performance/debugging, which best matches a Data Engineer focused on big data and streaming work.",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "chosen_role_resolution": "in_db",
  "final_input_skills": [
    {
      "skill": "Apache Spark",
      "tag": "in_db"
    },
    {
      "skill": "RDD",
      "tag": "new"
    },
    {
      "skill": "Spark SQL",
      "tag": "new"
    },
    {
      "skill": "Spark Streaming",
      "tag": "in_db"
    },
    {
      "skill": "Kafka",
      "tag": "in_db"
    },
    {
      "skill": "Scala",
      "tag": "in_db"
    }
  ],
  "llm_cost_api1_usd": null,
  "llm_cost_api2_usd": null,
  "llm_cost_api3_usd": null,
  "llm_cost_total_usd": null,
  "persistence": {
    "items": [
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "ETL and ELT Tooling",
          "id": 24,
          "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
          "slug": "etl-and-elt-tooling",
          "source": "db"
        },
        "dimension_id": 24,
        "input_skill": "Apache Spark",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 1350,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Stream Processing Systems",
          "id": 25,
          "rationale": "Technologies for processing event streams and near-real-time data flows. This includes stream transformations, windowing, stateful processing, and stream-to-warehouse delivery patterns.",
          "slug": "stream-processing-systems",
          "source": "db"
        },
        "dimension_id": 25,
        "input_skill": "Spark Streaming",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 121,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Asynchronous Messaging and Event Streaming",
          "id": 297,
          "rationale": "Asynchronous communication patterns and broker technologies used to decouple backend services and move work off the request path. Includes queues, pub/sub, event streams, consumer groups, dead-letter queues, and delivery semantics across systems such as Kafka, RabbitMQ, NATS, SQS/SNS, Pulsar, and ActiveMQ.",
          "slug": "asynchronous-messaging-and-event-streaming",
          "source": "db"
        },
        "dimension_id": 297,
        "input_skill": "Kafka",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": ".NET Backend Developer",
            "id": 83,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "dotnet-backend-developer",
            "source": "db"
          },
          {
            "display_name": "Go Backend Developer",
            "id": 81,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "go-backend-developer",
            "source": "db"
          },
          {
            "display_name": "Kotlin Backend Developer",
            "id": 84,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "kotlin-server-backend-developer",
            "source": "db"
          },
          {
            "display_name": "Node.js Backend Developer",
            "id": 82,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "node-backend-developer",
            "source": "db"
          },
          {
            "display_name": "Scala Backend Developer",
            "id": 87,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "scala-backend-developer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 36,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Messaging and Background Jobs",
          "id": 291,
          "rationale": "Asynchronous processing patterns and worker systems used to decouple backend work from request handling. This is a coherent cluster because the role supports background jobs, retries, and deferred processing.",
          "slug": "messaging-and-background-jobs",
          "source": "db"
        },
        "dimension_id": 291,
        "input_skill": "Kafka",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "PHP Backend Developer",
            "id": 86,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "php-backend-developer",
            "source": "db"
          },
          {
            "display_name": "Python Backend Developer",
            "id": 80,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "python-backend-developer",
            "source": "db"
          },
          {
            "display_name": "Ruby Backend Developer",
            "id": 85,
            "rationale": null,
            "role_archetype": "Engineering",
            "slug": "ruby-backend-developer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 36,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Messaging and Event Streaming",
          "id": 8,
          "rationale": "Transport-layer systems used to move events and decouple producers from consumers. Data engineers use these systems to ingest, buffer, and distribute event data before downstream processing.",
          "slug": "messaging-and-event-streaming",
          "source": "db"
        },
        "dimension_id": 8,
        "input_skill": "Kafka",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Backend Developer",
            "id": 1,
            "rationale": null,
            "role_archetype": "A Backend Engineer designs, builds, and maintains the server-side logic and data handling that power applications and services. They focus on implementing reliable business functionality, integrating with other systems, and ensuring the backend is scalable, maintainable, and observable.",
            "slug": "backend-engineer",
            "source": "db"
          },
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 36,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for Data Work",
          "id": 21,
          "rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
          "slug": "programming-languages-for-data-work",
          "source": "db"
        },
        "dimension_id": 21,
        "input_skill": "Scala",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 102,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Programming Languages for ML Systems",
          "id": 39,
          "rationale": "Languages used to build training code, inference services, evaluation jobs, and ML glue code. This is the primary implementation surface for ML engineers across experimentation and productionization.",
          "slug": "programming-languages-for-ml-systems",
          "source": "db"
        },
        "dimension_id": 39,
        "input_skill": "Scala",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "ML Engineer",
            "id": 3,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-engineer",
            "source": "db"
          },
          {
            "display_name": "MLOps Engineer",
            "id": 16,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-ops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 102,
        "skill_tag": "in_db",
        "skipped_reason": null
      }
    ],
    "new_skills_created": 0,
    "role_dimension_saved": 0,
    "skill_dimension_saved": 0,
    "skipped": 0
  },
  "planner_output": null,
  "run_id": "ed0d70d6-8ed9-4c78-8000-b857d0c839e3"
}