Pipeline run

e7cc83d4-091c-4d28-b237-a18cae3b08f1

Pipeline LLM cost (USD)

API 1: $0.0065 · API 2: $0.0000 · API 3: $0.0000 · Total: $0.0065

Client output enrichment

v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description

SPARSE JD · role baseline loaded · sources · ai_index: role_baseline · nature_of_work: jd · tech_stack_maturity: role_baseline

Nature of work · Data transformation and modeling

Develop Spark/PySpark jobs to work with structured data, likely on cloud platforms, with big-data processing as a strong plus.

"Excellent in Spark/Pyspark"

Tech stack maturity

Modern Cloud Native

Data engineers typically build cloud-based batch and streaming pipelines and warehouse models, but AI is usually incidental rather than central to the role.

AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)

1.20 / 5

· Title match

· Has AI skill

· AI skill (primary)

· AI skill (secondary)

· On AI team

· Builds AI products

vocab breakdown (legacy)

Assistants (×1): —

Frameworks (×2): —

Models / concepts (×3): —

Evidence — skills matched in JD (2)

Spark · PySpark

Skill cluster (2 dimension groups, role-scoped)

ETL and ELT Tooling

Spark

Cross-cutting / unaligned

PySpark

KRA description

Excellent in Spark/Pyspark
Experience working with structured data
Exposure to cloud
Bid data will be a big plus

Signals

Signal	Role	Score
Skill	data-engineer	0.50
Alias	—	—
KRA	data-engineer	0.53
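The skill signal of 0.50 is consistent with a simple overlap ratio: the API 1 payload for this run reports `matched_count: 1` against `total_count: 2` for `data-engineer`. A minimal sketch, assuming the score is the matched count divided by the total count (the actual formula is not shown in this run, and `skill_overlap_score` is a hypothetical helper):

```python
def skill_overlap_score(matched_count: int, total_count: int) -> float:
    """Share of counted skills that matched the role (assumed formula)."""
    if total_count <= 0:
        # Nothing to compare against, so report no overlap instead of dividing by zero.
        return 0.0
    return matched_count / total_count


# Values taken from skill_match_roles for data-engineer in this run.
score = skill_overlap_score(matched_count=1, total_count=2)
print(f"{score:.2f}")
```

The KRA signal (0.53) is a similarity value reported by the pipeline and is not derived from this ratio.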

Post-classification

Centroid updated · n=375

Alias collision log: —

New-role queue: —

New skills captured: 1

New KRA captured: —

Captured for admin review

PySpark (primary) ↔ Data Engineer · pending

Status: completed · Created: 2026-05-27T16:00:45.283441Z · Updated: 2026-05-27T16:01:17.002721Z · API 3 duration: 6812 ms
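The Created and Updated timestamps give the wall-clock span of the run, which can be compared with the reported API 3 duration. A small sketch using only the values on the status line (`parse_utc` is a hypothetical helper; treating Updated minus Created as the end-to-end run time is an assumption):

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


created = parse_utc("2026-05-27T16:00:45.283441Z")
updated = parse_utc("2026-05-27T16:01:17.002721Z")

elapsed_s = (updated - created).total_seconds()
api3_s = 6812 / 1000  # API 3 duration from the status line, in seconds

print(f"elapsed: {elapsed_s:.3f} s, API 3 share: {api3_s / elapsed_s:.1%}")
```

On these values the run spans about 31.7 s, so API 3 accounts for roughly a fifth of it.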

Flow Current 3-step pipeline

1 POST /skills/extract-from-jd

2 POST /skills/extract-details

3 POST /skills/final-role-output
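The three steps above can be sketched as a chain of POST requests. Only the three paths come from this page. The base URL, the `jd_text` request field, and the idea that each response is passed on as the next request body are assumptions made for illustration, not the service's documented contract.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # hypothetical host; not shown on this page

STEPS = [
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
]


def build_request(path: str, payload: dict) -> Request:
    """Build (but do not send) a JSON POST request for one pipeline step."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_pipeline(jd_text: str, send) -> dict:
    """Call the steps in order.

    `send` is any callable that takes a Request and returns the decoded
    JSON response, which keeps the sketch independent of an HTTP client.
    """
    payload = {"jd_text": jd_text}  # assumed field name
    for path in STEPS:
        payload = send(build_request(path, payload))
    return payload
```

Passing `send` in as a parameter makes it easy to dry-run the chain with a stub before pointing it at a real endpoint.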

Role Chosen role & resolution

Data Engineer

domain · Data Engineering & Analytics · case: DOMAIN

slug: data-engineer · id: 2 · source: db

Domain=Data Engineering & Analytics; The JD centers on Spark/PySpark, structured data, and cloud exposure, which aligns best with data engineering responsibilities.

Matched skills

Spark · Pyspark · structured data · cloud · big data

Matched dimensions

Big Data Processing · Cloud Data Engineering · Structured Data Handling

Matched KRAs

Excellent in Spark/Pyspark · Experience working with structured data · Exposure to cloud

Resolution: in_db — role exists in library; skill↔dim and role↔dim links saved when applicable.

New skills: 0

Skill↔dim saved: 0

Role↔dim saved: 0

Skipped: 1

Job description

Experience: 4+ yrs
Job Location: Bangalore
Notice Period: Immediate to 20 days Joining






Mandatory skills:


Excellent in Spark/Pyspark
Experience working with structured data
Exposure to cloud
Bid data will be a big plus




About US:
Thank you for expressing interest in a career with TEKsystems Global Services (TGS). At TEKsystems Global Services, we believe in lasting careers with room to run. Giving our people limitless opportunity to make an impact and enable the world’s largest companies to transform how they do business.
We’re the professional services division of TEKsystems, accounting for over $1 Billion in revenue. We’re one of India’s fastest growing full-stack services companies, with about 5500 full time employees across the globe of which 2000 are in India (Bangalore and Hyderabad). TGS operates through multiple solution centers across North America, EMEA and APAC, including locations like Dallas, Redmond, Bloomington, Baltimore, Maryland, Europe (Amsterdam, London), Canada (Montreal), and Philippines (Manila).
To support key areas of our growth, we acquired two companies to join our family. One North, a full-service digital agency and 1Strategy a premiere AWS service provider. Through One North we’re able to help our customers elevate their customer, brand and UI/UX experiences. And, through 1Strategy, we’re able to ensure our customers can take full advantage of the complete AWS solutions portfolio.


Certifications
Joining the elite team at TEKsystems Global Services, gives you runway to grow with us. Upskill faster. Surpass your peers. Even earning certifications:
• ISO 27001
• HITRUST Certification
• PCI DSS Certification
• HIPPA Compliance
• PMP certified Project Managers


Partnership
The world’s leading technology and software providers partner with us because of our scale, full-stack capabilities and speed—giving you the room to specialize and sharpen your skills on the most innovative platforms and game-changing technology.
• Amazon Webservices Advanced Consulting Partner
• Microsoft Gold Partner
• Google Cloud Premier Partner
• Other top partnerships – Snowflake, RedHat, Oracle platinum partner, Salesforce Gold partner, ServiceNow Managed Service Provider and reseller, MuleSoft, Tableau system Integrator, SailPoint Systems, Cloudera Specialized, Hortornworks Community Partner
TGS offers a wide range of IT services including but not limited to delivering high end business consulting services and building applications. This is done through multiple centers of excellence including:
• Data Analytics
• Data Insights
• Enterprise Integration
• Enterprise cloud application (Salesforce and Oracle)
• Transformation Operation Management
• Continuous Development
• Continuous Testing
• Transformation Devops Cloud


For more details, please visit us on
https://www.teksystems.com/
https://www.teksystems.com/en-in/services
https://www.teksystems.com/careers-in-india
https://www.linkedin.com/company/teksystems-global-services-india/mycompany/

Skills from this JD

Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
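The merge described here can be sketched as a join on the skill name as it appeared in the JD. The field names (`final_skills`, `skills_detail`, `persistence.items`, `input_skill`, `skill_tag`, `matched_via`) are taken from the raw payloads on this page. The function `merge_skill_rows` and the shape of the output row are hypothetical, and the sample data is trimmed to the fields used.

```python
def merge_skill_rows(api1: dict, api2: dict, api3: dict) -> list:
    """Join the three responses on the input skill name."""
    details = {d["input_skill"]: d for d in api2.get("skills_detail", [])}

    # One skill can have several (skill x dimension) persistence items.
    items_by_skill = {}
    for item in api3.get("persistence", {}).get("items", []):
        items_by_skill.setdefault(item["input_skill"], []).append(item)

    rows = []
    for skill in api1.get("final_skills", []):
        name = skill["skill_name"]
        detail = details.get(name, {})
        canonical = detail.get("canonical") or {}
        rows.append({
            "skill": name,
            "is_primary": skill["is_primary"],
            "canonical_slug": canonical.get("slug"),
            "matched_via": detail.get("matched_via"),
            "persistence_tags": [i["skill_tag"] for i in items_by_skill.get(name, [])],
        })
    return rows


# Trimmed copies of this run's payloads.
api1 = {"final_skills": [{"skill_name": "Spark", "is_primary": True},
                         {"skill_name": "PySpark", "is_primary": True}]}
api2 = {"skills_detail": [
    {"input_skill": "Spark", "matched_via": "alias",
     "canonical": {"slug": "apache-spark"}},
    {"input_skill": "PySpark", "matched_via": "embedding_alias",
     "canonical": {"slug": "apache-spark"}},
]}
api3 = {"persistence": {"items": [
    {"input_skill": "Spark", "skill_tag": "in_db"},
    {"input_skill": "PySpark", "skill_tag": "new"},
]}}

for row in merge_skill_rows(api1, api2, api3):
    print(row)
```

Both skills resolve to the same canonical slug here, so the join key has to be the input skill name and not the canonical id.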

Spark · Primary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)

Canonical: Apache Spark id=1350 · apache-spark

Aliases — catalog

Apache Spark (CANONICAL)
apache spark 3 (VERSION)
spark (VERSION)
spark 3 (VERSION)
spark 3.x (VERSION)
spark3 (VERSION)
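Every alias in this catalog entry carries `match_strategy: CASE_INSENSITIVE` in the API 2 payload. A minimal sketch of such a lookup, assuming it means an exact match after case folding (`find_alias` is a hypothetical helper; the alias texts and ids are copied from this run):

```python
# alias_text -> alias id for the apache-spark canonical, as listed in this run
ALIASES = {
    "Apache Spark": 2004,
    "apache spark 3": 2006,
    "spark": 2510,
    "spark 3": 2007,
    "spark 3.x": 2009,
    "spark3": 2008,
}


def find_alias(term: str, aliases: dict):
    """Return (alias_text, alias_id) for a case-insensitive exact match, else None."""
    folded = {text.casefold(): (text, alias_id) for text, alias_id in aliases.items()}
    return folded.get(term.strip().casefold())


print(find_alias("Spark", ALIASES))    # matches the stored alias "spark"
print(find_alias("PySpark", ALIASES))  # no exact alias for this term
```

This agrees with the run: "Spark" resolves through alias 2510, while "PySpark" has no exact alias and is reported as matched via `embedding_alias`.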

Context tags (catalog)

Apache Kafka · Cluster Manager · DAGScheduler · Data Lake · DataFrame · ETL · Hadoop · MLlib · Machine Learning · PySpark · RDD · Scala · Spark SQL · Spark Streaming · SparkSession

Stored enrichment (catalog DB)

Category: Framework
Sub-category: Distributed Data Processing Framework
Vendor: Apache Software Foundation
License: apache_2
Year introduced: 2010
Confidence: 0.94
Version strategy: SEPARATE_ENTITY
Version tag: 3.x

Maturity reasoning: Apache Spark appears in many data engineering JDs and remains a standard for distributed ETL/ELT; its GitHub and vendor ecosystem activity stay strong, with Databricks and cloud platforms still promoting it.

Skill profile (library / DB)

Skill nature: FRAMEWORK
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 5
Sub-category id: 1021
Extractable: True
Also category: False

Dimensions (API 2 worklist)

ETL and ELT Tooling · Catalog dimension · db id 24

Library dimension (catalog)

Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
ETL and ELT Tooling etl-and-elt-tooling	✓	✓	Existing dimension (library) · Role↔dimension saved

PySpark · Primary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)

Canonical: Apache Spark id=1350 · apache-spark

Aliases — catalog

Apache Spark (CANONICAL)
apache spark 3 (VERSION)
spark (VERSION)
spark 3 (VERSION)
spark 3.x (VERSION)
spark3 (VERSION)

Context tags (catalog)

Apache Kafka · Cluster Manager · DAGScheduler · Data Lake · DataFrame · ETL · Hadoop · MLlib · Machine Learning · PySpark · RDD · Scala · Spark SQL · Spark Streaming · SparkSession

Stored enrichment (catalog DB)

Category: Framework
Sub-category: Distributed Data Processing Framework
Vendor: Apache Software Foundation
License: apache_2
Year introduced: 2010
Confidence: 0.94
Version strategy: SEPARATE_ENTITY
Version tag: 3.x

Skill profile (library / DB)

Skill nature: FRAMEWORK
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 5
Sub-category id: 1021
Extractable: True
Also category: False

Dimensions (API 2 worklist)

ETL and ELT Tooling · Catalog dimension · db id 24

Library dimension (catalog)

Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
ETL and ELT Tooling etl-and-elt-tooling	—	—	Skipped — no persistable v3 meta for new skill (skill_not_in_db_v3_proposed)

All API 3 persistence rows

Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.

Skill	Tag	Dimension	Skill↔dim	Role↔dim	Outcome	Notes
Spark	in_db	ETL and ELT Tooling etl-and-elt-tooling	✓	✓	Existing dimension (library) · Role↔dimension saved
PySpark	new	ETL and ELT Tooling etl-and-elt-tooling	—	—	Skipped — no persistable v3 meta for new skill	skill_not_in_db_v3_proposed
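The grid can be summarized by recounting the per-item flags. In this run the aggregate counters in the API 3 payload report `skill_dimension_saved: 0` and `role_dimension_saved: 0`, while the Spark item has both flags set to true. Whether the aggregates count only newly written links is not stated, so the sketch below simply recounts from the items (`recount` is a hypothetical helper; the items are trimmed to the fields used):

```python
from collections import Counter


def recount(items: list) -> Counter:
    """Count saved links and skips directly from the persistence items."""
    counts = Counter(skill_dimension_saved=0, role_dimension_saved=0, skipped=0)
    for item in items:
        counts["skill_dimension_saved"] += bool(item.get("skill_dimension_saved"))
        counts["role_dimension_saved"] += bool(item.get("role_dimension_saved"))
        # An item counts as skipped when it carries a skipped_reason.
        counts["skipped"] += item.get("skipped_reason") is not None
    return counts


# Trimmed copies of the two persistence items from this run.
items = [
    {"input_skill": "Spark", "skill_dimension_saved": True,
     "role_dimension_saved": True, "skipped_reason": None},
    {"input_skill": "PySpark", "skill_dimension_saved": False,
     "role_dimension_saved": False, "skipped_reason": "skill_not_in_db_v3_proposed"},
]

print(dict(recount(items)))
```

Comparing this recount with the aggregate block is a quick way to spot runs where the summary and the item rows disagree.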

Library artifacts (this run)

Kind	Detail	DB id
dimension_skill_link_proposed	PySpark ↔ ETL and ELT Tooling
role_dimension_link_proposed	Data Engineer ↔ ETL and ELT Tooling

nano JD Parser — gpt-4.1-nano

Company: TEKsystems Global Services

Experience: 4+ yrs

Domain: IT Services & Consulting

Location: Bangalore, India

JD type: pass

Certifications

ISO 27001 · HITRUST Certification · PCI DSS Certification · HIPPA Compliance · PMP certified Project Managers

Raw JSON

{
  "JD_type": "pass",
  "about_company": {
    "source_marker": {
      "first_5_words": "Thank you for expressing interest",
      "last_5_words": "full advantage of the complete AWS solutions portfolio."
    },
    "text": "Thank you for expressing interest in a career with TEKsystems Global Services (TGS). At TEKsystems Global Services, we believe in lasting careers with room to run. Giving our people limitless opportunity to make an impact and enable the world\u2019s largest companies to transform how they do business. We\u2019re the professional services division of TEKsystems, accounting for over $1 Billion in revenue. We\u2019re one of India\u2019s fastest growing full-stack services companies, with about 5500 full time employees across the globe of which 2000 are in India (Bangalore and Hyderabad). TGS operates through multiple solution centers across North America, EMEA and APAC, including locations like Dallas, Redmond, Bloomington, Baltimore, Maryland, Europe (Amsterdam, London), Canada (Montreal), and Philippines (Manila). To support key areas of our growth, we acquired two companies to join our family. One North, a full-service digital agency and 1Strategy a premiere AWS service provider. Through One North we\u2019re able to help our customers elevate their customer, brand and UI/UX experiences. And, through 1Strategy, we\u2019re able to ensure our customers can take full advantage of the complete AWS solutions portfolio.",
    "word_count": 164
  },
  "archetype_override_applied": true,
  "archetype_override_matched_skills": [
    "Snowflake",
    "Tableau",
    "AWS",
    "Make",
    "DevOps",
    "UI/UX",
    "Analytics",
    "Cloud",
    "ISO 27001",
    "Room",
    "Provider",
    "Location",
    "PCI DSS"
  ],
  "certifications": [
    "ISO 27001",
    "HITRUST Certification",
    "PCI DSS Certification",
    "HIPPA Compliance",
    "PMP certified Project Managers"
  ],
  "company_name": "TEKsystems Global Services",
  "ctc": null,
  "domain": {
    "primary": {
      "aliases": [
        "ITES",
        "BPO"
      ],
      "domain": "IT Services \u0026 Consulting"
    },
    "secondary": null
  },
  "education": [],
  "experience": {
    "max": null,
    "min": 4,
    "raw": "4+ yrs"
  },
  "job_locations": [
    {
      "aliases": [
        "Bengaluru"
      ],
      "city": "Bangalore",
      "country": "India",
      "state": null,
      "work_mode": null
    }
  ],
  "role": null,
  "role_aliases": [],
  "role_archetype": "Engineering",
  "roles_and_responsibilities": [
    {
      "bullet_count": 4,
      "heading": "Mandatory skills",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "Excellent in Spark/Pyspark Experience",
        "last_5_words": "data will be a big plus"
      },
      "text": "Excellent in Spark/Pyspark\nExperience working with structured data\nExposure to cloud\nBid data will be a big plus",
      "word_count": 24
    }
  ],
  "urls": [
    {
      "type": "website",
      "url": "https://www.teksystems.com/"
    },
    {
      "type": "other",
      "url": "https://www.teksystems.com/en-in/services"
    },
    {
      "type": "careers",
      "url": "https://www.teksystems.com/careers-in-india"
    },
    {
      "type": "linkedin",
      "url": "https://www.linkedin.com/company/teksystems-global-services-india/mycompany/"
    }
  ]
}

API 1 — extract-from-jd

{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "Spark"
    },
    {
      "is_primary": true,
      "skill_name": "PySpark"
    }
  ],
  "jd_role": null,
  "nano_parsed": {
    "JD_type": "pass",
    "about_company": {
      "source_marker": {
        "first_5_words": "Thank you for expressing interest",
        "last_5_words": "full advantage of the complete AWS solutions portfolio."
      },
      "text": "Thank you for expressing interest in a career with TEKsystems Global Services (TGS). At TEKsystems Global Services, we believe in lasting careers with room to run. Giving our people limitless opportunity to make an impact and enable the world\u2019s largest companies to transform how they do business. We\u2019re the professional services division of TEKsystems, accounting for over $1 Billion in revenue. We\u2019re one of India\u2019s fastest growing full-stack services companies, with about 5500 full time employees across the globe of which 2000 are in India (Bangalore and Hyderabad). TGS operates through multiple solution centers across North America, EMEA and APAC, including locations like Dallas, Redmond, Bloomington, Baltimore, Maryland, Europe (Amsterdam, London), Canada (Montreal), and Philippines (Manila). To support key areas of our growth, we acquired two companies to join our family. One North, a full-service digital agency and 1Strategy a premiere AWS service provider. Through One North we\u2019re able to help our customers elevate their customer, brand and UI/UX experiences. And, through 1Strategy, we\u2019re able to ensure our customers can take full advantage of the complete AWS solutions portfolio.",
      "word_count": 164
    },
    "archetype_override_applied": true,
    "archetype_override_matched_skills": [
      "Snowflake",
      "Tableau",
      "AWS",
      "Make",
      "DevOps",
      "UI/UX",
      "Analytics",
      "Cloud",
      "ISO 27001",
      "Room",
      "Provider",
      "Location",
      "PCI DSS"
    ],
    "certifications": [
      "ISO 27001",
      "HITRUST Certification",
      "PCI DSS Certification",
      "HIPPA Compliance",
      "PMP certified Project Managers"
    ],
    "company_name": "TEKsystems Global Services",
    "ctc": null,
    "domain": {
      "primary": {
        "aliases": [
          "ITES",
          "BPO"
        ],
        "domain": "IT Services \u0026 Consulting"
      },
      "secondary": null
    },
    "education": [],
    "experience": {
      "max": null,
      "min": 4,
      "raw": "4+ yrs"
    },
    "job_locations": [
      {
        "aliases": [
          "Bengaluru"
        ],
        "city": "Bangalore",
        "country": "India",
        "state": null,
        "work_mode": null
      }
    ],
    "role": null,
    "role_aliases": [],
    "role_archetype": "Engineering",
    "roles_and_responsibilities": [
      {
        "bullet_count": 4,
        "heading": "Mandatory skills",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "Excellent in Spark/Pyspark Experience",
          "last_5_words": "data will be a big plus"
        },
        "text": "Excellent in Spark/Pyspark\nExperience working with structured data\nExposure to cloud\nBid data will be a big plus",
        "word_count": 24
      }
    ],
    "urls": [
      {
        "type": "website",
        "url": "https://www.teksystems.com/"
      },
      {
        "type": "other",
        "url": "https://www.teksystems.com/en-in/services"
      },
      {
        "type": "careers",
        "url": "https://www.teksystems.com/careers-in-india"
      },
      {
        "type": "linkedin",
        "url": "https://www.linkedin.com/company/teksystems-global-services-india/mycompany/"
      }
    ]
  },
  "rejected": false,
  "rejection_reason": null,
  "run_id": "e7cc83d4-091c-4d28-b237-a18cae3b08f1",
  "stage3_signals": {
    "alias_found": false,
    "alias_match_roles": [],
    "kra_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": [
          {
            "kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
            "sentence": "Excellent in Spark/Pyspark\nExperience working with structured data\nExposure to cloud\nBid data will be a big plus",
            "similarity": 0.5312
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 2,
        "score": 0.5312,
        "slug": "data-engineer",
        "total_count": null
      },
      {
        "display_name": "ML Engineer",
        "kra_matches": [
          {
            "kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
            "sentence": "Excellent in Spark/Pyspark\nExperience working with structured data\nExposure to cloud\nBid data will be a big plus",
            "similarity": 0.4005
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 3,
        "score": 0.4005,
        "slug": "ml-engineer",
        "total_count": null
      },
      {
        "display_name": "Fullstack Developer",
        "kra_matches": [
          {
            "kra_text": "Designs and queries relational databases like PostgreSQL and document stores like MongoDB, writing migrations, indexes, and optimized queries.",
            "sentence": "Excellent in Spark/Pyspark\nExperience working with structured data\nExposure to cloud\nBid data will be a big plus",
            "similarity": 0.3796
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 15,
        "score": 0.3796,
        "slug": "full-stack-engineer",
        "total_count": null
      },
      {
        "display_name": "Svelte Frontend Developer",
        "kra_matches": [
          {
            "kra_text": "backend data integration",
            "sentence": "Excellent in Spark/Pyspark\nExperience working with structured data\nExposure to cloud\nBid data will be a big plus",
            "similarity": 0.3763
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 92,
        "score": 0.3763,
        "slug": "svelte-frontend-developer",
        "total_count": null
      },
      {
        "display_name": "AI Engineer",
        "kra_matches": [
          {
            "kra_text": "Designs and implements prompt engineering workflows, few-shot examples, chain-of-thought patterns, and structured output parsing for AI feature pipelines.",
            "sentence": "Excellent in Spark/Pyspark\nExperience working with structured data\nExposure to cloud\nBid data will be a big plus",
            "similarity": 0.3653
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 13,
        "score": 0.3653,
        "slug": "ai-engineer",
        "total_count": null
      }
    ],
    "skill_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Apache Spark"
        ],
        "role_id": 2,
        "score": 0.5,
        "slug": "data-engineer",
        "total_count": 2
      }
    ]
  },
  "stage4_decision": {
    "alias_collision_detected": false,
    "case": "DOMAIN",
    "chosen_role": {
      "display_name": "Data Engineer",
      "kra_matches": null,
      "matched_count": null,
      "matched_skills": null,
      "role_id": 2,
      "score": 0.97,
      "slug": "data-engineer",
      "total_count": null
    },
    "confidence": 0.97,
    "is_new_role": false,
    "llm2_fired": false,
    "llm2_reasoning": null,
    "matched_dimensions": [
      "Big Data Processing",
      "Cloud Data Engineering",
      "Structured Data Handling"
    ],
    "matched_kras": [
      "Excellent in Spark/Pyspark",
      "Experience working with structured data",
      "Exposure to cloud"
    ],
    "matched_skills": [
      "Spark",
      "Pyspark",
      "structured data",
      "cloud",
      "big data"
    ],
    "new_role_display_name": null,
    "new_role_slug": null,
    "queued": false,
    "reasoning": "Domain=Data Engineering \u0026 Analytics; The JD centers on Spark/PySpark, structured data, and cloud exposure, which aligns best with data engineering responsibilities.",
    "sub_role": null
  },
  "stage5_updates": {
    "centroid_n_after": 375,
    "centroid_updated": true,
    "collision_log_id": null,
    "new_kra_attached": null,
    "new_skills_attached": [
      {
        "is_primary": true,
        "queue_id": 17758,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "PySpark",
        "status": "pending"
      }
    ],
    "queue_entry_id": null,
    "v3_pipeline_triggered": false,
    "v3_role_slug": null,
    "v3_run_id": null
  }
}

API 2 — extract-details

{
  "alias_matches": [
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 2510,
      "existing_alias_text": "spark",
      "input_term": "Spark",
      "matched_canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "TODO: REMOVE AFTER TESTING \u2014 alias DB write disabled",
      "alias_persisted": false,
      "existing_alias_id": 2004,
      "existing_alias_text": "Apache Spark",
      "input_term": "PySpark",
      "matched_canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "embedding_alias"
    }
  ],
  "candidate_roles": [
    {
      "display_name": "Data Engineer",
      "id": 2,
      "rationale": null,
      "role_archetype": null,
      "slug": "data-engineer",
      "source": "db"
    }
  ],
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Domain=Data Engineering \u0026 Analytics; The JD centers on Spark/PySpark, structured data, and cloud exposure, which aligns best with data engineering responsibilities.",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "dimensions": [
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "ETL and ELT Tooling",
        "id": 24,
        "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
        "slug": "etl-and-elt-tooling",
        "source": "db"
      },
      "input_skill": "Spark",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "ETL and ELT Tooling",
        "id": 24,
        "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
        "slug": "etl-and-elt-tooling",
        "source": "db"
      },
      "input_skill": "PySpark",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    }
  ],
  "input_final_skills": [
    "Spark",
    "PySpark"
  ],
  "input_llm_skills": [
    "Spark",
    "PySpark"
  ],
  "new_aliases_persisted": 0,
  "run_id": "e7cc83d4-091c-4d28-b237-a18cae3b08f1",
  "skills_detail": [
    {
      "aliases_in_db": [
        {
          "alias_text": "Apache Spark",
          "alias_type": "CANONICAL",
          "id": 2004,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "apache spark 3",
          "alias_type": "VERSION",
          "id": 2006,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark",
          "alias_type": "VERSION",
          "id": 2510,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3",
          "alias_type": "VERSION",
          "id": 2007,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3.x",
          "alias_type": "VERSION",
          "id": 2009,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark3",
          "alias_type": "VERSION",
          "id": 2008,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "ETL and ELT Tooling",
            "id": 24,
            "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
            "slug": "etl-and-elt-tooling",
            "source": "db"
          },
          "input_skill": "Spark",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Spark",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Apache Spark",
          "alias_type": "CANONICAL",
          "id": 2004,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "apache spark 3",
          "alias_type": "VERSION",
          "id": 2006,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark",
          "alias_type": "VERSION",
          "id": 2510,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3",
          "alias_type": "VERSION",
          "id": 2007,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3.x",
          "alias_type": "VERSION",
          "id": 2009,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark3",
          "alias_type": "VERSION",
          "id": 2008,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "ETL and ELT Tooling",
            "id": 24,
            "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
            "slug": "etl-and-elt-tooling",
            "source": "db"
          },
          "input_skill": "PySpark",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "PySpark",
      "matched_via": "embedding_alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    }
  ],
  "unmatched_skills": []
}

API 3 — final-role-output

{
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Domain=Data Engineering \u0026 Analytics; The JD centers on Spark/PySpark, structured data, and cloud exposure, which aligns best with data engineering responsibilities.",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "chosen_role_resolution": "in_db",
  "final_input_skills": [
    {
      "skill": "Spark",
      "tag": "in_db"
    },
    {
      "skill": "PySpark",
      "tag": "in_db"
    }
  ],
  "llm_cost_api1_usd": null,
  "llm_cost_api2_usd": null,
  "llm_cost_api3_usd": null,
  "llm_cost_total_usd": null,
  "persistence": {
    "items": [
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "ETL and ELT Tooling",
          "id": 24,
          "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
          "slug": "etl-and-elt-tooling",
          "source": "db"
        },
        "dimension_id": 24,
        "input_skill": "Spark",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 1350,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "ETL and ELT Tooling",
          "id": 24,
          "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
          "slug": "etl-and-elt-tooling",
          "source": "db"
        },
        "dimension_id": 24,
        "input_skill": "PySpark",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Skipped \u2014 no persistable v3 meta for new skill",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": false,
        "skill_id": null,
        "skill_tag": "new",
        "skipped_reason": "skill_not_in_db_v3_proposed"
      }
    ],
    "new_skills_created": 0,
    "role_dimension_saved": 0,
    "skill_dimension_saved": 0,
    "skipped": 1
  },
  "planner_output": null,
  "run_id": "e7cc83d4-091c-4d28-b237-a18cae3b08f1"
}