
Pipeline run

e7cc83d4-091c-4d28-b237-a18cae3b08f1

Pipeline LLM cost (USD)
API 1: $0.0065 · API 2: $0.0000 · API 3: $0.0000 · Total: $0.0065

Client output enrichment

v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description
SPARSE JD · role baseline loaded · sources: ai_index: role_baseline · nature_of_work: jd · tech_stack_maturity: role_baseline
Nature of work · Data transformation and modeling
Develop Spark/PySpark jobs to work with structured data, likely on cloud platforms, with big-data processing as a strong plus.
"Excellent in Spark/Pyspark"
Tech stack maturity
Modern Cloud Native
Data engineers typically build cloud-based batch and streaming pipelines and warehouse models, but AI is usually incidental rather than central to the role.
AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)
1.20 / 5
· Title match
· Has AI skill
· AI skill (primary)
· AI skill (secondary)
· On AI team
· Builds AI products
vocab breakdown (legacy)
Assistants (×1):
Frameworks (×2):
Models / concepts (×3):
Evidence — skills matched in JD (2)
Spark · PySpark
Skill cluster (2 dimension groups, role-scoped)
ETL and ELT Tooling: Spark
Cross-cutting / unaligned: PySpark
KRA description
Excellent in Spark/Pyspark
Experience working with structured data
Exposure to cloud
Bid data will be a big plus

Signals

Skill: data-engineer · 0.50
Alias: no match
KRA: data-engineer · 0.53
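
The skill signal of 0.50 is consistent with a plain coverage ratio: the API 1 payload for this run reports matched_count 1 and total_count 2 for data-engineer. A minimal sketch, assuming the score is exactly that ratio; the function name is hypothetical and the real scorer may weight skills differently.

```python
def skill_coverage(matched_count: int, total_count: int) -> float:
    """Fraction of extracted JD skills that matched the role's library skills.

    Assumption: score = matched / total, which agrees with this run
    (1 matched out of 2 gives 0.5). Not confirmed by the page itself.
    """
    if total_count <= 0:
        return 0.0  # avoid division by zero when no skills were extracted
    return matched_count / total_count


score = skill_coverage(1, 2)
print(f"{score:.2f}")
```

The KRA signal (0.53) is a similarity value from the payload and is not covered by this sketch.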

Post-classification

Centroid updated · n=375
Alias collision log
New-role queue
New skills captured: 1
New KRA captured
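
The page only records that the role centroid was updated and that n is now 375; it does not say how. One common way to do this is an incremental mean over embeddings, sketched below. Treat the whole block as an assumption: the function name, the list-of-floats representation, and the running-mean rule are illustrative, not taken from the pipeline.

```python
from typing import List


def update_centroid(centroid: List[float], n_before: int,
                    embedding: List[float]) -> List[float]:
    """Fold one new embedding into a running-mean centroid.

    Assumption: the centroid is the arithmetic mean of all embeddings
    seen so far, so it can be updated without storing the history.
    """
    if len(centroid) != len(embedding):
        raise ValueError("centroid and embedding must have the same length")
    n_after = n_before + 1
    return [c + (x - c) / n_after for c, x in zip(centroid, embedding)]


# Toy 2-d example: one prior sample at [0, 1], new sample at [1, 0].
new_centroid = update_centroid([0.0, 1.0], 1, [1.0, 0.0])
print(new_centroid)
```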

Captured for admin review

PySpark · primary · Data Engineer · pending
Status: completed · Created: 2026-05-27T16:00:45.283441Z · Updated: 2026-05-27T16:01:17.002721Z · API 3 duration: 6812 ms
Flow: Current 3-step pipeline

1 POST /skills/extract-from-jd

2 POST /skills/extract-details

3 POST /skills/final-role-output
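
The three steps above run in a fixed order. A minimal sketch of a driver follows, with the HTTP call injected so the example stays runnable without a server. The request body shape and the idea that each response feeds the next request are assumptions; only the three paths come from this page.

```python
from typing import Any, Callable, Dict, List

# Paths as listed in the flow above; every call is a POST.
STEPS: List[str] = [
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
]

PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_pipeline(jd_text: str, post: PostFn) -> Dict[str, Any]:
    """Call the three endpoints in order, chaining each response forward."""
    payload: Dict[str, Any] = {"jd_text": jd_text}  # hypothetical request body
    for path in STEPS:
        payload = post(path, payload)
    return payload


# Stand-in transport that records the call order instead of using the network.
calls: List[str] = []


def fake_post(path: str, body: Dict[str, Any]) -> Dict[str, Any]:
    calls.append(path)
    return {"previous": path}


result = run_pipeline("Excellent in Spark/Pyspark", fake_post)
print(calls)
```

In real use, `post` would wrap an HTTP client bound to the service's base URL, which this page does not show.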

Role: Chosen role & resolution

Data Engineer

domain · Data Engineering & Analytics · case: DOMAIN

slug: data-engineer · id: 2 · source: db

Domain=Data Engineering & Analytics; The JD centers on Spark/PySpark, structured data, and cloud exposure, which aligns best with data engineering responsibilities.

Matched skills

Spark · Pyspark · structured data · cloud · big data

Matched dimensions

Big Data Processing · Cloud Data Engineering · Structured Data Handling

Matched KRAs

Excellent in Spark/Pyspark · Experience working with structured data · Exposure to cloud

Resolution: in_db — role exists in library; skill↔dim and role↔dim links saved when applicable.

New skills: 0
Skill↔dim saved: 0
Role↔dim saved: 0
Skipped: 1

Job description

Experience: 4+ yrs
Job Location: Bangalore
Notice Period: Immediate to 20 days Joining






Mandatory skills:


Excellent in Spark/Pyspark
Experience working with structured data
Exposure to cloud
Bid data will be a big plus




About US:
Thank you for expressing interest in a career with TEKsystems Global Services (TGS). At TEKsystems Global Services, we believe in lasting careers with room to run. Giving our people limitless opportunity to make an impact and enable the world’s largest companies to transform how they do business.
We’re the professional services division of TEKsystems, accounting for over $1 Billion in revenue. We’re one of India’s fastest growing full-stack services companies, with about 5500 full time employees across the globe of which 2000 are in India (Bangalore and Hyderabad). TGS operates through multiple solution centers across North America, EMEA and APAC, including locations like Dallas, Redmond, Bloomington, Baltimore, Maryland, Europe (Amsterdam, London), Canada (Montreal), and Philippines (Manila).
To support key areas of our growth, we acquired two companies to join our family. One North, a full-service digital agency and 1Strategy a premiere AWS service provider. Through One North we’re able to help our customers elevate their customer, brand and UI/UX experiences. And, through 1Strategy, we’re able to ensure our customers can take full advantage of the complete AWS solutions portfolio.


Certifications
Joining the elite team at TEKsystems Global Services, gives you runway to grow with us. Upskill faster. Surpass your peers. Even earning certifications:
• ISO 27001
• HITRUST Certification
• PCI DSS Certification
• HIPPA Compliance
• PMP certified Project Managers


Partnership
The world’s leading technology and software providers partner with us because of our scale, full-stack capabilities and speed—giving you the room to specialize and sharpen your skills on the most innovative platforms and game-changing technology.
• Amazon Webservices Advanced Consulting Partner
• Microsoft Gold Partner
• Google Cloud Premier Partner
• Other top partnerships – Snowflake, RedHat, Oracle platinum partner, Salesforce Gold partner, ServiceNow Managed Service Provider and reseller, MuleSoft, Tableau system Integrator, SailPoint Systems, Cloudera Specialized, Hortornworks Community Partner
TGS offers a wide range of IT services including but not limited to delivering high end business consulting services and building applications. This is done through multiple centers of excellence including:
• Data Analytics
• Data Insights
• Enterprise Integration
• Enterprise cloud application (Salesforce and Oracle)
• Transformation Operation Management
• Continuous Development
• Continuous Testing
• Transformation Devops Cloud


For more details, please visit us on
https://www.teksystems.com/
https://www.teksystems.com/en-in/services
https://www.teksystems.com/careers-in-india
https://www.linkedin.com/company/teksystems-global-services-india/mycompany/

Skills from this JD

Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
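
The merge described above can be sketched as a join on the skill name. The field names (`final_skills`, `skills_detail`, `persistence.items`, `skill_tag`, and so on) are the ones visible in this run's payloads; the function itself and the output row shape are hypothetical, and the sample dictionaries are trimmed copies of this run's data.

```python
from typing import Any, Dict, List


def merge_skill_rows(api1: Dict[str, Any], api2: Dict[str, Any],
                     api3: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build one display row per API 1 skill, enriched from API 2 and API 3."""
    details = {d["input_skill"]: d for d in api2["skills_detail"]}
    # Simplification: one tag per skill; a skill with several dimension
    # items would keep the tag of its last item.
    tags = {i["input_skill"]: i["skill_tag"] for i in api3["persistence"]["items"]}
    rows = []
    for skill in api1["final_skills"]:
        name = skill["skill_name"]
        detail = details.get(name, {})
        canonical = detail.get("canonical") or {}
        rows.append({
            "skill": name,
            "is_primary": skill["is_primary"],
            "canonical": canonical.get("display_name"),
            "matched_via": detail.get("matched_via"),
            "api3_tag": tags.get(name),
        })
    return rows


api1 = {"final_skills": [{"skill_name": "Spark", "is_primary": True},
                         {"skill_name": "PySpark", "is_primary": True}]}
api2 = {"skills_detail": [
    {"input_skill": "Spark", "matched_via": "alias",
     "canonical": {"display_name": "Apache Spark"}},
    {"input_skill": "PySpark", "matched_via": "embedding_alias",
     "canonical": {"display_name": "Apache Spark"}},
]}
api3 = {"persistence": {"items": [
    {"input_skill": "Spark", "skill_tag": "in_db"},
    {"input_skill": "PySpark", "skill_tag": "new"},
]}}

rows = merge_skill_rows(api1, api2, api3)
for row in rows:
    print(row)
```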

Spark · Primary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: Apache Spark · id=1350 · apache-spark

Aliases — catalog

  • Apache Spark (CANONICAL)
  • apache spark 3 (VERSION)
  • spark (VERSION)
  • spark 3 (VERSION)
  • spark 3.x (VERSION)
  • spark3 (VERSION)
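
Every alias above carries match_strategy CASE_INSENSITIVE in the API 2 payload. A minimal sketch of such a lookup follows, assuming a simple exact match after case folding; the function and the in-memory table are hypothetical. It also illustrates why "Spark" resolves via alias in this run while "PySpark" does not and is reported as matched via embedding_alias instead.

```python
from typing import Dict, Optional

# Alias text -> canonical slug, copied from the catalog list above.
SPARK_ALIASES: Dict[str, str] = {
    "Apache Spark": "apache-spark",
    "apache spark 3": "apache-spark",
    "spark": "apache-spark",
    "spark 3": "apache-spark",
    "spark 3.x": "apache-spark",
    "spark3": "apache-spark",
}


def resolve_alias(term: str, aliases: Dict[str, str]) -> Optional[str]:
    """Return the canonical slug for a term, ignoring case and outer spaces."""
    index = {alias.casefold(): slug for alias, slug in aliases.items()}
    return index.get(term.strip().casefold())


print(resolve_alias("Spark", SPARK_ALIASES))
print(resolve_alias("PySpark", SPARK_ALIASES))
```

A term with no exact alias returns `None` here; any fallback matching is outside this sketch.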

Context tags (catalog)

Apache Kafka · Cluster Manager · DAGScheduler · Data Lake · DataFrame · ETL · Hadoop · MLlib · Machine Learning · PySpark · RDD · Scala · Spark SQL · Spark Streaming · SparkSession

Stored enrichment (catalog DB)

Category: Framework
Sub-category: Distributed Data Processing Framework
Vendor: Apache Software Foundation
License: apache_2
Year introduced: 2010
Confidence: 0.94
Version strategy: SEPARATE_ENTITY
Version tag: 3.x

Maturity reasoning: Apache Spark appears in many data engineering JDs and remains a standard for distributed ETL/ELT; its GitHub and vendor ecosystem activity stay strong, with Databricks and cloud platforms still promoting it.

Skill profile (library / DB)

Skill nature: FRAMEWORK
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 5
Sub-category id: 1021
Extractable: True
Also category: False

Dimensions (API 2 worklist)

  • ETL and ELT Tooling · Catalog dimension · db id 24

    Library dimension (catalog)

    Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension | Skill↔dim | Role↔dim | Outcome
ETL and ELT Tooling (etl-and-elt-tooling) | true | true | Existing dimension (library) · Role↔dimension saved

PySpark · Primary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: Apache Spark · id=1350 · apache-spark

Aliases — catalog

  • Apache Spark (CANONICAL)
  • apache spark 3 (VERSION)
  • spark (VERSION)
  • spark 3 (VERSION)
  • spark 3.x (VERSION)
  • spark3 (VERSION)

Context tags (catalog)

Apache Kafka · Cluster Manager · DAGScheduler · Data Lake · DataFrame · ETL · Hadoop · MLlib · Machine Learning · PySpark · RDD · Scala · Spark SQL · Spark Streaming · SparkSession

Stored enrichment (catalog DB)

Category: Framework
Sub-category: Distributed Data Processing Framework
Vendor: Apache Software Foundation
License: apache_2
Year introduced: 2010
Confidence: 0.94
Version strategy: SEPARATE_ENTITY
Version tag: 3.x

Maturity reasoning: Apache Spark appears in many data engineering JDs and remains a standard for distributed ETL/ELT; its GitHub and vendor ecosystem activity stay strong, with Databricks and cloud platforms still promoting it.

Skill profile (library / DB)

Skill nature: FRAMEWORK
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 5
Sub-category id: 1021
Extractable: True
Also category: False

Dimensions (API 2 worklist)

  • ETL and ELT Tooling · Catalog dimension · db id 24

    Library dimension (catalog)

    Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension | Skill↔dim | Role↔dim | Outcome
ETL and ELT Tooling (etl-and-elt-tooling) | false | false | Skipped — no persistable v3 meta for new skill (skill_not_in_db_v3_proposed)

All API 3 persistence rows

Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.

Skill | Tag | Dimension | Skill↔dim | Role↔dim | Outcome | Notes
Spark | in_db | ETL and ELT Tooling (etl-and-elt-tooling) | true | true | Existing dimension (library) · Role↔dimension saved |
PySpark | new | ETL and ELT Tooling (etl-and-elt-tooling) | false | false | Skipped — no persistable v3 meta for new skill | skill_not_in_db_v3_proposed
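
The per-item flags in the grid can be tallied directly. A minimal sketch, using the two items from this run's API 3 payload; the function is hypothetical. Note that a naive tally gives 1 for each saved counter, whereas the payload's own summary block reports 0 for both, so the real counters presumably count something narrower than the per-item flags. The page does not say what.

```python
from collections import Counter
from typing import Any, Dict, List


def tally_persistence(items: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count saved links and skipped work items from per-item flags."""
    counts: Counter = Counter()
    for item in items:
        counts["skill_dimension_saved"] += bool(item["skill_dimension_saved"])
        counts["role_dimension_saved"] += bool(item["role_dimension_saved"])
        counts["skipped"] += item["skipped_reason"] is not None
    return dict(counts)


# Trimmed copies of the two (skill x dimension) items from this run.
items = [
    {"input_skill": "Spark", "skill_dimension_saved": True,
     "role_dimension_saved": True, "skipped_reason": None},
    {"input_skill": "PySpark", "skill_dimension_saved": False,
     "role_dimension_saved": False,
     "skipped_reason": "skill_not_in_db_v3_proposed"},
]

totals = tally_persistence(items)
print(totals)
```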

Library artifacts (this run)

Kind | Detail | DB id
dimension_skill_link_proposed | PySpark ↔ ETL and ELT Tooling |
role_dimension_link_proposed | Data Engineer ↔ ETL and ELT Tooling |

nano JD Parser · gpt-4.1-nano
Company: TEKsystems Global Services
Experience: 4+ yrs
Domain: IT Services & Consulting
Location: Bangalore, India
JD type: pass

Certifications

ISO 27001 · HITRUST Certification · PCI DSS Certification · HIPPA Compliance · PMP certified Project Managers

Raw JSON
{
  "JD_type": "pass",
  "about_company": {
    "source_marker": {
      "first_5_words": "Thank you for expressing interest",
      "last_5_words": "full advantage of the complete AWS solutions portfolio."
    },
    "text": "Thank you for expressing interest in a career with TEKsystems Global Services (TGS). At TEKsystems Global Services, we believe in lasting careers with room to run. Giving our people limitless opportunity to make an impact and enable the world\u2019s largest companies to transform how they do business. We\u2019re the professional services division of TEKsystems, accounting for over $1 Billion in revenue. We\u2019re one of India\u2019s fastest growing full-stack services companies, with about 5500 full time employees across the globe of which 2000 are in India (Bangalore and Hyderabad). TGS operates through multiple solution centers across North America, EMEA and APAC, including locations like Dallas, Redmond, Bloomington, Baltimore, Maryland, Europe (Amsterdam, London), Canada (Montreal), and Philippines (Manila). To support key areas of our growth, we acquired two companies to join our family. One North, a full-service digital agency and 1Strategy a premiere AWS service provider. Through One North we\u2019re able to help our customers elevate their customer, brand and UI/UX experiences. And, through 1Strategy, we\u2019re able to ensure our customers can take full advantage of the complete AWS solutions portfolio.",
    "word_count": 164
  },
  "archetype_override_applied": true,
  "archetype_override_matched_skills": [
    "Snowflake",
    "Tableau",
    "AWS",
    "Make",
    "DevOps",
    "UI/UX",
    "Analytics",
    "Cloud",
    "ISO 27001",
    "Room",
    "Provider",
    "Location",
    "PCI DSS"
  ],
  "certifications": [
    "ISO 27001",
    "HITRUST Certification",
    "PCI DSS Certification",
    "HIPPA Compliance",
    "PMP certified Project Managers"
  ],
  "company_name": "TEKsystems Global Services",
  "ctc": null,
  "domain": {
    "primary": {
      "aliases": [
        "ITES",
        "BPO"
      ],
      "domain": "IT Services \u0026 Consulting"
    },
    "secondary": null
  },
  "education": [],
  "experience": {
    "max": null,
    "min": 4,
    "raw": "4+ yrs"
  },
  "job_locations": [
    {
      "aliases": [
        "Bengaluru"
      ],
      "city": "Bangalore",
      "country": "India",
      "state": null,
      "work_mode": null
    }
  ],
  "role": null,
  "role_aliases": [],
  "role_archetype": "Engineering",
  "roles_and_responsibilities": [
    {
      "bullet_count": 4,
      "heading": "Mandatory skills",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "Excellent in Spark/Pyspark Experience",
        "last_5_words": "data will be a big plus"
      },
      "text": "Excellent in Spark/Pyspark\nExperience working with structured data\nExposure to cloud\nBid data will be a big plus",
      "word_count": 24
    }
  ],
  "urls": [
    {
      "type": "website",
      "url": "https://www.teksystems.com/"
    },
    {
      "type": "other",
      "url": "https://www.teksystems.com/en-in/services"
    },
    {
      "type": "careers",
      "url": "https://www.teksystems.com/careers-in-india"
    },
    {
      "type": "linkedin",
      "url": "https://www.linkedin.com/company/teksystems-global-services-india/mycompany/"
    }
  ]
}
API 1 — extract-from-jd
{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "Spark"
    },
    {
      "is_primary": true,
      "skill_name": "PySpark"
    }
  ],
  "jd_role": null,
  "nano_parsed": {
    "JD_type": "pass",
    "about_company": {
      "source_marker": {
        "first_5_words": "Thank you for expressing interest",
        "last_5_words": "full advantage of the complete AWS solutions portfolio."
      },
      "text": "Thank you for expressing interest in a career with TEKsystems Global Services (TGS). At TEKsystems Global Services, we believe in lasting careers with room to run. Giving our people limitless opportunity to make an impact and enable the world\u2019s largest companies to transform how they do business. We\u2019re the professional services division of TEKsystems, accounting for over $1 Billion in revenue. We\u2019re one of India\u2019s fastest growing full-stack services companies, with about 5500 full time employees across the globe of which 2000 are in India (Bangalore and Hyderabad). TGS operates through multiple solution centers across North America, EMEA and APAC, including locations like Dallas, Redmond, Bloomington, Baltimore, Maryland, Europe (Amsterdam, London), Canada (Montreal), and Philippines (Manila). To support key areas of our growth, we acquired two companies to join our family. One North, a full-service digital agency and 1Strategy a premiere AWS service provider. Through One North we\u2019re able to help our customers elevate their customer, brand and UI/UX experiences. And, through 1Strategy, we\u2019re able to ensure our customers can take full advantage of the complete AWS solutions portfolio.",
      "word_count": 164
    },
    "archetype_override_applied": true,
    "archetype_override_matched_skills": [
      "Snowflake",
      "Tableau",
      "AWS",
      "Make",
      "DevOps",
      "UI/UX",
      "Analytics",
      "Cloud",
      "ISO 27001",
      "Room",
      "Provider",
      "Location",
      "PCI DSS"
    ],
    "certifications": [
      "ISO 27001",
      "HITRUST Certification",
      "PCI DSS Certification",
      "HIPPA Compliance",
      "PMP certified Project Managers"
    ],
    "company_name": "TEKsystems Global Services",
    "ctc": null,
    "domain": {
      "primary": {
        "aliases": [
          "ITES",
          "BPO"
        ],
        "domain": "IT Services \u0026 Consulting"
      },
      "secondary": null
    },
    "education": [],
    "experience": {
      "max": null,
      "min": 4,
      "raw": "4+ yrs"
    },
    "job_locations": [
      {
        "aliases": [
          "Bengaluru"
        ],
        "city": "Bangalore",
        "country": "India",
        "state": null,
        "work_mode": null
      }
    ],
    "role": null,
    "role_aliases": [],
    "role_archetype": "Engineering",
    "roles_and_responsibilities": [
      {
        "bullet_count": 4,
        "heading": "Mandatory skills",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "Excellent in Spark/Pyspark Experience",
          "last_5_words": "data will be a big plus"
        },
        "text": "Excellent in Spark/Pyspark\nExperience working with structured data\nExposure to cloud\nBid data will be a big plus",
        "word_count": 24
      }
    ],
    "urls": [
      {
        "type": "website",
        "url": "https://www.teksystems.com/"
      },
      {
        "type": "other",
        "url": "https://www.teksystems.com/en-in/services"
      },
      {
        "type": "careers",
        "url": "https://www.teksystems.com/careers-in-india"
      },
      {
        "type": "linkedin",
        "url": "https://www.linkedin.com/company/teksystems-global-services-india/mycompany/"
      }
    ]
  },
  "rejected": false,
  "rejection_reason": null,
  "run_id": "e7cc83d4-091c-4d28-b237-a18cae3b08f1",
  "stage3_signals": {
    "alias_found": false,
    "alias_match_roles": [],
    "kra_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": [
          {
            "kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
            "sentence": "Excellent in Spark/Pyspark\nExperience working with structured data\nExposure to cloud\nBid data will be a big plus",
            "similarity": 0.5312
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 2,
        "score": 0.5312,
        "slug": "data-engineer",
        "total_count": null
      },
      {
        "display_name": "ML Engineer",
        "kra_matches": [
          {
            "kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
            "sentence": "Excellent in Spark/Pyspark\nExperience working with structured data\nExposure to cloud\nBid data will be a big plus",
            "similarity": 0.4005
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 3,
        "score": 0.4005,
        "slug": "ml-engineer",
        "total_count": null
      },
      {
        "display_name": "Fullstack Developer",
        "kra_matches": [
          {
            "kra_text": "Designs and queries relational databases like PostgreSQL and document stores like MongoDB, writing migrations, indexes, and optimized queries.",
            "sentence": "Excellent in Spark/Pyspark\nExperience working with structured data\nExposure to cloud\nBid data will be a big plus",
            "similarity": 0.3796
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 15,
        "score": 0.3796,
        "slug": "full-stack-engineer",
        "total_count": null
      },
      {
        "display_name": "Svelte Frontend Developer",
        "kra_matches": [
          {
            "kra_text": "backend data integration",
            "sentence": "Excellent in Spark/Pyspark\nExperience working with structured data\nExposure to cloud\nBid data will be a big plus",
            "similarity": 0.3763
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 92,
        "score": 0.3763,
        "slug": "svelte-frontend-developer",
        "total_count": null
      },
      {
        "display_name": "AI Engineer",
        "kra_matches": [
          {
            "kra_text": "Designs and implements prompt engineering workflows, few-shot examples, chain-of-thought patterns, and structured output parsing for AI feature pipelines.",
            "sentence": "Excellent in Spark/Pyspark\nExperience working with structured data\nExposure to cloud\nBid data will be a big plus",
            "similarity": 0.3653
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 13,
        "score": 0.3653,
        "slug": "ai-engineer",
        "total_count": null
      }
    ],
    "skill_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Apache Spark"
        ],
        "role_id": 2,
        "score": 0.5,
        "slug": "data-engineer",
        "total_count": 2
      }
    ]
  },
  "stage4_decision": {
    "alias_collision_detected": false,
    "case": "DOMAIN",
    "chosen_role": {
      "display_name": "Data Engineer",
      "kra_matches": null,
      "matched_count": null,
      "matched_skills": null,
      "role_id": 2,
      "score": 0.97,
      "slug": "data-engineer",
      "total_count": null
    },
    "confidence": 0.97,
    "is_new_role": false,
    "llm2_fired": false,
    "llm2_reasoning": null,
    "matched_dimensions": [
      "Big Data Processing",
      "Cloud Data Engineering",
      "Structured Data Handling"
    ],
    "matched_kras": [
      "Excellent in Spark/Pyspark",
      "Experience working with structured data",
      "Exposure to cloud"
    ],
    "matched_skills": [
      "Spark",
      "Pyspark",
      "structured data",
      "cloud",
      "big data"
    ],
    "new_role_display_name": null,
    "new_role_slug": null,
    "queued": false,
    "reasoning": "Domain=Data Engineering \u0026 Analytics; The JD centers on Spark/PySpark, structured data, and cloud exposure, which aligns best with data engineering responsibilities.",
    "sub_role": null
  },
  "stage5_updates": {
    "centroid_n_after": 375,
    "centroid_updated": true,
    "collision_log_id": null,
    "new_kra_attached": null,
    "new_skills_attached": [
      {
        "is_primary": true,
        "queue_id": 17758,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "PySpark",
        "status": "pending"
      }
    ],
    "queue_entry_id": null,
    "v3_pipeline_triggered": false,
    "v3_role_slug": null,
    "v3_run_id": null
  }
}
API 2 — extract-details
{
  "alias_matches": [
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 2510,
      "existing_alias_text": "spark",
      "input_term": "Spark",
      "matched_canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    },
    {
      "alias_persist_skipped_reason": "TODO: REMOVE AFTER TESTING \u2014 alias DB write disabled",
      "alias_persisted": false,
      "existing_alias_id": 2004,
      "existing_alias_text": "Apache Spark",
      "input_term": "PySpark",
      "matched_canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "embedding_alias"
    }
  ],
  "candidate_roles": [
    {
      "display_name": "Data Engineer",
      "id": 2,
      "rationale": null,
      "role_archetype": null,
      "slug": "data-engineer",
      "source": "db"
    }
  ],
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Domain=Data Engineering \u0026 Analytics; The JD centers on Spark/PySpark, structured data, and cloud exposure, which aligns best with data engineering responsibilities.",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "dimensions": [
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "ETL and ELT Tooling",
        "id": 24,
        "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
        "slug": "etl-and-elt-tooling",
        "source": "db"
      },
      "input_skill": "Spark",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "ETL and ELT Tooling",
        "id": 24,
        "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
        "slug": "etl-and-elt-tooling",
        "source": "db"
      },
      "input_skill": "PySpark",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    }
  ],
  "input_final_skills": [
    "Spark",
    "PySpark"
  ],
  "input_llm_skills": [
    "Spark",
    "PySpark"
  ],
  "new_aliases_persisted": 0,
  "run_id": "e7cc83d4-091c-4d28-b237-a18cae3b08f1",
  "skills_detail": [
    {
      "aliases_in_db": [
        {
          "alias_text": "Apache Spark",
          "alias_type": "CANONICAL",
          "id": 2004,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "apache spark 3",
          "alias_type": "VERSION",
          "id": 2006,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark",
          "alias_type": "VERSION",
          "id": 2510,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3",
          "alias_type": "VERSION",
          "id": 2007,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3.x",
          "alias_type": "VERSION",
          "id": 2009,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark3",
          "alias_type": "VERSION",
          "id": 2008,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "ETL and ELT Tooling",
            "id": 24,
            "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
            "slug": "etl-and-elt-tooling",
            "source": "db"
          },
          "input_skill": "Spark",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Spark",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Apache Spark",
          "alias_type": "CANONICAL",
          "id": 2004,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "apache spark 3",
          "alias_type": "VERSION",
          "id": 2006,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark",
          "alias_type": "VERSION",
          "id": 2510,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3",
          "alias_type": "VERSION",
          "id": 2007,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3.x",
          "alias_type": "VERSION",
          "id": 2009,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark3",
          "alias_type": "VERSION",
          "id": 2008,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "ETL and ELT Tooling",
            "id": 24,
            "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
            "slug": "etl-and-elt-tooling",
            "source": "db"
          },
          "input_skill": "PySpark",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "PySpark",
      "matched_via": "embedding_alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    }
  ],
  "unmatched_skills": []
}
API 3 — final-role-output
{
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Domain=Data Engineering \u0026 Analytics; The JD centers on Spark/PySpark, structured data, and cloud exposure, which aligns best with data engineering responsibilities.",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "chosen_role_resolution": "in_db",
  "final_input_skills": [
    {
      "skill": "Spark",
      "tag": "in_db"
    },
    {
      "skill": "PySpark",
      "tag": "in_db"
    }
  ],
  "llm_cost_api1_usd": null,
  "llm_cost_api2_usd": null,
  "llm_cost_api3_usd": null,
  "llm_cost_total_usd": null,
  "persistence": {
    "items": [
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "ETL and ELT Tooling",
          "id": 24,
          "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
          "slug": "etl-and-elt-tooling",
          "source": "db"
        },
        "dimension_id": 24,
        "input_skill": "Spark",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 1350,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "ETL and ELT Tooling",
          "id": 24,
          "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
          "slug": "etl-and-elt-tooling",
          "source": "db"
        },
        "dimension_id": 24,
        "input_skill": "PySpark",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Skipped \u2014 no persistable v3 meta for new skill",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": false,
        "skill_id": null,
        "skill_tag": "new",
        "skipped_reason": "skill_not_in_db_v3_proposed"
      }
    ],
    "new_skills_created": 0,
    "role_dimension_saved": 0,
    "skill_dimension_saved": 0,
    "skipped": 1
  },
  "planner_output": null,
  "run_id": "e7cc83d4-091c-4d28-b237-a18cae3b08f1"
}