Pipeline run

69a5b208-d3d6-459c-8c8a-42ea615ee412

Pipeline LLM cost (USD)

API 1: $0.0029 API 2: $0.0002 API 3: $0.0000 Total: $0.0031
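
The total is the sum of the three per-API figures. A minimal check of that arithmetic, using the values shown above; the dictionary keys are illustrative names, not fields taken from the pipeline:

```python
from decimal import Decimal

# Per-API LLM costs in USD, copied from the cost line above.
# The key names are illustrative, not actual pipeline field names.
costs = {
    "api_1": Decimal("0.0029"),
    "api_2": Decimal("0.0002"),
    "api_3": Decimal("0.0000"),
}

# Decimal avoids binary floating-point rounding on small currency amounts.
total = sum(costs.values(), Decimal("0"))
print(f"Total: ${total}")
```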

Client output enrichment

v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description

role baseline loaded sources · ai_index: jd · nature_of_work: jd · tech_stack_maturity: jd

Nature of work · Data pipeline development

Build and maintain data pipelines and transformations in PySpark, working across IICS, Snowflake, Synapse, and ADF to move and reshape data for analytics platforms.

"Must have PySpark experience"

Tech stack maturity

Modern Cloud Native · cache hit

Snowflake is a cloud-native data platform, and a Data Engineer role centered on it aligns with modern cloud-native stack maturity.

AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)

0.00 / 5

· Title match

· Has AI skill

· AI skill (primary)

· AI skill (secondary)

· On AI team

· Builds AI products

vocab breakdown (legacy)

Assistants (×1): —

Frameworks (×2): —

Models / concepts (×3): —
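
The "×1", "×2" and "×3" labels suggest that each vocabulary bucket carries a weight. A minimal sketch of a weighted count under that assumption follows. The bucket keys and the function name are hypothetical, and the report does not show how a raw count like this maps onto the 0 to 5 index, so the sketch stops at the raw value:

```python
# Weights read off the "(×1)", "(×2)", "(×3)" labels in the breakdown above.
# Bucket key names are hypothetical.
VOCAB_WEIGHTS = {"assistants": 1, "frameworks": 2, "models_concepts": 3}


def weighted_vocab_score(matches: dict[str, list[str]]) -> int:
    """Count matched terms per bucket, scaled by that bucket's weight."""
    return sum(VOCAB_WEIGHTS[bucket] * len(terms) for bucket, terms in matches.items())


# This run matched no AI vocabulary in any bucket.
this_run = {"assistants": [], "frameworks": [], "models_concepts": []}
print(weighted_vocab_score(this_run))
```

With every bucket empty, the raw count is zero, which agrees with the 0.00 / 5 shown for this run.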

Evidence — skills matched in JD (5)

PySpark IICS Snowflake Synapse ADF

Skill cluster (2 dimension groups, role-scoped)

Cloud Data Warehouses

Snowflake

Cross-cutting / unaligned

PySpark IICS Synapse ADF

KRA description

Must have PySpark experience
Hands-on knowledge on IICS tool
Hands-on Knowledge on Snowflake
hands-on knowledge on Synapse
Hands-on knowledge on ADF
Hands-on experience on any of the above tool/technology is expected with PySpark being the must to have skill

Signals

Skill: data-engineer · 0.20
Alias: data-engineer · 1.00
KRA: data-engineer · 0.46

Post-classification

Centroid: updated · n=121
Alias collision log: —
New-role queue: —
New skills captured: 4
New KRA captured: —

Captured for admin review

PySpark primary ↔ Data Engineer pending

IICS primary ↔ Data Engineer pending

Synapse primary ↔ Data Engineer pending

ADF primary ↔ Data Engineer pending

Status: completed Created: 2026-05-27T14:04:46.565155Z Updated: 2026-05-27T14:05:47.497430Z API 3 duration: 10750 ms
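
The Created and Updated timestamps are about 60.9 seconds apart, while API 3 alone accounts for 10750 ms. A small sketch of that comparison, assuming the two timestamps bracket the whole run (the report does not say so explicitly); `parse_utc` is a helper introduced here for illustration:

```python
from datetime import datetime


def parse_utc(stamp: str) -> datetime:
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


created = parse_utc("2026-05-27T14:04:46.565155Z")
updated = parse_utc("2026-05-27T14:05:47.497430Z")

elapsed_ms = (updated - created).total_seconds() * 1000
api3_ms = 10750  # "API 3 duration" from the status line

print(f"wall clock: {elapsed_ms:.0f} ms, API 3 share: {api3_ms / elapsed_ms:.1%}")
```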

Flow Current 3-step pipeline

1 POST /skills/extract-from-jd

2 POST /skills/extract-details

3 POST /skills/final-role-output
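
The three steps above run in sequence. The sketch below shows one way a client might chain them. It is an illustration only: the payload keys (`jd_text`, `previous`), the idea that each response feeds the next request, and the `post` callable are assumptions, because the report does not show the request schemas or the host. A stand-in transport replaces real HTTP so the example runs on its own.

```python
from typing import Any, Callable

# Endpoint paths in the order listed above.
PIPELINE_STEPS = [
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
]

# A transport takes (path, payload) and returns the decoded JSON response.
Post = Callable[[str, dict[str, Any]], dict[str, Any]]


def run_pipeline(post: Post, jd_text: str) -> dict[str, Any]:
    """Call the three steps in order, passing each response to the next step.

    The payload keys ("jd_text", "previous") are hypothetical; the real
    request schemas are not shown in this report.
    """
    payload: dict[str, Any] = {"jd_text": jd_text}
    response: dict[str, Any] = {}
    for path in PIPELINE_STEPS:
        response = post(path, payload)
        payload = {"previous": response}
    return response


# Stand-in transport that records the call order instead of doing HTTP.
calls: list[str] = []


def fake_post(path: str, payload: dict[str, Any]) -> dict[str, Any]:
    calls.append(path)
    return {"step": path}


result = run_pipeline(fake_post, "Must have PySpark experience")
print(calls)
```

Injecting the transport keeps the ordering logic testable without a network; a real client would pass a function that issues the POST request against the actual service.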

Role Chosen role & resolution

Data Engineer

CASE A

slug: data-engineer · id: 2 · source: db

Exact alias hit on data-engineer (1.0) — no other alias at this confidence; skill_top data-engineer 0.20 does not contradict

Resolution: in_db — role exists in library; skill↔dim and role↔dim links saved when applicable.
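
The CASE A reasoning can be read as a rule: accept a role when exactly one alias scores 1.0 and the top skill-based candidate does not point elsewhere. A minimal sketch under that reading follows. The function name is hypothetical, and treating "contradict" as "the top skill-match role is a different slug" is an assumption; the report does not define it.

```python
from typing import Optional


def resolve_case_a(alias_scores: dict[str, float],
                   skill_scores: dict[str, float]) -> Optional[str]:
    """Return the role slug when a single exact alias hit is uncontested."""
    exact = [slug for slug, score in alias_scores.items() if score == 1.0]
    if len(exact) != 1:
        return None  # no exact hit, or several roles tie at 1.0
    chosen = exact[0]
    if skill_scores:
        skill_top = max(skill_scores, key=skill_scores.get)
        if skill_top != chosen:
            return None  # assumed meaning of "contradict": a different top role
    return chosen


# Values from this run: alias 1.0 and skill_top 0.20, both on data-engineer.
print(resolve_case_a({"data-engineer": 1.0}, {"data-engineer": 0.20}))
```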

New skills

Skill↔dim saved

Role↔dim saved

Skipped

Job description

Job Description:


Profile: Senior Data Engineer / Data Lead
Location: Work From Home
Experience: 3-10 Years


WUElev8 is organising a 12 hour Online-interactive hiring hackathon called Next Pathway Hack Backpackers being presented by Next Pathway Inc. on 6th August 2022 that aims to solve a data problem and hire passionate and dedicated data enthusiasts and experienced professionals to join team as Senior Data Engineer / Data Lead / Data Consultant / Snowflake Consultant.


We have already provided hiring opportunities to many talented professionals in our 15+ hackathons. Our work has also been recognized by Hon'ble PM of India, various government and non-government organisations.


Mode: Online Interactive


Guidelines:
You can participate individually
Every participant needs to register on the WUElev8 platform and apply for participation in the hackathon
The mode of the hackathon is online interactive.
You will work on the problem statement during the hackathon time only
Based on your participation and solution, you will be screened further for the interview round.
Offer letter can be released on the same day or next day of the interview
Do not copy or do plagiarism for the solution. If found, you will be disqualified.

I am sure you might have got super excited by now! Then what are you waiting for?


Hurry up & Register before it gets closed!


How to Participate:
Signup/Login on the platform
Register for the event
A welcome email will be sent to you for the event with the further details

Sign up & Register Now


Link: https://wuelev8.tech/drills/next-pathway-hack-backpackers


For more hiring & innovation hackathons stay connected with us:


LinkedIn Page: https://www.linkedin.com/company/wuelev8/
Website: https://wuelev8.tech/drill


Skills Required:
Must have PySpark experience
Hands-on knowledge on IICS tool
Hands-on Knowledge on Snowflake
hands-on knowledge on Synapse
Hands-on knowledge on ADF
Hands-on experience on any of the above tool/technology is expected with PySpark being the must to have skill

About Us:


WUElev8(Where you Elevate!) is a platform which empowers engineers to engage themselves in ongoing innovation journeys and thereby allowing them to elevate their careers to new heights.


The engineering undergrads and working professionals get the ‘value of their contributions’ on WUElev8 and which they can use in getting the best recommendations of best jobs, hiring challenges, hackathons which itself help them in elevating their career.


The platform best serves the organizations and startups which value the ongoing innovation and technology to scale their business and serve their customers by providing them the best talent enabling their innovation journeys and thereby elevating their businesses!

Skills from this JD

Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.

PySpark · Primary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)

Canonical: Apache Spark id=1350 · apache-spark

Aliases — catalog

Apache Spark (CANONICAL)
apache spark 3 (VERSION)
spark (VERSION)
spark 3 (VERSION)
spark 3.x (VERSION)
spark3 (VERSION)

Context tags (catalog)

Apache Kafka · Cluster Manager · DAGScheduler · Data Lake · DataFrame · ETL · Hadoop · MLlib · Machine Learning · PySpark · RDD · Scala · Spark SQL · Spark Streaming · SparkSession

Stored enrichment (catalog DB)

Category: Framework
Sub-category: Distributed Data Processing Framework
Vendor: Apache Software Foundation
License: apache_2
Year introduced: 2010
Confidence: 0.94
Version strategy: SEPARATE_ENTITY
Version tag: 3.x

Maturity reasoning: Apache Spark appears in many data engineering JDs and remains a standard for distributed ETL/ELT; its GitHub and vendor ecosystem activity stay strong, with Databricks and cloud platforms still promoting it.

Skill profile (library / DB)

Skill nature: FRAMEWORK
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 5
Sub-category id: 1021
Extractable: True
Also category: False

Dimensions (API 2 worklist)

ETL and ELT Tooling · Catalog dimension · db id 24

Library dimension (catalog)

Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
ETL and ELT Tooling etl-and-elt-tooling	—	—	Skipped — no persistable v3 meta for new skill skill_not_in_db_v3_proposed

IICS · Primary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields

Category: Cloud Platforms
Sub-category: general
Skill nature: PLATFORM
Volatility: MEDIUM
Typical lifespan: MULTI_YEAR
Version strategy: UNVERSIONED

Snowflake · Primary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)

Canonical: Snowflake id=105 · snowflake

Aliases — catalog

Snowflake (CANONICAL) primary

Context tags (catalog)

ELT · ETL · SQL · Snowpark · Snowpipe · Streams · Tasks · Time Travel · VARIANT · data sharing · data warehouse · dbt · semi-structured data · virtual warehouse · zero-copy cloning

Stored enrichment (catalog DB)

Category: Platform
Sub-category: Data Cloud Platform
Vendor: Snowflake Inc.
License: proprietary
Year introduced: 2012
Confidence: 0.98
Version strategy: NOT_APPLICABLE

Maturity reasoning: Snowflake appears frequently in data/analytics job postings and is a standard cloud data warehouse platform alongside BigQuery and Redshift.

Skill profile (library / DB)

Skill nature: PLATFORM
Volatility: STABLE
Typical lifespan: EVERGREEN
Category id: 9
Sub-category id: 113
Extractable: True
Also category: False

Dimensions (API 2 worklist)

Cloud Data Warehouses · Catalog dimension · db id 22

Library dimension (catalog)

Roles linked in library: Data Engineer

API 3 link attempts (this skill)

Dimension	Skill↔dim	Role↔dim	Outcome
Cloud Data Warehouses cloud-data-warehouses	✓	✓	Existing dimension (library) · Role↔dimension saved

Synapse · Primary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields

Category: Cloud Platforms
Sub-category: general
Skill nature: PLATFORM
Volatility: MEDIUM
Typical lifespan: MULTI_YEAR
Version strategy: UNVERSIONED

ADF · Primary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields

Category: Cloud Platforms
Sub-category: general
Skill nature: PLATFORM
Volatility: MEDIUM
Typical lifespan: MULTI_YEAR
Version strategy: UNVERSIONED

All API 3 persistence rows

Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.

Skill	Tag	Dimension	Skill↔dim	Role↔dim	Outcome	Notes
PySpark	new	ETL and ELT Tooling etl-and-elt-tooling	—	—	Skipped — no persistable v3 meta for new skill	skill_not_in_db_v3_proposed
Snowflake	in_db	Cloud Data Warehouses cloud-data-warehouses	✓	✓	Existing dimension (library) · Role↔dimension saved

Library artifacts (this run)

Kind	Detail	DB id
canonical_skill_proposed	IICS | type=Cloud Platforms subtype=general nature=PLATFORM lifespan=MULTI_YEAR
canonical_skill_proposed	Synapse | type=Cloud Platforms subtype=general nature=PLATFORM lifespan=MULTI_YEAR
canonical_skill_proposed	ADF | type=Cloud Platforms subtype=general nature=PLATFORM lifespan=MULTI_YEAR
dimension_skill_link_proposed	PySpark ↔ ETL and ELT Tooling
role_dimension_link_proposed	Data Engineer ↔ ETL and ELT Tooling

nano JD Parser — gpt-4.1-nano

Role: Senior Data Engineer / Data Lead
Company: WUElev8
Experience: 3-10 Years
Domain: Other
Location: — (remote)
JD type: pass

Raw JSON

{
  "JD_type": "pass",
  "about_company": {
    "source_marker": {
      "first_5_words": "WUElev8(Where you Elevate!) is a",
      "last_5_words": "elevating their businesses!"
    },
    "text": "WUElev8(Where you Elevate!) is a platform which empowers engineers to engage themselves in ongoing innovation journeys and thereby allowing them to elevate their careers to new heights.\n\nThe engineering undergrads and working professionals get the \u2018value of their contributions\u2019 on WUElev8 and which they can use in getting the best recommendations of best jobs, hiring challenges, hackathons which itself help them in elevating their career.\n\nThe platform best serves the organizations and startups which value the ongoing innovation and technology to scale their business and serve their customers by providing them the best talent enabling their innovation journeys and thereby elevating their businesses!",
    "word_count": 84
  },
  "certifications": [],
  "company_name": "WUElev8",
  "ctc": null,
  "domain": {
    "primary": {
      "aliases": [],
      "domain": "Other"
    },
    "secondary": null
  },
  "education": [],
  "experience": {
    "max": 10,
    "min": 3,
    "raw": "3-10 Years"
  },
  "job_locations": [
    {
      "aliases": [],
      "city": null,
      "country": null,
      "state": null,
      "work_mode": "remote"
    }
  ],
  "role": "Senior Data Engineer / Data Lead",
  "role_aliases": [
    "Data Engineer",
    "Data Lead",
    "Data Consultant",
    "Snowflake Consultant"
  ],
  "role_archetype": "Data",
  "roles_and_responsibilities": [
    {
      "bullet_count": 6,
      "heading": "Skills Required",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "Must have PySpark experience Hands-on",
        "last_5_words": "being the must to have skill"
      },
      "text": "Must have PySpark experience\nHands-on knowledge on IICS tool\nHands-on Knowledge on Snowflake\nhands-on knowledge on Synapse\nHands-on knowledge on ADF\nHands-on experience on any of the above tool/technology is expected with PySpark being the must to have skill",
      "word_count": 43
    }
  ],
  "urls": [
    {
      "type": "other",
      "url": "https://wuelev8.tech/drills/next-pathway-hack-backpackers"
    },
    {
      "type": "linkedin",
      "url": "https://www.linkedin.com/company/wuelev8/"
    },
    {
      "type": "website",
      "url": "https://wuelev8.tech/drill"
    }
  ]
}
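
Each `source_marker` in the JSON above appears to record the opening and closing words of the extracted span. A minimal sketch of how such a marker could be checked against its `text` field follows, assuming the marker is meant to locate the span in the original JD; the report does not document its purpose, and `marker_matches` is a hypothetical helper. Whitespace is collapsed first because the markers in this run span line breaks.

```python
def marker_matches(text: str, first_words: str, last_words: str) -> bool:
    """Check that text starts and ends with the given markers.

    Whitespace (including newlines) is collapsed first, so a marker can
    span a line break in the original text.
    """
    normalized = " ".join(text.split())
    return normalized.startswith(first_words) and normalized.endswith(last_words)


# "text" and marker values copied from roles_and_responsibilities above.
text = (
    "Must have PySpark experience\nHands-on knowledge on IICS tool\n"
    "Hands-on Knowledge on Snowflake\nhands-on knowledge on Synapse\n"
    "Hands-on knowledge on ADF\nHands-on experience on any of the above "
    "tool/technology is expected with PySpark being the must to have skill"
)

print(marker_matches(text,
                     "Must have PySpark experience Hands-on",
                     "being the must to have skill"))
```

The check uses prefix and suffix comparison instead of counting exactly five words, because the `last_5_words` values in this run are not always five words long.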

API 1 — extract-from-jd

{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "PySpark"
    },
    {
      "is_primary": true,
      "skill_name": "IICS"
    },
    {
      "is_primary": true,
      "skill_name": "Snowflake"
    },
    {
      "is_primary": true,
      "skill_name": "Synapse"
    },
    {
      "is_primary": true,
      "skill_name": "ADF"
    }
  ],
  "jd_role": {
    "display_name": "Senior Data Engineer / Data Lead",
    "rationale": null,
    "role_aliases": [
      "Data Engineer",
      "Data Lead",
      "Data Consultant",
      "Snowflake Consultant"
    ],
    "role_archetype": "Data",
    "slug": ""
  },
  "nano_parsed": {
    "JD_type": "pass",
    "about_company": {
      "source_marker": {
        "first_5_words": "WUElev8(Where you Elevate!) is a",
        "last_5_words": "elevating their businesses!"
      },
      "text": "WUElev8(Where you Elevate!) is a platform which empowers engineers to engage themselves in ongoing innovation journeys and thereby allowing them to elevate their careers to new heights.\n\nThe engineering undergrads and working professionals get the \u2018value of their contributions\u2019 on WUElev8 and which they can use in getting the best recommendations of best jobs, hiring challenges, hackathons which itself help them in elevating their career.\n\nThe platform best serves the organizations and startups which value the ongoing innovation and technology to scale their business and serve their customers by providing them the best talent enabling their innovation journeys and thereby elevating their businesses!",
      "word_count": 84
    },
    "certifications": [],
    "company_name": "WUElev8",
    "ctc": null,
    "domain": {
      "primary": {
        "aliases": [],
        "domain": "Other"
      },
      "secondary": null
    },
    "education": [],
    "experience": {
      "max": 10,
      "min": 3,
      "raw": "3-10 Years"
    },
    "job_locations": [
      {
        "aliases": [],
        "city": null,
        "country": null,
        "state": null,
        "work_mode": "remote"
      }
    ],
    "role": "Senior Data Engineer / Data Lead",
    "role_aliases": [
      "Data Engineer",
      "Data Lead",
      "Data Consultant",
      "Snowflake Consultant"
    ],
    "role_archetype": "Data",
    "roles_and_responsibilities": [
      {
        "bullet_count": 6,
        "heading": "Skills Required",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "Must have PySpark experience Hands-on",
          "last_5_words": "being the must to have skill"
        },
        "text": "Must have PySpark experience\nHands-on knowledge on IICS tool\nHands-on Knowledge on Snowflake\nhands-on knowledge on Synapse\nHands-on knowledge on ADF\nHands-on experience on any of the above tool/technology is expected with PySpark being the must to have skill",
        "word_count": 43
      }
    ],
    "urls": [
      {
        "type": "other",
        "url": "https://wuelev8.tech/drills/next-pathway-hack-backpackers"
      },
      {
        "type": "linkedin",
        "url": "https://www.linkedin.com/company/wuelev8/"
      },
      {
        "type": "website",
        "url": "https://wuelev8.tech/drill"
      }
    ]
  },
  "rejected": false,
  "rejection_reason": null,
  "run_id": "69a5b208-d3d6-459c-8c8a-42ea615ee412",
  "stage3_signals": {
    "alias_found": true,
    "alias_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": null,
        "matched_count": null,
        "matched_skills": null,
        "role_id": 2,
        "score": 1.0,
        "slug": "data-engineer",
        "total_count": null
      }
    ],
    "kra_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": [
          {
            "kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
            "sentence": "Hands-on experience on any of the above tool/technology is expected with PySpark being the must to have skill",
            "similarity": 0.4584
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 2,
        "score": 0.4584,
        "slug": "data-engineer",
        "total_count": null
      },
      {
        "display_name": "ML Engineer",
        "kra_matches": [
          {
            "kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
            "sentence": "Hands-on experience on any of the above tool/technology is expected with PySpark being the must to have skill",
            "similarity": 0.3678
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 3,
        "score": 0.3678,
        "slug": "ml-engineer",
        "total_count": null
      },
      {
        "display_name": "AI Engineer",
        "kra_matches": [
          {
            "kra_text": "Designs and implements prompt engineering workflows, few-shot examples, chain-of-thought patterns, and structured output parsing for AI feature pipelines.",
            "sentence": "Hands-on experience on any of the above tool/technology is expected with PySpark being the must to have skill",
            "similarity": 0.348
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 13,
        "score": 0.348,
        "slug": "ai-engineer",
        "total_count": null
      },
      {
        "display_name": "Fullstack Developer",
        "kra_matches": [
          {
            "kra_text": "Implements complete product features end-to-end from database schema design through backend API to frontend UI using JavaScript, TypeScript, Python, or Ruby on Rails.",
            "sentence": "Hands-on experience on any of the above tool/technology is expected with PySpark being the must to have skill",
            "similarity": 0.3267
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 15,
        "score": 0.3267,
        "slug": "full-stack-engineer",
        "total_count": null
      },
      {
        "display_name": "MLOps Engineer",
        "kra_matches": [
          {
            "kra_text": "Supports ML platform incidents by diagnosing model serving failures, feature store pipeline breaks, and training environment configuration issues.",
            "sentence": "Hands-on experience on any of the above tool/technology is expected with PySpark being the must to have skill",
            "similarity": 0.299
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 16,
        "score": 0.299,
        "slug": "ml-ops-engineer",
        "total_count": null
      }
    ],
    "skill_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Snowflake"
        ],
        "role_id": 2,
        "score": 0.2,
        "slug": "data-engineer",
        "total_count": 5
      }
    ]
  },
  "stage4_decision": {
    "alias_collision_detected": false,
    "case": "A",
    "chosen_role": {
      "display_name": "Data Engineer",
      "kra_matches": null,
      "matched_count": null,
      "matched_skills": null,
      "role_id": 2,
      "score": 1.0,
      "slug": "data-engineer",
      "total_count": null
    },
    "confidence": 1.0,
    "is_new_role": false,
    "llm2_fired": false,
    "llm2_reasoning": null,
    "matched_dimensions": [],
    "matched_kras": [],
    "matched_skills": [],
    "new_role_display_name": null,
    "new_role_slug": null,
    "queued": false,
    "reasoning": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.20 does not contradict",
    "sub_role": null
  },
  "stage5_updates": {
    "centroid_n_after": 121,
    "centroid_updated": true,
    "collision_log_id": null,
    "new_kra_attached": null,
    "new_skills_attached": [
      {
        "is_primary": true,
        "queue_id": 6768,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "PySpark",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 6769,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "IICS",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 6770,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Synapse",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 6771,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "ADF",
        "status": "pending"
      }
    ],
    "queue_entry_id": null,
    "v3_pipeline_triggered": false,
    "v3_role_slug": null,
    "v3_run_id": null
  }
}
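
In `skill_match_roles`, the values `matched_count: 1`, `total_count: 5` and `score: 0.2` are consistent with the score being the fraction of JD skills found among the role's library skills. The sketch below reproduces that arithmetic under this assumption; the actual formula is not stated in the report, and the role skill set used here contains only the one skill confirmed by `matched_skills`.

```python
def skill_match_score(jd_skills: list[str], role_skills: set[str]) -> tuple[int, int, float]:
    """Return (matched_count, total_count, score) for one role."""
    matched = [skill for skill in jd_skills if skill in role_skills]
    total = len(jd_skills)
    score = len(matched) / total if total else 0.0
    return len(matched), total, score


jd_skills = ["PySpark", "IICS", "Snowflake", "Synapse", "ADF"]

# Hypothetical library skill set for the role; only "Snowflake" is
# confirmed by matched_skills in the payload above.
role_skills = {"Snowflake"}

print(skill_match_score(jd_skills, role_skills))
```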

API 2 — extract-details

{
  "alias_matches": [
    {
      "alias_persist_skipped_reason": "TODO: REMOVE AFTER TESTING \u2014 alias DB write disabled",
      "alias_persisted": false,
      "existing_alias_id": 2004,
      "existing_alias_text": "Apache Spark",
      "input_term": "PySpark",
      "matched_canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "embedding_alias"
    },
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 299,
      "existing_alias_text": "Snowflake",
      "input_term": "Snowflake",
      "matched_canonical": {
        "category_id": 9,
        "display_name": "Snowflake",
        "id": 105,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "snowflake",
        "sub_category_id": 113,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    }
  ],
  "candidate_roles": [
    {
      "display_name": "Data Engineer",
      "id": 2,
      "rationale": null,
      "role_archetype": null,
      "slug": "data-engineer",
      "source": "db"
    }
  ],
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.20 does not contradict",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "dimensions": [
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "ETL and ELT Tooling",
        "id": 24,
        "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
        "slug": "etl-and-elt-tooling",
        "source": "db"
      },
      "input_skill": "PySpark",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "Cloud Data Warehouses",
        "id": 22,
        "rationale": "Managed analytical storage and compute platforms used for curated datasets, reporting, and downstream analytics. These systems are central to data modeling, performance tuning, and cost-aware query design.",
        "slug": "cloud-data-warehouses",
        "source": "db"
      },
      "input_skill": "Snowflake",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "Data Engineer",
          "id": 2,
          "rationale": null,
          "role_archetype": null,
          "slug": "data-engineer",
          "source": "db"
        }
      ]
    }
  ],
  "input_final_skills": [
    "PySpark",
    "IICS",
    "Snowflake",
    "Synapse",
    "ADF"
  ],
  "input_llm_skills": [
    "PySpark",
    "IICS",
    "Snowflake",
    "Synapse",
    "ADF"
  ],
  "new_aliases_persisted": 0,
  "run_id": "69a5b208-d3d6-459c-8c8a-42ea615ee412",
  "skills_detail": [
    {
      "aliases_in_db": [
        {
          "alias_text": "Apache Spark",
          "alias_type": "CANONICAL",
          "id": 2004,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "apache spark 3",
          "alias_type": "VERSION",
          "id": 2006,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark",
          "alias_type": "VERSION",
          "id": 2510,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3",
          "alias_type": "VERSION",
          "id": 2007,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark 3.x",
          "alias_type": "VERSION",
          "id": 2009,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        },
        {
          "alias_text": "spark3",
          "alias_type": "VERSION",
          "id": 2008,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 5,
        "display_name": "Apache Spark",
        "id": 1350,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "FRAMEWORK",
        "slug": "apache-spark",
        "sub_category_id": 1021,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "ETL and ELT Tooling",
            "id": 24,
            "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
            "slug": "etl-and-elt-tooling",
            "source": "db"
          },
          "input_skill": "PySpark",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "PySpark",
      "matched_via": "embedding_alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "IICS",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Cloud Platforms",
          "skill_nature": "PLATFORM",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "iics",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Snowflake",
          "alias_type": "CANONICAL",
          "id": 299,
          "is_primary": true,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 9,
        "display_name": "Snowflake",
        "id": 105,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "PLATFORM",
        "slug": "snowflake",
        "sub_category_id": 113,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "Cloud Data Warehouses",
            "id": 22,
            "rationale": "Managed analytical storage and compute platforms used for curated datasets, reporting, and downstream analytics. These systems are central to data modeling, performance tuning, and cost-aware query design.",
            "slug": "cloud-data-warehouses",
            "source": "db"
          },
          "input_skill": "Snowflake",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "Data Engineer",
              "id": 2,
              "rationale": null,
              "role_archetype": null,
              "slug": "data-engineer",
              "source": "db"
            }
          ]
        }
      ],
      "input_skill": "Snowflake",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Synapse",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Cloud Platforms",
          "skill_nature": "PLATFORM",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "synapse",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "ADF",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Cloud Platforms",
          "skill_nature": "PLATFORM",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "adf",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    }
  ],
  "unmatched_skills": [
    "IICS",
    "Synapse",
    "ADF"
  ]
}

API 3 — final-role-output

{
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.20 does not contradict",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "chosen_role_resolution": "in_db",
  "final_input_skills": [
    {
      "skill": "PySpark",
      "tag": "in_db"
    },
    {
      "skill": "IICS",
      "tag": "new"
    },
    {
      "skill": "Snowflake",
      "tag": "in_db"
    },
    {
      "skill": "Synapse",
      "tag": "new"
    },
    {
      "skill": "ADF",
      "tag": "new"
    }
  ],
  "llm_cost_api1_usd": null,
  "llm_cost_api2_usd": null,
  "llm_cost_api3_usd": null,
  "llm_cost_total_usd": null,
  "persistence": {
    "items": [
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "ETL and ELT Tooling",
          "id": 24,
          "rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
          "slug": "etl-and-elt-tooling",
          "source": "db"
        },
        "dimension_id": 24,
        "input_skill": "PySpark",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Skipped \u2014 no persistable v3 meta for new skill",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": false,
        "skill_id": null,
        "skill_tag": "new",
        "skipped_reason": "skill_not_in_db_v3_proposed"
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "Cloud Data Warehouses",
          "id": 22,
          "rationale": "Managed analytical storage and compute platforms used for curated datasets, reporting, and downstream analytics. These systems are central to data modeling, performance tuning, and cost-aware query design.",
          "slug": "cloud-data-warehouses",
          "source": "db"
        },
        "dimension_id": 22,
        "input_skill": "Snowflake",
        "llm_role": null,
        "matched_chosen_role": true,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
        "role_dimension_saved": true,
        "roles_from_db": [
          {
            "display_name": "Data Engineer",
            "id": 2,
            "rationale": null,
            "role_archetype": null,
            "slug": "data-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 105,
        "skill_tag": "in_db",
        "skipped_reason": null
      }
    ],
    "new_skills_created": 0,
    "role_dimension_saved": 0,
    "skill_dimension_saved": 0,
    "skipped": 1
  },
  "planner_output": null,
  "run_id": "69a5b208-d3d6-459c-8c8a-42ea615ee412"
}
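
The summary counters under `persistence` can be cross-checked against the per-item flags in `persistence.items`. The following is a minimal sketch, assuming the payload has already been parsed into a Python dict. The helper name `tally_persistence` and the trimmed `sample` are illustrative and not part of the pipeline. Whether the logged counters are meant to count the same events as the per-item flags is not stated in this output, so a mismatch is something to investigate and not proof of a bug.

```python
def tally_persistence(persistence: dict) -> dict:
    """Recount outcomes from the per-item flags (hypothetical helper)."""
    items = persistence.get("items", [])
    return {
        # Items whose role-to-dimension link was reported as saved.
        "role_dimension_saved": sum(1 for it in items if it.get("role_dimension_saved")),
        # Items whose skill-to-dimension link was reported as saved.
        "skill_dimension_saved": sum(1 for it in items if it.get("skill_dimension_saved")),
        # Items carrying a non-null skipped_reason.
        "skipped": sum(1 for it in items if it.get("skipped_reason")),
    }


# Trimmed copy of the two items above, keeping only the fields used here.
sample = {
    "items": [
        {
            "input_skill": "PySpark",
            "role_dimension_saved": False,
            "skill_dimension_saved": False,
            "skipped_reason": "skill_not_in_db_v3_proposed",
        },
        {
            "input_skill": "Snowflake",
            "role_dimension_saved": True,
            "skill_dimension_saved": True,
            "skipped_reason": None,
        },
    ],
    # Counters as logged in the payload.
    "role_dimension_saved": 0,
    "skill_dimension_saved": 0,
    "skipped": 1,
}

recount = tally_persistence(sample)
for key, value in recount.items():
    logged = sample[key]
    status = "ok" if logged == value else "differs"
    print(f"{key}: logged={logged} recounted={value} ({status})")
```

Comparing the recount with the logged counters is a quick way to spot runs where the summary and the item list disagree.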

LLM Calls

Every model call made for this run, in pipeline order.