Pipeline run

8bf80c3a-7848-402f-a02a-9c2b9f2d9dd9

Pipeline LLM cost (USD)

API 1: $0.0029 · API 2: $0.0000 · API 3: $0.0000 · Total: $0.0029

Client output enrichment

v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description

SPARSE JD · role baseline loaded · sources: ai_index: role_baseline · nature_of_work: jd · tech_stack_maturity: jd

Nature of work · Data pipeline development

Build batch and streaming data pipelines in Airflow, Spark, Kafka, and Flink to move data from RDS to S3 and Snowflake, while modeling Snowflake schemas, enforcing data quality/observability, and publishing curated datasets for analytics.

"Build and maintain Airflow DAGs for ETL pipelines from RDS to S3 to Snowflake"

Tech stack maturity

Modern Cloud Native (cache hit)

The stack centers on cloud services and contemporary data engineering tools like AWS, S3, Snowflake, Airflow, dbt, Kafka, Spark, and Flink, which is characteristic of a modern cloud-native data platform.

AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)

1.20 / 5

· Title match

✓ Has AI skill

· AI skill (primary)

· AI skill (secondary)

· On AI team

· Builds AI products

vocab breakdown (legacy)

Assistants (×1): —

Frameworks (×2): —

Models / concepts (×3): ML

Evidence — skills matched in JD (11)

Python, SQL, Apache Spark, Apache Airflow, Snowflake, dbt, Apache Kafka, AWS, Amazon RDS, Amazon S3, Apache Flink

Skill cluster (9 dimension groups, role-scoped)

ETL and ELT Tooling

Apache Spark, dbt

Programming Languages for Data Work

Python, SQL

Cloud Data Warehouses

Snowflake

Cloud Platforms

AWS

Cloud Storage and File Formats

Amazon S3

Data Pipeline Orchestration

Apache Airflow

Messaging and Event Streaming

Apache Kafka

Stream Processing Systems

Apache Flink

Cross-cutting / unaligned

Amazon RDS

KRA description

- Build and maintain Airflow DAGs for ETL pipelines from RDS to S3 to Snowflake
- Design and optimize data warehouse schemas in Snowflake
- Manage Spark jobs for large-scale data transformations
- Build streaming pipelines using Kafka and Flink
- Ensure data quality and observability across all pipelines
- Partner with analytics to expose curated datasets

Python, SQL, Spark, Airflow, Snowflake, dbt, Kafka, AWS

Signals

Skill: data-engineer · 0.82

Alias: ml-engineer · 1.00

KRA: data-engineer · 0.45
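
The three signals above disagree: skill and KRA matching both point to data-engineer, while the alias match points to ml-engineer. A minimal Python sketch of how such a disagreement could be flagged follows. The function names and the rule itself are assumptions for illustration, not the pipeline's confirmed logic; only the slugs and scores are taken from this run. The reported confidence value is not reproduced here because its formula is not shown on this page.

```python
from typing import Optional


def top_slug(candidates: list[dict]) -> Optional[str]:
    """Return the slug of the highest-scoring candidate, or None if empty."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: c["score"])["slug"]


def detect_alias_collision(skill: list[dict], alias: list[dict], kra: list[dict]) -> dict:
    """Hypothetical rule: skill and KRA agree, but the alias names another role."""
    skill_top = top_slug(skill)
    alias_top = top_slug(alias)
    kra_top = top_slug(kra)
    agree = skill_top is not None and skill_top == kra_top
    collision = agree and alias_top is not None and alias_top != skill_top
    return {
        "agreed_role": skill_top if agree else None,
        "alias_role": alias_top,
        "alias_collision_detected": collision,
    }


# Top candidates copied from this run's stage 3 signals.
result = detect_alias_collision(
    skill=[{"slug": "data-engineer", "score": 0.8182},
           {"slug": "backend-engineer", "score": 0.2727}],
    alias=[{"slug": "ml-engineer", "score": 1.0}],
    kra=[{"slug": "data-engineer", "score": 0.4496},
         {"slug": "devops-engineer", "score": 0.374}],
)
print(result)
```

Under this assumed rule, the run's inputs yield data-engineer as the agreed role with a collision flagged against the ml-engineer alias, which is consistent with the stage 4 reasoning string recorded for this run.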

Status: extract_from_jd_done · Created: 2026-05-18T20:34:07.751452Z · Updated: 2026-05-18T20:34:07.751452Z

Flow: current 3-step pipeline

1. POST /skills/extract-from-jd

2. POST /skills/extract-details

3. POST /skills/final-role-output
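
The three steps can be sketched as an ordered list of POST requests. This Python sketch only builds the requests and does not send them. The base URL and the request body are hypothetical placeholders, since this page shows neither the host nor the request schema; only the three paths and the POST method come from the flow listed here.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # hypothetical host, not shown on this page

# Paths and order as listed in the 3-step flow.
PIPELINE_STEPS = [
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
]


def build_step_request(path: str, body: dict) -> Request:
    """Build (but do not send) a JSON POST request for one pipeline step."""
    return Request(
        url=BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The body is a placeholder; the real request schema is an unknown here.
requests_in_order = [
    build_step_request(path, {"placeholder": True}) for path in PIPELINE_STEPS
]

for req in requests_in_order:
    print(req.get_method(), req.full_url)
```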

Role: chosen role & resolution

No chosen role stored for this run.

Job description

ML Engineer — DataCo

We're hiring an ML Engineer to own our data infrastructure.

Responsibilities:
- Build and maintain Airflow DAGs for ETL pipelines from RDS to S3 to Snowflake
- Design and optimize data warehouse schemas in Snowflake
- Manage Spark jobs for large-scale data transformations
- Build streaming pipelines using Kafka and Flink
- Ensure data quality and observability across all pipelines
- Partner with analytics to expose curated datasets

Required skills: Python, SQL, Spark, Airflow, Snowflake, dbt, Kafka, AWS

Skills from this JD

Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.

Python · Primary · No API 2 row (run stopped after API 1 or history missing)

SQL · Primary · No API 2 row (run stopped after API 1 or history missing)

Apache Spark · Primary · No API 2 row (run stopped after API 1 or history missing)

Apache Airflow · Primary · No API 2 row (run stopped after API 1 or history missing)

Snowflake · Primary · No API 2 row (run stopped after API 1 or history missing)

dbt · Primary · No API 2 row (run stopped after API 1 or history missing)

Apache Kafka · Primary · No API 2 row (run stopped after API 1 or history missing)

AWS · Primary · No API 2 row (run stopped after API 1 or history missing)

Amazon RDS · Primary · No API 2 row (run stopped after API 1 or history missing)

Amazon S3 · Primary · No API 2 row (run stopped after API 1 or history missing)

Apache Flink · Primary · No API 2 row (run stopped after API 1 or history missing)

Library artifacts (this run)

No artifact rows for this run.

nano JD Parser — gpt-4.1-nano

Role: ML Engineer

Company: DataCo

Domain: Other

JD type: pass

{
  "JD_type": "pass",
  "about_company": null,
  "certifications": [],
  "company_name": "DataCo",
  "ctc": null,
  "domain": {
    "primary": {
      "aliases": [],
      "domain": "Other"
    },
    "secondary": null
  },
  "education": [],
  "experience": {
    "max": null,
    "min": null,
    "raw": null
  },
  "job_locations": [],
  "role": "ML Engineer",
  "role_archetype": "Data",
  "roles_and_responsibilities": [
    {
      "bullet_count": 6,
      "heading": "Responsibilities",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "Build and maintain Airflow DAGs",
        "last_5_words": "to expose curated datasets"
      },
      "text": "Build and maintain Airflow DAGs for ETL pipelines from RDS to S3 to Snowflake\nDesign and optimize data warehouse schemas in Snowflake\nManage Spark jobs for large-scale data transformations\nBuild streaming pipelines using Kafka and Flink\nEnsure data quality and observability across all pipelines\nPartner with analytics to expose curated datasets",
      "word_count": 54
    },
    {
      "bullet_count": 0,
      "heading": "Required skills",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "Python, SQL, Spark, Airflow,",
        "last_5_words": "Snowflake, dbt, Kafka, AWS"
      },
      "text": "Python, SQL, Spark, Airflow, Snowflake, dbt, Kafka, AWS",
      "word_count": 8
    }
  ],
  "urls": []
}

API 1 — extract-from-jd

{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "Python"
    },
    {
      "is_primary": true,
      "skill_name": "SQL"
    },
    {
      "is_primary": true,
      "skill_name": "Apache Spark"
    },
    {
      "is_primary": true,
      "skill_name": "Apache Airflow"
    },
    {
      "is_primary": true,
      "skill_name": "Snowflake"
    },
    {
      "is_primary": true,
      "skill_name": "dbt"
    },
    {
      "is_primary": true,
      "skill_name": "Apache Kafka"
    },
    {
      "is_primary": true,
      "skill_name": "AWS"
    },
    {
      "is_primary": true,
      "skill_name": "Amazon RDS"
    },
    {
      "is_primary": true,
      "skill_name": "Amazon S3"
    },
    {
      "is_primary": true,
      "skill_name": "Apache Flink"
    }
  ],
  "jd_role": {
    "display_name": "ML Engineer",
    "rationale": null,
    "role_archetype": "Data",
    "slug": ""
  },
  "nano_parsed": {
    "JD_type": "pass",
    "about_company": null,
    "certifications": [],
    "company_name": "DataCo",
    "ctc": null,
    "domain": {
      "primary": {
        "aliases": [],
        "domain": "Other"
      },
      "secondary": null
    },
    "education": [],
    "experience": {
      "max": null,
      "min": null,
      "raw": null
    },
    "job_locations": [],
    "role": "ML Engineer",
    "role_archetype": "Data",
    "roles_and_responsibilities": [
      {
        "bullet_count": 6,
        "heading": "Responsibilities",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "Build and maintain Airflow DAGs",
          "last_5_words": "to expose curated datasets"
        },
        "text": "Build and maintain Airflow DAGs for ETL pipelines from RDS to S3 to Snowflake\nDesign and optimize data warehouse schemas in Snowflake\nManage Spark jobs for large-scale data transformations\nBuild streaming pipelines using Kafka and Flink\nEnsure data quality and observability across all pipelines\nPartner with analytics to expose curated datasets",
        "word_count": 54
      },
      {
        "bullet_count": 0,
        "heading": "Required skills",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "Python, SQL, Spark, Airflow,",
          "last_5_words": "Snowflake, dbt, Kafka, AWS"
        },
        "text": "Python, SQL, Spark, Airflow, Snowflake, dbt, Kafka, AWS",
        "word_count": 8
      }
    ],
    "urls": []
  },
  "run_id": null,
  "stage3_signals": {
    "alias_match_roles": [
      {
        "display_name": "ML Engineer",
        "matched_count": null,
        "role_id": 3,
        "score": 1.0,
        "slug": "ml-engineer",
        "total_count": null
      }
    ],
    "kra_match_roles": [
      {
        "display_name": "Data Engineer",
        "matched_count": null,
        "role_id": 2,
        "score": 0.4496,
        "slug": "data-engineer",
        "total_count": null
      },
      {
        "display_name": "DevOps Engineer",
        "matched_count": null,
        "role_id": 10,
        "score": 0.374,
        "slug": "devops-engineer",
        "total_count": null
      },
      {
        "display_name": "Cloud Architect",
        "matched_count": null,
        "role_id": 9,
        "score": 0.3737,
        "slug": "cloud-architect",
        "total_count": null
      },
      {
        "display_name": "ML Engineer",
        "matched_count": null,
        "role_id": 3,
        "score": 0.3721,
        "slug": "ml-engineer",
        "total_count": null
      },
      {
        "display_name": "Backend Engineer",
        "matched_count": null,
        "role_id": 1,
        "score": 0.3594,
        "slug": "backend-engineer",
        "total_count": null
      }
    ],
    "skill_match_roles": [
      {
        "display_name": "Data Engineer",
        "matched_count": 9,
        "role_id": 2,
        "score": 0.8182,
        "slug": "data-engineer",
        "total_count": 11
      },
      {
        "display_name": "Backend Engineer",
        "matched_count": 3,
        "role_id": 1,
        "score": 0.2727,
        "slug": "backend-engineer",
        "total_count": 11
      },
      {
        "display_name": "Cybersecurity Engineer",
        "matched_count": 2,
        "role_id": 5,
        "score": 0.1818,
        "slug": "cybersecurity-engineer",
        "total_count": 11
      },
      {
        "display_name": "ML Engineer",
        "matched_count": 2,
        "role_id": 3,
        "score": 0.1818,
        "slug": "ml-engineer",
        "total_count": 11
      },
      {
        "display_name": "Cloud Architect",
        "matched_count": 2,
        "role_id": 9,
        "score": 0.1818,
        "slug": "cloud-architect",
        "total_count": 11
      }
    ],
    "stage35_ran": false
  },
  "stage4_decision": {
    "alias_collision_detected": true,
    "case": "B",
    "chosen_role": {
      "display_name": "Data Engineer",
      "matched_count": null,
      "role_id": 2,
      "score": 0.4496,
      "slug": "data-engineer",
      "total_count": null
    },
    "confidence": 0.42712,
    "llm2_fired": false,
    "llm2_reasoning": null,
    "queued": false,
    "reasoning": "Skill+KRA agree on data-engineer; alias->ml-engineer"
  },
  "stage5_updates": null
}
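
The skill_match_roles scores in the payload above are consistent with matched_count divided by total_count, rounded to four decimals, where total_count equals the 11 skills extracted from the JD. The Python sketch below checks that reading against three of the reported rows. The formula is inferred from the numbers, not confirmed by the pipeline, and the function name is hypothetical.

```python
def skill_match_score(matched_count: int, total_count: int) -> float:
    """Assumed formula: fraction of JD skills found in the role's skill set."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return round(matched_count / total_count, 4)


# (slug, matched_count, total_count, score reported in the payload)
reported_rows = [
    ("data-engineer", 9, 11, 0.8182),
    ("backend-engineer", 3, 11, 0.2727),
    ("ml-engineer", 2, 11, 0.1818),
]

for slug, matched, total, reported in reported_rows:
    computed = skill_match_score(matched, total)
    print(f"{slug}: computed={computed} reported={reported}")
```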

API 2 — extract-details

{}

API 3 — final-role-output

{}
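
The empty API 2 and API 3 payloads line up with the run status extract_from_jd_done and with the "No API 2 row" notes in the skills list. A small Python sketch of how the last completed step could be derived from the stored payloads is shown here. Treating an empty JSON object as "step did not run" is an assumption, and the function name and the abbreviated API 1 payload are illustrative only.

```python
import json
from typing import Optional


def last_completed_step(payloads: dict[str, str]) -> Optional[str]:
    """Return the last step, in pipeline order, whose payload is non-empty.

    Stops at the first empty payload, since later steps depend on earlier ones.
    """
    last = None
    for step, raw in payloads.items():
        if json.loads(raw):
            last = step
        else:
            break
    return last


# Payloads in pipeline order; the API 1 payload is abbreviated for the example.
run_payloads = {
    "extract-from-jd": '{"final_skills": [{"is_primary": true, "skill_name": "Python"}]}',
    "extract-details": "{}",
    "final-role-output": "{}",
}

print(last_completed_step(run_payloads))
```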