Pipeline run

8bf80c3a-7848-402f-a02a-9c2b9f2d9dd9

Pipeline LLM cost (USD)
API 1: $0.0029 · API 2: $0.0000 · API 3: $0.0000 · Total: $0.0029

Client output enrichment

v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description
SPARSE JD · role baseline loaded · sources: ai_index: role_baseline, nature_of_work: jd, tech_stack_maturity: jd
Nature of work · Data pipeline development
Build batch and streaming data pipelines in Airflow, Spark, Kafka, and Flink to move data from RDS to S3 and Snowflake, while modeling Snowflake schemas, enforcing data quality/observability, and publishing curated datasets for analytics.
"Build and maintain Airflow DAGs for ETL pipelines from RDS to S3 to Snowflake"
Tech stack maturity
Modern Cloud Native (cache hit)
The stack centers on cloud services and contemporary data engineering tools like AWS, S3, Snowflake, Airflow, dbt, Kafka, Spark, and Flink, which is characteristic of a modern cloud-native data platform.
AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)
1.20 / 5
· Title match
Has AI skill
· AI skill (primary)
· AI skill (secondary)
· On AI team
· Builds AI products
vocab breakdown (legacy)
Assistants (×1):
Frameworks (×2):
Models / concepts (×3): ML
Evidence — skills matched in JD (11)
Python, SQL, Apache Spark, Apache Airflow, Snowflake, dbt, Apache Kafka, AWS, Amazon RDS, Amazon S3, Apache Flink
Skill cluster (9 dimension groups, role-scoped)
ETL and ELT Tooling: Apache Spark, dbt
Programming Languages for Data Work: Python, SQL
Cloud Data Warehouses: Snowflake
Cloud Platforms: AWS
Cloud Storage and File Formats: Amazon S3
Data Pipeline Orchestration: Apache Airflow
Messaging and Event Streaming: Apache Kafka
Stream Processing Systems: Apache Flink
Cross-cutting / unaligned: Amazon RDS
KRA description
- Build and maintain Airflow DAGs for ETL pipelines from RDS to S3 to Snowflake
- Design and optimize data warehouse schemas in Snowflake
- Manage Spark jobs for large-scale data transformations
- Build streaming pipelines using Kafka and Flink
- Ensure data quality and observability across all pipelines
- Partner with analytics to expose curated datasets
Python, SQL, Spark, Airflow, Snowflake, dbt, Kafka, AWS

Signals

Skill: data-engineer (0.82)
Alias: ml-engineer (1.00)
KRA: data-engineer (0.45)
Status: extract_from_jd_done
Created: 2026-05-18T20:34:07.751452Z
Updated: 2026-05-18T20:34:07.751452Z
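
The Skill signal of 0.82 for data-engineer looks like the rounded form of the `score` 0.8182 in the API 1 response, which in turn matches `matched_count` 9 divided by `total_count` 11. A minimal sketch of that arithmetic, assuming the score is a plain ratio rounded to four decimals; the function name `skill_match_score` is hypothetical and not taken from the pipeline:

```python
def skill_match_score(matched_count: int, total_count: int) -> float:
    """Return matched/total rounded to four decimals (assumed formula)."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return round(matched_count / total_count, 4)


# Counts copied from skill_match_roles in the API 1 response.
data_engineer = skill_match_score(9, 11)
backend_engineer = skill_match_score(3, 11)
ml_engineer = skill_match_score(2, 11)
```

Under this assumption the three values reproduce the logged scores for Data Engineer, Backend Engineer, and ML Engineer.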
Flow: Current 3-step pipeline

1 POST /skills/extract-from-jd

2 POST /skills/extract-details

3 POST /skills/final-role-output
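
The three steps run in order, each as a POST. A rough sketch of a client for that flow, assuming each step accepts and returns JSON and that one step's response feeds the next; the `post` callable, the payload keys, and the chaining are illustrative assumptions, since this page records only the endpoint paths:

```python
from typing import Any, Callable, Dict

# Endpoint paths as listed in the flow; everything else here is hypothetical.
PIPELINE_STEPS = (
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
)

Json = Dict[str, Any]


def run_pipeline(post: Callable[[str, Json], Json], jd_text: str) -> Dict[str, Json]:
    """Call each step in order and collect the responses keyed by path.

    `post(path, payload)` is supplied by the caller, for example a thin
    wrapper around an HTTP client. The payload shape is an assumption.
    """
    results: Dict[str, Json] = {}
    payload: Json = {"jd_text": jd_text}
    for path in PIPELINE_STEPS:
        response = post(path, payload)
        results[path] = response
        # Assumed chaining: pass the previous response to the next step.
        payload = {"previous": response}
    return results
```

Passing `post` in as a parameter keeps the sketch independent of any particular HTTP library and makes the call order easy to check with a stub.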

Role: Chosen role & resolution

No chosen role stored for this run.

Job description

ML Engineer — DataCo

We're hiring an ML Engineer to own our data infrastructure.

Responsibilities:
- Build and maintain Airflow DAGs for ETL pipelines from RDS to S3 to Snowflake
- Design and optimize data warehouse schemas in Snowflake
- Manage Spark jobs for large-scale data transformations
- Build streaming pipelines using Kafka and Flink
- Ensure data quality and observability across all pipelines
- Partner with analytics to expose curated datasets

Required skills: Python, SQL, Spark, Airflow, Snowflake, dbt, Kafka, AWS

Skills from this JD

Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.

Python · Primary · No API 2 row (run stopped after API 1 or history missing)
SQL · Primary · No API 2 row (run stopped after API 1 or history missing)
Apache Spark · Primary · No API 2 row (run stopped after API 1 or history missing)
Apache Airflow · Primary · No API 2 row (run stopped after API 1 or history missing)
Snowflake · Primary · No API 2 row (run stopped after API 1 or history missing)
dbt · Primary · No API 2 row (run stopped after API 1 or history missing)
Apache Kafka · Primary · No API 2 row (run stopped after API 1 or history missing)
AWS · Primary · No API 2 row (run stopped after API 1 or history missing)
Amazon RDS · Primary · No API 2 row (run stopped after API 1 or history missing)
Amazon S3 · Primary · No API 2 row (run stopped after API 1 or history missing)
Apache Flink · Primary · No API 2 row (run stopped after API 1 or history missing)

Library artifacts (this run)

No artifact rows for this run.
nano JD Parser — gpt-4.1-nano
Role: ML Engineer
Company: DataCo
Domain: Other
JD type: pass
Raw JSON
{
  "JD_type": "pass",
  "about_company": null,
  "certifications": [],
  "company_name": "DataCo",
  "ctc": null,
  "domain": {
    "primary": {
      "aliases": [],
      "domain": "Other"
    },
    "secondary": null
  },
  "education": [],
  "experience": {
    "max": null,
    "min": null,
    "raw": null
  },
  "job_locations": [],
  "role": "ML Engineer",
  "role_archetype": "Data",
  "roles_and_responsibilities": [
    {
      "bullet_count": 6,
      "heading": "Responsibilities",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "Build and maintain Airflow DAGs",
        "last_5_words": "to expose curated datasets"
      },
      "text": "Build and maintain Airflow DAGs for ETL pipelines from RDS to S3 to Snowflake\nDesign and optimize data warehouse schemas in Snowflake\nManage Spark jobs for large-scale data transformations\nBuild streaming pipelines using Kafka and Flink\nEnsure data quality and observability across all pipelines\nPartner with analytics to expose curated datasets",
      "word_count": 54
    },
    {
      "bullet_count": 0,
      "heading": "Required skills",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "Python, SQL, Spark, Airflow,",
        "last_5_words": "Snowflake, dbt, Kafka, AWS"
      },
      "text": "Python, SQL, Spark, Airflow, Snowflake, dbt, Kafka, AWS",
      "word_count": 8
    }
  ],
  "urls": []
}
API 1 — extract-from-jd
{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "Python"
    },
    {
      "is_primary": true,
      "skill_name": "SQL"
    },
    {
      "is_primary": true,
      "skill_name": "Apache Spark"
    },
    {
      "is_primary": true,
      "skill_name": "Apache Airflow"
    },
    {
      "is_primary": true,
      "skill_name": "Snowflake"
    },
    {
      "is_primary": true,
      "skill_name": "dbt"
    },
    {
      "is_primary": true,
      "skill_name": "Apache Kafka"
    },
    {
      "is_primary": true,
      "skill_name": "AWS"
    },
    {
      "is_primary": true,
      "skill_name": "Amazon RDS"
    },
    {
      "is_primary": true,
      "skill_name": "Amazon S3"
    },
    {
      "is_primary": true,
      "skill_name": "Apache Flink"
    }
  ],
  "jd_role": {
    "display_name": "ML Engineer",
    "rationale": null,
    "role_archetype": "Data",
    "slug": ""
  },
  "nano_parsed": {
    "JD_type": "pass",
    "about_company": null,
    "certifications": [],
    "company_name": "DataCo",
    "ctc": null,
    "domain": {
      "primary": {
        "aliases": [],
        "domain": "Other"
      },
      "secondary": null
    },
    "education": [],
    "experience": {
      "max": null,
      "min": null,
      "raw": null
    },
    "job_locations": [],
    "role": "ML Engineer",
    "role_archetype": "Data",
    "roles_and_responsibilities": [
      {
        "bullet_count": 6,
        "heading": "Responsibilities",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "Build and maintain Airflow DAGs",
          "last_5_words": "to expose curated datasets"
        },
        "text": "Build and maintain Airflow DAGs for ETL pipelines from RDS to S3 to Snowflake\nDesign and optimize data warehouse schemas in Snowflake\nManage Spark jobs for large-scale data transformations\nBuild streaming pipelines using Kafka and Flink\nEnsure data quality and observability across all pipelines\nPartner with analytics to expose curated datasets",
        "word_count": 54
      },
      {
        "bullet_count": 0,
        "heading": "Required skills",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "Python, SQL, Spark, Airflow,",
          "last_5_words": "Snowflake, dbt, Kafka, AWS"
        },
        "text": "Python, SQL, Spark, Airflow, Snowflake, dbt, Kafka, AWS",
        "word_count": 8
      }
    ],
    "urls": []
  },
  "run_id": null,
  "stage3_signals": {
    "alias_match_roles": [
      {
        "display_name": "ML Engineer",
        "matched_count": null,
        "role_id": 3,
        "score": 1.0,
        "slug": "ml-engineer",
        "total_count": null
      }
    ],
    "kra_match_roles": [
      {
        "display_name": "Data Engineer",
        "matched_count": null,
        "role_id": 2,
        "score": 0.4496,
        "slug": "data-engineer",
        "total_count": null
      },
      {
        "display_name": "DevOps Engineer",
        "matched_count": null,
        "role_id": 10,
        "score": 0.374,
        "slug": "devops-engineer",
        "total_count": null
      },
      {
        "display_name": "Cloud Architect",
        "matched_count": null,
        "role_id": 9,
        "score": 0.3737,
        "slug": "cloud-architect",
        "total_count": null
      },
      {
        "display_name": "ML Engineer",
        "matched_count": null,
        "role_id": 3,
        "score": 0.3721,
        "slug": "ml-engineer",
        "total_count": null
      },
      {
        "display_name": "Backend Engineer",
        "matched_count": null,
        "role_id": 1,
        "score": 0.3594,
        "slug": "backend-engineer",
        "total_count": null
      }
    ],
    "skill_match_roles": [
      {
        "display_name": "Data Engineer",
        "matched_count": 9,
        "role_id": 2,
        "score": 0.8182,
        "slug": "data-engineer",
        "total_count": 11
      },
      {
        "display_name": "Backend Engineer",
        "matched_count": 3,
        "role_id": 1,
        "score": 0.2727,
        "slug": "backend-engineer",
        "total_count": 11
      },
      {
        "display_name": "Cybersecurity Engineer",
        "matched_count": 2,
        "role_id": 5,
        "score": 0.1818,
        "slug": "cybersecurity-engineer",
        "total_count": 11
      },
      {
        "display_name": "ML Engineer",
        "matched_count": 2,
        "role_id": 3,
        "score": 0.1818,
        "slug": "ml-engineer",
        "total_count": 11
      },
      {
        "display_name": "Cloud Architect",
        "matched_count": 2,
        "role_id": 9,
        "score": 0.1818,
        "slug": "cloud-architect",
        "total_count": 11
      }
    ],
    "stage35_ran": false
  },
  "stage4_decision": {
    "alias_collision_detected": true,
    "case": "B",
    "chosen_role": {
      "display_name": "Data Engineer",
      "matched_count": null,
      "role_id": 2,
      "score": 0.4496,
      "slug": "data-engineer",
      "total_count": null
    },
    "confidence": 0.42712,
    "llm2_fired": false,
    "llm2_reasoning": null,
    "queued": false,
    "reasoning": "Skill+KRA agree on data-engineer; alias-\u003eml-engineer"
  },
  "stage5_updates": null
}
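
The `stage4_decision` in this response says the skill and KRA signals agree on `data-engineer` while the title alias points to `ml-engineer`, and it sets `alias_collision_detected` to true. A minimal sketch of that comparison, assuming the highest-scoring entry of each signal list is the one compared; the functions `top_slug` and `detect_alias_collision` and the return shape are hypothetical, and the pipeline's actual rules (including how `case` and `confidence` are derived) are not shown on this page:

```python
from typing import Dict, List, Optional


def top_slug(roles: List[Dict]) -> Optional[str]:
    """Return the slug of the highest-scoring role, or None for an empty list."""
    if not roles:
        return None
    return max(roles, key=lambda role: role["score"])["slug"]


def detect_alias_collision(signals: Dict[str, List[Dict]]) -> Dict:
    """Flag a collision when skill and KRA agree but the alias names another role."""
    skill = top_slug(signals.get("skill_match_roles", []))
    kra = top_slug(signals.get("kra_match_roles", []))
    alias = top_slug(signals.get("alias_match_roles", []))
    agreed = skill if skill is not None and skill == kra else None
    collision = agreed is not None and alias is not None and alias != agreed
    return {"agreed_role": agreed, "alias_role": alias, "alias_collision": collision}


# Top entries copied from stage3_signals in the API 1 response.
signals = {
    "alias_match_roles": [{"slug": "ml-engineer", "score": 1.0}],
    "kra_match_roles": [
        {"slug": "data-engineer", "score": 0.4496},
        {"slug": "devops-engineer", "score": 0.374},
    ],
    "skill_match_roles": [
        {"slug": "data-engineer", "score": 0.8182},
        {"slug": "backend-engineer", "score": 0.2727},
    ],
}
decision = detect_alias_collision(signals)
```

With the values from this run, the sketch agrees with the logged outcome: the content-based signals pick `data-engineer`, and the alias match on the job title is treated as the outlier.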
API 2 — extract-details
{}
API 3 — final-role-output
{}
