Pipeline run
8bf80c3a-7848-402f-a02a-9c2b9f2d9dd9
Pipeline LLM cost (USD)
API 1: $0.0029
API 2: $0.0000
API 3: $0.0000
Total: $0.0029
Client output enrichment
v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description
SPARSE JD
role baseline loaded
sources · ai_index: role_baseline · nature_of_work: jd · tech_stack_maturity: jd
Nature of work
· Data pipeline development
Build batch and streaming data pipelines in Airflow, Spark, Kafka, and Flink to move data from RDS to S3 and Snowflake, while modeling Snowflake schemas, enforcing data quality/observability, and publishing curated datasets for analytics.
"Build and maintain Airflow DAGs for ETL pipelines from RDS to S3 to Snowflake"
Tech stack maturity
Modern Cloud Native
The stack centers on cloud services and contemporary data engineering tools like AWS, S3, Snowflake, Airflow, dbt, Kafka, Spark, and Flink, which is characteristic of a modern cloud-native data platform.
AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)
1.20 / 5
· Title match
✓ Has AI skill
· AI skill (primary)
· AI skill (secondary)
· On AI team
· Builds AI products
vocab breakdown (legacy)
Assistants (×1):
—
Frameworks (×2):
—
Models / concepts (×3):
ML
Evidence — skills matched in JD (11)
Python
SQL
Apache Spark
Apache Airflow
Snowflake
dbt
Apache Kafka
AWS
Amazon RDS
Amazon S3
Apache Flink
Skill cluster (9 dimension groups, role-scoped)
ETL and ELT Tooling: Apache Spark, dbt
Programming Languages for Data Work: Python, SQL
Cloud Data Warehouses: Snowflake
Cloud Platforms: AWS
Cloud Storage and File Formats: Amazon S3
Data Pipeline Orchestration: Apache Airflow
Messaging and Event Streaming: Apache Kafka
Stream Processing Systems: Apache Flink
Cross-cutting / unaligned: Amazon RDS
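The cluster is effectively a mapping from dimension group to skills. A minimal sketch of that structure with a reverse lookup, using the groups and skills listed in this run; the names `SKILL_CLUSTER` and `group_of` are hypothetical and are not taken from the pipeline's code:

```python
from typing import Dict, List, Optional

# Dimension groups and their skills, copied from the cluster listing in this run.
SKILL_CLUSTER: Dict[str, List[str]] = {
    "ETL and ELT Tooling": ["Apache Spark", "dbt"],
    "Programming Languages for Data Work": ["Python", "SQL"],
    "Cloud Data Warehouses": ["Snowflake"],
    "Cloud Platforms": ["AWS"],
    "Cloud Storage and File Formats": ["Amazon S3"],
    "Data Pipeline Orchestration": ["Apache Airflow"],
    "Messaging and Event Streaming": ["Apache Kafka"],
    "Stream Processing Systems": ["Apache Flink"],
    "Cross-cutting / unaligned": ["Amazon RDS"],
}

def group_of(skill: str) -> Optional[str]:
    """Return the dimension group that lists the skill, or None if absent."""
    for group, skills in SKILL_CLUSTER.items():
        if skill in skills:
            return group
    return None

# Flat list of every clustered skill, useful for comparing against the evidence list.
all_clustered = [s for skills in SKILL_CLUSTER.values() for s in skills]
```

Comparing `all_clustered` with the evidence list is a quick way to confirm that each matched skill landed in exactly one group.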
KRA description
Build and maintain Airflow DAGs for ETL pipelines from RDS to S3 to Snowflake
Design and optimize data warehouse schemas in Snowflake
Manage Spark jobs for large-scale data transformations
Build streaming pipelines using Kafka and Flink
Ensure data quality and observability across all pipelines
Partner with analytics to expose curated datasets
Python, SQL, Spark, Airflow, Snowflake, dbt, Kafka, AWS
Signals
Skill · data-engineer · 0.82
Alias · ml-engineer · 1.00
KRA · data-engineer · 0.45
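The Skill signal is consistent with a simple ratio: the API 1 payload records matched_count 9 and total_count 11 for data-engineer, and 9/11 rounds to the stored 0.8182 (displayed as 0.82). A minimal sketch of that arithmetic, assuming the score is matched over total rounded to four decimals; the helper name is hypothetical, and the pipeline's actual formula is not shown in this run:

```python
def skill_match_score(matched_count: int, total_count: int) -> float:
    """Share of JD skills that matched a role, rounded to 4 decimals (assumed)."""
    if total_count <= 0:
        return 0.0
    return round(matched_count / total_count, 4)

# (matched_count, total_count) pairs copied from skill_match_roles in this run.
observed = {
    "data-engineer": (9, 11),
    "backend-engineer": (3, 11),
    "ml-engineer": (2, 11),
}
scores = {slug: skill_match_score(m, t) for slug, (m, t) in observed.items()}
```

The Alias and KRA scores carry no matched_count or total_count in the payload, so this ratio should not be assumed to apply to them.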
Status: extract_from_jd_done
Created: 2026-05-18T20:34:07.751452Z
Updated: 2026-05-18T20:34:07.751452Z
Flow
Current 3-step pipeline
1 POST /skills/extract-from-jd
2 POST /skills/extract-details
3 POST /skills/final-role-output
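The three steps run in order, each as a POST. A minimal sketch of a driver for that flow, assuming each step receives the previous step's JSON response as its request body and that an empty response ends the run. The chaining, the early stop, the `jd_text` field, and the `run_pipeline` and `post` names are all hypothetical; this run only records the endpoint paths.

```python
from typing import Any, Callable, Dict

# Endpoint paths as listed in the flow for this run.
STEPS = [
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
]

# A transport function: takes (path, json_body) and returns the decoded JSON response.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]

def run_pipeline(post: Post, jd_text: str) -> Dict[str, Dict[str, Any]]:
    """Call each step in order and collect the responses keyed by path."""
    results: Dict[str, Dict[str, Any]] = {}
    payload: Dict[str, Any] = {"jd_text": jd_text}  # hypothetical request shape
    for path in STEPS:
        response = post(path, payload)
        results[path] = response
        if not response:
            # Assumed behavior: an empty body means there is nothing to pass on.
            break
        payload = response
    return results
```

Passing the transport in as `post` keeps the sketch independent of any HTTP client, so it can be exercised with a stub before being pointed at a real service.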
Role
Chosen role & resolution
No chosen role stored for this run.
Job description
ML Engineer — DataCo
We're hiring an ML Engineer to own our data infrastructure.
Responsibilities:
- Build and maintain Airflow DAGs for ETL pipelines from RDS to S3 to Snowflake
- Design and optimize data warehouse schemas in Snowflake
- Manage Spark jobs for large-scale data transformations
- Build streaming pipelines using Kafka and Flink
- Ensure data quality and observability across all pipelines
- Partner with analytics to expose curated datasets
Required skills: Python, SQL, Spark, Airflow, Snowflake, dbt, Kafka, AWS
Skills from this JD
Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
Python · Primary · No API 2 row (run stopped after API 1 or history missing)
SQL · Primary · No API 2 row (run stopped after API 1 or history missing)
Apache Spark · Primary · No API 2 row (run stopped after API 1 or history missing)
Apache Airflow · Primary · No API 2 row (run stopped after API 1 or history missing)
Snowflake · Primary · No API 2 row (run stopped after API 1 or history missing)
dbt · Primary · No API 2 row (run stopped after API 1 or history missing)
Apache Kafka · Primary · No API 2 row (run stopped after API 1 or history missing)
AWS · Primary · No API 2 row (run stopped after API 1 or history missing)
Amazon RDS · Primary · No API 2 row (run stopped after API 1 or history missing)
Amazon S3 · Primary · No API 2 row (run stopped after API 1 or history missing)
Apache Flink · Primary · No API 2 row (run stopped after API 1 or history missing)
Library artifacts (this run)
No artifact rows for this run.
nano JD Parser — gpt-4.1-nano
Role: ML Engineer
Company: DataCo
Domain: Other
JD type: pass
{
"JD_type": "pass",
"about_company": null,
"certifications": [],
"company_name": "DataCo",
"ctc": null,
"domain": {
"primary": {
"aliases": [],
"domain": "Other"
},
"secondary": null
},
"education": [],
"experience": {
"max": null,
"min": null,
"raw": null
},
"job_locations": [],
"role": "ML Engineer",
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 6,
"heading": "Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "Build and maintain Airflow DAGs",
"last_5_words": "to expose curated datasets"
},
"text": "Build and maintain Airflow DAGs for ETL pipelines from RDS to S3 to Snowflake\nDesign and optimize data warehouse schemas in Snowflake\nManage Spark jobs for large-scale data transformations\nBuild streaming pipelines using Kafka and Flink\nEnsure data quality and observability across all pipelines\nPartner with analytics to expose curated datasets",
"word_count": 54
},
{
"bullet_count": 0,
"heading": "Required skills",
"heading_was_present": true,
"source_marker": {
"first_5_words": "Python, SQL, Spark, Airflow,",
"last_5_words": "Snowflake, dbt, Kafka, AWS"
},
"text": "Python, SQL, Spark, Airflow, Snowflake, dbt, Kafka, AWS",
"word_count": 8
}
],
"urls": []
}
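Each parsed section carries a source_marker with the words that open and close it. One plausible use is locating the section inside the raw JD; a minimal sketch of that lookup follows, assuming the markers are matched as literal substrings. The function name is hypothetical, the JD string is a shortened excerpt, and the parser's real matching logic is not shown in this run.

```python
from typing import Optional

def locate_section(jd_text: str, first_words: str, last_words: str) -> Optional[str]:
    """Return the span of jd_text from first_words through last_words, or None."""
    start = jd_text.find(first_words)
    if start == -1:
        return None
    # Search for the closing marker only after the opening marker.
    end = jd_text.find(last_words, start)
    if end == -1:
        return None
    return jd_text[start:end + len(last_words)]

# Shortened excerpt of the JD from this run (two of the six bullets).
jd = (
    "Responsibilities: - Build and maintain Airflow DAGs for ETL pipelines "
    "- Partner with analytics to expose curated datasets "
    "Required skills: Python, SQL, Spark, Airflow, Snowflake, dbt, Kafka, AWS"
)
responsibilities = locate_section(
    jd, "Build and maintain Airflow DAGs", "to expose curated datasets"
)
skills = locate_section(jd, "Python, SQL, Spark, Airflow,", "Snowflake, dbt, Kafka, AWS")
```

Because the markers are model output, a `None` result is worth treating as a signal to fall back to the section's `text` field and not as an error.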
API 1 — extract-from-jd
{
"final_skills": [
{
"is_primary": true,
"skill_name": "Python"
},
{
"is_primary": true,
"skill_name": "SQL"
},
{
"is_primary": true,
"skill_name": "Apache Spark"
},
{
"is_primary": true,
"skill_name": "Apache Airflow"
},
{
"is_primary": true,
"skill_name": "Snowflake"
},
{
"is_primary": true,
"skill_name": "dbt"
},
{
"is_primary": true,
"skill_name": "Apache Kafka"
},
{
"is_primary": true,
"skill_name": "AWS"
},
{
"is_primary": true,
"skill_name": "Amazon RDS"
},
{
"is_primary": true,
"skill_name": "Amazon S3"
},
{
"is_primary": true,
"skill_name": "Apache Flink"
}
],
"jd_role": {
"display_name": "ML Engineer",
"rationale": null,
"role_archetype": "Data",
"slug": ""
},
"nano_parsed": {
"JD_type": "pass",
"about_company": null,
"certifications": [],
"company_name": "DataCo",
"ctc": null,
"domain": {
"primary": {
"aliases": [],
"domain": "Other"
},
"secondary": null
},
"education": [],
"experience": {
"max": null,
"min": null,
"raw": null
},
"job_locations": [],
"role": "ML Engineer",
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 6,
"heading": "Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "Build and maintain Airflow DAGs",
"last_5_words": "to expose curated datasets"
},
"text": "Build and maintain Airflow DAGs for ETL pipelines from RDS to S3 to Snowflake\nDesign and optimize data warehouse schemas in Snowflake\nManage Spark jobs for large-scale data transformations\nBuild streaming pipelines using Kafka and Flink\nEnsure data quality and observability across all pipelines\nPartner with analytics to expose curated datasets",
"word_count": 54
},
{
"bullet_count": 0,
"heading": "Required skills",
"heading_was_present": true,
"source_marker": {
"first_5_words": "Python, SQL, Spark, Airflow,",
"last_5_words": "Snowflake, dbt, Kafka, AWS"
},
"text": "Python, SQL, Spark, Airflow, Snowflake, dbt, Kafka, AWS",
"word_count": 8
}
],
"urls": []
},
"run_id": null,
"stage3_signals": {
"alias_match_roles": [
{
"display_name": "ML Engineer",
"matched_count": null,
"role_id": 3,
"score": 1.0,
"slug": "ml-engineer",
"total_count": null
}
],
"kra_match_roles": [
{
"display_name": "Data Engineer",
"matched_count": null,
"role_id": 2,
"score": 0.4496,
"slug": "data-engineer",
"total_count": null
},
{
"display_name": "DevOps Engineer",
"matched_count": null,
"role_id": 10,
"score": 0.374,
"slug": "devops-engineer",
"total_count": null
},
{
"display_name": "Cloud Architect",
"matched_count": null,
"role_id": 9,
"score": 0.3737,
"slug": "cloud-architect",
"total_count": null
},
{
"display_name": "ML Engineer",
"matched_count": null,
"role_id": 3,
"score": 0.3721,
"slug": "ml-engineer",
"total_count": null
},
{
"display_name": "Backend Engineer",
"matched_count": null,
"role_id": 1,
"score": 0.3594,
"slug": "backend-engineer",
"total_count": null
}
],
"skill_match_roles": [
{
"display_name": "Data Engineer",
"matched_count": 9,
"role_id": 2,
"score": 0.8182,
"slug": "data-engineer",
"total_count": 11
},
{
"display_name": "Backend Engineer",
"matched_count": 3,
"role_id": 1,
"score": 0.2727,
"slug": "backend-engineer",
"total_count": 11
},
{
"display_name": "Cybersecurity Engineer",
"matched_count": 2,
"role_id": 5,
"score": 0.1818,
"slug": "cybersecurity-engineer",
"total_count": 11
},
{
"display_name": "ML Engineer",
"matched_count": 2,
"role_id": 3,
"score": 0.1818,
"slug": "ml-engineer",
"total_count": 11
},
{
"display_name": "Cloud Architect",
"matched_count": 2,
"role_id": 9,
"score": 0.1818,
"slug": "cloud-architect",
"total_count": 11
}
],
"stage35_ran": false
},
"stage4_decision": {
"alias_collision_detected": true,
"case": "B",
"chosen_role": {
"display_name": "Data Engineer",
"matched_count": null,
"role_id": 2,
"score": 0.4496,
"slug": "data-engineer",
"total_count": null
},
"confidence": 0.42712,
"llm2_fired": false,
"llm2_reasoning": null,
"queued": false,
"reasoning": "Skill+KRA agree on data-engineer; alias-\u003eml-engineer"
},
"stage5_updates": null
}
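The stage4_decision reasoning ("Skill+KRA agree on data-engineer; alias->ml-engineer") can be reproduced from the top entry of each signal list. A minimal sketch of that comparison, assuming agreement means the top Skill and KRA slugs are equal and a collision means the top Alias slug differs from them. The function names are hypothetical, and the rules behind case "B" and the 0.42712 confidence are not visible in this run, so neither is modeled here.

```python
from typing import Any, Dict, List, Optional

def top_slug(roles: List[Dict[str, Any]]) -> Optional[str]:
    """Slug of the highest-scoring role in one signal list, or None if empty."""
    if not roles:
        return None
    return max(roles, key=lambda r: r["score"])["slug"]

def summarize_signals(signals: Dict[str, List[Dict[str, Any]]]) -> Dict[str, Any]:
    """Compare the top Skill, KRA, and Alias roles (assumed decision inputs)."""
    skill = top_slug(signals.get("skill_match_roles", []))
    kra = top_slug(signals.get("kra_match_roles", []))
    alias = top_slug(signals.get("alias_match_roles", []))
    agree = skill is not None and skill == kra
    return {
        "skill": skill,
        "kra": kra,
        "alias": alias,
        "skill_kra_agree": agree,
        "alias_collision": agree and alias is not None and alias != skill,
    }

# Top entries copied from stage3_signals in this run (lists trimmed for brevity).
signals = {
    "alias_match_roles": [{"slug": "ml-engineer", "score": 1.0}],
    "kra_match_roles": [
        {"slug": "data-engineer", "score": 0.4496},
        {"slug": "devops-engineer", "score": 0.374},
    ],
    "skill_match_roles": [
        {"slug": "data-engineer", "score": 0.8182},
        {"slug": "backend-engineer", "score": 0.2727},
    ],
}
summary = summarize_signals(signals)
```

For this run the sketch flags agreement on data-engineer together with an alias collision, which matches the recorded `alias_collision_detected: true`.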
API 2 — extract-details
{}
API 3 — final-role-output
{}