Pipeline run
d3cbd107-8df1-48e0-80e4-a9a4f71bb810
Client output enrichment
v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description
Nature of work
—
Tech stack maturity
Mainstream Modern
AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)
0.20 / 5
· Title match
✓ Has AI skill
· AI skill (primary)
· AI skill (secondary)
· On AI team
· Builds AI products
vocab breakdown (legacy)
Assistants (×1):
—
Frameworks (×2):
—
Models / concepts (×3):
ML, Machine Learning
Evidence — skills matched in JD (16)
SQL
Python
Apache Spark
Kafka
Airflow
AWS
Azure
GCP
Git
CI/CD
Scala
Snowflake
BigQuery
Redshift
Kubernetes
dbt
Skill cluster (0 dimension groups, role-scoped)
Status:
extract_from_jd_done
Created: 2026-05-12T04:42:29.150593Z
Updated: 2026-05-12T04:42:29.150593Z
Flow
Current 3-step pipeline
1 POST /skills/extract-from-jd
2 POST /skills/extract-details
3 POST /skills/final-role-output
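The three steps above could be chained roughly as in the sketch below. Only the endpoint paths come from this page. The `post` callable, the `jd_text` request key, forwarding each response as the next request body, and stopping on an empty response are all assumptions made for illustration, not the service's documented contract.

```python
from typing import Any, Callable, Dict, List, Tuple

# Endpoint paths as listed in the flow above, in pipeline order.
PIPELINE_STEPS: List[str] = [
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
]

# Hypothetical transport: takes (path, json_body), returns the decoded JSON response.
PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_pipeline(post: PostFn, jd_text: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Call each step in order and collect (path, response) pairs.

    Assumption: an empty response object means the step produced nothing,
    so later steps are skipped.
    """
    results: List[Tuple[str, Dict[str, Any]]] = []
    payload: Dict[str, Any] = {"jd_text": jd_text}  # assumed request shape
    for path in PIPELINE_STEPS:
        response = post(path, payload)
        results.append((path, response))
        if not response:
            break
        payload = response  # assumed: each step consumes the previous output
    return results
```

Passing the transport in as a function keeps the sketch independent of any HTTP client; a real caller would wrap its own client and base URL in `post`.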
Role
Chosen role & resolution
No chosen role stored for this run.
Job description
Job Title: Data Engineer
Experience: 3–7 Years
Location: Gurgaon / Bengaluru / Hybrid

About the Role
We are hiring a Data Engineer to build scalable data pipelines and maintain reliable data infrastructure for analytics and machine learning workloads. You will work with large datasets, optimize ETL workflows, and support business intelligence initiatives across teams.

Key Responsibilities
Design, build, and maintain scalable ETL and ELT pipelines
Process and transform structured and unstructured datasets from multiple sources
Develop batch and real-time data ingestion workflows
Optimize database queries and data processing performance
Build and manage data warehouses and lakehouse architectures
Ensure data quality, integrity, and governance standards
Collaborate with analysts, ML engineers, and backend teams for data requirements
Monitor and troubleshoot production data pipelines
Automate data validation and reporting workflows

Required Skills
Strong experience with SQL and database optimization
Hands-on experience with Python or Scala for data processing
Experience with Apache Spark, Kafka, or Airflow
Knowledge of relational and NoSQL databases
Experience with cloud platforms such as AWS, Azure, or GCP
Familiarity with data warehousing tools like Snowflake, BigQuery, or Redshift
Understanding of distributed systems and large-scale data processing
Experience with Git and CI/CD practices

Good to Have
Exposure to ML pipelines and feature engineering workflows
Experience with dbt and modern data stack tools
Knowledge of containerization and Kubernetes
Familiarity with streaming architectures and event-driven systems

Qualification
Bachelor’s degree in Computer Science, Engineering, or related field
Strong analytical and problem-solving abilities
Good communication and collaboration skills
Skills from this JD
Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
SQL
Primary
No API 2 row (run stopped after API 1 or history missing)
Python
Primary
No API 2 row (run stopped after API 1 or history missing)
Scala
Secondary
No API 2 row (run stopped after API 1 or history missing)
Apache Spark
Primary
No API 2 row (run stopped after API 1 or history missing)
Kafka
Primary
No API 2 row (run stopped after API 1 or history missing)
Airflow
Primary
No API 2 row (run stopped after API 1 or history missing)
AWS
Primary
No API 2 row (run stopped after API 1 or history missing)
Azure
Primary
No API 2 row (run stopped after API 1 or history missing)
GCP
Primary
No API 2 row (run stopped after API 1 or history missing)
Snowflake
Secondary
No API 2 row (run stopped after API 1 or history missing)
BigQuery
Secondary
No API 2 row (run stopped after API 1 or history missing)
Redshift
Secondary
No API 2 row (run stopped after API 1 or history missing)
Git
Primary
No API 2 row (run stopped after API 1 or history missing)
CI/CD
Primary
No API 2 row (run stopped after API 1 or history missing)
Kubernetes
Secondary
No API 2 row (run stopped after API 1 or history missing)
dbt
Secondary
No API 2 row (run stopped after API 1 or history missing)
Library artifacts (this run)
No artifact rows for this run.
API 1 — extract-from-jd
{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "SQL"
    },
    {
      "is_primary": true,
      "skill_name": "Python"
    },
    {
      "is_primary": false,
      "skill_name": "Scala"
    },
    {
      "is_primary": true,
      "skill_name": "Apache Spark"
    },
    {
      "is_primary": true,
      "skill_name": "Kafka"
    },
    {
      "is_primary": true,
      "skill_name": "Airflow"
    },
    {
      "is_primary": true,
      "skill_name": "AWS"
    },
    {
      "is_primary": true,
      "skill_name": "Azure"
    },
    {
      "is_primary": true,
      "skill_name": "GCP"
    },
    {
      "is_primary": false,
      "skill_name": "Snowflake"
    },
    {
      "is_primary": false,
      "skill_name": "BigQuery"
    },
    {
      "is_primary": false,
      "skill_name": "Redshift"
    },
    {
      "is_primary": true,
      "skill_name": "Git"
    },
    {
      "is_primary": true,
      "skill_name": "CI/CD"
    },
    {
      "is_primary": false,
      "skill_name": "Kubernetes"
    },
    {
      "is_primary": false,
      "skill_name": "dbt"
    }
  ],
  "run_id": null
}
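A response of this shape can be split into primary and secondary skill lists with a few lines of Python. This is a minimal sketch that assumes only the two fields visible in the payload above (`is_primary` and `skill_name`); the function name `split_skills` and the two-item sample are illustrative.

```python
import json
from typing import List, Tuple


def split_skills(raw: str) -> Tuple[List[str], List[str]]:
    """Return (primary, secondary) skill names from an extract-from-jd JSON string."""
    data = json.loads(raw)
    primary: List[str] = []
    secondary: List[str] = []
    for item in data.get("final_skills", []):
        # Route each skill by its is_primary flag, preserving payload order.
        (primary if item["is_primary"] else secondary).append(item["skill_name"])
    return primary, secondary


# Two-item sample in the same shape as the payload above.
sample = (
    '{"final_skills": ['
    '{"is_primary": true, "skill_name": "SQL"}, '
    '{"is_primary": false, "skill_name": "dbt"}'
    '], "run_id": null}'
)
primary, secondary = split_skills(sample)
```

Applied to the full payload for this run, the split gives 10 primary and 6 secondary skills, which agrees with the Primary and Secondary labels in the skills list.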
API 2 — extract-details
{}
API 3 — final-role-output
{}
LLM Calls
Every model call made for this run, in pipeline order.