
Pipeline run

d3cbd107-8df1-48e0-80e4-a9a4f71bb810

Client output enrichment

v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description
Nature of work
no_db_connection
Tech stack maturity
Mainstream Modern
AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)
0.20 / 5
· Title match
Has AI skill
· AI skill (primary)
· AI skill (secondary)
· On AI team
· Builds AI products
Vocab breakdown (legacy)
Assistants (×1):
Frameworks (×2):
Models / concepts (×3): ML, Machine Learning
Evidence — skills matched in JD (16)
SQL, Python, Apache Spark, Kafka, Airflow, AWS, Azure, GCP, Git, CI/CD, Scala, Snowflake, BigQuery, Redshift, Kubernetes, dbt
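The evidence list looks like a set of known skill names found in the JD text. This page does not show the pipeline's actual matcher. The sketch below shows one common approach: a case-insensitive, whole-term search against a fixed vocabulary. The `match_skills` helper and the sample vocabulary are hypothetical.

```python
import re

def match_skills(jd_text: str, vocab: list[str]) -> list[str]:
    """Return the vocab entries that occur in jd_text as whole terms, in vocab order.

    Matching is case-insensitive. The lookarounds reject matches glued to other
    word characters, so "SQL" does not match inside "NoSQL".
    """
    matched = []
    for skill in vocab:
        # re.escape keeps characters such as "/" or "+" literal (e.g. "CI/CD").
        pattern = r"(?<!\w)" + re.escape(skill) + r"(?!\w)"
        if re.search(pattern, jd_text, flags=re.IGNORECASE):
            matched.append(skill)
    return matched

# Hypothetical vocabulary; the pipeline's real skill library is not shown on this page.
vocab = ["SQL", "Python", "Kafka", "Git", "CI/CD"]
jd = "Strong experience with SQL. Python or Scala. Git and CI/CD practices. NoSQL databases."
print(match_skills(jd, vocab))  # ['SQL', 'Python', 'Git', 'CI/CD']
```

A plain whole-term match like this would miss aliases (for example, "Spark" written without "Apache"). A production matcher would likely also need an alias table.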
Skill cluster (0 dimension groups, role-scoped)
No dimension groups computed for this JD.
Status: extract_from_jd_done · Created: 2026-05-12T04:42:29.150593Z · Updated: 2026-05-12T04:42:29.150593Z
Flow · Current 3-step pipeline

1 POST /skills/extract-from-jd

2 POST /skills/extract-details

3 POST /skills/final-role-output
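In this run, the status is `extract_from_jd_done` and the API 2 and API 3 panels are empty (`{}`). That is consistent with a run that stopped after step 1.

The sketch below drives the three endpoints in order and stops at the first empty response. It rests on these assumptions:
- Step 1 takes the JD text.
- Each later step takes the previous step's JSON.
- `post(path, payload)` is an injected HTTP helper that returns parsed JSON.

The `run_pipeline` name, the `post` helper and the `jd_text` field are hypothetical, not this service's documented API.

```python
from typing import Any, Callable

# Endpoint paths as listed in the flow above.
STEPS = [
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
]

# Hypothetical HTTP helper: (path, JSON payload) -> parsed JSON response.
PostFn = Callable[[str, dict[str, Any]], dict[str, Any]]

def run_pipeline(post: PostFn, jd_text: str) -> dict[str, dict[str, Any]]:
    """Call each step in order; stop after the first step that returns an empty body."""
    results: dict[str, dict[str, Any]] = {}
    payload: dict[str, Any] = {"jd_text": jd_text}  # hypothetical request field
    for path in STEPS:
        response = post(path, payload)
        results[path] = response
        if not response:
            break  # nothing to hand to the next step
        payload = response  # assumption: each step consumes the previous output
    return results
```

Injecting `post` keeps the control flow testable without a live server. A fake that returns skills for step 1 and `{}` afterwards yields results for steps 1 and 2 only.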

Role · Chosen role & resolution

No chosen role stored for this run.

Job description

Job Title: Data Engineer

Experience: 3–7 Years
Location: Gurgaon / Bengaluru / Hybrid

About the Role

We are hiring a Data Engineer to build scalable data pipelines and maintain reliable data infrastructure for analytics and machine learning workloads. You will work with large datasets, optimize ETL workflows, and support business intelligence initiatives across teams.

Key Responsibilities
Design, build, and maintain scalable ETL and ELT pipelines
Process and transform structured and unstructured datasets from multiple sources
Develop batch and real-time data ingestion workflows
Optimize database queries and data processing performance
Build and manage data warehouses and lakehouse architectures
Ensure data quality, integrity, and governance standards
Collaborate with analysts, ML engineers, and backend teams for data requirements
Monitor and troubleshoot production data pipelines
Automate data validation and reporting workflows
Required Skills
Strong experience with SQL and database optimization
Hands-on experience with Python or Scala for data processing
Experience with Apache Spark, Kafka, or Airflow
Knowledge of relational and NoSQL databases
Experience with cloud platforms such as AWS, Azure, or GCP
Familiarity with data warehousing tools like Snowflake, BigQuery, or Redshift
Understanding of distributed systems and large-scale data processing
Experience with Git and CI/CD practices
Good to Have
Exposure to ML pipelines and feature engineering workflows
Experience with dbt and modern data stack tools
Knowledge of containerization and Kubernetes
Familiarity with streaming architectures and event-driven systems
Qualification
Bachelor’s degree in Computer Science, Engineering, or related field
Strong analytical and problem-solving abilities
Good communication and collaboration skills

Skills from this JD

Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.

Skill | Priority | API 2 status
SQL | Primary | No API 2 row (run stopped after API 1 or history missing)
Python | Primary | No API 2 row (run stopped after API 1 or history missing)
Scala | Secondary | No API 2 row (run stopped after API 1 or history missing)
Apache Spark | Primary | No API 2 row (run stopped after API 1 or history missing)
Kafka | Primary | No API 2 row (run stopped after API 1 or history missing)
Airflow | Primary | No API 2 row (run stopped after API 1 or history missing)
AWS | Primary | No API 2 row (run stopped after API 1 or history missing)
Azure | Primary | No API 2 row (run stopped after API 1 or history missing)
GCP | Primary | No API 2 row (run stopped after API 1 or history missing)
Snowflake | Secondary | No API 2 row (run stopped after API 1 or history missing)
BigQuery | Secondary | No API 2 row (run stopped after API 1 or history missing)
Redshift | Secondary | No API 2 row (run stopped after API 1 or history missing)
Git | Primary | No API 2 row (run stopped after API 1 or history missing)
CI/CD | Primary | No API 2 row (run stopped after API 1 or history missing)
Kubernetes | Secondary | No API 2 row (run stopped after API 1 or history missing)
dbt | Secondary | No API 2 row (run stopped after API 1 or history missing)
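Each row pairs an API 1 skill with whatever API 2 produced for it. When API 2 has no entry, the row falls back to the "No API 2 row" status, as every row in this run does.

The sketch below shows that merge, assuming API 2 results can be looked up by skill name. The `merge_rows` helper, the `api2_by_skill` shape and the "API 2 row found" label are hypothetical. Dimensions and API 3 tags are omitted.

```python
from typing import Any

MISSING_API2 = "No API 2 row (run stopped after API 1 or history missing)"

def merge_rows(
    final_skills: list[dict[str, Any]],
    api2_by_skill: dict[str, dict[str, Any]],
) -> list[tuple[str, str, str]]:
    """Build (skill, priority, API 2 status) rows in API 1 order."""
    rows = []
    for skill in final_skills:
        name = skill["skill_name"]
        priority = "Primary" if skill["is_primary"] else "Secondary"
        # Hypothetical lookup: API 2 output keyed by the API 1 skill name.
        status = "API 2 row found" if name in api2_by_skill else MISSING_API2
        rows.append((name, priority, status))
    return rows
```

Passing the API 1 `final_skills` list with an empty `api2_by_skill` (API 2 returned `{}`) reproduces the table's uniform fallback status.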

Library artifacts (this run)

No artifact rows for this run.
API 1 — extract-from-jd
{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "SQL"
    },
    {
      "is_primary": true,
      "skill_name": "Python"
    },
    {
      "is_primary": false,
      "skill_name": "Scala"
    },
    {
      "is_primary": true,
      "skill_name": "Apache Spark"
    },
    {
      "is_primary": true,
      "skill_name": "Kafka"
    },
    {
      "is_primary": true,
      "skill_name": "Airflow"
    },
    {
      "is_primary": true,
      "skill_name": "AWS"
    },
    {
      "is_primary": true,
      "skill_name": "Azure"
    },
    {
      "is_primary": true,
      "skill_name": "GCP"
    },
    {
      "is_primary": false,
      "skill_name": "Snowflake"
    },
    {
      "is_primary": false,
      "skill_name": "BigQuery"
    },
    {
      "is_primary": false,
      "skill_name": "Redshift"
    },
    {
      "is_primary": true,
      "skill_name": "Git"
    },
    {
      "is_primary": true,
      "skill_name": "CI/CD"
    },
    {
      "is_primary": false,
      "skill_name": "Kubernetes"
    },
    {
      "is_primary": false,
      "skill_name": "dbt"
    }
  ],
  "run_id": null
}
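The API 1 payload is a flat list of `{is_primary, skill_name}` objects plus a `run_id`, which is `null` here even though the page header shows a run ID. The sketch below loads such a body with Python's standard `json` module and splits primary from secondary skills. The `split_by_priority` helper is hypothetical.

```python
import json

def split_by_priority(raw: str) -> tuple[list[str], list[str]]:
    """Parse an API 1 response body and return (primary, secondary) skill names."""
    data = json.loads(raw)  # JSON null becomes None, true/false become bools
    primary: list[str] = []
    secondary: list[str] = []
    for item in data["final_skills"]:
        (primary if item["is_primary"] else secondary).append(item["skill_name"])
    return primary, secondary

sample = '''{"final_skills": [
  {"is_primary": true, "skill_name": "SQL"},
  {"is_primary": false, "skill_name": "Scala"},
  {"is_primary": true, "skill_name": "Apache Spark"}
], "run_id": null}'''

primary, secondary = split_by_priority(sample)
print(primary)    # ['SQL', 'Apache Spark']
print(secondary)  # ['Scala']
```

Applied to the full response above, the split gives 10 primary and 6 secondary skills.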
API 2 — extract-details
{}
API 3 — final-role-output
{}

LLM Calls

Every model call made for this run, in pipeline order.
