Pipeline run

6fdb7e19-2a66-45ab-8c1b-9550f03cae14

Pipeline LLM cost (USD)

API 1: $0.0038 · API 2: $0.0000 · API 3: $0.0000 · Total: $0.0038

Client output enrichment

v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description

role baseline loaded sources · ai_index: jd · nature_of_work: jd · tech_stack_maturity: jd

Nature of work · Data pipeline development

Builds and optimizes large-scale ETL/data pipelines in PySpark/Python/Java on GCP and Hadoop/HDFS, models data marts for analytics, and works with data science to productionize models while enforcing data quality and performance.

"Writing complex ETL (Extract / Transform / Load) processes"

Tech stack maturity

Mainstream Modern · cache hit

The skill set centers on widely adopted modern data engineering technologies such as Spark, GCP, Hadoop, Java, and Python, which are common in contemporary production environments but not bleeding-edge.

AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)

0.00 / 5

· Title match

· Has AI skill

· AI skill (primary)

· AI skill (secondary)

· On AI team

· Builds AI products

vocab breakdown (legacy)

Assistants (×1): —

Frameworks (×2): —

Models / concepts (×3): —

Evidence — skills matched in JD (18)

Google Cloud Platform · Hadoop · HDFS · Python · Java · PySpark · Spark · ETL · Scala · Bash · UNIX · Hive · Cassandra · Pig · MySQL · NoSQL · Data Science · DevOps

Skill cluster (6 dimension groups, role-scoped)

Programming Languages for Data Work

Python · Java · Scala · Bash

ETL and ELT Tooling

Hadoop · Spark

CI/CD Pipeline Platforms

DevOps

Cloud Provider Platforms

Google Cloud Platform

Relational Database Usage

MySQL

Cross-cutting / unaligned

HDFS · PySpark · ETL · UNIX · Hive · Cassandra · Pig · NoSQL · Data Science

KRA description

● Designing and developing complex and large-scale data structures and pipelines to organize, collect, and standardize data to generate insights and address reporting needs
● Writing complex ETL (Extract / Transform / Load) processes, designs database systems and develops tools for real-time and offline analytic processing
● Developing frameworks, standards & reference material for architecture and associated products
● Designing data marts and data models to support Data Science and other internal customers.
● Behaving as mentor to junior team members to provide technical advice
● Applying knowledge of gcp-data tools and products to consult and advise on additional efforts across multiple domains spanning broader enterprise
● Collaborating with data science team to transform data and integrate algorithms and models into highly available, production systems
● Using in-depth knowledge on Hadoop architecture and HDFS commands and experience designing & optimizing queries to build scalable, modular, and efficient data pipelines
● Using advanced programming skills in Python, Java, PySpark, or any of the major languages to build robust data pipelines and dynamic systems
● Integrating data from a variety of sources, assuring that they adhere to data quality and accessibility standards
● Experimenting with available tools and advice on new tools in order to determine optimal solution given the requirements dictated by the model/use case
● 5+ years Data Engineering experience
● 5+ years PySpark (Spark/Scala)
● 3+ years advanced knowledge in Hadoop architecture, HDFS commands and experience designing & optimising queries against data in the HDFS environment
● 2+ years' experience with Google Cloud Platform ( GCP )
● Experience with bash shell scripts, UNIX utilities & UNIX Commands
● Experience building and implementing data transformation and processing solutions
● Advanced knowledge in Java, Python, Hive, Cassandra, Pig, MySQL or NoSQL or similar
● If you are passionate about DevOps and GCP, and if you thrive in a collaborative and fast-paced environment
● You like to solve puzzles and figure things out, how they work, how they operate etc.
● You thrive in an environment that constantly demands you to learn.

Signals

Skill data-engineer

0.33

Alias data-engineer

1.00

KRA devops-engineer

0.39

Post-classification

Centroid updated · n=18

Alias collision log: —

New-role queue: —

New skills captured: 7

New KRA captured: yes

Captured for admin review

HDFS primary ↔ Data Engineer pending

PySpark primary ↔ Data Engineer pending

UNIX ↔ Data Engineer pending

Hive ↔ Data Engineer pending

Pig ↔ Data Engineer pending

ETL primary ↔ Data Engineer pending

Data Science ↔ Data Engineer pending

R&R fragment (sim 0.38) ↔ Data Engineer pending

Status: extract_from_jd_done · Created: 2026-05-19T23:05:34.696434Z · Updated: 2026-05-19T23:05:35.678908Z

Flow · Current 3-step pipeline

1 POST /skills/extract-from-jd

2 POST /skills/extract-details

3 POST /skills/final-role-output
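
The three-step flow above can be sketched as an ordered list of POST requests. This is a minimal illustration only: the base URL and the request payload are assumptions (the page states neither), and each real step may well expect a different body. The sketch builds the requests without sending them.

```python
import json
from urllib.request import Request

# Hypothetical base URL; the page does not say where the API is hosted.
BASE_URL = "http://localhost:8000"

# Endpoint paths in the order listed in the flow above.
PIPELINE_STEPS = [
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
]


def build_requests(payload: dict) -> list:
    """Build one POST request per pipeline step (nothing is sent)."""
    body = json.dumps(payload).encode("utf-8")
    return [
        Request(
            BASE_URL + path,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        for path in PIPELINE_STEPS
    ]


# Assumed payload shape: only the run id shown at the top of this page.
requests_in_order = build_requests(
    {"run_id": "6fdb7e19-2a66-45ab-8c1b-9550f03cae14"}
)
```

In this run only the first step produced output, so a real client would presumably stop or retry after step 1 rather than send all three.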

Role · Chosen role & resolution

No chosen role stored for this run.

Job description

Sr. GCP Data Engineer

Job Description

Work Mode: WFO
Start Date: Immediate
Description
We are looking for a Senior GCP Data Engineer who's confident, curious, and straightforward, a
great fit for our empowering and driven culture. The candidate’s determination and clear
communication will make our cloud-based solutions sharp and easy to understand. Candidates
need to solve problems effortlessly and effectively in the cloud. Join us if you're excited to be
part of a team that values clarity, confidence, and getting things done through cloud technology.
Responsibilities:
● Designing and developing complex and large-scale data structures and pipelines to
organize, collect, and standardize data to generate insights and address reporting needs
● Writing complex ETL (Extract / Transform / Load) processes, designs database systems
and develops tools for real-time and offline analytic processing
● Developing frameworks, standards & reference material for architecture and associated
products
● Designing data marts and data models to support Data Science and other internal
customers.
● Behaving as mentor to junior team members to provide technical advice
● Applying knowledge of gcp-data tools and products to consult and advise on additional
efforts across multiple domains spanning broader enterprise
● Collaborating with data science team to transform data and integrate algorithms and
models into highly available, production systems
● Using in-depth knowledge on Hadoop architecture and HDFS commands and experience
designing & optimizing queries to build scalable, modular, and efficient data pipelines
● Using advanced programming skills in Python, Java, PySpark, or any of the major
languages to build robust data pipelines and dynamic systems
● Integrating data from a variety of sources, assuring that they adhere to data quality and
accessibility standards
● Experimenting with available tools and advice on new tools in order to determine optimal
solution given the requirements dictated by the model/use case
Requirements:
● 5+ years Data Engineering experience
● 5+ years PySpark (Spark/Scala)
● 3+ years advanced knowledge in Hadoop architecture, HDFS commands and experience
designing & optimising queries against data in the HDFS environment
● 2+ years' experience with Google Cloud Platform ( GCP )
● Experience with bash shell scripts, UNIX utilities & UNIX Commands
● Experience building and implementing data transformation and processing solutions
● Advanced knowledge in Java, Python, Hive, Cassandra, Pig, MySQL or NoSQL or
similar
● If you are passionate about DevOps and GCP, and if you thrive in a collaborative and
fast-paced environment
● You like to solve puzzles and figure things out, how they work, how they operate etc.
● You thrive in an environment that constantly demands you to learn.

Skills from this JD

Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.

Google Cloud Platform · Primary · No API 2 row (run stopped after API 1 or history missing)

Hadoop · Primary · No API 2 row (run stopped after API 1 or history missing)

HDFS · Primary · No API 2 row (run stopped after API 1 or history missing)

Python · Primary · No API 2 row (run stopped after API 1 or history missing)

Java · Primary · No API 2 row (run stopped after API 1 or history missing)

PySpark · Primary · No API 2 row (run stopped after API 1 or history missing)

Spark · Primary · No API 2 row (run stopped after API 1 or history missing)

Scala · Secondary · No API 2 row (run stopped after API 1 or history missing)

Bash · Secondary · No API 2 row (run stopped after API 1 or history missing)

UNIX · Secondary · No API 2 row (run stopped after API 1 or history missing)

Hive · Secondary · No API 2 row (run stopped after API 1 or history missing)

Cassandra · Secondary · No API 2 row (run stopped after API 1 or history missing)

Pig · Secondary · No API 2 row (run stopped after API 1 or history missing)

MySQL · Secondary · No API 2 row (run stopped after API 1 or history missing)

NoSQL · Secondary · No API 2 row (run stopped after API 1 or history missing)

ETL · Primary · No API 2 row (run stopped after API 1 or history missing)

Data Science · Secondary · No API 2 row (run stopped after API 1 or history missing)

DevOps · Secondary · No API 2 row (run stopped after API 1 or history missing)
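
The per-skill rows above could be assembled along the following lines. This is a rough sketch, not the page's actual logic: `skill_name` and `is_primary` are taken from the API 1 payload, while the shapes of the API 2 and API 3 lookups (plain dicts keyed by skill name) and the function name `merge_rows` are hypothetical.

```python
# Placeholder shown when a skill has no API 2 data, as in the rows above.
NO_API2 = "No API 2 row (run stopped after API 1 or history missing)"


def merge_rows(api1_skills, api2_rows, api3_tags):
    """Merge API 1 skills with optional API 2 / API 3 data, keyed by name."""
    merged = []
    for skill in api1_skills:
        name = skill["skill_name"]
        merged.append(
            {
                "skill_name": name,
                "tier": "Primary" if skill["is_primary"] else "Secondary",
                # Fall back to the placeholder when API 2 returned nothing.
                "api2": api2_rows.get(name, NO_API2),
                # None means no persistence tag was recorded for this skill.
                "api3": api3_tags.get(name),
            }
        )
    return merged


# Two skills from this run; API 2 and API 3 returned {} here.
api1 = [
    {"skill_name": "Hadoop", "is_primary": True},
    {"skill_name": "Scala", "is_primary": False},
]
rows = merge_rows(api1, api2_rows={}, api3_tags={})
```

With empty API 2 and API 3 results, every row falls back to the placeholder, which matches what this run displays.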

Library artifacts (this run)

No artifact rows for this run.

nano JD Parser — gpt-4.1-nano

Role: Sr. GCP Data Engineer

Experience: 5+ years Data Engineering experience

Domain: IT Services & Consulting

JD type: pass

Raw JSON

{
  "JD_type": "pass",
  "about_company": null,
  "certifications": [],
  "company_name": null,
  "ctc": null,
  "domain": {
    "primary": {
      "aliases": [],
      "domain": "IT Services \u0026 Consulting"
    },
    "secondary": null
  },
  "education": [],
  "experience": {
    "max": null,
    "min": 5,
    "raw": "5+ years Data Engineering experience"
  },
  "job_locations": [],
  "role": "Sr. GCP Data Engineer",
  "role_aliases": [
    "GCP Data Engineer",
    "Data Engineer",
    "Senior Data Engineer"
  ],
  "role_archetype": "Data",
  "roles_and_responsibilities": [
    {
      "bullet_count": 10,
      "heading": "Responsibilities",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "\u25cf Designing and developing complex",
        "last_5_words": "requirements dictated by the model/use case"
      },
      "text": "\u25cf Designing and developing complex and large-scale data structures and pipelines to organize, collect, and standardize data to generate insights and address reporting needs\n\u25cf Writing complex ETL (Extract / Transform / Load) processes, designs database systems and develops tools for real-time and offline analytic processing\n\u25cf Developing frameworks, standards \u0026 reference material for architecture and associated products\n\u25cf Designing data marts and data models to support Data Science and other internal customers.\n\u25cf Behaving as mentor to junior team members to provide technical advice\n\u25cf Applying knowledge of gcp-data tools and products to consult and advise on additional efforts across multiple domains spanning broader enterprise\n\u25cf Collaborating with data science team to transform data and integrate algorithms and models into highly available, production systems\n\u25cf Using in-depth knowledge on Hadoop architecture and HDFS commands and experience designing \u0026 optimizing queries to build scalable, modular, and efficient data pipelines\n\u25cf Using advanced programming skills in Python, Java, PySpark, or any of the major languages to build robust data pipelines and dynamic systems\n\u25cf Integrating data from a variety of sources, assuring that they adhere to data quality and accessibility standards\n\u25cf Experimenting with available tools and advice on new tools in order to determine optimal solution given the requirements dictated by the model/use case",
      "word_count": 218
    },
    {
      "bullet_count": 10,
      "heading": "Requirements",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "\u25cf 5+ years Data Engineering experience",
        "last_5_words": "constantly demands you to learn."
      },
      "text": "\u25cf 5+ years Data Engineering experience\n\u25cf 5+ years PySpark (Spark/Scala)\n\u25cf 3+ years advanced knowledge in Hadoop architecture, HDFS commands and experience designing \u0026 optimising queries against data in the HDFS environment\n\u25cf 2+ years\u0027 experience with Google Cloud Platform ( GCP )\n\u25cf Experience with bash shell scripts, UNIX utilities \u0026 UNIX Commands\n\u25cf Experience building and implementing data transformation and processing solutions\n\u25cf Advanced knowledge in Java, Python, Hive, Cassandra, Pig, MySQL or NoSQL or similar\n\u25cf If you are passionate about DevOps and GCP, and if you thrive in a collaborative and fast-paced environment\n\u25cf You like to solve puzzles and figure things out, how they work, how they operate etc.\n\u25cf You thrive in an environment that constantly demands you to learn.",
      "word_count": 134
    }
  ],
  "urls": []
}

API 1 — extract-from-jd

{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "Google Cloud Platform"
    },
    {
      "is_primary": true,
      "skill_name": "Hadoop"
    },
    {
      "is_primary": true,
      "skill_name": "HDFS"
    },
    {
      "is_primary": true,
      "skill_name": "Python"
    },
    {
      "is_primary": true,
      "skill_name": "Java"
    },
    {
      "is_primary": true,
      "skill_name": "PySpark"
    },
    {
      "is_primary": true,
      "skill_name": "Spark"
    },
    {
      "is_primary": false,
      "skill_name": "Scala"
    },
    {
      "is_primary": false,
      "skill_name": "Bash"
    },
    {
      "is_primary": false,
      "skill_name": "UNIX"
    },
    {
      "is_primary": false,
      "skill_name": "Hive"
    },
    {
      "is_primary": false,
      "skill_name": "Cassandra"
    },
    {
      "is_primary": false,
      "skill_name": "Pig"
    },
    {
      "is_primary": false,
      "skill_name": "MySQL"
    },
    {
      "is_primary": false,
      "skill_name": "NoSQL"
    },
    {
      "is_primary": true,
      "skill_name": "ETL"
    },
    {
      "is_primary": false,
      "skill_name": "Data Science"
    },
    {
      "is_primary": false,
      "skill_name": "DevOps"
    }
  ],
  "jd_role": {
    "display_name": "Sr. GCP Data Engineer",
    "rationale": null,
    "role_aliases": [
      "GCP Data Engineer",
      "Data Engineer",
      "Senior Data Engineer"
    ],
    "role_archetype": "Data",
    "slug": ""
  },
  "nano_parsed": {
    "JD_type": "pass",
    "about_company": null,
    "certifications": [],
    "company_name": null,
    "ctc": null,
    "domain": {
      "primary": {
        "aliases": [],
        "domain": "IT Services \u0026 Consulting"
      },
      "secondary": null
    },
    "education": [],
    "experience": {
      "max": null,
      "min": 5,
      "raw": "5+ years Data Engineering experience"
    },
    "job_locations": [],
    "role": "Sr. GCP Data Engineer",
    "role_aliases": [
      "GCP Data Engineer",
      "Data Engineer",
      "Senior Data Engineer"
    ],
    "role_archetype": "Data",
    "roles_and_responsibilities": [
      {
        "bullet_count": 10,
        "heading": "Responsibilities",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "\u25cf Designing and developing complex",
          "last_5_words": "requirements dictated by the model/use case"
        },
        "text": "\u25cf Designing and developing complex and large-scale data structures and pipelines to organize, collect, and standardize data to generate insights and address reporting needs\n\u25cf Writing complex ETL (Extract / Transform / Load) processes, designs database systems and develops tools for real-time and offline analytic processing\n\u25cf Developing frameworks, standards \u0026 reference material for architecture and associated products\n\u25cf Designing data marts and data models to support Data Science and other internal customers.\n\u25cf Behaving as mentor to junior team members to provide technical advice\n\u25cf Applying knowledge of gcp-data tools and products to consult and advise on additional efforts across multiple domains spanning broader enterprise\n\u25cf Collaborating with data science team to transform data and integrate algorithms and models into highly available, production systems\n\u25cf Using in-depth knowledge on Hadoop architecture and HDFS commands and experience designing \u0026 optimizing queries to build scalable, modular, and efficient data pipelines\n\u25cf Using advanced programming skills in Python, Java, PySpark, or any of the major languages to build robust data pipelines and dynamic systems\n\u25cf Integrating data from a variety of sources, assuring that they adhere to data quality and accessibility standards\n\u25cf Experimenting with available tools and advice on new tools in order to determine optimal solution given the requirements dictated by the model/use case",
        "word_count": 218
      },
      {
        "bullet_count": 10,
        "heading": "Requirements",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "\u25cf 5+ years Data Engineering experience",
          "last_5_words": "constantly demands you to learn."
        },
        "text": "\u25cf 5+ years Data Engineering experience\n\u25cf 5+ years PySpark (Spark/Scala)\n\u25cf 3+ years advanced knowledge in Hadoop architecture, HDFS commands and experience designing \u0026 optimising queries against data in the HDFS environment\n\u25cf 2+ years\u0027 experience with Google Cloud Platform ( GCP )\n\u25cf Experience with bash shell scripts, UNIX utilities \u0026 UNIX Commands\n\u25cf Experience building and implementing data transformation and processing solutions\n\u25cf Advanced knowledge in Java, Python, Hive, Cassandra, Pig, MySQL or NoSQL or similar\n\u25cf If you are passionate about DevOps and GCP, and if you thrive in a collaborative and fast-paced environment\n\u25cf You like to solve puzzles and figure things out, how they work, how they operate etc.\n\u25cf You thrive in an environment that constantly demands you to learn.",
        "word_count": 134
      }
    ],
    "urls": []
  },
  "rejected": false,
  "rejection_reason": null,
  "run_id": "6fdb7e19-2a66-45ab-8c1b-9550f03cae14",
  "stage3_signals": {
    "alias_found": true,
    "alias_match_roles": [
      {
        "display_name": "Data Engineer",
        "matched_count": null,
        "role_id": 2,
        "score": 1.0,
        "slug": "data-engineer",
        "total_count": null
      }
    ],
    "kra_match_roles": [
      {
        "display_name": "DevOps Engineer",
        "matched_count": null,
        "role_id": 10,
        "score": 0.3918,
        "slug": "devops-engineer",
        "total_count": null
      },
      {
        "display_name": "Data Engineer",
        "matched_count": null,
        "role_id": 2,
        "score": 0.3779,
        "slug": "data-engineer",
        "total_count": null
      },
      {
        "display_name": "ML Engineer",
        "matched_count": null,
        "role_id": 3,
        "score": 0.3437,
        "slug": "ml-engineer",
        "total_count": null
      },
      {
        "display_name": "Backend Engineer",
        "matched_count": null,
        "role_id": 1,
        "score": 0.3324,
        "slug": "backend-engineer",
        "total_count": null
      },
      {
        "display_name": "Cloud Architect",
        "matched_count": null,
        "role_id": 9,
        "score": 0.3148,
        "slug": "cloud-architect",
        "total_count": null
      }
    ],
    "skill_match_roles": [
      {
        "display_name": "Data Engineer",
        "matched_count": 6,
        "role_id": 2,
        "score": 0.3333,
        "slug": "data-engineer",
        "total_count": 18
      },
      {
        "display_name": "Backend Engineer",
        "matched_count": 4,
        "role_id": 1,
        "score": 0.2222,
        "slug": "backend-engineer",
        "total_count": 18
      },
      {
        "display_name": "Full Stack Engineer",
        "matched_count": 3,
        "role_id": 15,
        "score": 0.1667,
        "slug": "full-stack-engineer",
        "total_count": 18
      },
      {
        "display_name": "Cloud Architect",
        "matched_count": 3,
        "role_id": 9,
        "score": 0.1667,
        "slug": "cloud-architect",
        "total_count": 18
      },
      {
        "display_name": "Cybersecurity Engineer",
        "matched_count": 3,
        "role_id": 5,
        "score": 0.1667,
        "slug": "cybersecurity-engineer",
        "total_count": 18
      }
    ]
  },
  "stage4_decision": {
    "alias_collision_detected": false,
    "case": "B",
    "chosen_role": {
      "display_name": "Data Engineer",
      "matched_count": null,
      "role_id": 2,
      "score": 0.3333,
      "slug": "data-engineer",
      "total_count": null
    },
    "confidence": 0.3333,
    "llm2_fired": false,
    "llm2_reasoning": null,
    "queued": false,
    "reasoning": "Stage 1 title \u0027Sr. GCP Data Engineer\u0027 is unmapped (designation?); KRA inconclusive (0.39). Skill profile points at data-engineer (0.33) - generalize."
  },
  "stage5_updates": {
    "centroid_n_after": 18,
    "centroid_updated": true,
    "collision_log_id": null,
    "new_kra_attached": {
      "best_kra_similarity": 0.3779,
      "queue_id": 40,
      "r_and_r_preview": "\u25cf Designing and developing complex and large-scale data structures and pipelines to organize, collect, and standardize data to generate insights and address reporting needs\n\u25cf Writing complex ETL (Extr",
      "role_display_name": "Data Engineer",
      "role_slug": "data-engineer",
      "status": "pending"
    },
    "new_skills_attached": [
      {
        "is_primary": true,
        "queue_id": 1386,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "HDFS",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 1387,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "PySpark",
        "status": "pending"
      },
      {
        "is_primary": false,
        "queue_id": 1388,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "UNIX",
        "status": "pending"
      },
      {
        "is_primary": false,
        "queue_id": 1389,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Hive",
        "status": "pending"
      },
      {
        "is_primary": false,
        "queue_id": 1390,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Pig",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 1391,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "ETL",
        "status": "pending"
      },
      {
        "is_primary": false,
        "queue_id": 1392,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Data Science",
        "status": "pending"
      }
    ],
    "queue_entry_id": null,
    "v3_pipeline_triggered": false,
    "v3_role_slug": null,
    "v3_run_id": null
  }
}
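
The `skill_match_roles` scores in the payload above are consistent with a simple ratio of `matched_count` to `total_count`, rounded to four decimals. The sketch below reproduces them under that assumption; whether the pipeline actually computes the score this way is not stated on the page, and `match_score` is a hypothetical helper name.

```python
# (display_name, matched_count, total_count) copied from skill_match_roles.
skill_matches = [
    ("Data Engineer", 6, 18),
    ("Backend Engineer", 4, 18),
    ("Full Stack Engineer", 3, 18),
    ("Cloud Architect", 3, 18),
    ("Cybersecurity Engineer", 3, 18),
]


def match_score(matched: int, total: int) -> float:
    """Fraction of JD skills matched for a role, rounded to four decimals."""
    return round(matched / total, 4) if total else 0.0


scores = {name: match_score(m, t) for name, m, t in skill_matches}
```

Under this reading, the Data Engineer score of 0.3333 means 6 of the 18 extracted skills were already associated with that role, and the same value is carried into `stage4_decision.confidence`.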

API 2 — extract-details

{}

API 3 — final-role-output

{}

LLM Calls

Every model call made for this run, in pipeline order.