Pipeline run
04d030cb-0702-4a5b-87d9-d45d36055ae8
Pipeline LLM cost (USD)
API 1: $0.0037
API 2: $0.0000
API 3: $0.0000
Total: $0.0037
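The total above is the sum of the per-API figures. A minimal sketch of that arithmetic, assuming the three amounts are the only cost components; the variable names are illustrative and are not taken from the pipeline:

```python
from decimal import Decimal

# Per-API LLM cost in USD, copied from the run summary.
api_costs = {
    "API 1": Decimal("0.0037"),
    "API 2": Decimal("0.0000"),
    "API 3": Decimal("0.0000"),
}

# Decimal avoids binary floating-point drift on small currency amounts.
total = sum(api_costs.values(), Decimal("0"))
print(f"Total: ${total}")
```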
Client output enrichment
v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description
role baseline loaded
sources · ai_index: jd · nature_of_work: jd · tech_stack_maturity: jd
Nature of work
· Data pipeline development
Build and optimize large-scale ETL/data pipelines on GCP/Hadoop with PySpark, designing data marts/models and production analytics systems while enforcing data quality and mentoring junior engineers.
"Writing complex ETL (Extract / Transform / Load) processes"
Tech stack maturity
Mainstream Legacy
The stack centers on Hadoop, Hive, Cassandra, MySQL, and Spark/Scala, which are widely used but largely established technologies rather than modern cloud-native or bleeding-edge tools.
AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)
0.00 / 5
· Title match
· Has AI skill
· AI skill (primary)
· AI skill (secondary)
· On AI team
· Builds AI products
vocab breakdown (legacy)
Assistants (×1):
—
Frameworks (×2):
—
Models / concepts (×3):
—
Evidence — skills matched in JD (16)
Google Cloud Platform
PySpark
Spark
Scala
Hadoop
HDFS
Python
Java
Hive
Cassandra
Pig
MySQL
NoSQL
Bash
UNIX
DevOps
Skill cluster (6 dimension groups, role-scoped)
Programming Languages for Data Work: Scala, Python, Java, Bash
ETL and ELT Tooling: Spark, Hadoop
CI/CD Pipeline Platforms: DevOps
Cloud Provider Platforms: Google Cloud Platform
Relational Database Usage: MySQL
Cross-cutting / unaligned: PySpark, HDFS, Hive, Cassandra, Pig, NoSQL, UNIX
KRA description
● Designing and developing complex and large-scale data structures and pipelines to organize, collect, and standardize data to generate insights and address reporting needs
● Writing complex ETL (Extract / Transform / Load) processes, designs database systems and develops tools for real-time and offline analytic processing
● Developing frameworks, standards & reference material for architecture and associated products
● Designing data marts and data models to support Data Science and other internal customers.
● Behaving as mentor to junior team members to provide technical advice
● Applying knowledge of gcp-data tools and products to consult and advise on additional efforts across multiple domains spanning broader enterprise
● Collaborating with data science team to transform data and integrate algorithms and models into highly available, production systems
● Using in-depth knowledge on Hadoop architecture and HDFS commands and experience designing & optimizing queries to build scalable, modular, and efficient data pipelines
● Using advanced programming skills in Python, Java, PySpark, or any of the major languages to build robust data pipelines and dynamic systems
● Integrating data from a variety of sources, assuring that they adhere to data quality and accessibility standards
● Experimenting with available tools and advice on new tools in order to determine optimal solution given the requirements dictated by the model/use case
● 5+ years Data Engineering experience
● 5+ years PySpark (Spark/Scala)
● 3+ years advanced knowledge in Hadoop architecture, HDFS commands and experience designing & optimising queries against data in the HDFS environment
● 2+ years' experience with Google Cloud Platform ( GCP )
● Experience with bash shell scripts, UNIX utilities & UNIX Commands
● Experience building and implementing data transformation and processing solutions
● Advanced knowledge in Java, Python, Hive, Cassandra, Pig, MySQL or NoSQL or similar
● If you are passionate about DevOps and GCP, and if you thrive in a collaborative and fast-paced environment
● You like to solve puzzles and figure things out, how they work, how they operate etc.
● You thrive in an environment that constantly demands you to learn.
Signals
Skill · backend-engineer · 0.25
Alias · data-engineer · 0.67
KRA · devops-engineer · 0.39
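The Skill signal value is consistent with the ratio of matched to total skills reported in the API 1 payload (matched_count 4 of total_count 16 for backend-engineer). A minimal sketch of that ratio, assuming score = matched_count / total_count; the function name is hypothetical, and the pipeline's actual scoring code is not shown in this run:

```python
def skill_match_score(matched_count: int, total_count: int) -> float:
    """Ratio of matched to total skill counts, rounded to four places."""
    if total_count <= 0:
        return 0.0
    return round(matched_count / total_count, 4)

# (matched_count, total_count) pairs as reported under skill_match_roles.
reported = {
    "backend-engineer": (4, 16),
    "data-engineer": (4, 16),
    "cybersecurity-engineer": (3, 16),
    "ml-engineer": (2, 16),
    "cloud-architect": (2, 16),
}
scores = {slug: skill_match_score(m, t) for slug, (m, t) in reported.items()}
```

Under this assumption the ratios reproduce the reported skill scores. The alias and KRA signals carry null counts in the payload, so this sketch does not apply to them.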
Post-classification
Centroid: updated · n=10
Alias collision log: #31
New-role queue: none
New skills captured: 8
New KRA captured: yes
Captured for admin review
PySpark · primary ↔ Data Engineer · pending
Spark · primary ↔ Data Engineer · pending
Hadoop · primary ↔ Data Engineer · pending
HDFS · primary ↔ Data Engineer · pending
Hive · primary ↔ Data Engineer · pending
Cassandra · primary ↔ Data Engineer · pending
Pig · primary ↔ Data Engineer · pending
UNIX · primary ↔ Data Engineer · pending
R&R fragment (sim 0.38) ↔ Data Engineer · pending
● Designing and developing complex and large-scale data structures and pipelines to organize, collect, and standardize data to generate insights and address reporting needs ● Writing complex ETL (Extr…
Status: extract_from_jd_done
Created: 2026-05-19T00:29:50.063533Z
Updated: 2026-05-19T00:29:51.035616Z
Flow
Current 3-step pipeline
1 POST /skills/extract-from-jd
2 POST /skills/extract-details
3 POST /skills/final-role-output
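The three steps can be sketched as an ordered list of requests. This is a minimal sketch only: the base URL, the `run_id` payload field, and the `build_requests` helper are assumptions for illustration; the run record shows only the method and path of each step.

```python
import json
from urllib.request import Request

# Method and path for each step, in the order listed above.
PIPELINE_STEPS = [
    ("POST", "/skills/extract-from-jd"),
    ("POST", "/skills/extract-details"),
    ("POST", "/skills/final-role-output"),
]

def build_requests(base_url: str, payload: dict) -> list[Request]:
    """Prepare (but do not send) one request per pipeline step."""
    body = json.dumps(payload).encode("utf-8")
    return [
        Request(
            url=base_url.rstrip("/") + path,
            data=body,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        for method, path in PIPELINE_STEPS
    ]

# Hypothetical host and payload shape; nothing is sent over the network.
requests = build_requests(
    "http://localhost:8000",
    {"run_id": "04d030cb-0702-4a5b-87d9-d45d36055ae8"},
)
for req in requests:
    print(req.get_method(), req.full_url)
```

Building the requests without sending them keeps the ordering explicit and lets a caller stop after any step, which matches a run whose status is `extract_from_jd_done`.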
Role
Chosen role & resolution
No chosen role stored for this run.
Job description
Sr. GCP Data Engineer Job Description
Work Mode: WFO
Start Date: Immediate
Description
We are looking for a Senior GCP Data Engineer who's confident, curious, and straightforward, a great fit for our empowering and driven culture. The candidate’s determination and clear communication will make our cloud-based solutions sharp and easy to understand. Candidates need to solve problems effortlessly and effectively in the cloud. Join us if you're excited to be part of a team that values clarity, confidence, and getting things done through cloud technology.
Responsibilities:
● Designing and developing complex and large-scale data structures and pipelines to organize, collect, and standardize data to generate insights and address reporting needs
● Writing complex ETL (Extract / Transform / Load) processes, designs database systems and develops tools for real-time and offline analytic processing
● Developing frameworks, standards & reference material for architecture and associated products
● Designing data marts and data models to support Data Science and other internal customers.
● Behaving as mentor to junior team members to provide technical advice
● Applying knowledge of gcp-data tools and products to consult and advise on additional efforts across multiple domains spanning broader enterprise
● Collaborating with data science team to transform data and integrate algorithms and models into highly available, production systems
● Using in-depth knowledge on Hadoop architecture and HDFS commands and experience designing & optimizing queries to build scalable, modular, and efficient data pipelines
● Using advanced programming skills in Python, Java, PySpark, or any of the major languages to build robust data pipelines and dynamic systems
● Integrating data from a variety of sources, assuring that they adhere to data quality and accessibility standards
● Experimenting with available tools and advice on new tools in order to determine optimal solution given the requirements dictated by the model/use case
Requirements:
● 5+ years Data Engineering experience
● 5+ years PySpark (Spark/Scala)
● 3+ years advanced knowledge in Hadoop architecture, HDFS commands and experience designing & optimising queries against data in the HDFS environment
● 2+ years' experience with Google Cloud Platform ( GCP )
● Experience with bash shell scripts, UNIX utilities & UNIX Commands
● Experience building and implementing data transformation and processing solutions
● Advanced knowledge in Java, Python, Hive, Cassandra, Pig, MySQL or NoSQL or similar
● If you are passionate about DevOps and GCP, and if you thrive in a collaborative and fast-paced environment
● You like to solve puzzles and figure things out, how they work, how they operate etc.
● You thrive in an environment that constantly demands you to learn.
Skills from this JD
Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
Google Cloud Platform · Primary · No API 2 row (run stopped after API 1 or history missing)
PySpark · Primary · No API 2 row (run stopped after API 1 or history missing)
Spark · Primary · No API 2 row (run stopped after API 1 or history missing)
Scala · Primary · No API 2 row (run stopped after API 1 or history missing)
Hadoop · Primary · No API 2 row (run stopped after API 1 or history missing)
HDFS · Primary · No API 2 row (run stopped after API 1 or history missing)
Python · Primary · No API 2 row (run stopped after API 1 or history missing)
Java · Primary · No API 2 row (run stopped after API 1 or history missing)
Hive · Primary · No API 2 row (run stopped after API 1 or history missing)
Cassandra · Primary · No API 2 row (run stopped after API 1 or history missing)
Pig · Primary · No API 2 row (run stopped after API 1 or history missing)
MySQL · Primary · No API 2 row (run stopped after API 1 or history missing)
NoSQL · Primary · No API 2 row (run stopped after API 1 or history missing)
Bash · Primary · No API 2 row (run stopped after API 1 or history missing)
UNIX · Primary · No API 2 row (run stopped after API 1 or history missing)
DevOps · Secondary · No API 2 row (run stopped after API 1 or history missing)
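The row merge described at the top of this section can be sketched as a join keyed on skill name. This is a minimal sketch under assumptions: the `merge_rows` helper and the `api2_rows` mapping are hypothetical, and only the `skill_name` and `is_primary` fields come from the API 1 payload. For this run API 2 returned `{}`, so every row falls back to the placeholder text.

```python
NO_API2 = "No API 2 row (run stopped after API 1 or history missing)"

def merge_rows(final_skills: list[dict], api2_rows: dict[str, dict]) -> list[dict]:
    """Attach API 2 detail to each API 1 skill, or a placeholder when absent."""
    rows = []
    for skill in final_skills:
        name = skill["skill_name"]
        rows.append({
            "skill": name,
            "tier": "Primary" if skill["is_primary"] else "Secondary",
            # Fall back to the placeholder when API 2 has nothing for this skill.
            "api2": api2_rows.get(name, NO_API2),
        })
    return rows

# Two entries from this run's final_skills; API 2 returned an empty object.
sample = [
    {"is_primary": True, "skill_name": "PySpark"},
    {"is_primary": False, "skill_name": "DevOps"},
]
rows = merge_rows(sample, api2_rows={})
```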
Library artifacts (this run)
No artifact rows for this run.
nano JD Parser — gpt-4.1-nano
Role: Sr. GCP Data Engineer
Experience: 5+ years Data Engineering experience
Domain: Other
JD type: pass
Raw JSON
{
"JD_type": "pass",
"about_company": null,
"certifications": [],
"company_name": null,
"ctc": null,
"domain": {
"primary": {
"aliases": [],
"domain": "Other"
},
"secondary": null
},
"education": [],
"experience": {
"max": null,
"min": 5,
"raw": "5+ years Data Engineering experience"
},
"job_locations": [],
"role": "Sr. GCP Data Engineer",
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 10,
"heading": "Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u25cf Designing and developing complex",
"last_5_words": "requirements dictated by the model/use case"
},
"text": "\u25cf Designing and developing complex and large-scale data structures and pipelines to organize, collect, and standardize data to generate insights and address reporting needs\n\u25cf Writing complex ETL (Extract / Transform / Load) processes, designs database systems and develops tools for real-time and offline analytic processing\n\u25cf Developing frameworks, standards \u0026 reference material for architecture and associated products\n\u25cf Designing data marts and data models to support Data Science and other internal customers.\n\u25cf Behaving as mentor to junior team members to provide technical advice\n\u25cf Applying knowledge of gcp-data tools and products to consult and advise on additional efforts across multiple domains spanning broader enterprise\n\u25cf Collaborating with data science team to transform data and integrate algorithms and models into highly available, production systems\n\u25cf Using in-depth knowledge on Hadoop architecture and HDFS commands and experience designing \u0026 optimizing queries to build scalable, modular, and efficient data pipelines\n\u25cf Using advanced programming skills in Python, Java, PySpark, or any of the major languages to build robust data pipelines and dynamic systems\n\u25cf Integrating data from a variety of sources, assuring that they adhere to data quality and accessibility standards\n\u25cf Experimenting with available tools and advice on new tools in order to determine optimal solution given the requirements dictated by the model/use case",
"word_count": 218
},
{
"bullet_count": 10,
"heading": "Requirements",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u25cf 5+ years Data Engineering experience",
"last_5_words": "constantly demands you to learn."
},
"text": "\u25cf 5+ years Data Engineering experience\n\u25cf 5+ years PySpark (Spark/Scala)\n\u25cf 3+ years advanced knowledge in Hadoop architecture, HDFS commands and experience designing \u0026 optimising queries against data in the HDFS environment\n\u25cf 2+ years\u0027 experience with Google Cloud Platform ( GCP )\n\u25cf Experience with bash shell scripts, UNIX utilities \u0026 UNIX Commands\n\u25cf Experience building and implementing data transformation and processing solutions\n\u25cf Advanced knowledge in Java, Python, Hive, Cassandra, Pig, MySQL or NoSQL or similar\n\u25cf If you are passionate about DevOps and GCP, and if you thrive in a collaborative and fast-paced environment\n\u25cf You like to solve puzzles and figure things out, how they work, how they operate etc.\n\u25cf You thrive in an environment that constantly demands you to learn.",
"word_count": 134
}
],
"urls": []
}
API 1 — extract-from-jd
{
"final_skills": [
{
"is_primary": true,
"skill_name": "Google Cloud Platform"
},
{
"is_primary": true,
"skill_name": "PySpark"
},
{
"is_primary": true,
"skill_name": "Spark"
},
{
"is_primary": true,
"skill_name": "Scala"
},
{
"is_primary": true,
"skill_name": "Hadoop"
},
{
"is_primary": true,
"skill_name": "HDFS"
},
{
"is_primary": true,
"skill_name": "Python"
},
{
"is_primary": true,
"skill_name": "Java"
},
{
"is_primary": true,
"skill_name": "Hive"
},
{
"is_primary": true,
"skill_name": "Cassandra"
},
{
"is_primary": true,
"skill_name": "Pig"
},
{
"is_primary": true,
"skill_name": "MySQL"
},
{
"is_primary": true,
"skill_name": "NoSQL"
},
{
"is_primary": true,
"skill_name": "Bash"
},
{
"is_primary": true,
"skill_name": "UNIX"
},
{
"is_primary": false,
"skill_name": "DevOps"
}
],
"jd_role": {
"display_name": "Sr. GCP Data Engineer",
"rationale": null,
"role_archetype": "Data",
"slug": ""
},
"nano_parsed": {
"JD_type": "pass",
"about_company": null,
"certifications": [],
"company_name": null,
"ctc": null,
"domain": {
"primary": {
"aliases": [],
"domain": "Other"
},
"secondary": null
},
"education": [],
"experience": {
"max": null,
"min": 5,
"raw": "5+ years Data Engineering experience"
},
"job_locations": [],
"role": "Sr. GCP Data Engineer",
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 10,
"heading": "Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u25cf Designing and developing complex",
"last_5_words": "requirements dictated by the model/use case"
},
"text": "\u25cf Designing and developing complex and large-scale data structures and pipelines to organize, collect, and standardize data to generate insights and address reporting needs\n\u25cf Writing complex ETL (Extract / Transform / Load) processes, designs database systems and develops tools for real-time and offline analytic processing\n\u25cf Developing frameworks, standards \u0026 reference material for architecture and associated products\n\u25cf Designing data marts and data models to support Data Science and other internal customers.\n\u25cf Behaving as mentor to junior team members to provide technical advice\n\u25cf Applying knowledge of gcp-data tools and products to consult and advise on additional efforts across multiple domains spanning broader enterprise\n\u25cf Collaborating with data science team to transform data and integrate algorithms and models into highly available, production systems\n\u25cf Using in-depth knowledge on Hadoop architecture and HDFS commands and experience designing \u0026 optimizing queries to build scalable, modular, and efficient data pipelines\n\u25cf Using advanced programming skills in Python, Java, PySpark, or any of the major languages to build robust data pipelines and dynamic systems\n\u25cf Integrating data from a variety of sources, assuring that they adhere to data quality and accessibility standards\n\u25cf Experimenting with available tools and advice on new tools in order to determine optimal solution given the requirements dictated by the model/use case",
"word_count": 218
},
{
"bullet_count": 10,
"heading": "Requirements",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u25cf 5+ years Data Engineering experience",
"last_5_words": "constantly demands you to learn."
},
"text": "\u25cf 5+ years Data Engineering experience\n\u25cf 5+ years PySpark (Spark/Scala)\n\u25cf 3+ years advanced knowledge in Hadoop architecture, HDFS commands and experience designing \u0026 optimising queries against data in the HDFS environment\n\u25cf 2+ years\u0027 experience with Google Cloud Platform ( GCP )\n\u25cf Experience with bash shell scripts, UNIX utilities \u0026 UNIX Commands\n\u25cf Experience building and implementing data transformation and processing solutions\n\u25cf Advanced knowledge in Java, Python, Hive, Cassandra, Pig, MySQL or NoSQL or similar\n\u25cf If you are passionate about DevOps and GCP, and if you thrive in a collaborative and fast-paced environment\n\u25cf You like to solve puzzles and figure things out, how they work, how they operate etc.\n\u25cf You thrive in an environment that constantly demands you to learn.",
"word_count": 134
}
],
"urls": []
},
"rejected": false,
"rejection_reason": null,
"run_id": "04d030cb-0702-4a5b-87d9-d45d36055ae8",
"stage3_signals": {
"alias_match_roles": [
{
"display_name": "Data Engineer",
"matched_count": null,
"role_id": 2,
"score": 0.6667,
"slug": "data-engineer",
"total_count": null
},
{
"display_name": "AI Engineer",
"matched_count": null,
"role_id": 13,
"score": 0.3846,
"slug": "ai-engineer",
"total_count": null
},
{
"display_name": "Frontend Engineer",
"matched_count": null,
"role_id": 7,
"score": 0.375,
"slug": "frontend-engineer",
"total_count": null
},
{
"display_name": "ML Engineer",
"matched_count": null,
"role_id": 3,
"score": 0.375,
"slug": "ml-engineer",
"total_count": null
},
{
"display_name": "AR/VR Engineer",
"matched_count": null,
"role_id": 8,
"score": 0.375,
"slug": "ar-vr-engineer",
"total_count": null
}
],
"kra_match_roles": [
{
"display_name": "DevOps Engineer",
"matched_count": null,
"role_id": 10,
"score": 0.3918,
"slug": "devops-engineer",
"total_count": null
},
{
"display_name": "Cloud Architect",
"matched_count": null,
"role_id": 9,
"score": 0.3878,
"slug": "cloud-architect",
"total_count": null
},
{
"display_name": "Android Engineer",
"matched_count": null,
"role_id": 4,
"score": 0.3784,
"slug": "android-engineer",
"total_count": null
},
{
"display_name": "Data Engineer",
"matched_count": null,
"role_id": 2,
"score": 0.3779,
"slug": "data-engineer",
"total_count": null
},
{
"display_name": "ML Engineer",
"matched_count": null,
"role_id": 3,
"score": 0.3438,
"slug": "ml-engineer",
"total_count": null
}
],
"skill_match_roles": [
{
"display_name": "Backend Engineer",
"matched_count": 4,
"role_id": 1,
"score": 0.25,
"slug": "backend-engineer",
"total_count": 16
},
{
"display_name": "Data Engineer",
"matched_count": 4,
"role_id": 2,
"score": 0.25,
"slug": "data-engineer",
"total_count": 16
},
{
"display_name": "Cybersecurity Engineer",
"matched_count": 3,
"role_id": 5,
"score": 0.1875,
"slug": "cybersecurity-engineer",
"total_count": 16
},
{
"display_name": "ML Engineer",
"matched_count": 2,
"role_id": 3,
"score": 0.125,
"slug": "ml-engineer",
"total_count": 16
},
{
"display_name": "Cloud Architect",
"matched_count": 2,
"role_id": 9,
"score": 0.125,
"slug": "cloud-architect",
"total_count": 16
}
],
"stage35_ran": false
},
"stage4_decision": {
"alias_collision_detected": true,
"case": "D",
"chosen_role": {
"display_name": "Data Engineer",
"matched_count": null,
"role_id": 2,
"score": 1.0,
"slug": "data-engineer",
"total_count": null
},
"confidence": 0.95,
"llm2_fired": true,
"llm2_reasoning": "The responsibilities focus on large-scale ETL, data pipeline architecture, Hadoop/HDFS, Spark, and GCP data services which align strictly with a Data Engineer role rather than DevOps.",
"queued": false,
"reasoning": "LLM2 picked data-engineer (confidence 0.95)"
},
"stage5_updates": {
"centroid_n_after": 10,
"centroid_updated": true,
"collision_log_id": 31,
"new_kra_attached": {
"best_kra_similarity": 0.3779,
"queue_id": 13,
"r_and_r_preview": "\u25cf Designing and developing complex and large-scale data structures and pipelines to organize, collect, and standardize data to generate insights and address reporting needs\n\u25cf Writing complex ETL (Extr",
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"status": "pending"
},
"new_skills_attached": [
{
"is_primary": true,
"queue_id": 552,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "PySpark",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 553,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Spark",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 554,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Hadoop",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 555,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "HDFS",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 556,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Hive",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 557,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Cassandra",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 558,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Pig",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 559,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "UNIX",
"status": "pending"
}
],
"queue_entry_id": null,
"v3_pipeline_triggered": false,
"v3_role_slug": null,
"v3_run_id": null
}
}
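In this payload the first-listed role differs across the three signals (skill: backend-engineer, alias: data-engineer, KRA: devops-engineer), and the final choice came from a second LLM call. A minimal sketch of checking for that kind of disagreement, assuming the first entry of each list is the top candidate; the `top_roles` and `signals_disagree` helpers are hypothetical, and the pipeline's actual rule for case "D" or for firing LLM2 is not shown in this run:

```python
def top_roles(stage3_signals: dict) -> dict[str, str]:
    """Return the slug of the first-listed role for each *_match_roles signal."""
    return {
        key: roles[0]["slug"]
        for key, roles in stage3_signals.items()
        if key.endswith("_match_roles") and roles
    }

def signals_disagree(stage3_signals: dict) -> bool:
    """True when the signals do not all point at the same top role."""
    return len(set(top_roles(stage3_signals).values())) > 1

# Top entries only, copied from stage3_signals in the payload above.
signals = {
    "alias_match_roles": [{"slug": "data-engineer", "score": 0.6667}],
    "kra_match_roles": [{"slug": "devops-engineer", "score": 0.3918}],
    "skill_match_roles": [{"slug": "backend-engineer", "score": 0.25}],
    "stage35_ran": False,
}
print(top_roles(signals), signals_disagree(signals))
```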
API 2 — extract-details
{}
API 3 — final-role-output
{}
LLM Calls
Every model call made for this run, in pipeline order.