Pipeline run
a6556f41-5e53-4742-8fca-c75d44713263
Pipeline LLM cost (USD)
API 1: $0.0051
API 2: $0.0000
API 3: $0.0000
Total: $0.0051
Client output enrichment
v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description
role baseline loaded
sources · ai_index: jd · nature_of_work: jd · tech_stack_maturity: jd
Nature of work
· Data pipeline development
Build and scale batch/real-time data pipelines and backend data infrastructure in Python/Java, using Spark/Databricks/Hadoop, Kafka/Kinesis, and cloud platforms; also model data, manage integrations, and support governance for product-focused e-commerce systems.
"Design and implement robust ETL (Extract, Transform, Load) data pipelines"
Tech stack maturity
Modern Cloud Native
The stack centers on cloud services, distributed data processing, and modern data platforms such as AWS, Azure, GCP, Databricks, Delta Lake, and Spark, which aligns with a modern cloud-native environment.
AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)
0.00 / 5
· Title match
· Has AI skill
· AI skill (primary)
· AI skill (secondary)
· On AI team
· Builds AI products
vocab breakdown (legacy)
Assistants (×1): —
Frameworks (×2): —
Models / concepts (×3): —
Evidence — skills matched in JD (39)
SQL
Python
Java
Apache Spark
Databricks
Hadoop
MongoDB
Cassandra
DynamoDB
Amazon DynamoDB
Azure Cosmos DB
Kafka
AWS Kinesis
Google Cloud Dataflow
Redis
Elasticsearch
Solr
RabbitMQ
Amazon SQS
Google Cloud Tasks
Delta Lake
Parquet
AWS
Google Cloud Platform
Azure
+14
Skill cluster (9 dimension groups, role-scoped)
Messaging and Event Streaming: Kafka, RabbitMQ, Amazon SQS
Programming Languages for Data Work: SQL, Python, Java
Cloud Platforms: AWS, Azure
ETL and ELT Tooling: Apache Spark, Hadoop
Caching and State Management: Redis
Cloud Provider Platforms: Google Cloud Platform
Data Serialization Standards & Protocols: Parquet
Search and Content Discovery: Elasticsearch
Cross-cutting / unaligned: Databricks, MongoDB, Cassandra, DynamoDB, Amazon DynamoDB, Azure Cosmos DB, AWS Kinesis, Google Cloud Dataflow, Solr, Google Cloud Tasks, Delta Lake, Git, GitHub, Bitbucket, TDD, Microservices, ETL, Data Modeling, Data Warehousing, Big Data, Distributed Computing, Real-time Stream Processing, Caching, Search Technologies, Message Queuing
KRA description
Big Data Technologies, Data Modeling, ETL Processes, Data Warehousing, SQL, Python, Cloud Platforms, Data Governance,
We seek a Lead Data Engineer to take charge of our data engineering initiatives, focusing on enhancing data collection, storage, and analysis.
• Architect and scale a state-of-the-art data infrastructure capable of handling batch and real-time data processing needs with unparalleled performance.
• Collaborate closely with the data science team to oversee data systems, ensuring accurate monitoring and insightful analysis of business processes.
• Design and implement robust ETL (Extract, Transform, Load) data pipelines, optimizing data flow and accessibility.
• Develop comprehensive backend data solutions to bolster microservices architecture, ensuring seamless data integration and management.
• Engineer and manage integrations with third-party e-commerce platforms, expanding data ecosystem and capabilities.
• A background at a product-centric software company directly contributing to building products (vs providing data/analytics to business stakeholders)
• Software development experience with a focus on data engineering.
• Extensive experience in building ETL pipelines using tools such as Apache Spark, Databricks, or Hadoop.
• Proficiency in Python or Java, with a deep understanding of software engineering best practices.
• Expertise in distributed computing and data modeling, capable of designing scalable data systems.
• Proficiency with NoSQL databases, including MongoDB, Cassandra, DynamoDB, and CosmosDB.
• Proficiency in real-time stream processing systems such as Kafka, AWS Kinesis, or GCP Data Flow.
• Skilled in utilizing caching and search technologies like Redis, Elasticsearch, or Solr.
• Familiarity with message queuing systems, including RabbitMQ, AWS SQS, or GCP Cloud Tasks.
• Experience with Delta Lake, Parquet files, and AWS, GCP, or Azure cloud services.
• A strong advocate for Test Driven Development (TDD) and experienced in version control using Git platforms like GitHub or Bitbucket.
• Experience at a startup is preferred.
• Experience with consumer e-commerce data/technologies would be a bonus.
Signals
Skill
data-engineer
0.28
Alias
data-engineer
1.00
KRA
data-engineer
0.63
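The Skill and KRA figures above can be reproduced from the API 1 payload for this run. The sketch below assumes the Skill signal is matched skills divided by total extracted skills (11 of 39 for data-engineer) and that the KRA signal is the mean of the three listed sentence similarities. Both formulas are inferred from the numbers in this run, not taken from the pipeline's code, and the function names are hypothetical.

```python
def skill_overlap_score(matched_count, total_count):
    """Share of extracted JD skills already attached to a role (assumed formula)."""
    if total_count <= 0:
        return 0.0
    return round(matched_count / total_count, 4)


def kra_score(similarities):
    """Mean of the listed KRA sentence similarities (assumed formula)."""
    if not similarities:
        return 0.0
    return round(sum(similarities) / len(similarities), 4)


# (matched_count, total_count) pairs copied from skill_match_roles.
skill_counts = {
    "data-engineer": (11, 39),
    "backend-engineer": (11, 39),
    "scala-backend-developer": (7, 39),
    "python-backend-developer": (7, 39),
    "node-backend-developer": (6, 39),
}
skill_scores = {
    slug: skill_overlap_score(matched, total)
    for slug, (matched, total) in skill_counts.items()
}

# Similarities copied from the data-engineer entry in kra_match_roles.
data_engineer_kra = kra_score([0.6569, 0.6224, 0.6028])

print(skill_scores["data-engineer"], data_engineer_kra)
```

Under these assumptions the data-engineer values come to 0.2821 and 0.6274, which the dashboard shows as 0.28 and 0.63. Because the listed similarities are already rounded, the mean for other roles can differ from the reported score in the last decimal place.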
Post-classification
Centroid: updated · n=291
Alias collision log: —
New-role queue: —
New skills captured: 16
New KRA captured: —
Captured for admin review
DynamoDB · primary ↔ Data Engineer · pending
Azure Cosmos DB · primary ↔ Data Engineer · pending
AWS Kinesis · primary ↔ Data Engineer · pending
Google Cloud Dataflow · primary ↔ Data Engineer · pending
Google Cloud Tasks · primary ↔ Data Engineer · pending
Bitbucket · primary ↔ Data Engineer · pending
TDD · primary ↔ Data Engineer · pending
ETL · primary ↔ Data Engineer · pending
Data Modeling · primary ↔ Data Engineer · pending
Data Warehousing · primary ↔ Data Engineer · pending
Big Data · primary ↔ Data Engineer · pending
Distributed Computing · primary ↔ Data Engineer · pending
Real-time Stream Processing · primary ↔ Data Engineer · pending
Caching · primary ↔ Data Engineer · pending
Search Technologies · primary ↔ Data Engineer · pending
Message Queuing · primary ↔ Data Engineer · pending
Status: extract_from_jd_done
Created: 2026-05-27T15:23:24.437556Z
Updated: 2026-06-12T16:30:54.530240Z
Flow
Current 3-step pipeline
1 POST /skills/extract-from-jd
2 POST /skills/extract-details
3 POST /skills/final-role-output
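In this run only the first step produced output: the status is extract_from_jd_done and the API 2 and API 3 payloads are empty objects. A minimal sketch of how a reader might infer the last completed step from the three payloads, assuming an empty payload means the step did not run (the helper name and the trimmed payloads are illustrative, not part of the pipeline):

```python
# The three steps listed above, in pipeline order.
PIPELINE_STEPS = [
    ("POST", "/skills/extract-from-jd"),
    ("POST", "/skills/extract-details"),
    ("POST", "/skills/final-role-output"),
]


def last_completed_step(payloads):
    """Return the 1-based index of the last step with a non-empty payload.

    Steps run in order, so counting stops at the first empty payload.
    Returns 0 when no step produced output.
    """
    completed = 0
    for payload in payloads:
        if not payload:
            break
        completed += 1
    return completed


# Payloads trimmed to a couple of fields; API 2 and API 3 are {} in this run.
run_payloads = [
    {"run_id": "a6556f41-5e53-4742-8fca-c75d44713263", "rejected": False},
    {},
    {},
]

step = last_completed_step(run_payloads)
method, path = PIPELINE_STEPS[step - 1]
print(f"Last completed step: {step} ({method} {path})")
```

For this run the helper stops at step 1, which matches the "No API 2 row" note shown against each skill.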
Role
Chosen role & resolution
No chosen role stored for this run.
Job description
Skills: Big Data Technologies, Data Modeling, ETL Processes, Data Warehousing, SQL, Python, Cloud Platforms, Data Governance,
We seek a Lead Data Engineer to take charge of our data engineering initiatives, focusing on enhancing data collection, storage, and analysis.
• Architect and scale a state-of-the-art data infrastructure capable of handling batch and real-time data processing needs with unparalleled performance.
• Collaborate closely with the data science team to oversee data systems, ensuring accurate monitoring and insightful analysis of business processes.
• Design and implement robust ETL (Extract, Transform, Load) data pipelines, optimizing data flow and accessibility.
• Develop comprehensive backend data solutions to bolster microservices architecture, ensuring seamless data integration and management.
• Engineer and manage integrations with third-party e-commerce platforms, expanding data ecosystem and capabilities.
Requirements
• A background at a product-centric software company directly contributing to building products (vs providing data/analytics to business stakeholders)
• Software development experience with a focus on data engineering.
• Extensive experience in building ETL pipelines using tools such as Apache Spark, Databricks, or Hadoop.
• Proficiency in Python or Java, with a deep understanding of software engineering best practices.
• Expertise in distributed computing and data modeling, capable of designing scalable data systems.
• Proficiency with NoSQL databases, including MongoDB, Cassandra, DynamoDB, and CosmosDB.
• Proficiency in real-time stream processing systems such as Kafka, AWS Kinesis, or GCP Data Flow.
• Skilled in utilizing caching and search technologies like Redis, Elasticsearch, or Solr.
• Familiarity with message queuing systems, including RabbitMQ, AWS SQS, or GCP Cloud Tasks.
• Experience with Delta Lake, Parquet files, and AWS, GCP, or Azure cloud services.
• A strong advocate for Test Driven Development (TDD) and experienced in version control using Git platforms like GitHub or Bitbucket.
• Experience at a startup is preferred.
• Experience with consumer e-commerce data/technologies would be a bonus.
Benefits
• Work Location: Remote
• 5 days working
Skills from this JD
Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
SQL · Primary · No API 2 row (run stopped after API 1 or history missing)
Python · Primary · No API 2 row (run stopped after API 1 or history missing)
Java · Primary · No API 2 row (run stopped after API 1 or history missing)
Apache Spark · Primary · No API 2 row (run stopped after API 1 or history missing)
Databricks · Primary · No API 2 row (run stopped after API 1 or history missing)
Hadoop · Primary · No API 2 row (run stopped after API 1 or history missing)
MongoDB · Primary · No API 2 row (run stopped after API 1 or history missing)
Cassandra · Primary · No API 2 row (run stopped after API 1 or history missing)
DynamoDB · Primary · No API 2 row (run stopped after API 1 or history missing)
Amazon DynamoDB · Primary · No API 2 row (run stopped after API 1 or history missing)
Azure Cosmos DB · Primary · No API 2 row (run stopped after API 1 or history missing)
Kafka · Primary · No API 2 row (run stopped after API 1 or history missing)
AWS Kinesis · Primary · No API 2 row (run stopped after API 1 or history missing)
Google Cloud Dataflow · Primary · No API 2 row (run stopped after API 1 or history missing)
Redis · Primary · No API 2 row (run stopped after API 1 or history missing)
Elasticsearch · Primary · No API 2 row (run stopped after API 1 or history missing)
Solr · Primary · No API 2 row (run stopped after API 1 or history missing)
RabbitMQ · Primary · No API 2 row (run stopped after API 1 or history missing)
Amazon SQS · Primary · No API 2 row (run stopped after API 1 or history missing)
Google Cloud Tasks · Primary · No API 2 row (run stopped after API 1 or history missing)
Delta Lake · Primary · No API 2 row (run stopped after API 1 or history missing)
Parquet · Primary · No API 2 row (run stopped after API 1 or history missing)
AWS · Primary · No API 2 row (run stopped after API 1 or history missing)
Google Cloud Platform · Primary · No API 2 row (run stopped after API 1 or history missing)
Azure · Primary · No API 2 row (run stopped after API 1 or history missing)
Git · Primary · No API 2 row (run stopped after API 1 or history missing)
GitHub · Primary · No API 2 row (run stopped after API 1 or history missing)
Bitbucket · Primary · No API 2 row (run stopped after API 1 or history missing)
TDD · Primary · No API 2 row (run stopped after API 1 or history missing)
Microservices · Primary · No API 2 row (run stopped after API 1 or history missing)
ETL · Primary · No API 2 row (run stopped after API 1 or history missing)
Data Modeling · Primary · No API 2 row (run stopped after API 1 or history missing)
Data Warehousing · Primary · No API 2 row (run stopped after API 1 or history missing)
Big Data · Primary · No API 2 row (run stopped after API 1 or history missing)
Distributed Computing · Primary · No API 2 row (run stopped after API 1 or history missing)
Real-time Stream Processing · Primary · No API 2 row (run stopped after API 1 or history missing)
Caching · Primary · No API 2 row (run stopped after API 1 or history missing)
Search Technologies · Primary · No API 2 row (run stopped after API 1 or history missing)
Message Queuing · Primary · No API 2 row (run stopped after API 1 or history missing)
Library artifacts (this run)
No artifact rows for this run.
nano JD Parser — gpt-4.1-nano
Role: Lead Data Engineer
Domain: Software & SaaS Products
Location: — (remote)
JD type: pass
Raw JSON
{
"JD_type": "pass",
"about_company": null,
"certifications": [],
"company_name": null,
"ctc": null,
"domain": {
"primary": {
"aliases": [
"SaaS",
"Product Companies"
],
"domain": "Software \u0026 SaaS Products"
},
"secondary": null
},
"education": [],
"experience": {
"max": null,
"min": null,
"raw": null
},
"job_locations": [
{
"aliases": [],
"city": null,
"country": null,
"state": null,
"work_mode": "remote"
}
],
"role": "Lead Data Engineer",
"role_aliases": [
"Data Engineer",
"Senior Data Engineer",
"Data Engineering Lead"
],
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 0,
"heading": "Skills",
"heading_was_present": true,
"source_marker": {
"first_5_words": "Big Data Technologies, Data Modeling,",
"last_5_words": "Data Governance,"
},
"text": "Big Data Technologies, Data Modeling, ETL Processes, Data Warehousing, SQL, Python, Cloud Platforms, Data Governance,",
"word_count": 14
},
{
"bullet_count": 0,
"heading": "Role Overview",
"heading_was_present": false,
"source_marker": {
"first_5_words": "We seek a Lead Data",
"last_5_words": "collection, storage, and analysis."
},
"text": "We seek a Lead Data Engineer to take charge of our data engineering initiatives, focusing on enhancing data collection, storage, and analysis.",
"word_count": 25
},
{
"bullet_count": 5,
"heading": "Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Architect and scale a",
"last_5_words": "data ecosystem and capabilities."
},
"text": "\u2022 Architect and scale a state-of-the-art data infrastructure capable of handling batch and real-time data processing needs with unparalleled performance.\n\u2022 Collaborate closely with the data science team to oversee data systems, ensuring accurate monitoring and insightful analysis of business processes.\n\u2022 Design and implement robust ETL (Extract, Transform, Load) data pipelines, optimizing data flow and accessibility.\n\u2022 Develop comprehensive backend data solutions to bolster microservices architecture, ensuring seamless data integration and management.\n\u2022 Engineer and manage integrations with third-party e-commerce platforms, expanding data ecosystem and capabilities.",
"word_count": 90
},
{
"bullet_count": 12,
"heading": "Requirements",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 A background at a",
"last_5_words": "data/technologies would be a bonus."
},
"text": "\u2022 A background at a product-centric software company directly contributing to building products (vs providing data/analytics to business stakeholders)\n\u2022 Software development experience with a focus on data engineering.\n\u2022 Extensive experience in building ETL pipelines using tools such as Apache Spark, Databricks, or Hadoop.\n\u2022 Proficiency in Python or Java, with a deep understanding of software engineering best practices.\n\u2022 Expertise in distributed computing and data modeling, capable of designing scalable data systems.\n\u2022 Proficiency with NoSQL databases, including MongoDB, Cassandra, DynamoDB, and CosmosDB.\n\u2022 Proficiency in real-time stream processing systems such as Kafka, AWS Kinesis, or GCP Data Flow.\n\u2022 Skilled in utilizing caching and search technologies like Redis, Elasticsearch, or Solr.\n\u2022 Familiarity with message queuing systems, including RabbitMQ, AWS SQS, or GCP Cloud Tasks.\n\u2022 Experience with Delta Lake, Parquet files, and AWS, GCP, or Azure cloud services.\n\u2022 A strong advocate for Test Driven Development (TDD) and experienced in version control using Git platforms like GitHub or Bitbucket.\n\u2022 Experience at a startup is preferred.\n\u2022 Experience with consumer e-commerce data/technologies would be a bonus.",
"word_count": 174
}
],
"urls": []
}
API 1 — extract-from-jd
{
"final_skills": [
{
"is_primary": true,
"skill_name": "SQL"
},
{
"is_primary": true,
"skill_name": "Python"
},
{
"is_primary": true,
"skill_name": "Java"
},
{
"is_primary": true,
"skill_name": "Apache Spark"
},
{
"is_primary": true,
"skill_name": "Databricks"
},
{
"is_primary": true,
"skill_name": "Hadoop"
},
{
"is_primary": true,
"skill_name": "MongoDB"
},
{
"is_primary": true,
"skill_name": "Cassandra"
},
{
"is_primary": true,
"skill_name": "DynamoDB"
},
{
"is_primary": true,
"skill_name": "Amazon DynamoDB"
},
{
"is_primary": true,
"skill_name": "Azure Cosmos DB"
},
{
"is_primary": true,
"skill_name": "Kafka"
},
{
"is_primary": true,
"skill_name": "AWS Kinesis"
},
{
"is_primary": true,
"skill_name": "Google Cloud Dataflow"
},
{
"is_primary": true,
"skill_name": "Redis"
},
{
"is_primary": true,
"skill_name": "Elasticsearch"
},
{
"is_primary": true,
"skill_name": "Solr"
},
{
"is_primary": true,
"skill_name": "RabbitMQ"
},
{
"is_primary": true,
"skill_name": "Amazon SQS"
},
{
"is_primary": true,
"skill_name": "Google Cloud Tasks"
},
{
"is_primary": true,
"skill_name": "Delta Lake"
},
{
"is_primary": true,
"skill_name": "Parquet"
},
{
"is_primary": true,
"skill_name": "AWS"
},
{
"is_primary": true,
"skill_name": "Google Cloud Platform"
},
{
"is_primary": true,
"skill_name": "Azure"
},
{
"is_primary": true,
"skill_name": "Git"
},
{
"is_primary": true,
"skill_name": "GitHub"
},
{
"is_primary": true,
"skill_name": "Bitbucket"
},
{
"is_primary": true,
"skill_name": "TDD"
},
{
"is_primary": true,
"skill_name": "Microservices"
},
{
"is_primary": true,
"skill_name": "ETL"
},
{
"is_primary": true,
"skill_name": "Data Modeling"
},
{
"is_primary": true,
"skill_name": "Data Warehousing"
},
{
"is_primary": true,
"skill_name": "Big Data"
},
{
"is_primary": true,
"skill_name": "Distributed Computing"
},
{
"is_primary": true,
"skill_name": "Real-time Stream Processing"
},
{
"is_primary": true,
"skill_name": "Caching"
},
{
"is_primary": true,
"skill_name": "Search Technologies"
},
{
"is_primary": true,
"skill_name": "Message Queuing"
}
],
"jd_role": {
"display_name": "Lead Data Engineer",
"rationale": null,
"role_aliases": [
"Data Engineer",
"Senior Data Engineer",
"Data Engineering Lead"
],
"role_archetype": "Data",
"slug": ""
},
"nano_parsed": {
"JD_type": "pass",
"about_company": null,
"certifications": [],
"company_name": null,
"ctc": null,
"domain": {
"primary": {
"aliases": [
"SaaS",
"Product Companies"
],
"domain": "Software \u0026 SaaS Products"
},
"secondary": null
},
"education": [],
"experience": {
"max": null,
"min": null,
"raw": null
},
"job_locations": [
{
"aliases": [],
"city": null,
"country": null,
"state": null,
"work_mode": "remote"
}
],
"role": "Lead Data Engineer",
"role_aliases": [
"Data Engineer",
"Senior Data Engineer",
"Data Engineering Lead"
],
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 0,
"heading": "Skills",
"heading_was_present": true,
"source_marker": {
"first_5_words": "Big Data Technologies, Data Modeling,",
"last_5_words": "Data Governance,"
},
"text": "Big Data Technologies, Data Modeling, ETL Processes, Data Warehousing, SQL, Python, Cloud Platforms, Data Governance,",
"word_count": 14
},
{
"bullet_count": 0,
"heading": "Role Overview",
"heading_was_present": false,
"source_marker": {
"first_5_words": "We seek a Lead Data",
"last_5_words": "collection, storage, and analysis."
},
"text": "We seek a Lead Data Engineer to take charge of our data engineering initiatives, focusing on enhancing data collection, storage, and analysis.",
"word_count": 25
},
{
"bullet_count": 5,
"heading": "Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Architect and scale a",
"last_5_words": "data ecosystem and capabilities."
},
"text": "\u2022 Architect and scale a state-of-the-art data infrastructure capable of handling batch and real-time data processing needs with unparalleled performance.\n\u2022 Collaborate closely with the data science team to oversee data systems, ensuring accurate monitoring and insightful analysis of business processes.\n\u2022 Design and implement robust ETL (Extract, Transform, Load) data pipelines, optimizing data flow and accessibility.\n\u2022 Develop comprehensive backend data solutions to bolster microservices architecture, ensuring seamless data integration and management.\n\u2022 Engineer and manage integrations with third-party e-commerce platforms, expanding data ecosystem and capabilities.",
"word_count": 90
},
{
"bullet_count": 12,
"heading": "Requirements",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 A background at a",
"last_5_words": "data/technologies would be a bonus."
},
"text": "\u2022 A background at a product-centric software company directly contributing to building products (vs providing data/analytics to business stakeholders)\n\u2022 Software development experience with a focus on data engineering.\n\u2022 Extensive experience in building ETL pipelines using tools such as Apache Spark, Databricks, or Hadoop.\n\u2022 Proficiency in Python or Java, with a deep understanding of software engineering best practices.\n\u2022 Expertise in distributed computing and data modeling, capable of designing scalable data systems.\n\u2022 Proficiency with NoSQL databases, including MongoDB, Cassandra, DynamoDB, and CosmosDB.\n\u2022 Proficiency in real-time stream processing systems such as Kafka, AWS Kinesis, or GCP Data Flow.\n\u2022 Skilled in utilizing caching and search technologies like Redis, Elasticsearch, or Solr.\n\u2022 Familiarity with message queuing systems, including RabbitMQ, AWS SQS, or GCP Cloud Tasks.\n\u2022 Experience with Delta Lake, Parquet files, and AWS, GCP, or Azure cloud services.\n\u2022 A strong advocate for Test Driven Development (TDD) and experienced in version control using Git platforms like GitHub or Bitbucket.\n\u2022 Experience at a startup is preferred.\n\u2022 Experience with consumer e-commerce data/technologies would be a bonus.",
"word_count": 174
}
],
"urls": []
},
"rejected": false,
"rejection_reason": null,
"run_id": "a6556f41-5e53-4742-8fca-c75d44713263",
"stage3_signals": {
"alias_found": true,
"alias_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 1.0,
"slug": "data-engineer",
"total_count": null
}
],
"kra_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": [
{
"kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
"sentence": "Extensive experience in building ETL pipelines using tools such as Apache Spark, Databricks, or Hadoop.",
"similarity": 0.6569
},
{
"kra_text": "Works with data analysts, data scientists, and business stakeholders to define data models, ingestion schedules, and data delivery requirements.",
"sentence": "Collaborate closely with the data science team to oversee data systems, ensuring accurate monitoring and insightful analysis of business processes.",
"similarity": 0.6224
},
{
"kra_text": "Builds data ingestion pipelines to collect data from transactional databases, third-party APIs, event streams, and file sources into centralized data platforms.",
"sentence": "Design and implement robust ETL (Extract, Transform, Load) data pipelines, optimizing data flow and accessibility.",
"similarity": 0.6028
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 0.6274,
"slug": "data-engineer",
"total_count": null
},
{
"display_name": "Svelte Frontend Developer",
"kra_matches": [
{
"kra_text": "backend data integration",
"sentence": "Develop comprehensive backend data solutions to bolster microservices architecture, ensuring seamless data integration and management.",
"similarity": 0.6376
},
{
"kra_text": "backend data integration",
"sentence": "Engineer and manage integrations with third-party e-commerce platforms, expanding data ecosystem and capabilities.",
"similarity": 0.5121
},
{
"kra_text": "backend data integration",
"sentence": "Design and implement robust ETL (Extract, Transform, Load) data pipelines, optimizing data flow and accessibility.",
"similarity": 0.4723
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 92,
"score": 0.5406,
"slug": "svelte-frontend-developer",
"total_count": null
},
{
"display_name": "Fullstack Developer",
"kra_matches": [
{
"kra_text": "Designs and queries relational databases like PostgreSQL and document stores like MongoDB, writing migrations, indexes, and optimized queries.",
"sentence": "Proficiency with NoSQL databases, including MongoDB, Cassandra, DynamoDB, and CosmosDB.",
"similarity": 0.538
},
{
"kra_text": "Designs and queries relational databases like PostgreSQL and document stores like MongoDB, writing migrations, indexes, and optimized queries.",
"sentence": "Design and implement robust ETL (Extract, Transform, Load) data pipelines, optimizing data flow and accessibility.",
"similarity": 0.5088
},
{
"kra_text": "Implements complete product features end-to-end from database schema design through backend API to frontend UI using JavaScript, TypeScript, Python, or Ruby on Rails.",
"sentence": "Develop comprehensive backend data solutions to bolster microservices architecture, ensuring seamless data integration and management.",
"similarity": 0.476
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 15,
"score": 0.5076,
"slug": "full-stack-engineer",
"total_count": null
},
{
"display_name": "Flutter Developer",
"kra_matches": [
{
"kra_text": "integrate external APIs and data sources",
"sentence": "Engineer and manage integrations with third-party e-commerce platforms, expanding data ecosystem and capabilities.",
"similarity": 0.5345
},
{
"kra_text": "collaborate with design, product, and backend teams",
"sentence": "Develop comprehensive backend data solutions to bolster microservices architecture, ensuring seamless data integration and management.",
"similarity": 0.5058
},
{
"kra_text": "collaborate with design, product, and backend teams",
"sentence": "A background at a product-centric software company directly contributing to building products (vs providing data/analytics to business stakeholders)",
"similarity": 0.4759
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 74,
"score": 0.5054,
"slug": "flutter-developer",
"total_count": null
},
{
"display_name": "Backend Developer",
"kra_matches": [
{
"kra_text": "Integrates with third-party services, payment gateways, messaging queues like Kafka or RabbitMQ, and internal microservices via HTTP and event-driven patterns.",
"sentence": "Develop comprehensive backend data solutions to bolster microservices architecture, ensuring seamless data integration and management.",
"similarity": 0.5164
},
{
"kra_text": "Integrates with third-party services, payment gateways, messaging queues like Kafka or RabbitMQ, and internal microservices via HTTP and event-driven patterns.",
"sentence": "Engineer and manage integrations with third-party e-commerce platforms, expanding data ecosystem and capabilities.",
"similarity": 0.5079
},
{
"kra_text": "Identifies and resolves backend performance bottlenecks through query optimization, indexing strategies, connection pooling, and distributed caching with Redis.",
"sentence": "Architect and scale a state-of-the-art data infrastructure capable of handling batch and real-time data processing needs with unparalleled performance.",
"similarity": 0.4714
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 1,
"score": 0.4986,
"slug": "backend-engineer",
"total_count": null
}
],
"skill_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": 11,
"matched_skills": [
"AWS",
"Amazon SQS",
"Apache Spark",
"Azure",
"Hadoop",
"Java",
"Kafka",
"Parquet",
"Python",
"RabbitMQ",
"SQL"
],
"role_id": 2,
"score": 0.2821,
"slug": "data-engineer",
"total_count": 39
},
{
"display_name": "Backend Developer",
"kra_matches": null,
"matched_count": 11,
"matched_skills": [
"AWS",
"Amazon DynamoDB",
"Amazon SQS",
"Azure",
"Java",
"Kafka",
"MongoDB",
"Python",
"RabbitMQ",
"Redis",
"microservices"
],
"role_id": 1,
"score": 0.2821,
"slug": "backend-engineer",
"total_count": 39
},
{
"display_name": "Scala Backend Developer",
"kra_matches": null,
"matched_count": 7,
"matched_skills": [
"AWS",
"Azure",
"Java",
"Kafka",
"RabbitMQ",
"Redis",
"microservices"
],
"role_id": 87,
"score": 0.1795,
"slug": "scala-backend-developer",
"total_count": 39
},
{
"display_name": "Python Backend Developer",
"kra_matches": null,
"matched_count": 7,
"matched_skills": [
"AWS",
"Amazon SQS",
"Azure",
"Kafka",
"Python",
"RabbitMQ",
"Redis"
],
"role_id": 80,
"score": 0.1795,
"slug": "python-backend-developer",
"total_count": 39
},
{
"display_name": "Node.js Backend Developer",
"kra_matches": null,
"matched_count": 6,
"matched_skills": [
"AWS",
"Azure",
"Kafka",
"RabbitMQ",
"Redis",
"microservices"
],
"role_id": 82,
"score": 0.1538,
"slug": "node-backend-developer",
"total_count": 39
}
]
},
"stage4_decision": {
"alias_collision_detected": false,
"case": "A",
"chosen_role": {
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 1.0,
"slug": "data-engineer",
"total_count": null
},
"confidence": 1.0,
"is_new_role": false,
"llm2_fired": false,
"llm2_reasoning": null,
"matched_dimensions": [],
"matched_kras": [],
"matched_skills": [],
"new_role_display_name": null,
"new_role_slug": null,
"queued": false,
"reasoning": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.28 does not contradict",
"sub_role": null
},
"stage5_updates": {
"centroid_n_after": 291,
"centroid_updated": true,
"collision_log_id": null,
"new_kra_attached": null,
"new_skills_attached": [
{
"is_primary": true,
"queue_id": 14048,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "DynamoDB",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 14049,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Azure Cosmos DB",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 14050,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "AWS Kinesis",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 14051,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Google Cloud Dataflow",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 14052,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Google Cloud Tasks",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 14053,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Bitbucket",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 14054,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "TDD",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 14055,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "ETL",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 14056,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Modeling",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 14057,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Warehousing",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 14058,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Big Data",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 14059,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Distributed Computing",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 14060,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Real-time Stream Processing",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 14061,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Caching",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 14062,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Search Technologies",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 14063,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Message Queuing",
"status": "pending"
}
],
"queue_entry_id": null,
"v3_pipeline_triggered": false,
"v3_role_slug": null,
"v3_run_id": null
}
}
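The stage4_decision reasoning in the payload above ("Exact alias hit on data-engineer (1.0)", with the top skill match not contradicting it) can be read as a small rule. The sketch below is one possible interpretation, not the pipeline's implementation: it assumes "case A" means exactly one alias match at score 1.0, and that "does not contradict" means the alias role is among the top-scoring skill-match roles. The function name and the trimmed input records are hypothetical.

```python
from typing import Optional


def resolve_by_alias(alias_roles, skill_roles, exact=1.0) -> Optional[dict]:
    """Return a case-A decision when a single exact alias hit is not contradicted.

    Assumption: a contradiction means the alias role is absent from the set of
    roles sharing the highest skill-match score.
    """
    exact_hits = [role for role in alias_roles if role["score"] >= exact]
    if len(exact_hits) != 1:
        return None  # no exact alias hit, or more than one at this confidence

    chosen = exact_hits[0]
    if skill_roles:
        top_score = max(role["score"] for role in skill_roles)
        top_slugs = {r["slug"] for r in skill_roles if r["score"] == top_score}
        if chosen["slug"] not in top_slugs:
            return None  # skill evidence points to a different role

    return {"case": "A", "slug": chosen["slug"], "confidence": chosen["score"]}


# Values copied from stage3_signals, trimmed to slug and score.
alias_roles = [{"slug": "data-engineer", "score": 1.0}]
skill_roles = [
    {"slug": "data-engineer", "score": 0.2821},
    {"slug": "backend-engineer", "score": 0.2821},
    {"slug": "scala-backend-developer", "score": 0.1795},
]

decision = resolve_by_alias(alias_roles, skill_roles)
print(decision)
```

Note that data-engineer and backend-engineer tie on the skill score in this run, so under this reading the alias hit is what separates them.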
API 2 — extract-details
{}
API 3 — final-role-output
{}