Pipeline run

a6556f41-5e53-4742-8fca-c75d44713263

Pipeline LLM cost (USD)

API 1: $0.0051 · API 2: $0.0000 · API 3: $0.0000 · Total: $0.0051

Client output enrichment

v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description

role baseline loaded sources · ai_index: jd · nature_of_work: jd · tech_stack_maturity: jd

Nature of work · Data pipeline development

Build and scale batch/real-time data pipelines and backend data infrastructure in Python/Java, using Spark/Databricks/Hadoop, Kafka/Kinesis, and cloud platforms; also model data, manage integrations, and support governance for product-focused e-commerce systems.

"Design and implement robust ETL (Extract, Transform, Load) data pipelines"

Tech stack maturity

Modern Cloud Native

The stack centers on cloud services, distributed data processing, and modern data platforms such as AWS, Azure, GCP, Databricks, Delta Lake, and Spark, which aligns with a modern cloud-native environment.

AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)

0.00 / 5

· Title match

· Has AI skill

· AI skill (primary)

· AI skill (secondary)

· On AI team

· Builds AI products

vocab breakdown (legacy)

Assistants (×1): —

Frameworks (×2): —

Models / concepts (×3): —

Evidence — skills matched in JD (39)

SQL Python Java Apache Spark Databricks Hadoop MongoDB Cassandra DynamoDB Amazon DynamoDB Azure Cosmos DB Kafka AWS Kinesis Google Cloud Dataflow Redis Elasticsearch Solr RabbitMQ Amazon SQS Google Cloud Tasks Delta Lake Parquet AWS Google Cloud Platform Azure +14

Skill cluster (9 dimension groups, role-scoped)

Messaging and Event Streaming

Kafka RabbitMQ Amazon SQS

Programming Languages for Data Work

SQL Python Java

Cloud Platforms

AWS Azure

ETL and ELT Tooling

Apache Spark Hadoop

Caching and State Management

Redis

Cloud Provider Platforms

Google Cloud Platform

Data Serialization Standards & Protocols

Parquet

Search and Content Discovery

Elasticsearch

Cross-cutting / unaligned

Databricks MongoDB Cassandra DynamoDB Amazon DynamoDB Azure Cosmos DB AWS Kinesis Google Cloud Dataflow Solr Google Cloud Tasks Delta Lake Git GitHub Bitbucket TDD Microservices ETL Data Modeling Data Warehousing Big Data Distributed Computing Real-time Stream Processing Caching Search Technologies Message Queuing

KRA description

Big Data Technologies, Data Modeling, ETL Processes, Data Warehousing, SQL, Python, Cloud Platforms, Data Governance,

We seek a Lead Data Engineer to take charge of our data engineering initiatives, focusing on enhancing data collection, storage, and analysis.

• Architect and scale a state-of-the-art data infrastructure capable of handling batch and real-time data processing needs with unparalleled performance.
• Collaborate closely with the data science team to oversee data systems, ensuring accurate monitoring and insightful analysis of business processes.
• Design and implement robust ETL (Extract, Transform, Load) data pipelines, optimizing data flow and accessibility.
• Develop comprehensive backend data solutions to bolster microservices architecture, ensuring seamless data integration and management.
• Engineer and manage integrations with third-party e-commerce platforms, expanding data ecosystem and capabilities.
• A background at a product-centric software company directly contributing to building products (vs providing data/analytics to business stakeholders)
• Software development experience with a focus on data engineering.
• Extensive experience in building ETL pipelines using tools such as Apache Spark, Databricks, or Hadoop.
• Proficiency in Python or Java, with a deep understanding of software engineering best practices.
• Expertise in distributed computing and data modeling, capable of designing scalable data systems.
• Proficiency with NoSQL databases, including MongoDB, Cassandra, DynamoDB, and CosmosDB.
• Proficiency in real-time stream processing systems such as Kafka, AWS Kinesis, or GCP Data Flow.
• Skilled in utilizing caching and search technologies like Redis, Elasticsearch, or Solr.
• Familiarity with message queuing systems, including RabbitMQ, AWS SQS, or GCP Cloud Tasks.
• Experience with Delta Lake, Parquet files, and AWS, GCP, or Azure cloud services.
• A strong advocate for Test Driven Development (TDD) and experienced in version control using Git platforms like GitHub or Bitbucket.
• Experience at a startup is preferred.
• Experience with consumer e-commerce data/technologies would be a bonus.

Signals

Skill data-engineer

0.28

Alias data-engineer

1.00

KRA data-engineer

0.63
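
The three signal values above can be checked against the raw API 1 response. The sketch below is an inference from the numbers, not a documented formula: the skill score of 0.28 matches `matched_count / total_count` (11 of 39), and the KRA score of 0.63 matches the mean of the three listed `similarity` values for data-engineer. The function names are hypothetical. One other candidate role in the response differs from this calculation by 0.0001, so the real pipeline may aggregate unrounded similarities or use a different rule.

```python
def skill_overlap_score(matched_count: int, total_count: int) -> float:
    """Assumed formula: share of the JD's skills that the role already lists."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return round(matched_count / total_count, 4)


def mean_similarity(similarities: list[float]) -> float:
    """Assumed formula: plain average of the listed sentence-to-KRA similarities."""
    if not similarities:
        raise ValueError("need at least one similarity value")
    return round(sum(similarities) / len(similarities), 4)


# Values taken from the data-engineer entries in the raw response.
print(skill_overlap_score(11, 39))                # 0.2821
print(mean_similarity([0.6569, 0.6224, 0.6028]))  # 0.6274
```

The alias score of 1.00 is reported directly as an exact alias hit and needs no arithmetic.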

Post-classification

Centroid updated · n=291

Alias collision log: —

New-role queue: —

New skills captured: 16

New KRA captured: —

Captured for admin review

DynamoDB primary ↔ Data Engineer pending

Azure Cosmos DB primary ↔ Data Engineer pending

AWS Kinesis primary ↔ Data Engineer pending

Google Cloud Dataflow primary ↔ Data Engineer pending

Google Cloud Tasks primary ↔ Data Engineer pending

Bitbucket primary ↔ Data Engineer pending

TDD primary ↔ Data Engineer pending

ETL primary ↔ Data Engineer pending

Data Modeling primary ↔ Data Engineer pending

Data Warehousing primary ↔ Data Engineer pending

Big Data primary ↔ Data Engineer pending

Distributed Computing primary ↔ Data Engineer pending

Real-time Stream Processing primary ↔ Data Engineer pending

Caching primary ↔ Data Engineer pending

Search Technologies primary ↔ Data Engineer pending

Message Queuing primary ↔ Data Engineer pending

Status: extract_from_jd_done · Created: 2026-05-27T15:23:24.437556Z · Updated: 2026-06-12T16:30:54.530240Z

Flow Current 3-step pipeline

1 POST /skills/extract-from-jd

2 POST /skills/extract-details

3 POST /skills/final-role-output
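
As a rough illustration of the flow listed above, the sketch below builds (without sending) a POST request for each of the three steps. Only the endpoint paths come from this page. The host `https://example.invalid`, the `jd_text` payload field, and the function name are assumptions made for the example, since the request schema is not shown here.

```python
import json
from urllib.request import Request

# Endpoint paths exactly as listed in the flow, in call order.
PIPELINE_STEPS = [
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
]


def build_step_request(base_url: str, step: int, payload: dict) -> Request:
    """Build, but do not send, the POST request for a 1-based pipeline step."""
    if not 1 <= step <= len(PIPELINE_STEPS):
        raise ValueError(f"step must be between 1 and {len(PIPELINE_STEPS)}")
    body = json.dumps(payload).encode("utf-8")
    return Request(
        base_url.rstrip("/") + PIPELINE_STEPS[step - 1],
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and payload, for illustration only.
req = build_step_request(
    "https://example.invalid", 1, {"jd_text": "We seek a Lead Data Engineer ..."}
)
print(req.get_method(), req.full_url)
# POST https://example.invalid/skills/extract-from-jd
```

For this run only step 1 would have been issued: the status is `extract_from_jd_done` and the API 2 and API 3 costs are $0.0000.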

Role Chosen role & resolution

No chosen role stored for this run.

Job description

Skills:
Big Data Technologies, Data Modeling, ETL Processes, Data Warehousing, SQL, Python, Cloud Platforms, Data Governance,

We seek a Lead Data Engineer to take charge of our data engineering initiatives, focusing on enhancing data collection, storage, and analysis.

• Architect and scale a state-of-the-art data infrastructure capable of handling batch and real-time data processing needs with unparalleled performance.
• Collaborate closely with the data science team to oversee data systems, ensuring accurate monitoring and insightful analysis of business processes.
• Design and implement robust ETL (Extract, Transform, Load) data pipelines, optimizing data flow and accessibility.
• Develop comprehensive backend data solutions to bolster microservices architecture, ensuring seamless data integration and management.
• Engineer and manage integrations with third-party e-commerce platforms, expanding data ecosystem and capabilities.

Requirements

• A background at a product-centric software company directly contributing to building products (vs providing data/analytics to business stakeholders)
• Software development experience with a focus on data engineering.
• Extensive experience in building ETL pipelines using tools such as Apache Spark, Databricks, or Hadoop.
• Proficiency in Python or Java, with a deep understanding of software engineering best practices.
• Expertise in distributed computing and data modeling, capable of designing scalable data systems.
• Proficiency with NoSQL databases, including MongoDB, Cassandra, DynamoDB, and CosmosDB.
• Proficiency in real-time stream processing systems such as Kafka, AWS Kinesis, or GCP Data Flow.
• Skilled in utilizing caching and search technologies like Redis, Elasticsearch, or Solr.
• Familiarity with message queuing systems, including RabbitMQ, AWS SQS, or GCP Cloud Tasks.
• Experience with Delta Lake, Parquet files, and AWS, GCP, or Azure cloud services.
• A strong advocate for Test Driven Development (TDD) and experienced in version control using Git platforms like GitHub or Bitbucket.
• Experience at a startup is preferred.
• Experience with consumer e-commerce data/technologies would be a bonus.

Benefits

• Work Location: Remote
• 5 days working

Skills from this JD

Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.

SQL Primary No API 2 row (run stopped after API 1 or history missing)

Python Primary No API 2 row (run stopped after API 1 or history missing)

Java Primary No API 2 row (run stopped after API 1 or history missing)

Apache Spark Primary No API 2 row (run stopped after API 1 or history missing)

Databricks Primary No API 2 row (run stopped after API 1 or history missing)

Hadoop Primary No API 2 row (run stopped after API 1 or history missing)

MongoDB Primary No API 2 row (run stopped after API 1 or history missing)

Cassandra Primary No API 2 row (run stopped after API 1 or history missing)

DynamoDB Primary No API 2 row (run stopped after API 1 or history missing)

Amazon DynamoDB Primary No API 2 row (run stopped after API 1 or history missing)

Azure Cosmos DB Primary No API 2 row (run stopped after API 1 or history missing)

Kafka Primary No API 2 row (run stopped after API 1 or history missing)

AWS Kinesis Primary No API 2 row (run stopped after API 1 or history missing)

Google Cloud Dataflow Primary No API 2 row (run stopped after API 1 or history missing)

Redis Primary No API 2 row (run stopped after API 1 or history missing)

Elasticsearch Primary No API 2 row (run stopped after API 1 or history missing)

Solr Primary No API 2 row (run stopped after API 1 or history missing)

RabbitMQ Primary No API 2 row (run stopped after API 1 or history missing)

Amazon SQS Primary No API 2 row (run stopped after API 1 or history missing)

Google Cloud Tasks Primary No API 2 row (run stopped after API 1 or history missing)

Delta Lake Primary No API 2 row (run stopped after API 1 or history missing)

Parquet Primary No API 2 row (run stopped after API 1 or history missing)

AWS Primary No API 2 row (run stopped after API 1 or history missing)

Google Cloud Platform Primary No API 2 row (run stopped after API 1 or history missing)

Azure Primary No API 2 row (run stopped after API 1 or history missing)

Git Primary No API 2 row (run stopped after API 1 or history missing)

GitHub Primary No API 2 row (run stopped after API 1 or history missing)

Bitbucket Primary No API 2 row (run stopped after API 1 or history missing)

TDD Primary No API 2 row (run stopped after API 1 or history missing)

Microservices Primary No API 2 row (run stopped after API 1 or history missing)

ETL Primary No API 2 row (run stopped after API 1 or history missing)

Data Modeling Primary No API 2 row (run stopped after API 1 or history missing)

Data Warehousing Primary No API 2 row (run stopped after API 1 or history missing)

Big Data Primary No API 2 row (run stopped after API 1 or history missing)

Distributed Computing Primary No API 2 row (run stopped after API 1 or history missing)

Real-time Stream Processing Primary No API 2 row (run stopped after API 1 or history missing)

Caching Primary No API 2 row (run stopped after API 1 or history missing)

Search Technologies Primary No API 2 row (run stopped after API 1 or history missing)

Message Queuing Primary No API 2 row (run stopped after API 1 or history missing)
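
The rows above can be read as a left join keyed on skill name: every skill from API 1 gets a row, and the API 2 and API 3 columns fall back to a placeholder when those steps did not run. The sketch below is one way to express that merge. The `skill_name` and `is_primary` fields come from the API 1 response; the dictionary shapes for API 2 and API 3, the "Secondary" label, and the function name are assumptions for illustration.

```python
NO_API2_ROW = "No API 2 row (run stopped after API 1 or history missing)"


def merge_skill_rows(api1_skills, api2_rows=None, api3_tags=None):
    """Left-join API 1 skills with optional API 2 / API 3 data, keyed by skill name."""
    api2_rows = api2_rows or {}  # hypothetical shape: {skill_name: library-match info}
    api3_tags = api3_tags or {}  # hypothetical shape: {skill_name: persistence tag}
    merged = []
    for skill in api1_skills:
        name = skill["skill_name"]
        merged.append(
            {
                "skill": name,
                "tier": "Primary" if skill["is_primary"] else "Secondary",
                "api2": api2_rows.get(name, NO_API2_ROW),
                "api3": api3_tags.get(name),
            }
        )
    return merged


# Two entries copied from the API 1 response; no API 2 or API 3 data for this run.
rows = merge_skill_rows(
    [
        {"is_primary": True, "skill_name": "SQL"},
        {"is_primary": True, "skill_name": "Python"},
    ]
)
for row in rows:
    print(row["skill"], row["tier"], row["api2"])
```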

Library artifacts (this run)

No artifact rows for this run.

nano JD Parser — gpt-4.1-nano

Role: Lead Data Engineer

Domain: Software & SaaS Products

Location: — (remote)

JD type: pass

{
  "JD_type": "pass",
  "about_company": null,
  "certifications": [],
  "company_name": null,
  "ctc": null,
  "domain": {
    "primary": {
      "aliases": [
        "SaaS",
        "Product Companies"
      ],
      "domain": "Software \u0026 SaaS Products"
    },
    "secondary": null
  },
  "education": [],
  "experience": {
    "max": null,
    "min": null,
    "raw": null
  },
  "job_locations": [
    {
      "aliases": [],
      "city": null,
      "country": null,
      "state": null,
      "work_mode": "remote"
    }
  ],
  "role": "Lead Data Engineer",
  "role_aliases": [
    "Data Engineer",
    "Senior Data Engineer",
    "Data Engineering Lead"
  ],
  "role_archetype": "Data",
  "roles_and_responsibilities": [
    {
      "bullet_count": 0,
      "heading": "Skills",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "Big Data Technologies, Data Modeling,",
        "last_5_words": "Data Governance,"
      },
      "text": "Big Data Technologies, Data Modeling, ETL Processes, Data Warehousing, SQL, Python, Cloud Platforms, Data Governance,",
      "word_count": 14
    },
    {
      "bullet_count": 0,
      "heading": "Role Overview",
      "heading_was_present": false,
      "source_marker": {
        "first_5_words": "We seek a Lead Data",
        "last_5_words": "collection, storage, and analysis."
      },
      "text": "We seek a Lead Data Engineer to take charge of our data engineering initiatives, focusing on enhancing data collection, storage, and analysis.",
      "word_count": 25
    },
    {
      "bullet_count": 5,
      "heading": "Responsibilities",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "\u2022 Architect and scale a",
        "last_5_words": "data ecosystem and capabilities."
      },
      "text": "\u2022 Architect and scale a state-of-the-art data infrastructure capable of handling batch and real-time data processing needs with unparalleled performance.\n\u2022 Collaborate closely with the data science team to oversee data systems, ensuring accurate monitoring and insightful analysis of business processes.\n\u2022 Design and implement robust ETL (Extract, Transform, Load) data pipelines, optimizing data flow and accessibility.\n\u2022 Develop comprehensive backend data solutions to bolster microservices architecture, ensuring seamless data integration and management.\n\u2022 Engineer and manage integrations with third-party e-commerce platforms, expanding data ecosystem and capabilities.",
      "word_count": 90
    },
    {
      "bullet_count": 12,
      "heading": "Requirements",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "\u2022 A background at a",
        "last_5_words": "data/technologies would be a bonus."
      },
      "text": "\u2022 A background at a product-centric software company directly contributing to building products (vs providing data/analytics to business stakeholders)\n\u2022 Software development experience with a focus on data engineering.\n\u2022 Extensive experience in building ETL pipelines using tools such as Apache Spark, Databricks, or Hadoop.\n\u2022 Proficiency in Python or Java, with a deep understanding of software engineering best practices.\n\u2022 Expertise in distributed computing and data modeling, capable of designing scalable data systems.\n\u2022 Proficiency with NoSQL databases, including MongoDB, Cassandra, DynamoDB, and CosmosDB.\n\u2022 Proficiency in real-time stream processing systems such as Kafka, AWS Kinesis, or GCP Data Flow.\n\u2022 Skilled in utilizing caching and search technologies like Redis, Elasticsearch, or Solr.\n\u2022 Familiarity with message queuing systems, including RabbitMQ, AWS SQS, or GCP Cloud Tasks.\n\u2022 Experience with Delta Lake, Parquet files, and AWS, GCP, or Azure cloud services.\n\u2022 A strong advocate for Test Driven Development (TDD) and experienced in version control using Git platforms like GitHub or Bitbucket.\n\u2022 Experience at a startup is preferred.\n\u2022 Experience with consumer e-commerce data/technologies would be a bonus.",
      "word_count": 174
    }
  ],
  "urls": []
}

API 1 — extract-from-jd

{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "SQL"
    },
    {
      "is_primary": true,
      "skill_name": "Python"
    },
    {
      "is_primary": true,
      "skill_name": "Java"
    },
    {
      "is_primary": true,
      "skill_name": "Apache Spark"
    },
    {
      "is_primary": true,
      "skill_name": "Databricks"
    },
    {
      "is_primary": true,
      "skill_name": "Hadoop"
    },
    {
      "is_primary": true,
      "skill_name": "MongoDB"
    },
    {
      "is_primary": true,
      "skill_name": "Cassandra"
    },
    {
      "is_primary": true,
      "skill_name": "DynamoDB"
    },
    {
      "is_primary": true,
      "skill_name": "Amazon DynamoDB"
    },
    {
      "is_primary": true,
      "skill_name": "Azure Cosmos DB"
    },
    {
      "is_primary": true,
      "skill_name": "Kafka"
    },
    {
      "is_primary": true,
      "skill_name": "AWS Kinesis"
    },
    {
      "is_primary": true,
      "skill_name": "Google Cloud Dataflow"
    },
    {
      "is_primary": true,
      "skill_name": "Redis"
    },
    {
      "is_primary": true,
      "skill_name": "Elasticsearch"
    },
    {
      "is_primary": true,
      "skill_name": "Solr"
    },
    {
      "is_primary": true,
      "skill_name": "RabbitMQ"
    },
    {
      "is_primary": true,
      "skill_name": "Amazon SQS"
    },
    {
      "is_primary": true,
      "skill_name": "Google Cloud Tasks"
    },
    {
      "is_primary": true,
      "skill_name": "Delta Lake"
    },
    {
      "is_primary": true,
      "skill_name": "Parquet"
    },
    {
      "is_primary": true,
      "skill_name": "AWS"
    },
    {
      "is_primary": true,
      "skill_name": "Google Cloud Platform"
    },
    {
      "is_primary": true,
      "skill_name": "Azure"
    },
    {
      "is_primary": true,
      "skill_name": "Git"
    },
    {
      "is_primary": true,
      "skill_name": "GitHub"
    },
    {
      "is_primary": true,
      "skill_name": "Bitbucket"
    },
    {
      "is_primary": true,
      "skill_name": "TDD"
    },
    {
      "is_primary": true,
      "skill_name": "Microservices"
    },
    {
      "is_primary": true,
      "skill_name": "ETL"
    },
    {
      "is_primary": true,
      "skill_name": "Data Modeling"
    },
    {
      "is_primary": true,
      "skill_name": "Data Warehousing"
    },
    {
      "is_primary": true,
      "skill_name": "Big Data"
    },
    {
      "is_primary": true,
      "skill_name": "Distributed Computing"
    },
    {
      "is_primary": true,
      "skill_name": "Real-time Stream Processing"
    },
    {
      "is_primary": true,
      "skill_name": "Caching"
    },
    {
      "is_primary": true,
      "skill_name": "Search Technologies"
    },
    {
      "is_primary": true,
      "skill_name": "Message Queuing"
    }
  ],
  "jd_role": {
    "display_name": "Lead Data Engineer",
    "rationale": null,
    "role_aliases": [
      "Data Engineer",
      "Senior Data Engineer",
      "Data Engineering Lead"
    ],
    "role_archetype": "Data",
    "slug": ""
  },
  "nano_parsed": {
    "JD_type": "pass",
    "about_company": null,
    "certifications": [],
    "company_name": null,
    "ctc": null,
    "domain": {
      "primary": {
        "aliases": [
          "SaaS",
          "Product Companies"
        ],
        "domain": "Software \u0026 SaaS Products"
      },
      "secondary": null
    },
    "education": [],
    "experience": {
      "max": null,
      "min": null,
      "raw": null
    },
    "job_locations": [
      {
        "aliases": [],
        "city": null,
        "country": null,
        "state": null,
        "work_mode": "remote"
      }
    ],
    "role": "Lead Data Engineer",
    "role_aliases": [
      "Data Engineer",
      "Senior Data Engineer",
      "Data Engineering Lead"
    ],
    "role_archetype": "Data",
    "roles_and_responsibilities": [
      {
        "bullet_count": 0,
        "heading": "Skills",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "Big Data Technologies, Data Modeling,",
          "last_5_words": "Data Governance,"
        },
        "text": "Big Data Technologies, Data Modeling, ETL Processes, Data Warehousing, SQL, Python, Cloud Platforms, Data Governance,",
        "word_count": 14
      },
      {
        "bullet_count": 0,
        "heading": "Role Overview",
        "heading_was_present": false,
        "source_marker": {
          "first_5_words": "We seek a Lead Data",
          "last_5_words": "collection, storage, and analysis."
        },
        "text": "We seek a Lead Data Engineer to take charge of our data engineering initiatives, focusing on enhancing data collection, storage, and analysis.",
        "word_count": 25
      },
      {
        "bullet_count": 5,
        "heading": "Responsibilities",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "\u2022 Architect and scale a",
          "last_5_words": "data ecosystem and capabilities."
        },
        "text": "\u2022 Architect and scale a state-of-the-art data infrastructure capable of handling batch and real-time data processing needs with unparalleled performance.\n\u2022 Collaborate closely with the data science team to oversee data systems, ensuring accurate monitoring and insightful analysis of business processes.\n\u2022 Design and implement robust ETL (Extract, Transform, Load) data pipelines, optimizing data flow and accessibility.\n\u2022 Develop comprehensive backend data solutions to bolster microservices architecture, ensuring seamless data integration and management.\n\u2022 Engineer and manage integrations with third-party e-commerce platforms, expanding data ecosystem and capabilities.",
        "word_count": 90
      },
      {
        "bullet_count": 12,
        "heading": "Requirements",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "\u2022 A background at a",
          "last_5_words": "data/technologies would be a bonus."
        },
        "text": "\u2022 A background at a product-centric software company directly contributing to building products (vs providing data/analytics to business stakeholders)\n\u2022 Software development experience with a focus on data engineering.\n\u2022 Extensive experience in building ETL pipelines using tools such as Apache Spark, Databricks, or Hadoop.\n\u2022 Proficiency in Python or Java, with a deep understanding of software engineering best practices.\n\u2022 Expertise in distributed computing and data modeling, capable of designing scalable data systems.\n\u2022 Proficiency with NoSQL databases, including MongoDB, Cassandra, DynamoDB, and CosmosDB.\n\u2022 Proficiency in real-time stream processing systems such as Kafka, AWS Kinesis, or GCP Data Flow.\n\u2022 Skilled in utilizing caching and search technologies like Redis, Elasticsearch, or Solr.\n\u2022 Familiarity with message queuing systems, including RabbitMQ, AWS SQS, or GCP Cloud Tasks.\n\u2022 Experience with Delta Lake, Parquet files, and AWS, GCP, or Azure cloud services.\n\u2022 A strong advocate for Test Driven Development (TDD) and experienced in version control using Git platforms like GitHub or Bitbucket.\n\u2022 Experience at a startup is preferred.\n\u2022 Experience with consumer e-commerce data/technologies would be a bonus.",
        "word_count": 174
      }
    ],
    "urls": []
  },
  "rejected": false,
  "rejection_reason": null,
  "run_id": "a6556f41-5e53-4742-8fca-c75d44713263",
  "stage3_signals": {
    "alias_found": true,
    "alias_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": null,
        "matched_count": null,
        "matched_skills": null,
        "role_id": 2,
        "score": 1.0,
        "slug": "data-engineer",
        "total_count": null
      }
    ],
    "kra_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": [
          {
            "kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
            "sentence": "Extensive experience in building ETL pipelines using tools such as Apache Spark, Databricks, or Hadoop.",
            "similarity": 0.6569
          },
          {
            "kra_text": "Works with data analysts, data scientists, and business stakeholders to define data models, ingestion schedules, and data delivery requirements.",
            "sentence": "Collaborate closely with the data science team to oversee data systems, ensuring accurate monitoring and insightful analysis of business processes.",
            "similarity": 0.6224
          },
          {
            "kra_text": "Builds data ingestion pipelines to collect data from transactional databases, third-party APIs, event streams, and file sources into centralized data platforms.",
            "sentence": "Design and implement robust ETL (Extract, Transform, Load) data pipelines, optimizing data flow and accessibility.",
            "similarity": 0.6028
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 2,
        "score": 0.6274,
        "slug": "data-engineer",
        "total_count": null
      },
      {
        "display_name": "Svelte Frontend Developer",
        "kra_matches": [
          {
            "kra_text": "backend data integration",
            "sentence": "Develop comprehensive backend data solutions to bolster microservices architecture, ensuring seamless data integration and management.",
            "similarity": 0.6376
          },
          {
            "kra_text": "backend data integration",
            "sentence": "Engineer and manage integrations with third-party e-commerce platforms, expanding data ecosystem and capabilities.",
            "similarity": 0.5121
          },
          {
            "kra_text": "backend data integration",
            "sentence": "Design and implement robust ETL (Extract, Transform, Load) data pipelines, optimizing data flow and accessibility.",
            "similarity": 0.4723
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 92,
        "score": 0.5406,
        "slug": "svelte-frontend-developer",
        "total_count": null
      },
      {
        "display_name": "Fullstack Developer",
        "kra_matches": [
          {
            "kra_text": "Designs and queries relational databases like PostgreSQL and document stores like MongoDB, writing migrations, indexes, and optimized queries.",
            "sentence": "Proficiency with NoSQL databases, including MongoDB, Cassandra, DynamoDB, and CosmosDB.",
            "similarity": 0.538
          },
          {
            "kra_text": "Designs and queries relational databases like PostgreSQL and document stores like MongoDB, writing migrations, indexes, and optimized queries.",
            "sentence": "Design and implement robust ETL (Extract, Transform, Load) data pipelines, optimizing data flow and accessibility.",
            "similarity": 0.5088
          },
          {
            "kra_text": "Implements complete product features end-to-end from database schema design through backend API to frontend UI using JavaScript, TypeScript, Python, or Ruby on Rails.",
            "sentence": "Develop comprehensive backend data solutions to bolster microservices architecture, ensuring seamless data integration and management.",
            "similarity": 0.476
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 15,
        "score": 0.5076,
        "slug": "full-stack-engineer",
        "total_count": null
      },
      {
        "display_name": "Flutter Developer",
        "kra_matches": [
          {
            "kra_text": "integrate external APIs and data sources",
            "sentence": "Engineer and manage integrations with third-party e-commerce platforms, expanding data ecosystem and capabilities.",
            "similarity": 0.5345
          },
          {
            "kra_text": "collaborate with design, product, and backend teams",
            "sentence": "Develop comprehensive backend data solutions to bolster microservices architecture, ensuring seamless data integration and management.",
            "similarity": 0.5058
          },
          {
            "kra_text": "collaborate with design, product, and backend teams",
            "sentence": "A background at a product-centric software company directly contributing to building products (vs providing data/analytics to business stakeholders)",
            "similarity": 0.4759
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 74,
        "score": 0.5054,
        "slug": "flutter-developer",
        "total_count": null
      },
      {
        "display_name": "Backend Developer",
        "kra_matches": [
          {
            "kra_text": "Integrates with third-party services, payment gateways, messaging queues like Kafka or RabbitMQ, and internal microservices via HTTP and event-driven patterns.",
            "sentence": "Develop comprehensive backend data solutions to bolster microservices architecture, ensuring seamless data integration and management.",
            "similarity": 0.5164
          },
          {
            "kra_text": "Integrates with third-party services, payment gateways, messaging queues like Kafka or RabbitMQ, and internal microservices via HTTP and event-driven patterns.",
            "sentence": "Engineer and manage integrations with third-party e-commerce platforms, expanding data ecosystem and capabilities.",
            "similarity": 0.5079
          },
          {
            "kra_text": "Identifies and resolves backend performance bottlenecks through query optimization, indexing strategies, connection pooling, and distributed caching with Redis.",
            "sentence": "Architect and scale a state-of-the-art data infrastructure capable of handling batch and real-time data processing needs with unparalleled performance.",
            "similarity": 0.4714
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 1,
        "score": 0.4986,
        "slug": "backend-engineer",
        "total_count": null
      }
    ],
    "skill_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": null,
        "matched_count": 11,
        "matched_skills": [
          "AWS",
          "Amazon SQS",
          "Apache Spark",
          "Azure",
          "Hadoop",
          "Java",
          "Kafka",
          "Parquet",
          "Python",
          "RabbitMQ",
          "SQL"
        ],
        "role_id": 2,
        "score": 0.2821,
        "slug": "data-engineer",
        "total_count": 39
      },
      {
        "display_name": "Backend Developer",
        "kra_matches": null,
        "matched_count": 11,
        "matched_skills": [
          "AWS",
          "Amazon DynamoDB",
          "Amazon SQS",
          "Azure",
          "Java",
          "Kafka",
          "MongoDB",
          "Python",
          "RabbitMQ",
          "Redis",
          "microservices"
        ],
        "role_id": 1,
        "score": 0.2821,
        "slug": "backend-engineer",
        "total_count": 39
      },
      {
        "display_name": "Scala Backend Developer",
        "kra_matches": null,
        "matched_count": 7,
        "matched_skills": [
          "AWS",
          "Azure",
          "Java",
          "Kafka",
          "RabbitMQ",
          "Redis",
          "microservices"
        ],
        "role_id": 87,
        "score": 0.1795,
        "slug": "scala-backend-developer",
        "total_count": 39
      },
      {
        "display_name": "Python Backend Developer",
        "kra_matches": null,
        "matched_count": 7,
        "matched_skills": [
          "AWS",
          "Amazon SQS",
          "Azure",
          "Kafka",
          "Python",
          "RabbitMQ",
          "Redis"
        ],
        "role_id": 80,
        "score": 0.1795,
        "slug": "python-backend-developer",
        "total_count": 39
      },
      {
        "display_name": "Node.js Backend Developer",
        "kra_matches": null,
        "matched_count": 6,
        "matched_skills": [
          "AWS",
          "Azure",
          "Kafka",
          "RabbitMQ",
          "Redis",
          "microservices"
        ],
        "role_id": 82,
        "score": 0.1538,
        "slug": "node-backend-developer",
        "total_count": 39
      }
    ]
  },
  "stage4_decision": {
    "alias_collision_detected": false,
    "case": "A",
    "chosen_role": {
      "display_name": "Data Engineer",
      "kra_matches": null,
      "matched_count": null,
      "matched_skills": null,
      "role_id": 2,
      "score": 1.0,
      "slug": "data-engineer",
      "total_count": null
    },
    "confidence": 1.0,
    "is_new_role": false,
    "llm2_fired": false,
    "llm2_reasoning": null,
    "matched_dimensions": [],
    "matched_kras": [],
    "matched_skills": [],
    "new_role_display_name": null,
    "new_role_slug": null,
    "queued": false,
    "reasoning": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.28 does not contradict",
    "sub_role": null
  },
  "stage5_updates": {
    "centroid_n_after": 291,
    "centroid_updated": true,
    "collision_log_id": null,
    "new_kra_attached": null,
    "new_skills_attached": [
      {
        "is_primary": true,
        "queue_id": 14048,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "DynamoDB",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 14049,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Azure Cosmos DB",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 14050,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "AWS Kinesis",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 14051,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Google Cloud Dataflow",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 14052,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Google Cloud Tasks",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 14053,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Bitbucket",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 14054,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "TDD",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 14055,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "ETL",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 14056,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Data Modeling",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 14057,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Data Warehousing",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 14058,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Big Data",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 14059,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Distributed Computing",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 14060,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Real-time Stream Processing",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 14061,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Caching",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 14062,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Search Technologies",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 14063,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Message Queuing",
        "status": "pending"
      }
    ],
    "queue_entry_id": null,
    "v3_pipeline_triggered": false,
    "v3_role_slug": null,
    "v3_run_id": null
  }
}
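
A note on the `skill_match_roles` scores above: each `score` equals `matched_count / total_count` rounded to four decimals (for example, 11 / 39 ≈ 0.2821). This relationship is inferred from the numbers in this run and is not confirmed by the pipeline itself. The sketch below checks it against the five rows; the function name `skill_match_score` and the `observed` table are hypothetical names introduced for illustration.

```python
def skill_match_score(matched_count: int, total_count: int) -> float:
    """Fraction of JD skills matched by a role, rounded to 4 decimals.

    Assumption: this mirrors how the pipeline derives `score` in
    `skill_match_roles`; it is inferred from the output, not documented.
    """
    if total_count <= 0:
        return 0.0
    return round(matched_count / total_count, 4)


# (slug, matched_count, total_count, score) copied from the JSON above.
observed = [
    ("data-engineer", 11, 39, 0.2821),
    ("backend-engineer", 11, 39, 0.2821),
    ("scala-backend-developer", 7, 39, 0.1795),
    ("python-backend-developer", 7, 39, 0.1795),
    ("node-backend-developer", 6, 39, 0.1538),
]

for slug, matched, total, reported in observed:
    computed = skill_match_score(matched, total)
    status = "ok" if computed == reported else "MISMATCH"
    print(f"{slug:28s} computed={computed:.4f} reported={reported:.4f} {status}")
```

If the ratio holds, the 0.28 skill score for `data-engineer` reflects only that 11 of the 39 extracted skills are already attached to the role. That fits the stage 4 reasoning, which treats the exact alias hit as decisive and the skill score as non-contradicting.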

API 2 — extract-details

{}

API 3 — final-role-output

{}

LLM Calls

Every model call made for this run, in pipeline order.