Pipeline run

655a31d7-f6e8-4465-a2e8-6c70793acda8

Pipeline LLM cost (USD)
API 1: $0.0037 API 2: $0.0002 API 3: $0.0000 Total: $0.0040
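The displayed per-API costs add up to $0.0039 while the total reads $0.0040. One plausible explanation is that each figure is rounded to four decimals independently, with the total rounded once after summing. The sketch below illustrates that effect; the unrounded costs and the helper name `usd4` are hypothetical values chosen for illustration, not figures from this run.

```python
from decimal import Decimal, ROUND_HALF_UP

def usd4(amount: Decimal) -> Decimal:
    """Round a USD amount to four decimal places for display."""
    return amount.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)

# Hypothetical unrounded costs (assumed, not taken from the run).
raw_costs = [Decimal("0.00374"), Decimal("0.00024"), Decimal("0.00004")]

displayed_parts = [usd4(c) for c in raw_costs]
displayed_total = usd4(sum(raw_costs, Decimal("0")))

print(displayed_parts)       # per-API figures, each rounded on its own
print(sum(displayed_parts))  # sum of the already-rounded figures
print(displayed_total)       # total rounded once, after summing
```

If the UI works this way, a one-unit difference in the last digit between the parts and the total is expected and does not indicate a billing error.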

Client output enrichment

v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description
role baseline loaded sources · ai_index: jd · nature_of_work: jd · tech_stack_maturity: jd
Nature of work · Data pipeline development
Build and maintain high-volume ETL pipelines and real-time/batch data services, improve performance and automate manual work, while evaluating new data technologies and mentoring teammates.
“design, implement, and maintain data pipelines for extraction, transformation, and loading of data from a wide variety of data sources”
Tech stack maturity
Mainstream Modern
A data engineer with machine-learning as a primary skill typically works in modern data and ML platforms, but the role alone does not imply cutting-edge AI-native or legacy-only stack characteristics.
AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)
1.70 / 5
· Title match
Has AI skill
AI skill (primary)
· AI skill (secondary)
· On AI team
· Builds AI products
vocab breakdown (legacy)
Assistants (×1):
Frameworks (×2):
Models / concepts (×3): Machine Learning
Evidence — skills matched in JD (6)
Big Data · Data Pipelines · ETL · Real-time Streaming · Batch Processing · Machine Learning
Skill cluster (2 dimension groups, role-scoped)
AI Governance and Model Security
Machine Learning
Cross-cutting / unaligned
Big Data · Data Pipelines · ETL · Real-time Streaming · Batch Processing
KRA description
We’re looking for a Senior Data Engineer to join our team and help build the next generation of big data solutions at Index Exchange with real-time streaming and batch analytical capabilities. Data is a big deal at Index Exchange (Index). Index’s advertising exchange handles billions of auctions and generates terabytes of auction-related information every day. Our team builds tools and infrastructure to manage this vast amount of data and make it available to both internal and external customers and partners for their reporting and analytics as well as machine model training needs.
• Evaluate new technologies, design, implement, and maintain data pipelines for extraction, transformation, and loading of data from a wide variety of data sources to various data services
• Identify, design, and implement system performance improvements
• Identify, design, and implement internal process improvements
• Automate manual processes and optimize data delivery
• Lead and mentor team members
• Identify and assess potential solutions for technical and business suitability
• Experience and Leadership: A senior engineer with exposure leading projects and mentoring junior developers. A leader who continuously challenges the status quo and brings forth innovative ideas and improvements for the team.
• Problem Solvers: You don’t stop until the problem gets solved and you find more than one way to solve it. You love working with other people, presenting your viewpoint but ultimately working towards the best solution, regardless of where it comes from
• Knowledge Hungry: Learning new frameworks and languages is exciting to you – you’re not satisfied with the status quo. We use a variety of languages and tools to solve problems and we're interested in what you're looking to learn.
• Passionate: You have a passion for Big Data and an interest in the latest trends and developments constantly researching new tools and data technologies

Signals

Skill ml-engineer
0.17
Alias data-engineer
1.00
KRA data-engineer
0.55
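
The KRA signal of 0.55 for data-engineer is consistent with averaging the three sentence-level similarities listed for that role under kra_match_roles in the API 1 payload (0.6265, 0.5736, 0.4519). The following is a minimal sketch of that arithmetic. The function name `kra_score` and the assumption that the score is a plain mean of the listed matches are illustrative and not confirmed by the pipeline.

```python
from statistics import fmean

def kra_score(similarities):
    """Mean of per-sentence KRA similarities, rounded to 4 decimals (assumed aggregation)."""
    return round(fmean(similarities), 4)

# Similarities copied from the stage3_signals.kra_match_roles entries of this run.
data_engineer = [0.6265, 0.5736, 0.4519]
scala_backend = [0.6202, 0.4715, 0.4625]

de_score = kra_score(data_engineer)
sb_score = kra_score(scala_backend)
print(de_score, sb_score)
```

The payload reports 0.5507 for data-engineer and 0.518 for scala-backend-developer. Because the listed similarities are themselves rounded, a difference in the fourth decimal between this recomputation and the reported score is possible.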

Post-classification

Centroid updated · n=495
Alias collision log
New-role queue
New skills captured 5
New KRA captured

Captured for admin review

Big Data · primary · Data Engineer · pending
Data Pipelines · primary · Data Engineer · pending
ETL · primary · Data Engineer · pending
Real-time Streaming · primary · Data Engineer · pending
Batch Processing · primary · Data Engineer · pending
Status: completed · Created: 2026-05-27T17:17:39.300391Z · Updated: 2026-05-27T17:18:27.671372Z · API 3 duration: 1453 ms
Flow Current 3-step pipeline

1 POST /skills/extract-from-jd

2 POST /skills/extract-details

3 POST /skills/final-role-output
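
The three steps above can be sketched as a simple chain of POST calls. Only the endpoint paths come from this page. The request shape (`jd_text`), the idea that each response is passed whole to the next step, and the injectable `post` transport are assumptions made so the sketch runs without a network.

```python
from typing import Any, Callable, Dict

# Endpoint paths are taken from the flow above; everything else is hypothetical.
STEPS = [
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
]

Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]

def run_pipeline(jd_text: str, post: Post) -> Dict[str, Any]:
    """Call the three endpoints in order, feeding each response into the next call."""
    payload: Dict[str, Any] = {"jd_text": jd_text}  # assumed request shape
    for path in STEPS:
        payload = post(path, payload)
    return payload

def fake_post(path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in transport that only records which steps ran."""
    return {**payload, "trace": payload.get("trace", []) + [path]}

result = run_pipeline("Senior Data Engineer ...", fake_post)
print(result["trace"])
```

In a real client, `fake_post` would be replaced by a function that sends the payload to the service and returns the decoded JSON response.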

Role Chosen role & resolution

Data Engineer

CASE A

slug: data-engineer · id: 2 · source: db

Exact alias hit on data-engineer (1.0) — no other alias at this confidence; skill_top ml-engineer 0.17 does not contradict

Resolution: in_db — role exists in library; skill↔dim and role↔dim links saved when applicable.
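
The reasoning for CASE A can be read as a small decision rule: accept the role when exactly one alias scores 1.0 and the top skill-based candidate is too weak to contradict it. The sketch below is one way to express that rule. The function name `resolve_role`, the `UNRESOLVED` label, and the contradiction threshold of 0.5 are hypothetical; the page only shows that 0.17 was considered non-contradicting.

```python
from typing import Dict, Optional, Tuple

def resolve_role(
    alias_scores: Dict[str, float],
    skill_top: Tuple[str, float],
    contradiction_threshold: float = 0.5,  # assumed cut-off, not from the run
) -> Tuple[Optional[str], str]:
    """Return (slug, case) for an exact, uncontested alias hit, else (None, 'UNRESOLVED')."""
    exact = [slug for slug, score in alias_scores.items() if score >= 1.0]
    if len(exact) != 1:
        return None, "UNRESOLVED"  # no exact hit, or several aliases tie at 1.0
    slug = exact[0]
    top_slug, top_score = skill_top
    if top_slug != slug and top_score >= contradiction_threshold:
        return None, "UNRESOLVED"  # a strong skill signal points at another role
    return slug, "CASE A"

decision = resolve_role({"data-engineer": 1.0}, ("ml-engineer", 0.17))
print(decision)
```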

0
New skills
0
Skill↔dim saved
0
Role↔dim saved
0
Skipped

Job description

We shaped the earliest forms of ad tech, and we’re looking for the technical expertise to help shape its future. Our customers have unique problems that can only be solved at internet scale, and that’s where the technical skills of our team make a real difference.

Our exchange handles over 500 billion requests every day (for comparison Google serves an estimated 9 billion searches a day), all running in our own global data centers. Every member of our technology team has an enormous amount of autonomy in building and managing our systems to support and enable our growing level of scale. Through the transparency of our technology, dedication to innovation and integrity, and long-standing customer relationships, we lead through change.

What’s it like to work at Index?

We have more than 550 Indexers around the globe dedicated to building a safe and transparent marketplace that provides a trusted experience for consumers.

Index is an exciting and fast-paced place to work. We’re built on our values of change, support, learning and teaching, trust, and intention. We pride ourselves on our independence and openness, not only in our technology, but in our teams, too. Our diverse and inclusive culture celebrates how we can leverage our unique differences to help drive Index forward.

Our culture of success is truly supportive and collaborative. In working together across our teams, we’re continually investing in the people and technology to solve the industry’s most complex problems. As we extend the promise of ad tech to every channel, we’re looking for talented engineers to help advance Index, and the industry, forward.

Are you ready to join the programmatic evolution?

Index Exchange funds the open web. Content and journalism across the internet are funded through advertising, and we are the engine that helps to make that happen transparently, safely and efficiently. Handling hundreds of billions of auctions per day within milliseconds requires an intense understanding of the exchange and the ecosystem that we live in.

Our business is growing significantly every year and is poised to grow even faster. Our people and our platforms are the foundation and enabler of that growth. We are significantly expanding our technology teams, and are looking for technologists with a passion for high performance software development, and a drive to deliver software products and platforms that enable and empower industries at a global scale.

About The Role

We’re looking for a Senior Data Engineer to join our team and help build the next generation of big data solutions at Index Exchange with real-time streaming and batch analytical capabilities. Data is a big deal at Index Exchange (Index). Index’s advertising exchange handles billions of auctions and generates terabytes of auction-related information every day. Our team builds tools and infrastructure to manage this vast amount of data and make it available to both internal and external customers and partners for their reporting and analytics as well as machine model training needs.

Working with exciting technologies, your team will experiment with new tools and engineer innovative approaches to solve interesting challenges. Things shift very quickly in our industry, and we rely on our Engineering teams to keep Index and our clients ahead of the curve and moving in the right direction. We’re looking for Engineers who have experience in an Agile environment, who can drive innovation, and be a technical leader on our team.

Index’s scale spans the globe, our transactions happen 24x7 in our global data centers, and every second that passes millions of requests are evaluated across our exchange. In order to achieve our mission, global efficiency and reliability are absolutely key, as every millisecond quite literally counts in our business.

What We’re Looking For

• Experience and Leadership: A senior engineer with exposure leading projects and mentoring junior developers. A leader who continuously challenges the status quo and brings forth innovative ideas and improvements for the team.
• Problem Solvers: You don’t stop until the problem gets solved and you find more than one way to solve it. You love working with other people, presenting your viewpoint but ultimately working towards the best solution, regardless of where it comes from
• Knowledge Hungry: Learning new frameworks and languages is exciting to you – you’re not satisfied with the status quo. We use a variety of languages and tools to solve problems and we're interested in what you're looking to learn.
• Passionate: You have a passion for Big Data and an interest in the latest trends and developments constantly researching new tools and data technologies


Here’s What You’ll Be Doing

• Evaluate new technologies, design, implement, and maintain data pipelines for extraction, transformation, and loading of data from a wide variety of data sources to various data services
• Identify, design, and implement system performance improvements
• Identify, design, and implement internal process improvements
• Automate manual processes and optimize data delivery
• Lead and mentor team members
• Identify and assess potential solutions for technical and business suitability


Here's What You Need

• Bachelor/ Master’s Degree in Computer Science or Engineering related fields
• 8+ years of experience as a Software Engineer in enterprise grade, large scale distributed software product development
• 5+ years of work experience designing and building high performance data pipelines and applications using Hadoop/Ceph, Spark/Flink, Hive, Presto/Trino, Kafka, StarRocks/Vertica, Airflow, or other similar technologies
• Proficiency in some of the following languages: SQL, Scala, Java, Python, Bash
• Deep understanding of design principles of large scale distributed systems and familiar with mainstream big data related technologies and distributed frameworks
• Strong leadership, mentorship, and communication skills, with experience collaborating in a globally distributed, culturally diverse team
• Knowledge of data modeling, data warehousing, streaming data processing, and business intelligence reporting tools
• Experience working in Agile methodologies with continuous integration and delivery as CI/CD
• Experience working with containerization, and virtualization tools such as Kubernetes and Docker


Why You’ll Love Working Here

• Company paid comprehensive health and life insurance plans
• Paid Time off and flexible work schedules
• Company contribution to Provident Fund
• Participation in our company Stock options plan
• Company paid Parental Leave
• Monthly internet stipend
• Quarterly Wellness allowance
• Community engagement opportunities and donation-matching program
• Volunteer paid day off
• Annual virtual company retreats and regular community-led team events
• A workplace that supports a diverse, equitable, and inclusive environment – learn more here


Notification

Index Exchange is aware that there have been recent scams directed toward candidates regarding job interviews and offers.

Please be vigilant and do not accept interview requests, job offers, or other hiring-related documents from anyone other than our dedicated recruitment team, from the domain of @indexexchange.com. Our interview process consists of several steps, including phone screens and video interviews. We do not conduct interviews via an email questionnaire or request money at any point in the process.

We remain dedicated to resolving this matter and we appreciate your support.

Equal employment opportunity

At Index Exchange, we believe that successful products are built by teams just as diverse as the audience who uses them. As such, we are committed to equal employment opportunities. We celebrate diversity of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or expression, or veteran status. Additionally, we realize that diversity is deeper than any status or classification—diversity is the human experience. For those who show grit, passion, and humility—Index will welcome you.

Accessibility For Applicants With Disabilities

Index Exchange welcomes and encourages individuals with disabilities to apply to work with us.

If you require an accommodation, please share the details of your request and any information how we can assist you with the hiring recruiter when they contact you. Index Exchange will make reasonable efforts to ensure accommodation requests are met throughout the recruitment process.

Index Everywhere, Index Anywhere

Our corporate headquarters are in Toronto, with major offices in New York, Montreal, Kitchener, London, San Francisco, and many other global cities. As a major global advertising exchange, we are committed to operating as a tightly knit global team and embracing and empowering talent wherever our colleagues may be.

Skills from this JD

Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.

Big Data · Primary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Data Engineering Tools
Sub-category
general
Skill nature
CONCEPT
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
Data Pipelines · Primary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Data Engineering Tools
Sub-category
general
Skill nature
CONCEPT
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
ETL · Primary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Data Engineering Tools
Sub-category
general
Skill nature
PRACTICE
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
Real-time Streaming · Primary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Data Engineering Tools
Sub-category
general
Skill nature
CONCEPT
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
Batch Processing · Primary · New / orchestrated · API 3: new canonical path (new) · New / unmatched skill (orchestrated in API 2)

Skill enrichment (orchestrator / LLM)

No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).

Derived legacy fields
Category
Data Engineering Tools
Sub-category
general
Skill nature
CONCEPT
Volatility
MEDIUM
Typical lifespan
MULTI_YEAR
Version strategy
UNVERSIONED
Machine Learning · Primary · Library skill · API 3: existing canonical (in_db) · Existing skill (matched library)
Canonical: Machine Learning id=1356 · machine-learning

Aliases — catalog

  • Machine Learning (CANONICAL)

Context tags (catalog)

Keras · PyTorch · TensorFlow · cross-validation · data preprocessing · ensemble methods · feature engineering · hyperparameter tuning · model evaluation · natural language processing · neural networks · reinforcement learning · scikit-learn · supervised learning · unsupervised learning

Stored enrichment (catalog DB)

Category
Concept
Sub-category
Machine Learning
Confidence
0.98
Version strategy
NOT_APPLICABLE

Maturity reasoning: Machine Learning appears in large volumes of job descriptions across data, product, and platform roles, and major cloud vendors (AWS, Google Cloud, Azure) offer dedicated ML services and certifications, indicating broad adoption.

Skill profile (library / DB)

Skill nature
CONCEPT
Volatility
STABLE
Typical lifespan
EVERGREEN
Category id
2
Sub-category id
1024
Extractable
True
Also category
False

Dimensions (API 2 worklist)

  • AI Governance and Model Security Catalog dimension db id 50

    Library dimension (catalog)

    Roles linked in library: AI Engineer, ML Engineer, MLOps Engineer

  • React Frontend Development Catalog dimension db id 96

    Library dimension (catalog)

API 3 link attempts (this skill)

Dimension Skill↔dim Role↔dim Outcome
AI Governance and Model Security
ai-governance-and-model-security
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
React Frontend Development
d_init_01
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)

All API 3 persistence rows

Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.

Skill Tag Dimension Skill↔dim Role↔dim Outcome Notes
Machine Learning in_db
AI Governance and Model Security
ai-governance-and-model-security
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
Machine Learning in_db
React Frontend Development
d_init_01
Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role)
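
The grid above holds one row per (skill × dimension) work item, and both rows end with the role↔dimension link skipped. A minimal sketch of how such rows could be derived follows. The linked-role lists are copied from the "Roles linked in library" line for each dimension (none is listed for React Frontend Development). The function name, the row fields, and the reading of "dimension not under chosen role" as "chosen role is absent from the dimension's linked roles" are assumptions.

```python
CHOSEN_ROLE = "Data Engineer"

# Dimension worklist for the one library-matched skill in this run.
worklist = {
    "Machine Learning": [
        {"dimension": "AI Governance and Model Security",
         "linked_roles": ["AI Engineer", "ML Engineer", "MLOps Engineer"]},
        {"dimension": "React Frontend Development", "linked_roles": []},
    ],
}

def persistence_rows(worklist, chosen_role):
    """Expand the worklist into one row per (skill, dimension) pair."""
    rows = []
    for skill, dims in worklist.items():
        for entry in dims:
            linked = chosen_role in entry["linked_roles"]
            rows.append({
                "skill": skill,
                "dimension": entry["dimension"],
                # "saved" is a hypothetical label; the run only shows skips.
                "role_dim": "saved" if linked else "skipped",
            })
    return rows

rows = persistence_rows(worklist, CHOSEN_ROLE)
for row in rows:
    print(row)
```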

Library artifacts (this run)

Kind Detail DB id
canonical_skill_proposed Big Data | type=Data Engineering Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR
canonical_skill_proposed Data Pipelines | type=Data Engineering Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR
canonical_skill_proposed ETL | type=Data Engineering Tools subtype=general nature=PRACTICE lifespan=MULTI_YEAR
canonical_skill_proposed Real-time Streaming | type=Data Engineering Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR
canonical_skill_proposed Batch Processing | type=Data Engineering Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR
nano JD Parser — gpt-4.1-nano
Role Senior Data Engineer
Company Index Exchange
Experience 8+ years of experience as a Software Engineer in enterprise grade, large scale distributed software product development
Domain Software & SaaS Products
Location Toronto, Canada (hybrid)
JD type pass
Raw JSON
{
  "JD_type": "pass",
  "about_company": {
    "source_marker": {
      "first_5_words": "Index Exchange funds the open",
      "last_5_words": "exchange and the ecosystem that"
    },
    "text": "Index Exchange funds the open web. Content and journalism across the internet are funded through advertising, and we are the engine that helps to make that happen transparently, safely and efficiently. Handling hundreds of billions of auctions per day within milliseconds requires an intense understanding of the exchange and the ecosystem that we live in.",
    "word_count": 64
  },
  "certifications": [],
  "company_name": "Index Exchange",
  "ctc": null,
  "domain": {
    "primary": {
      "aliases": [
        "SaaS",
        "Ad Tech"
      ],
      "domain": "Software \u0026 SaaS Products"
    },
    "secondary": null
  },
  "education": [
    {
      "level": "Bachelor\u0027s",
      "qualification": "BTECH/BE/BSC - Computer Science or Engineering related fields",
      "raw": "Bachelor/ Master\u2019s Degree in Computer Science or Engineering related fields",
      "requirement": "required"
    }
  ],
  "experience": {
    "max": null,
    "min": 8,
    "raw": "8+ years of experience as a Software Engineer in enterprise grade, large scale distributed software product development"
  },
  "job_locations": [
    {
      "aliases": [
        "Toronto, ON"
      ],
      "city": "Toronto",
      "country": "Canada",
      "state": "Ontario",
      "work_mode": "hybrid"
    },
    {
      "aliases": [
        "NYC"
      ],
      "city": "New York",
      "country": "United States",
      "state": "New York",
      "work_mode": "hybrid"
    },
    {
      "aliases": [
        "Montreal, QC"
      ],
      "city": "Montreal",
      "country": "Canada",
      "state": "Quebec",
      "work_mode": "hybrid"
    },
    {
      "aliases": [
        "Kitchener, ON"
      ],
      "city": "Kitchener",
      "country": "Canada",
      "state": "Ontario",
      "work_mode": "hybrid"
    },
    {
      "aliases": [
        "London, UK"
      ],
      "city": "London",
      "country": "United Kingdom",
      "state": "England",
      "work_mode": "hybrid"
    },
    {
      "aliases": [
        "SF"
      ],
      "city": "San Francisco",
      "country": "United States",
      "state": "California",
      "work_mode": "hybrid"
    }
  ],
  "role": "Senior Data Engineer",
  "role_aliases": [
    "Data Engineer",
    "Senior Data Engineer",
    "Big Data Engineer"
  ],
  "role_archetype": "Data",
  "roles_and_responsibilities": [
    {
      "bullet_count": 0,
      "heading": "About The Role",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "We\u2019re looking for a Senior",
        "last_5_words": "reporting and analytics as well"
      },
      "text": "We\u2019re looking for a Senior Data Engineer to join our team and help build the next generation of big data solutions at Index Exchange with real-time streaming and batch analytical capabilities. Data is a big deal at Index Exchange (Index). Index\u2019s advertising exchange handles billions of auctions and generates terabytes of auction-related information every day. Our team builds tools and infrastructure to manage this vast amount of data and make it available to both internal and external customers and partners for their reporting and analytics as well as machine model training needs.",
      "word_count": 83
    },
    {
      "bullet_count": 6,
      "heading": "Here\u2019s What You\u2019ll Be Doing",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "\u2022 Evaluate new technologies, design,",
        "last_5_words": "and business suitability"
      },
      "text": "\u2022 Evaluate new technologies, design, implement, and maintain data pipelines for extraction, transformation, and loading of data from a wide variety of data sources to various data services\n\u2022 Identify, design, and implement system performance improvements\n\u2022 Identify, design, and implement internal process improvements\n\u2022 Automate manual processes and optimize data delivery\n\u2022 Lead and mentor team members\n\u2022 Identify and assess potential solutions for technical and business suitability",
      "word_count": 56
    },
    {
      "bullet_count": 4,
      "heading": "What We\u2019re Looking For",
      "heading_was_present": true,
      "source_marker": {
        "first_5_words": "\u2022 Experience and Leadership: A",
        "last_5_words": "new tools and data technologies"
      },
      "text": "\u2022 Experience and Leadership: A senior engineer with exposure leading projects and mentoring junior developers. A leader who continuously challenges the status quo and brings forth innovative ideas and improvements for the team.\n\u2022 Problem Solvers: You don\u2019t stop until the problem gets solved and you find more than one way to solve it. You love working with other people, presenting your viewpoint but ultimately working towards the best solution, regardless of where it comes from\n\u2022 Knowledge Hungry: Learning new frameworks and languages is exciting to you \u2013 you\u2019re not satisfied with the status quo. We use a variety of languages and tools to solve problems and we\u0027re interested in what you\u0027re looking to learn.\n\u2022 Passionate: You have a passion for Big Data and an interest in the latest trends and developments constantly researching new tools and data technologies",
      "word_count": 104
    }
  ],
  "urls": []
}
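
Each parsed section carries a source_marker with first_5_words and last_5_words. A quick way to sanity-check a marker against its stored text is sketched below, using the about_company values from the JSON above. The helper name `check_source_marker` is hypothetical, and whether the markers are meant to match the stored text exactly is an assumption. In this run the first marker matches the start of the text, while the last marker does not match its final five words.

```python
def check_source_marker(text: str, marker: dict) -> dict:
    """Compare a section's first and last five words with its source_marker."""
    words = text.split()
    return {
        "first_ok": " ".join(words[:5]) == marker["first_5_words"],
        "last_ok": " ".join(words[-5:]) == marker["last_5_words"],
    }

# Values copied from the about_company block of the parsed JSON.
about_text = (
    "Index Exchange funds the open web. Content and journalism across the "
    "internet are funded through advertising, and we are the engine that helps "
    "to make that happen transparently, safely and efficiently. Handling "
    "hundreds of billions of auctions per day within milliseconds requires an "
    "intense understanding of the exchange and the ecosystem that we live in."
)
about_marker = {
    "first_5_words": "Index Exchange funds the open",
    "last_5_words": "exchange and the ecosystem that",
}

report = check_source_marker(about_text, about_marker)
print(report)
```

A failed `last_ok` check like this one suggests the marker was produced approximately by the parser model and should not be relied on for exact span recovery.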
API 1 — extract-from-jd
{
  "final_skills": [
    {
      "is_primary": true,
      "skill_name": "Big Data"
    },
    {
      "is_primary": true,
      "skill_name": "Data Pipelines"
    },
    {
      "is_primary": true,
      "skill_name": "ETL"
    },
    {
      "is_primary": true,
      "skill_name": "Real-time Streaming"
    },
    {
      "is_primary": true,
      "skill_name": "Batch Processing"
    },
    {
      "is_primary": true,
      "skill_name": "Machine Learning"
    }
  ],
  "jd_role": {
    "display_name": "Senior Data Engineer",
    "rationale": null,
    "role_aliases": [
      "Data Engineer",
      "Senior Data Engineer",
      "Big Data Engineer"
    ],
    "role_archetype": "Data",
    "slug": ""
  },
  "nano_parsed": {
    "JD_type": "pass",
    "about_company": {
      "source_marker": {
        "first_5_words": "Index Exchange funds the open",
        "last_5_words": "exchange and the ecosystem that"
      },
      "text": "Index Exchange funds the open web. Content and journalism across the internet are funded through advertising, and we are the engine that helps to make that happen transparently, safely and efficiently. Handling hundreds of billions of auctions per day within milliseconds requires an intense understanding of the exchange and the ecosystem that we live in.",
      "word_count": 64
    },
    "certifications": [],
    "company_name": "Index Exchange",
    "ctc": null,
    "domain": {
      "primary": {
        "aliases": [
          "SaaS",
          "Ad Tech"
        ],
        "domain": "Software \u0026 SaaS Products"
      },
      "secondary": null
    },
    "education": [
      {
        "level": "Bachelor\u0027s",
        "qualification": "BTECH/BE/BSC - Computer Science or Engineering related fields",
        "raw": "Bachelor/ Master\u2019s Degree in Computer Science or Engineering related fields",
        "requirement": "required"
      }
    ],
    "experience": {
      "max": null,
      "min": 8,
      "raw": "8+ years of experience as a Software Engineer in enterprise grade, large scale distributed software product development"
    },
    "job_locations": [
      {
        "aliases": [
          "Toronto, ON"
        ],
        "city": "Toronto",
        "country": "Canada",
        "state": "Ontario",
        "work_mode": "hybrid"
      },
      {
        "aliases": [
          "NYC"
        ],
        "city": "New York",
        "country": "United States",
        "state": "New York",
        "work_mode": "hybrid"
      },
      {
        "aliases": [
          "Montreal, QC"
        ],
        "city": "Montreal",
        "country": "Canada",
        "state": "Quebec",
        "work_mode": "hybrid"
      },
      {
        "aliases": [
          "Kitchener, ON"
        ],
        "city": "Kitchener",
        "country": "Canada",
        "state": "Ontario",
        "work_mode": "hybrid"
      },
      {
        "aliases": [
          "London, UK"
        ],
        "city": "London",
        "country": "United Kingdom",
        "state": "England",
        "work_mode": "hybrid"
      },
      {
        "aliases": [
          "SF"
        ],
        "city": "San Francisco",
        "country": "United States",
        "state": "California",
        "work_mode": "hybrid"
      }
    ],
    "role": "Senior Data Engineer",
    "role_aliases": [
      "Data Engineer",
      "Senior Data Engineer",
      "Big Data Engineer"
    ],
    "role_archetype": "Data",
    "roles_and_responsibilities": [
      {
        "bullet_count": 0,
        "heading": "About The Role",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "We\u2019re looking for a Senior",
          "last_5_words": "reporting and analytics as well"
        },
        "text": "We\u2019re looking for a Senior Data Engineer to join our team and help build the next generation of big data solutions at Index Exchange with real-time streaming and batch analytical capabilities. Data is a big deal at Index Exchange (Index). Index\u2019s advertising exchange handles billions of auctions and generates terabytes of auction-related information every day. Our team builds tools and infrastructure to manage this vast amount of data and make it available to both internal and external customers and partners for their reporting and analytics as well as machine model training needs.",
        "word_count": 83
      },
      {
        "bullet_count": 6,
        "heading": "Here\u2019s What You\u2019ll Be Doing",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "\u2022 Evaluate new technologies, design,",
          "last_5_words": "and business suitability"
        },
        "text": "\u2022 Evaluate new technologies, design, implement, and maintain data pipelines for extraction, transformation, and loading of data from a wide variety of data sources to various data services\n\u2022 Identify, design, and implement system performance improvements\n\u2022 Identify, design, and implement internal process improvements\n\u2022 Automate manual processes and optimize data delivery\n\u2022 Lead and mentor team members\n\u2022 Identify and assess potential solutions for technical and business suitability",
        "word_count": 56
      },
      {
        "bullet_count": 4,
        "heading": "What We\u2019re Looking For",
        "heading_was_present": true,
        "source_marker": {
          "first_5_words": "\u2022 Experience and Leadership: A",
          "last_5_words": "new tools and data technologies"
        },
        "text": "\u2022 Experience and Leadership: A senior engineer with exposure leading projects and mentoring junior developers. A leader who continuously challenges the status quo and brings forth innovative ideas and improvements for the team.\n\u2022 Problem Solvers: You don\u2019t stop until the problem gets solved and you find more than one way to solve it. You love working with other people, presenting your viewpoint but ultimately working towards the best solution, regardless of where it comes from\n\u2022 Knowledge Hungry: Learning new frameworks and languages is exciting to you \u2013 you\u2019re not satisfied with the status quo. We use a variety of languages and tools to solve problems and we\u0027re interested in what you\u0027re looking to learn.\n\u2022 Passionate: You have a passion for Big Data and an interest in the latest trends and developments constantly researching new tools and data technologies",
        "word_count": 104
      }
    ],
    "urls": []
  },
  "rejected": false,
  "rejection_reason": null,
  "run_id": "655a31d7-f6e8-4465-a2e8-6c70793acda8",
  "stage3_signals": {
    "alias_found": true,
    "alias_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": null,
        "matched_count": null,
        "matched_skills": null,
        "role_id": 2,
        "score": 1.0,
        "slug": "data-engineer",
        "total_count": null
      }
    ],
    "kra_match_roles": [
      {
        "display_name": "Data Engineer",
        "kra_matches": [
          {
            "kra_text": "Builds data ingestion pipelines to collect data from transactional databases, third-party APIs, event streams, and file sources into centralized data platforms.",
            "sentence": "Evaluate new technologies, design, implement, and maintain data pipelines for extraction, transformation, and loading of data from a wide variety of data sources to various data services",
            "similarity": 0.6265
          },
          {
            "kra_text": "Works with data analysts, data scientists, and business stakeholders to define data models, ingestion schedules, and data delivery requirements.",
            "sentence": "Our team builds tools and infrastructure to manage this vast amount of data and make it available to both internal and external customers and partners for their reporting and analytics as well as machine model training needs.",
            "similarity": 0.5736
          },
          {
            "kra_text": "Works with data analysts, data scientists, and business stakeholders to define data models, ingestion schedules, and data delivery requirements.",
            "sentence": "Automate manual processes and optimize data delivery",
            "similarity": 0.4519
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 2,
        "score": 0.5507,
        "slug": "data-engineer",
        "total_count": null
      },
      {
        "display_name": "Scala Backend Developer",
        "kra_matches": [
          {
            "kra_text": "performance and reliability tuning",
            "sentence": "Identify, design, and implement system performance improvements",
            "similarity": 0.6202
          },
          {
            "kra_text": "backend workflow orchestration",
            "sentence": "Automate manual processes and optimize data delivery",
            "similarity": 0.4715
          },
          {
            "kra_text": "internal and external system integration",
            "sentence": "Identify, design, and implement internal process improvements",
            "similarity": 0.4625
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 87,
        "score": 0.518,
        "slug": "scala-backend-developer",
        "total_count": null
      },
      {
        "display_name": "Ruby Backend Developer",
        "kra_matches": [
          {
            "kra_text": "performance and reliability improvements",
            "sentence": "Identify, design, and implement system performance improvements",
            "similarity": 0.5888
          },
          {
            "kra_text": "internal and external service integration",
            "sentence": "Identify, design, and implement internal process improvements",
            "similarity": 0.4574
          },
          {
            "kra_text": "automated backend checks",
            "sentence": "Automate manual processes and optimize data delivery",
            "similarity": 0.4452
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 85,
        "score": 0.4972,
        "slug": "ruby-backend-developer",
        "total_count": null
      },
      {
        "display_name": "Svelte Frontend Developer",
        "kra_matches": [
          {
            "kra_text": "performance tuning",
            "sentence": "Identify, design, and implement system performance improvements",
            "similarity": 0.563
          },
          {
            "kra_text": "backend data integration",
            "sentence": "Evaluate new technologies, design, implement, and maintain data pipelines for extraction, transformation, and loading of data from a wide variety of data sources to various data services",
            "similarity": 0.4645
          },
          {
            "kra_text": "backend data integration",
            "sentence": "Automate manual processes and optimize data delivery",
            "similarity": 0.4486
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 92,
        "score": 0.492,
        "slug": "svelte-frontend-developer",
        "total_count": null
      },
      {
        "display_name": "Engineering Manager",
        "kra_matches": [
          {
            "kra_text": "facilitate technical and delivery decisions",
            "sentence": "Identify and assess potential solutions for technical and business suitability",
            "similarity": 0.5509
          },
          {
            "kra_text": "facilitate technical and delivery decisions",
            "sentence": "Automate manual processes and optimize data delivery",
            "similarity": 0.5023
          },
          {
            "kra_text": "facilitate technical and delivery decisions",
            "sentence": "Identify, design, and implement system performance improvements",
            "similarity": 0.4212
          }
        ],
        "matched_count": null,
        "matched_skills": null,
        "role_id": 121,
        "score": 0.4915,
        "slug": "engineering-manager",
        "total_count": null
      }
    ],
    "skill_match_roles": [
      {
        "display_name": "ML Engineer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Machine Learning"
        ],
        "role_id": 3,
        "score": 0.1667,
        "slug": "ml-engineer",
        "total_count": 6
      },
      {
        "display_name": "AI Engineer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Machine Learning"
        ],
        "role_id": 13,
        "score": 0.1667,
        "slug": "ai-engineer",
        "total_count": 6
      },
      {
        "display_name": "MLOps Engineer",
        "kra_matches": null,
        "matched_count": 1,
        "matched_skills": [
          "Machine Learning"
        ],
        "role_id": 16,
        "score": 0.1667,
        "slug": "ml-ops-engineer",
        "total_count": 6
      }
    ]
  },
  "stage4_decision": {
    "alias_collision_detected": false,
    "case": "A",
    "chosen_role": {
      "display_name": "Data Engineer",
      "kra_matches": null,
      "matched_count": null,
      "matched_skills": null,
      "role_id": 2,
      "score": 1.0,
      "slug": "data-engineer",
      "total_count": null
    },
    "confidence": 1.0,
    "is_new_role": false,
    "llm2_fired": false,
    "llm2_reasoning": null,
    "matched_dimensions": [],
    "matched_kras": [],
    "matched_skills": [],
    "new_role_display_name": null,
    "new_role_slug": null,
    "queued": false,
    "reasoning": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top ml-engineer 0.17 does not contradict",
    "sub_role": null
  },
  "stage5_updates": {
    "centroid_n_after": 495,
    "centroid_updated": true,
    "collision_log_id": null,
    "new_kra_attached": null,
    "new_skills_attached": [
      {
        "is_primary": true,
        "queue_id": 23050,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Big Data",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 23051,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Data Pipelines",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 23052,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "ETL",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 23053,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Real-time Streaming",
        "status": "pending"
      },
      {
        "is_primary": true,
        "queue_id": 23054,
        "role_display_name": "Data Engineer",
        "role_slug": "data-engineer",
        "skill_name": "Batch Processing",
        "status": "pending"
      }
    ],
    "queue_entry_id": null,
    "v3_pipeline_triggered": false,
    "v3_role_slug": null,
    "v3_run_id": null
  }
}
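A note on reading the scores in the response above: the page does not document how they are derived, but the figures are consistent with two simple rules. Each `kra_match_roles` score matches the plain mean of its three listed similarities to within about 0.0001, and each `skill_match_roles` score matches `matched_count / total_count` (1 / 6 ≈ 0.1667). The sketch below checks that reading against the values shown. The function names `kra_score`, `skill_score`, and `max_gap` are hypothetical, and the aggregation rule is an inference from the numbers, not a confirmed description of the pipeline.

```python
from statistics import mean

# Similarities copied from the kra_match_roles entries in the response above.
kra_similarities = {
    "data-engineer": [0.6265, 0.5736, 0.4519],
    "scala-backend-developer": [0.6202, 0.4715, 0.4625],
    "ruby-backend-developer": [0.5888, 0.4574, 0.4452],
    "svelte-frontend-developer": [0.563, 0.4645, 0.4486],
    "engineering-manager": [0.5509, 0.5023, 0.4212],
}

# Scores reported for the same roles in the response above.
reported_scores = {
    "data-engineer": 0.5507,
    "scala-backend-developer": 0.518,
    "ruby-backend-developer": 0.4972,
    "svelte-frontend-developer": 0.492,
    "engineering-manager": 0.4915,
}


def kra_score(similarities):
    """Assumed aggregation: plain mean of the listed KRA similarities."""
    return mean(similarities)


def skill_score(matched_count, total_count):
    """Assumed aggregation: fraction of the input skills matched for the role."""
    return matched_count / total_count if total_count else 0.0


def max_gap():
    """Largest absolute difference between a recomputed and a reported score."""
    return max(
        abs(kra_score(sims) - reported_scores[slug])
        for slug, sims in kra_similarities.items()
    )


if __name__ == "__main__":
    for slug, sims in kra_similarities.items():
        print(f"{slug}: recomputed {kra_score(sims):.4f}, reported {reported_scores[slug]}")
    print(f"skill score for 1 of 6 matched: {skill_score(1, 6):.4f}")
```

The small residual gaps are what one would expect if the pipeline averaged unrounded similarities and rounded only for display, though the source does not confirm this.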
API 2 — extract-details
{
  "alias_matches": [
    {
      "alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
      "alias_persisted": false,
      "existing_alias_id": 2015,
      "existing_alias_text": "Machine Learning",
      "input_term": "Machine Learning",
      "matched_canonical": {
        "category_id": 2,
        "display_name": "Machine Learning",
        "id": 1356,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CONCEPT",
        "slug": "machine-learning",
        "sub_category_id": 1024,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "matched_via": "alias"
    }
  ],
  "candidate_roles": [
    {
      "display_name": "AI Engineer",
      "id": 13,
      "rationale": null,
      "role_archetype": null,
      "slug": "ai-engineer",
      "source": "db"
    },
    {
      "display_name": "ML Engineer",
      "id": 3,
      "rationale": null,
      "role_archetype": null,
      "slug": "ml-engineer",
      "source": "db"
    },
    {
      "display_name": "MLOps Engineer",
      "id": 16,
      "rationale": null,
      "role_archetype": null,
      "slug": "ml-ops-engineer",
      "source": "db"
    }
  ],
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top ml-engineer 0.17 does not contradict",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "dimensions": [
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "AI Governance and Model Security",
        "id": 50,
        "rationale": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
        "slug": "ai-governance-and-model-security",
        "source": "db"
      },
      "input_skill": "Machine Learning",
      "llm_role": null,
      "roles_from_db": [
        {
          "display_name": "AI Engineer",
          "id": 13,
          "rationale": null,
          "role_archetype": null,
          "slug": "ai-engineer",
          "source": "db"
        },
        {
          "display_name": "ML Engineer",
          "id": 3,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-engineer",
          "source": "db"
        },
        {
          "display_name": "MLOps Engineer",
          "id": 16,
          "rationale": null,
          "role_archetype": null,
          "slug": "ml-ops-engineer",
          "source": "db"
        }
      ]
    },
    {
      "dimension": {
        "difficulty_hint": "well_known",
        "display_name": "React Frontend Development",
        "id": 96,
        "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
        "slug": "d_init_01",
        "source": "db"
      },
      "input_skill": "Machine Learning",
      "llm_role": null,
      "roles_from_db": []
    }
  ],
  "input_final_skills": [
    "Big Data",
    "Data Pipelines",
    "ETL",
    "Real-time Streaming",
    "Batch Processing",
    "Machine Learning"
  ],
  "input_llm_skills": [
    "Big Data",
    "Data Pipelines",
    "ETL",
    "Real-time Streaming",
    "Batch Processing",
    "Machine Learning"
  ],
  "new_aliases_persisted": 0,
  "run_id": "655a31d7-f6e8-4465-a2e8-6c70793acda8",
  "skills_detail": [
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Big Data",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Data Engineering Tools",
          "skill_nature": "CONCEPT",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "big-data",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Data Pipelines",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Data Engineering Tools",
          "skill_nature": "CONCEPT",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "data-pipelines",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "ETL",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Data Engineering Tools",
          "skill_nature": "PRACTICE",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "etl",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Real-time Streaming",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Data Engineering Tools",
          "skill_nature": "CONCEPT",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "real-time-streaming",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [],
      "canonical": null,
      "dimensions": [],
      "input_skill": "Batch Processing",
      "matched_via": null,
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": {
        "derived": {
          "category": "Data Engineering Tools",
          "skill_nature": "CONCEPT",
          "sub_category": "general",
          "typical_lifespan": "MULTI_YEAR",
          "version_strategy": "UNVERSIONED",
          "volatility": "MEDIUM"
        },
        "enrichment": null,
        "keep_log": [],
        "locked_dimensions": [],
        "merge_log": [],
        "placed": null,
        "relationships": null,
        "skill_id": "batch-processing",
        "split_log": [],
        "typed": null,
        "warnings": []
      },
      "source_tag": "llm",
      "was_in_llm_skills": true
    },
    {
      "aliases_in_db": [
        {
          "alias_text": "Machine Learning",
          "alias_type": "CANONICAL",
          "id": 2015,
          "is_primary": false,
          "match_strategy": "CASE_INSENSITIVE"
        }
      ],
      "canonical": {
        "category_id": 2,
        "display_name": "Machine Learning",
        "id": 1356,
        "is_also_category": false,
        "is_extractable": true,
        "skill_nature": "CONCEPT",
        "slug": "machine-learning",
        "sub_category_id": 1024,
        "typical_lifespan": "EVERGREEN",
        "volatility": "STABLE"
      },
      "dimensions": [
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "AI Governance and Model Security",
            "id": 50,
            "rationale": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
            "slug": "ai-governance-and-model-security",
            "source": "db"
          },
          "input_skill": "Machine Learning",
          "llm_role": null,
          "roles_from_db": [
            {
              "display_name": "AI Engineer",
              "id": 13,
              "rationale": null,
              "role_archetype": null,
              "slug": "ai-engineer",
              "source": "db"
            },
            {
              "display_name": "ML Engineer",
              "id": 3,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-engineer",
              "source": "db"
            },
            {
              "display_name": "MLOps Engineer",
              "id": 16,
              "rationale": null,
              "role_archetype": null,
              "slug": "ml-ops-engineer",
              "source": "db"
            }
          ]
        },
        {
          "dimension": {
            "difficulty_hint": "well_known",
            "display_name": "React Frontend Development",
            "id": 96,
            "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
            "slug": "d_init_01",
            "source": "db"
          },
          "input_skill": "Machine Learning",
          "llm_role": null,
          "roles_from_db": []
        }
      ],
      "input_skill": "Machine Learning",
      "matched_via": "alias",
      "new_alias_persisted": false,
      "new_alias_text": null,
      "new_skill_meta": null,
      "source_tag": "db",
      "was_in_llm_skills": true
    }
  ],
  "unmatched_skills": [
    "Big Data",
    "Data Pipelines",
    "ETL",
    "Real-time Streaming",
    "Batch Processing"
  ]
}
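In the response above, `unmatched_skills` contains exactly the `skills_detail` entries whose `canonical` field is null, and the one entry with a canonical record (Machine Learning) is absent from it. A minimal sketch of that partition follows, using a trimmed copy of the data. The helper `split_skills` is hypothetical, and the rule itself is inferred from this single response, not from documented behaviour.

```python
# Trimmed copy of skills_detail from the response above: only the two
# fields needed for the partition are kept.
skills_detail = [
    {"input_skill": "Big Data", "canonical": None},
    {"input_skill": "Data Pipelines", "canonical": None},
    {"input_skill": "ETL", "canonical": None},
    {"input_skill": "Real-time Streaming", "canonical": None},
    {"input_skill": "Batch Processing", "canonical": None},
    {
        "input_skill": "Machine Learning",
        "canonical": {"id": 1356, "slug": "machine-learning"},
    },
]


def split_skills(details):
    """Partition skills by whether a canonical record was resolved.

    Returns (matched, unmatched), preserving input order in both lists.
    """
    matched = [d["input_skill"] for d in details if d["canonical"] is not None]
    unmatched = [d["input_skill"] for d in details if d["canonical"] is None]
    return matched, unmatched


if __name__ == "__main__":
    matched, unmatched = split_skills(skills_detail)
    print("matched:", matched)
    print("unmatched:", unmatched)
```

The same split lines up with the `tag` values in the final output, where the five unmatched skills are tagged `new` and Machine Learning is tagged `in_db`.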
API 3 — final-role-output
{
  "chosen_role": {
    "display_name": "Data Engineer",
    "id": 2,
    "rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top ml-engineer 0.17 does not contradict",
    "role_archetype": null,
    "slug": "data-engineer",
    "source": "db"
  },
  "chosen_role_resolution": "in_db",
  "final_input_skills": [
    {
      "skill": "Big Data",
      "tag": "new"
    },
    {
      "skill": "Data Pipelines",
      "tag": "new"
    },
    {
      "skill": "ETL",
      "tag": "new"
    },
    {
      "skill": "Real-time Streaming",
      "tag": "new"
    },
    {
      "skill": "Batch Processing",
      "tag": "new"
    },
    {
      "skill": "Machine Learning",
      "tag": "in_db"
    }
  ],
  "llm_cost_api1_usd": null,
  "llm_cost_api2_usd": null,
  "llm_cost_api3_usd": null,
  "llm_cost_total_usd": null,
  "persistence": {
    "items": [
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "AI Governance and Model Security",
          "id": 50,
          "rationale": "Controls and documentation used to make models safer, auditable, and compliant. ML engineers use this to manage model risk, supply chain integrity, and governance requirements.",
          "slug": "ai-governance-and-model-security",
          "source": "db"
        },
        "dimension_id": 50,
        "input_skill": "Machine Learning",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [
          {
            "display_name": "AI Engineer",
            "id": 13,
            "rationale": null,
            "role_archetype": null,
            "slug": "ai-engineer",
            "source": "db"
          },
          {
            "display_name": "ML Engineer",
            "id": 3,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-engineer",
            "source": "db"
          },
          {
            "display_name": "MLOps Engineer",
            "id": 16,
            "rationale": null,
            "role_archetype": null,
            "slug": "ml-ops-engineer",
            "source": "db"
          }
        ],
        "skill_dimension_saved": true,
        "skill_id": 1356,
        "skill_tag": "in_db",
        "skipped_reason": null
      },
      {
        "chosen_role_id": 2,
        "dimension": {
          "difficulty_hint": "well_known",
          "display_name": "React Frontend Development",
          "id": 96,
          "rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
          "slug": "d_init_01",
          "source": "db"
        },
        "dimension_id": 96,
        "input_skill": "Machine Learning",
        "llm_role": null,
        "matched_chosen_role": false,
        "outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
        "role_dimension_saved": false,
        "roles_from_db": [],
        "skill_dimension_saved": true,
        "skill_id": 1356,
        "skill_tag": "in_db",
        "skipped_reason": null
      }
    ],
    "new_skills_created": 0,
    "role_dimension_saved": 0,
    "skill_dimension_saved": 0,
    "skipped": 0
  },
  "planner_output": null,
  "run_id": "655a31d7-f6e8-4465-a2e8-6c70793acda8"
}
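One detail worth checking when reading the `persistence` block above: both items carry `"skill_dimension_saved": true`, while the summary counter `skill_dimension_saved` is 0. The source does not say whether the counter is meant to tally the per-item flags or only rows newly written in this run, so this may be intended. The sketch below recounts the flags from a trimmed copy of the items and reports any counter that differs. The helpers `recount` and `mismatches` are hypothetical, and the assumption that the counters should equal the number of true flags is exactly what is being tested.

```python
# Trimmed copy of the persistence block from the response above.
persistence = {
    "items": [
        {"dimension_id": 50, "role_dimension_saved": False, "skill_dimension_saved": True},
        {"dimension_id": 96, "role_dimension_saved": False, "skill_dimension_saved": True},
    ],
    "role_dimension_saved": 0,
    "skill_dimension_saved": 0,
}

COUNTER_KEYS = ("role_dimension_saved", "skill_dimension_saved")


def recount(block):
    """Count, per key, how many items have that flag set to true."""
    return {
        key: sum(1 for item in block["items"] if item[key])
        for key in COUNTER_KEYS
    }


def mismatches(block):
    """Map each key to (reported, recounted) where the two values differ."""
    counted = recount(block)
    return {
        key: (block[key], counted[key])
        for key in COUNTER_KEYS
        if block[key] != counted[key]
    }


if __name__ == "__main__":
    print(mismatches(persistence))
```

A non-empty result here is a prompt to check the counter's definition with the pipeline's owners, not proof of a bug.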