Pipeline run
c2b11bf5-0ed4-4d40-b6df-b822db58604b
Pipeline LLM cost (USD)
API 1: $0.0058
API 2: $0.0000
API 3: $0.0000
Total: $0.0058
Client output enrichment
v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description
role baseline loaded
sources · ai_index: jd · nature_of_work: jd · tech_stack_maturity: jd
Nature of work
· Data pipeline development
Build and operate real-time data pipelines and lakehouse/OLAP layers, adding observability, fault-tolerance, and SQL optimization while modernizing legacy ETL into scalable bronze→silver→gold workflows.
"Build and maintain high-throughput, real-time data pipelines using Kafka/Pulsar with Spark"
Tech stack maturity
Modern Cloud Native
The stack centers on containerization, Kubernetes, Terraform, Airflow/Dagster orchestration, Kafka, Spark/Flink, dbt, and cloud-oriented data engineering patterns, which aligns best with modern cloud-native systems.
AI index (0 = no AI use, 5 = totally AI-dependent · v2.1)
0.20 / 5
· Title match
✓ Has AI skill
· AI skill (primary)
· AI skill (secondary)
· On AI team
· Builds AI products
vocab breakdown (legacy)
Assistants (×1):
—
Frameworks (×2):
—
Models / concepts (×3):
AI
Evidence — skills matched in JD (44)
Kafka
Pulsar
Apache Spark
Checkpointing
Replay Logic
Data Observability
Data Quality
SLA Alerts
Anomaly Detection
Data Lineage
Apache Iceberg
Polaris
Gravitino
ClickHouse
StarRocks
Bronze-Silver-Gold Data Modeling
Airflow
dbt
Dagster
SQLMesh
SQL
Trino
Apache Flink
Python
Java
+19
Skill cluster (13 dimension groups, role-scoped)
Programming Languages for Data Work
SQL
Python
Java
ETL and ELT Tooling
Apache Spark
dbt
Cloud Platforms
Distributed Systems
Container Orchestration Platforms
Kubernetes
Containerization and Image Builds
Docker
Data Pipeline Orchestration
Dagster
Data Quality and Reconciliation
Anomaly Detection
Data Serialization Standards & Protocols
Parquet
Infrastructure as Code
Terraform
Messaging and Event Streaming
Kafka
Relational Database Design
Indexing
Stream Processing Systems
Apache Flink
Cross-cutting / unaligned
Pulsar
Checkpointing
Replay Logic
Data Observability
Data Quality
SLA Alerts
Data Lineage
Apache Iceberg
Polaris
Gravitino
ClickHouse
StarRocks
Bronze-Silver-Gold Data Modeling
Airflow
SQLMesh
Trino
Iceberg REST Catalogs
Data Structures and Algorithms
Sorting
Searching
Memory Models
OLTP
OLAP
Query Execution
Storage Formats
Monitoring
Cube.js
RisingWave
Arroyo
KRA description
• Build and maintain high-throughput, real-time data pipelines using Kafka/Pulsar with Spark,
• Design fault-tolerant systems with zero-data-loss principles — checkpointing, replay logic,
• Implement data observability — quality checks, SLA alerts, anomaly detection, lineage, and
• Design and manage Iceberg-based lakehouse tables (Polaris/Gravitino catalogs, schema
• Build fast OLAP layers using ClickHouse / StarRocks.
• Model data across bronze → silver → gold layers for downstream teams.
• Migrate and modernize legacy pipelines into scalable, distributed workflows.
• Orchestrate ETL workloads using Airflow, DBT, Dagster, SQLMesh.
• Optimize SQL transformations and distributed execution across Trino/Spark.
• Ensure strict security and governance across all data layers — access control, encryption,
• Collaborate with backend, analytics, and platform teams for seamless data delivery.
• Extremely strong SQL — window functions, query planning, optimization.
• High comfort working with distributed & parallel workloads.
• Hands-on experience with some-many of these technologies : Apache Spark, Apache Flink,
• Advanced experience in Python (preferred) or Java (strong fundamentals).
• Strong understanding of Parquet, Apache Iceberg, and Iceberg REST catalogs (Polaris /
• Experience with OLAP databases — ClickHouse / StarRocks.
• Experience with semantic layers — Cube.js or similar.
• Strong experience building pipelines with Airflow, DBT, Dagster, SQLMesh.
• Solid understanding of data structures & algorithms — sorting, searching, memory models.
• Strong grasp of OLTP vs OLAP, indexing, query execution, and storage formats.
• Ability to debug distributed systems end-to-end (compute, storage, network, orchestration).
• Familiarity with cloud environments, containerization (Docker), and monitoring.
• Experience with large-scale data — high throughput, billions of rows, large parallel workloads.
• Awareness of cost optimization in compute & storage.
• Experience with emerging stream processors — Dagster, RisingWave, Arroyo.
• Kubernetes, Terraform, or cloud-native big-data stacks.
• Strong ownership — takes systems from design → build → monitor.
• Self-driven, independent, and comfortable making technical decisions.
• High attention to reliability, data accuracy, and operational excellence.
• Naturally grows into broader technical responsibility as the platform scales.
Signals
Skill · data-engineer · 0.29
Alias · data-engineer · 1.00
KRA · data-engineer · 0.66
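The KRA signal of 0.66 is consistent with the arithmetic mean of the three `kra_matches` similarities reported for `data-engineer` in the API 1 response (0.7268, 0.627, 0.6256), and the same holds for the other candidate roles whose matches are listed in full. This is an observation from the numbers in this run, not a documented formula. The sketch below reproduces the reported scores under that assumption; `kra_score` is a hypothetical helper name.

```python
from statistics import mean

# Similarities copied from the kra_matches entries in the API 1 response of this run.
kra_similarities = {
    "data-engineer": [0.7268, 0.627, 0.6256],
    "backend-engineer": [0.5875, 0.5049, 0.495],
    "cloud-architect": [0.5281, 0.5227, 0.5168],
    "devops-engineer": [0.5323, 0.5209, 0.5046],
}


def kra_score(similarities):
    """Assumed aggregation: mean of the listed similarities, rounded to 4 places."""
    return round(mean(similarities), 4)


scores = {slug: kra_score(sims) for slug, sims in kra_similarities.items()}

# Highest-scoring role under this assumption.
best_role = max(scores, key=scores.get)

if __name__ == "__main__":
    for slug, score in scores.items():
        print(f"{slug}: {score}")
    print("best:", best_role)
```

Under this assumption the computed values match the `score` fields in the response (0.6598, 0.5291, 0.5225, 0.5193), and 0.6598 rounds to the 0.66 shown in the Signals table.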
Post-classification
Centroid: updated · n=84
Alias collision log: none
New-role queue: none
New skills captured: 27
New KRA captured: none
Captured for admin review
Pulsar · primary ↔ Data Engineer · pending
Checkpointing · primary ↔ Data Engineer · pending
Replay Logic · primary ↔ Data Engineer · pending
Data Observability · primary ↔ Data Engineer · pending
Data Quality · primary ↔ Data Engineer · pending
SLA Alerts · primary ↔ Data Engineer · pending
Data Lineage · primary ↔ Data Engineer · pending
Apache Iceberg · primary ↔ Data Engineer · pending
Polaris · primary ↔ Data Engineer · pending
Gravitino · primary ↔ Data Engineer · pending
ClickHouse · primary ↔ Data Engineer · pending
StarRocks · primary ↔ Data Engineer · pending
Bronze-Silver-Gold Data Modeling · primary ↔ Data Engineer · pending
SQLMesh · primary ↔ Data Engineer · pending
Trino · primary ↔ Data Engineer · pending
Iceberg REST Catalogs · primary ↔ Data Engineer · pending
Cube.js ↔ Data Engineer · pending
Data Structures and Algorithms · primary ↔ Data Engineer · pending
Sorting · primary ↔ Data Engineer · pending
Searching · primary ↔ Data Engineer · pending
Memory Models · primary ↔ Data Engineer · pending
OLTP · primary ↔ Data Engineer · pending
OLAP · primary ↔ Data Engineer · pending
Query Execution · primary ↔ Data Engineer · pending
Storage Formats · primary ↔ Data Engineer · pending
RisingWave ↔ Data Engineer · pending
Arroyo ↔ Data Engineer · pending
Status: extract_from_jd_done
Created: 2026-05-27T13:52:24.861248Z
Updated: 2026-05-27T13:52:28.407782Z
Flow
Current 3-step pipeline
1 POST /skills/extract-from-jd
2 POST /skills/extract-details
3 POST /skills/final-role-output
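A minimal sketch of how a client might drive the three steps listed above in order. Only the endpoint paths and the `run_id` and `rejected` response fields come from this page; the request payload keys (`jd_text`, `previous`), the chaining of one response into the next request, and the `post` transport callable are assumptions made for illustration.

```python
from typing import Callable

# Endpoint paths as listed in the Flow section, in call order.
PIPELINE_STEPS = [
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
]


def run_pipeline(jd_text: str, post: Callable[[str, dict], dict]) -> dict:
    """Call the three steps in order and return the last response received.

    `post(path, payload)` is a hypothetical transport supplied by the caller
    (for example a thin wrapper around an HTTP client). The payload keys
    "jd_text" and "previous" are assumptions, not a documented schema.
    """
    payload = {"jd_text": jd_text}
    response: dict = {}
    for path in PIPELINE_STEPS:
        response = post(path, payload)
        # Stop early if a step reports the JD as rejected.
        if response.get("rejected"):
            break
        # Assumed chaining: pass the run id and prior response to the next step.
        payload = {"run_id": response.get("run_id"), "previous": response}
    return response


if __name__ == "__main__":
    def fake_post(path: str, payload: dict) -> dict:
        # Stand-in transport that echoes a fixed run id instead of using the network.
        print("POST", path)
        return {"run_id": "example-run", "rejected": False}

    run_pipeline("Senior Data Engineer ...", fake_post)
```

Passing the transport in as a callable keeps the sketch runnable without a network and leaves authentication and base URL, neither of which is shown on this page, to the caller.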
Role
Chosen role & resolution
No chosen role stored for this run.
Job description
Experience: 5.00 + years Salary: Confidential (based on experience) Shift: (GMT+05:30) Asia/Kolkata (IST) Opportunity Type: Remote Placement Type: Full time Permanent Position (*Note: This is a requirement for one of Uplers' client - 1digitalstack.ai) What do you need for this opportunity? Must have skills required: Python, Java, Iceberg, Kafka, Apache Beam, Apache Flink, Apache pulsar, Spark, Trino, OLAP, ClickHouse, starrocks 1digitalstack.ai is Looking for: Role - Senior Data Engineer Experience - 5-7 Years Location - Remote (India) About 1DigitalStack.ai 1DigitalStack.ai combines AI and deep eCommerce data to help global brands grow faster on online marketplaces. Our platforms deliver advanced analytics, actionable intelligence, and media automation — enabling brands to optimize visibility, efficiency, and sales performance at scale. We partner with India’s top consumer companies — Unilever, Marico, Coca-Cola, Tata Consumer, Dabur, and Unicharm — across 125+ marketplaces globally. Backed by leading venture investors and powered by a 220+ member team, we’re in our $5–10M growth journey, scaling rapidly across categories and geographies to redefine how brands win on digital shelves. 🔗 Check out more at www.1digitalstack.ai About Role This is a high-impact, hands-on engineering role owning the core data systems that power our analytics, AI, and automation stack. You’ll work closely with the CTO and Engineering Leads and independently manage large, high-throughput data pipelines that process millions of events. Responsibilities : • Build and maintain high-throughput, real-time data pipelines using Kafka/Pulsar with Spark, Flink, and distributed compute engines. • Design fault-tolerant systems with zero-data-loss principles — checkpointing, replay logic, DLQs, deduplication, and back-pressure handling. • Implement data observability — quality checks, SLA alerts, anomaly detection, lineage, and metadata insights. 
• Design and manage Iceberg-based lakehouse tables (Polaris/Gravitino catalogs, schema evolution, compaction). • Build fast OLAP layers using ClickHouse / StarRocks. • Model data across bronze → silver → gold layers for downstream teams. • Migrate and modernize legacy pipelines into scalable, distributed workflows. • Orchestrate ETL workloads using Airflow, DBT, Dagster, SQLMesh. • Optimize SQL transformations and distributed execution across Trino/Spark. • Ensure strict security and governance across all data layers — access control, encryption, auditability. • Collaborate with backend, analytics, and platform teams for seamless data delivery. Requirements Core Technical Skills • Extremely strong SQL — window functions, query planning, optimization. • High comfort working with distributed & parallel workloads. • Hands-on experience with some-many of these technologies : Apache Spark, Apache Flink, Trino, Apache Kafka, Apache Pulsar, Apache Beam • Advanced experience in Python (preferred) or Java (strong fundamentals). • Strong understanding of Parquet, Apache Iceberg, and Iceberg REST catalogs (Polaris / Gravitino). • Experience with OLAP databases — ClickHouse / StarRocks. • Experience with semantic layers — Cube.js or similar. • Strong experience building pipelines with Airflow, DBT, Dagster, SQLMesh. Foundational Strengths • Solid understanding of data structures & algorithms — sorting, searching, memory models. • Strong grasp of OLTP vs OLAP, indexing, query execution, and storage formats. • Ability to debug distributed systems end-to-end (compute, storage, network, orchestration). • Familiarity with cloud environments, containerization (Docker), and monitoring. • Experience with large-scale data — high throughput, billions of rows, large parallel workloads. • Awareness of cost optimization in compute & storage. Good to Have • Experience with emerging stream processors — Dagster, RisingWave, Arroyo. • Kubernetes, Terraform, or cloud-native big-data stacks. 
Mindset • Strong ownership — takes systems from design → build → monitor. • Self-driven, independent, and comfortable making technical decisions. • High attention to reliability, data accuracy, and operational excellence. • Naturally grows into broader technical responsibility as the platform scales. Why 1DS is a great choice • High-trust, no-politics culture — we value communication, ownership, and accountability • Collaborative, ego-free team — building together is in our DNA • Learning-first environment — mentorship, peer reviews, and exposure to real business impact • Modern stack + autonomy — your voice shapes how we build • VC-funded & scaling fast — 250+ strong, building from India for the world How to apply for this opportunity? • Step 1: Click On Apply! And Register or Login on our portal. • Step 2: Complete the Screening Form & Upload updated Resume • Step 3: Increase your chances to get shortlisted & meet the client for the Interview! About Uplers: Our goal is to make hiring reliable, simple, and fast. Our role will be to help all our talents find and apply for relevant contractual onsite opportunities and progress in their career. We will support any grievances or challenges you may face during the engagement. (Note: There are many more opportunities apart from this on the portal. Depending on the assessments you clear, you can apply for them as well). So, if you are ready for a new challenge, a great work environment, and an opportunity to take your career to the next level, don't hesitate to apply today. We are waiting for you!
Skills from this JD
Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
Kafka · Primary · No API 2 row (run stopped after API 1 or history missing)
Pulsar · Primary · No API 2 row (run stopped after API 1 or history missing)
Apache Spark · Primary · No API 2 row (run stopped after API 1 or history missing)
Checkpointing · Primary · No API 2 row (run stopped after API 1 or history missing)
Replay Logic · Primary · No API 2 row (run stopped after API 1 or history missing)
Data Observability · Primary · No API 2 row (run stopped after API 1 or history missing)
Data Quality · Primary · No API 2 row (run stopped after API 1 or history missing)
SLA Alerts · Primary · No API 2 row (run stopped after API 1 or history missing)
Anomaly Detection · Primary · No API 2 row (run stopped after API 1 or history missing)
Data Lineage · Primary · No API 2 row (run stopped after API 1 or history missing)
Apache Iceberg · Primary · No API 2 row (run stopped after API 1 or history missing)
Polaris · Primary · No API 2 row (run stopped after API 1 or history missing)
Gravitino · Primary · No API 2 row (run stopped after API 1 or history missing)
ClickHouse · Primary · No API 2 row (run stopped after API 1 or history missing)
StarRocks · Primary · No API 2 row (run stopped after API 1 or history missing)
Bronze-Silver-Gold Data Modeling · Primary · No API 2 row (run stopped after API 1 or history missing)
Airflow · Primary · No API 2 row (run stopped after API 1 or history missing)
dbt · Primary · No API 2 row (run stopped after API 1 or history missing)
Dagster · Primary · No API 2 row (run stopped after API 1 or history missing)
SQLMesh · Primary · No API 2 row (run stopped after API 1 or history missing)
SQL · Primary · No API 2 row (run stopped after API 1 or history missing)
Trino · Primary · No API 2 row (run stopped after API 1 or history missing)
Apache Flink · Primary · No API 2 row (run stopped after API 1 or history missing)
Python · Primary · No API 2 row (run stopped after API 1 or history missing)
Java · Primary · No API 2 row (run stopped after API 1 or history missing)
Parquet · Primary · No API 2 row (run stopped after API 1 or history missing)
Iceberg REST Catalogs · Primary · No API 2 row (run stopped after API 1 or history missing)
Cube.js · Secondary · No API 2 row (run stopped after API 1 or history missing)
Data Structures and Algorithms · Primary · No API 2 row (run stopped after API 1 or history missing)
Sorting · Primary · No API 2 row (run stopped after API 1 or history missing)
Searching · Primary · No API 2 row (run stopped after API 1 or history missing)
Memory Models · Primary · No API 2 row (run stopped after API 1 or history missing)
OLTP · Primary · No API 2 row (run stopped after API 1 or history missing)
OLAP · Primary · No API 2 row (run stopped after API 1 or history missing)
Indexing · Primary · No API 2 row (run stopped after API 1 or history missing)
Query Execution · Primary · No API 2 row (run stopped after API 1 or history missing)
Storage Formats · Primary · No API 2 row (run stopped after API 1 or history missing)
Distributed Systems · Primary · No API 2 row (run stopped after API 1 or history missing)
Docker · Primary · No API 2 row (run stopped after API 1 or history missing)
Monitoring · Primary · No API 2 row (run stopped after API 1 or history missing)
Kubernetes · Primary · No API 2 row (run stopped after API 1 or history missing)
Terraform · Primary · No API 2 row (run stopped after API 1 or history missing)
RisingWave · Secondary · No API 2 row (run stopped after API 1 or history missing)
Arroyo · Secondary · No API 2 row (run stopped after API 1 or history missing)
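The rows above read like a left join: every skill from the API 1 `final_skills` list gets a row, and the API 2 column falls back to a placeholder when no API 2 data exists for the run. The sketch below illustrates that merge under stated assumptions. The `skill_name` and `is_primary` fields come from the API 1 response on this page; the shape of `api2_rows` (a dict keyed by skill name) and the function name `merge_skill_rows` are hypothetical, and API 3 persistence tags are left out because their shape is not shown here.

```python
# Placeholder text shown on this page when a skill has no API 2 data.
NO_API2 = "No API 2 row (run stopped after API 1 or history missing)"


def merge_skill_rows(api1_skills, api2_rows=None):
    """Left-join API 1 skills with API 2 rows keyed by skill name.

    `api2_rows` is an assumed mapping of skill name -> API 2 detail; when it
    is missing or lacks a skill, the placeholder text is used instead.
    """
    api2_rows = api2_rows or {}
    rows = []
    for skill in api1_skills:
        name = skill["skill_name"]
        rows.append({
            "skill": name,
            "tier": "Primary" if skill["is_primary"] else "Secondary",
            "api2": api2_rows.get(name, NO_API2),
        })
    return rows


if __name__ == "__main__":
    # Two entries copied from the API 1 final_skills list of this run.
    sample = [
        {"is_primary": True, "skill_name": "Kafka"},
        {"is_primary": False, "skill_name": "Cube.js"},
    ]
    for row in merge_skill_rows(sample):
        print(" · ".join([row["skill"], row["tier"], row["api2"]]))
```

With no API 2 data supplied, every row carries the placeholder, which matches what this run shows after stopping at `extract_from_jd_done`.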
Library artifacts (this run)
No artifact rows for this run.
nano JD Parser · gpt-4.1-nano
Role: Senior Data Engineer
Company: 1DigitalStack.ai
Experience: 5-7 Years
Domain: E-commerce
Location: India (remote)
JD type: pass
Raw JSON
{
"JD_type": "pass",
"about_company": {
"source_marker": {
"first_5_words": "1DigitalStack.ai combines AI and deep",
"last_5_words": "how brands win on digital shelves."
},
"text": "1DigitalStack.ai combines AI and deep eCommerce data to help global brands grow faster on online marketplaces. Our platforms deliver advanced analytics, actionable intelligence, and media automation \u2014 enabling brands to optimize visibility, efficiency, and sales performance at scale. We partner with India\u2019s top consumer companies \u2014 Unilever, Marico, Coca-Cola, Tata Consumer, Dabur, and Unicharm \u2014 across 125+ marketplaces globally. Backed by leading venture investors and powered by a 220+ member team, we\u2019re in our $5\u201310M growth journey, scaling rapidly across categories and geographies to redefine how brands win on digital shelves.",
"word_count": 84
},
"certifications": [],
"company_name": "1DigitalStack.ai",
"ctc": null,
"domain": {
"primary": {
"aliases": [
"Online Retail",
"Marketplaces"
],
"domain": "E-commerce"
},
"secondary": null
},
"education": [],
"experience": {
"max": 7,
"min": 5,
"raw": "5-7 Years"
},
"job_locations": [
{
"aliases": [],
"city": null,
"country": "India",
"state": null,
"work_mode": "remote"
}
],
"role": "Senior Data Engineer",
"role_aliases": [
"Data Engineer",
"Senior Data Engineer",
"Big Data Engineer"
],
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 11,
"heading": "Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Build and maintain high-throughput,",
"last_5_words": "and platform teams for seamless data delivery."
},
"text": "\u2022 Build and maintain high-throughput, real-time data pipelines using Kafka/Pulsar with Spark,\n\n\u2022 Design fault-tolerant systems with zero-data-loss principles \u2014 checkpointing, replay logic,\n\n\u2022 Implement data observability \u2014 quality checks, SLA alerts, anomaly detection, lineage, and\n\n\u2022 Design and manage Iceberg-based lakehouse tables (Polaris/Gravitino catalogs, schema\n\n\u2022 Build fast OLAP layers using ClickHouse / StarRocks.\n\u2022 Model data across bronze \u2192 silver \u2192 gold layers for downstream teams.\n\u2022 Migrate and modernize legacy pipelines into scalable, distributed workflows.\n\u2022 Orchestrate ETL workloads using Airflow, DBT, Dagster, SQLMesh.\n\u2022 Optimize SQL transformations and distributed execution across Trino/Spark.\n\u2022 Ensure strict security and governance across all data layers \u2014 access control, encryption,\n\n\u2022 Collaborate with backend, analytics, and platform teams for seamless data delivery.",
"word_count": 134
},
{
"bullet_count": 8,
"heading": "Core Technical Skills",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Extremely strong SQL \u2014 window functions,",
"last_5_words": "with Airflow, DBT, Dagster, SQLMesh."
},
"text": "\u2022 Extremely strong SQL \u2014 window functions, query planning, optimization.\n\u2022 High comfort working with distributed \u0026 parallel workloads.\n\u2022 Hands-on experience with some-many of these technologies : Apache Spark, Apache Flink,\n\u2022 Advanced experience in Python (preferred) or Java (strong fundamentals).\n\u2022 Strong understanding of Parquet, Apache Iceberg, and Iceberg REST catalogs (Polaris /\n\u2022 Experience with OLAP databases \u2014 ClickHouse / StarRocks.\n\u2022 Experience with semantic layers \u2014 Cube.js or similar.\n\u2022 Strong experience building pipelines with Airflow, DBT, Dagster, SQLMesh.",
"word_count": 104
},
{
"bullet_count": 6,
"heading": "Foundational Strengths",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Solid understanding of data structures",
"last_5_words": "in compute \u0026 storage."
},
"text": "\u2022 Solid understanding of data structures \u0026 algorithms \u2014 sorting, searching, memory models.\n\u2022 Strong grasp of OLTP vs OLAP, indexing, query execution, and storage formats.\n\u2022 Ability to debug distributed systems end-to-end (compute, storage, network, orchestration).\n\u2022 Familiarity with cloud environments, containerization (Docker), and monitoring.\n\u2022 Experience with large-scale data \u2014 high throughput, billions of rows, large parallel workloads.\n\u2022 Awareness of cost optimization in compute \u0026 storage.",
"word_count": 104
},
{
"bullet_count": 2,
"heading": "Good to Have",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Experience with emerging stream processors",
"last_5_words": "or cloud-native big-data stacks."
},
"text": "\u2022 Experience with emerging stream processors \u2014 Dagster, RisingWave, Arroyo.\n\u2022 Kubernetes, Terraform, or cloud-native big-data stacks.",
"word_count": 24
},
{
"bullet_count": 4,
"heading": "Mindset",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Strong ownership \u2014 takes systems",
"last_5_words": "responsibility as the platform scales."
},
"text": "\u2022 Strong ownership \u2014 takes systems from design \u2192 build \u2192 monitor.\n\u2022 Self-driven, independent, and comfortable making technical decisions.\n\u2022 High attention to reliability, data accuracy, and operational excellence.\n\u2022 Naturally grows into broader technical responsibility as the platform scales.",
"word_count": 40
}
],
"urls": [
{
"type": "website",
"url": "http://www.1digitalstack.ai"
}
]
}
API 1: extract-from-jd
{
"final_skills": [
{
"is_primary": true,
"skill_name": "Kafka"
},
{
"is_primary": true,
"skill_name": "Pulsar"
},
{
"is_primary": true,
"skill_name": "Apache Spark"
},
{
"is_primary": true,
"skill_name": "Checkpointing"
},
{
"is_primary": true,
"skill_name": "Replay Logic"
},
{
"is_primary": true,
"skill_name": "Data Observability"
},
{
"is_primary": true,
"skill_name": "Data Quality"
},
{
"is_primary": true,
"skill_name": "SLA Alerts"
},
{
"is_primary": true,
"skill_name": "Anomaly Detection"
},
{
"is_primary": true,
"skill_name": "Data Lineage"
},
{
"is_primary": true,
"skill_name": "Apache Iceberg"
},
{
"is_primary": true,
"skill_name": "Polaris"
},
{
"is_primary": true,
"skill_name": "Gravitino"
},
{
"is_primary": true,
"skill_name": "ClickHouse"
},
{
"is_primary": true,
"skill_name": "StarRocks"
},
{
"is_primary": true,
"skill_name": "Bronze-Silver-Gold Data Modeling"
},
{
"is_primary": true,
"skill_name": "Airflow"
},
{
"is_primary": true,
"skill_name": "dbt"
},
{
"is_primary": true,
"skill_name": "Dagster"
},
{
"is_primary": true,
"skill_name": "SQLMesh"
},
{
"is_primary": true,
"skill_name": "SQL"
},
{
"is_primary": true,
"skill_name": "Trino"
},
{
"is_primary": true,
"skill_name": "Apache Flink"
},
{
"is_primary": true,
"skill_name": "Python"
},
{
"is_primary": true,
"skill_name": "Java"
},
{
"is_primary": true,
"skill_name": "Parquet"
},
{
"is_primary": true,
"skill_name": "Iceberg REST Catalogs"
},
{
"is_primary": false,
"skill_name": "Cube.js"
},
{
"is_primary": true,
"skill_name": "Data Structures and Algorithms"
},
{
"is_primary": true,
"skill_name": "Sorting"
},
{
"is_primary": true,
"skill_name": "Searching"
},
{
"is_primary": true,
"skill_name": "Memory Models"
},
{
"is_primary": true,
"skill_name": "OLTP"
},
{
"is_primary": true,
"skill_name": "OLAP"
},
{
"is_primary": true,
"skill_name": "Indexing"
},
{
"is_primary": true,
"skill_name": "Query Execution"
},
{
"is_primary": true,
"skill_name": "Storage Formats"
},
{
"is_primary": true,
"skill_name": "Distributed Systems"
},
{
"is_primary": true,
"skill_name": "Docker"
},
{
"is_primary": true,
"skill_name": "Monitoring"
},
{
"is_primary": true,
"skill_name": "Kubernetes"
},
{
"is_primary": true,
"skill_name": "Terraform"
},
{
"is_primary": false,
"skill_name": "RisingWave"
},
{
"is_primary": false,
"skill_name": "Arroyo"
}
],
"jd_role": {
"display_name": "Senior Data Engineer",
"rationale": null,
"role_aliases": [
"Data Engineer",
"Senior Data Engineer",
"Big Data Engineer"
],
"role_archetype": "Data",
"slug": ""
},
"nano_parsed": {
"JD_type": "pass",
"about_company": {
"source_marker": {
"first_5_words": "1DigitalStack.ai combines AI and deep",
"last_5_words": "how brands win on digital shelves."
},
"text": "1DigitalStack.ai combines AI and deep eCommerce data to help global brands grow faster on online marketplaces. Our platforms deliver advanced analytics, actionable intelligence, and media automation \u2014 enabling brands to optimize visibility, efficiency, and sales performance at scale. We partner with India\u2019s top consumer companies \u2014 Unilever, Marico, Coca-Cola, Tata Consumer, Dabur, and Unicharm \u2014 across 125+ marketplaces globally. Backed by leading venture investors and powered by a 220+ member team, we\u2019re in our $5\u201310M growth journey, scaling rapidly across categories and geographies to redefine how brands win on digital shelves.",
"word_count": 84
},
"certifications": [],
"company_name": "1DigitalStack.ai",
"ctc": null,
"domain": {
"primary": {
"aliases": [
"Online Retail",
"Marketplaces"
],
"domain": "E-commerce"
},
"secondary": null
},
"education": [],
"experience": {
"max": 7,
"min": 5,
"raw": "5-7 Years"
},
"job_locations": [
{
"aliases": [],
"city": null,
"country": "India",
"state": null,
"work_mode": "remote"
}
],
"role": "Senior Data Engineer",
"role_aliases": [
"Data Engineer",
"Senior Data Engineer",
"Big Data Engineer"
],
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 11,
"heading": "Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Build and maintain high-throughput,",
"last_5_words": "and platform teams for seamless data delivery."
},
"text": "\u2022 Build and maintain high-throughput, real-time data pipelines using Kafka/Pulsar with Spark,\n\n\u2022 Design fault-tolerant systems with zero-data-loss principles \u2014 checkpointing, replay logic,\n\n\u2022 Implement data observability \u2014 quality checks, SLA alerts, anomaly detection, lineage, and\n\n\u2022 Design and manage Iceberg-based lakehouse tables (Polaris/Gravitino catalogs, schema\n\n\u2022 Build fast OLAP layers using ClickHouse / StarRocks.\n\u2022 Model data across bronze \u2192 silver \u2192 gold layers for downstream teams.\n\u2022 Migrate and modernize legacy pipelines into scalable, distributed workflows.\n\u2022 Orchestrate ETL workloads using Airflow, DBT, Dagster, SQLMesh.\n\u2022 Optimize SQL transformations and distributed execution across Trino/Spark.\n\u2022 Ensure strict security and governance across all data layers \u2014 access control, encryption,\n\n\u2022 Collaborate with backend, analytics, and platform teams for seamless data delivery.",
"word_count": 134
},
{
"bullet_count": 8,
"heading": "Core Technical Skills",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Extremely strong SQL \u2014 window functions,",
"last_5_words": "with Airflow, DBT, Dagster, SQLMesh."
},
"text": "\u2022 Extremely strong SQL \u2014 window functions, query planning, optimization.\n\u2022 High comfort working with distributed \u0026 parallel workloads.\n\u2022 Hands-on experience with some-many of these technologies : Apache Spark, Apache Flink,\n\u2022 Advanced experience in Python (preferred) or Java (strong fundamentals).\n\u2022 Strong understanding of Parquet, Apache Iceberg, and Iceberg REST catalogs (Polaris /\n\u2022 Experience with OLAP databases \u2014 ClickHouse / StarRocks.\n\u2022 Experience with semantic layers \u2014 Cube.js or similar.\n\u2022 Strong experience building pipelines with Airflow, DBT, Dagster, SQLMesh.",
"word_count": 104
},
{
"bullet_count": 6,
"heading": "Foundational Strengths",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Solid understanding of data structures",
"last_5_words": "in compute \u0026 storage."
},
"text": "\u2022 Solid understanding of data structures \u0026 algorithms \u2014 sorting, searching, memory models.\n\u2022 Strong grasp of OLTP vs OLAP, indexing, query execution, and storage formats.\n\u2022 Ability to debug distributed systems end-to-end (compute, storage, network, orchestration).\n\u2022 Familiarity with cloud environments, containerization (Docker), and monitoring.\n\u2022 Experience with large-scale data \u2014 high throughput, billions of rows, large parallel workloads.\n\u2022 Awareness of cost optimization in compute \u0026 storage.",
"word_count": 104
},
{
"bullet_count": 2,
"heading": "Good to Have",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Experience with emerging stream processors",
"last_5_words": "or cloud-native big-data stacks."
},
"text": "\u2022 Experience with emerging stream processors \u2014 Dagster, RisingWave, Arroyo.\n\u2022 Kubernetes, Terraform, or cloud-native big-data stacks.",
"word_count": 24
},
{
"bullet_count": 4,
"heading": "Mindset",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Strong ownership \u2014 takes systems",
"last_5_words": "responsibility as the platform scales."
},
"text": "\u2022 Strong ownership \u2014 takes systems from design \u2192 build \u2192 monitor.\n\u2022 Self-driven, independent, and comfortable making technical decisions.\n\u2022 High attention to reliability, data accuracy, and operational excellence.\n\u2022 Naturally grows into broader technical responsibility as the platform scales.",
"word_count": 40
}
],
"urls": [
{
"type": "website",
"url": "http://www.1digitalstack.ai"
}
]
},
"rejected": false,
"rejection_reason": null,
"run_id": "c2b11bf5-0ed4-4d40-b6df-b822db58604b",
"stage3_signals": {
"alias_found": true,
"alias_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 1.0,
"slug": "data-engineer",
"total_count": null
}
],
"kra_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": [
{
"kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
"sentence": "Build and maintain high-throughput, real-time data pipelines using Kafka/Pulsar with Spark,",
"similarity": 0.7268
},
{
"kra_text": "Works with data analysts, data scientists, and business stakeholders to define data models, ingestion schedules, and data delivery requirements.",
"sentence": "Collaborate with backend, analytics, and platform teams for seamless data delivery.",
"similarity": 0.627
},
{
"kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
"sentence": "Orchestrate ETL workloads using Airflow, DBT, Dagster, SQLMesh.",
"similarity": 0.6256
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 0.6598,
"slug": "data-engineer",
"total_count": null
},
{
"display_name": "Backend Developer",
"kra_matches": [
{
"kra_text": "Adds structured logging, metrics, distributed tracing, and alerting to improve system observability and support production debugging.",
"sentence": "Implement data observability \u2014 quality checks, SLA alerts, anomaly detection, lineage, and",
"similarity": 0.5875
},
{
"kra_text": "Adds structured logging, metrics, distributed tracing, and alerting to improve system observability and support production debugging.",
"sentence": "Ability to debug distributed systems end-to-end (compute, storage, network, orchestration).",
"similarity": 0.5049
},
{
"kra_text": "Identifies and resolves backend performance bottlenecks through query optimization, indexing strategies, connection pooling, and distributed caching with Redis.",
"sentence": "Optimize SQL transformations and distributed execution across Trino/Spark.",
"similarity": 0.495
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 1,
"score": 0.5291,
"slug": "backend-engineer",
"total_count": null
},
{
"display_name": "Cloud Architect",
"kra_matches": [
{
"kra_text": "Establishes cloud governance guardrails including budget alerts, resource quotas, policy-as-code enforcement, and compliance posture management.",
"sentence": "Ensure strict security and governance across all data layers \u2014 access control, encryption,",
"similarity": 0.5281
},
{
"kra_text": "Evaluates cloud-native managed services, serverless compute, PaaS databases, and CDN solutions for workload fit and total cost of ownership.",
"sentence": "Awareness of cost optimization in compute \u0026 storage.",
"similarity": 0.5227
},
{
"kra_text": "Designs multi-region and multi-availability-zone cloud infrastructure architectures for high availability, fault tolerance, and horizontal scalability.",
"sentence": "High comfort working with distributed \u0026 parallel workloads.",
"similarity": 0.5168
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 9,
"score": 0.5225,
"slug": "cloud-architect",
"total_count": null
},
{
"display_name": "DevOps Engineer",
"kra_matches": [
{
"kra_text": "Provisions and manages cloud infrastructure on AWS, Azure, or GCP using Terraform or CloudFormation to enforce infrastructure-as-code standards.",
"sentence": "Kubernetes, Terraform, or cloud-native big-data stacks.",
"similarity": 0.5323
},
{
"kra_text": "Monitors CI/CD pipeline reliability, identifies bottlenecks in delivery workflows, and improves deployment frequency, lead time, and failure recovery rate.",
"sentence": "Implement data observability \u2014 quality checks, SLA alerts, anomaly detection, lineage, and",
"similarity": 0.5209
},
{
"kra_text": "Collaborates with development teams to improve build processes, reduce deployment friction, containerize applications, and adopt DevOps best practices.",
"sentence": "Collaborate with backend, analytics, and platform teams for seamless data delivery.",
"similarity": 0.5046
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 10,
"score": 0.5193,
"slug": "devops-engineer",
"total_count": null
},
{
"display_name": "MLOps Engineer",
"kra_matches": [
{
"kra_text": "Orchestrates model serving deployments to production using Kubernetes, MLflow Model Registry, SageMaker, or Kubeflow Serving infrastructure.",
"sentence": "Orchestrate ETL workloads using Airflow, DBT, Dagster, SQLMesh.",
"similarity": 0.5392
},
{
"kra_text": "Sets up model monitoring dashboards, data drift detection, prediction performance tracking, and alert routing for production ML systems.",
"sentence": "Implement data observability \u2014 quality checks, SLA alerts, anomaly detection, lineage, and",
"similarity": 0.51
},
{
"kra_text": "Orchestrates model serving deployments to production using Kubernetes, MLflow Model Registry, SageMaker, or Kubeflow Serving infrastructure.",
"sentence": "Kubernetes, Terraform, or cloud-native big-data stacks.",
"similarity": 0.4988
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 16,
"score": 0.516,
"slug": "ml-ops-engineer",
"total_count": null
}
],
"skill_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": 12,
"matched_skills": [
"Anomaly detection",
"Apache Flink",
"Apache Spark",
"Dagster",
"Distributed Systems",
"Flink",
"Java",
"Kafka",
"Parquet",
"Python",
"SQL",
"dbt"
],
"role_id": 2,
"score": 0.2927,
"slug": "data-engineer",
"total_count": 41
},
{
"display_name": "ML Engineer",
"kra_matches": null,
"matched_count": 7,
"matched_skills": [
"Airflow",
"Anomaly detection",
"Dagster",
"Distributed Systems",
"Kubernetes",
"Python",
"Terraform"
],
"role_id": 3,
"score": 0.1707,
"slug": "ml-engineer",
"total_count": 41
},
{
"display_name": "MLOps Engineer",
"kra_matches": null,
"matched_count": 6,
"matched_skills": [
"Airflow",
"Anomaly detection",
"Dagster",
"Distributed Systems",
"Kubernetes",
"Python"
],
"role_id": 16,
"score": 0.1463,
"slug": "ml-ops-engineer",
"total_count": 41
},
{
"display_name": "Backend Developer",
"kra_matches": null,
"matched_count": 6,
"matched_skills": [
"Distributed Systems",
"Docker",
"Java",
"Kafka",
"Python",
"indexing"
],
"role_id": 1,
"score": 0.1463,
"slug": "backend-engineer",
"total_count": 41
},
{
"display_name": "DevOps Engineer",
"kra_matches": null,
"matched_count": 5,
"matched_skills": [
"Distributed Systems",
"Docker",
"Kubernetes",
"Monitoring",
"Terraform"
],
"role_id": 10,
"score": 0.122,
"slug": "devops-engineer",
"total_count": 41
}
]
},
"stage4_decision": {
"alias_collision_detected": false,
"case": "A",
"chosen_role": {
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 1.0,
"slug": "data-engineer",
"total_count": null
},
"confidence": 1.0,
"is_new_role": false,
"llm2_fired": false,
"llm2_reasoning": null,
"matched_dimensions": [],
"matched_kras": [],
"matched_skills": [],
"new_role_display_name": null,
"new_role_slug": null,
"queued": false,
"reasoning": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.29 does not contradict",
"sub_role": null
},
"stage5_updates": {
"centroid_n_after": 84,
"centroid_updated": true,
"collision_log_id": null,
"new_kra_attached": null,
"new_skills_attached": [
{
"is_primary": true,
"queue_id": 5334,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Pulsar",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5335,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Checkpointing",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5336,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Replay Logic",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5337,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Observability",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5338,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Quality",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5339,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "SLA Alerts",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5340,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Lineage",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5341,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Apache Iceberg",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5342,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Polaris",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5343,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Gravitino",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5344,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "ClickHouse",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5345,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "StarRocks",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5346,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Bronze-Silver-Gold Data Modeling",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5347,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "SQLMesh",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5348,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Trino",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5349,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Iceberg REST Catalogs",
"status": "pending"
},
{
"is_primary": false,
"queue_id": 5350,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Cube.js",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5351,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Structures and Algorithms",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5352,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Sorting",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5353,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Searching",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5354,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Memory Models",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5355,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "OLTP",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5356,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "OLAP",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5357,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Query Execution",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 5358,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Storage Formats",
"status": "pending"
},
{
"is_primary": false,
"queue_id": 5359,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "RisingWave",
"status": "pending"
},
{
"is_primary": false,
"queue_id": 5360,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Arroyo",
"status": "pending"
}
],
"queue_entry_id": null,
"v3_pipeline_triggered": false,
"v3_role_slug": null,
"v3_run_id": null
}
}
API 2 — extract-details
{}
API 3 — final-role-output
{}