Pipeline run
5c415ce7-e9d4-4ca3-97c6-e28132bfcdbe
Client output enrichment
v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description · vocab breakdown (legacy)
Signals
Post-classification
Captured for admin review
1. POST /skills/extract-from-jd
2. POST /skills/extract-details
3. POST /skills/final-role-output
Data Engineer
CASE A · slug: data-engineer · id: 2 · source: db
Exact alias hit on data-engineer (1.0) — no other alias at this confidence; skill_top data-engineer 0.43 does not contradict
Resolution: in_db (role exists in library; skill↔dim and role↔dim links saved when applicable).
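The Case A rule quoted above can be sketched as follows. This is a minimal, hypothetical reconstruction: the `RoleMatch` type, the `resolve_case_a` function, and the reading of "does not contradict" as "the top skill-overlap role is the same role" are assumptions, not the pipeline's actual code. The inputs are the stage 3 scores from this run.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RoleMatch:
    slug: str
    role_id: int
    score: float


def resolve_case_a(
    alias_matches: Sequence[RoleMatch],
    skill_matches: Sequence[RoleMatch],
    exact: float = 1.0,
) -> Optional[RoleMatch]:
    """Hypothetical Case A check: exactly one exact alias hit, not contradicted by skills."""
    exact_hits = [m for m in alias_matches if m.score >= exact]
    if len(exact_hits) != 1:
        # No exact alias, or a collision between several exact aliases.
        return None
    chosen = exact_hits[0]
    if skill_matches:
        top = max(skill_matches, key=lambda m: m.score)
        # Assumed meaning of "does not contradict": top skill-overlap role is the same role.
        if top.slug != chosen.slug:
            return None
    return chosen


# Stage 3 signals from this run.
alias_roles = [RoleMatch("data-engineer", 2, 1.0)]
skill_roles = [RoleMatch("data-engineer", 2, 0.4286), RoleMatch("pega-developer", 24, 0.1429)]
chosen = resolve_case_a(alias_roles, skill_roles)  # the data-engineer match
```

Under this reading, a second exact alias would make the result ambiguous (an alias collision), so the sketch declines to pick a role rather than guessing.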
Job description
JD: Data Engineer II Job Desc (Deprecation Accelerator scope)
We are looking for a Data Engineer who has working knowledge of building and maintaining scalable data pipelines on-premises and on the cloud. This includes understanding the input and output data sources, upstream downstream dependencies and ensuring data quality. A key aspect of this role will be focusing on the deprecation of migrated workflows and migration of workflows into new systems (if needed). The ideal candidate should be experienced with tools and technologies such as Git, Apache Airflow, Apache Spark, SQL, data migration, and data validation.
Key Responsibilities:
1. Workflow Deprecation
   - Plan and execute the deprecation of migrated workflows by evaluating current workflows' dependencies and consumption.
   - Utilize tools and best practices to identify, mark, and communicate deprecated workflows to stakeholders.
2. Data Migration
   - Plan and execute data migration tasks to move data between different storage systems or formats.
   - Ensure the accuracy and completeness of data during migration processes.
   - Implement strategies to accelerate the pace of data migration by backfilling, validating, and making new data assets ready for use.
3. Data Validation
   - Define and implement data validation rules to ensure data accuracy, completeness, and reliability.
   - Utilize data validation solutions and anomaly detection methods to monitor data quality.
4. Workflow Management
   - Use Apache Airflow to schedule, monitor, and automate data workflows.
   - Develop and manage DAGs (Directed Acyclic Graphs) in Airflow to orchestrate complex data processing tasks.
5. Data Processing
   - Develop and maintain data processing scripts using SQL and Apache Spark.
   - Optimize data processing for performance and efficiency.
6. Version Control
   - Use Git for version control, collaborating with the team to manage the codebase and track changes.
   - Ensure best practices in code quality and repository management.
7. Continuous Improvement
   - Keep up to date with the latest developments in data engineering and related technologies.
   - Continuously improve and refactor data pipelines, tooling, and processes to enhance performance and reliability.
Skills and Qualifications:
Bachelor's degree in Computer Science, Engineering, or a related field.
- Proficient in Git for version control and collaborative development.
- Proficiency in SQL and experience with database technologies.
- Experience in data pipeline tools such as Apache Airflow.
- Strong knowledge of Apache Spark for data processing and transformation.
- Experience with data migration and validation techniques.
- Knowledge of data governance and security practices.
- Strong problem-solving skills and the ability to work independently and in a team.
- Ability to communicate with global team.
- Ability to work as a team in high performing environment.
Skills from this JD
Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
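The per-skill merge described above can be sketched as a join keyed on a normalized skill name. The API 1 fields (`skill_name`, `is_primary`) and API 2 fields (`input_term`, `matched_canonical.slug`) come from the JSON further down this page; the API 3 row shape (`skill`, `dimension`, `outcome`) and the `norm` / `merge_skill_rows` helpers are hypothetical, since the API 3 response is not shown here.

```python
from typing import Any, Dict, List


def norm(name: str) -> str:
    # Case- and whitespace-insensitive key, e.g. "Anomaly Detection" == "Anomaly detection".
    return " ".join(name.casefold().split())


def merge_skill_rows(
    api1_skills: List[Dict[str, Any]],
    api2_alias_matches: List[Dict[str, Any]],
    api3_rows: List[Dict[str, str]],
) -> List[Dict[str, Any]]:
    rows: Dict[str, Dict[str, Any]] = {}
    # API 1: one row per extracted skill, in extraction order.
    for s in api1_skills:
        rows[norm(s["skill_name"])] = {
            "skill": s["skill_name"],
            "is_primary": s["is_primary"],
            "canonical_slug": None,  # stays None when the catalog has no alias match
            "persistence": [],
        }
    # API 2: attach the catalog canonical the alias resolved to.
    for m in api2_alias_matches:
        row = rows.get(norm(m["input_term"]))
        if row is not None:
            row["canonical_slug"] = m["matched_canonical"]["slug"]
    # API 3 (hypothetical row shape): attach every (dimension, outcome) work item.
    for r in api3_rows:
        row = rows.get(norm(r["skill"]))
        if row is not None:
            row["persistence"].append((r["dimension"], r["outcome"]))
    return list(rows.values())


rows = merge_skill_rows(
    [{"skill_name": "Apache Airflow", "is_primary": True}, {"skill_name": "DAGs", "is_primary": True}],
    [{"input_term": "Apache Airflow", "matched_canonical": {"slug": "apache-airflow"}}],
    [{"skill": "Apache Airflow", "dimension": "data-pipeline-orchestration", "outcome": "saved"}],
)
```

Skills without a catalog match (such as DAGs in this run) still get a row, which is why some cards below show only the orchestrator enrichment block.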
Apache Airflow
Aliases — catalog
- Apache Airflow (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Tool
- Sub-category
- Workflow Orchestration Tool
- Vendor
- Apache Software Foundation
- License
- apache_2
- Year introduced
- 2015
- Confidence
- 0.98
- Version strategy
- NOT_APPLICABLE
Maturity reasoning: Frequently listed in data engineering JDs and widely adopted for workflow orchestration; strong GitHub activity and managed offerings from AWS/GCP/Azure signal broad market demand.
Skill profile (library / DB)
- Skill nature
- TOOL
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 13
- Sub-category id
- 130
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
- Data Pipeline Orchestration (catalog dimension, db id 23)
  Library dimension (catalog)
  Roles linked in library: Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Data Pipeline Orchestration (data-pipeline-orchestration) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
DAGs
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- CONCEPT
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
SQL
Aliases — catalog
- SQL (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Language
- Sub-category
- Query Language
- Vendor
- ANSI
- License
- unknown
- Year introduced
- 1974
- Confidence
- 0.99
- Version strategy
- NOT_APPLICABLE
Maturity reasoning: SQL appears in a large share of data, backend, and analytics job descriptions and remains the default query language for PostgreSQL, MySQL, and cloud warehouses like Snowflake/BigQuery.
Skill profile (library / DB)
- Skill nature
- LANGUAGE
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 6
- Sub-category id
- 97
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
- Pega Programming Languages & DSLs (catalog dimension, db id 267)
  Library dimension (catalog)
  Roles linked in library: Pega Developer
- Programming Languages for Data Work (catalog dimension, db id 21)
  Library dimension (catalog)
  Roles linked in library: Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Pega Programming Languages & DSLs (pega-programming-languages-dsls) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Programming Languages for Data Work (programming-languages-for-data-work) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
Apache Spark
Aliases — catalog
- Apache Spark (CANONICAL)
- apache spark 3 (VERSION)
- spark (VERSION)
- spark 3 (VERSION)
- spark 3.x (VERSION)
- spark3 (VERSION)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Framework
- Sub-category
- Distributed Data Processing Framework
- Vendor
- Apache Software Foundation
- License
- apache_2
- Year introduced
- 2010
- Confidence
- 0.94
- Version strategy
- SEPARATE_ENTITY
- Version tag
- 3.x
Maturity reasoning: Apache Spark appears in many data engineering JDs and remains a standard for distributed ETL/ELT; its GitHub and vendor ecosystem activity stay strong, with Databricks and cloud platforms still promoting it.
Skill profile (library / DB)
- Skill nature
- FRAMEWORK
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 5
- Sub-category id
- 1021
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
- ETL and ELT Tooling (catalog dimension, db id 24)
  Library dimension (catalog)
  Roles linked in library: Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| ETL and ELT Tooling (etl-and-elt-tooling) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
Git
Aliases — catalog
- Git (CANONICAL)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Tool
- Sub-category
- Version Control Tool
- Vendor
- Linus Torvalds
- License
- gpl_v2
- Year introduced
- 2005
- Confidence
- 0.99
- Version strategy
- NOT_APPLICABLE
Maturity reasoning: Git is a hiring-pipeline staple: it appears in the vast majority of software engineering job descriptions and is the default VCS on GitHub/GitLab/Bitbucket.
Skill profile (library / DB)
- Skill nature
- TOOL
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 13
- Sub-category id
- 730
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
- React Frontend Development (catalog dimension, db id 96)
  Library dimension (catalog)
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| React Frontend Development (d_init_01) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Data Migration
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- PRACTICE
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
Data Validation
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- PRACTICE
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
Anomaly Detection
Aliases — catalog
- Anomaly detection (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category
- Concept
- Sub-category
- Ml Monitoring Concept
- Confidence
- 0.90
- Version strategy
- NOT_APPLICABLE
Maturity reasoning: Common in ML/observability job descriptions and vendor docs (Datadog, Splunk, AWS, Azure) for fraud, monitoring, and alerting; broad market adoption across production systems.
Skill profile (library / DB)
- Skill nature
- CONCEPT
- Volatility
- STABLE
- Typical lifespan
- EVERGREEN
- Category id
- 2
- Sub-category id
- 1117
- Extractable
- True
- Also category
- False
Dimensions (API 2 worklist)
- Data Quality and Reconciliation (catalog dimension, db id 27)
  Library dimension (catalog)
  Roles linked in library: Data Engineer
- Model Monitoring and Drift Detection (catalog dimension, db id 45)
  Library dimension (catalog)
  Roles linked in library: ML Engineer, MLOps Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Data Quality and Reconciliation (data-quality-and-reconciliation) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
| Model Monitoring and Drift Detection (model-monitoring-and-drift-detection) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Data Governance
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Data Engineering Tools
- Sub-category
- general
- Skill nature
- CONCEPT
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
Data Security
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category
- Security Tools
- Sub-category
- general
- Skill nature
- CONCEPT
- Volatility
- MEDIUM
- Typical lifespan
- MULTI_YEAR
- Version strategy
- UNVERSIONED
All API 3 persistence rows
Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.
| Skill | Tag | Dimension | Skill↔dim | Role↔dim | Outcome | Notes |
|---|---|---|---|---|---|---|
| Apache Airflow | in_db | Data Pipeline Orchestration (data-pipeline-orchestration) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| SQL | in_db | Pega Programming Languages & DSLs (pega-programming-languages-dsls) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| SQL | in_db | Programming Languages for Data Work (programming-languages-for-data-work) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Apache Spark | in_db | ETL and ELT Tooling (etl-and-elt-tooling) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Git | in_db | React Frontend Development (d_init_01) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Anomaly Detection | in_db | Data Quality and Reconciliation (data-quality-and-reconciliation) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Anomaly Detection | in_db | Model Monitoring and Drift Detection (model-monitoring-and-drift-detection) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
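Every row in the grid above is consistent with one rule: the Role↔dim link is saved only when the dimension is already linked to the chosen role (data-engineer) in the library, and skipped otherwise. A minimal sketch of that rule, assuming it is how API 3 decides (the `role_dim_outcome` helper and `WORK_ITEMS` list are illustrative, not pipeline code). Dimension-to-role links are taken from the "Roles linked in library" lines and API 2's `roles_from_db`; the ML role slugs come from API 2's `candidate_roles`.

```python
from collections import Counter
from typing import List, Tuple

SAVED = "Existing dimension (library) · Role↔dimension saved"
SKIPPED = ("Existing dimension (library) · Role↔dimension skipped "
           "(dimension not under chosen role)")


def role_dim_outcome(dimension_role_slugs: List[str], chosen_role_slug: str) -> str:
    # Assumed rule: write the Role↔dim link only if the chosen role already owns the dimension.
    return SAVED if chosen_role_slug in dimension_role_slugs else SKIPPED


# (skill, dimension slug, roles linked to the dimension in the library) for this run
WORK_ITEMS: List[Tuple[str, str, List[str]]] = [
    ("Apache Airflow", "data-pipeline-orchestration", ["data-engineer"]),
    ("SQL", "pega-programming-languages-dsls", ["pega-developer"]),
    ("SQL", "programming-languages-for-data-work", ["data-engineer"]),
    ("Apache Spark", "etl-and-elt-tooling", ["data-engineer"]),
    ("Git", "d_init_01", []),
    ("Anomaly Detection", "data-quality-and-reconciliation", ["data-engineer"]),
    ("Anomaly Detection", "model-monitoring-and-drift-detection", ["ml-engineer", "ml-ops-engineer"]),
]

outcomes = [role_dim_outcome(roles, "data-engineer") for _, _, roles in WORK_ITEMS]
tally = Counter("saved" if o == SAVED else "skipped" for o in outcomes)
```

Applied to this run, the sketch reproduces the grid: four Role↔dim links saved and three skipped.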
Library artifacts (this run)
| Kind | Skill | Detail | DB id |
|---|---|---|---|
| canonical_skill_proposed | DAGs | type=Data Engineering Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Data Migration | type=Data Engineering Tools subtype=general nature=PRACTICE lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Data Validation | type=Data Engineering Tools subtype=general nature=PRACTICE lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Data Governance | type=Data Engineering Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Data Security | type=Security Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR | |
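The five proposed canonical skills are exactly the API 1 `final_skills` that have no entry in API 2's `alias_matches`; stage 5 queued the same five (queue ids 8628 to 8632) with the same `is_primary` flags. A minimal sketch of that set difference, assuming a case-insensitive comparison on skill names (the `unmatched_skills` helper is hypothetical):

```python
from typing import Dict, List

FINAL_SKILLS = [  # API 1 final_skills from this run
    {"skill_name": "Apache Airflow", "is_primary": True},
    {"skill_name": "DAGs", "is_primary": True},
    {"skill_name": "SQL", "is_primary": True},
    {"skill_name": "Apache Spark", "is_primary": True},
    {"skill_name": "Git", "is_primary": True},
    {"skill_name": "Data Migration", "is_primary": True},
    {"skill_name": "Data Validation", "is_primary": True},
    {"skill_name": "Anomaly Detection", "is_primary": False},
    {"skill_name": "Data Governance", "is_primary": False},
    {"skill_name": "Data Security", "is_primary": False},
]
# input_term values from API 2 alias_matches
ALIAS_MATCH_TERMS = ["Apache Airflow", "SQL", "Apache Spark", "Git", "Anomaly Detection"]


def unmatched_skills(final_skills: List[Dict], matched_terms: List[str]) -> List[Dict]:
    matched = {t.casefold() for t in matched_terms}
    # Keep API 1 order so the result lines up with the proposal and queue order.
    return [s for s in final_skills if s["skill_name"].casefold() not in matched]


proposed = unmatched_skills(FINAL_SKILLS, ALIAS_MATCH_TERMS)
```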
nano JD Parser (gpt-4.1-nano)
{
"JD_type": "pass",
"about_company": null,
"certifications": [],
"company_name": null,
"ctc": null,
"domain": {
"primary": {
"aliases": [],
"domain": "Other"
},
"secondary": null
},
"education": [
{
"level": "Bachelor\u0027s",
"qualification": "BTECH/BE - Computer Science (or related)",
"raw": "Bachelor\u0027s degree in Computer Science, Engineering, or a related field.",
"requirement": "required"
}
],
"experience": null,
"job_locations": [],
"role": "Data Engineer II",
"role_aliases": [
"Data Engineer",
"Data Engineer II",
"Data Pipeline Engineer"
],
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 7,
"heading": "Key Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "1. Workflow Deprecation",
"last_5_words": "and reliability."
},
"text": "1. Workflow Deprecation\n\n\nPlan and execute the deprecation of migrated workflows by evaluating current workflows\u0027 dependencies and consumption.\nUtilize tools and best practices to identify, mark, and communicate deprecated workflows to stakeholders.\n\n\n2. Data Migration\n\n\nPlan and execute data migration tasks to move data between different storage systems or formats.\nEnsure the accuracy and completeness of data during migration processes.\nImplement strategies to accelerate the pace of data migration by backfilling, validating, and making new data assets ready for use.\n\n\n3. Data Validation\n\n\nDefine and implement data validation rules to ensure data accuracy, completeness, and reliability.\nUtilize data validation solutions and anomaly detection methods to monitor data quality.\n\n\n4. Workflow Management\n\n\nUse Apache Airflow to schedule, monitor, and automate data workflows.\nDevelop and manage DAGs (Directed Acyclic Graphs) in Airflow to orchestrate complex data processing tasks.\n\n\n5. Data Processing\n\n\nDevelop and maintain data processing scripts using SQL and Apache Spark.\nOptimize data processing for performance and efficiency.\n\n\n6. Version Control\n\n\nUse Git for version control, collaborating with the team to manage the codebase and track changes.\nEnsure best practices in code quality and repository management.\n\n\n7. Continuous Improvement\n\n\nKeep up to date with the latest developments in data engineering and related technologies.\nContinuously improve and refactor data pipelines, tooling, and processes to enhance performance and reliability.",
"word_count": 366
},
{
"bullet_count": 9,
"heading": "Skills and Qualifications",
"heading_was_present": true,
"source_marker": {
"first_5_words": "Proficient in Git for version",
"last_5_words": "high performing environment."
},
"text": "\uf0b7 Proficient in Git for version control and collaborative development.\n\uf0b7 Proficiency in SQL and experience with database technologies.\n\uf0b7 Experience in data pipeline tools such as Apache Airflow.\n\uf0b7 Strong knowledge of Apache Spark for data processing and transformation.\n\uf0b7 Experience with data migration and validation techniques.\n\uf0b7 Knowledge of data governance and security practices.\n\uf0b7 Strong problem-solving skills and the ability to work independently and in a team.\n\uf0b7 Ability to communicate with global team\n\uf0b7 Ability to work as a team in high performing environment.",
"word_count": 81
}
],
"urls": []
}
API 1 — extract-from-jd
{
"final_skills": [
{
"is_primary": true,
"skill_name": "Apache Airflow"
},
{
"is_primary": true,
"skill_name": "DAGs"
},
{
"is_primary": true,
"skill_name": "SQL"
},
{
"is_primary": true,
"skill_name": "Apache Spark"
},
{
"is_primary": true,
"skill_name": "Git"
},
{
"is_primary": true,
"skill_name": "Data Migration"
},
{
"is_primary": true,
"skill_name": "Data Validation"
},
{
"is_primary": false,
"skill_name": "Anomaly Detection"
},
{
"is_primary": false,
"skill_name": "Data Governance"
},
{
"is_primary": false,
"skill_name": "Data Security"
}
],
"jd_role": {
"display_name": "Data Engineer II",
"rationale": null,
"role_aliases": [
"Data Engineer",
"Data Engineer II",
"Data Pipeline Engineer"
],
"role_archetype": "Data",
"slug": ""
},
"nano_parsed": {
"JD_type": "pass",
"about_company": null,
"certifications": [],
"company_name": null,
"ctc": null,
"domain": {
"primary": {
"aliases": [],
"domain": "Other"
},
"secondary": null
},
"education": [
{
"level": "Bachelor\u0027s",
"qualification": "BTECH/BE - Computer Science (or related)",
"raw": "Bachelor\u0027s degree in Computer Science, Engineering, or a related field.",
"requirement": "required"
}
],
"experience": null,
"job_locations": [],
"role": "Data Engineer II",
"role_aliases": [
"Data Engineer",
"Data Engineer II",
"Data Pipeline Engineer"
],
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 7,
"heading": "Key Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "1. Workflow Deprecation",
"last_5_words": "and reliability."
},
"text": "1. Workflow Deprecation\n\n\nPlan and execute the deprecation of migrated workflows by evaluating current workflows\u0027 dependencies and consumption.\nUtilize tools and best practices to identify, mark, and communicate deprecated workflows to stakeholders.\n\n\n2. Data Migration\n\n\nPlan and execute data migration tasks to move data between different storage systems or formats.\nEnsure the accuracy and completeness of data during migration processes.\nImplement strategies to accelerate the pace of data migration by backfilling, validating, and making new data assets ready for use.\n\n\n3. Data Validation\n\n\nDefine and implement data validation rules to ensure data accuracy, completeness, and reliability.\nUtilize data validation solutions and anomaly detection methods to monitor data quality.\n\n\n4. Workflow Management\n\n\nUse Apache Airflow to schedule, monitor, and automate data workflows.\nDevelop and manage DAGs (Directed Acyclic Graphs) in Airflow to orchestrate complex data processing tasks.\n\n\n5. Data Processing\n\n\nDevelop and maintain data processing scripts using SQL and Apache Spark.\nOptimize data processing for performance and efficiency.\n\n\n6. Version Control\n\n\nUse Git for version control, collaborating with the team to manage the codebase and track changes.\nEnsure best practices in code quality and repository management.\n\n\n7. Continuous Improvement\n\n\nKeep up to date with the latest developments in data engineering and related technologies.\nContinuously improve and refactor data pipelines, tooling, and processes to enhance performance and reliability.",
"word_count": 366
},
{
"bullet_count": 9,
"heading": "Skills and Qualifications",
"heading_was_present": true,
"source_marker": {
"first_5_words": "Proficient in Git for version",
"last_5_words": "high performing environment."
},
"text": "\uf0b7 Proficient in Git for version control and collaborative development.\n\uf0b7 Proficiency in SQL and experience with database technologies.\n\uf0b7 Experience in data pipeline tools such as Apache Airflow.\n\uf0b7 Strong knowledge of Apache Spark for data processing and transformation.\n\uf0b7 Experience with data migration and validation techniques.\n\uf0b7 Knowledge of data governance and security practices.\n\uf0b7 Strong problem-solving skills and the ability to work independently and in a team.\n\uf0b7 Ability to communicate with global team\n\uf0b7 Ability to work as a team in high performing environment.",
"word_count": 81
}
],
"urls": []
},
"rejected": false,
"rejection_reason": null,
"run_id": "5c415ce7-e9d4-4ca3-97c6-e28132bfcdbe",
"stage3_signals": {
"alias_found": true,
"alias_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 1.0,
"slug": "data-engineer",
"total_count": null
}
],
"kra_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": [
{
"kra_text": "Implements data quality validation rules, reconciliation checks, and anomaly detection to ensure data completeness, accuracy, and consistency.",
"sentence": "Define and implement data validation rules to ensure data accuracy, completeness, and reliability.",
"similarity": 0.7516
},
{
"kra_text": "Implements data quality validation rules, reconciliation checks, and anomaly detection to ensure data completeness, accuracy, and consistency.",
"sentence": "Utilize data validation solutions and anomaly detection methods to monitor data quality.",
"similarity": 0.7061
},
{
"kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
"sentence": "\uf0b7 Experience in data pipeline tools such as Apache Airflow.",
"similarity": 0.6936
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 0.7171,
"slug": "data-engineer",
"total_count": null
},
{
"display_name": "React Native Developer",
"kra_matches": [
{
"kra_text": "maintain code quality",
"sentence": "Ensure best practices in code quality and repository management.",
"similarity": 0.7165
},
{
"kra_text": "support offline-aware data flow",
"sentence": "Implement strategies to accelerate the pace of data migration by backfilling, validating, and making new data assets ready for use.",
"similarity": 0.4426
},
{
"kra_text": "maintain code quality",
"sentence": "Continuously improve and refactor data pipelines, tooling, and processes to enhance performance and reliability.",
"similarity": 0.4355
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 73,
"score": 0.5315,
"slug": "react-native-developer",
"total_count": null
},
{
"display_name": "Fullstack Developer",
"kra_matches": [
{
"kra_text": "Optimizes application performance from database query efficiency through API response latency to frontend rendering speed and bundle size.",
"sentence": "Optimize data processing for performance and efficiency.",
"similarity": 0.5783
},
{
"kra_text": "Delivers features through CI/CD pipelines using automated tests, staged rollouts, feature flags, and incremental deployments.",
"sentence": "Continuously improve and refactor data pipelines, tooling, and processes to enhance performance and reliability.",
"similarity": 0.5286
},
{
"kra_text": "Designs and queries relational databases like PostgreSQL and document stores like MongoDB, writing migrations, indexes, and optimized queries.",
"sentence": "\uf0b7 Proficiency in SQL and experience with database technologies.",
"similarity": 0.4813
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 15,
"score": 0.5294,
"slug": "full-stack-engineer",
"total_count": null
},
{
"display_name": "Java Backend Developer",
"kra_matches": [
{
"kra_text": "backend performance tuning",
"sentence": "Optimize data processing for performance and efficiency.",
"similarity": 0.5833
},
{
"kra_text": "code refactoring and defect fixes",
"sentence": "Ensure best practices in code quality and repository management.",
"similarity": 0.5049
},
{
"kra_text": "request validation and error handling",
"sentence": "Define and implement data validation rules to ensure data accuracy, completeness, and reliability.",
"similarity": 0.4997
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 79,
"score": 0.5293,
"slug": "java-backend-developer",
"total_count": null
},
{
"display_name": "Scala Backend Developer",
"kra_matches": [
{
"kra_text": "business rule and validation logic",
"sentence": "Define and implement data validation rules to ensure data accuracy, completeness, and reliability.",
"similarity": 0.539
},
{
"kra_text": "performance and reliability tuning",
"sentence": "Optimize data processing for performance and efficiency.",
"similarity": 0.5095
},
{
"kra_text": "backend workflow orchestration",
"sentence": "Develop and manage DAGs (Directed Acyclic Graphs) in Airflow to orchestrate complex data processing tasks.",
"similarity": 0.4915
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 87,
"score": 0.5133,
"slug": "scala-backend-developer",
"total_count": null
}
],
"skill_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": 3,
"matched_skills": [
"Apache Airflow",
"Apache Spark",
"SQL"
],
"role_id": 2,
"score": 0.4286,
"slug": "data-engineer",
"total_count": 7
},
{
"display_name": "Pega Developer",
"kra_matches": null,
"matched_count": 1,
"matched_skills": [
"SQL"
],
"role_id": 24,
"score": 0.1429,
"slug": "pega-developer",
"total_count": 7
}
]
},
"stage4_decision": {
"alias_collision_detected": false,
"case": "A",
"chosen_role": {
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 1.0,
"slug": "data-engineer",
"total_count": null
},
"confidence": 1.0,
"is_new_role": false,
"llm2_fired": false,
"llm2_reasoning": null,
"matched_dimensions": [],
"matched_kras": [],
"matched_skills": [],
"new_role_display_name": null,
"new_role_slug": null,
"queued": false,
"reasoning": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.43 does not contradict",
"sub_role": null
},
"stage5_updates": {
"centroid_n_after": 164,
"centroid_updated": true,
"collision_log_id": null,
"new_kra_attached": null,
"new_skills_attached": [
{
"is_primary": true,
"queue_id": 8628,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "DAGs",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 8629,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Migration",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 8630,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Validation",
"status": "pending"
},
{
"is_primary": false,
"queue_id": 8631,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Governance",
"status": "pending"
},
{
"is_primary": false,
"queue_id": 8632,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Security",
"status": "pending"
}
],
"queue_entry_id": null,
"v3_pipeline_triggered": false,
"v3_role_slug": null,
"v3_run_id": null
}
}
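The stage 3 scores in the response above can be reproduced with simple arithmetic. Each `skill_match_roles` score equals `matched_count / total_count` (3/7 and 1/7); `total_count` of 7 also equals the number of `is_primary` skills in `final_skills`, which suggests, but does not confirm, that the denominator is the JD's primary-skill count. Each `kra_match_roles` score equals the mean of the three listed similarities. Only three matches are listed per role, so "mean of the top k = 3" is an assumption; the helper names below are hypothetical.

```python
from typing import Sequence


def skill_overlap_score(matched_count: int, total_count: int) -> float:
    return matched_count / total_count


def kra_score(similarities: Sequence[float], k: int = 3) -> float:
    # Assumed: mean of the k best sentence-to-KRA similarities.
    top = sorted(similarities, reverse=True)[:k]
    return sum(top) / len(top)


# (role slug, listed similarities, reported score) from stage3_signals.kra_match_roles
KRA_ROLES = [
    ("data-engineer", [0.7516, 0.7061, 0.6936], 0.7171),
    ("react-native-developer", [0.7165, 0.4426, 0.4355], 0.5315),
    ("full-stack-engineer", [0.5783, 0.5286, 0.4813], 0.5294),
    ("java-backend-developer", [0.5833, 0.5049, 0.4997], 0.5293),
    ("scala-backend-developer", [0.539, 0.5095, 0.4915], 0.5133),
]

for slug, sims, reported in KRA_ROLES:
    print(slug, round(kra_score(sims), 4), reported)
```

Rounded to four decimals, each recomputed value matches the reported `score`.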
API 2 — extract-details
{
"alias_matches": [
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 304,
"existing_alias_text": "Apache Airflow",
"input_term": "Apache Airflow",
"matched_canonical": {
"category_id": 13,
"display_name": "Apache Airflow",
"id": 110,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "apache-airflow",
"sub_category_id": 130,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 271,
"existing_alias_text": "SQL",
"input_term": "SQL",
"matched_canonical": {
"category_id": 6,
"display_name": "SQL",
"id": 101,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LANGUAGE",
"slug": "sql",
"sub_category_id": 97,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 2004,
"existing_alias_text": "Apache Spark",
"input_term": "Apache Spark",
"matched_canonical": {
"category_id": 5,
"display_name": "Apache Spark",
"id": 1350,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "FRAMEWORK",
"slug": "apache-spark",
"sub_category_id": 1021,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 1613,
"existing_alias_text": "Git",
"input_term": "Git",
"matched_canonical": {
"category_id": 13,
"display_name": "Git",
"id": 1002,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "git",
"sub_category_id": 730,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 338,
"existing_alias_text": "Anomaly detection",
"input_term": "Anomaly Detection",
"matched_canonical": {
"category_id": 2,
"display_name": "Anomaly detection",
"id": 134,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CONCEPT",
"slug": "anomaly-detection",
"sub_category_id": 1117,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
}
],
"candidate_roles": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "Pega Developer",
"id": 24,
"rationale": null,
"role_archetype": null,
"slug": "pega-developer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
}
],
"chosen_role": {
"display_name": "Data Engineer",
"id": 2,
"rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.43 does not contradict",
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Data Pipeline Orchestration",
"id": 23,
"rationale": "Workflow engines that schedule, coordinate, and recover batch data jobs. This cluster covers dependency management, retries, backfills, sensors, and operational control of pipeline DAGs.",
"slug": "data-pipeline-orchestration",
"source": "db"
},
"input_skill": "Apache Airflow",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Pega Programming Languages \u0026 DSLs",
"id": 267,
"rationale": "Programming languages and domain-specific languages used in Pega development.",
"slug": "pega-programming-languages-dsls",
"source": "db"
},
"input_skill": "SQL",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Pega Developer",
"id": 24,
"rationale": null,
"role_archetype": null,
"slug": "pega-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 21,
"rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"input_skill": "SQL",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "Apache Spark",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Git",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Data Quality and Reconciliation",
"id": 27,
"rationale": "Validation and reconciliation practices that ensure data is accurate, complete, and trustworthy. This includes rule-based checks, anomaly detection, cross-system reconciliation, and failure triage.",
"slug": "data-quality-and-reconciliation",
"source": "db"
},
"input_skill": "Anomaly Detection",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Model Monitoring and Drift Detection",
"id": 45,
"rationale": "Production observability for model behavior, data drift, concept drift, latency, and quality regressions. ML engineers use this to detect degradation and trigger remediation or retraining.",
"slug": "model-monitoring-and-drift-detection",
"source": "db"
},
"input_skill": "Anomaly Detection",
"llm_role": null,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
}
]
}
],
"input_final_skills": [
"Apache Airflow",
"DAGs",
"SQL",
"Apache Spark",
"Git",
"Data Migration",
"Data Validation",
"Anomaly Detection",
"Data Governance",
"Data Security"
],
"input_llm_skills": [
"Apache Airflow",
"DAGs",
"SQL",
"Apache Spark",
"Git",
"Data Migration",
"Data Validation",
"Anomaly Detection",
"Data Governance",
"Data Security"
],
"new_aliases_persisted": 0,
"run_id": "5c415ce7-e9d4-4ca3-97c6-e28132bfcdbe",
"skills_detail": [
{
"aliases_in_db": [
{
"alias_text": "Apache Airflow",
"alias_type": "CANONICAL",
"id": 304,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 13,
"display_name": "Apache Airflow",
"id": 110,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "apache-airflow",
"sub_category_id": 130,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Data Pipeline Orchestration",
"id": 23,
"rationale": "Workflow engines that schedule, coordinate, and recover batch data jobs. This cluster covers dependency management, retries, backfills, sensors, and operational control of pipeline DAGs.",
"slug": "data-pipeline-orchestration",
"source": "db"
},
"input_skill": "Apache Airflow",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Apache Airflow",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "DAGs",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "CONCEPT",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "dags",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "SQL",
"alias_type": "CANONICAL",
"id": 271,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 6,
"display_name": "SQL",
"id": 101,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "LANGUAGE",
"slug": "sql",
"sub_category_id": 97,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Pega Programming Languages \u0026 DSLs",
"id": 267,
"rationale": "Programming languages and domain-specific languages used in Pega development.",
"slug": "pega-programming-languages-dsls",
"source": "db"
},
"input_skill": "SQL",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Pega Developer",
"id": 24,
"rationale": null,
"role_archetype": null,
"slug": "pega-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 21,
"rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"input_skill": "SQL",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "SQL",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Apache Spark",
"alias_type": "CANONICAL",
"id": 2004,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "apache spark 3",
"alias_type": "VERSION",
"id": 2006,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark",
"alias_type": "VERSION",
"id": 2510,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark 3",
"alias_type": "VERSION",
"id": 2007,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark 3.x",
"alias_type": "VERSION",
"id": 2009,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark3",
"alias_type": "VERSION",
"id": 2008,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 5,
"display_name": "Apache Spark",
"id": 1350,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "FRAMEWORK",
"slug": "apache-spark",
"sub_category_id": 1021,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "Apache Spark",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Apache Spark",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Git",
"alias_type": "CANONICAL",
"id": 1613,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 13,
"display_name": "Git",
"id": 1002,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "TOOL",
"slug": "git",
"sub_category_id": 730,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Git",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "Git",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Data Migration",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "PRACTICE",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "data-migration",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Data Validation",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "PRACTICE",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "data-validation",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Anomaly detection",
"alias_type": "CANONICAL",
"id": 338,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 2,
"display_name": "Anomaly detection",
"id": 134,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CONCEPT",
"slug": "anomaly-detection",
"sub_category_id": 1117,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Data Quality and Reconciliation",
"id": 27,
"rationale": "Validation and reconciliation practices that ensure data is accurate, complete, and trustworthy. This includes rule-based checks, anomaly detection, cross-system reconciliation, and failure triage.",
"slug": "data-quality-and-reconciliation",
"source": "db"
},
"input_skill": "Anomaly Detection",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Model Monitoring and Drift Detection",
"id": 45,
"rationale": "Production observability for model behavior, data drift, concept drift, latency, and quality regressions. ML engineers use this to detect degradation and trigger remediation or retraining.",
"slug": "model-monitoring-and-drift-detection",
"source": "db"
},
"input_skill": "Anomaly Detection",
"llm_role": null,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
}
]
}
],
"input_skill": "Anomaly Detection",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Data Governance",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "CONCEPT",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "data-governance",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Data Security",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Security Tools",
"skill_nature": "CONCEPT",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "data-security",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
}
],
"unmatched_skills": [
"DAGs",
"Data Migration",
"Data Validation",
"Data Governance",
"Data Security"
]
}
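The API 2 fields above relate to each other in ways that can be checked mechanically. What follows is a minimal Python sketch built on three assumptions. First, `unmatched_skills` is exactly the set of `skills_detail` entries with `matched_via: null`. Second, the top-level `dimensions` array is the per-skill `dimensions` lists concatenated in input order. Third, API 3's `final_input_skills[].tag` is derived from `source_tag` (`db` maps to `in_db`, `llm` maps to `new`). These rules are inferred from this single run, not taken from the pipeline's code. The `SKILLS_DETAIL` list and the helper names are hypothetical, and the list keeps only the fields the sketch reads.

```python
# Trimmed copy of the API 2 "skills_detail" entries from this run, keeping only
# input_skill, matched_via, source_tag and the ids of the attached dimensions.
SKILLS_DETAIL = [
    {"input_skill": "Apache Airflow", "matched_via": "alias", "source_tag": "db", "dimension_ids": [23]},
    {"input_skill": "DAGs", "matched_via": None, "source_tag": "llm", "dimension_ids": []},
    {"input_skill": "SQL", "matched_via": "alias", "source_tag": "db", "dimension_ids": [267, 21]},
    {"input_skill": "Apache Spark", "matched_via": "alias", "source_tag": "db", "dimension_ids": [24]},
    {"input_skill": "Git", "matched_via": "alias", "source_tag": "db", "dimension_ids": [96]},
    {"input_skill": "Data Migration", "matched_via": None, "source_tag": "llm", "dimension_ids": []},
    {"input_skill": "Data Validation", "matched_via": None, "source_tag": "llm", "dimension_ids": []},
    {"input_skill": "Anomaly Detection", "matched_via": "alias", "source_tag": "db", "dimension_ids": [27, 45]},
    {"input_skill": "Data Governance", "matched_via": None, "source_tag": "llm", "dimension_ids": []},
    {"input_skill": "Data Security", "matched_via": None, "source_tag": "llm", "dimension_ids": []},
]

# Assumption inferred from this run only: API 2 source_tag -> API 3 tag.
TAG_FOR_SOURCE = {"db": "in_db", "llm": "new"}


def unmatched_skills(details):
    """Hypothetical helper: skills with no alias match, in input order."""
    return [d["input_skill"] for d in details if d["matched_via"] is None]


def flattened_dimension_ids(details):
    """Hypothetical helper: per-skill dimension ids concatenated in input order."""
    return [dim_id for d in details for dim_id in d["dimension_ids"]]


def final_input_skills(details):
    """Hypothetical helper: rebuild API 3's final_input_skills from source_tag."""
    return [{"skill": d["input_skill"], "tag": TAG_FOR_SOURCE[d["source_tag"]]} for d in details]


if __name__ == "__main__":
    print(unmatched_skills(SKILLS_DETAIL))
    print(flattened_dimension_ids(SKILLS_DETAIL))
    print(final_input_skills(SKILLS_DETAIL))
```

With this run's data, the first helper reproduces `unmatched_skills` and the second reproduces the dimension ids of the top-level `dimensions` array (23, 267, 21, 24, 96, 27, 45). A mismatch on a future run would indicate that one of the assumed rules does not hold.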
API 3 — final-role-output
{
"chosen_role": {
"display_name": "Data Engineer",
"id": 2,
"rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.43 does not contradict",
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
"chosen_role_resolution": "in_db",
"final_input_skills": [
{
"skill": "Apache Airflow",
"tag": "in_db"
},
{
"skill": "DAGs",
"tag": "new"
},
{
"skill": "SQL",
"tag": "in_db"
},
{
"skill": "Apache Spark",
"tag": "in_db"
},
{
"skill": "Git",
"tag": "in_db"
},
{
"skill": "Data Migration",
"tag": "new"
},
{
"skill": "Data Validation",
"tag": "new"
},
{
"skill": "Anomaly Detection",
"tag": "in_db"
},
{
"skill": "Data Governance",
"tag": "new"
},
{
"skill": "Data Security",
"tag": "new"
}
],
"llm_cost_api1_usd": null,
"llm_cost_api2_usd": null,
"llm_cost_api3_usd": null,
"llm_cost_total_usd": null,
"persistence": {
"items": [
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Data Pipeline Orchestration",
"id": 23,
"rationale": "Workflow engines that schedule, coordinate, and recover batch data jobs. This cluster covers dependency management, retries, backfills, sensors, and operational control of pipeline DAGs.",
"slug": "data-pipeline-orchestration",
"source": "db"
},
"dimension_id": 23,
"input_skill": "Apache Airflow",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 110,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Pega Programming Languages \u0026 DSLs",
"id": 267,
"rationale": "Programming languages and domain-specific languages used in Pega development.",
"slug": "pega-programming-languages-dsls",
"source": "db"
},
"dimension_id": 267,
"input_skill": "SQL",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Pega Developer",
"id": 24,
"rationale": null,
"role_archetype": null,
"slug": "pega-developer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 101,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Programming Languages for Data Work",
"id": 21,
"rationale": "Languages used to implement data pipelines, transformations, and operational glue. This is the primary coding surface for building ingestion, enrichment, and automation logic in data engineering.",
"slug": "programming-languages-for-data-work",
"source": "db"
},
"dimension_id": 21,
"input_skill": "SQL",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 101,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"dimension_id": 24,
"input_skill": "Apache Spark",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1350,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"dimension_id": 96,
"input_skill": "Git",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 1002,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Data Quality and Reconciliation",
"id": 27,
"rationale": "Validation and reconciliation practices that ensure data is accurate, complete, and trustworthy. This includes rule-based checks, anomaly detection, cross-system reconciliation, and failure triage.",
"slug": "data-quality-and-reconciliation",
"source": "db"
},
"dimension_id": 27,
"input_skill": "Anomaly Detection",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 134,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Model Monitoring and Drift Detection",
"id": 45,
"rationale": "Production observability for model behavior, data drift, concept drift, latency, and quality regressions. ML engineers use this to detect degradation and trigger remediation or retraining.",
"slug": "model-monitoring-and-drift-detection",
"source": "db"
},
"dimension_id": 45,
"input_skill": "Anomaly Detection",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 134,
"skill_tag": "in_db",
"skipped_reason": null
}
],
"new_skills_created": 0,
"role_dimension_saved": 0,
"skill_dimension_saved": 0,
"skipped": 0
},
"planner_output": null,
"run_id": "5c415ce7-e9d4-4ca3-97c6-e28132bfcdbe"
}
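Each `persistence.items` entry above follows the same pattern. `matched_chosen_role` and `role_dimension_saved` are `true` exactly when the chosen role's id (2, Data Engineer) appears in the dimension's `roles_from_db`. `skill_dimension_saved` is `true` for every item. What follows is a minimal Python sketch of that rule plus a tally of the per-item flags. The rule is inferred from this run, not taken from the pipeline's code, and `PersistenceItem`, `matches_chosen_role` and `summarize` are hypothetical names. Tallying the items gives 4 role↔dimension links and 7 skill↔dimension links saved, with 0 skipped. The run's summary block reports `0` for both saved counters, and this sketch does not establish which figure the pipeline intends.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PersistenceItem:
    """Hypothetical trimmed view of one API 3 persistence item."""
    input_skill: str
    dimension_id: int
    role_ids_from_db: Tuple[int, ...]  # ids listed under "roles_from_db"
    skill_dimension_saved: bool
    skipped_reason: Optional[str] = None


CHOSEN_ROLE_ID = 2  # chosen_role.id (Data Engineer) in this run

ITEMS = (
    PersistenceItem("Apache Airflow", 23, (2,), True),
    PersistenceItem("SQL", 267, (24,), True),
    PersistenceItem("SQL", 21, (2,), True),
    PersistenceItem("Apache Spark", 24, (2,), True),
    PersistenceItem("Git", 96, (), True),
    PersistenceItem("Anomaly Detection", 27, (2,), True),
    PersistenceItem("Anomaly Detection", 45, (3, 16), True),
)


def matches_chosen_role(item: PersistenceItem, chosen_role_id: int) -> bool:
    # Assumed rule: a role<->dimension link is saved only when the dimension
    # is already linked to the chosen role in the library.
    return chosen_role_id in item.role_ids_from_db


def summarize(items, chosen_role_id: int) -> dict:
    """Tally per-item flags into summary counters."""
    return {
        "role_dimension_saved": sum(matches_chosen_role(i, chosen_role_id) for i in items),
        "skill_dimension_saved": sum(i.skill_dimension_saved for i in items),
        "skipped": sum(i.skipped_reason is not None for i in items),
    }


if __name__ == "__main__":
    print([matches_chosen_role(i, CHOSEN_ROLE_ID) for i in ITEMS])
    print(summarize(ITEMS, CHOSEN_ROLE_ID))
```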
LLM Calls
Every model call made for this run, in pipeline order.