Pipeline run
ad4757bc-ccab-4012-be2f-748083a72f78
Client output enrichment
v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description · vocab breakdown (legacy)
Signals
Post-classification
Captured for admin review
1 POST /skills/extract-from-jd
2 POST /skills/extract-details
3 POST /skills/final-role-output
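The three calls above run in the listed order. A minimal sketch of that sequencing follows; only the three endpoint paths come from this run. The `post(path, payload)` transport, the `jd_text` request key, and the idea that each response is passed on as the next request are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: takes a path and a JSON-like payload, returns a JSON-like response.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Endpoint paths as listed in this run; everything else here is assumed.
STEPS = (
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
)


def run_pipeline(jd_text: str, post: Post) -> Dict[str, Any]:
    """Call the three endpoints in order, feeding each response forward."""
    payload: Dict[str, Any] = {"jd_text": jd_text}  # hypothetical request key
    for path in STEPS:
        payload = post(path, payload)
    return payload
```

Injecting `post` keeps the sketch independent of any particular HTTP client, so it can be exercised with a stub.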
Data Engineer
CASE A · slug: data-engineer · id: 2 · source: db
Exact alias hit on data-engineer (1.0) — no other alias at this confidence; skill_top cloud-architect 0.11 does not contradict
Resolution: in_db (role exists in library; skill↔dim and role↔dim links saved when applicable).
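The resolution note above describes a rule of the form "accept a unique exact alias hit unless the skill-based signal contradicts it". The sketch below is one way such a rule could look. The function name, the input shapes, and the `contradiction_threshold` of 0.5 are hypothetical; the run only shows the scores 1.0 and 0.11.

```python
from typing import Dict, Optional


def resolve_role(
    alias_scores: Dict[str, float],
    skill_top: Dict[str, float],
    contradiction_threshold: float = 0.5,  # assumed value, not taken from the run
) -> Optional[str]:
    """Return the slug of a unique exact alias hit, or None if unresolved."""
    exact = [slug for slug, score in alias_scores.items() if score == 1.0]
    if len(exact) != 1:
        return None  # no exact hit, or several aliases tie at 1.0
    winner = exact[0]
    # A different role scored strongly from skills would contradict the alias.
    for slug, score in skill_top.items():
        if slug != winner and score >= contradiction_threshold:
            return None
    return winner
```

With the values from this run, `resolve_role({"data-engineer": 1.0}, {"cloud-architect": 0.11})` returns `"data-engineer"`.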
Job description
Comcast brings together the best in media and technology. We drive innovation to create the world's best entertainment and online experiences. As a Fortune 50 leader, we set the pace in a variety of innovative and fascinating businesses and create career opportunities across a wide range of locations and disciplines. We are at the forefront of change and move at an amazing pace, thanks to our remarkable people, who bring cutting-edge products and services to life for millions of customers every day. If you share in our passion for teamwork, our vision to revolutionize industries and our goal to lead the future in media and technology, we want you to fast-forward your career at Comcast. Job Summary We are looking for a savvy Data Engineer 2 to join our growing engineering team in Freewheel, CIEC. If you’re excited to work with a tightly-knit team of data engineers solving hard problems the right way using cutting-edge data collection, transformation, analysis, and monitoring tools in the cloud, this opportunity is for you. Our data engineering team works with huge viewing datasets from several sources to help the world’s largest programmers, measurement partners, and networks understand media consumption. We build and maintain high-quality data solutions to process terabytes viewing data on the state-of-the-art cloud-native data platform using AWS. Responsible for designing, building and overseeing the deployment and operation of technology architecture, solutions and software to capture, manage, store and utilize structured and unstructured data from internal and external sources. Establishes and builds processes and structures based on business and technical requirements to channel data from multiple inputs, route appropriately and store using any combination of distributed (cloud) structures, local databases, and other applicable storage forms as required. 
Develops technical tools and programming that leverage artificial intelligence, machine learning and big-data techniques to cleanse, organize and transform data and to maintain, defend and update data structures and integrity on an automated basis. Creates and establishes design standards and assurance processes for software, systems and applications development to ensure compatibility and operability of data connections, flows and storage requirements. Reviews internal and external business and product requirements for data operations and activity and suggests changes and upgrades to systems and storage to accommodate ongoing needs. Work with data modelers/analysts to understand the business problems they are trying to solve then create or augment data assets to feed their analysis. Works with moderate guidance in own area of knowledge. Job Description Core Responsibilities • Develops data structures and pipelines aligned to established standards and guidelines to organize, collect, standardize and transform data that helps generate insights and address reporting needs. • Focuses on ensuring data quality during ingest, processing as well as final load to the target tables. • Creates standard ingestion frameworks for structured and unstructured data as well as checking and reporting on the quality of the data being processed. • Creates standard methods for end users / downstream applications to consume data including but not limited to database views, extracts and Application Programming Interfaces. • Develops and maintains information systems (e.g., data warehouses, data lakes) including data access Application Programming Interfaces. • Participates in the implementation of solutions via data architecture, data engineering, or data manipulation on both on-prem platforms like Kubernetes and Teradata as well as Cloud platforms like Databricks. 
• Determines the appropriate storage platform across different on-prem (minIO and Teradata) and Cloud (AWS S3, Redshift) depending on the privacy, access and sensitivity requirements. • Understands the data lineage from source to the final semantic layer along with the transformation rules applied to enable faster troubleshooting and impact analysis during changes. • Collaborates with technology and platform management partners to optimize data sourcing and processing rules to ensure appropriate data quality as well as process optimization. • Handles data migrations/conversions as data platforms evolve and new standards are defined. • Preemptively recognizes and resolves technical issues utilizing knowledge of policies and processes. • Understands the data sensitivity, customer data privacy rules and regulations and applies them consistently in all Information Lifecycle Management activities. • Identifies and reacts to system notification and log to ensure quality standards for databases and applications. Solves abstract problems beyond single development language or situation by reusing data file and flags already set. • Solves critical issues and shares knowledge such as trends, aggregate, quantity volume regarding specific data sources. • Consistent exercise of independent judgment and discretion in matters of significance. • Regular, consistent and punctual attendance. Must be able to work nights and weekends, variable schedule(s) as necessary. • Other duties and responsibilities as assigned. Employees At All Levels Are Expected To • Understand our Operating Principles; make them the guidelines for how you do your job. • Own the customer experience - think and act in ways that put our customers first, give them seamless digital options at every touchpoint, and make them promoters of our products and services. 
• Know your stuff - be enthusiastic learners, users and advocates of our game-changing technology, products and services, especially our digital tools and experiences. • Win as a team - make big things happen by working together and being open to new ideas. • Be an active part of the Net Promoter System - a way of working that brings more employee and customer feedback into the company - by joining huddles, making call backs and helping us elevate opportunities to do better for our customers. • Drive results and growth. • Respect and promote inclusion & diversity. • Do what's right for each other, our customers, investors and our communities. Disclaimer:This information has been designed to indicate the general nature and level of work performed by employees in this role. It is not designed to contain or be interpreted as a comprehensive inventory of all duties, responsibilities and qualifications. Comcast is proud to be an equal opportunity workplace. We will consider all qualified applicants for employment without regard to race, color, religion, age, sex, sexual orientation, gender identity, national origin, disability, veteran status, genetic information, or any other basis protected by applicable law. Education Bachelor's Degree While possessing the stated degree is preferred, Comcast also may consider applicants who hold some combination of coursework and experience, or who have extensive related professional experience. Relevant Work Experience 2-5 Years Base pay is one part of the Total Rewards that Comcast provides to compensate and recognize employees for their work. Most sales positions are eligible for a Commission under the terms of an applicable plan, while most non-sales positions are eligible for a Bonus. Additionally, Comcast provides best-in-class Benefits. We believe that benefits should connect you to the support you need when it matters most, and should help you care for those who matter most. 
That’s why we provide an array of options, expert guidance and always-on tools, that are personalized to meet the needs of your reality – to help support you physically, financially and emotionally through the big milestones and in your everyday life. Please visit the compensation and benefits summary on our careers site for more details.
Skills from this JD
Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Data Engineering Tools
- Sub-category: general
- Skill nature: PRACTICE
- Volatility: MEDIUM
- Typical lifespan: MULTI_YEAR
- Version strategy: UNVERSIONED
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Data Engineering Tools
- Sub-category: general
- Skill nature: PRACTICE
- Volatility: MEDIUM
- Typical lifespan: MULTI_YEAR
- Version strategy: UNVERSIONED
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Data Engineering Tools
- Sub-category: general
- Skill nature: CONCEPT
- Volatility: MEDIUM
- Typical lifespan: MULTI_YEAR
- Version strategy: UNVERSIONED
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Data Engineering Tools
- Sub-category: general
- Skill nature: PRACTICE
- Volatility: MEDIUM
- Typical lifespan: MULTI_YEAR
- Version strategy: UNVERSIONED
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Databases
- Sub-category: general
- Skill nature: CONCEPT
- Volatility: MEDIUM
- Typical lifespan: MULTI_YEAR
- Version strategy: UNVERSIONED
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Web Frameworks
- Sub-category: general
- Skill nature: CONCEPT
- Volatility: MEDIUM
- Typical lifespan: MULTI_YEAR
- Version strategy: UNVERSIONED
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Databases
- Sub-category: general
- Skill nature: CONCEPT
- Volatility: MEDIUM
- Typical lifespan: MULTI_YEAR
- Version strategy: UNVERSIONED
Aliases — catalog
- Data Lakes (CANONICAL)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Architecture
- Sub-category: Data Lake Architecture
- Confidence: 0.90
- Version strategy: NOT_APPLICABLE
Maturity reasoning: Data lakes are widely listed in cloud/data platform job descriptions and are a standard architecture in AWS, Azure, and GCP ecosystems; they’re a common hiring-pipeline staple rather than a niche pattern.
Skill profile (library / DB)
- Skill nature: PATTERN
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 1
- Sub-category id: 1025
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Cloud Storage and Data Services · Catalog dimension db id 144 · Library dimension (catalog) · Roles linked in library: Cloud Architect
- React Frontend Development · Catalog dimension db id 96 · Library dimension (catalog)
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Storage and Data Services · cloud-storage-and-data-services | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| React Frontend Development · d_init_01 | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Data Engineering Tools
- Sub-category: general
- Skill nature: CONCEPT
- Volatility: MEDIUM
- Typical lifespan: MULTI_YEAR
- Version strategy: UNVERSIONED
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Data Engineering Tools
- Sub-category: general
- Skill nature: PRACTICE
- Volatility: MEDIUM
- Typical lifespan: MULTI_YEAR
- Version strategy: UNVERSIONED
Aliases — catalog
- Kubernetes (CANONICAL) primary
- Kubernetes 1.0+ (VERSION)
- Kubernetes 1.x (VERSION)
- Kubernetes v1 (VERSION)
- k8s (VERSION)
- kubernetes 1.x (VERSION)
- kubernetes latest (VERSION)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Platform
- Sub-category: Container Orchestration Platform
- Vendor: Cloud Native Computing Foundation
- License: apache_2
- Year introduced: 2014
- Confidence: 0.90
- Version strategy: SEPARATE_ENTITY
- Version tag: 1.30
Maturity reasoning: Broadly adopted in cloud-native stacks; Kubernetes appears in a large share of DevOps/SRE job descriptions and is the default orchestration platform across major cloud vendors.
Skill profile (library / DB)
- Skill nature: PLATFORM
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 9
- Sub-category id: 557
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Container Orchestration Platforms · Catalog dimension db id 134 · Library dimension (catalog) · Roles linked in library: Cloud Architect, DevOps Engineer
- Kubernetes for ML Workloads · Catalog dimension db id 47 · Library dimension (catalog) · Roles linked in library: ML Engineer, MLOps Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Container Orchestration Platforms · container-orchestration-platforms | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Kubernetes for ML Workloads · kubernetes-for-ml-workloads | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Databases
- Sub-category: general
- Skill nature: TOOL
- Volatility: MEDIUM
- Typical lifespan: MULTI_YEAR
- Version strategy: UNVERSIONED
Aliases — catalog
- Databricks (CANONICAL)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Platform
- Sub-category: Data Analytics Platform
- Vendor: Databricks, Inc.
- License: other_open
- Year introduced: 2013
- Confidence: 0.97
- Version strategy: NOT_APPLICABLE
Maturity reasoning: Databricks appears frequently in data engineering and analytics job postings, especially alongside Spark, Delta Lake, and lakehouse stacks; strong vendor adoption and broad enterprise usage signal mainstream demand.
Skill profile (library / DB)
- Skill nature: PLATFORM
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 9
- Sub-category id: 911
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- React Frontend Development · Catalog dimension db id 96 · Library dimension (catalog)
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| React Frontend Development · d_init_01 | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Data Engineering Tools
- Sub-category: general
- Skill nature: TOOL
- Volatility: MEDIUM
- Typical lifespan: MULTI_YEAR
- Version strategy: UNVERSIONED
Aliases — catalog
- AWS S3 (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Service
- Sub-category: Object Storage Service
- Vendor: Amazon Web Services
- License: proprietary
- Year introduced: 2006
- Confidence: 0.99
- Version strategy: NOT_APPLICABLE
Maturity reasoning: AWS S3 is a core cloud storage service routinely listed in cloud/data engineering JDs and remains a standard AWS offering with broad ecosystem support; no vendor sunset or replacement signal exists.
Skill profile (library / DB)
- Skill nature: CLOUD_SERVICE
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 11
- Sub-category id: 120
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Cloud Platforms & Hosting Providers · Catalog dimension db id 278 · Library dimension (catalog) · Roles linked in library: .NET Backend Developer, Kotlin Backend Developer, Scala Backend Developer, Web Developer
- Cloud Platforms & Managed Services · Catalog dimension db id 221 · Library dimension (catalog) · Roles linked in library: Fullstack Developer, Go Backend Developer, Node.js Backend Developer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Platforms & Hosting Providers · cloud-platforms-hosting-providers | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
| Cloud Platforms & Managed Services · cloud-platforms-managed-services | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- Amazon Redshift (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Service
- Sub-category: Data Warehouse Service
- Vendor: Amazon Web Services
- License: proprietary
- Year introduced: 2012
- Confidence: 0.97
- Version strategy: NOT_APPLICABLE
Maturity reasoning: Commonly listed in data/analytics job descriptions and widely used as AWS’s managed warehouse; strong vendor adoption and steady JD volume signal broad market demand.
Skill profile (library / DB)
- Skill nature: CLOUD_SERVICE
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 11
- Sub-category id: 118
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Cloud Data Warehouses · Catalog dimension db id 22 · Library dimension (catalog) · Roles linked in library: Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Cloud Data Warehouses · cloud-data-warehouses | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Data Engineering Tools
- Sub-category: general
- Skill nature: CONCEPT
- Volatility: MEDIUM
- Typical lifespan: MULTI_YEAR
- Version strategy: UNVERSIONED
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Data Engineering Tools
- Sub-category: general
- Skill nature: CONCEPT
- Volatility: MEDIUM
- Typical lifespan: MULTI_YEAR
- Version strategy: UNVERSIONED
Skill enrichment (orchestrator / LLM)
No Stage 7 enrichment blob on this skill (orchestrator skipped enrichment).
- Category: Databases
- Sub-category: general
- Skill nature: CONCEPT
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Version strategy: UNVERSIONED
Aliases — catalog
- log analysis (CANONICAL) primary
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Methodology
- Sub-category: Log Analysis Methodology
- Confidence: 0.90
- Version strategy: NOT_APPLICABLE
Maturity reasoning: Common in SRE/DevOps JDs and incident-response roles; vendors like Splunk, Datadog, and ELK/Elastic market log analysis as a core observability capability, indicating broad hiring demand.
Skill profile (library / DB)
- Skill nature: METHODOLOGY
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 8
- Sub-category id: 3297
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- Sitecore Troubleshooting and Maintenance · Catalog dimension db id 447 · Library dimension (catalog) · Roles linked in library: Sitecore Dev
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| Sitecore Troubleshooting and Maintenance · sitecore-troubleshooting-and-maintenance | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
All API 3 persistence rows
Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.
| Skill | Tag | Dimension | Skill↔dim | Role↔dim | Outcome | Notes |
|---|---|---|---|---|---|---|
| Data Lakes | in_db | Cloud Storage and Data Services · cloud-storage-and-data-services | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Data Lakes | in_db | React Frontend Development · d_init_01 | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Kubernetes | in_db | Container Orchestration Platforms · container-orchestration-platforms | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Kubernetes | in_db | Kubernetes for ML Workloads · kubernetes-for-ml-workloads | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Databricks | in_db | React Frontend Development · d_init_01 | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| AWS S3 | in_db | Cloud Platforms & Hosting Providers · cloud-platforms-hosting-providers | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| AWS S3 | in_db | Cloud Platforms & Managed Services · cloud-platforms-managed-services | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Amazon Redshift | in_db | Cloud Data Warehouses · cloud-data-warehouses | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
| Log Analysis | in_db | Sitecore Troubleshooting and Maintenance · sitecore-troubleshooting-and-maintenance | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
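The grid above has one row per (skill × dimension) pair, and the Role↔dim column is only ticked when the dimension is linked to the chosen role. A minimal sketch of that expansion follows, assuming hypothetical input shapes (`skill_dims`, `dim_roles`) and a hypothetical `Row` type; the outcome strings are copied from the table, and the skill↔dim link is assumed to be saved for every row, as it is in this run.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Row:
    skill: str
    dimension: str
    skill_dim_saved: bool
    role_dim_saved: bool
    outcome: str


def build_rows(
    skill_dims: Dict[str, List[str]],
    dim_roles: Dict[str, Set[str]],
    chosen_role: str,
) -> List[Row]:
    """Expand skills into one row per (skill, dimension) work item."""
    rows: List[Row] = []
    for skill, dims in skill_dims.items():
        for dim in dims:
            # Role link is saved only if the library lists the dimension under the chosen role.
            under_role = chosen_role in dim_roles.get(dim, set())
            outcome = (
                "Role↔dimension saved"
                if under_role
                else "Role↔dimension skipped (dimension not under chosen role)"
            )
            rows.append(Row(skill, dim, True, under_role, outcome))
    return rows
```

For example, with `chosen_role="Data Engineer"`, Amazon Redshift's Cloud Data Warehouses row gets a saved role link, while Kubernetes's Container Orchestration Platforms row (linked to Cloud Architect and DevOps Engineer) is skipped.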
Library artifacts (this run)
| Kind | Detail | DB id |
|---|---|---|
| canonical_skill_proposed | Data Engineering · type=Data Engineering Tools subtype=general nature=PRACTICE lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Data Pipelines · type=Data Engineering Tools subtype=general nature=PRACTICE lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Data Quality · type=Data Engineering Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Data Ingestion · type=Data Engineering Tools subtype=general nature=PRACTICE lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Database Views · type=Databases subtype=general nature=CONCEPT lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Application Programming Interfaces · type=Web Frameworks subtype=general nature=CONCEPT lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Data Warehouses · type=Databases subtype=general nature=CONCEPT lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Data Architecture · type=Data Engineering Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Data Manipulation · type=Data Engineering Tools subtype=general nature=PRACTICE lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Teradata · type=Databases subtype=general nature=TOOL lifespan=MULTI_YEAR | |
| canonical_skill_proposed | minIO · type=Data Engineering Tools subtype=general nature=TOOL lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Data Lineage · type=Data Engineering Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Information Lifecycle Management · type=Data Engineering Tools subtype=general nature=CONCEPT lifespan=MULTI_YEAR | |
| canonical_skill_proposed | Database · type=Databases subtype=general nature=CONCEPT lifespan=EVERGREEN | |
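The `type=… subtype=… nature=… lifespan=…` strings in the table above pack several fields into one cell, and values such as `Data Engineering Tools` contain spaces, so splitting on whitespace alone would break them. A small sketch of one way to parse them follows; `parse_detail` is a hypothetical helper, and it assumes keys are single words and that no value itself contains a `word=` sequence.

```python
import re
from typing import Dict

# A key is a run of word characters immediately followed by "=".
_KEY = re.compile(r"(\w+)=")


def parse_detail(detail: str) -> Dict[str, str]:
    """Split 'k1=v1 k2=v2 ...' into a dict; values may contain spaces."""
    matches = list(_KEY.finditer(detail))
    fields: Dict[str, str] = {}
    for i, m in enumerate(matches):
        # A value runs from the end of its key to the start of the next key.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(detail)
        fields[m.group(1)] = detail[m.end():end].strip()
    return fields
```

Any text before the first `key=` (for example a leading skill name) is ignored by this sketch.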
nano JD Parser — gpt-4.1-nano
{
"JD_type": "pass",
"about_company": {
"source_marker": {
"first_5_words": "Comcast brings together the best",
"last_5_words": "your career at Comcast."
},
"text": "Comcast brings together the best in media and technology. We drive innovation to create the world\u0027s best entertainment and online experiences. As a Fortune 50 leader, we set the pace in a variety of innovative and fascinating businesses and create career opportunities across a wide range of locations and disciplines. We are at the forefront of change and move at an amazing pace, thanks to our remarkable people, who bring cutting-edge products and services to life for millions of customers every day. If you share in our passion for teamwork, our vision to revolutionize industries and our goal to lead the future in media and technology, we want you to fast-forward your career at Comcast.",
"word_count": 84
},
"certifications": [],
"company_name": "Comcast",
"ctc": null,
"domain": {
"primary": {
"aliases": [
"Media",
"Entertainment"
],
"domain": "Media \u0026 Entertainment"
},
"secondary": null
},
"education": [
{
"level": "Bachelor\u0027s",
"qualification": "Bachelor\u0027s - Any Discipline",
"raw": "Bachelor\u0027s Degree",
"requirement": "preferred"
}
],
"experience": {
"max": 5,
"min": 2,
"raw": "2-5 Years"
},
"job_locations": [
{
"aliases": [],
"city": "Freewheel",
"country": "USA",
"state": null,
"work_mode": "null"
}
],
"role": "Data Engineer 2",
"role_aliases": [
"Data Engineer",
"Data Engineer II",
"Data Developer"
],
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 15,
"heading": "Core Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Develops data structures and",
"last_5_words": "and responsibilities as assigned."
},
"text": "\u2022 Develops data structures and pipelines aligned to established standards and guidelines to organize, collect, standardize and transform data that helps generate insights and address reporting needs.\n\u2022 Focuses on ensuring data quality during ingest, processing as well as final load to the target tables.\n\u2022 Creates standard ingestion frameworks for structured and unstructured data as well as checking and reporting on the quality of the data being processed.\n\u2022 Creates standard methods for end users / downstream applications to consume data including but not limited to database views, extracts and Application Programming Interfaces.\n\u2022 Develops and maintains information systems (e.g., data warehouses, data lakes) including data access Application Programming Interfaces.\n\u2022 Participates in the implementation of solutions via data architecture, data engineering, or data manipulation on both on-prem platforms like Kubernetes and Teradata as well as Cloud platforms like Databricks.\n\u2022 Determines the appropriate storage platform across different on-prem (minIO and Teradata) and Cloud (AWS S3, Redshift) depending on the privacy, access and sensitivity requirements.\n\u2022 Understands the data lineage from source to the final semantic layer along with the transformation rules applied to enable faster troubleshooting and impact analysis during changes.\n\u2022 Collaborates with technology and platform management partners to optimize data sourcing and processing rules to ensure appropriate data quality as well as process optimization.\n\u2022 Handles data migrations/conversions as data platforms evolve and new standards are defined.\n\u2022 Preemptively recognizes and resolves technical issues utilizing knowledge of policies and processes.\n\u2022 Understands the data sensitivity, customer data privacy rules and regulations and applies them consistently in all Information Lifecycle Management activities.\n\u2022 Identifies and 
reacts to system notification and log to ensure quality standards for databases and applications. Solves abstract problems beyond single development language or situation by reusing data file and flags already set.\n\u2022 Solves critical issues and shares knowledge such as trends, aggregate, quantity volume regarding specific data sources.\n\u2022 Consistent exercise of independent judgment and discretion in matters of significance.\n\u2022 Regular, consistent and punctual attendance. Must be able to work nights and weekends, variable schedule(s) as necessary.\n\u2022 Other duties and responsibilities as assigned.",
"word_count": 366
},
{
"bullet_count": 8,
"heading": "Employees At All Levels Are Expected To",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Understand our Operating Principles;",
"last_5_words": "and our communities."
},
"text": "\u2022 Understand our Operating Principles; make them the guidelines for how you do your job.\n\u2022 Own the customer experience - think and act in ways that put our customers first, give them seamless digital options at every touchpoint, and make them promoters of our products and services.\n\u2022 Know your stuff - be enthusiastic learners, users and advocates of our game-changing technology, products and services, especially our digital tools and experiences.\n\u2022 Win as a team - make big things happen by working together and being open to new ideas.\n\u2022 Be an active part of the Net Promoter System - a way of working that brings more employee and customer feedback into the company - by joining huddles, making call backs and helping us elevate opportunities to do better for our customers.\n\u2022 Drive results and growth.\n\u2022 Respect and promote inclusion \u0026 diversity.\n\u2022 Do what\u0027s right for each other, our customers, investors and our communities.",
"word_count": 134
}
],
"urls": []
}
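Each parsed section in the JSON above carries a `source_marker` with `first_5_words` and `last_5_words`. In this run the "last" marker sometimes holds fewer than five words (for example `"and our communities."`), so the sketch below does not try to regenerate markers; it only checks that a section's `text` starts and ends with them. `marker_matches` is a hypothetical helper, and whitespace normalisation is an assumption.

```python
def _norm(s: str) -> str:
    """Collapse runs of whitespace (including newlines) to single spaces."""
    return " ".join(s.split())


def marker_matches(text: str, first_words: str, last_words: str) -> bool:
    """True if `text` begins with `first_words` and ends with `last_words`."""
    body = _norm(text)
    return body.startswith(_norm(first_words)) and body.endswith(_norm(last_words))
```

A check like this could flag sections whose markers no longer line up with the extracted text, for instance after truncation.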
API 1 — extract-from-jd
{
"final_skills": [
{
"is_primary": true,
"skill_name": "Data Engineering"
},
{
"is_primary": true,
"skill_name": "Data Pipelines"
},
{
"is_primary": true,
"skill_name": "Data Quality"
},
{
"is_primary": true,
"skill_name": "Data Ingestion"
},
{
"is_primary": true,
"skill_name": "Database Views"
},
{
"is_primary": true,
"skill_name": "Application Programming Interfaces"
},
{
"is_primary": true,
"skill_name": "Data Warehouses"
},
{
"is_primary": true,
"skill_name": "Data Lakes"
},
{
"is_primary": true,
"skill_name": "Data Architecture"
},
{
"is_primary": true,
"skill_name": "Data Manipulation"
},
{
"is_primary": true,
"skill_name": "Kubernetes"
},
{
"is_primary": true,
"skill_name": "Teradata"
},
{
"is_primary": true,
"skill_name": "Databricks"
},
{
"is_primary": true,
"skill_name": "minIO"
},
{
"is_primary": true,
"skill_name": "AWS S3"
},
{
"is_primary": true,
"skill_name": "Amazon Redshift"
},
{
"is_primary": true,
"skill_name": "Data Lineage"
},
{
"is_primary": true,
"skill_name": "Information Lifecycle Management"
},
{
"is_primary": true,
"skill_name": "Database"
},
{
"is_primary": false,
"skill_name": "Log Analysis"
}
],
"jd_role": {
"display_name": "Data Engineer 2",
"rationale": null,
"role_aliases": [
"Data Engineer",
"Data Engineer II",
"Data Developer"
],
"role_archetype": "Data",
"slug": ""
},
"nano_parsed": {
"JD_type": "pass",
"about_company": {
"source_marker": {
"first_5_words": "Comcast brings together the best",
"last_5_words": "your career at Comcast."
},
"text": "Comcast brings together the best in media and technology. We drive innovation to create the world\u0027s best entertainment and online experiences. As a Fortune 50 leader, we set the pace in a variety of innovative and fascinating businesses and create career opportunities across a wide range of locations and disciplines. We are at the forefront of change and move at an amazing pace, thanks to our remarkable people, who bring cutting-edge products and services to life for millions of customers every day. If you share in our passion for teamwork, our vision to revolutionize industries and our goal to lead the future in media and technology, we want you to fast-forward your career at Comcast.",
"word_count": 84
},
"certifications": [],
"company_name": "Comcast",
"ctc": null,
"domain": {
"primary": {
"aliases": [
"Media",
"Entertainment"
],
"domain": "Media \u0026 Entertainment"
},
"secondary": null
},
"education": [
{
"level": "Bachelor\u0027s",
"qualification": "Bachelor\u0027s - Any Discipline",
"raw": "Bachelor\u0027s Degree",
"requirement": "preferred"
}
],
"experience": {
"max": 5,
"min": 2,
"raw": "2-5 Years"
},
"job_locations": [
{
"aliases": [],
"city": "Freewheel",
"country": "USA",
"state": null,
"work_mode": "null"
}
],
"role": "Data Engineer 2",
"role_aliases": [
"Data Engineer",
"Data Engineer II",
"Data Developer"
],
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 15,
"heading": "Core Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Develops data structures and",
"last_5_words": "and responsibilities as assigned."
},
"text": "\u2022 Develops data structures and pipelines aligned to established standards and guidelines to organize, collect, standardize and transform data that helps generate insights and address reporting needs.\n\u2022 Focuses on ensuring data quality during ingest, processing as well as final load to the target tables.\n\u2022 Creates standard ingestion frameworks for structured and unstructured data as well as checking and reporting on the quality of the data being processed.\n\u2022 Creates standard methods for end users / downstream applications to consume data including but not limited to database views, extracts and Application Programming Interfaces.\n\u2022 Develops and maintains information systems (e.g., data warehouses, data lakes) including data access Application Programming Interfaces.\n\u2022 Participates in the implementation of solutions via data architecture, data engineering, or data manipulation on both on-prem platforms like Kubernetes and Teradata as well as Cloud platforms like Databricks.\n\u2022 Determines the appropriate storage platform across different on-prem (minIO and Teradata) and Cloud (AWS S3, Redshift) depending on the privacy, access and sensitivity requirements.\n\u2022 Understands the data lineage from source to the final semantic layer along with the transformation rules applied to enable faster troubleshooting and impact analysis during changes.\n\u2022 Collaborates with technology and platform management partners to optimize data sourcing and processing rules to ensure appropriate data quality as well as process optimization.\n\u2022 Handles data migrations/conversions as data platforms evolve and new standards are defined.\n\u2022 Preemptively recognizes and resolves technical issues utilizing knowledge of policies and processes.\n\u2022 Understands the data sensitivity, customer data privacy rules and regulations and applies them consistently in all Information Lifecycle Management activities.\n\u2022 Identifies and reacts to system notification and log to ensure quality standards for databases and applications. Solves abstract problems beyond single development language or situation by reusing data file and flags already set.\n\u2022 Solves critical issues and shares knowledge such as trends, aggregate, quantity volume regarding specific data sources.\n\u2022 Consistent exercise of independent judgment and discretion in matters of significance.\n\u2022 Regular, consistent and punctual attendance. Must be able to work nights and weekends, variable schedule(s) as necessary.\n\u2022 Other duties and responsibilities as assigned.",
"word_count": 366
},
{
"bullet_count": 8,
"heading": "Employees At All Levels Are Expected To",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Understand our Operating Principles;",
"last_5_words": "and our communities."
},
"text": "\u2022 Understand our Operating Principles; make them the guidelines for how you do your job.\n\u2022 Own the customer experience - think and act in ways that put our customers first, give them seamless digital options at every touchpoint, and make them promoters of our products and services.\n\u2022 Know your stuff - be enthusiastic learners, users and advocates of our game-changing technology, products and services, especially our digital tools and experiences.\n\u2022 Win as a team - make big things happen by working together and being open to new ideas.\n\u2022 Be an active part of the Net Promoter System - a way of working that brings more employee and customer feedback into the company - by joining huddles, making call backs and helping us elevate opportunities to do better for our customers.\n\u2022 Drive results and growth.\n\u2022 Respect and promote inclusion \u0026 diversity.\n\u2022 Do what\u0027s right for each other, our customers, investors and our communities.",
"word_count": 134
}
],
"urls": []
},
"rejected": false,
"rejection_reason": null,
"run_id": "ad4757bc-ccab-4012-be2f-748083a72f78",
"stage3_signals": {
"alias_found": true,
"alias_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 1.0,
"slug": "data-engineer",
"total_count": null
}
],
"kra_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": [
{
"kra_text": "Builds data ingestion pipelines to collect data from transactional databases, third-party APIs, event streams, and file sources into centralized data platforms.",
"sentence": "Develops data structures and pipelines aligned to established standards and guidelines to organize, collect, standardize and transform data that helps generate insights and address reporting needs.",
"similarity": 0.6721
},
{
"kra_text": "Builds data ingestion pipelines to collect data from transactional databases, third-party APIs, event streams, and file sources into centralized data platforms.",
"sentence": "Creates standard ingestion frameworks for structured and unstructured data as well as checking and reporting on the quality of the data being processed.",
"similarity": 0.6553
},
{
"kra_text": "Designs dimensional models, star schemas, data vault structures, and curated data mart tables to support BI tools and self-service analytics consumption.",
"sentence": "Develops and maintains information systems (e.g. , data warehouses, data lakes) including data access Application Programming Interfaces.",
"similarity": 0.6125
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 0.6466,
"slug": "data-engineer",
"total_count": null
},
{
"display_name": "DevOps Engineer",
"kra_matches": [
{
"kra_text": "Responds to deployment failures, infrastructure incidents, and environment misconfiguration issues to restore service availability and prevent recurrence.",
"sentence": "Preemptively recognizes and resolves technical issues utilizing knowledge of policies and processes.",
"similarity": 0.556
},
{
"kra_text": "Responds to deployment failures, infrastructure incidents, and environment misconfiguration issues to restore service availability and prevent recurrence.",
"sentence": "Identifies and reacts to system notification and log to ensure quality standards for databases and applications.",
"similarity": 0.4948
},
{
"kra_text": "Collaborates with development teams to improve build processes, reduce deployment friction, containerize applications, and adopt DevOps best practices.",
"sentence": "Collaborates with technology and platform management partners to optimize data sourcing and processing rules to ensure appropriate data quality as well as process optimization.",
"similarity": 0.4798
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 10,
"score": 0.5102,
"slug": "devops-engineer",
"total_count": null
},
{
"display_name": "ML Engineer",
"kra_matches": [
{
"kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
"sentence": "Develops data structures and pipelines aligned to established standards and guidelines to organize, collect, standardize and transform data that helps generate insights and address reporting needs.",
"similarity": 0.5576
},
{
"kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
"sentence": "Participates in the implementation of solutions via data architecture, data engineering, or data manipulation on both on-prem platforms like Kubernetes and Teradata as well as Cloud platforms like Databricks.",
"similarity": 0.5028
},
{
"kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
"sentence": "Creates standard ingestion frameworks for structured and unstructured data as well as checking and reporting on the quality of the data being processed.",
"similarity": 0.4431
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 3,
"score": 0.5011,
"slug": "ml-engineer",
"total_count": null
},
{
"display_name": "AI Compliance Officer",
"kra_matches": [
{
"kra_text": "Assesses personal data usage, retention schedules, consent mechanisms, and cross-border transfer requirements for AI systems handling sensitive information.",
"sentence": "Understands the data sensitivity, customer data privacy rules and regulations and applies them consistently in all Information Lifecycle Management activities.",
"similarity": 0.5563
},
{
"kra_text": "Assesses personal data usage, retention schedules, consent mechanisms, and cross-border transfer requirements for AI systems handling sensitive information.",
"sentence": "Determines the appropriate storage platform across different on-prem (minIO and Teradata) and Cloud (AWS S3, Redshift) depending on the privacy, access and sensitivity requirements.",
"similarity": 0.4729
},
{
"kra_text": "Coordinates AI incident response procedures, regulatory breach notification, audit investigation support, and remediation tracking for compliance issues.",
"sentence": "Preemptively recognizes and resolves technical issues utilizing knowledge of policies and processes.",
"similarity": 0.4623
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 12,
"score": 0.4972,
"slug": "ai-compliance-officer",
"total_count": null
},
{
"display_name": "Backend Developer",
"kra_matches": [
{
"kra_text": "Investigates and resolves production incidents, API bugs, and service degradation through root cause analysis, hotfixes, and post-mortems.",
"sentence": "Preemptively recognizes and resolves technical issues utilizing knowledge of policies and processes.",
"similarity": 0.5181
},
{
"kra_text": "Adds structured logging, metrics, distributed tracing, and alerting to improve system observability and support production debugging.",
"sentence": "Identifies and reacts to system notification and log to ensure quality standards for databases and applications.",
"similarity": 0.4975
},
{
"kra_text": "Writes database access logic including SQL queries, ORM mappings, stored procedures, and migration scripts for relational databases like PostgreSQL and MySQL.",
"sentence": "Handles data migrations/conversions as data platforms evolve and new standards are defined.",
"similarity": 0.4664
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 1,
"score": 0.494,
"slug": "backend-engineer",
"total_count": null
}
],
"skill_match_roles": [
{
"display_name": "Cloud Architect",
"kra_matches": null,
"matched_count": 2,
"matched_skills": [
"Data Lakes",
"Kubernetes"
],
"role_id": 9,
"score": 0.1053,
"slug": "cloud-architect",
"total_count": 19
},
{
"display_name": "Fullstack Developer",
"kra_matches": null,
"matched_count": 1,
"matched_skills": [
"AWS S3"
],
"role_id": 15,
"score": 0.0526,
"slug": "full-stack-engineer",
"total_count": 19
},
{
"display_name": "ML Engineer",
"kra_matches": null,
"matched_count": 1,
"matched_skills": [
"Kubernetes"
],
"role_id": 3,
"score": 0.0526,
"slug": "ml-engineer",
"total_count": 19
},
{
"display_name": "DevOps Engineer",
"kra_matches": null,
"matched_count": 1,
"matched_skills": [
"Kubernetes"
],
"role_id": 10,
"score": 0.0526,
"slug": "devops-engineer",
"total_count": 19
},
{
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": 1,
"matched_skills": [
"Amazon Redshift"
],
"role_id": 2,
"score": 0.0526,
"slug": "data-engineer",
"total_count": 19
}
]
},
"stage4_decision": {
"alias_collision_detected": false,
"case": "A",
"chosen_role": {
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 1.0,
"slug": "data-engineer",
"total_count": null
},
"confidence": 1.0,
"is_new_role": false,
"llm2_fired": false,
"llm2_reasoning": null,
"matched_dimensions": [],
"matched_kras": [],
"matched_skills": [],
"new_role_display_name": null,
"new_role_slug": null,
"queued": false,
"reasoning": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top cloud-architect 0.11 does not contradict",
"sub_role": null
},
"stage5_updates": {
"centroid_n_after": 135,
"centroid_updated": true,
"collision_log_id": null,
"new_kra_attached": null,
"new_skills_attached": [
{
"is_primary": true,
"queue_id": 7337,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Engineering",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 7338,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Pipelines",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 7339,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Quality",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 7340,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Ingestion",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 7341,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Database Views",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 7342,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Application Programming Interfaces",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 7343,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Warehouses",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 7344,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Architecture",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 7345,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Manipulation",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 7346,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Teradata",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 7347,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "minIO",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 7348,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Data Lineage",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 7349,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Information Lifecycle Management",
"status": "pending"
},
{
"is_primary": true,
"queue_id": 7350,
"role_display_name": "Data Engineer",
"role_slug": "data-engineer",
"skill_name": "Database",
"status": "pending"
}
],
"queue_entry_id": null,
"v3_pipeline_triggered": false,
"v3_role_slug": null,
"v3_run_id": null
}
}
API 2 — extract-details
{
"alias_matches": [
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 2017,
"existing_alias_text": "Data Lakes",
"input_term": "Data Lakes",
"matched_canonical": {
"category_id": 1,
"display_name": "Data Lakes",
"id": 1358,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PATTERN",
"slug": "data-lakes",
"sub_category_id": 1025,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 1267,
"existing_alias_text": "Kubernetes",
"input_term": "Kubernetes",
"matched_canonical": {
"category_id": 9,
"display_name": "Kubernetes",
"id": 726,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "kubernetes",
"sub_category_id": 557,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 1838,
"existing_alias_text": "Databricks",
"input_term": "Databricks",
"matched_canonical": {
"category_id": 9,
"display_name": "Databricks",
"id": 1202,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "databricks",
"sub_category_id": 911,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 2355,
"existing_alias_text": "AWS S3",
"input_term": "AWS S3",
"matched_canonical": {
"category_id": 11,
"display_name": "AWS S3",
"id": 1460,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CLOUD_SERVICE",
"slug": "aws-s3",
"sub_category_id": 120,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 301,
"existing_alias_text": "Amazon Redshift",
"input_term": "Amazon Redshift",
"matched_canonical": {
"category_id": 11,
"display_name": "Amazon Redshift",
"id": 107,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CLOUD_SERVICE",
"slug": "amazon-redshift",
"sub_category_id": 118,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 5906,
"existing_alias_text": "log analysis",
"input_term": "Log Analysis",
"matched_canonical": {
"category_id": 8,
"display_name": "log analysis",
"id": 4183,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "METHODOLOGY",
"slug": "log-analysis",
"sub_category_id": 3297,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
}
],
"candidate_roles": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
},
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
},
{
"display_name": ".NET Backend Developer",
"id": 83,
"rationale": null,
"role_archetype": "Engineering",
"slug": "dotnet-backend-developer",
"source": "db"
},
{
"display_name": "Kotlin Backend Developer",
"id": 84,
"rationale": null,
"role_archetype": "Engineering",
"slug": "kotlin-server-backend-developer",
"source": "db"
},
{
"display_name": "Scala Backend Developer",
"id": 87,
"rationale": null,
"role_archetype": "Engineering",
"slug": "scala-backend-developer",
"source": "db"
},
{
"display_name": "Web Developer",
"id": 25,
"rationale": null,
"role_archetype": null,
"slug": "web-developer",
"source": "db"
},
{
"display_name": "Fullstack Developer",
"id": 15,
"rationale": null,
"role_archetype": null,
"slug": "full-stack-engineer",
"source": "db"
},
{
"display_name": "Go Backend Developer",
"id": 81,
"rationale": null,
"role_archetype": "Engineering",
"slug": "go-backend-developer",
"source": "db"
},
{
"display_name": "Node.js Backend Developer",
"id": 82,
"rationale": null,
"role_archetype": "Engineering",
"slug": "node-backend-developer",
"source": "db"
},
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
{
"display_name": "Sitecore Dev",
"id": 233,
"rationale": null,
"role_archetype": "Engineering",
"slug": "sitecore-dev",
"source": "db"
}
],
"chosen_role": {
"display_name": "Data Engineer",
"id": 2,
"rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top cloud-architect 0.11 does not contradict",
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"input_skill": "Data Lakes",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Data Lakes",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Container Orchestration Platforms",
"id": 134,
"rationale": "Platforms that schedule and manage containerized workloads across clusters and environments. Cloud Architects need these to define workload placement standards, cluster boundaries, and platform capabilities.",
"slug": "container-orchestration-platforms",
"source": "db"
},
"input_skill": "Kubernetes",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Kubernetes for ML Workloads",
"id": 47,
"rationale": "Kubernetes-native components used to schedule, accelerate, and isolate ML training and serving workloads. This includes GPU enablement and ML-specific controllers rather than generic cluster administration.",
"slug": "kubernetes-for-ml-workloads",
"source": "db"
},
"input_skill": "Kubernetes",
"llm_role": null,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Databricks",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms \u0026 Hosting Providers",
"id": 278,
"rationale": "Familiarity with vendor-specific hosting and backend services for deploying and scaling web applications.",
"slug": "cloud-platforms-hosting-providers",
"source": "db"
},
"input_skill": "AWS S3",
"llm_role": null,
"roles_from_db": [
{
"display_name": ".NET Backend Developer",
"id": 83,
"rationale": null,
"role_archetype": "Engineering",
"slug": "dotnet-backend-developer",
"source": "db"
},
{
"display_name": "Kotlin Backend Developer",
"id": 84,
"rationale": null,
"role_archetype": "Engineering",
"slug": "kotlin-server-backend-developer",
"source": "db"
},
{
"display_name": "Scala Backend Developer",
"id": 87,
"rationale": null,
"role_archetype": "Engineering",
"slug": "scala-backend-developer",
"source": "db"
},
{
"display_name": "Web Developer",
"id": 25,
"rationale": null,
"role_archetype": null,
"slug": "web-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms \u0026 Managed Services",
"id": 221,
"rationale": "Operates and integrates vendor-specific cloud compute, storage, and hosting services.",
"slug": "cloud-platforms-managed-services",
"source": "db"
},
"input_skill": "AWS S3",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Fullstack Developer",
"id": 15,
"rationale": null,
"role_archetype": null,
"slug": "full-stack-engineer",
"source": "db"
},
{
"display_name": "Go Backend Developer",
"id": 81,
"rationale": null,
"role_archetype": "Engineering",
"slug": "go-backend-developer",
"source": "db"
},
{
"display_name": "Node.js Backend Developer",
"id": 82,
"rationale": null,
"role_archetype": "Engineering",
"slug": "node-backend-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Data Warehouses",
"id": 22,
"rationale": "Managed analytical storage and compute platforms used for curated datasets, reporting, and downstream analytics. These systems are central to data modeling, performance tuning, and cost-aware query design.",
"slug": "cloud-data-warehouses",
"source": "db"
},
"input_skill": "Amazon Redshift",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Sitecore Troubleshooting and Maintenance",
"id": 447,
"rationale": "Diagnosing defects, regressions, and maintainability issues across Sitecore code, configuration, and content behavior. This is a coherent cluster because the role is expected to stabilize the site experience over time.",
"slug": "sitecore-troubleshooting-and-maintenance",
"source": "db"
},
"input_skill": "Log Analysis",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Sitecore Dev",
"id": 233,
"rationale": null,
"role_archetype": "Engineering",
"slug": "sitecore-dev",
"source": "db"
}
]
}
],
"input_final_skills": [
"Data Engineering",
"Data Pipelines",
"Data Quality",
"Data Ingestion",
"Database Views",
"Application Programming Interfaces",
"Data Warehouses",
"Data Lakes",
"Data Architecture",
"Data Manipulation",
"Kubernetes",
"Teradata",
"Databricks",
"minIO",
"AWS S3",
"Amazon Redshift",
"Data Lineage",
"Information Lifecycle Management",
"Database",
"Log Analysis"
],
"input_llm_skills": [
"Data Engineering",
"Data Pipelines",
"Data Quality",
"Data Ingestion",
"Database Views",
"Application Programming Interfaces",
"Data Warehouses",
"Data Lakes",
"Data Architecture",
"Data Manipulation",
"Kubernetes",
"Teradata",
"Databricks",
"minIO",
"AWS S3",
"Amazon Redshift",
"Data Lineage",
"Information Lifecycle Management",
"Database",
"Log Analysis"
],
"new_aliases_persisted": 0,
"run_id": "ad4757bc-ccab-4012-be2f-748083a72f78",
"skills_detail": [
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Data Engineering",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "PRACTICE",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "data-engineering",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Data Pipelines",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "PRACTICE",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "data-pipelines",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Data Quality",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "CONCEPT",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "data-quality",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Data Ingestion",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "PRACTICE",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "data-ingestion",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Database Views",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Databases",
"skill_nature": "CONCEPT",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "database-views",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Application Programming Interfaces",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Web Frameworks",
"skill_nature": "CONCEPT",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "application-programming-interfaces",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Data Warehouses",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Databases",
"skill_nature": "CONCEPT",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "data-warehouses",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Data Lakes",
"alias_type": "CANONICAL",
"id": 2017,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 1,
"display_name": "Data Lakes",
"id": 1358,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PATTERN",
"slug": "data-lakes",
"sub_category_id": 1025,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"input_skill": "Data Lakes",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Data Lakes",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "Data Lakes",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Data Architecture",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "CONCEPT",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "data-architecture",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Data Manipulation",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "PRACTICE",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "data-manipulation",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Kubernetes",
"alias_type": "CANONICAL",
"id": 1267,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.0+",
"alias_type": "VERSION",
"id": 1271,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes 1.x",
"alias_type": "VERSION",
"id": 1270,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "Kubernetes v1",
"alias_type": "VERSION",
"id": 1269,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "k8s",
"alias_type": "VERSION",
"id": 1268,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "kubernetes 1.x",
"alias_type": "VERSION",
"id": 1400,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "kubernetes latest",
"alias_type": "VERSION",
"id": 1401,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 9,
"display_name": "Kubernetes",
"id": 726,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "kubernetes",
"sub_category_id": 557,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Container Orchestration Platforms",
"id": 134,
"rationale": "Platforms that schedule and manage containerized workloads across clusters and environments. Cloud Architects need these to define workload placement standards, cluster boundaries, and platform capabilities.",
"slug": "container-orchestration-platforms",
"source": "db"
},
"input_skill": "Kubernetes",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Kubernetes for ML Workloads",
"id": 47,
"rationale": "Kubernetes-native components used to schedule, accelerate, and isolate ML training and serving workloads. This includes GPU enablement and ML-specific controllers rather than generic cluster administration.",
"slug": "kubernetes-for-ml-workloads",
"source": "db"
},
"input_skill": "Kubernetes",
"llm_role": null,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
}
]
}
],
"input_skill": "Kubernetes",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Teradata",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Databases",
"skill_nature": "TOOL",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "teradata",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Databricks",
"alias_type": "CANONICAL",
"id": 1838,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 9,
"display_name": "Databricks",
"id": 1202,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "databricks",
"sub_category_id": 911,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Databricks",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "Databricks",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "minIO",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "TOOL",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "minio",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "AWS S3",
"alias_type": "CANONICAL",
"id": 2355,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 11,
"display_name": "AWS S3",
"id": 1460,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CLOUD_SERVICE",
"slug": "aws-s3",
"sub_category_id": 120,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms \u0026 Hosting Providers",
"id": 278,
"rationale": "Familiarity with vendor-specific hosting and backend services for deploying and scaling web applications.",
"slug": "cloud-platforms-hosting-providers",
"source": "db"
},
"input_skill": "AWS S3",
"llm_role": null,
"roles_from_db": [
{
"display_name": ".NET Backend Developer",
"id": 83,
"rationale": null,
"role_archetype": "Engineering",
"slug": "dotnet-backend-developer",
"source": "db"
},
{
"display_name": "Kotlin Backend Developer",
"id": 84,
"rationale": null,
"role_archetype": "Engineering",
"slug": "kotlin-server-backend-developer",
"source": "db"
},
{
"display_name": "Scala Backend Developer",
"id": 87,
"rationale": null,
"role_archetype": "Engineering",
"slug": "scala-backend-developer",
"source": "db"
},
{
"display_name": "Web Developer",
"id": 25,
"rationale": null,
"role_archetype": null,
"slug": "web-developer",
"source": "db"
}
]
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms \u0026 Managed Services",
"id": 221,
"rationale": "Operates and integrates vendor-specific cloud compute, storage, and hosting services.",
"slug": "cloud-platforms-managed-services",
"source": "db"
},
"input_skill": "AWS S3",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Fullstack Developer",
"id": 15,
"rationale": null,
"role_archetype": null,
"slug": "full-stack-engineer",
"source": "db"
},
{
"display_name": "Go Backend Developer",
"id": 81,
"rationale": null,
"role_archetype": "Engineering",
"slug": "go-backend-developer",
"source": "db"
},
{
"display_name": "Node.js Backend Developer",
"id": 82,
"rationale": null,
"role_archetype": "Engineering",
"slug": "node-backend-developer",
"source": "db"
}
]
}
],
"input_skill": "AWS S3",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Amazon Redshift",
"alias_type": "CANONICAL",
"id": 301,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 11,
"display_name": "Amazon Redshift",
"id": 107,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "CLOUD_SERVICE",
"slug": "amazon-redshift",
"sub_category_id": 118,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Data Warehouses",
"id": 22,
"rationale": "Managed analytical storage and compute platforms used for curated datasets, reporting, and downstream analytics. These systems are central to data modeling, performance tuning, and cost-aware query design.",
"slug": "cloud-data-warehouses",
"source": "db"
},
"input_skill": "Amazon Redshift",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Amazon Redshift",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Data Lineage",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "CONCEPT",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "data-lineage",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Information Lifecycle Management",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Data Engineering Tools",
"skill_nature": "CONCEPT",
"sub_category": "general",
"typical_lifespan": "MULTI_YEAR",
"version_strategy": "UNVERSIONED",
"volatility": "MEDIUM"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "information-lifecycle-management",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [],
"canonical": null,
"dimensions": [],
"input_skill": "Database",
"matched_via": null,
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": {
"derived": {
"category": "Databases",
"skill_nature": "CONCEPT",
"sub_category": "general",
"typical_lifespan": "EVERGREEN",
"version_strategy": "UNVERSIONED",
"volatility": "STABLE"
},
"enrichment": null,
"keep_log": [],
"locked_dimensions": [],
"merge_log": [],
"placed": null,
"relationships": null,
"skill_id": "database",
"split_log": [],
"typed": null,
"warnings": []
},
"source_tag": "llm",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "log analysis",
"alias_type": "CANONICAL",
"id": 5906,
"is_primary": true,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 8,
"display_name": "log analysis",
"id": 4183,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "METHODOLOGY",
"slug": "log-analysis",
"sub_category_id": 3297,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Sitecore Troubleshooting and Maintenance",
"id": 447,
"rationale": "Diagnosing defects, regressions, and maintainability issues across Sitecore code, configuration, and content behavior. This is a coherent cluster because the role is expected to stabilize the site experience over time.",
"slug": "sitecore-troubleshooting-and-maintenance",
"source": "db"
},
"input_skill": "Log Analysis",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Sitecore Dev",
"id": 233,
"rationale": null,
"role_archetype": "Engineering",
"slug": "sitecore-dev",
"source": "db"
}
]
}
],
"input_skill": "Log Analysis",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
}
],
"unmatched_skills": [
"Data Engineering",
"Data Pipelines",
"Data Quality",
"Data Ingestion",
"Database Views",
"Application Programming Interfaces",
"Data Warehouses",
"Data Architecture",
"Data Manipulation",
"Teradata",
"minIO",
"Data Lineage",
"Information Lifecycle Management",
"Database"
]
}
API 3 — final-role-output
{
"chosen_role": {
"display_name": "Data Engineer",
"id": 2,
"rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top cloud-architect 0.11 does not contradict",
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
"chosen_role_resolution": "in_db",
"final_input_skills": [
{
"skill": "Data Engineering",
"tag": "new"
},
{
"skill": "Data Pipelines",
"tag": "new"
},
{
"skill": "Data Quality",
"tag": "new"
},
{
"skill": "Data Ingestion",
"tag": "new"
},
{
"skill": "Database Views",
"tag": "new"
},
{
"skill": "Application Programming Interfaces",
"tag": "new"
},
{
"skill": "Data Warehouses",
"tag": "new"
},
{
"skill": "Data Lakes",
"tag": "in_db"
},
{
"skill": "Data Architecture",
"tag": "new"
},
{
"skill": "Data Manipulation",
"tag": "new"
},
{
"skill": "Kubernetes",
"tag": "in_db"
},
{
"skill": "Teradata",
"tag": "new"
},
{
"skill": "Databricks",
"tag": "in_db"
},
{
"skill": "minIO",
"tag": "new"
},
{
"skill": "AWS S3",
"tag": "in_db"
},
{
"skill": "Amazon Redshift",
"tag": "in_db"
},
{
"skill": "Data Lineage",
"tag": "new"
},
{
"skill": "Information Lifecycle Management",
"tag": "new"
},
{
"skill": "Database",
"tag": "new"
},
{
"skill": "Log Analysis",
"tag": "in_db"
}
],
"llm_cost_api1_usd": null,
"llm_cost_api2_usd": null,
"llm_cost_api3_usd": null,
"llm_cost_total_usd": null,
"persistence": {
"items": [
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Storage and Data Services",
"id": 144,
"rationale": "Cloud-native storage and managed data services used to place workloads, choose durability tiers, and define platform boundaries. This is a coherent cluster because architects evaluate storage fit, access patterns, and managed service tradeoffs.",
"slug": "cloud-storage-and-data-services",
"source": "db"
},
"dimension_id": 144,
"input_skill": "Data Lakes",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1358,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"dimension_id": 96,
"input_skill": "Data Lakes",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 1358,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Container Orchestration Platforms",
"id": 134,
"rationale": "Platforms that schedule and manage containerized workloads across clusters and environments. Cloud Architects need these to define workload placement standards, cluster boundaries, and platform capabilities.",
"slug": "container-orchestration-platforms",
"source": "db"
},
"dimension_id": 134,
"input_skill": "Kubernetes",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Cloud Architect",
"id": 9,
"rationale": null,
"role_archetype": null,
"slug": "cloud-architect",
"source": "db"
},
{
"display_name": "DevOps Engineer",
"id": 10,
"rationale": null,
"role_archetype": null,
"slug": "devops-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 726,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Kubernetes for ML Workloads",
"id": 47,
"rationale": "Kubernetes-native components used to schedule, accelerate, and isolate ML training and serving workloads. This includes GPU enablement and ML-specific controllers rather than generic cluster administration.",
"slug": "kubernetes-for-ml-workloads",
"source": "db"
},
"dimension_id": 47,
"input_skill": "Kubernetes",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "ML Engineer",
"id": 3,
"rationale": null,
"role_archetype": null,
"slug": "ml-engineer",
"source": "db"
},
{
"display_name": "MLOps Engineer",
"id": 16,
"rationale": null,
"role_archetype": null,
"slug": "ml-ops-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 726,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"dimension_id": 96,
"input_skill": "Databricks",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 1202,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms \u0026 Hosting Providers",
"id": 278,
"rationale": "Familiarity with vendor-specific hosting and backend services for deploying and scaling web applications.",
"slug": "cloud-platforms-hosting-providers",
"source": "db"
},
"dimension_id": 278,
"input_skill": "AWS S3",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": ".NET Backend Developer",
"id": 83,
"rationale": null,
"role_archetype": "Engineering",
"slug": "dotnet-backend-developer",
"source": "db"
},
{
"display_name": "Kotlin Backend Developer",
"id": 84,
"rationale": null,
"role_archetype": "Engineering",
"slug": "kotlin-server-backend-developer",
"source": "db"
},
{
"display_name": "Scala Backend Developer",
"id": 87,
"rationale": null,
"role_archetype": "Engineering",
"slug": "scala-backend-developer",
"source": "db"
},
{
"display_name": "Web Developer",
"id": 25,
"rationale": null,
"role_archetype": null,
"slug": "web-developer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1460,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Platforms \u0026 Managed Services",
"id": 221,
"rationale": "Operates and integrates vendor-specific cloud compute, storage, and hosting services.",
"slug": "cloud-platforms-managed-services",
"source": "db"
},
"dimension_id": 221,
"input_skill": "AWS S3",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Fullstack Developer",
"id": 15,
"rationale": null,
"role_archetype": null,
"slug": "full-stack-engineer",
"source": "db"
},
{
"display_name": "Go Backend Developer",
"id": 81,
"rationale": null,
"role_archetype": "Engineering",
"slug": "go-backend-developer",
"source": "db"
},
{
"display_name": "Node.js Backend Developer",
"id": 82,
"rationale": null,
"role_archetype": "Engineering",
"slug": "node-backend-developer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1460,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Cloud Data Warehouses",
"id": 22,
"rationale": "Managed analytical storage and compute platforms used for curated datasets, reporting, and downstream analytics. These systems are central to data modeling, performance tuning, and cost-aware query design.",
"slug": "cloud-data-warehouses",
"source": "db"
},
"dimension_id": 22,
"input_skill": "Amazon Redshift",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 107,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "Sitecore Troubleshooting and Maintenance",
"id": 447,
"rationale": "Diagnosing defects, regressions, and maintainability issues across Sitecore code, configuration, and content behavior. This is a coherent cluster because the role is expected to stabilize the site experience over time.",
"slug": "sitecore-troubleshooting-and-maintenance",
"source": "db"
},
"dimension_id": 447,
"input_skill": "Log Analysis",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [
{
"display_name": "Sitecore Dev",
"id": 233,
"rationale": null,
"role_archetype": "Engineering",
"slug": "sitecore-dev",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 4183,
"skill_tag": "in_db",
"skipped_reason": null
}
],
"new_skills_created": 0,
"role_dimension_saved": 0,
"skill_dimension_saved": 0,
"skipped": 0
},
"planner_output": null,
"run_id": "ad4757bc-ccab-4012-be2f-748083a72f78"
}
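Review note: in the payload above, every item reports "skill_dimension_saved": true and the Amazon Redshift item reports "role_dimension_saved": true, while the summary counters under "persistence" are all 0. In every item, "matched_chosen_role" is true exactly when "chosen_role_id" appears among the "roles_from_db" ids. The sketch below recomputes both from the items so the two can be compared. It is a minimal sketch that assumes only the field names visible in this payload; the helper names are hypothetical and not part of the pipeline, and the service's counters may be scoped differently (for example, counting only links newly written in this run).

```python
from collections import Counter


def matches_chosen_role(item: dict) -> bool:
    """Hypothetical helper: True when the chosen role id is among the
    roles already linked to the item's dimension (roles_from_db)."""
    return any(role["id"] == item["chosen_role_id"] for role in item["roles_from_db"])


def tally_persistence(items: list[dict]) -> Counter:
    """Hypothetical helper: recount the per-item save flags so they can be
    compared with the summary counters reported next to the items."""
    counts = Counter()
    for item in items:
        counts["skill_dimension_saved"] += bool(item["skill_dimension_saved"])
        counts["role_dimension_saved"] += bool(item["role_dimension_saved"])
        counts["skipped"] += item["skipped_reason"] is not None
    return counts


# Two items trimmed from the payload above (Data Lakes and Amazon Redshift).
sample_items = [
    {
        "chosen_role_id": 2,
        "input_skill": "Data Lakes",
        "roles_from_db": [{"id": 9, "slug": "cloud-architect"}],
        "skill_dimension_saved": True,
        "role_dimension_saved": False,
        "skipped_reason": None,
    },
    {
        "chosen_role_id": 2,
        "input_skill": "Amazon Redshift",
        "roles_from_db": [{"id": 2, "slug": "data-engineer"}],
        "skill_dimension_saved": True,
        "role_dimension_saved": True,
        "skipped_reason": None,
    },
]

recount = tally_persistence(sample_items)
role_matches = {item["input_skill"]: matches_chosen_role(item) for item in sample_items}
```

Running the same recount over the full "items" array and comparing it with the reported counters would show whether the summary block reflects the per-item flags.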
LLM Calls
Every model call made for this run, in pipeline order.