Pipeline run
b90fffca-a264-4009-a738-d1f55f9cc3ad
Client output enrichment
v2 Skill cluster · Nature of work · AI index · Tech stack maturity · Evidence · KRA description · vocab breakdown (legacy)
Signals
Post-classification
1 POST /skills/extract-from-jd
2 POST /skills/extract-details
3 POST /skills/final-role-output
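The three calls above run in the listed order. A minimal sketch of that sequencing, assuming a hypothetical `post(path, payload)` transport and hypothetical payload keys (`jd_text`, `api1`, `api2`); the real request schemas are not shown in this run:

```python
from typing import Any, Callable, Dict

Json = Dict[str, Any]

# Endpoint paths as listed in this run, in call order.
PIPELINE_STEPS = [
    "/skills/extract-from-jd",
    "/skills/extract-details",
    "/skills/final-role-output",
]


def run_pipeline(jd_text: str, post: Callable[[str, Json], Json]) -> Json:
    """Call the three endpoints in order, feeding each response forward.

    `post` is a hypothetical transport: (path, payload) -> response JSON.
    The payload keys below are assumptions, not the service's real schema.
    """
    api1 = post(PIPELINE_STEPS[0], {"jd_text": jd_text})
    api2 = post(PIPELINE_STEPS[1], {"api1": api1})
    api3 = post(PIPELINE_STEPS[2], {"api1": api1, "api2": api2})
    return {"api1": api1, "api2": api2, "api3": api3}
```

Passing the transport in as a parameter keeps the sketch independent of any particular HTTP client and lets the call order be checked with a stub.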
Data Engineer
CASE A · slug: data-engineer · id: 2 · source: db
Exact alias hit on data-engineer (1.0) — no other alias at this confidence; skill_top data-engineer 0.50 does not contradict
Resolution: in_db (role exists in library; skill↔dim and role↔dim links saved when applicable).
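The Case A reasoning above can be read as a small rule: one alias hit at full confidence, no competing alias at that confidence, and no contradiction from the top skill-match role. A minimal sketch under that reading; the function name, the tuple shape, and the interpretation of "does not contradict" as "points at the same role" are assumptions, not the pipeline's actual logic:

```python
from typing import List, Optional, Tuple

RoleScore = Tuple[str, float]  # (role slug, score)


def is_case_a(alias_hits: List[RoleScore],
              skill_top: Optional[RoleScore],
              exact: float = 1.0) -> Optional[str]:
    """Return the chosen slug when a single exact alias hit is uncontested.

    Assumed rule, paraphrasing the logged reasoning: exactly one alias
    scores `exact`, and the top skill-match role (if any) is the same role.
    """
    top = [slug for slug, score in alias_hits if score >= exact]
    if len(top) != 1:
        return None  # no exact hit, or an alias collision
    if skill_top is not None and skill_top[0] != top[0]:
        return None  # skill evidence points at a different role
    return top[0]
```

With this run's signals, `is_case_a([("data-engineer", 1.0)], ("data-engineer", 0.5))` selects `data-engineer`.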
Job description
Skills: Apache Spark, Data Engineering, Python, SQL, Machine Learning, Data Analysis, ETL
Company Overview
CN Solutions partners with clients to provide top-notch solutions for their manpower needs. Specializing in Web Technologies, Databases, ERP, Data warehousing, and more. Offering staffing solutions, leadership hiring, RPO services, and more.
Job Overview
Senior Databricks role with 7 to 10 years of experience in Hyderabad. Full-Time employment with
Qualifications And Skills
• 7-10 years of experience in Data Engineering and Databricks
• Proficiency in Python, SQL, and data analysis tools
• Strong knowledge of Apache Spark and ETL processes
• Experience in Machine Learning and working with large datasets
Roles And Responsibilities
• Develop and maintain Databricks pipelines for data ingestion and processing
• Collaborate with data scientists and analysts to optimize data workflows
• Implement data engineering best practices and ensure data quality and integrity
• Utilize Apache Spark for data processing and analysis tasks
Skills from this JD
Each row merges API 1 extraction, API 2 library match / v3 orchestration (dimensions + locked dims), and API 3 persistence tags.
Aliases — catalog
- Databricks (CANONICAL)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Platform
- Sub-category: Data Analytics Platform
- Vendor: Databricks, Inc.
- License: other_open
- Year introduced: 2013
- Confidence: 0.97
- Version strategy: NOT_APPLICABLE
Maturity reasoning: Databricks appears frequently in data engineering and analytics job postings, especially alongside Spark, Delta Lake, and lakehouse stacks; strong vendor adoption and broad enterprise usage signal mainstream demand.
Skill profile (library / DB)
- Skill nature: PLATFORM
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 9
- Sub-category id: 911
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- React Frontend Development (catalog dimension db id 96)
Library dimension (catalog)
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| React Frontend Development (`d_init_01`) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) |
Aliases — catalog
- Apache Spark (CANONICAL)
- apache spark 3 (VERSION)
- spark (VERSION)
- spark 3 (VERSION)
- spark 3.x (VERSION)
- spark3 (VERSION)
Context tags (catalog)
Stored enrichment (catalog DB)
- Category: Framework
- Sub-category: Distributed Data Processing Framework
- Vendor: Apache Software Foundation
- License: apache_2
- Year introduced: 2010
- Confidence: 0.94
- Version strategy: SEPARATE_ENTITY
- Version tag: 3.x
Maturity reasoning: Apache Spark appears in many data engineering JDs and remains a standard for distributed ETL/ELT; its GitHub and vendor ecosystem activity stay strong, with Databricks and cloud platforms still promoting it.
Skill profile (library / DB)
- Skill nature: FRAMEWORK
- Volatility: STABLE
- Typical lifespan: EVERGREEN
- Category id: 5
- Sub-category id: 1021
- Extractable: True
- Also category: False
Dimensions (API 2 worklist)
- ETL and ELT Tooling (catalog dimension db id 24)
Library dimension (catalog)
Roles linked in library: Data Engineer
API 3 link attempts (this skill)
| Dimension | Skill↔dim | Role↔dim | Outcome |
|---|---|---|---|
| ETL and ELT Tooling (`etl-and-elt-tooling`) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved |
All API 3 persistence rows
Same grid as the skill-extractor “Persistence items” table: one row per (skill × dimension) work item.
| Skill | Tag | Dimension | Skill↔dim | Role↔dim | Outcome | Notes |
|---|---|---|---|---|---|---|
| Databricks | in_db | React Frontend Development (`d_init_01`) | ✓ | — | Existing dimension (library) · Role↔dimension skipped (dimension not under chosen role) | |
| Apache Spark | in_db | ETL and ELT Tooling (`etl-and-elt-tooling`) | ✓ | ✓ | Existing dimension (library) · Role↔dimension saved | |
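The grid above has one row per (skill × dimension) work item, which corresponds to one entry in the API 3 `persistence.items` array. A minimal sketch of that flattening, using the field names that appear in the API 3 payload; the function name, the output keys, and the choice to fill Notes from `skipped_reason` are assumptions:

```python
from typing import Any, Dict, List


def persistence_rows(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten API 3 `persistence.items` into one row per work item."""
    rows = []
    for item in items:
        dim = item["dimension"]
        rows.append({
            "skill": item["input_skill"],
            "tag": item["skill_tag"],
            # Display name plus slug, as shown in the Dimension column.
            "dimension": f'{dim["display_name"]} ({dim["slug"]})',
            "skill_dim": item["skill_dimension_saved"],
            "role_dim": item["role_dimension_saved"],
            "outcome": item["outcome_line"],
            # Assumption: the Notes column mirrors `skipped_reason`.
            "notes": item.get("skipped_reason") or "",
        })
    return rows
```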
Library artifacts (this run)
nano JD Parser — gpt-4.1-nano
{
"JD_type": "pass",
"about_company": {
"source_marker": {
"first_5_words": "CN Solutions partners with clients",
"last_5_words": "leadership hiring, RPO services, and more."
},
"text": "CN Solutions partners with clients to provide top-notch solutions for their manpower needs. Specializing in Web Technologies, Databases, ERP, Data warehousing, and more. Offering staffing solutions, leadership hiring, RPO services, and more.",
"word_count": 42
},
"certifications": [],
"company_name": "CN Solutions",
"ctc": null,
"domain": {
"primary": {
"aliases": [
"ITES",
"BPO"
],
"domain": "IT Services \u0026 Consulting"
},
"secondary": null
},
"education": [],
"experience": {
"max": 10,
"min": 7,
"raw": "7-10 years of experience"
},
"job_locations": [
{
"aliases": [
"Hyderabad, AP"
],
"city": "Hyderabad",
"country": "India",
"state": null,
"work_mode": "onsite"
}
],
"role": "Senior Databricks",
"role_aliases": [
"Databricks Engineer",
"Data Engineer",
"Senior Data Engineer"
],
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 4,
"heading": "Roles And Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Develop and maintain Databricks",
"last_5_words": "for data processing and analysis tasks"
},
"text": "\u2022 Develop and maintain Databricks pipelines for data ingestion and processing\n\u2022 Collaborate with data scientists and analysts to optimize data workflows\n\u2022 Implement data engineering best practices and ensure data quality and integrity\n\u2022 Utilize Apache Spark for data processing and analysis tasks",
"word_count": 42
}
],
"urls": []
}
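Each parsed section carries a `source_marker` with the opening and closing words of its `text`. One plausible use is to confirm that the extracted text was not truncated or reordered. A minimal sketch of such a check; the function name is hypothetical, and whether the pipeline validates markers this way is an assumption:

```python
from typing import Any, Dict


def marker_matches(section: Dict[str, Any]) -> bool:
    """Check that a section's text starts and ends with its source markers.

    Whitespace is normalized first, so newlines inside the text do not
    cause a false mismatch.
    """
    def norm(s: str) -> str:
        return " ".join(s.split())

    text = norm(section["text"])
    marker = section["source_marker"]
    return (text.startswith(norm(marker["first_5_words"]))
            and text.endswith(norm(marker["last_5_words"])))
```

The check compares prefixes and suffixes rather than counting words, because the markers in this run are not always exactly five words long.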
API 1 — extract-from-jd
{
"final_skills": [
{
"is_primary": true,
"skill_name": "Databricks"
},
{
"is_primary": true,
"skill_name": "Apache Spark"
}
],
"jd_role": {
"display_name": "Senior Databricks",
"rationale": null,
"role_aliases": [
"Databricks Engineer",
"Data Engineer",
"Senior Data Engineer"
],
"role_archetype": "Data",
"slug": ""
},
"nano_parsed": {
"JD_type": "pass",
"about_company": {
"source_marker": {
"first_5_words": "CN Solutions partners with clients",
"last_5_words": "leadership hiring, RPO services, and more."
},
"text": "CN Solutions partners with clients to provide top-notch solutions for their manpower needs. Specializing in Web Technologies, Databases, ERP, Data warehousing, and more. Offering staffing solutions, leadership hiring, RPO services, and more.",
"word_count": 42
},
"certifications": [],
"company_name": "CN Solutions",
"ctc": null,
"domain": {
"primary": {
"aliases": [
"ITES",
"BPO"
],
"domain": "IT Services \u0026 Consulting"
},
"secondary": null
},
"education": [],
"experience": {
"max": 10,
"min": 7,
"raw": "7-10 years of experience"
},
"job_locations": [
{
"aliases": [
"Hyderabad, AP"
],
"city": "Hyderabad",
"country": "India",
"state": null,
"work_mode": "onsite"
}
],
"role": "Senior Databricks",
"role_aliases": [
"Databricks Engineer",
"Data Engineer",
"Senior Data Engineer"
],
"role_archetype": "Data",
"roles_and_responsibilities": [
{
"bullet_count": 4,
"heading": "Roles And Responsibilities",
"heading_was_present": true,
"source_marker": {
"first_5_words": "\u2022 Develop and maintain Databricks",
"last_5_words": "for data processing and analysis tasks"
},
"text": "\u2022 Develop and maintain Databricks pipelines for data ingestion and processing\n\u2022 Collaborate with data scientists and analysts to optimize data workflows\n\u2022 Implement data engineering best practices and ensure data quality and integrity\n\u2022 Utilize Apache Spark for data processing and analysis tasks",
"word_count": 42
}
],
"urls": []
},
"rejected": false,
"rejection_reason": null,
"run_id": "b90fffca-a264-4009-a738-d1f55f9cc3ad",
"stage3_signals": {
"alias_found": true,
"alias_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 1.0,
"slug": "data-engineer",
"total_count": null
}
],
"kra_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": [
{
"kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
"sentence": "Utilize Apache Spark for data processing and analysis tasks",
"similarity": 0.673
},
{
"kra_text": "Develops batch and real-time streaming data pipelines using Apache Spark, Apache Kafka, Apache Flink, or Airflow for data movement and processing at scale.",
"sentence": "Develop and maintain Databricks pipelines for data ingestion and processing",
"similarity": 0.6581
},
{
"kra_text": "Works with data analysts, data scientists, and business stakeholders to define data models, ingestion schedules, and data delivery requirements.",
"sentence": "Collaborate with data scientists and analysts to optimize data workflows",
"similarity": 0.6448
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 0.6586,
"slug": "data-engineer",
"total_count": null
},
{
"display_name": "ML Engineer",
"kra_matches": [
{
"kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
"sentence": "Develop and maintain Databricks pipelines for data ingestion and processing",
"similarity": 0.4983
},
{
"kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
"sentence": "Implement data engineering best practices and ensure data quality and integrity",
"similarity": 0.4685
},
{
"kra_text": "Prepares, cleans, and transforms training datasets, manages feature stores, and builds feature engineering pipelines for model training.",
"sentence": "Collaborate with data scientists and analysts to optimize data workflows",
"similarity": 0.4487
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 3,
"score": 0.4719,
"slug": "ml-engineer",
"total_count": null
},
{
"display_name": "Svelte Frontend Developer",
"kra_matches": [
{
"kra_text": "backend data integration",
"sentence": "Implement data engineering best practices and ensure data quality and integrity",
"similarity": 0.4955
},
{
"kra_text": "backend data integration",
"sentence": "Develop and maintain Databricks pipelines for data ingestion and processing",
"similarity": 0.4824
},
{
"kra_text": "backend data integration",
"sentence": "Collaborate with data scientists and analysts to optimize data workflows",
"similarity": 0.4256
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 92,
"score": 0.4678,
"slug": "svelte-frontend-developer",
"total_count": null
},
{
"display_name": "MLOps Engineer",
"kra_matches": [
{
"kra_text": "Validates model performance benchmarks, data schema contracts, and system integration health before signing off on production release readiness.",
"sentence": "Implement data engineering best practices and ensure data quality and integrity",
"similarity": 0.4984
},
{
"kra_text": "Automates ML platform operations including scheduled retraining triggers, pipeline orchestration, evaluation workflows, and alerting configuration.",
"sentence": "Develop and maintain Databricks pipelines for data ingestion and processing",
"similarity": 0.4411
},
{
"kra_text": "Coordinates model promotion workflows across development, staging, and production environments including integration testing and data contract validation.",
"sentence": "Collaborate with data scientists and analysts to optimize data workflows",
"similarity": 0.4344
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 16,
"score": 0.4579,
"slug": "ml-ops-engineer",
"total_count": null
},
{
"display_name": "DevOps Engineer",
"kra_matches": [
{
"kra_text": "Builds and maintains CI/CD pipelines using Jenkins, GitHub Actions, GitLab CI, or CircleCI to automate build, test, security scanning, and deployment workflows.",
"sentence": "Develop and maintain Databricks pipelines for data ingestion and processing",
"similarity": 0.4589
},
{
"kra_text": "Collaborates with development teams to improve build processes, reduce deployment friction, containerize applications, and adopt DevOps best practices.",
"sentence": "Collaborate with data scientists and analysts to optimize data workflows",
"similarity": 0.4518
},
{
"kra_text": "Collaborates with development teams to improve build processes, reduce deployment friction, containerize applications, and adopt DevOps best practices.",
"sentence": "Implement data engineering best practices and ensure data quality and integrity",
"similarity": 0.3892
}
],
"matched_count": null,
"matched_skills": null,
"role_id": 10,
"score": 0.4333,
"slug": "devops-engineer",
"total_count": null
}
],
"skill_match_roles": [
{
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": 1,
"matched_skills": [
"Apache Spark"
],
"role_id": 2,
"score": 0.5,
"slug": "data-engineer",
"total_count": 2
}
]
},
"stage4_decision": {
"alias_collision_detected": false,
"case": "A",
"chosen_role": {
"display_name": "Data Engineer",
"kra_matches": null,
"matched_count": null,
"matched_skills": null,
"role_id": 2,
"score": 1.0,
"slug": "data-engineer",
"total_count": null
},
"confidence": 1.0,
"is_new_role": false,
"llm2_fired": false,
"llm2_reasoning": null,
"matched_dimensions": [],
"matched_kras": [],
"matched_skills": [],
"new_role_display_name": null,
"new_role_slug": null,
"queued": false,
"reasoning": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.50 does not contradict",
"sub_role": null
},
"stage5_updates": {
"centroid_n_after": 347,
"centroid_updated": true,
"collision_log_id": null,
"new_kra_attached": null,
"new_skills_attached": [],
"queue_entry_id": null,
"v3_pipeline_triggered": false,
"v3_role_slug": null,
"v3_run_id": null
}
}
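The role scores in `stage3_signals` are consistent with two simple formulas, although the payload does not state them: each `kra_match_roles` score is close to the mean of the listed similarities, and the `skill_match_roles` score equals `matched_count / total_count`. A minimal sketch under that assumption; the function names are hypothetical:

```python
from statistics import mean
from typing import List


def kra_role_score(similarities: List[float]) -> float:
    """Mean of a role's listed KRA similarities, rounded to 4 places."""
    return round(mean(similarities), 4)


def skill_match_score(matched_count: int, total_count: int) -> float:
    """Share of the extracted skills that the role already links to."""
    return matched_count / total_count if total_count else 0.0
```

For Data Engineer, the similarities 0.673, 0.6581, and 0.6448 average to 0.6586, which matches the logged score, and 1 matched skill out of 2 gives 0.5. ML Engineer and MLOps Engineer differ from the recomputed mean by 0.0001, presumably because the listed similarities are themselves rounded.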
API 2 — extract-details
{
"alias_matches": [
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 1838,
"existing_alias_text": "Databricks",
"input_term": "Databricks",
"matched_canonical": {
"category_id": 9,
"display_name": "Databricks",
"id": 1202,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "databricks",
"sub_category_id": 911,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
},
{
"alias_persist_skipped_reason": "alias_text already exists for this canonical skill",
"alias_persisted": false,
"existing_alias_id": 2004,
"existing_alias_text": "Apache Spark",
"input_term": "Apache Spark",
"matched_canonical": {
"category_id": 5,
"display_name": "Apache Spark",
"id": 1350,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "FRAMEWORK",
"slug": "apache-spark",
"sub_category_id": 1021,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"matched_via": "alias"
}
],
"candidate_roles": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"chosen_role": {
"display_name": "Data Engineer",
"id": 2,
"rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.50 does not contradict",
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Databricks",
"llm_role": null,
"roles_from_db": []
},
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "Apache Spark",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_final_skills": [
"Databricks",
"Apache Spark"
],
"input_llm_skills": [
"Databricks",
"Apache Spark"
],
"new_aliases_persisted": 0,
"run_id": "b90fffca-a264-4009-a738-d1f55f9cc3ad",
"skills_detail": [
{
"aliases_in_db": [
{
"alias_text": "Databricks",
"alias_type": "CANONICAL",
"id": 1838,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 9,
"display_name": "Databricks",
"id": 1202,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "PLATFORM",
"slug": "databricks",
"sub_category_id": 911,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"input_skill": "Databricks",
"llm_role": null,
"roles_from_db": []
}
],
"input_skill": "Databricks",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
},
{
"aliases_in_db": [
{
"alias_text": "Apache Spark",
"alias_type": "CANONICAL",
"id": 2004,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "apache spark 3",
"alias_type": "VERSION",
"id": 2006,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark",
"alias_type": "VERSION",
"id": 2510,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark 3",
"alias_type": "VERSION",
"id": 2007,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark 3.x",
"alias_type": "VERSION",
"id": 2009,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
},
{
"alias_text": "spark3",
"alias_type": "VERSION",
"id": 2008,
"is_primary": false,
"match_strategy": "CASE_INSENSITIVE"
}
],
"canonical": {
"category_id": 5,
"display_name": "Apache Spark",
"id": 1350,
"is_also_category": false,
"is_extractable": true,
"skill_nature": "FRAMEWORK",
"slug": "apache-spark",
"sub_category_id": 1021,
"typical_lifespan": "EVERGREEN",
"volatility": "STABLE"
},
"dimensions": [
{
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"input_skill": "Apache Spark",
"llm_role": null,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
]
}
],
"input_skill": "Apache Spark",
"matched_via": "alias",
"new_alias_persisted": false,
"new_alias_text": null,
"new_skill_meta": null,
"source_tag": "db",
"was_in_llm_skills": true
}
],
"unmatched_skills": []
}
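Both skills were resolved with `matched_via: alias`, and every stored alias uses `match_strategy: CASE_INSENSITIVE`. A minimal sketch of a lookup with that behaviour, built from the `skills_detail` shape above; the function names are hypothetical, and trimming surrounding whitespace is an added assumption:

```python
from typing import Any, Dict, List, Optional


def build_alias_index(skills_detail: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each case-folded alias text to its canonical skill slug."""
    index: Dict[str, str] = {}
    for skill in skills_detail:
        slug = skill["canonical"]["slug"]
        for alias in skill["aliases_in_db"]:
            index[alias["alias_text"].casefold()] = slug
    return index


def match_alias(term: str, index: Dict[str, str]) -> Optional[str]:
    """Case-insensitive lookup: compare after trimming and case-folding."""
    return index.get(term.strip().casefold())
```

A term that is absent from the index returns `None`, which corresponds to the `unmatched_skills` list in the payload.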
API 3 — final-role-output
{
"chosen_role": {
"display_name": "Data Engineer",
"id": 2,
"rationale": "Exact alias hit on data-engineer (1.0) \u2014 no other alias at this confidence; skill_top data-engineer 0.50 does not contradict",
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
},
"chosen_role_resolution": "in_db",
"final_input_skills": [
{
"skill": "Databricks",
"tag": "in_db"
},
{
"skill": "Apache Spark",
"tag": "in_db"
}
],
"llm_cost_api1_usd": null,
"llm_cost_api2_usd": null,
"llm_cost_api3_usd": null,
"llm_cost_total_usd": null,
"persistence": {
"items": [
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "React Frontend Development",
"id": 96,
"rationale": "Building interactive web user interfaces with React.js, including component composition, state management, hooks, and rendering patterns. React.js belongs here because it is a core library for client-side UI development in modern web applications.",
"slug": "d_init_01",
"source": "db"
},
"dimension_id": 96,
"input_skill": "Databricks",
"llm_role": null,
"matched_chosen_role": false,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension skipped (dimension not under chosen role)",
"role_dimension_saved": false,
"roles_from_db": [],
"skill_dimension_saved": true,
"skill_id": 1202,
"skill_tag": "in_db",
"skipped_reason": null
},
{
"chosen_role_id": 2,
"dimension": {
"difficulty_hint": "well_known",
"display_name": "ETL and ELT Tooling",
"id": 24,
"rationale": "Packaged tools for extracting, loading, and transforming data across systems. This dimension covers connector-based ingestion, transformation frameworks, and managed integration products.",
"slug": "etl-and-elt-tooling",
"source": "db"
},
"dimension_id": 24,
"input_skill": "Apache Spark",
"llm_role": null,
"matched_chosen_role": true,
"outcome_line": "Existing dimension (library) \u00b7 Role\u2194dimension saved",
"role_dimension_saved": true,
"roles_from_db": [
{
"display_name": "Data Engineer",
"id": 2,
"rationale": null,
"role_archetype": null,
"slug": "data-engineer",
"source": "db"
}
],
"skill_dimension_saved": true,
"skill_id": 1350,
"skill_tag": "in_db",
"skipped_reason": null
}
],
"new_skills_created": 0,
"role_dimension_saved": 0,
"skill_dimension_saved": 0,
"skipped": 0
},
"planner_output": null,
"run_id": "b90fffca-a264-4009-a738-d1f55f9cc3ad"
}
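The top-level `persistence` counters can be cross-checked against the per-item flags. A minimal sketch of that recount; the function names are hypothetical, and treating a non-null `skipped_reason` as "skipped" is an assumption:

```python
from typing import Any, Dict, Tuple


def recount(persistence: Dict[str, Any]) -> Dict[str, int]:
    """Recompute the summary counters from the per-item flags."""
    items = persistence["items"]
    return {
        "skill_dimension_saved": sum(1 for i in items if i["skill_dimension_saved"]),
        "role_dimension_saved": sum(1 for i in items if i["role_dimension_saved"]),
        "skipped": sum(1 for i in items if i["skipped_reason"] is not None),
    }


def counter_mismatches(persistence: Dict[str, Any]) -> Dict[str, Tuple[Any, int]]:
    """Return {counter: (reported, recounted)} for counters that disagree."""
    recounted = recount(persistence)
    return {
        key: (persistence.get(key), value)
        for key, value in recounted.items()
        if persistence.get(key) != value
    }
```

For this run the recount gives 2 skill↔dimension links and 1 role↔dimension link, while the reported counters are all 0. The counters may count only links newly created in this run, which the output does not state.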
LLM Calls
Every model call made for this run, in pipeline order.