Senior Data Engineer
Recro | Bangalore, Karnataka | ₹2,000,000 – ₹3,000,000
Job Description
About the Role

The Smart Operations (Smart Ops) division is seeking a seasoned Senior Data Engineer / Lead Data Engineer to design and scale our next-generation industrial data platform. In this role, you will architect the real-time data backbone that processes high-frequency iHistorian telemetry, IoT sensor feeds, and complex manufacturing data across global operations. You will be the core technical owner of our Databricks Lakehouse ecosystem, responsible for building optimized, cost-efficient streaming and batch pipelines that deliver high-trust data to our Advanced Analytics, MLOps, and Business Intelligence teams.

Key Responsibilities

- End-to-End Lakehouse Architecture: Design, implement, and scale a robust Medallion Architecture (Bronze-Silver-Gold) on Azure Databricks and Delta Lake.
- Real-Time IoT Ingestion: Build fault-tolerant, low-latency streaming pipelines to ingest high-frequency telemetry and machine sensor data.
- Enterprise Governance: Enforce data security, fine-grained access controls, and comprehensive lineage tracking using Unity Catalog.
- Performance Tuning & Cost Optimization: Profile, tune, and optimize Spark workloads (using partitioning, liquid clustering, AQE, and Photon) to reduce cloud compute costs and pipeline runtimes.
- CI/CD Automation: Modernize data platform deployments across Dev/QA/Prod environments using Infrastructure as Code (IaC) and Databricks Asset Bundles (DABs).
- Cross-Functional Collaboration: Partner closely with manufacturing stakeholders, supply chain analysts, and data scientists to provide curated datasets for predictive maintenance, anomaly detection, and operational KPI reporting.

Technical Skills Breakdown

Mandatory Skills (Must-Haves)

- Domain Context: At least 5 years of Data Engineering experience with direct exposure to Heavy Manufacturing, Automotive/Commercial Vehicles, Industrial Automation, or Global Logistics data environments.
- Databricks Ecosystem: Deep architectural mastery of Azure Databricks, PySpark, Spark SQL, Delta Lake, and Unity Catalog governance.
- Streaming & CDC: Proven hands-on experience engineering real-time data pipelines using Apache Kafka, Azure Event Hubs, or Spark Structured Streaming, alongside Change Data Capture (CDC) tools (e.g., Debezium).
- Cloud Infrastructure: Strong proficiency within the Microsoft Azure stack, specifically Azure Data Factory (ADF), Azure Data Lake Storage Gen2 (ADLS Gen2), Azure Key Vault, and Azure Monitor.
- Advanced SQL & Python: Expert-level Python programming and complex SQL optimization (advanced indexing, partition pruning, performance tuning of massive datasets).
- DevOps Automation: Hands-on experience setting up production CI/CD pipelines using Git, Azure DevOps, GitHub Actions, or Databricks Asset Bundles (DABs).

Good to Have Skills (Nice-to-Haves)

- Professional Certifications: Databricks Certified Data Engineer Professional (highly preferred), Databricks Certified Associate Developer for Apache Spark, or Microsoft Certified: Azure Data Engineer Associate (DP-203).
- Industrial & ERP Systems: Familiarity with industrial data systems (SCADA, PLC, iHistorian), heavy vehicle telemetry standards (MDF/MF4 files), or deep integration experience with enterprise ERPs (SAP S/4HANA, SAP BW).
- MLOps & GenAI: Exposure to working alongside Machine Learning pipelines, dataset feature engineering, or implementing Generative AI frameworks (RAG, Vector Databases like FAISS/Weaviate) for semantic query self-service.
- Infrastructure as Code (IaC): Basic proficiency with Terraform for provisioning cloud data infrastructure.

Education & Qualifications

- Bachelor’s or Master’s Degree in Computer Science, Information Technology, Electrical/Electronics Engineering, or a closely related technical field.

Skills: PySpark, Databricks, Delta Lake, ADF, and Apache Kafka