AVP - Data & ML Platform

Weekday AI, India

Job Description

This role is for one of Weekday's clients.

Min Experience: 10+ years
Location: Bengaluru, Mumbai
JobType: full-time
Focus Areas: (i) Data Platform Engineering, (ii) ML Platform & MLOps, (iii) Platform Operations & FinOps, (iv) Data Governance & Quality
Experience: 14–20 years total | 8–12 years in Data/ML Platform Engineering
Core Platform: Databricks Intelligence Platform (Unity Catalog, Delta Lake, MLflow, Mosaic AI)

The Context

We are currently developing the “v2.0” intelligence layer atop this Lakehouse, aiming to standardize MLOps, expand Agentic AI capabilities, and guarantee that the platform delivers sub-second latency across the entire retail network, which includes tens of thousands of stores and high-traffic digital channels.

The Data & ML Platforms group (Group A in Enterprise IT) serves as the driving force behind this transformation. It is led by a VP (L2) and organized into four AVP-led pillars, supported by 10 AI-ready Platform Engineers and a transitioning team of Data Engineers. Each AVP is responsible for a specific platform layer and functions as a builder-leader, expected not only to manage but also to architect, perform code reviews, and actively contribute to development alongside their team.

The Four Pillars

We are seeking to hire four AVPs, each heading one of the platform pillars. While each AVP has full ownership of their respective pillar, all four collaborate closely as a unified leadership team under the VP. Candidates may be evaluated for placement in any pillar depending on their strengths and fit.

Requirements

(i) Data Platform Engineering
Mission: Take full ownership of the core Lakehouse infrastructure, encompassing the storage, compute, and developer platform layers that support all other operations.
- Design and maintain the Delta Lake storage layer, Photon compute engine, and Unity Catalog abstraction, serving over 1,000 developers across various retail sectors.
- Implement advanced optimization techniques including query plan tuning, cluster auto-scaling policies, Z-ordering strategies, and partitioning schemes for datasets with trillions of rows.
- Manage the internal developer platform by developing SDKs, CLI tools, templates, and enabling self-service onboarding to accelerate new teams' time-to-first-query.
- Lead the technical cleanup of Phase-1 migration challenges, including schema standardization, pipeline consolidation, and deduplication of source of record (SOR) systems across hundreds of sources.
- Oversee the Data Engineer transition cohort within this pillar, establishing engineering standards, enforcing code review processes, and defining career progression paths.

(ii) ML Platform & MLOps
Mission: Industrialize machine learning by building infrastructure that efficiently moves models from experimentation notebooks to production at retail scale.
- Develop and maintain the end-to-end ML lifecycle leveraging MLflow, including experiment tracking, model registry, automated retraining, A/B testing, and canary deployments.
- Design the real-time inference architecture to deliver model serving with sub-100ms latency across recommendation, pricing, and demand forecasting applications.
- Construct the Agentic AI infrastructure comprising RAG pipelines, vector stores, fine-tuning workflows for Foundation Models (utilizing Mosaic AI), and agent orchestration frameworks.
- Establish governance for the Feature Store by standardizing feature definitions, enforcing freshness SLAs, lineage tracking, and promoting feature reuse across retail divisions.
- Ensure reliability of the ML platform through GPU/TPU cluster management, training job scheduling, cost attribution per model, and managing incident response for production model degradations.

(iii) Platform Operations & FinOps
Mission: Maintain platform stability, performance, and cost-efficiency, especially during critical periods.
- Ensure 99.99% platform uptime, providing leadership during critical events such as festive sales, store openings, and retail peak periods.
- Establish and run the FinOps practice focusing on DBU cost allocation by team and workload, implementing chargeback models, automating resource right-sizing, and delivering executive cost dashboards.
- Design and manage monitoring and observability systems covering pipeline health, query performance, cluster utilization, and data freshness SLAs across all six value streams.
- Lead capacity planning by forecasting compute and storage demands in line with retail seasonality (festive cycles, new store launches, category introductions) and provisioning resources accordingly in advance.
- Oversee incident management, develop runbooks, and conduct post-mortem evaluations for the Databricks platform, ensuring targets for mean time to recovery are met and continually improved.

(iv) Data Governance & Quality
Mission: Serve as the technical steward for India’s largest consumer dataset, ensuring its trustworthiness, compliance, and discoverability.
- Develop “Governance-as-Code” frameworks on Unity Catalog, incorporating automated access controls, data classification, PII masking, and audit trails to comply with DPDP Act requirements.
- Design and implement a data quality framework that includes automated profiling, anomaly detection, schema enforcement, and freshness monitoring across thousands of datasets.
- Manage the data catalog and discovery platform, providing metadata management, lineage visualization, business glossary, and search tools to support over 1,000 users.
- Build consent management infrastructure to monitor, enforce, and audit user consent signals throughout the comprehensive “Phygital” retail ecosystem (online and offline).
- Drive enterprise-wide data standards by defining naming conventions, rules for SOR deduplication, master data alignment, and data contract enforcement between producing and consuming teams.
Minimum Qualifications (All Pillars)
- 14 to 20 years of professional experience in software engineering, data engineering, or ML infrastructure, including a minimum of 3 years leading a platform team of 5 or more engineers.
- 8 to 12 years of hands-on experience in building and scaling data or ML platforms such as Lakehouse architectures, Feature Stores, Streaming Engines, or MLOps pipelines.
- Strong technical expertise within the Databricks ecosystem or similar distributed data platforms (e.g., Spark, Presto/Trino, Flink, or Kafka at scale), with a strong preference for Databricks experience.
- Proven “builder-leader” approach: actively involved in code review, production debugging, and architectural decision-making without fully delegating technical responsibilities.
- Experience operating within large and complex technology organizations featuring inherited teams, cross-functional dependencies, and enterprise-grade compliance requirements.
- Bachelor’s or Master’s degree in Computer Science, Data Science, or a related discipline, or equivalent expertise acquired through industry experience and open-source contributions.

Preferred Qualifications
- Previous experience managing India-scale data platforms handling multi-billion events per day, petabyte-scale data warehouses, or real-time serving at over 10,000 queries per second.
- Hands-on experience with MLflow, Mosaic AI, or similar ML infrastructure platforms at production level, not limited to experimentation phases.
- Familiarity with retail or e-commerce data domains such as product catalogs, inventory management, order processing, customer behavior signals, or supply chain datasets.
- Demonstrated success in building internal tooling or developer platforms that have gained widespread organic adoption within large engineering organizations.
- Experience with FinOps practices including DBU/compute cost attribution, chargeback modeling, and enterprise-scale cloud cost optimization.
- Knowledge of Indian data privacy regulations (DPDP Act) or global frameworks (GDPR, CCPA) in the context of data platform governance.

Organisation Context

This position reports directly to the VP & Head of Data & ML Platforms, who in turn reports to the Head of Enterprise IT, and ultimately to the CEO. You will collaborate as a peer with three other AVPs within the Data & ML Platforms group and work closely with more than 10 AI-ready Platform Engineers at Architect and Principal levels, alongside the transitioning Data & Platforms Engineers cohort. The broader Enterprise IT division comprises five additional L2 groups: CISO/Cybersecurity, HR/Finance/Legal Platforms, SAP-Core, Systems & AI Architects, and CIO + Cloud & Infrastructure.

Must-have skills: Data & ML Platform, Databricks, Platform Architecture
Good-to-have skills: MLOps, System Architecture, Retail
