Data Scientist
On-site · Bengaluru, Karnataka, India
Job Summary
The Data Scientist (MLOps & Analytics Governance) owns the end-to-end MLOps lifecycle on GCP: model packaging, cloud deployment, monitoring, and automated retraining pipelines built with Vertex AI, MLflow, or Kubeflow. You will define and enforce data quality governance across all ML feature pipelines and training datasets, including schema contracts, null checks, and drift detection. You will also validate model outputs for statistical soundness, monitor prediction drift, and trigger automated retraining when thresholds are breached.
You will work with large-scale IoT sensor datasets from industrial equipment (air compressors, rotating machinery) to build scalable time-series and fault-detection pipelines, and collaborate with data engineers, domain experts, and product managers to translate requirements into scalable data science solutions. You will use Generative AI coding assistants to accelerate development, generate boilerplate and unit tests, and improve code quality.
Mandatory skills: production-grade Python, optimised SQL for large workloads on GCP/AWS, model versioning and deployment automation, data quality governance, and insights validation. Desired skills: domain knowledge of industrial machinery, IoT protocols, dbt-like transformations, and experience with Gen AI tools for code generation and review.
The role is based in Bengaluru and focuses on implementing robust MLOps and governance to ensure reliable ML deployment and actionable analytics across manufacturing and IoT contexts.
Required Qualifications
- Hands-on experience in data science, ML engineering, or applied AI roles with strong focus on production systems
- Deep ownership of MLOps – CI/CD for ML, model versioning, deployment automation, drift monitoring, and retraining pipelines on GCP (Vertex AI) or AWS (SageMaker)
- Advanced SQL proficiency – writing and reviewing optimised, cost-efficient SQL, including partitioning, clustering, query plan analysis, and scalable transformation design for large-scale workloads
- Strong Python skills for writing and reviewing production-grade ML code – feature engineering, batch scoring, and inference pipelines using scikit-learn, TensorFlow, PyTorch, or Pandas
- Hands-on experience implementing data quality governance – schema contracts, automated profiling, pipeline-level validation, lineage tracking, and quality scorecards integrated into ML workflows
- Proven ability to perform insights validation – identifying data leakage, biased model evaluations, distributional shifts, and statistically unsound conclusions prior to stakeholder delivery
- Strong grounding in statistical modeling – regression, classification, time-series forecasting, hypothesis testing, and model behaviour under distributional shift
- Familiarity with IoT data architectures – streaming pipelines, time-series databases (InfluxDB, TimescaleDB), and high-frequency sensor data processing at scale
- Experience with version control (Git), code review workflows, and working in agile cross-functional teams alongside data engineers and product managers