Lead Machine Learning Engineer
$152,600–$190,700 per year
Remote · Boston, Massachusetts, United States
Job Summary
The Lead ML Engineer is responsible for designing, building, and operating the ML platform surface used by data scientists, covering model packaging, deployment, batch and real-time inference, and observability.
- Establish and evangelize ML platform standards, patterns, and reusable components to raise the engineering bar for ML models across 40+ divisions.
- Mentor data scientists and engineers on production ML practices and perform code reviews of platform-adjacent work.
- Own model serving infrastructure on AWS SageMaker (including Unified Studio) for batch endpoints and serverless inference as appropriate.
- Build and maintain the model registry, version control, and promotion workflows from development to production with full lineage and auditability.
- Stand up retraining pipelines using MLflow, Weights & Biases, and orchestration tools, and automate retraining triggers, experiment tracking, model evaluation, and approval gates.
- Build monitoring and alerting for production models (drift detection, data quality, latency/cost anomalies).
- Write modular Python and infrastructure-as-code (Terraform) for ML platform components with testing, versioning, and code review.
- Partner with data scientists to accelerate production workflows and reduce time-to-production.
- Collaborate with the Data/Platform Engineering and AI Engineering teams to ensure seamless integration of feature pipelines, model artifacts, and inference services.
Required Qualifications
- Bachelor’s degree or higher in Computer Science, Engineering, or a related technical field
- 7+ years of software engineering experience with production ownership of cloud services or platforms
- 5+ years of hands-on MLOps or ML platform experience
- Strong hands-on experience with AWS SageMaker (Unified Studio strongly preferred)
- Experience with MLflow, Weights & Biases, or similar tooling
- Strong Python skills and experience with Terraform or another infrastructure-as-code (IaC) tool
- Understanding of batch and real-time inference, latency, throughput, and cost tradeoffs
Desired Qualifications
- MLOps
- AWS SageMaker
- SageMaker Unified Studio
- MLflow
- Weights & Biases
- Python
- Terraform
- infrastructure-as-code
- batch inference
- real-time inference
- model serving
- model registry
- version control
- retrieval/retraining pipelines
- production-grade ML infrastructure
- cloud experience
- collaboration with data scientists
- observability
- monitoring
- drift detection