Staff Software Engineer / Tech Lead, ML Infrastructure
$190,000–$250,000 per year
Hybrid · San Francisco, California, United States
Job Summary
Technical anchor for a small, high-impact team building Data and ML infrastructure to support large-scale machine learning in precision healthcare. Lead system design, code reviews, and unblock technical hurdles while remaining hands-on in the codebase. Write high-performance Python code and leverage Ray to architect and maintain distributed computing platforms for ML training and evaluation. Drive deployment of complex ML algorithms into scalable cloud environments and design robust cloud-data systems for massive unstructured medical datasets. Collaborate with researchers and engineers to understand model development and production monitoring needs, and use AI-powered development tools to accelerate workflows. This role combines technical leadership with hands-on IC work to guide the team through architectural decisions and implementation details.
Required Qualifications
- 8+ years of professional software engineering experience with a strong focus on ML infrastructure, distributed systems, or MLOps
- demonstrated history of mentoring peers and leading technical projects
- ability to write clear, well-tested, and scalable Python code
- deep understanding of modern distributed computing architectures and cloud data workloads (AWS, GCP, or Azure)
- familiarity with infrastructure as code (e.g., CDK, Terraform)
- familiarity with modern distributed computing frameworks, orchestration platforms, and table formats (e.g., Ray, Kubernetes, Apache Iceberg)
- knowledge of cross-language bindings for high-performance computing (e.g., C++/Python)
- prior experience in the healthcare domain, highly regulated environments, or image-based algorithms (preferred)