Staff Machine Learning Engineer - ML Training Infrastructure
$185,000–$335,300 per year
Remote · United States or Austin, Texas, United States
Job Summary
Senior Staff Machine Learning Engineer for ML Training Infrastructure, focused on designing and building scalable, reliable, and high-performance AI/ML platform infrastructure to support model training and advanced AI research. You will collaborate with ML engineers and researchers to enable distributed training across heterogeneous hardware, optimize performance, improve observability and user experience, and integrate new features into the platform. The role emphasizes hands-on technical execution; scalable training frameworks that support large models through distributed training, FSDP, and pipeline parallelism; profiling and debugging of training and data-loading performance; and cross-functional collaboration to drive cost efficiency and impact in automotive AI initiatives across GM vehicles.
Required Qualifications
- Bachelor's degree or higher in Computer Science or an equivalent major, or equivalent relevant experience
- 5+ years of professional software engineering experience
- 3+ years of specialized experience in AI/ML infrastructure
- Strong programming skills in Python, with proficiency in PyTorch (preferred), TensorFlow, or similar
- Experience with distributed computing, GPU computing, and cloud environments (AWS, GCP, Azure)
- Willingness to travel to Sunnyvale, CA as needed