Research Engineer, Pretraining Scaling
$350,000–$850,000 per year
On-site · San Francisco, California, United States
Job Summary
Research Engineer on the ML Performance and Scaling team, responsible for production pretraining pipelines. Own core aspects of the production pretraining process, including model operations, performance optimization, observability, reliability, and incident response during model launches. Debug across the full stack, from hardware and networking to training dynamics and evaluation infrastructure. Design and run experiments to improve training efficiency, reduce step time, increase uptime, and improve model performance. Build and maintain production logging, monitoring dashboards, and evaluation infrastructure, and add capabilities to the training codebase (e.g., long-context support, novel architectures). Collaborate with teams in San Francisco and London, and contribute to institutional knowledge by documenting systems and approaches.

Candidates should have hands-on experience training large language models or deep expertise with JAX, TPU, PyTorch, or large-scale distributed systems; enjoy both research and engineering work; be comfortable being on call during launches; thrive when priorities shift frequently on high-impact work; and communicate effectively across time zones. This role is based in the San Francisco office, with a compensation range of $350,000–$850,000 USD annually.
Required Qualifications
- Minimum education: Bachelor's degree in a field relevant to the role, or an equivalent combination of education, training, and/or experience
- Minimum years of experience: varies with the internal job level for the position
Desired Qualifications
- Hands-on experience training large language models
- Deep expertise with JAX, TPU, PyTorch, or large-scale distributed systems
- Experience with production ML systems, observability tools, or evaluation infrastructure
- Ability to work across time zones and during high-stress incidents
- Strong communication and collaboration skills