Research Engineer, Pretraining Scaling - London
On-site · London, England, United Kingdom
Job Summary
The Research Engineer will own critical aspects of Anthropic’s production pretraining pipeline, including model operations, performance optimization, observability, and reliability. Responsibilities include debugging and resolving cross-stack issues, designing and running experiments to improve training efficiency and uptime, responding to on-call incidents during model launches, building and maintaining production logging and evaluation infrastructure, collaborating across the SF and London teams, and contributing to institutional knowledge.
Required Qualifications
- Bachelor’s degree in a field relevant to the role
- Hands-on experience training large language models or deep expertise with JAX/TPU, PyTorch, or large-scale distributed systems
- Willingness to be on-call for production systems and during launches
- Experience debugging across the full stack (hardware, networking, training dynamics, evaluation infrastructure)
- Ability to design and run experiments to improve training efficiency and uptime
Desired Qualifications
- Bachelor's degree or equivalent in a relevant field
- Hands-on experience training large language models or deep expertise with JAX, TPU, PyTorch, or large-scale distributed systems
- Willingness to be on-call for production systems and work long hours during launches
- Ability to debug across the full stack (hardware, networking, training dynamics, evaluation infrastructure)
- Experience designing and running experiments to optimize training efficiency and uptime
- Strong communication and collaboration across time zones and teams