Research Scientist, Safety Post-Training
$216,000–$270,000 per year
On-site · San Francisco, California, United States
Job Summary
Research Scientist focused on Safety Post-Training. Responsibilities include designing and running post-training pipelines to study how training choices affect model safety and alignment; developing interpretability-informed evaluations to understand unsafe or undesirable model behaviors and guide mitigations; and collaborating with policymakers, engineers, and researchers to translate findings into actionable safety standards, evaluation benchmarks, and best practices. Ideal candidates have experience with post-training and reinforcement learning (RL) techniques (e.g., RLHF, DPO, GRPO), a track record of ML research in generative AI, and strong cross-functional communication skills.
Required Qualifications
- At least three years of experience addressing sophisticated ML problems, whether in a research setting or product development
- Experience with post-training and RL techniques such as RLHF, DPO, GRPO, and similar approaches
- A track record of published research in machine learning, particularly in generative AI
- Strong written and verbal communication skills for working in a cross-functional team