Researcher, Connectors - Agent Post-Training
$250,000–$380,000 per year
Remote · United States
Job Summary
- Design and run experiments to improve agentic model behavior for complex software and plugins.
- Own end-to-end improvements to the post-training stack: RL, data pipelines, graders, reward signals, evals, diagnostics, and model-behavior analysis.
- Build evals and environments that expose model failures, and translate those failures into data, product fixes, or new research directions.
- Partner with the Codex and ChatGPT product teams to translate user needs into model improvements.
- Work on data mixtures, synthetic data, and evaluation loops to shape downstream agent behavior.
- Decide which integrations and fixes are ready for major model runs.
- Improve the machinery for large-scale training and production readiness.
- Take on cross-functional projects spanning model training, product infrastructure, and the production agent harness, including multi-agent systems.
- Debug hard failures in shipped models, turning qualitative behavior into concrete hypotheses, experiments, and fixes.

The required background includes ML fundamentals and hands-on experience with LLMs, RL, and production ML systems. Candidates should be excited by open-ended problems and able to move from a hypothesis to actionable experiments. Compensation ranges from $250K to $380K USD.
Required Qualifications
- Knowledge of machine learning, software engineering, or a related field
- Experience with LLMs, RL, RLHF/RLAIF, post-training, evals, graders, synthetic data, model training, or production ML systems
- Ability to design experiments and build pipelines for model improvement
- Ability to work across research, product, infrastructure, data, evals, and safety boundaries
- Strong communication and collaboration skills