Machine Learning Engineer, LLM Evals & Observability

Glean

$200,000–$300,000 per year

Hybrid · San Francisco, California, United States

Type: Full Time
Level: Mid Level
Education: Not Specified
Company size: Unknown
Industry: AI Software

Job Summary

Design and curate evaluation datasets with diverse queries and golden sets to reliably cover real assistant behavior
Build and maintain large-scale evaluation pipelines that measure assistant quality across thousands of real user queries
Develop LLM-powered judges to score metrics such as correctness, completeness, and response quality, aligned with human judgment
Evaluate new models and product changes before shipping to provide quality signals that gate launches and prevent regressions
Build observability infrastructure for AI agents, including trace enrichment, data pipelines, and dashboards to make agent behavior inspectable
Close the loop between quality measurement and product improvement using evaluation results and customer feedback
Collaborate with engineers across the company to make evals a first-class part of how we ship

Required Qualifications

2+ years of software engineering experience
Strong backend fundamentals in Go and Python
Experience with LLM evaluation, reinforcement learning from human feedback, NLP, or other large systems involving ML
Analytically rigorous mindset with a focus on offline metrics and real user impact
Ability to work in a customer-focused, cross-functional environment
Commitment to quality in systems and products

This role has closed.