Glean · 1 month ago
EXPIRED

Machine Learning Engineer, LLM Evals & Observability

$200,000–$300,000 / year

Hybrid · San Francisco, California, United States

Type: Full Time
Level: Mid Level
Education: Not Specified
Company size: Unknown
Industry: AI Software

Job Summary

  • Design and curate evaluation datasets with diverse queries and golden sets to reliably cover real assistant behavior.
  • Build and maintain large-scale evaluation pipelines that measure assistant quality across thousands of real user queries.
  • Develop LLM-powered judges to score metrics such as correctness, completeness, and response quality, aligned with human judgment.
  • Evaluate new models and product changes before shipping to provide quality signals that gate launches and prevent regressions.
  • Build observability infrastructure for AI agents, including trace enrichment, data pipelines, and dashboards to make agent behavior inspectable.
  • Close the loop between quality measurement and product improvement using evaluation results and customer feedback.
  • Collaborate with engineers across the company to make evals a first-class part of how we ship.

Required Qualifications

  • 2+ years of software engineering experience
  • Strong backend fundamentals in Go and Python
  • Experience with LLM evaluation, reinforcement learning from human feedback, NLP, or other large systems involving ML
  • Analytically rigorous mindset with a focus on offline metrics and real user impact
  • Ability to work in a customer-focused, cross-functional environment
  • Commitment to quality in systems and products