Member of Technical Staff, Inference & RL Systems
$225,000–$550,000 per year
On-site · San Francisco, California, United States
Job Summary
As a Member of Technical Staff on the Inference & RL Systems team, you will design and scale high-performance inference serving systems, optimize KV-cache management, and improve throughput and latency for long-context workloads. You will also build and maintain distributed RL infrastructure, automate fault detection and recovery, and profile systems to eliminate performance bottlenecks. Ideal candidates have strong software engineering fundamentals, experience with large-scale systems, an understanding of GPU constraints, and the ability to reason about system trade-offs between latency, throughput, and cost.
Required Qualifications
- Strong software engineering and distributed systems fundamentals
- Experience building or operating large-scale inference or training systems
- Track record of owning critical production infrastructure
Desired Qualifications
- Deep understanding of GPU execution constraints and memory trade-offs
- Experience debugging performance issues in production ML systems
- Ability to reason about system-level trade-offs between latency, throughput, and cost