Senior Engineer, Inference Control Plane
$139,000–$174,000 per year
Hybrid · Seattle, Washington, United States
Job Summary
We are hiring a Senior Engineer to design, build, and optimize serverless inference infrastructure and APIs at scale. You will develop multi-tenant services powering AI inference and intelligent routing, operate high-scale distributed systems with strong reliability and performance goals, and improve observability, capacity management, automation, and tooling. You will collaborate with platform, GPU infrastructure, and product engineering teams to deliver production-grade systems and highly available APIs, contribute to architecture decisions around traffic management and service orchestration, and participate in on-call rotations to improve service health and reduce incidents. Bonus: familiarity with modern LLM serving architectures, engines like vLLM or Triton, API gateways or service meshes, and inference-optimization workloads.
Required Qualifications
- 5+ years of experience building and operating multi-tenant platforms or distributed backend systems
- Strong experience operating high-scale distributed services in production environments
- Deep understanding of SRE principles, including observability, incident management, reliability engineering, capacity planning, and operational automation
- 1+ years of hands-on experience with Go / Golang in production systems
- 1+ years of experience with Kubernetes
- Strong understanding of cloud-native architectures, microservices, and distributed systems fundamentals
- Experience debugging performance, scalability, and reliability issues in production systems
- Experience tracking infrastructure and inference metrics such as Time To First Token (TTFT), Time Per Output Token (TPOT), and GPU utilization