Model Serving Engineer
$100,000–$150,000 year
Remote · United States
Job Summary
Model Serving Engineer to design, build, and operate high-performance, highly reliable inference platforms for serving large machine learning models in production. Focus areas include request routing, batching, caching, autoscaling, GPU utilization, and end-to-end observability across diverse model workloads (LLMs, vision models, and recommendations). Responsibilities include multi-tenant routing, rate limiting, QoS policies, canary releases and automated rollback, incident response for high-availability AI services, security controls at the serving layer, and collaboration with ML and product teams to support model releases. Required skills include distributed systems expertise, proficiency in Python and a systems language (Go, Rust, or C++), experience with LLM inference frameworks (vLLM, TensorRT-LLM), GPU architecture knowledge, Kubernetes and cloud platforms, observability tooling, and performance/cost trade-off optimization.
Required Qualifications
- Bachelor’s or Master’s degree in Computer Science or a related field
- Six or more years of experience in distributed systems, infrastructure, or ML platform engineering
- Strong proficiency in Python and a systems language such as Go, Rust, or C++
- Deep experience operating high-throughput, low-latency services in production
- Hands-on experience with LLM or large model inference frameworks such as vLLM or TensorRT-LLM
- Strong understanding of GPU architecture, memory hierarchies, and accelerator utilization
- Familiarity with Kubernetes, autoscaling, and modern cloud platforms
- Experience with observability stacks including metrics, tracing, and structured logging
- Solid grounding in performance engineering and capacity planning
- Strong communication and incident response skills
Apply with one swipe on Sorce. We auto-fill applications and apply on your behalf — no cover letters, no 40-minute forms.
Hiring someone like this?
Get your role in front of qualified candidates on Sorce.