Member of Technical Staff — Model Optimization and Inference (New Grad)
$200,000–$300,000 per year
On-site · Seattle, Washington, United States
Job Summary
Member of Technical Staff focused on Model Optimization and Inference for a full-duplex multimodal real-time AI system. Responsibilities include:
- Optimizing inference end to end across the model stack (LLMs, audio models, diffusion components)
- Implementing and tuning KV cache strategies for long-context conversations
- Extending inference serving frameworks (vLLM, SGLang, TensorRT-LLM)
- Profiling and benchmarking latency and throughput
- Building internal tooling for profiling and test harnesses
- Accelerating diffusion model inference with caching and kernel optimizations
- Applying quantization to reduce memory footprint while preserving quality
- Collaborating with research and infrastructure to ship optimized models from day one
Required Qualifications
- BS, MS, or PhD in CS, ML, or a related field — completed or in the final stretch
- Strong fundamentals in LLM inference or ML systems
- Exposure to inference serving frameworks (e.g., vLLM, SGLang, TensorRT-LLM)
- Strong Python and PyTorch skills
- A systematic approach to profiling and optimization
- Curiosity about diffusion inference, quantization, or other inference-time acceleration techniques
Desired Qualifications
- Familiarity with CUDA or Triton is a plus
- Experience with model optimization, latency reduction, and real-time inference
- Open-source contributions or internship/research experience in ML systems welcome
- Curiosity about diffusion inference and quantization techniques
- Bonus: internship or research in LLM inference or model serving
- Experience delivering under real-time latency SLAs
- Willingness to grow in a fast-paced, research-to-production environment