AI Platform Engineer, Training and Inference
On-site · Milpitas, California, United States
Job Summary
Own the ML platform for AI training and inference end to end: run distributed training with Ray on H100 GPU clusters; build and operate the LLM inference mesh on vLLM, SGLang, and NVIDIA Triton; and manage the full model promotion lifecycle, from shadow mode to canary to general availability (GA). Design and optimize the routing layer and the RL training pipelines and workflows; implement scalable inference with autoscaling and fractional GPU allocation; integrate RAG retrieval into the inference mesh; maintain data flows through GCS and S3; and keep deployments reliable through QA, canary, and golden-signal checks. The role requires strong Python and PyTorch skills, plus experience with Ray (Train, Serve, Core, and Data), LLM serving engines, distributed training, and ML lifecycle tooling.
Required Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
- Experience in ML engineering, including time in an ML platform or MLOps role
- Production experience with Ray: Ray Train, Serve, Core, and Data
- LLM serving engines: hands-on with vLLM, SGLang, or NVIDIA Triton
- Distributed training: DDP, FSDP, NCCL, mixed precision
- Working knowledge of RL: PPO, policy gradient, or RLHF
- Experience with MLflow or an equivalent model registry
- Vector databases: Pgvector or Qdrant
- Strong Python and PyTorch skills; experience with Flyte or an equivalent ML orchestrator
Desired Qualifications
- Distributed training experience (DDP, NCCL, FP16/BF16, etc.)
- Experience with vector databases (Pgvector or Qdrant) and embedding upserts
- Experience with GPU/cluster management and autoscaling
- Understanding of model lifecycles, canary testing, and golden signals
- Experience with QA/testing for ML deployment pipelines
- Knowledge of S3/GCS data management and GPU-direct data streaming