Saviynt · 3 weeks ago

AI Platform Engineer, Training and Inference

On-site · Milpitas, California, United States

Type: Full Time
Level: Mid Level
Education: Bachelor's Degree
Company size: Unknown

Job Summary

Own end-to-end ML platform responsibilities for AI training and inference: manage distributed training with Ray on H100 GPU clusters; build and operate the LLM inference mesh using vLLM, SGLang, and NVIDIA Triton; and oversee the full model promotion lifecycle (shadow mode, then canary, then GA). Design and optimize the routing layer and the RL training pipelines and workflows; implement scalable inference with autoscaling and fractional GPU allocation; integrate RAG retrieval into the inference mesh; maintain data flows through GCS and S3; and ensure robust deployments through QA, canary releases, and golden-signal checks. Requires strong Python and PyTorch skills, plus experience with Ray (Train, Serve, Core, and Data), LLM serving engines, distributed training, and ML lifecycle tooling.

Required Qualifications

Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
ML engineering experience, including time in an ML platform or MLOps role
Production Ray depth: Ray Train, Serve, Core, and Data
LLM serving engines: hands-on with vLLM, SGLang, or NVIDIA Triton
Distributed training: DDP, FSDP, NCCL, mixed precision
RL working knowledge: PPO, policy gradient, or RLHF
MLflow or equivalent model registry experience
Vector databases: Pgvector or Qdrant
Strong Python and PyTorch; Flyte or equivalent ML orchestrator

Desired Qualifications

Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
ML engineering experience, including time in an ML platform or MLOps role
Production Ray depth: Ray Train, Serve, Core, and Data
Hands-on with vLLM, SGLang, or NVIDIA Triton
Distributed training experience (DDP, NCCL, FP16/BF16, etc.)
RL training knowledge (PPO, policy gradient, RLHF)
MLflow or equivalent model registry experience
Experience with vector databases (Pgvector or Qdrant) and embedding upserts
Strong Python and PyTorch
Experience with Flyte or similar ML orchestrator
Experience with GPU/cluster management and autoscaling
Understanding of model lifecycles, canary testing, and golden signals
Experience with QA/testing for ML deployment pipelines
Knowledge of S3/GCS data management and GPU-direct data streaming