AI Infrastructure & Experience Engineer
$145,600–$164,320 per year
On-site · Mountain View, Santa Clara County, California, United States
Job Summary
AI Infrastructure & Experience Engineer to deploy and optimize multiple LLMs and generative models on local inference hardware. Responsibilities include inference optimization through quantization and caching, CUDA-based kernel development, bridging inference backends with orchestration layers and frontends (e.g., OpenWebUI), rapid prototyping of demos to showcase model memory and context-aware web search, and integrating local AI compute with peripheral devices. Requires hands-on experience with NVIDIA ecosystems, ARM64, C++, Python, Rust, CUDA, llama.cpp/TensorRT-LLM/Ollama, FastAPI, Docker/Kubernetes, and frontend tooling (React/Next.js). Minimum 3 years of relevant experience and a CS-related degree preferred.
Required Qualifications
- Recent experience in model optimization
- Hardware & Compute: Proven experience with NVIDIA ecosystems and ARM64 architecture
- Systems Programming: Proficiency in C++, Python, and Rust; CUDA experience with custom kernels
- AI/ML Frameworks: Experience with llama.cpp, TensorRT-LLM, Ollama; orchestration frameworks like LiteLLM
- Software Engineering: FastAPI, Docker/Kubernetes, sandbox environments, low-latency API design
- Full-Stack Prototyping: Frontend UIs with React/Next.js or similar