NVIDIA · 2 days ago

Senior Inference Engineer, AIConfigurator for Dynamo

$184,000–$356,500 per year

Remote · California, United States or Santa Clara, California, United States

Type: Full Time
Level: Senior Level
Education: Doctorate or Professional Degree
Company size: Enterprise

Job Summary

NVIDIA is hiring a Senior Inference Engineer to build and evolve AIConfigurator's core optimization engine for LLM serving, including configuration search, SLA-aware ranking, efficiency and latency estimation, and Pareto frontier analysis. In this role you will develop production-quality Python/Rust APIs, CLIs, SDK surfaces, and web workflows that help users generate deployment configurations for NVIDIA GPU clusters (Dynamo, Kubernetes, TensorRT-LLM, vLLM, SGLang). You will integrate performance databases, profiling data, and validation tools, and collaborate with inference runtime, performance, benchmarking, and product teams so that simulated results align with actual deployment performance on NVIDIA platforms (H100, H200, B200, GB200). You will also drive software quality through maintainable architecture, tests, documentation, and automation for open-source and production users, and translate advanced concepts such as prefill/decode disaggregation, tensor parallelism, pipeline parallelism, and KV cache behavior into dependable software abstractions.

Required Qualifications

  • BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Math, or related field, or equivalent experience
  • 10+ years of relevant software engineering experience
  • Strong Python/Rust engineering skills, including production APIs, CLIs, packaging, testing, and debugging
  • Experience with GPU computing, distributed systems, ML infrastructure, or high-performance model serving
  • Understanding of LLM inference concepts such as batching, latency, efficiency, memory constraints, parallelism strategies, and serving SLAs
  • Experience with data-driven performance analysis, benchmarking, simulation, optimization, or managing resource needs
  • Ability to collaborate across research, runtime, platform, and customer-facing engineering teams
  • Strong written and verbal communication skills