NVIDIA · 2 days ago

Senior Inference Engineer, AIConfigurator for Dynamo

NVIDIA

$184,000–$356,500 per year

Remote · California, United States or Santa Clara, California, United States

Type

Full Time

Level

Senior Level

Education

Doctorate Or Professional Degree

Company size

Enterprise

Job Summary

Senior Inference Engineer to build and evolve AIConfigurator's core optimization engine for LLM serving, including configuration search, SLA-aware ranking, efficiency and latency estimation, and Pareto frontier analysis. Develop production-quality Python/Rust APIs, CLIs, SDK surfaces, and web workflows to help users generate deployment configurations for NVIDIA GPU clusters (Dynamo, Kubernetes, TensorRT-LLM, vLLM, SGLang). Integrate performance databases, profiling data, and validation tools; collaborate with inference runtime, performance, benchmarking, and product teams to ensure simulated results align with actual deployment performance on NVIDIA platforms (H100, H200, B200, GB200). Drive software quality via maintainable architecture, tests, documentation, and automation for open-source and production users. Translate advanced concepts such as prefill/decode disaggregation, tensor parallelism, pipeline parallelism, and KV cache behavior into dependable software abstractions.

Required Qualifications

BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Math, or a related field, or equivalent experience
10+ years of relevant software engineering experience
Strong Python/Rust engineering skills, including production APIs, CLIs, packaging, testing, and debugging
Experience with GPU computing, distributed systems, ML infrastructure, or high-performance model serving
Understanding of LLM inference concepts such as batching, latency, efficiency, memory constraints, parallelism strategies, and serving SLAs
Experience with data-driven performance analysis, benchmarking, simulation, optimization, or managing resource needs
Ability to collaborate across research, runtime, platform, and customer-facing engineering teams
Strong written and verbal communication skills
