Performance Engineer, GPU
$280,000–$850,000 per year
Hybrid · San Francisco, California, United States
Job Summary
The GPU Performance Engineer architects and implements foundational GPU performance systems to maximize utilization and inference efficiency for large language models. Responsibilities range from low-level tensor core optimization to coordinating thousands of GPUs in distributed environments, with opportunities to develop custom kernels, co-design attention mechanisms, and optimize end-to-end training and inference pipelines. Preferred skills include CUDA, Triton, and CUTLASS; Flash Attention; tensor core optimization; NCCL and NVLink; mixed precision; and experience with production-scale ML infrastructure. Visa sponsorship is offered. The role is hybrid, based in San Francisco, California, with in-office presence expected at least 25% of the time.
Required Qualifications
- Bachelor’s degree or equivalent
- Experience with GPU programming and optimization at scale
- Strong collaboration and pair-programming skills
- Experience with distributed systems and multi-node GPU clusters
- Proficiency in GPU kernel development and optimization techniques
- Familiarity with ML frameworks and compilers (e.g., PyTorch, JAX, XLA)