Software Engineer, Inference – AMD GPU Enablement
On-site · San Francisco, California, United States
Job Summary
This role is for engineers who will scale and optimize OpenAI’s inference infrastructure on emerging GPU platforms, with a focus on AMD. Responsibilities include debugging and optimizing distributed inference workloads, validating performance on large GPU clusters, and working with other teams to optimize GPU kernels and collective communication libraries. Required skills include GPU kernel development, familiarity with communication libraries such as NCCL/RCCL, and experience with distributed inference systems. Ideal candidates enjoy solving complex performance problems in a fast-paced environment.
Required Qualifications
- Experience writing or porting GPU kernels using HIP, CUDA, or Triton
- Familiarity with communication libraries like NCCL/RCCL
- Experience with distributed inference systems
- Ability to diagnose and solve end-to-end performance problems across hardware and system libraries
- Enthusiasm for building infrastructure from first principles
Desired Qualifications
- Contributions to open-source libraries like RCCL, Triton, or vLLM
- Experience with GPU performance tools (Nsight, rocprof, perf) and memory/comms profiling
- Prior experience deploying inference on non-NVIDIA GPU environments
- Knowledge of model/tensor parallelism, mixed precision, and serving 10B+ parameter models
Additional Requirements
- Background checks for applicants will be administered in accordance with applicable law