OpenAI · 8 months ago

Software Engineer, Inference – AMD GPU Enablement

OpenAI

On-site · San Francisco, California, United States

Type

Full Time

Level

Mid Level

Education

Not Specified

Company size

Large

Industry

AI Services

Job Summary

OpenAI is hiring engineers to scale and optimize its inference infrastructure on emerging GPU platforms, with a focus on AMD. Responsibilities include debugging and optimizing distributed inference workloads, validating performance on large GPU clusters, and collaborating with other teams to optimize GPU kernels and collective communication libraries. Required skills include GPU kernel development, an understanding of communication libraries, and experience with distributed systems. Ideal candidates enjoy solving complex performance challenges in a fast-paced environment.

Required Qualifications

Experience writing or porting GPU kernels using HIP, CUDA, or Triton
Familiarity with communication libraries like NCCL/RCCL
Experience with distributed inference systems
Ability to solve end-to-end performance problems spanning hardware and system libraries
Enthusiasm for building infrastructure from first principles

Desired Qualifications

Contributions to open-source libraries like RCCL, Triton, or vLLM
Experience with GPU performance tools (Nsight, rocprof, perf) and memory/comms profiling
Prior experience deploying inference on other non-NVIDIA GPU environments
Knowledge of model/tensor parallelism, mixed precision, and serving 10B+ parameter models

Additional Requirements

Background checks for applicants will be administered in accordance with applicable law
