OpenAI logo
OpenAI8 months ago

Training Performance Engineer

Hybrid · San Francisco, California, United States

Type
Full Time
Level
Mid Level
Education
Not Specified
Company size
Large
Industry
AI Services

Job Summary

As a Training Performance Engineer, you will drive efficiency improvements across the distributed training stack by analyzing large-scale training runs and identifying utilization gaps. Responsibilities include profiling end-to-end training runs, optimizing GPU utilization, collaborating with engineers to enhance kernel efficiency, and building monitoring tools. Ideal candidates should love optimizing performance and understanding system interactions, have strong programming skills, and experience with distributed systems. Familiarity with ML frameworks and communication libraries is also beneficial.

Required Qualifications

  • Strong programming skills in Python and C++ (Rust or CUDA a plus).
  • Experience running distributed training jobs on multi-GPU systems or HPC clusters.
  • Exposure to frameworks like PyTorch, JAX, or TensorFlow and an understanding of how large-scale training loops are built.
  • Comfortable collaborating across teams and translating raw profiling data into practical engineering improvements.

Desired Qualifications

  • Familiarity with NCCL, MPI, or UCX communication libraries.
  • Experience with large-scale data loading and checkpointing systems.
  • Prior work on training runtime, distributed scheduling, or ML compiler optimization.

Additional Requirements

  • Background checks for applicants will be administered in accordance with applicable law.
  • Qualified applicants with arrest or conviction records will be considered for employment consistent with applicable laws.
Sorce

Apply with one swipe on Sorce. We auto-fill applications and apply on your behalf — no cover letters, no 40-minute forms.

Hiring someone like this?

Get your role in front of qualified candidates on Sorce.

Get started

OpenAI

Training Performance Engineer

Apply on Sorce