Software Engineer, Inference - Multi Modal
On-site · San Francisco, California, United States
Job Summary
Join OpenAI's Inference team to design and implement high-performance infrastructure for serving multimodal models at scale. You will collaborate with researchers and product teams to optimize systems for real-time audio and image processing. Responsibilities include making system-level improvements, turning experimental workflows into reliable production services, and solving complex data-handling problems. Ideal candidates have experience with LLMs or multimodal models, an understanding of GPU-based ML workloads, and a passion for innovative, cross-functional work.
Required Qualifications
- Experience building and scaling inference systems for LLMs or multimodal models
- Experience with GPU-based ML workloads
- Familiarity with inference tooling such as vLLM, TensorRT-LLM, or custom model-parallel systems
Desired Qualifications
- Experience working with image generation or audio synthesis models in production
- Exposure to distributed ML training or system-efficient model design
Additional Requirements
- Qualified applicants with arrest or conviction records will be considered for employment consistent with applicable laws.