OpenAI · 2 months ago

Software Engineer, Infrastructure Reliability

$255,000–$405,000 per year

On-site · San Francisco, California, United States

Type: Full Time
Level: Mid Level
Education: Not Specified
Company size: Large
Industry: AI Services

Job Summary

Join OpenAI's Applied AI Infrastructure team as a Software Engineer focusing on Infrastructure Reliability. You will design, build, and maintain reliable systems that enhance safety and performance. Responsibilities include identifying and fixing performance bottlenecks, improving automation, and collaborating with cross-functional teams to ensure system resilience. Candidates should have extensive knowledge of distributed systems, experience with Kubernetes and cloud infrastructure, and a passion for optimizing performance at scale.

Required Qualifications

  • 4+ years of relevant industry experience
  • 2+ years leading large-scale, complex projects or teams as an engineer or tech lead
  • Proven experience as a reliability engineer or production engineer
  • Strong proficiency in programming and scripting languages
  • Experience with containerization technologies

Desired Qualifications

  • Experience operating orchestration systems such as Kubernetes at scale
  • Strong proficiency in cloud infrastructure (such as AWS, GCP, or Azure)
  • Experience with observability tools such as Datadog, Prometheus, Grafana, Splunk, and the ELK stack
  • Experience with microservices architecture and service mesh technologies

Additional Requirements

  • Background checks for applicants will be administered in accordance with applicable law