Senior Production Engineer (Reliability)
$182,000–$242,000 per year
On-site · New York City, New York, United States or San Francisco, California, United States
Job Summary
We are seeking a Senior Production Engineer to take hands-on ownership of critical systems and frameworks, driving their architecture, implementation, deployment, and long-term evolution to improve availability, scalability, and operational automation. You will lead end-to-end delivery of projects, build and maintain observability and automated remediation, participate in incident response and root-cause investigations, improve runbooks and deployment workflows, reduce operational toil through automation and refactoring, ship production code in Python or Go, and collaborate with platform teams to ensure new features integrate reliably. The role requires 7+ years of engineering experience, strong Python or Go skills, Kubernetes and cloud-native expertise, and experience with modern observability stacks and incident lifecycle practices.
Required Qualifications
- 7+ years of engineering experience building and operating distributed systems or cloud platforms
- Demonstrated ability to debug complex production issues end-to-end, across services, infrastructure layers, and automation
- Strong programming or scripting ability (Python, Go, or similar), with experience shipping and operating production services and tools
- Deep knowledge of cloud-native technologies and distributed system patterns, particularly Kubernetes
- Experience with modern observability stacks: metrics, tracing, structured logs, SLOs/SLIs, and incident lifecycle practices
- A track record of successfully delivering hands-on reliability improvements through engineering execution