Site Reliability Engineer (SRE) - Azure | DevSecOps | IaC | Governance | Observability
$129,000–$143,000 per year
Remote · New York City, New York, United States, or elsewhere in the US
Job Summary
The Site Reliability Engineer (SRE) will ensure stability, reliability, and performance on Azure and GCP platforms. Responsibilities include 24×7 on-call support, managing incidents, leading root cause analyses, and communicating with teams during major incidents. Ideal candidates should possess hands-on experience with multi-cloud environments, Infrastructure as Code (IaC) tools such as Terraform and Ansible, CI/CD systems like Jenkins and GitHub Actions, and observability tools including Grafana and Datadog. The role demands expertise in deep technical troubleshooting, defining service level objectives, and implementing AI-Ops for improved reliability.
Required Qualifications
- 5+ years in Site Reliability, DevOps, Cloud Operations, or customer support roles
- Demonstrated experience in application-level troubleshooting by analyzing logs and traces to identify bugs, performance bottlenecks, and error conditions
- Expertise in Azure and GCP cloud operations and distributed system reliability
- Understanding of Terraform, Ansible, and CI/CD pipelines (Jenkins, GitHub Actions)
- Experience with observability and AI-Ops tools
- Solid grasp of incident management frameworks (P1–P3 handling, RCA, PIRs, on-call rotations)
- Excellent analytical, troubleshooting, and communication skills
Desired Qualifications
- Proactive Prevention
- AI-Driven Mindset
- Accountability
- Collaboration
- Efficiency
- Continuous Improvement
Additional Requirements
- Applicants must be currently authorized to work in the United States without the need for visa sponsorship now or in the future.