Manager, Site Reliability Engineering
Hybrid · Noida, Uttar Pradesh, India
Job Summary
The SRE Manager leads reliability, incident management, observability, and automation for enterprise platforms. The role owns end-to-end reliability of production systems, manages a 24x7 incident management team, drives blameless RCAs, and improves monitoring and automation as Monotype expands AI-driven workloads. It leads a team of ~14 engineers across operations and SRE excellence, partnering with Engineering, Product, and Platform teams to improve release quality, production readiness, and cost efficiency. Requirements include a Bachelor’s degree in a related field, 10+ years of SRE experience, strong hands-on AWS and Kubernetes experience, and proficiency with monitoring tools (Datadog, CloudWatch, ELK, Prometheus, Grafana). The position is hybrid, based in Noida, India, and emphasizes reliability, scalability, and cross-functional collaboration.
Required Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related field
- 10+ years of SRE experience, including a proven record of managing production systems and 24x7 operations teams
- Strong hands-on experience with AWS and Kubernetes (EKS preferred)
- Experience with monitoring/observability tools (Datadog, CloudWatch, ELK, Prometheus, Grafana)
- Experience driving automation and reducing operational toil
- Understanding of microservices-based architectures
- Strong knowledge of release processes and production readiness practices
- Strong understanding of SLAs, SLIs, SLOs, and reliability metrics
- Certification in relevant technologies (e.g., AWS, Kubernetes) is a plus
- Strong leadership skills with experience managing and mentoring teams