Site Reliability Engineer (AWS & Kubernetes), VP
On-site · Bengaluru, Karnataka, India or Gurugram, Haryana, India
Job Summary
Senior Site Reliability Engineer (VP-level) responsible for the reliability, availability, and performance of large-scale AWS/Kubernetes-based production platforms. Key responsibilities:
- Lead adoption of SRE practices, embedding resilience, observability, and operational excellence
- Own 24/7 production support models with on-call leadership
- Design and operate highly resilient AWS-based Kubernetes platforms (EKS)
- Drive incident management, post-incident reviews, and toil reduction
- Implement infrastructure automation with Terraform and GitOps
- Enable self-healing, auto-scaling (Karpenter), secure networking (Cilium), and robust observability (Grafana, Prometheus, Loki, Tempo)
- Build reusable runbooks and golden paths
- Ensure regulatory and security compliance
- Partner with DevOps/engineering teams for production readiness
- Use SLIs/SLOs and metrics to drive continuous improvement
- Provide strong leadership and stakeholder engagement in a fast-paced, regulated environment
Required Qualifications
- Senior-level SRE experience
- Strong expertise with AWS and Kubernetes (EKS)
- 24/7 production support and on-call leadership
- Proficiency with Terraform, GitOps, and cloud automation
- Hands-on experience with GitLab CI/CD and Argo CD
- Kubernetes networking, security, and service mesh (Cilium)
- Observability tooling (Grafana, Prometheus, Loki, Tempo)
- Troubleshooting across distributed systems and cloud-native environments
- Ability to operationalize reliability through SLIs/SLOs and error budgets
- Experience in regulated/high-security environments
- Leadership, mentoring, and stakeholder engagement