Site Reliability Engineer with Splunk
On-site · Kanpur, Uttar Pradesh, India
Job Summary
The Site Reliability Engineer with Splunk will design, implement, and optimize enterprise observability across applications, infrastructure, and cloud environments. Responsibilities include building end-to-end observability solutions; developing dashboards, alerts, and telemetry frameworks; enabling runbook automation and self-healing workflows; defining SLIs/SLOs and alerting strategies to improve service reliability; and driving MTTD/MTTR improvements through telemetry-driven insights. The role emphasizes proactive monitoring, anomaly detection, and predictive alerting using AIOps capabilities; integrating observability platforms with CI/CD, cloud services, and ITSM tools (ServiceNow); and collaborating across engineering, product, and operations teams. Candidates should have hands-on experience with Splunk, Dynatrace, Grafana, and OpenTelemetry; strong AWS/GCP skills; Python for automation; experience with MELT telemetry (metrics, events, logs, and distributed traces); Terraform/IaC; and a Bachelor's degree (or equivalent experience). Nice-to-have items include experience with AIOps, Kubernetes, additional observability tools, related certifications, and remote-work adaptability. The role offers remote work flexibility, professional development, a relocation program, and a globally distributed team environment.
Required Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience)
- 3+ years of experience in Observability Engineering, Site Reliability Engineering, or related domains
- Hands-on experience with observability platforms such as Splunk, Dynatrace, Grafana, and OpenTelemetry
- Strong knowledge of AWS and GCP, including familiarity with cloud-native architectures
- Proficiency in Python for automation and operational tooling
- Experience implementing metrics, events, logs, and distributed traces (MELT) across distributed systems
- Hands-on experience with Terraform and Infrastructure as Code practices
- Strong understanding of SLIs, SLOs, alerting strategies, and incident response frameworks
- Excellent troubleshooting, communication, and collaboration skills