Site Reliability Engineer with Splunk
On-site · Pune, Maharashtra, India
Job Summary
Junior Observability Engineer to design, implement, and optimize enterprise observability solutions across applications, infrastructure, and cloud environments.
Responsibilities include:
- Building dashboards, alerts, and telemetry frameworks for real-time system health
- Developing automation to reduce repetitive tasks
- Enabling runbook automation, self-healing capabilities, and automated incident triage
- Defining SLIs/SLOs and alerting strategies to improve service reliability
- Driving improvements in MTTD/MTTR through telemetry-driven insights
- Proactive monitoring, anomaly detection, and predictive alerting
- Leveraging AIOps for intelligent incident response
- Integrating observability platforms with CI/CD pipelines, cloud services, and ITSM tools such as ServiceNow
- Collaborating with engineering, product, and operations teams to establish observability standards and operational readiness
Requires 3+ years of hands-on experience with Splunk, Dynatrace, Grafana, and OpenTelemetry; strong AWS and GCP knowledge; Python for automation; MELT across distributed systems; Terraform and IaC; knowledge of SLIs/SLOs and incident response; and a Bachelor's degree or equivalent experience.
Nice-to-have: AIOps platforms, Kubernetes, ServiceNow integrations, certifications.
Benefits include a relocation program and a Work From Anywhere culture.
Required Qualifications
- 3+ years of experience in Observability Engineering, Site Reliability Engineering, or related domains
- Hands-on experience with observability platforms such as Splunk, Dynatrace, Grafana, and OpenTelemetry
- Strong knowledge of AWS and GCP, including familiarity with cloud-native architectures
- Proficiency in Python for automation and operational tooling
- Experience implementing metrics, events, logs, and traces (MELT) across distributed systems
- Hands-on experience with Terraform and Infrastructure as Code practices
- Strong understanding of SLIs, SLOs, alerting strategies, and incident response frameworks
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience)
- Excellent troubleshooting, communication, and collaboration skills