Site Reliability Engineer with Splunk
On-site · Lucknow, Uttar Pradesh, India
Job Summary
We are seeking a Junior Observability Engineer to design, implement, and optimize enterprise observability solutions across applications, infrastructure, and cloud environments. Responsibilities include:
- Developing dashboards, alerts, and telemetry frameworks for real-time system visibility
- Building automation to eliminate repetitive tasks
- Enabling runbook automation, self-healing capabilities, and automated incident triage
- Defining SLIs/SLOs and alerting strategies to improve service reliability
- Driving reductions in MTTD/MTTR through telemetry-driven insights
- Implementing proactive monitoring, anomaly detection, and predictive alerting
- Leveraging AIOps for alert correlation and intelligent incident response
- Integrating observability platforms with CI/CD pipelines, cloud services, and ITSM tools such as ServiceNow
- Collaborating with engineering, product, and operations teams to establish observability standards and operational readiness practices
Required: 3+ years in Observability/SRE; hands-on experience with Splunk, Dynatrace, Grafana, and OpenTelemetry; strong AWS/GCP knowledge; Python; MELT (metrics, events, logs, traces); Terraform; and solid SRE/observability fundamentals, including incident response. Nice to have: Kubernetes experience, ServiceNow integrations, and related certifications. The role supports a Work From Anywhere culture, with relocation benefits and professional development opportunities.
Required Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience)
- 3+ years of experience in Observability Engineering, Site Reliability Engineering, or related domains
- Hands-on experience with observability platforms such as Splunk, Dynatrace, Grafana, and OpenTelemetry
- Strong knowledge of AWS and GCP, with familiarity with cloud-native architectures
- Proficiency in Python for automation and operational tooling
- Experience implementing metrics, events, logs, and traces (MELT) across distributed systems
- Hands-on experience with Terraform and Infrastructure as Code practices
- Strong understanding of SLIs, SLOs, alerting strategies, and incident response frameworks
- Excellent troubleshooting, communication, and collaboration skills