Miratech · 12 days ago

Site Reliability Engineer with Splunk

On-site · Kanpur, Uttar Pradesh, India

Type: Full Time
Level: Entry Level
Education: Bachelor's Degree
Company size: Large

Job Summary

This role is for a Site Reliability Engineer with Splunk experience who will design, implement, and optimize enterprise observability across applications, infrastructure, and cloud environments.

Responsibilities include building end-to-end observability solutions; developing dashboards, alerts, and telemetry frameworks; enabling runbook automation and self-healing workflows; defining SLIs/SLOs and alerting strategies to improve service reliability; and driving MTTD/MTTR improvements through telemetry-driven insights. The role emphasizes proactive monitoring, anomaly detection, and predictive alerting using AIOps capabilities; integrating observability platforms with CI/CD, cloud services, and ITSM tools (ServiceNow); and collaborating across engineering, product, and operations.

Candidates should have hands-on experience with Splunk, Dynatrace, Grafana, and OpenTelemetry; strong AWS and GCP skills; Python for automation; experience with metrics, events, logs, and distributed tracing (MELT); Terraform and Infrastructure as Code; and a Bachelor’s degree (or equivalent experience). Nice-to-have items include experience with AIOps, Kubernetes, additional observability tools, related certifications, and adaptability to remote work.

The role offers remote work flexibility, professional development, a relocation program, and a globally distributed team environment.

Required Qualifications

Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience)
3+ years of experience in Observability Engineering, Site Reliability Engineering, or related domains
Hands-on experience with observability platforms such as Splunk, Dynatrace, Grafana, and OpenTelemetry
Strong expertise in AWS and GCP, with familiarity with cloud-native architectures
Proficiency in Python for automation and operational tooling
Experience implementing metrics, logs, events, and distributed tracing (MELT) across distributed systems
Hands-on experience with Terraform and Infrastructure as Code practices
Strong understanding of SLIs, SLOs, alerting strategies, and incident response frameworks
Excellent troubleshooting, communication, and collaboration skills
