Miratech12 days ago

Site Reliability Engineer with Splunk

On-site · Lucknow, Uttar Pradesh, India

Type: Full Time
Level: Mid Level
Education: Bachelor's Degree
Company size: Large

Job Summary

This role is for a Junior Observability Engineer who will design, implement, and optimize enterprise observability solutions across applications, infrastructure, and cloud environments.

Responsibilities include developing dashboards, alerts, and telemetry frameworks for real-time system visibility; building automation to eliminate repetitive tasks; enabling runbook automation, self-healing capabilities, and automated incident triage; defining SLIs/SLOs and alerting strategies to improve service reliability; driving reductions in MTTD/MTTR through telemetry-driven insights; implementing proactive monitoring, anomaly detection, and predictive alerting; leveraging AIOps for alert correlation and intelligent incident response; integrating observability platforms with CI/CD, cloud services, and ITSM tools such as ServiceNow; and collaborating with engineering, product, and operations teams to establish observability standards and operational readiness practices.

Required: 3+ years in Observability/SRE; hands-on experience with Splunk, Dynatrace, Grafana, and OpenTelemetry; strong AWS/GCP knowledge; Python; MELT (metrics, events, logs, and traces); Terraform; and SRE/observability fundamentals and incident response.

Nice to have: experience with Kubernetes, ServiceNow integrations, and related certifications.

The role supports a Work From Anywhere culture with relocation benefits and professional development opportunities.

Required Qualifications

Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience)
3+ years of experience in Observability Engineering, Site Reliability Engineering, or related domains
Hands-on experience with observability platforms such as Splunk, Dynatrace, Grafana, and OpenTelemetry
Strong knowledge of AWS and GCP, and familiarity with cloud-native architectures
Proficiency in Python for automation and operational tooling
Experience implementing metrics, logs, events, and distributed tracing (MELT) across distributed systems
Hands-on experience with Terraform and Infrastructure as Code practices
Strong understanding of SLIs, SLOs, alerting strategies, and incident response frameworks
Excellent troubleshooting, communication, and collaboration skills
