Senior Lead Site Reliability Engineer
On-site · Plano, Texas, United States
Job Summary
As a Senior Lead Site Reliability Engineer, you will drive reliability across large-scale critical applications by setting quality gates, defining measurable SLOs/SLIs, and leading major incident response and post-incident improvements. You will own and improve logging, monitoring, alerting, and CI/CD controls to reduce toil and increase release confidence, and collaborate with global teams to ensure availability, performance, security, and observability. You will also lead automation, runbook, and resilience efforts, mentor engineers, and contribute to dashboards and capacity planning, with the goal of lowering MTTR and change failure rate.
Required Qualifications
- 10+ years supporting critical applications in large-scale environments
- Experience leading and mentoring engineers and teams
- Strong grounding in SDLC and secure development practices
- Experience implementing objective quality gates and release readiness standards
- Hands-on SRE experience, including SLIs/SLOs, error budgets, incident management, and post-incident reviews/root-cause analysis
- Experience designing actionable monitoring, logging, and dashboards (e.g., Splunk, AppDynamics, or equivalent), including alert tuning
- Experience with CI/CD pipelines, automated testing (unit, integration, security), and operational controls that reduce change risk
- Calm, accountable incident leadership under pressure
- Strong communication and stakeholder management
- Comfortable collaborating with global teams and engaging during critical incidents outside standard business hours