Senior Lead Site Reliability Engineer
On-site · Glasgow, Scotland, United Kingdom
Job Summary
The Senior Lead Site Reliability Engineer guides reliability, observability, and performance across large-scale platforms. In this role you will develop production code for reliability tooling and telemetry pipelines, lead incidents and drive blameless postmortems, and define and implement SLOs and SLIs. You will design, deploy, and maintain OpenTelemetry-based telemetry ingestion and processing in hybrid on-prem/cloud environments with backends such as InfluxDB, Prometheus, Elasticsearch, and OpenSearch, and migrate legacy telemetry to standardized instrumentation. You will also mentor engineers and influence broader engineering practices to advance observability and reliability technologies. The role requires strong programming, cloud-native, container orchestration, and incident-response skills.
Required Qualifications
- Formal training or certification in software engineering concepts
- Advanced knowledge of reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices
- Advanced proficiency in one or more programming languages (e.g., Java, Python, Go)
- Advanced proficiency with observability tools (Grafana, Dynatrace, Prometheus, Datadog, Splunk, Elasticsearch, OpenSearch)
- Proficiency with CI/CD tools (Jenkins, GitLab, Terraform)
- Experience with containers and container orchestration (Docker, ECS, Kubernetes)
- Hands-on experience with OpenTelemetry collectors in production
- Ability to tackle reliability design independently
- Practical cloud-native experience
- Ability to collaborate across stakeholder groups
- Knowledge of distributed tracing, metrics, and logging best practices
- Certification in AWS, Kubernetes, or relevant technologies
- Track record in system health monitoring, capacity management, and blameless postmortems
- Understanding of distributed system design principles, networking, and Linux internals
- Contributions to open-source observability or telemetry projects
- Experience with agent control planes and management protocols such as OpAMP