Intermedia · 1 day ago

Principal SRE

Hybrid · London, England, United Kingdom or Bristol, England, United Kingdom

Type: Full Time
Level: Senior Level
Education: Not Specified
Company size: Unknown

Job Summary

Principal SRE responsible for reliability engineering, partnering with Engineering teams to design resilient services, architectures, and deployment patterns. Define and promote SRE practices including SLIs, SLOs, error budgets, capacity planning, incident response, and post-incident learning. Identify systemic reliability risks and work with teams to address root causes. Help reduce operational toil through automation, tooling, and better engineering practices.

Architecture & Engineering Partnership

Work actively with Engineering teams during design, development, and production-readiness reviews. Advise and challenge teams on service architecture, fault tolerance, scalability, observability, deployment safety, and operational readiness, helping them to make pragmatic trade-offs. Support teams in diagnosing complex performance, latency, throughput, and resource-utilisation issues. Help establish engineering standards and reusable patterns for reliable, maintainable services.

Performance & Observability

Lead investigations into performance bottlenecks across applications, infrastructure, databases, queues, networks, and third-party dependencies. Improve observability through metrics, logs, traces, dashboards, alerting, and service-level indicators. Help teams design meaningful alerts that identify user-impacting issues while reducing noise. Drive capacity planning and load-testing practices for critical systems.

Platform, Automation & Tooling

Build and improve automation, deployment tooling, infrastructure-as-code, monitoring, and reliability platforms. Contribute to CI/CD improvements, release safety, rollback strategies, and progressive delivery practices. Develop tools that help Engineering teams self-serve reliability, diagnostics, and operational insights. Improve cloud, container, and orchestration environments with a focus on security, reliability, and scalability.

Incident Management & Operational Excellence

Participate in incident response for high-priority production issues. Lead or contribute to blameless post-incident reviews. Ensure actions from incidents result in improvements to architecture, tooling, monitoring, or process. Mentor engineers on production ownership and operational best practices.

What you will bring to the role

Experience in Site Reliability Engineering or senior backend/software engineering roles.
Software engineering background, with the ability to write clean, maintainable production code.
Experience working with Engineering teams to influence architecture and improve production readiness.
Understanding of distributed systems, scalability, resiliency patterns, failure modes, and performance engineering.
Experience diagnosing complex production issues across application and infrastructure layers.
Hands-on experience with cloud platforms such as AWS, Azure, or GCP.
Hands-on experience with on-premises environments and virtualization.
Experience with containers and orchestration technologies; Kubernetes is a must.
Knowledge of observability tooling, including metrics, logging, tracing, dashboards, and alerting.
Experience with infrastructure-as-code tools such as Terraform.
Experience with CI/CD pipelines and safe deployment practices.
Strong scripting or programming skills in languages such as Python, Go, Java, C#, JavaScript/TypeScript, or similar.
Clear and structured communication skills, with the ability to explain complex technical issues to engineering and leadership audiences.

Diversity, Inclusion, and Equal Opportunity

We hire based on ability to perform job responsibilities, without regard to protected characteristics. We are an equal opportunity employer and value diversity.
