Universal Music France · 2 weeks ago

Service Reliability Eng

On-site · London, England, United Kingdom

Type

Full Time

Level

Mid Level

Education

Bachelors Degree

Company size

Enterprise

Industry

Digital Media

Job Summary

Site Reliability Engineer responsible for the reliability, scalability, and performance of critical global systems. Design, build, and maintain monitoring, alerting, and observability; automate infrastructure provisioning, deployments, and scaling; manage on-call incidents and drive post-incident reviews. Partner with engineering, IT, and security to embed SRE practices (SLOs, error budgets) into the lifecycle, and ensure that services connecting artists and fans remain available, scalable, and efficient.

Required Qualifications

A strong background in systems administration (Linux/Windows) in a large-scale environment
Proficiency in at least one programming language (e.g., Python, Go, Java)
Hands-on experience with a major cloud platform (AWS, GCP, or Azure)
Solid understanding of networking, containers (Docker, Kubernetes), and Infrastructure as Code (e.g., Terraform, Ansible)
Experience with modern monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk, Dynatrace)
Proven analytical and problem-solving abilities with experience in a high-pressure environment
Excellent communication skills and the ability to foster a collaborative team environment
