Universal Music France · 2 weeks ago

Service Reliability Eng

On-site · London, England, United Kingdom

Type: Full Time
Level: Mid Level
Education: Bachelor's Degree
Company size: Enterprise
Industry: Digital Media

Job Summary

Site Reliability Engineer responsible for the reliability, scalability, and performance of critical global systems. Design, build, and maintain monitoring, alerting, and observability; automate infrastructure provisioning, deployments, and scaling; manage on-call incidents and drive post-incident reviews. Partner with engineering, IT, and security teams to embed SRE practices, such as service level objectives (SLOs) and error budgets, into the lifecycle, and ensure that the services connecting artists and fans remain available, scalable, and efficient.

Required Qualifications

  • A strong background in systems administration (Linux/Windows) in a large-scale environment
  • Proficiency in at least one programming language (e.g., Python, Go, Java)
  • Hands-on experience with a major cloud platform (AWS, GCP, or Azure)
  • Solid understanding of networking, containers (Docker, Kubernetes), and Infrastructure as Code (e.g., Terraform, Ansible)
  • Experience with modern monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk, Dynatrace)
  • Proven analytical and problem-solving abilities with experience in a high-pressure environment
  • Excellent communication skills and the ability to foster a collaborative team environment