Umusic · 6 days ago

Service Reliability Engineer

On-site · Sydney, New South Wales, Australia

Type: Full Time
Level: Mid Level
Education: Bachelor's Degree
Company size: Unknown

Job Summary

The Site Reliability Engineer at Universal Music Group in Sydney leads the reliability, scalability, and performance of global services within a follow-the-sun framework aligned to Australian business hours. Focus areas include building robust monitoring and observability (CloudWatch, Dynatrace), automating deployments and scaling, maintaining CI/CD pipelines, managing incidents, performing root cause analysis, and embedding SRE best practices (SLOs, error budgets) across engineering teams.

The role requires Linux/Windows systems administration, programming in Python, Go, or Java, cloud experience (AWS preferred), containers (Docker, Kubernetes), and Infrastructure as Code (Terraform, Ansible). Familiarity with Prometheus, Grafana, Datadog, Splunk, and Dynatrace is expected, along with strong problem-solving and communication skills. Candidates must be willing to work a Monday-to-Sunday roster that includes weekend office presence.

Required Qualifications

  • Strong background in systems administration (Linux/Windows) in a large-scale environment
  • Proficiency in at least one programming language (Python, Go, Java)
  • Hands-on experience with a major cloud platform (AWS, GCP, or Azure; AWS preferred)
  • Solid understanding of networking, containers (Docker, Kubernetes), and Infrastructure as Code (Terraform, Ansible)
  • Experience with modern monitoring and observability tools (Prometheus, Grafana, Datadog, Splunk, Dynatrace)
  • Proven analytical and problem-solving abilities in high-pressure environments
  • Excellent communication skills and ability to foster a collaborative team environment
  • Bachelor's degree in an IT-related field (preferred)