Umusic · 6 days ago

Service Reliability Engineer

Umusic

On-site · Sydney, New South Wales, Australia

Type

Full Time

Level

Mid Level

Education

Bachelors Degree

Company size

Unknown

Job Summary

The Site Reliability Engineer at Universal Music Group in Sydney leads the reliability, scalability, and performance of global services within a follow-the-sun framework aligned to Australian business hours. Focus areas include building robust monitoring and observability (CloudWatch, Dynatrace), automating deployments and scaling, maintaining CI/CD pipelines, managing incidents, performing root cause analysis, and embedding SRE best practices (SLOs, error budgets) across engineering teams. The role requires Linux/Windows systems administration, programming (Python, Go, or Java), cloud experience (AWS preferred), containers (Docker, Kubernetes), and Infrastructure as Code (Terraform, Ansible); familiarity with Prometheus, Grafana, Datadog, Splunk, and Dynatrace; strong problem-solving and communication skills; and willingness to work a Monday-to-Sunday roster with weekend office presence.

Required Qualifications

Strong background in systems administration (Linux/Windows) in a large-scale environment
Proficiency in at least one programming language (Python, Go, Java)
Hands-on experience with a major cloud platform (AWS, GCP, or Azure; AWS preferred)
Solid understanding of networking, containers (Docker, Kubernetes), and Infrastructure as Code (Terraform, Ansible)
Experience with modern monitoring and observability tools (Prometheus, Grafana, Datadog, Splunk, Dynatrace)
Proven analytical and problem-solving abilities in high-pressure environments
Excellent communication skills and ability to foster a collaborative team environment
Bachelor's degree in an IT-related field (preferred)
