Principal Site Reliability Engineer
$151,600–$245,300 per year
Hybrid · California, United States
Job Summary
The Principal Site Reliability Engineer is responsible for designing, building, and operating reliable and secure cloud infrastructure for a large hybrid environment. You will contribute to automation and tooling, manage Kubernetes clusters, maintain monitoring and alerting, participate in on-call rotations, perform root cause analyses, provide mentorship on SRE practices, and collaborate with developers, researchers, data scientists, and security teams. The role emphasizes production-ready services, scalability, reliability, and automation across cloud platforms (GCP/AWS), and follows a hybrid work model (office-based with flexibility).
Required Qualifications
- BS or MS in Computer Science or related field or equivalent professional experience
- Experience with configuration management (e.g., Ansible, Terraform, Helm)
- Proficient in Python and/or Go
- Experience managing Kubernetes workloads with autoscaling
- Experience in Production Engineering/DevOps/SRE
- Experience with public cloud (GCP or AWS), especially GCP
- Strong Linux administration and network troubleshooting
- Programming experience in Python, Go, and shell scripting
- Experience with CI/CD pipelines (GitLab, GitHub)
Additional Requirements
- Eligible for immigration sponsorship: Yes