Senior Site Reliability Engineer, Wikimedia Enterprise
$116,633–$181,243 per year
Remote
Job Summary
The Senior Site Reliability Engineer will:
- Define, track, and improve SLOs/SLIs and error budgets for critical APIs
- Build and enhance observability (metrics, logs, distributed tracing) and CI/CD/GitOps workflows
- Drive reliability practices (capacity planning, load testing, chaos testing)
- Improve developer experience (DevEx) with self-service infrastructure
- Embed reliability best practices early in development
- Design and operate secure-by-default, cost-efficient infrastructure across cloud platforms
- Track operational metrics (MTTR, MTTD, incident frequency)
- Reduce toil through automation
- Contribute to internal platform capabilities
- Collaborate with a global, asynchronously communicating team
- Mentor peers
Familiarity with open source and the Wikimedia context is a plus.
Required Qualifications
- Automation & Configuration Management: Infrastructure as Code, Terraform, Ansible
- Cloud Infrastructure across AWS, Azure, or GCP; scalability, reliability, cost efficiency
- CI/CD pipelines and GitOps workflows (GitLab, ArgoCD); progressive delivery (canary, blue-green)
- Incident Management & Reliability Operations; on-call practices; postmortems
- SRE Principles & Observability: SLOs, SLIs, error budgets; metrics/logs/distributed tracing (Prometheus, OpenTelemetry)
- Collaboration & Communication in a distributed environment