Senior Site Reliability Engineer
Remote · Colombia
Job Summary
The Senior Site Reliability Engineer is responsible for:
- Defining and maintaining SLIs/SLOs
- Leading incident response and postmortems
- Automating operations tasks
- Building and evolving CI/CD pipelines with canary/blue-green deployments
- Driving reliability improvements across architecture, monitoring, and operational processes
- Implementing observability systems
- Optimizing cloud resource usage
- Deploying containerized workloads with Docker/Kubernetes
- Collaborating with development teams on resilience patterns
- Participating in high-availability and disaster-recovery discussions
- Mentoring mid/senior SREs
Required Qualifications
- 5–8 years of experience in a reliability or operations role
- Cloud-agnostic certification: Terraform Associate, Certified Kubernetes Administrator (CKA), or SRE Foundation
- Cloud provider certification: AWS/Azure/GCP/Oracle Cloud professional-level certification
- Solid coding skills in Python, Go, or equivalent
- Experience with IaC, CI/CD pipelines, and monitoring/observability stacks (Prometheus, Grafana, OpenTelemetry, ELK)
- Experience with distributed systems and production-scale services
- Nice-to-have: multi-cloud data replication or cross-cloud networks
- Nice-to-have: chaos engineering or fault injection