Member of Technical Staff, Site Reliability Engineer
$200,000–$270,000 per year
Remote · United States
Job Summary
The Member of Technical Staff, Site Reliability Engineer builds and ships platform services in Go or TypeScript to improve the reliability of a real-time voice-call platform. Responsibilities include joining the on-call rotation, driving incident command, turning incident learnings into a reliability backlog, defining SLOs for the call-completion path, implementing auto-remediation and capacity forecasters, tuning autoscaling with KEDA, and establishing a postmortem process. You’ll work on capacity planning, load testing against provider rate limits, and building on-call tooling, with an emphasis on reducing p99 call-completion gaps and MTTR. Technologies include Kubernetes production operations (HPA/VPA tuning, PodDisruptionBudgets, graceful shutdown) and Chronosphere/Prometheus/Grafana/Datadog/OpenTelemetry for observability; services you would build include cluster-manager, database-health, wscaler, and incidentManager.