Block · 3 weeks ago

Senior Site Reliability Engineer

Block

Remote · New South Wales, Australia

Type

Full Time

Level

Senior Level

Education

Not Specified

Company size

Enterprise

Job Summary

The Senior Site Reliability Engineer is responsible for improving the reliability of Block's platform and critical infrastructure, both proactively and reactively. You will build and extend platforms, standardize reliability tooling, triage and lead the stabilization of high-severity incidents, and drive AI-enabled observability and automation to reduce toil. You will serve in primary on-call rotations, coordinate incident response, and implement safe deployment patterns (progressive delivery, automated rollback, guardrails). The role requires production on-call experience, strong incident management skills, and fluency with a modern tech stack including Kubernetes, CI/CD, AWS, DataDog, and related tooling. Block emphasizes distributed systems reliability, blameless postmortems, and evidence-based maturity assessments using trailing data. Remote work is supported, and the role spans multiple time zones. Block is an equal opportunity employer committed to inclusivity and to accommodations during recruitment.

Required Qualifications

5+ years of software development experience
Production on-call experience
CI/CD and deployment automation experience
Experience with observability and monitoring tooling
Kubernetes, AWS, Terraform
DataDog
Istio/Envoy
MySQL/Vitess/DynamoDB
HTTP, JSON, gRPC, Protocol Buffers
Strong incident management and root cause analysis skills

Desired Qualifications

5+ years of software development experience
Experience running production on-call
Familiarity with AI-driven tooling for observability, incident analysis, or automation
CI/CD pipelines, progressive rollout strategies, and rollback automation
On-call leadership and incident management skills
Monitoring and observability expertise (alerts for uptime, error rates, latency, resource exhaustion)
Experience with Kubernetes, Terraform, Istio/Envoy, AWS
DataDog
LaunchDarkly
MySQL/Vitess/DynamoDB
HTTP, JSON, gRPC, Protocol Buffers
Strong grasp of fault-tolerance and deployment-safety patterns
Ability to create evidence-based maturity assessments from data
Vendor/dependency management skills
Autonomy and accountability
Strong communication and teamwork during high-pressure incidents
