Senior Site Reliability Engineer
$90,489–$132,711 per year
Hybrid · Toronto, Ontario, Canada
Job Summary
This Senior Site Reliability Engineer role is based in Toronto, ON, under a hybrid work model (4 days onsite). Responsibilities include:
- Designing, building, and improving CI/CD pipelines
- Provisioning and maintaining AWS infrastructure with Infrastructure as Code (IaC)
- Handling on-call triage and incident resolution
- Leading cross-team reliability initiatives, including disaster recovery and security compliance
- Deploying and managing Docker containers on ECS/EKS
- Automating monitoring and incident response with Splunk, CloudWatch, New Relic, and Harness
- Collaborating with software and data engineers to embed SRE best practices such as SLOs, error budgets, and capacity planning
- Writing automation scripts in Python/Bash
- Documenting infrastructure and runbooks
- Using AI-assisted development tools to accelerate workflows
Required Qualifications
- 5+ years of experience in Site Reliability Engineering, DevOps, or cloud infrastructure roles supporting production systems
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
- Strong hands-on experience with AWS cloud services (EC2, S3, ECS/EKS, Lambda, RDS, VPC, IAM, Route 53, CloudWatch)
- Proficiency with Infrastructure as Code tools such as Terraform, CDK, or CloudFormation
- Experience building and maintaining CI/CD pipelines using tools such as Jenkins, Harness, GitHub Actions, or similar platforms
- Strong working knowledge of Docker containers and container orchestration platforms
- Proficiency in scripting languages such as Python or Bash for automation and operational tooling
- Solid understanding of Linux/Unix system administration and networking fundamentals
- Experience with monitoring, logging, and alerting tools such as Splunk, New Relic, CloudWatch, or Datadog
- Knowledge of SRE principles, including SLIs/SLOs, error budgets, incident management, and post-incident review processes
- Experience using AI-assisted development tools (e.g., GitHub Copilot, Claude Code)
Nice to Have
- AWS certifications (e.g., Solutions Architect, DevOps Engineer, SysOps Administrator)
- Experience designing or supporting disaster recovery and business continuity strategies
- Experience with security compliance frameworks and with implementing security best practices in cloud environments
- Experience with serverless architectures using AWS Lambda, SAM, or the Serverless Framework
- Experience supporting distributed engineering teams across multiple time zones
- Exposure to data pipeline infrastructure or platforms used for large-scale data processing
- FinOps certification or experience with cloud financial management and cost optimization practices