Morningstar Japan · 1 day ago

Senior Site Reliability Engineer

$90,489–$132,711 per year

Hybrid · Toronto, Ontario, Canada

Type

Full Time

Level

Senior Level

Education

Bachelors Degree

Company size

Enterprise

Job Summary

The Senior Site Reliability Engineer designs, builds, and improves CI/CD pipelines, provisions and maintains AWS infrastructure with IaC, and handles on-call triage and incident resolution. The role leads cross-team reliability initiatives, including disaster recovery and security compliance, deploys and manages Docker containers with ECS/EKS, and automates monitoring and incident response with Splunk, CloudWatch, New Relic, and Harness. It also involves collaborating with software and data engineers to embed SRE best practices (SLOs, error budgets, capacity planning), scripting automation with Python/Bash, documenting infrastructure and runbooks, and using AI-assisted development tools to accelerate workflows. The position is based in Toronto, ON with a hybrid work model (4 days onsite).

Required Qualifications

5+ years of experience in Site Reliability Engineering, DevOps, or cloud infrastructure roles supporting production systems
Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
Strong hands-on experience with AWS cloud services (EC2, S3, ECS/EKS, Lambda, RDS, VPC, IAM, Route 53, CloudWatch)
Proficiency with Infrastructure as Code tools such as Terraform, CDK, or CloudFormation
Experience building and maintaining CI/CD pipelines using tools such as Jenkins, Harness, GitHub Actions, or similar platforms
Strong working knowledge of Docker containers and container orchestration platforms
Proficiency in scripting languages such as Python or Bash for automation and operational tooling
Solid understanding of Linux/Unix system administration and networking fundamentals
Experience with monitoring, logging, and alerting tools such as Splunk, New Relic, CloudWatch, or Datadog
Knowledge of SRE principles, including SLIs/SLOs, error budgets, incident management, and post-incident review processes
Experience using AI-assisted development tools (e.g., GitHub Copilot, Claude Code)

Nice to Have

AWS certifications (e.g., Solutions Architect, DevOps Engineer, SysOps Administrator)
Experience designing or supporting disaster recovery and business continuity strategies
Experience with security compliance frameworks and implementing security best practices in cloud environments
Experience with serverless architectures using AWS Lambda, SAM, or the Serverless Framework
Experience supporting distributed engineering teams across multiple time zones
Exposure to data pipeline infrastructure or platforms used for large-scale data processing
FinOps certification or experience with cloud financial management and cost optimization practices
