Site Reliability Engineer - Canada Wide - Remote
Remote · Toronto, Ontario, Canada or CA
Type: Full Time
Level: Mid Level
Education: Not Specified
Company size: Small
Job Summary
Seeking a Site Reliability Engineer to enhance the reliability, resilience, and operational readiness of services. Responsibilities include improving fault tolerance, managing incidents, supporting critical services, defining SLIs/SLOs/SLAs, enhancing observability, and leading post-mortems. Candidates should have experience with scalable system design, AWS, chaos engineering, scripting, and debugging production systems.
Required Qualifications
- Experience designing and operating scalable, reliable systems in AWS or a similar cloud environment
- Experience handling on-call shifts for critical systems
- Experience with chaos engineering (e.g., Gremlin)
- Ability to dive in and debug live production systems
- Enjoy working in a growing system and writing and deploying code with zero downtime
- Experience with scripting and/or development (e.g., Linux shell, Python, JavaScript, Java)
- Self-starter who takes initiative in ambiguous situations, preferably within a start-up environment