Principal Site Reliability Engineer (Remote)
Remote · United Kingdom
Job Summary
We are seeking a Principal Site Reliability Engineer to own and optimize the AWS-hosted infrastructure that delivers our B2B products. Responsibilities include managing, enhancing, and troubleshooting AWS environments, driving CI/CD practices, guiding multi-tenant migrations from on-premises to AWS, and supporting the on-call rotation. The role emphasizes scalable cloud operations, incident response, capacity planning, and collaboration with cross-functional teams to deliver reliable, well-documented systems. Required skills include extensive AWS experience, software engineering in an OO language (e.g., .NET or Python), infrastructure as code (AWS CDK, CloudFormation), ECS, PostgreSQL, RabbitMQ, Docker, Bash/PowerShell scripting, and familiarity with automation and monitoring tools. Preference is given to candidates with a dual systems/infrastructure and software engineering background, experience with Pulumi and Azure DevOps, and Agile/Scrum experience.
Required Qualifications
- Bachelor's degree or advanced degree in a relevant STEM field
- Extensive experience delivering and operating complex, large-scale cloud systems (AWS preferred)
- Software engineering background or coding skills in an OO language (e.g., .NET or Python)
- Experience with infrastructure as code using AWS CDK
- Experience with AWS ECS, PostgreSQL, and RabbitMQ
- Experience with Bash and/or PowerShell scripting for infrastructure
- Experience with Docker containerization
- Experience with system design consulting, platform management, and capacity planning on AWS, including CloudFormation
- Experience with Scrum or the Scaled Agile Framework (SAFe)
- Willingness to collaborate across global teams
- Sound security and documentation practices