Site Reliability Engineer
Hybrid · Rockville, Maryland, United States
Job Summary
Join the team supporting CMS as it merges and modernizes enterprise knowledge and data systems into a single AI-driven platform. You will operate and tune AWS environments to maintain availability, build observability with CloudWatch/New Relic/Splunk, and implement infrastructure-as-code with Terraform/Ansible. You will also support CI/CD pipelines and containerized workloads, define SLIs/SLOs, generate performance and bottleneck reports, and optimize for performance and security with AWS tools. On the compliance side, you will assist in security and compliance modernization toward a Continuous ATO within RMF/IS2P2 boundaries and contribute to disaster recovery and COOP planning. You will own incidents end to end, drive blameless post-mortems, and implement preventative fixes while collaborating with security and compliance teams.
Required Qualifications
- Bachelor’s degree in computer science, engineering, or a related field (or equivalent hands-on experience)
- 3–5 years of experience in site reliability, systems, or cloud engineering
- Solid working knowledge of core AWS services, architecture, and best practices
- Hands-on experience with infrastructure-as-code tools (Terraform, Ansible, or CloudFormation)
- Good understanding of CI/CD pipelines and automation tools (Jenkins, GitLab CI, or similar)
- Comfort scripting and automating in Python
- Familiarity with monitoring and observability tooling (CloudWatch, New Relic, Splunk, or comparable)
- Strong problem-solving skills and ability to work under pressure
- Clear communication skills