Senior Cloud, DevOps, Site Reliability Engineer (For Pooling)
On-site · Manila, Metro Manila, Philippines
Job Summary
We are building a talent pool of senior Cloud, DevOps, and Site Reliability Engineering (SRE) professionals in Manila. In these roles, you will:
- Design, build, and support secure, scalable, highly available cloud infrastructure and platform services
- Develop and maintain Infrastructure as Code, automation scripts, and deployment pipelines to improve consistency and reduce manual work
- Build, manage, and optimize CI/CD pipelines for faster, safer releases
- Support production environments through monitoring, alerting, observability, troubleshooting, and performance tuning
- Participate in incident response, post-incident reviews, and disaster recovery planning
- Define and track SLIs, SLOs, SLAs, and other service reliability metrics
- Create and maintain runbooks, dashboards, and recovery procedures
- Collaborate with software engineering, security, infrastructure, and product teams to improve platform reliability and delivery speed
- Apply best practices in access control, security, compliance, availability, scalability, and cost optimization
- Contribute to continuous improvement across tooling, standards, architecture, and operational processes
- For more senior-level opportunities, provide technical leadership, mentorship, and guidance on cloud, DevOps, and SRE practices
Join our talent pool if you are based in Manila and open to future opportunities; candidates may be considered for new roles across engineering teams.
Required Qualifications
- 5+ years of experience in SRE, DevOps, Cloud Engineering, Infrastructure Engineering, Platform Engineering, or related roles
- Strong hands-on experience with AWS and production-grade cloud infrastructure
- Solid experience with Terraform or similar Infrastructure as Code tools
- Experience with Docker, Kubernetes, or EKS in production environments
- Strong background in CI/CD tools such as Jenkins, GitHub Actions, GitLab CI, or similar
- Experience with monitoring and observability tools such as Datadog, ELK, Splunk, or equivalent
- Scripting or automation experience using Python, Bash or other shell scripting, PowerShell, Java, Groovy, or similar
- Good understanding of Linux/Unix systems administration, networking, distributed systems, and security fundamentals
- Experience supporting production systems, including incident response, troubleshooting, and on-call responsibilities
- Solid understanding of reliability engineering concepts, including monitoring, alerting, service health, and recovery
- Strong communication and collaboration skills
- Experience mentoring engineers or acting as a senior technical point of contact