Linux Site Reliability Engineer (SRE)
On-site · Cyberjaya, Selangor, Malaysia
Job Summary
This Linux Site Reliability Engineer (SRE) role is based in Cyberjaya, Malaysia, and focuses on designing and delivering a reliable, scalable, secure, and performant Red Hat Linux platform as part of OCBC Bank Infrastructure as a Service. The engineer will stay current on technical trends, participate in a 24/7 on-call rotation, drive toil elimination, and lead improvements in observability, monitoring, deployment design, and testing. Required experience includes 6+ years on Red Hat Linux; scripting or programming (e.g., Python, Ruby, Java, C++, C#, Go); automation and CI/CD; cloud-native infrastructure (AWS/Azure); infrastructure as code (Terraform or CloudFormation); relational and NoSQL databases; and configuration management tools (Ansible, Puppet, Chef, SCCM). The role also emphasizes strong communication, teamwork, documentation, disaster recovery, security hardening, audit support, and working with vendors and professional services. The position is with OCBC Malaysia in Cyberjaya and offers a competitive salary and benefits package.
Required Qualifications
- Bachelor’s degree and/or equivalent experience in Information Technology, Computer Science or Business Management
- More than 6 years of relevant experience on the Red Hat Linux platform
- Install, maintain, upgrade, and patch UNIX servers across the organization
- Troubleshoot and fix system and software/hardware issues
- Support and maintain high availability of systems using clustering software
- Secure the systems by following published hardening guidelines
- Assist in audit and compliance tasks
- Perform Disaster Recovery activities
- Demonstrable web development experience, including the ability to write APIs
- Intermediate-level proficiency in at least one of Python, Ruby, Java, C++, C#, or Go
- Experience querying relational and NoSQL databases
- Experience automating releases and working with continuous integration/delivery systems and related infrastructure tooling
- Experience with or understanding of Software Defined Data Centre, AWS/Azure-based cloud-native infrastructure, and managed services
- Experience with infrastructure as code (Terraform or CloudFormation)
- Knowledge of configuration management systems such as SCCM, Ansible, Puppet, or Chef
- Excellent communication and teamwork skills
- Proactive incident management and problem-solving abilities
- Ability to work independently and with vendors and professional services