Lead Staff Systems Reliability Engineer (Linux & Distributed Systems)
On-site · London, England, United Kingdom
Job Summary
Lead a team to influence, manage, and plan work streams, systems, and data structures at scale within a global ecosystem spanning multiple infrastructure providers. Own operations for Linux-based systems running Aerospike, Kafka, and MongoDB. Serve as a point of contact for reviewing new use cases and answering questions, and participate in the on-call rotation. Develop into a NoSQL subject-matter expert (SME), with training provided. Benchmark and analyze next-generation hardware offerings. Build infrastructure automation for stateful systems at scale, and collaborate with vendors to optimize bleeding-edge technology for internet-scale workloads.
Required Qualifications
- Experience with the Linux operating system
- Leadership experience
- Troubleshooting skills
- Ability to identify bottlenecks (for example, CPU or I/O)
- Participation in the on-call rotation
- Experience leading a team to influence, manage, and plan work streams, systems, and data structures at scale
Desired Qualifications
- Experience leading teams in large-scale infrastructure environments
- Strong Linux system administration and troubleshooting skills
- Experience with NoSQL databases (Aerospike, MongoDB)
- Familiarity with Kafka or similar messaging systems
- Automation and scripting (Python, Bash, or equivalent)
- Experience with monitoring/observability (Prometheus, Grafana)
- Hands-on development as a hardware and NoSQL subject-matter expert (SME)
- Kubernetes administration and container orchestration
- Knowledge of configuration management tools (Ansible, PyInfra, or Chef)
- Strong leadership and mentoring abilities
- On-call and incident management experience