Senior DevOps SRE Engineer
On-site · Ramat Gan, Tel Aviv, Israel
Job Summary
The Senior DevOps SRE Engineer is responsible for the reliability, observability, and performance of Mastercard's Site Reliability/DevOps platforms. Key responsibilities include designing end-to-end observability (monitoring, logging, distributed tracing), managing incidents, automating deployment and infrastructure, tuning performance on Kubernetes, and guiding reliability-first practices. The role requires working with AWS, Kubernetes, Linux, and monitoring stacks (Prometheus, Thanos, ELK/Loki, Grafana, Jaeger), as well as experience with AI/ML/LLM observability tools. Strong collaboration and communication skills are required.
Required Qualifications
- 5+ years in SRE/DevOps/Production Engineering roles
- Deep expertise in AWS, Kubernetes, Linux
- Experience deploying and tuning monitoring tools such as Prometheus and Thanos, and time-series databases for metrics
- Experience with logging using the ELK stack, Loki, Grafana, or alternatives
- Experience with distributed tracing (OpenTelemetry, Tempo, Jaeger)
- Strong incident management practices
- Automation for deployment and infrastructure management
- Excellent communication and collaboration skills
- Experience designing, creating, and maintaining AI-powered workflows (agent skills, prompts, instruction sets)
- Familiarity with performance optimization and cost governance in cloud environments
- Experience supporting AI/ML or LLM-based systems and related observability/evaluation tools