JPMorgan Chase · 3 weeks ago

Senior Lead Site Reliability Engineer

On-site · Glasgow, Scotland, United Kingdom

Type: Full Time
Level: Senior Level
Education: Not Specified
Company size: Enterprise
Industry: Financial Services

Job Summary

The Senior Lead Site Reliability Engineer guides reliability, observability, and performance across large-scale platforms. Responsibilities include developing production code for reliability tooling and telemetry pipelines, leading incidents and driving blameless postmortems, and defining and implementing SLOs/SLIs. The role designs, deploys, and maintains OpenTelemetry-based telemetry ingestion and processing in hybrid on-prem/cloud environments, with backends such as InfluxDB, Prometheus, Elasticsearch, and OpenSearch, and migrates legacy telemetry to standardized instrumentation. It also involves mentoring engineers and influencing broader engineering practices to advance observability and reliability technologies. The position requires strong programming, cloud-native, container orchestration, and incident-response skills.

Required Qualifications

  • Formal training or certification on software engineering concepts
  • Advanced knowledge of reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices
  • Advanced proficiency in one or more programming languages (e.g., Java, Python, Go)
  • Advanced proficiency with observability tools (Grafana, Dynatrace, Prometheus, Datadog, Splunk, Elasticsearch, OpenSearch)
  • Proficiency with CI/CD tools (Jenkins, GitLab, Terraform)
  • Experience with container orchestration (ECS, Kubernetes, Docker)
  • Hands-on experience with OpenTelemetry collectors in production
  • Ability to tackle reliability design independently
  • Practical cloud-native experience
  • Ability to collaborate across stakeholder groups
  • Knowledge of distributed tracing, metrics, and logging best practices
  • Certification in AWS, Kubernetes, or relevant technologies
  • Track record in system health monitoring, capacity management, and blameless postmortems
  • Understanding of distributed system design principles, networking, and Linux internals
  • Contributions to open-source observability or telemetry projects
  • Experience with agent control planes and management protocols such as OpAMP