JPMorgan Chase · 2 months ago

Lead Site Reliability Engineer

On-site · Columbus, Ohio, United States

Type: Full Time
Level: Senior Level
Education: License Or Certification
Company size: Enterprise
Industry: Investment Banking

Job Summary

Lead Site Reliability Engineer at JPMorgan Chase responsible for 24x7 production support and the reliability, scalability, and availability of mission-critical systems. Drive design and deployment approaches with automated CI/CD pipelines, implement infrastructure and network as code, collaborate with software engineers to improve deployment, monitoring, and incident response, and lead adoption of SRE best practices within the team. Provide on-call support and contribute to end-to-end operations, leveraging observability tools and large-scale telemetry to proactively resolve issues and optimize performance.

Required Qualifications

  • Formal training or certification in software engineering concepts with 10+ years of applied experience.
  • Proficiency in site reliability culture and principles, and familiarity with implementing site reliability within an application or platform
  • Proficient in at least one programming or scripting language, such as Python, Java/Spring Boot, or shell scripting.
  • Experience in observability, including white-box and black-box monitoring, service level objective (SLO) alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
  • Experience with continuous integration and continuous delivery tools such as Jenkins, Spinnaker, or Terraform, and with configuration management tools such as SaltStack or Ansible
  • Experience managing, administering, and supporting large-scale, enterprise-level Splunk and ELK deployments that provide application monitoring and observability for a large number of applications
  • Experience managing, administering, and supporting vendor products such as Netcool, Grafana, and SCOM
  • Familiarity with containers and container orchestration technologies such as ECS, Kubernetes, and Docker
  • Experience with troubleshooting performance issues and with common networking technologies and issues
  • Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision
  • Ability to proactively recognize roadblocks, and a demonstrated interest in learning technologies that facilitate innovation
  • Experience with large-scale, enterprise-level event streaming platforms such as Kafka
  • Experience handling critical incidents and change management, including participating in critical incident task force calls.
  • Familiarity with agile practices, preferably Scrum and Kanban
  • Certifications (a plus): AWS Certified SysOps Administrator or Professional, Certified Kubernetes Administrator (CKA), Terraform Associate level, or equivalent