Site Reliability Engineer III
On-site · New York City, New York, United States
Job Summary
As a Site Reliability Engineer III at JPMorgan Chase within the Enterprise Technology liquidity risk team, you own non-functional requirements and drive reliability, resiliency, security, scalability, monitoring, instrumentation, and automation. You lead SRE adoption across teams, set reliability metrics, coordinate post-mortems, and mentor entry- to mid-level engineers, improving customer outcomes while balancing feature delivery with system stability.
Required Qualifications
- Formal training or certification in software engineering concepts and 5+ years of applied experience
- Advanced SRE knowledge and a proven track record implementing SRE practices across application and platform teams
- Experience leading technologists to resolve complex, firmwide technology issues
- Ability to influence team culture by championing innovation and change
- Experience hiring, developing, and recognizing talent
- Proficiency in at least one programming language, with preference for JavaScript, Go, or Python
- Hands-on experience with CI/CD and infrastructure-as-code tools (e.g., Jenkins, GitLab, Terraform)
- Experience with containers and orchestration (e.g., Docker, Kubernetes, ECS)
- Troubleshooting experience with common networking technologies and issues
- Strong fundamentals across modern architectures and observability, including GraphQL (schema design, federation/supergraph), event-driven systems (Kafka concepts like partitions/consumer groups, DLQs, replay), microservices patterns (API gateways/routers, CQRS/event sourcing), and end-to-end telemetry using OpenTelemetry (metrics/logs/traces)
Desired Qualifications
- Hands-on coding and troubleshooting ability
- Data fluency and data-driven decision making
- Experience leading technologists and mentoring engineers
- Ability to influence culture and drive change
- Experience with incident post-mortems and blameless retrospectives
- Experience with reliability engineering concepts across multiple teams
- Strong collaboration with stakeholders to align reliability goals
- Ability to balance feature delivery with system stability