Site Reliability Engineer III - Kafka Platform
On-site · Jersey City, New Jersey, United States
Job Summary
This is a Site Reliability Engineer III role at JPMorgan Chase within Infrastructure Platforms, focused on Kafka-based architectures and mission-critical systems. You will design, implement, and operate end-to-end reliability across public and private clouds, including deploying scalable, observable platforms using code and cloud infrastructure. You will contribute to deployment strategies via CI/CD pipelines and ensure robust logging, monitoring, security, and auditability. You will also participate in on-call rotations to proactively resolve platform issues, and collaborate with software engineers to improve availability, scalability, and deployment approaches using Kafka, Kafka Connect, and related distributed systems technologies.
Required Qualifications
- Formal training or certification in computer science and reliability concepts, and 3+ years of applied experience
- Proficiency in site reliability culture and principles, and familiarity with implementing site reliability within an application or platform
- Proficiency in at least one programming language, such as Java/Spring Boot or Python
- Experience with observability tools and telemetry collection (Grafana, Dynatrace, Prometheus, Datadog, Splunk)
- Experience with public cloud platforms such as AWS, GCP, or Azure
- Experience with Kafka ecosystem products: Kafka, Kafka Connect, Kafka Streams
- Experience with CI/CD and infrastructure-as-code tools such as Jenkins, GitLab, or Terraform
- Familiarity with containers and container orchestration (Docker, Kubernetes, ECS)
- Familiarity with troubleshooting networking technologies and issues