Coupang Internal · 1 month ago

Senior Staff Cloud Backend Engineer - Observability and Site Reliability

Hybrid · Bengaluru, Karnataka, India

Type: Full Time
Level: Senior Level
Education: Bachelor’s Degree
Company size: Large
Industry: E-commerce

Job Summary

The Senior Staff Data Centre Observability and Site Reliability Engineer designs, builds, and operates scalable observability and reliability solutions for large-scale datacenter infrastructure. The role focuses on developing high-performance monitoring and telemetry platforms, ensuring system reliability, and driving operational excellence through automation, performance optimization, and SRE best practices. It involves collaborating with cross-functional teams to improve the visibility, resilience, and efficiency of critical systems.

Responsibilities include designing and implementing observability solutions (monitoring, logging, alerting, telemetry), building dashboards and reports, applying SRE principles, leading root cause analyses, optimizing performance, automating infrastructure provisioning, and ensuring security and compliance.

Expected proficiencies include Go or Python, Kubernetes internals, Prometheus, Grafana, ELK, and cloud platforms (AWS, Azure, GCP). The role follows a hybrid work model with at least 3 days in the office per week.

Required Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field
  • 12+ years of progressive software engineering experience, with a heavy emphasis on distributed systems, cloud-native architectures, or platform operations
  • Proven experience in managing and optimizing large-scale datacenter environments
  • Strong proficiency in Go or Python, with a deep understanding of networked systems and performance optimization
  • Expert-level knowledge of Kubernetes internals (scheduling, controllers) and containerization ecosystems
  • Proven experience with load balancing, service mesh, and request routing at scale
  • Proficiency in observability tools and technologies (e.g., Prometheus, Grafana, ELK Stack)
  • Experience with SRE practices and tools (e.g., Kubernetes, Docker, Terraform)
  • Familiarity with cloud platforms (AWS, Azure, GCP) and their observability and reliability services