PhonePe · 12 months ago

Site Reliability Engineer - Big Data (7 to 11 years)

On-site · Karnataka, India

Type: Full Time
Level: Senior Level
Education: Not Specified
Company size: Enterprise
Industry: Fintech Services

Job Summary

The Site Reliability Engineer - Big Data manages and maintains distributed big data ecosystems to ensure the reliability, scalability, and security of large-scale production infrastructure. Key responsibilities include leading on-call rotations, incident response, and postmortems; designing automation for provisioning, scaling, upgrading, and patching clusters; troubleshooting complex production issues; designing scalable architectures; enforcing security standards; and driving standardization, proactive monitoring, capacity planning, and performance tuning. The engineer collaborates with development teams to integrate reliability, scalability, and performance practices into the software lifecycle; develops automation tools and scripts to reduce manual work; and stays current on industry trends while contributing to technology communities. The role requires strong hands-on experience with Linux, the Hadoop stack (HDFS, HBase, Airflow, YARN, Ranger, Kafka, Pinot), scripting languages (Perl, Python, Golang), open-source configuration management tools (Puppet, Salt, Chef, Ansible), and DevOps tooling (SaltStack, Ansible, Docker, Git).

Required Qualifications

  • 7+ years of experience in managing and maintaining distributed big data ecosystems
  • Strong Linux expertise (IP, iptables, IPsec)
  • Scripting/programming in Perl, Golang, or Python
  • Hands-on Hadoop stack (HDFS, HBase, Airflow, YARN, Ranger, Kafka, Pinot)
  • Experience with configuration management/deployment tools (Puppet, Salt, Chef, Ansible)
  • Solid understanding of networking and open-source technologies
  • DevOps tools: SaltStack, Ansible, Docker, Git
  • SRE logging/monitoring tools: ELK, Grafana, Prometheus, OpenTelemetry