NielsenIQ · 1 day ago

Principal Software Engineer

NielsenIQ

On-site · Chennai, Tamil Nadu, India

Type: Full Time
Level: Senior Level
Education: Bachelor's Degree
Company size: Enterprise

Job Summary

The Principal Software Engineer (Site Reliability & Application Support) drives the reliability strategy for large-scale, cloud-native applications spanning an Angular front end, Node.js services, a Java back end, and Python tooling. The role owns end-to-end reliability: monitoring and triaging production incidents, performing root cause analyses (RCAs), defining SLIs and SLOs, and leading post-mortems. It also covers designing observability across logs, metrics, traces, and synthetic monitoring; building dashboards and alerting; driving automation to reduce toil; coordinating releases with safe deployment practices; and collaborating with development, platform, and architecture teams to embed reliability as a core engineering concern. Candidates must have extensive hands-on experience in SRE, incident triage, observability tooling (Prometheus, Grafana, OpenTelemetry, Datadog, Dynatrace, Splunk, ELK), and cloud and container ecosystems (Azure/AWS/GCP, Docker, Kubernetes). Strong communication, leadership, and problem-solving skills are essential.

Required Qualifications

Educational: Bachelor's or Master's degree in Computer Science, Information Technology, or a related field
Experience: 10–15+ years of hands-on software engineering and/or SRE experience
Technical: strong experience across Angular, Node.js, Java, Python; SRE & reliability engineering fundamentals; incident management; observability tooling; automation/scripting; cloud platforms (Azure/AWS/GCP); containers and orchestration (Docker, Kubernetes)
Leadership: demonstrated leadership at Staff/Principal/Architect level and ability to influence reliability strategy across teams

Desired Qualifications

Proven experience designing and operating enterprise-grade, large-scale production systems
Demonstrated impact at Staff / Principal / Architect level in SRE, platform engineering, or application reliability
Strong background in influencing reliability and observability strategy across multiple teams or platforms
Demonstrated experience leading incident triage and driving resolution in high-pressure, high-stakes environments
Leadership & Soft Skills: exceptional analytical, diagnostic, and structured problem-solving skills; strong written and verbal communication; ability to lead under pressure; high ownership and bias for action; collaborative mindset; continuous improvement orientation
Nice to Have: Kafka, event-driven architectures, and observability for streaming systems; security monitoring and vulnerability management in production; experience with Spark, BigQuery, or Databricks; chaos engineering principles and tooling; certifications such as AWS/Azure/GCP Associate or Professional, CKA (Certified Kubernetes Administrator), or equivalent
