Lambda · 1 week ago

Senior Incident Manager

$125,000–$195,000 per year

Remote · United States or San Jose, California, United States

Type: Full Time
Level: Senior Level
Education: Not Specified
Company size: Unknown

Job Summary

The Senior Incident Manager leads the end-to-end lifecycle of operational incidents affecting AI infrastructure and data center services. The role acts as central command during major incidents, coordinating cross-team response, triage, and post-incident analysis. It drives incident management best practices across data center operations, infrastructure engineering and operations, networking, platform reliability, and security operations. The role also includes participating in the on-call rotation, delivering executive-level incident summaries and dashboards, and improving operational resilience, incident tooling, runbooks, and reliability frameworks.

Required Qualifications

  • 8+ years of experience in incident management, site reliability engineering, or infrastructure operations
  • Experience managing incidents in large-scale distributed infrastructure environments
  • Strong understanding of data center operations, GPU compute clusters, and networking and storage infrastructure
  • Experience with incident management frameworks (ITIL, SRE, or equivalent)
  • Excellent communication and stakeholder management skills
  • Experience with incident tracking and monitoring tools such as PagerDuty, ServiceNow, Jira, Datadog, Prometheus, and Grafana