Fluidstack · 5 days ago

Production Engineer, Compute

$175,000–$300,000 per year

Remote · New York City, New York, United States or Seattle, Washington, United States

Type: Full Time
Level: Mid Level
Education: Not Specified
Company size: Unknown

Job Summary

The Production Engineer, Compute owns end-to-end fleet health and automation across GPU/TPU infrastructure. The role covers metrics pipelines, alerting, and a unified health view for Kubernetes-orchestrated workloads and bare-metal compute.

Responsibilities include designing and implementing an automation-driven repair pipeline that runs from detection through triage, parts management, and return to service; building the XPU qualification platform, with burn-in, performance baselining, and NPI execution; and owning Redfish and BMC tooling, firmware telemetry, and fleet-scale log collection. The engineer drives reliability, scalability, and operational discipline for one of the world's largest XPU fleets through incident response, tooling, and AI-assisted development.

Requirements emphasize a toil-as-a-bug mindset, hardware intuition at the firmware and silicon levels, the ability to operate under ambiguity, rapid learning, and proficiency with AI tooling and production automation (Go, Python). Bonus areas include hardware lifecycle and RMA automation, workflow engines, and metrics and alerting pipelines (Prometheus, Grafana).
