Fluidstack · 5 days ago

Production Engineer, Compute

Fluidstack

$175,000–$300,000 per year

Remote · New York City, New York, United States or Seattle, Washington, United States

Type: Full Time

Level: Mid Level

Education: Not Specified

Company size: Unknown

Job Summary

The Production Engineer, Compute owns end-to-end compute fleet health and automation across GPU/TPU infrastructure. Responsibilities include owning metrics pipelines, alerting, and a unified health view for Kubernetes-orchestrated workloads and bare-metal compute; designing and implementing an automation-driven repair pipeline from detection through triage, parts management, and return to service; building the XPU qualification platform with burn-in, performance baselining, and NPI execution; and owning Redfish and BMC tooling, firmware telemetry, and fleet-scale log collection. The role drives reliability, scalability, and operational discipline for one of the world’s largest XPU fleets through incident response, tooling, and AI-assisted development.

Requirements emphasize a "toil is a bug" mindset, hardware intuition at the firmware and silicon levels, the ability to operate under ambiguity, rapid learning, and proficiency with AI tooling and production automation (Go, Python). Bonus areas include hardware lifecycle and RMA automation, workflow engines, and metrics and alerting pipelines (Prometheus, Grafana).
