StackAIV · 2 days ago

Senior Compute Platform Engineer

Remote · Pittsburgh, Pennsylvania, United States

Type: Full Time
Level: Senior Level
Education: Not Specified
Company size: Startup
Industry: Technology

Job Summary

The Senior Compute Platform Engineer is accountable for designing and operating the high-scale batch compute systems and workflow orchestration that engineers across the company rely on. Responsibilities include:

  • Designing and operating distributed systems that schedule and execute large-scale batch workloads across Kubernetes clusters
  • Building and maintaining compute platform abstractions
  • Optimizing compute resource utilization
  • Developing and improving multi-tenant scheduling strategies
  • Improving the reliability and fault tolerance of large-scale distributed jobs and platform components
  • Collaborating across teams to understand workload requirements and improve platform capabilities
  • Contributing to platform tooling, automation, and CI/CD workflows

Required Qualifications

  • 7+ years of experience building and operating distributed systems or infrastructure platforms
  • Strong experience with Kubernetes and container orchestration in production-grade environments
  • Proficiency in developing with Go (Golang) and Python
  • Experience designing and operating large-scale batch compute systems
  • Strong debugging and problem-solving skills in complex distributed systems
  • Ability to collaborate across teams and communicate technical concepts clearly
  • Experience with at least one batch scheduling system such as Kueue, Armada, Volcano, or Slurm