Magic Dev · 3 months ago

Member of Technical Staff, Pre-training Systems

$225,000–$550,000 per year

On-site · San Francisco, California, United States

Type: Full Time
Level: Mid Level
Education: Not Specified
Company size: Unknown
Industry: Artificial Intelligence

Job Summary

As a Member of Technical Staff on the Pre-training Systems team, you will design and operate distributed infrastructure for training long-context models at scale. Responsibilities include scaling distributed training across large GPU clusters, optimizing communication patterns, improving checkpointing and fault-tolerance systems, and eliminating performance bottlenecks. The role requires a strong foundation in software engineering, experience with distributed systems, the ability to debug production ML systems, and a proven track record in performance optimization.

Required Qualifications

Experience training large models in multi-node GPU environments
Strong software engineering and distributed systems fundamentals

Desired Qualifications

Deep understanding of parallelism strategies and performance trade-offs
Experience debugging cross-layer issues in production ML systems
Strong ownership mindset and ability to operate critical infrastructure
Track record of improving performance or reliability of large-scale systems
