Member of Technical Staff — ML Data Infra
$200,000–$300,000 per year
On-site · Seattle, Washington, United States
Job Summary
Design, build, and operate large-scale data pipelines for ingesting, processing, filtering, and curating multimodal training data (video, audio, text). Translate research-grade data processing code into robust production pipelines and optimize their throughput and efficiency at scale. Build and maintain data quality systems (deduplication, filtering, validation, quality scoring) and manage petabyte-scale datasets, including storage architecture, versioning, lineage tracking, and cost efficiency. Collaborate with researchers to turn data requirements into scalable processing systems, and build tooling and infrastructure that enable faster research cycles. Required strengths include proven production data-pipeline experience, proficiency with distributed frameworks (Spark, Ray, Dask), strong software engineering fundamentals, familiarity with multimodal data and data-quality considerations, and the ability to ship production versions rapidly from prototypes.
Required Qualifications
- Proven experience building and operating large-scale data pipelines in production
- Strong proficiency with distributed data processing frameworks (Spark, Ray, Dask, or similar)
- Solid software engineering fundamentals: clean, testable, maintainable code
- Experience with multimodal data (video, audio) and processing libraries (FFmpeg, decord) is a strong plus
- Familiarity with ML data pipelines and an understanding of how data quality and format affect model training
- Ability to move fast: ship production versions quickly from prototype scripts
- Experience building data pipelines for large-scale model training (bonus)
- Experience with data versioning and lineage tools (DVC, Delta Lake, Apache Iceberg, etc.) (bonus)
- Experience with streaming data pipelines or online data processing (bonus)
- Prior work at an AI lab, video platform, or data-intensive company (bonus)
- Contributions to open-source data tooling (bonus)