Nuance Labs · 1 week ago

Member of Technical Staff — ML Data Infra

$200,000–$300,000 per year

On-site · Seattle, Washington, United States

Type: Full Time
Level: Mid Level
Education: Not Specified
Company size: Startup

Job Summary

In this role, you will:

  • Design, build, and operate large-scale data pipelines for ingestion, processing, filtering, and curation of multimodal training data (video, audio, text)
  • Translate research-grade data processing code into robust production pipelines
  • Optimize pipeline throughput and efficiency at scale
  • Build and maintain data quality systems (deduplication, filtering, validation, quality scoring)
  • Manage petabyte-scale datasets (storage architecture, versioning, lineage tracking, cost efficiency)
  • Collaborate with researchers to translate data requirements into scalable processing systems
  • Build tooling and infrastructure to enable faster research cycles

Required strengths include proven production data-pipeline experience, proficiency with distributed frameworks (Spark, Ray, Dask), strong software engineering fundamentals, familiarity with multimodal data and data-quality considerations, and the ability to ship production versions rapidly from prototypes.

Required Qualifications

  • Proven experience building and operating large-scale data pipelines in production
  • Strong proficiency with distributed data processing frameworks (Spark, Ray, Dask or similar)
  • Solid software engineering fundamentals: clean, testable, maintainable code
  • Experience with multimodal data (video, audio) and processing libraries (FFmpeg, decord) is a strong plus
  • Familiarity with ML data pipelines and understanding of how data quality/format affect model training
  • Ability to move fast: ship production versions quickly from prototype scripts
  • Experience building data pipelines for large-scale model training (bonus)
  • Data versioning and lineage tools experience (DVC, Delta Lake, Apache Iceberg, etc.) (bonus)
  • Experience with streaming data pipelines or online data processing (bonus)
  • Prior work at an AI lab, video platform, or data-intensive company (bonus)
  • Contributions to open-source data tooling (bonus)