Data Engineer, Platform
On-site · New York City, New York, United States
Job Summary
Data Engineers at Basis develop trustworthy data pipelines, curate datasets, and ensure reliable data infrastructure. Responsibilities include designing and building data pipelines for model training, implementing data quality frameworks, maintaining feature stores, and ensuring data provenance. Required skills include expert-level proficiency in SQL and Python, experience with cloud data platforms, and a strong understanding of ML data requirements and data governance practices. Ideal candidates will demonstrate achievements in data engineering and possess skills in both data technologies and analytical problem-solving. Enthusiasm for building collaborative data foundations that support extensive research initiatives is essential.
Required Qualifications
- Significant achievements in data engineering for ML/AI systems
- Proficiency in SQL, Python for data processing, distributed computing frameworks (Spark, Dask), and workflow orchestration tools (Airflow, Dagster, Prefect)
- Experience with cloud data platforms including data warehouses (Snowflake, BigQuery, Redshift), data lakes, object storage (S3), and streaming systems (Kafka, Kinesis, Flink)
- Understanding of ML data requirements, including feature engineering, training/validation/test splits, data versioning, and experiment reproducibility
- Skill in data quality and governance, including validation frameworks, anomaly detection, data lineage tracking, and compliance with privacy and security policies
- Knowledge of data modeling principles for relational and NoSQL systems
Desired Qualifications
- Experience with feature stores (Tecton, Feast) or building feature platforms
- Background in ML research or research engineering that provides an understanding of data needs across the experiment lifecycle
- Experience with data lineage tools (Apache Atlas, DataHub, Monte Carlo) and metadata management
- Knowledge of vector databases and embedding pipelines for modern AI applications
- Contributions to data engineering open-source projects (Airflow, dbt, Great Expectations)
- Understanding of responsible AI and data governance practices