Basis Research · 6 months ago

Data Engineer, Platform

On-site · New York City, New York, United States

Type: Full Time
Level: Mid Level
Education: Not Specified
Company size: Unknown

Job Summary

Data Engineers at Basis develop trustworthy data pipelines, curate datasets, and ensure reliable data infrastructure. Responsibilities include designing and building data pipelines for model training, implementing data quality frameworks, maintaining feature stores, and ensuring data provenance. Required skills include expert-level proficiency in SQL and Python, experience with cloud data platforms, and a strong understanding of ML data requirements and data governance practices. Ideal candidates will demonstrate achievements in data engineering and possess skills in both data technologies and analytical problem-solving. Enthusiasm for building collaborative data foundations that support extensive research initiatives is essential.

Required Qualifications

  • Significant achievements in data engineering for ML/AI systems
  • Proficiency in SQL, Python for data processing, distributed computing frameworks (Spark, Dask), and workflow orchestration tools (Airflow, Dagster, Prefect)
  • Experience with cloud data platforms including data warehouses (Snowflake, BigQuery, Redshift), data lakes, object storage (S3), and streaming systems (Kafka, Kinesis, Flink)
  • Understanding of ML data requirements, including feature engineering, training/validation/test splits, data versioning, and experiment reproducibility
  • Skill in data quality and governance, including validation frameworks, anomaly detection, data lineage tracking, and compliance with privacy and security policies
  • Knowledge of data modeling principles for relational and NoSQL systems

Desired Qualifications

  • Experience with feature stores (Tecton, Feast) or building feature platforms
  • Background in ML research or research engineering, providing an understanding of data needs across the experiment lifecycle
  • Experience with data lineage tools (Apache Atlas, DataHub, Monte Carlo) and metadata management
  • Knowledge of vector databases and embedding pipelines for modern AI applications
  • Contributions to data engineering open-source projects (Airflow, dbt, Great Expectations)
  • Understanding of responsible AI and data governance practices