Data & Software Engineer
On-site · Chantilly, Virginia, United States
Job Summary
The Data & Software Engineer will build and maintain end-to-end data pipelines for a production application. The role focuses on Python-centric data engineering, Spark-based processing, and scalable ETL workflows. You will collaborate with stakeholders to translate data requirements into robust pipelines, troubleshoot data quality issues, and contribute to platform modernization.
Key duties include designing data pipelines with PySpark, configuring and updating Spark jobs, containerizing applications for AWS deployment, tuning MySQL/PostgreSQL schemas for analytical workloads, using Git and IaC tooling, managing data catalogs and data lineage, handling diverse data formats (including geospatial data, for example in PostGIS), automating tasks with Bash, and integrating AI/ML services.
Essentials include 5+ years of experience with Spark/PySpark, along with Python (Pandas/NumPy), Docker/Podman, AWS (S3, Lambda, Step Functions), Iceberg, Airflow, SQL with Trino, NoSQL (DynamoDB), Unity Catalog OSS, Polaris, Superset, Terraform/CloudFormation, OpenLineage, H3, and PostGIS, plus experience with data migration, documentation, and developing best practices.
Required Qualifications
- Minimum of 5 years' experience with Apache Spark & PySpark
- Advanced Python skills (including Pandas & NumPy)
- Docker, Podman
- AWS S3, Lambda & Step Functions
- Apache Iceberg, Apache Airflow, and similar tools
- SQL (with Trino)
- NoSQL, DynamoDB
- Unity Catalog OSS, Apache Polaris
- Apache Superset
- Terraform or CloudFormation
- OpenLineage
- H3, PostGIS
- Geospatial experience
- Experience building end-to-end data pipelines
- Experience with orchestration tools to deploy data pipelines
- Containerizing and deploying applications in cloud environments like AWS
- Experience with data governance concepts