Vice President, AI Infrastructure Engineer
$162,000–$215,000 per year
Hybrid · New York City, New York, United States
Job Summary
Lead the design, build, and operation of AI-focused infrastructure platforms supporting model development, training, evaluation, and inference across cloud environments (AWS, Azure, and hybrid). Responsibilities include:
- Engineer scalable, reliable, and secure cloud-native services to support AI workloads.
- Collaborate with AI Engineering and Data Science to improve developer experience, performance, and operational stability.
- Enable production deployment of ML models and LLMs within governed enterprise environments.
- Implement infrastructure-as-code and automation for repeatable provisioning.
- Build observability, monitoring, and alerting for AI platforms.
- Integrate identity, access controls, data protection, and governance with Security and Risk teams.
- Contribute to architectural decisions and standards for AI platforms across Aladdin.
- Participate in on-call rotations and ongoing evaluation of emerging AI infrastructure technologies within BlackRock’s enterprise context.
Required Qualifications
- Strong experience in cloud infrastructure, platform engineering, or systems engineering roles.
- Hands-on expertise with AWS, Azure, and/or GCP, including Azure ML, Azure Foundry, AWS Bedrock, and Google Vertex, as well as cloud compute, networking, storage, and security services.
- Understanding of ML platform operations and governance concepts, including model deployment strategies, lifecycle management, monitoring/observability, and disaster recovery.
- Experience supporting AI and machine learning workloads, with exposure to managed compute for model training and fine-tuning, experimentation over large datasets, and the end-to-end MLOps pipeline: data ingestion, training, validation, and deployment.
- Proficiency with Infrastructure as Code tools (e.g., Terraform, ARM/Bicep, CloudFormation).
- Strong programming or scripting skills (e.g., Python, Bash, or similar).
- Experience building and operating containerized and Kubernetes-based platforms.
- Solid understanding of reliability, scalability, observability, and operational best practices.
- Ability to work effectively in cross-functional teams and communicate complex technical concepts clearly.
- Familiarity with GPU or accelerator-based infrastructure (preferred).
- Experience working in financial services or other highly regulated industries.
- Familiarity with multi-cloud architectures and enterprise governance requirements.