JPMorgan Chase · 2 months ago

Lead Software Engineer-AI Platform Engineer

On-site · Jersey City, New Jersey, United States

Type: Full Time
Level: Senior Level
Education: License Or Certification
Company size: Enterprise
Industry: Investment Banking

Job Summary

Lead Software Engineer focusing on AI platform infrastructure. The role is responsible for designing, developing, and deploying secure, scalable cloud infrastructure and AI/ML workloads; leading architectural evaluations with external vendors and internal teams; building CI/CD pipelines and automation for ML workloads; collaborating with AI teams to translate computational needs into infrastructure requirements; and optimizing cloud resources for performance and cost. It also involves contributing to a culture of diversity, inclusion, and technical excellence, and advancing communities of practice around modern software engineering and AI platforms.

Required Qualifications

  • Formal training or certification in software engineering concepts with 5+ years of applied experience
  • Hands-on practical experience in delivering system design, application development, testing, and ensuring operational stability
  • Proficiency in at least one programming language (Python, Go, Java, or C#)
  • Proficiency in automation and continuous delivery methods
  • Proficiency in all aspects of the Software Development Life Cycle
  • Demonstrated proficiency in software applications and technical processes within cloud/AI/ML domains
  • Foundational understanding of machine learning concepts (transformers, ML training, inference)
  • Experience with containerization (Docker, Kubernetes) and cloud service providers (AWS, Azure, GCP)
  • Experience with Infrastructure as Code
  • Deep understanding of cloud component architecture (microservices, containers, IaaS, storage, security)
  • Preferred: NVIDIA GPU infrastructure software, PyTorch, TensorBoard, MLflow, Prometheus, Grafana, vLLM, Ray, Slurm, SQL/NoSQL, Linux scripting

Desired Qualifications

  • Foundational understanding of machine learning concepts (transformer architecture, ML training and inference)
  • Hands-on experience delivering system design, application development, testing, and operational stability
  • Experience with containerization (Docker, Kubernetes) and cloud providers (AWS, Azure, GCP)
  • Experience with Infrastructure as Code
  • Experience with MLOps tooling (MLflow)
  • Familiarity with observability tools (Prometheus, Grafana)
  • Strong programming skills in Python, Go, Java, or C#
  • CI/CD and automation proficiency
  • Experience with high-performance computing concepts and ML frameworks (e.g., PyTorch, TensorBoard)
  • Strong knowledge of network architecture, databases (SQL/NoSQL), and data modeling
  • Security-focused software engineering and scalable AI/ML infrastructure design
  • Leadership and collaboration across multiple teams and vendors
  • Experience with NVIDIA GPU infrastructure software (DCGM, BCM, Triton) (preferred)
  • Experience with ML frameworks and tools (e.g., vLLM, Ray, Slurm) (preferred)
  • Experience with distributed systems and microservices architecture (preferred)
  • Experience with monitoring/observability stacks (Prometheus, Grafana) (preferred)