JPMorgan Chase · 2 months ago

Lead Software Engineer-AI Platform Engineer

On-site · Jersey City, New Jersey, United States

Type

Full Time

Level

Senior Level

Education

License Or Certification

Company size

Enterprise

Industry

Investment Banking

Job Summary

Lead Software Engineer focusing on AI Platform Infrastructure. The role is responsible for designing, developing, and deploying secure, scalable cloud infrastructure and AI/ML workloads; leading architectural evaluations with external vendors and internal teams; building CI/CD pipelines and automation for ML workloads; collaborating with AI teams to translate computational needs into infrastructure requirements; optimizing cloud resources for performance and cost; contributing to a culture of diversity, inclusion, and technical excellence; and advancing communities of practice around modern software engineering and AI platforms.

Required Qualifications

Formal training or certification in software engineering concepts with 5+ years of applied experience
Hands-on practical experience in system design, application development, testing, and operational stability
Proficiency in at least one programming language (Python, Go, Java, or C#)
Proficiency in automation and continuous delivery methods
Proficiency in all aspects of the Software Development Life Cycle
Demonstrated proficiency in software applications and technical processes within cloud/AI/ML domains
Foundational understanding of machine learning concepts (transformers, ML training, inference)
Experience with containerization (Docker, Kubernetes) and cloud service providers (AWS, Azure, GCP)
Experience with Infrastructure as Code
Deep understanding of cloud component architecture (microservices, containers, IaaS, storage, security)
Preferred: experience with NVIDIA GPU infrastructure software, PyTorch, TensorBoard, MLflow, Prometheus, Grafana, vLLM, Ray, Slurm, SQL/NoSQL databases, and Linux scripting

Desired Qualifications

Foundational understanding of machine learning concepts (transformer architecture, ML training and inference)
Hands-on experience delivering system design, application development, testing, and operational stability
Experience with containerization (Docker, Kubernetes) and cloud providers (AWS, Azure, GCP)
Experience with Infrastructure as Code
Experience with MLOps tooling (MLflow)
Familiarity with observability tools (Prometheus, Grafana)
Strong programming skills in Python, Go, Java, or C#
CI/CD and automation proficiency
Experience with high-performance computing concepts and ML frameworks (e.g., PyTorch, TensorBoard)
Strong knowledge of network architecture, databases (SQL/NoSQL), and data modeling
Security-focused software engineering and scalable AI/ML infrastructure design
Leadership and collaboration across multiple teams and vendors
Experience with NVIDIA GPU infrastructure software (DCGM, BCM, Triton) (preferred)
Experience with ML frameworks and tools (e.g., vLLM, Ray, Slurm) (preferred)
Experience with distributed systems and microservices architecture (preferred)
Experience with monitoring/observability stacks (Prometheus, Grafana) (preferred)
