Enchargeai36 · 11 months ago

LLM Inference Deployment Engineer

$180,000–$240,000 per year

Remote · United States or Canada

Type: Full Time

Level: Mid Level

Education: Bachelor’s Degree

Company size: Startup

Job Summary

EnchargeAI is hiring an LLM Inference Deployment Engineer to optimize, deploy, and scale large language models for high-performance inference on energy-efficient AI accelerators. Responsibilities include deploying and optimizing trained LLMs from libraries such as HuggingFace; working with inference runtimes such as ONNX Runtime and vLLM; tuning batching and tensor parallelism for real-time applications; and building high-performance inference pipelines with Docker and Kubernetes.

Required Qualifications

Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or related field
Experience in LLM inference deployment, model optimization, and runtime engineering
Strong expertise in LLM inference frameworks (PyTorch, ONNX Runtime, vLLM, TensorRT-LLM, DeepSpeed)
In-depth knowledge of Python for model integration and performance tuning
Experience with containerized AI deployments (Docker, Kubernetes, Triton Inference Server, TensorFlow Serving, TorchServe)
Experience with real-time LLM applications (chatbots, code generation, retrieval-augmented generation)
EnchargeAI is an equal employment opportunity employer in the United States

Desired Qualifications

Experience in LLM inference deployment
Model optimization
Runtime engineering
Containerized AI deployments (Docker, Kubernetes, Triton Inference Server, TensorFlow Serving, TorchServe)
Experience with HuggingFace libraries
Proficiency in PyTorch and ONNX Runtime
Familiarity with vLLM, TensorRT-LLM, DeepSpeed
Real-time LLM applications (chatbots, code generation, retrieval-augmented generation)
Python programming for model integration and performance tuning
