Enchargeai36 · 11 months ago

LLM Inference Deployment Engineer

$180,000–$240,000 / year

Remote · United States or Canada

Type: Full Time
Level: Mid Level
Education: Bachelor's Degree
Company size: Startup

Job Summary

EnchargeAI is seeking an LLM Inference Deployment Engineer to optimize, deploy, and scale large language models for high-performance inference on energy-efficient AI accelerators. Responsibilities include deploying and optimizing trained LLMs from libraries such as HuggingFace, working with inference runtimes such as ONNX Runtime and vLLM, tuning batching and tensor parallelism for real-time applications, and building high-performance inference pipelines with Docker and Kubernetes.

Required Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field
  • Experience in LLM inference deployment, model optimization, and runtime engineering
  • Strong expertise in LLM inference frameworks (PyTorch, ONNX Runtime, vLLM, TensorRT-LLM, DeepSpeed)
  • In-depth knowledge of Python for model integration and performance tuning
  • Experience with containerized AI deployments (Docker, Kubernetes, Triton Inference Server, TensorFlow Serving, TorchServe)
  • Experience with real-time LLM applications (chatbots, code generation, retrieval-augmented generation)

EnchargeAI is an equal employment opportunity employer in the United States.

Desired Qualifications

  • Experience in LLM inference deployment
  • Model optimization
  • Runtime engineering
  • Containerized AI deployments (Docker, Kubernetes, Triton Inference Server, TensorFlow Serving, TorchServe)
  • Experience with HuggingFace libraries
  • Proficiency in PyTorch and ONNX Runtime
  • Familiarity with vLLM, TensorRT-LLM, DeepSpeed
  • Real-time LLM applications (chatbots, code generation, retrieval-augmented generation)
  • Python programming for model integration and performance tuning