Senior Quantization Engineer - Edge AI
On-site · Hyderabad, Telangana, India
Job Summary
This role is for a Senior Quantization Engineer in Edge AI, focused on model quantization and optimization, including speculative decoding, pruning, and other techniques for efficient on-device deployment. Focus areas include optimization of CNNs, Large Language Models (LLMs), and Vision-Language Models (VLMs) for NXP’s Ara2 NPUs. Responsibilities include researching the latest work (NeurIPS/ICLR/CVPR), prototyping within hardware constraints, producing robust production code (C++/Python) with strict memory and compute efficiency, documenting algorithmic tradeoffs, mentoring on numerical methods, and contributing to IP through patents and publications. Preferred education is an MSc or Ph.D. with a strong ML focus. Required: an AI/ML background, PyTorch/ONNX proficiency, awareness of embedded systems constraints, and experience with quantization techniques. Experience with NPUs and hardware profiling, and knowledge of MLIR/TVM, are highly valued.
Required Qualifications
- MSc or Ph.D. (focus on Machine Learning or Deep Learning)
- Proven practical experience in AI/ML, with an understanding of CNN architectures and Generative AI (Transformers, LLMs, VLMs)
- Hands-on experience with PyTorch, ONNX, and model conversion/optimization pipelines
- Proficient in Python and C++
- Familiarity with embedded systems constraints (latency, power, memory bandwidth)
- Experience with quantization techniques for discriminative and generative AI (e.g., GPTQ, SpinQuant)
- Experience with NPUs, device-level profiling, and diagnosing memory bottlenecks
- Kernel development experience is a plus
- Knowledge of MLIR or TVM is a significant plus