Principal Machine Learning Engineer (MLE)
$200,000–$300,000 per year
On-site · Toronto, Ontario, Canada or Dallas, Texas, United States
Job Summary
The Principal Machine Learning Engineer is responsible for designing, building, deploying, and scaling machine learning and LLM-based solutions for production use across multi-cloud environments (GCP, AWS, Azure). You will collaborate with the AI Center of Excellence and business teams to translate advanced ML capabilities into reliable, production-grade systems. You will develop end-to-end ML pipelines (data ingestion, feature engineering, model training, evaluation, deployment, monitoring), architect and implement LLM-powered systems across cloud platforms, and optimize workflows for performance, scalability, reliability, and cost. You will containerize services, apply MLOps best practices (CI/CD, model versioning, experiment tracking, automated retraining), work with PyTorch and TensorFlow, deploy models in production with A/B testing, and translate insights into actionable improvements. The role also covers building and deploying classical ML models, NLP applications (sentiment analysis, summarization, Q&A, chatbots), information retrieval, and computer vision solutions (e.g., YOLOv7, DDRNet, RFTM) using datasets such as COCO and Cityscapes. Strong communication and collaboration skills are needed to influence stakeholders across multi-cloud teams.
Required Qualifications
- PhD with 5+ years of experience (or Master’s with 6+ years, or Bachelor’s with 7+ years) in Machine Learning, Computer Science, Data Science, or a related field
- Strong proficiency in Python for machine learning and production systems
- Hands-on experience with at least one major cloud platform (GCP, Azure, or AWS)
- Experience building and deploying production-grade ML systems
- Strong communication skills to explain technical concepts to stakeholders
- Excellent time management, collaboration, and organizational skills
- Experience with deep learning frameworks (PyTorch, TensorFlow)
- Experience containerizing ML services (Docker) and deploying with Kubernetes or similar
- Experience with NLP fundamentals, transformers, embeddings, and text preprocessing
- Experience with ML lifecycle tooling (CI/CD, model versioning, experiment tracking, automated retraining)
- Experience with end-to-end ML pipelines, including data ingestion, feature engineering, model training, evaluation, deployment, and monitoring
- Experience with ML applications including NLP (sentiment analysis, summarization, Q&A), information retrieval, and computer vision (e.g., image classification, object detection)
- Experience deploying models in production and conducting A/B testing
- Ability to work across multi-cloud environments (GCP, AWS, Azure)
- Experience evaluating build-versus-buy decisions for AI applications
- Strong collaboration with business stakeholders and Generative AI Center of Excellence leaders