AI Infrastructure Engineer
On-site · Montréal, Quebec, Canada
Job Summary
Design and implement AI/ML-powered solutions for infrastructure use cases such as predictive autoscaling and automated remediation in cloud environments. Responsibilities include building and maintaining AI-driven monitoring systems, developing automated incident response workflows, integrating AI tooling into CI/CD pipelines, and contributing to AI-based tools for self-service infrastructure guidance. Candidates should have 5–7 years of relevant experience, hands-on expertise with GCP and/or AWS, and practical experience with AI/ML integrations. Proficiency in Python and familiarity with Kubernetes and observability stacks are essential.
Required Qualifications
- 5–7 years of experience in infrastructure engineering, DevOps, SRE, or a related field
- Hands-on experience with GCP (priority) and/or AWS; solid understanding of cloud resource management, scaling, and cost structures
- Practical experience building or integrating AI/ML-powered tools in an operational context (anomaly detection, predictive models, LLM-based automation, or similar)
- Experience with infrastructure-as-code tools (Terraform, Puppet, Ansible, or equivalent)
- Proficiency in Python for scripting, automation, and AI/ML integration; Bash or Go a plus
- Working knowledge of Kubernetes and container orchestration in production environments
- Familiarity with observability and monitoring stacks (Prometheus, Grafana, ELK, Datadog, or similar)
- Familiarity with LLM APIs (OpenAI, Anthropic, or similar) and prompt engineering for operational use cases
- Strong problem-solving mindset with a bias toward automation and eliminating toil
- Fluent in English (written and verbal)
Desired Qualifications
- Experience with AI workflow orchestration frameworks (LangChain, LlamaIndex, n8n, or similar)
- Exposure to AIOps platforms (Dynatrace, Datadog AI, Moogsoft, BigPanda, or similar)
- Background in FinOps or AI-driven cloud cost optimization
- Familiarity with vector databases (Weaviate, Pinecone, Qdrant) for knowledge retrieval systems
- Experience with VMware or hybrid cloud environments
- GCP and/or AWS cloud certifications
- Prior experience in gaming, high-growth tech, or SaaS platform environments