Together AI · 4 months ago

Senior Software Engineer, Together Cloud Infrastructure

Hybrid · Amsterdam, North Holland, The Netherlands

Type: Full Time
Level: Senior Level
Education: No Requirement
Company size: Unknown
Industry: Artificial Intelligence

Job Summary

Together AI is looking for a senior engineer to design, build, and operate scalable backend services and the IaaS layer for its cloud infrastructure. The role focuses on three areas: designing highly available backend services for data-center hardware management (e.g., InfiniBand, GPU virtualization), building out an IaaS layer for a new GB200 data center, and developing a global object store for large-scale datasets. You will also contribute to the core Together AI platform, create tooling and documentation, and implement testing frameworks for robustness and fault tolerance. Requirements include 5+ years of professional software development, strong backend skills (Golang preferred), experience with microservices across cloud providers, deep Kubernetes expertise, knowledge of VPN/VPC and data-center networking, and familiarity with automation and observability tools. The role is hybrid, with two days per week in the Amsterdam office, and calls for close collaboration and clear communication with both technical and non-technical teams.

Required Qualifications

  • 5+ years of professional software development experience
  • Proficiency in at least one backend programming language (Golang desired)
  • 5+ years of experience writing high-performance, production-quality code
  • Demonstrated experience building and operating high-performance and/or globally distributed microservice architectures across one or more cloud providers (AWS, Azure, GCP)
  • Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members
  • Deep experience with Kubernetes internals (operators, plugins, custom schedulers, or patches) or Kubernetes itself
  • Deep experience with VMs/hypervisors (QEMU/KVM, cloud-hypervisor, VFIO, virtio, PCIe passthrough, KubeVirt, SR-IOV)
  • Deep experience with DC networking (VLAN, VXLAN, VPN, VPC, OVS/OVN)
  • Experience with Cluster API or similar
  • Experience with infrastructure automation tools (Terraform, Ansible), monitoring stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
  • Experience building IaaS or PaaS systems at scale
  • Experience with DPUs/SmartNICs
  • Knowledge of GPU programming, NCCL, and CUDA
  • Experience with distributed compute, storage, and networking