Senior Software Engineer, Together Cloud Infrastructure
Hybrid · Amsterdam, North Holland, The Netherlands
Job Summary
Together AI is looking for a Senior AI Infrastructure Engineer to design, build, and operate scalable backend services and the IaaS layer for its cloud infrastructure. The role focuses on designing highly available backend services for data-center hardware management (e.g., InfiniBand, GPU virtualization), building out an IaaS layer for a new GB200 data center, and developing a global object store for large-scale datasets. You will contribute to the core Together AI platform, create tooling and docs, and implement testing frameworks for robustness and fault tolerance. Requirements include 5+ years of professional software development experience, strong backend skills (Golang preferred), experience with micro-services across cloud providers, deep Kubernetes expertise, knowledge of VPN/VPC and DC networking, and familiarity with automation and observability tools. The role is hybrid, with two days a week in the Amsterdam office, and involves close collaboration and communication across technical and non-technical teams.
Required Qualifications
- 5+ years of professional software development experience
- Proficiency in at least one backend programming language (Golang desired)
- 5+ years of experience writing high-performance, production-quality code
- Demonstrated experience with building and operating high-performance and/or globally distributed micro-service architectures across one or more cloud providers (AWS, Azure, GCP)
- Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members
- Deep experience with Kubernetes internals (operators, plugins, custom schedulers, or patches to Kubernetes itself)
- Deep experience with VMs/hypervisors (QEMU/KVM, cloud-hypervisor, VFIO, virtio, PCIe passthrough, KubeVirt, SR-IOV)
- Deep experience with DC networking (VLAN, VXLAN, VPN, VPC, OVS/OVN)
- Experience with Cluster API or similar
- Experience with infrastructure automation tools (Terraform, Ansible), monitoring stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
- Experience building IaaS or PaaS systems at scale
- Experience with DPUs/SmartNICs
- Knowledge of GPU programming, NCCL, and CUDA
- Experience with distributed compute, storage, and networking