Together AI · 4 months ago

Senior Software Engineer, Together Cloud Infrastructure

Hybrid · Amsterdam, North Holland, The Netherlands

Type: Full Time
Level: Senior Level
Education: No Requirement
Company size: Unknown
Industry: Artificial Intelligence

Job Summary

Together AI is hiring a Senior AI Infrastructure Engineer to design, build, and operate scalable backend services and the IaaS layer for its cloud infrastructure. The role focuses on designing highly available backend services for data-center hardware management (e.g., InfiniBand, GPU virtualization), building out an IaaS layer for a new GB200 data center, and developing a global object store for large-scale datasets. You will contribute to the core Together AI platform, create tooling and documentation, and implement testing frameworks for robustness and fault tolerance. Requirements include 5+ years of professional software development, strong backend skills (Golang preferred), experience with microservices across cloud providers, deep Kubernetes expertise, VPN/VPC and data-center networking knowledge, and familiarity with automation and observability tools. The role is hybrid, with two days in the Amsterdam office, and involves close collaboration and communication with technical and non-technical teams.

Required Qualifications

5+ years of professional software development experience
Proficiency in at least one backend programming language (Golang desired)
5+ years experience writing high-performance, production-quality code
Demonstrated experience building and operating high-performance and/or globally distributed microservice architectures across one or more cloud providers (AWS, Azure, GCP)
Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members
Deep experience with Kubernetes internals (operators, plugins, custom schedulers, or patches to Kubernetes itself)
Deep experience with VMs/hypervisors (QEMU/KVM, cloud-hypervisor, VFIO, virtio, PCIe passthrough, KubeVirt, SR-IOV)
Deep experience with DC networking (VLAN, VXLAN, VPN, VPC, OVS/OVN)
Experience with Cluster API or similar
Experience with infrastructure automation tools (Terraform, Ansible), monitoring stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
Experience building IaaS or PaaS systems at scale
Experience with DPUs/SmartNICs
Knowledge of GPU programming, NCCL, and CUDA
Experience with distributed compute, storage, and networking
