Etched · 12 months ago

Supercomputing Engineer (Network)

$150,000–$275,000 per year

On-site · San Jose, California, United States

Type: Full Time
Level: Mid Level
Education: Not Specified
Company size: Startup

Job Summary

The Supercomputing Engineer (Network) develops, qualifies, and optimizes high-performance networking solutions for large-scale inference workloads. Responsibilities include designing RDMA-based networking for low-latency communication across nodes, developing tests for host processors and NICs, integrating burn-in tests that reflect real-world workloads, and designing telemetry metrics for system performance. Candidates should have strong programming skills, experience with RDMA technologies, a solid understanding of operating systems and network architectures, and the ability to troubleshoot complex network issues. Ideal candidates will have experience with GPU or TPU pods and understand the performance challenges of large ML deployments.

Required Qualifications

  • Proficiency in C/C++.
  • Proficiency in at least one scripting language (e.g., Python, Bash, Go).
  • Strong experience with device-to-device networking technologies (RDMA, GPUDirect, etc.), including RoCE.
  • Experience with zero-copy networking, RDMA verbs, and memory registration.
  • Familiarity with queue pairs, completion queues, and transport types.
  • Strong understanding of operating systems (Linux preferred) and server hardware architectures.
  • Ability to analyze complex technical problems and provide effective solutions.
  • Excellent communication and collaboration skills.
  • Ability to work independently and as part of a team.
  • Experience with version control systems (e.g., Git).
  • Experience with reading and interpreting hardware logs.

Desired Qualifications

  • Experience with interconnect technologies such as NVLink, InfiniBand, and ML pod interconnects.
  • Experience with widely deployed top-of-rack switches (Cisco, Juniper, Arista, etc.).
  • Knowledge of server virtualization.
  • Experience with tracing tools such as perf, eBPF, and ftrace.
  • Experience with performance testing and benchmarking tools (gprof, VTune, Wireshark, etc.).
  • Familiarity with hardware diagnostic tools and techniques.
  • Experience with containerization technologies (e.g., Docker, Kubernetes).
  • Experience with CI/CD pipelines.
  • Experience with Rust.