NVIDIA · 3 weeks ago

System Software Engineer – Data Center GPU Compute Diagnostics

$152,000–$241,500 per year

On-site · Durham, North Carolina, United States

Type: Full Time
Level: Mid Level
Education: Bachelor's Degree
Company size: Enterprise

Job Summary

  • Develop CUDA/C++ diagnostic workloads and GPU compute tests to stress Tensor Cores, SMs, L2/cache hierarchy, HBM memory, and related power/thermal operating points
  • Implement and tune GEMM-style diagnostic workloads with NVLink/PCIe/CPU subsystems
  • Contribute to higher-level AI workload tests using PyTorch-based models to stress GPUs and system software
  • Collaborate with hardware architecture, driver, manufacturing, and field teams throughout the product lifecycle
  • Bring up and validate new hardware features with pre-beta GPU drivers, low-level diagnostic software, and system telemetry
  • Triage and debug ECC, HBM behavior, thermal limits, voltage/frequency margining, and PCIe/NVLink errors

Required Qualifications

  • BS or MS degree in Electrical Engineering, Computer Engineering, Computer Science, or equivalent experience
  • 5+ years of system software, GPU software, embedded software, or hardware validation experience
  • Experience writing low-level diagnostics and interacting with device firmware and hardware-level debuggers
  • Strong C/C++ and Python programming skills
  • Exposure to GPU architecture, CUDA kernels, GPU compute workloads, or related accelerator programming is strongly preferred
  • Working knowledge of memory systems, ECC behavior, and DMA engines
  • Familiarity with GEMM-style workloads
  • Awareness of voltage/frequency characterization, thermal testing, power stress, or related silicon validation concepts such as Vmin/Fmax and P-state testing
  • Experience using modern AI development and analysis tools to improve engineering velocity, including code development, debugging, and test creation
  • Strong problem-solving and low-level debugging skills

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD.