CPU Performance Developer Technology Engineer
On-site · Shanghai, Shanghai, China or Beijing, Beijing, China
Job Summary
The CPU Performance Developer Technology Engineer at NVIDIA Compute Devtech researches, designs, and implements performance optimization strategies for AI data preprocessing, scientific, and HPC workloads on Grace/Vera CPUs. You will profile, analyze, and optimize CPU performance from application algorithms down to microarchitecture, contribute to open-source frameworks and performance libraries, and collaborate with NVIDIA’s architecture, research, libraries, tools, and system software teams to influence next-generation CPU designs, compiler toolchains, and development workflows, with the goal of improving developer productivity and throughput.
Required Qualifications
- BS, MS, or PhD in Computer Science, Computer Engineering, or a related field
- 5+ years of relevant experience in performance engineering or CPU optimization
- Strong programming proficiency in C/C++ and/or Python
- Solid grasp of CPU microarchitecture, performance analysis tools, and optimization methodologies
- Proven track record of CPU benchmarking and bottleneck-driven performance tuning
- Excellent communication and organizational skills, with the ability to collaborate effectively across teams and manage multiple priorities
- Experience optimizing AI or data preprocessing pipelines on CPUs (preferred)
- Familiarity with HPC applications, parallel computing, and distributed runtime environments (preferred)
- Hands-on experience with SIMD instruction sets, low-level intrinsics, or vectorization (preferred)
- Contributions to open-source performance tools or HPC frameworks (preferred)