Senior Systems Software Engineer, Performance Architecture - Analytics and Data Intelligence
$224,000–$356,500 per year
On-site · Santa Clara, California, United States
Job Summary
Senior Systems Software Engineer focused on performance architecture for GPU-accelerated structured data processing. Responsibilities include:
- Extend JIT and compiler-based execution support in cuDF and related GPU-accelerated structured data processing systems
- Design approaches for lowering expressions, ASTs, or query fragments into optimized GPU execution paths
- Investigate kernel fusion strategies that reduce materialization, memory traffic, launch overhead, and end-to-end query latency
- Analyze analytics workloads to identify bottlenecks in expression evaluation, joins, aggregations, scans, data movement, and memory management
- Build benchmarks and regression tests that capture real dataframe/SQL-like workloads, from micro-benchmarks to end-to-end pipelines
- Collaborate with cuDF, CUDA, compiler/runtime, and query engine teams to translate workload analysis into implementation plans
- Prototype and evaluate execution strategies inspired by high-performance database engines, including fused operators, code generation, vectorized execution, and adaptive planning
Required Qualifications
- Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field, or equivalent hands-on experience
- 12+ years of proven experience in systems performance engineering or performance-focused architecture
- Proven skills in profiling, instrumentation, and optimization of CPU and GPU systems, using tools and techniques such as tracing, hardware counters, flame graphs, and kernel-level profiling
- Experience with compiler, JIT, code generation, query execution, or runtime optimization techniques
- Experience optimizing analytic database engines and/or query runtimes, including vectorized execution, join strategies, and columnar formats like Arrow and Parquet
- Proficient in C++ and/or Python, with a strong ability to analyze performance-critical code and implement effective solutions
- Experience with cuDF, RAPIDS, CUDA, or JIT/codegen systems such as Numba, LLVM, MLIR, and NVRTC
- Experience with benchmarking frameworks, performance dashboards, and CI/CD regression gating, plus a solid understanding of modern analytics and machine learning workflows