Kernel Developer
Hybrid · Gdańsk, Pomerania, Poland
Job Summary
Design and implement high-performance compute (operator) kernels in C/C++, optimize them for AI accelerators, and develop core tensor operations. Profile, benchmark, and tune kernels to eliminate bottlenecks, apply low-level optimization techniques, and contribute to internal libraries and runtime systems for AI workloads. Experience with GPU kernel programming (CUDA/ROCm/OpenCL), Triton or similar frameworks, ISA-level knowledge, LLVM-based toolchains, and distributed communication frameworks (NCCL, MPI) is a bonus. You will work on the performance-critical compute layer for next-generation AI accelerators and collaborate with experts in hardware, compilers, and systems.
Required Qualifications
- Strong proficiency in C/C++
- Experience with performance-critical software development
- Strong understanding of low-level optimization techniques
- Understanding of CPU/GPU or accelerator architecture fundamentals
- Ability to analyze and debug complex systems
- Experience working with large, complex codebases
- Strong communication and teamwork skills