GPU Performance Tooling Engineer
Rivos · Full Time
Junior (1 to 2 years)
Key technologies and capabilities for this role
Common questions about this position
Must-haves include:
- Proven ability to ship high-performance distributed systems with large-scale GPU deployments
- Deep knowledge of GPU architecture, OS internals, parallel algorithms, and HW/SW co-design
- Proficiency in C++, Python, or Rust for hardware-aware code
- Obsession with performance profiling and GPU kernel tuning
- Passion for automation and testability
- Comfort navigating all layers of the stack
- Strong communication and an ownership-driven mindset
This information is not specified in the job description.
A strong candidate meets all the must-haves: experience shipping production-grade distributed GPU systems, deep GPU architecture knowledge, proficiency in low-level systems languages, a focus on performance, and an ownership-driven mindset. Nice-to-haves include experience with GPU inference systems such as Triton or TensorRT, deploying ML/HPC workloads on clusters, and multi-GPU frameworks such as PyTorch DDP.
AI inference technology for scalable solutions
Groq specializes in AI inference technology, providing the Groq LPU™, which is known for its high compute speed, quality, and energy efficiency. The LPU™ is designed to handle AI processing tasks quickly and efficiently, making it suitable for both cloud and on-premises deployments. Unlike many competitors, Groq designs, fabricates, and assembles its products in North America, which helps maintain high standards of quality and performance. The company serves clients across a range of industries that require fast, efficient AI processing. Groq's goal is to deliver scalable AI inference solutions that meet the growing demand for rapid data processing in the AI and machine learning market.