Groq

Sr. Staff Software Engineer – High Performance GPU Inference Systems

Palo Alto, California, United States

Compensation: Not Specified
Experience Level: Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: Artificial Intelligence, Cloud Computing, Hardware

About Groq

Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale they need. Headquartered in Silicon Valley, we are on a mission to make high performance AI compute more accessible and affordable. When real-time AI is within reach, anything is possible. Build fast.

Position Overview

In this role, you will push the limits of heterogeneous GPU environments, dynamic global scheduling, and end-to-end system performance, all while running code as close to the metal as possible.

Responsibilities & Opportunities

  • Distributed Systems Engineering: Design and implement scalable, low-latency runtime systems that coordinate thousands of GPUs across tightly integrated, software-defined infrastructure.
  • Low-Level GPU Optimization: Build deterministic, hardware-aware abstractions optimized for CUDA, ROCm, or vendor-specific toolchains, ensuring ultra-efficient execution, fault isolation, and reliability.
  • Performance & Diagnostics: Develop profiling, observability, and diagnostics tooling for real-time insights into GPU utilization, memory bottlenecks, and latency deviations, continuously improving system SLOs (a minimal instrumentation sketch follows this list).
  • Next-Gen Enablement: Future-proof the stack to support evolving GPU architectures (e.g., H100, MI300), NVLink/Fabric topologies, and multi-accelerator systems (including FPGAs or custom silicon).
  • Cross-Functional Collaboration: Work closely with teams across ML compilers, orchestration, cloud infrastructure, and hardware ops to ensure architectural alignment and unlock joint performance wins.
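
To make the diagnostics bullet above concrete, here is a minimal sketch of per-kernel latency capture with CUDA events, one primitive a profiling and observability layer might build on. The saxpy kernel, names, and sizes are illustrative assumptions, not Groq code.

    // Hedged sketch: per-kernel latency capture with CUDA events.
    // The kernel and problem size are illustrative assumptions only.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        float *x, *y;
        cudaMalloc(&x, n * sizeof(float));
        cudaMalloc(&y, n * sizeof(float));
        cudaMemset(x, 0, n * sizeof(float));
        cudaMemset(y, 0, n * sizeof(float));

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        // Events are recorded on the stream, so the elapsed time reflects
        // device-side kernel latency rather than host launch overhead.
        cudaEventRecord(start);
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("saxpy latency: %.3f ms\n", ms);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cudaFree(x);
        cudaFree(y);
        return 0;
    }

In a production system a measurement like this would feed an SLO pipeline; here it simply prints to stdout.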

Ideal Candidate Profile

Must-Haves:

  • Proven ability to ship high-performance, production-grade distributed systems and maintain large-scale GPU production deployments.
  • Deep knowledge of GPU architecture (memory hierarchies, streams, kernels), OS internals, parallel algorithms, and HW/SW co-design principles.
  • Proficient in systems languages such as C++ (CUDA), Python, or Rust—with fluency in writing hardware-aware code.
  • Obsessed with performance profiling, GPU kernel tuning, memory coalescing, and resource-aware scheduling (see the coalescing sketch after this list).
  • Passionate about automation, testability, and continuous integration in large-scale systems.
  • Comfortable navigating across stack layers—from GPU drivers and kernels to orchestration layers and inference serving.
  • Strong communicator, pragmatic problem-solver, and builder of clean, sustainable code.
  • Ownership-driven mindset—your code runs fast, scales gracefully, and meets real-world requirements.
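
To ground the memory-coalescing requirement above, here is a minimal sketch contrasting a coalesced copy kernel with a strided one; inspected under a profiler such as Nsight Compute, the strided variant issues far more memory transactions per warp. Kernel names, the problem size, and the stride are hypothetical.

    // Hedged sketch: coalesced vs. strided global-memory access.
    #include <cstdio>
    #include <cuda_runtime.h>

    // Coalesced: adjacent threads in a warp touch adjacent addresses, so the
    // hardware services each warp with a minimal number of wide transactions.
    __global__ void copy_coalesced(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];
    }

    // Strided: adjacent threads touch addresses `stride` elements apart,
    // scattering each warp across many cache lines and wasting bandwidth.
    __global__ void copy_strided(const float* in, float* out, int n, int stride) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            int j = (int)(((long long)i * stride) % n); // wrap to stay in bounds
            out[j] = in[j];
        }
    }

    int main() {
        const int n = 1 << 24;
        float *in, *out;
        cudaMalloc(&in, n * sizeof(float));
        cudaMalloc(&out, n * sizeof(float));
        dim3 block(256), grid((n + block.x - 1) / block.x);
        copy_coalesced<<<grid, block>>>(in, out, n);
        copy_strided<<<grid, block>>>(in, out, n, 32);
        cudaDeviceSynchronize();
        printf("done; compare the two kernels under a profiler\n");
        cudaFree(in);
        cudaFree(out);
        return 0;
    }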

Nice to Have:

  • Experience operating large-scale GPU inference systems in production (e.g., Triton, TensorRT, or custom GPU services).
  • Deploying and optimizing ML/HPC workloads on GPU clusters (Kubernetes, Slurm, Ray, etc.).
  • Hands-on experience with multi-GPU training/inference frameworks (e.g., PyTorch DDP, DeepSpeed, or JAX).
  • Familiarity with compiler tooling (e.g., TVM, MLIR, XLA) or deep learning graph optimization.
  • Successful track record of delivering technically ambitious projects in fast-paced environments.

Attributes of a Groqster

  • Humility: Egos are checked at the door.
  • Collaborative & Team Savvy: Together, we make up the smartest person in the room.
  • Growth & Giver Mindset: Learn it all versus know it all; we share knowledge generously.
  • Curious & Innovative: Take a creative approach to projects, problems, and design.
  • Passion, Grit, & Boldness: No-limit thinking, fueling informed risk-taking.

Compensation & Benefits

  • Salary Range: $248,710 - $292,600 (based on skills, qualifications, experience, and internal benchmarks).
  • Package: Competitive base salary, equity, and benefits.

Location

  • Some roles may require being located near or on our primary sites, as indicated in the job description.

If this sounds like you, we’d love to hear from you!

Skills

GPU architecture
CUDA
ROCm
Distributed systems
Low-latency systems
Performance optimization
Profiling
Observability
Diagnostics tooling
OS internals
Parallel algorithms
HW/SW co-design
Heterogeneous GPU environments
Global scheduling
System performance
ML compilers
Orchestration
Cloud infrastructure

Groq

AI inference technology for scalable solutions

About Groq

Groq specializes in AI inference technology, providing the Groq LPU™, which is known for its high compute speed, quality, and energy efficiency. The Groq LPU™ is designed to handle AI processing tasks quickly and effectively, making it suitable for both cloud and on-premises applications. Unlike many competitors, Groq's products are designed, fabricated, and assembled in North America, which helps maintain high standards of quality and performance. The company targets a variety of clients across different industries that require fast and efficient AI processing capabilities. Groq's goal is to deliver scalable AI inference solutions that meet the growing demands for rapid data processing in the AI and machine learning market.

Headquarters: Mountain View, California
Year Founded: 2016
Total Funding: $1,266.5M
Company Stage: Series D
Industries: AI & Machine Learning
Employees: 201-500

Benefits

Remote Work Options
Company Equity

Risks

Increased competition from SambaNova Systems and Gradio in high-speed AI inference.
Geopolitical risks in the MENA region may affect the Saudi Arabia data center project.
Rapid expansion could strain Groq's operational capabilities and supply chain.

Differentiation

Groq's LPU offers exceptional compute speed and energy efficiency for AI inference.
The company's products are designed and assembled in North America, ensuring high quality.
Groq emphasizes deterministic performance, providing predictable outcomes in AI computations.

Upsides

Groq secured $640M in Series D funding, boosting its expansion capabilities.
Partnership with Aramco Digital aims to build the world's largest inferencing data center.
Integration with Touchcast's Cognitive Caching enhances Groq's hardware for hyper-speed inference.
