Position Overview
- Location Type: Onsite
- Employment Type: Full-time
- Salary: Not specified
Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale they need. Headquartered in Silicon Valley, we are on a mission to make high performance AI compute more accessible and affordable. When real-time AI is within reach, anything is possible. Build fast.
Senior Staff Software Engineer - High Performance Inference System
Team Missions and Mandates:
- Build and operate real-time, distributed compute frameworks and runtimes to deliver planet-scale inference for LLMs and advanced AI applications at ultra-low latency, optimized for heterogeneous hardware and dynamic global workloads.
- Develop deterministic, low-overhead hardware abstractions for thousands of synchronously coordinated GroqChips across a software-scheduled interconnection network. Prioritize fault tolerance, real-time diagnostics, ultra-low-latency execution, and mission-critical reliability.
- Future-proof Groq's software stack for next-gen silicon, innovative multi-chip topologies, emerging form factors, and heterogeneous co-processors (e.g., FPGAs).
- Foster collaboration across cloud, compiler, infra, data centers, and hardware teams to align engineering efforts, enable seamless integrations, and drive progress toward shared goals.
- Reduce operational overhead, improve SLOs, and make tokens go brrrrrrrrrrr—positioning Groq as the largest inference powerhouse on earth. 🚀
- Your code will run at the edge of physics—every clock cycle saved reduces latency for millions of users and extends Groq's lead in the AI compute race.
Signs We Look For:
- Consistently ships high-impact, production-ready code while collaborating effectively with cross-functional teams.
- Possesses deep expertise in computer architecture, operating systems, algorithms, hardware-software interfaces, and parallel/distributed computing.
- Has mastered system-level programming (C++, Rust, or similar) with an emphasis on low-level optimizations and hardware-aware design.
- Excels at profiling and optimizing systems for latency, throughput, and efficiency, with zero tolerance for wasted cycles or resources.
- Committed to automated testing and CI/CD pipelines, believing that "untested code is broken code."
- Deeply curious about system internals—from kernel-level interactions to hardware dependencies—and fearless enough to solve problems across abstraction layers down to the PCB traces.
- Makes pragmatic technical judgments, balancing short-term velocity with long-term system health.
- Writes empathetic, maintainable code with strong version control and modular design, prioritizing readability and usability for future teammates.
Nice to Have:
- Experience shipping complex projects in fast-paced environments while maintaining team alignment and stakeholder support.
- Expertise operating large-scale distributed systems for high-traffic internet services.
- Experience deploying and optimizing machine learning (ML) or high-performance computing (HPC) workloads in production.
- Hands-on optimization of performance-critical applications using GPUs, FPGAs, or ASICs (e.g., memory management, kernel optimization).
- Familiarity with ML frameworks (e.g., PyTorch) and compiler tooling (e.g., MLIR) for AI/ML workflow integration.
The Ideal Candidate:
- Initiates (without derailing): Spots opportunities to solve problems or improve processes—while staying aligned with team priorities.
- Builds stuff that actually ships: Values "code in prod" over "perfect slides." Delivers real value instead of polishing whiteboard ideas.
- Is a craftsmanship junkie: Always asks, "How can we make this better?" and loves diving into details.
Early-career engineers:
If you've built high-performance systems in school or research and want to dive into cutting-edge hardware/software systems, we encourage you to apply.