Groq

Staff Observability Engineer

Remote

Compensation: $186,915 - $252,885 (base salary)
Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: Artificial Intelligence, Semiconductors, Cloud Computing

Position Overview

  • Location Type: Hybrid
  • Job Type: Staff
  • Salary: $186,915 - $252,885 (Base Salary Range)

Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale they need. Headquartered in Silicon Valley, we are on a mission to make high performance AI compute more accessible and affordable. When real-time AI is within reach, anything is possible. Build fast.

Mission

Ensure the reliability, scalability, and performance of Groq’s observability tools and services for provisioning and managing the full lifecycle of Groq hardware, software, and networking systems at massive scale.

Team

The observability team builds the monitoring and observability infrastructure and tooling that supports Groq’s inference hardware at massive scale, both in the cloud and in our own datacenters. Our mission is to ensure the reliability, scalability, and performance of Groq’s tools and services across a wide range of signals and appliances, to support teams in achieving production excellence, and to evangelize technical and cultural best practices around alerting, instrumentation, debugging, SLOs, and on-call.

Responsibilities

  • Build and maintain comprehensive observability systems at massive scale.
  • Obsess about running high quality production systems with excellent uptime that engineers can trust.
  • Constantly iterate on, maintain, update, automate, and dogfood your own systems.
  • Put in place great monitoring of your own systems that can be used as best practices by the rest of the organization.
  • Instrument Kubernetes clusters, applications, and datacenter infrastructure components such as switches, PDUs, environmental sensors, cameras, chillers, etc.
  • Stand up and run monitoring, observability, and alerting systems: OpenTelemetry Tracing and Collector, Grafana/Prometheus, PagerDuty, AlertManager, IPMI, SNMP, etc.
  • Be a teacher: advise teams on instrumenting their applications in a variety of languages (Rust, C++, TypeScript, GoLang), implementing sensible SLO and alerting strategies, as well as on-call best practices.

Requirements

  • 4+ years of experience in observability as a core responsibility of previous roles.
  • Deep understanding of cloud-native technologies and infrastructure as code (IaC) tooling such as Terraform and Flux.
  • Experience instrumenting large Kubernetes clusters and building Kubernetes operators.
  • Strong analytical and problem-solving skills with a focus on root cause analysis and mitigation.
  • Excellent communication and teamwork skills with the ability to collaborate effectively across engineering teams.

Attributes of a Groqster

  • Humility - Egos are checked at the door
  • Collaborative & Team Savvy - We make up the smartest person in the room, together
  • Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
  • Curious & Innovative - Take a creative approach to projects, problems, and design
  • Passion, Grit, & Boldness - No-limit thinking, fueling informed risk-taking

Company Information

  • Compensation: At Groq, a competitive base salary is part of our comprehensive compensation package, which includes equity and benefits. For this role, the base salary range is $186,915 to $252,885, determined by your skills, qualifications, experience, and internal benchmarks.
  • Location: Some roles may require being located near or on our primary sites, as indicated in the job description.
  • Equal Opportunity Employer: Groq is an equal opportunity employer committed to diversity, inclusion, and

Skills

Kubernetes
Tracing
Logging
Metrics
Instrumentation
Monitoring
Automation
Observability
Datacenter Infrastructure
Networking
Alerting
Debugging
SLOs
On-call

Groq

AI inference technology for scalable solutions

About Groq

Groq specializes in AI inference technology, providing the Groq LPU™, which is known for its high compute speed, quality, and energy efficiency. The Groq LPU™ is designed to handle AI processing tasks quickly and effectively, making it suitable for both cloud and on-premises applications. Unlike many competitors, Groq's products are designed, fabricated, and assembled in North America, which helps maintain high standards of quality and performance. The company targets a variety of clients across different industries that require fast and efficient AI processing capabilities. Groq's goal is to deliver scalable AI inference solutions that meet the growing demands for rapid data processing in the AI and machine learning market.

Headquarters: Mountain View, California
Year Founded: 2016
Total Funding: $1,266.5M
Company Stage: Series D
Industries: AI & Machine Learning
Employees: 201-500

Benefits

Remote Work Options
Company Equity

Risks

Increased competition from SambaNova Systems and Gradio in high-speed AI inference.
Geopolitical risks in the MENA region may affect the Saudi Arabia data center project.
Rapid expansion could strain Groq's operational capabilities and supply chain.

Differentiation

Groq's LPU offers exceptional compute speed and energy efficiency for AI inference.
The company's products are designed and assembled in North America, ensuring high quality.
Groq emphasizes deterministic performance, providing predictable outcomes in AI computations.

Upsides

Groq secured $640M in Series D funding, boosting its expansion capabilities.
Partnership with Aramco Digital aims to build the world's largest inferencing data center.
Integration with Touchcast's Cognitive Caching enhances Groq's hardware for hyper-speed inference.
