Senior Site Reliability Engineer, Observability
Chainlink Labs
Full Time
Senior (5 to 8 years), Expert & Leadership (9+ years)
Common questions about this position
The annual salary range is $267K to $401K, though it may vary for candidates whose qualifications differ from those listed.
This is a hybrid role requiring presence in the San Francisco office 4 days per week, with Tuesday designated as the work-from-home day.
Candidates need 8+ years of software engineering experience, including 3+ years with Go and 5+ years of SRE practice; a proven understanding of observability tools; Kubernetes experience for deployment and monitoring; and experience building CI/CD pipelines.
Lambda emphasizes collaboration across team boundaries, expects quality and reliability in the solutions its engineers build, and offers generous cash and equity compensation in a fast-growing environment of around 350 employees.
A strong candidate has the required 8+ years of software engineering experience, including Go and SRE work, plus nice-to-haves such as Prometheus, OpenTelemetry, AI/HPC monitoring, and infrastructure tools like Ansible and Terraform.
Cloud-based GPU services for AI training
Lambda Labs provides cloud-based services for artificial intelligence (AI) training and inference, focusing on large language models and generative AI. Its main product, the AI Developer Cloud, uses NVIDIA's GH200 Grace Hopper™ Superchip to deliver efficient, cost-effective GPU resources. Customers can access on-demand and reserved cloud GPUs, which are essential for processing large datasets quickly, with pricing starting at $1.99 per hour for NVIDIA H100 instances. Lambda Labs serves AI developers and companies that need large GPU deployments, offering competitive pricing and infrastructure ownership options through its Lambda Echelon service. It also provides Lambda Stack, a software solution used by over 50,000 machine learning teams that simplifies the installation and management of AI-related tools. The goal of Lambda Labs is to support AI development by providing accessible and efficient cloud GPU services.