Site Reliability Engineer
Keeper Security - Full Time
- Junior (1 to 2 years)
Candidates should have 7+ years of experience in software engineering, with a strong focus on production systems and distributed architectures, and should thrive in high-leverage roles that improve how everyone else builds, ships, and fixes software. They should have led, or played a significant role in, incident response and in building systems and culture around continuous improvement. They should be excited by complexity, not afraid of it, and deeply motivated to make systems safer and teams faster.
The Site Reliability Engineer will:
- Drive reliability and observability improvements across large-scale distributed systems.
- Serve as a force multiplier across all engineering teams by reducing downtime, improving tooling, and freeing senior engineers from firefighting.
- Own and evolve the incident review process, leading postmortems and embedding learnings into tools, practices, and culture across the company.
- Collaborate with teams to improve release hygiene, including automating release gating, preventing code from stagnating in staging environments, and implementing pre-prod automated test pipelines to catch issues early.
- Build and maintain Nominal's gRPC middleware to ensure safe, observable, and performant service communication.
- Improve alerting, debugging, and monitoring to ensure production health and rapid root cause analysis.
Software tools for engineering hardware systems
Nominal.io provides software tools designed specifically for engineering teams working with complex hardware systems. Its platform allows these teams to test and deploy hardware systems significantly faster than traditional methods, which makes it particularly useful in industries such as aerospace, defense, energy, and telecommunications, where hardware performance is critical. The platform consolidates data from various sources, enabling engineers to monitor and analyze their systems effectively in a secure environment. Unlike many competitors, Nominal.io focuses on a niche market with high demands for reliability, and it offers a software-as-a-service (SaaS) model that gives clients continuous access to the latest features. The company's goal is to enhance the resilience and performance of hardware systems, positioning itself as a key partner for engineering teams looking to improve their deployment processes.