Customer Reliability Engineer, Infrastructure
Astronomer
Full Time
Mid-level (3 to 4 years)
Candidates should have experience operating and scaling production systems in cloud environments, specifically AWS, and familiarity with service reliability concepts such as monitoring, alerting, incident response, and root cause analysis. They should possess strong debugging and systems-thinking skills, be able to work independently, and take ownership of projects from problem discovery to resolution. Experience with Kubernetes or similar orchestration systems and exposure to observability stacks like Prometheus, Grafana, Datadog, or OpenTelemetry are considered nice-to-haves.
The Site Reliability Engineer will design and evolve systems, tooling, and processes to improve the reliability and performance of WorkOS. They will collaborate with product and infrastructure teams to ensure services are production-ready, observable, and resilient to failure. They will define and measure SLIs/SLOs to guide reliability improvements and write and optimize backend systems in TypeScript with a focus on performance and maintainability. They will also improve the incident response process, lead postmortems, and drive follow-through on reliability risks. In addition, they will develop internal tools and automations, participate in on-call rotations, and contribute to design discussions with a focus on operability and long-term sustainability. Finally, they will document systems and share learnings to foster a reliability-minded engineering culture.
API platform for integrating enterprise features
WorkOS provides an API platform that helps software-as-a-service (SaaS) companies integrate important enterprise features into their products. The platform includes tools for User Management, Single Sign-On (SSO), Directory Sync (SCIM), and Audit Logs, allowing companies to quickly become enterprise-ready. Instead of taking months to add these features, companies can do it in just minutes, which helps them secure larger enterprise contracts. What sets WorkOS apart from its competitors is its unified approach and developer-first design, which make the platform easy for engineers to use. The goal of WorkOS is to enable SaaS companies to focus on their unique product offerings while efficiently adding essential enterprise functionality, ultimately speeding up their time-to-market and increasing the value of their products.