Staff Site Reliability Engineer
Addepar
Full Time
Expert & Leadership (9+ years)
Candidates should have demonstrable familiarity with Linux, 4 years of professional experience, and experience with Kubernetes, Docker, and containers. Experience with at least one major cloud provider, such as AWS, GCP, or Azure, is required. Bonus points are awarded for previous customer-facing experience, DevOps experience, contributions to open-source projects, or familiarity with Splunk or Prometheus.
The Customer Reliability Engineer will operate, monitor, and maintain the managed Airflow service to keep it available, predictable, and reliable. The role involves building expertise in Kubernetes, cloud engineering, and cloud networking across AWS, Azure, and GCP. Responsibilities include building strong customer relationships, helping customers achieve their reliability goals, working directly with customer data engineers and DevOps teams, providing feedback that shapes product direction, owning the customer experience by prioritizing and solving issues, meeting SLAs, helping maintain 24x7 coverage by covering a pager period, and participating in paid on-call rotations. Up to 20% of time can be dedicated to side projects such as contributing to the open-source Airflow repository or developing internal monitoring and alerting systems.
Data orchestration platform for pipeline management
Astronomer.io provides a data orchestration platform that uses Apache Airflow to simplify the deployment of data pipelines. Its main product, Astro, helps businesses manage and monitor their data flows so they can focus on delivering essential data pipelines. The platform supports data unification across clouds and offers over 1,500 integrations, making it suitable for data and machine learning teams in industries such as finance and e-commerce. Astronomer.io distinguishes itself from competitors by offering enterprise-grade security, zero-downtime upgrades for Airflow, and tools for monitoring pipeline health, which improve compute efficiency and reduce delays in task scheduling. The company's goal is to help organizations optimize their data strategies and achieve a significant return on investment by ensuring their applications operate with maximum reliability.