Site Reliability Engineer
- Full Time
- Junior (1 to 2 years)
Candidates must hold a Bachelor’s or Master’s degree in Computer Science or a related field and have at least 8 years of experience in Reliability Engineering, QA, or customer-facing engineering. Previous experience operating ClickHouse or other SQL databases in production is required, along with an excellent understanding of SQL and distributed database internals, particularly those of ClickHouse. Scripting experience with Shell or Python and the ability to read and understand C++ code are also necessary. Knowledge of a cloud computing platform such as AWS, Azure, or Google Cloud Platform is required, and strong production debugging skills are essential.
As a Site Reliability Engineer, the individual will continuously improve the reliability and performance of ClickHouse core, create and enhance metrics and alerts, investigate and resolve customer-reported problems, refine incident response processes and post-mortem analysis, plan and execute Chaos initiatives, manage on-call processes, and coordinate escalations to minimize customer impact. They will collaborate with teams including Control Plane, Dataplane, Security, Support, and Operations, and guide them in implementing ClickHouse for customers. The role also involves managing engineering escalations and response, conducting post-mortem analysis, and driving continuous improvement of how ClickHouse is run and optimized in the cloud.
High-speed column-oriented database management system
ClickHouse provides a high-speed, column-oriented database management system designed for developers and businesses that manage large-scale data. Its primary product processes analytical queries quickly by storing the values of each column together, which makes it significantly faster than traditional row-oriented databases, especially in Online Analytical Processing (OLAP) scenarios. ClickHouse stands out from competitors by offering a free, open-source database that can be deployed on local machines or in the cloud, along with a fully managed service on platforms like AWS, GCP, and Microsoft Azure. The company's goal is to deliver a cost-effective solution that simplifies data management for its clients, as evidenced by user feedback highlighting substantial cost savings.