Software Engineer, SRE (Staff/Senior Levels)
Kustomer, Full Time
Expert & Leadership (9+ years), Senior (5 to 8 years)
Candidates should possess a Bachelor's or Master's degree in Computer Science or a related field, with a minimum of 8 years of experience in Site Reliability Engineering or a similar role. Prior experience running ClickHouse in a production environment is required, along with proficiency in coding in Go and/or Python. A strong understanding of cloud computing platforms such as AWS, Azure, or Google Cloud Platform is essential, as is hands-on experience with container orchestration tools such as Kubernetes or Docker Swarm. Significant experience with automation and configuration management tools such as Ansible, Terraform, or Puppet is also necessary. The ideal candidate is a skilled problem-solver with robust production debugging abilities, a passion for efficiency, availability, scalability, and data governance, and the capacity to thrive in a fast-paced global team environment. High levels of responsibility, ownership, and accountability, along with excellent communication and interpersonal skills, are expected.
The Senior Site Reliability Engineer will collaborate with engineering teams to design and implement scalable, secure, and highly available systems for ClickHouse. This role involves establishing and managing service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud. Responsibilities include ensuring monitoring and alerting are in place for all infrastructure components, improving incident response processes and post-mortem analysis, and continuously improving the reliability and performance of ClickHouse services. The engineer will also plan and drive chaos engineering initiatives, manage on-call processes to address performance and reliability issues, and establish best practices for resolving issues and minimizing downtime.
High-speed column-oriented database management system
ClickHouse provides a high-speed, column-oriented database management system designed for developers and businesses that manage large-scale data. Its primary product processes analytical queries quickly by storing the values of each column together, so a query reads only the columns it needs. This makes it significantly faster than traditional row-oriented databases, especially in Online Analytical Processing (OLAP) scenarios. ClickHouse stands out from competitors by offering a free, open-source database that can be deployed on local machines or in the cloud, along with a fully managed service on platforms like AWS, GCP, and Microsoft Azure. The company's goal is to deliver a cost-effective solution that simplifies data management for its clients, as evidenced by user feedback highlighting substantial cost savings.