Software Engineer, SRE (Staff/Senior Levels)
Kustomer, Full Time
Expert & Leadership (9+ years), Senior (5 to 8 years)
Candidates should possess a Bachelor's or Master's degree in Computer Science or a related field, with at least 8 years of experience in Site Reliability Engineering. Previous experience using ClickHouse in production, coding proficiency in Go and/or Python, and strong knowledge of cloud platforms like AWS, Azure, or GCP are required. Hands-on experience with container orchestration tools such as Kubernetes or Docker Swarm, and with automation/configuration management tools like Ansible, Terraform, or Puppet, is also necessary. Excellent problem-solving and production debugging skills and a passion for efficiency, availability, scalability, and data governance are essential, as is the ability to thrive in a fast-paced global team environment with a high level of responsibility and accountability. Excellent communication and interpersonal skills are also required.
The Senior Site Reliability Engineer will be responsible for building and leading processes to ensure the reliability, availability, scalability, and performance of the cloud infrastructure running ClickHouse databases. This includes collaborating with engineering teams to design and implement scalable, secure, and highly available systems, establishing and managing service level objectives (SLOs) and service level agreements (SLAs), and ensuring monitoring and alerting are in place for all infrastructure components. Responsibilities also include enhancing incident response processes and post-mortem analysis, continuously improving service reliability and performance, planning and driving chaos engineering initiatives, and managing on-call processes to respond to issues and minimize downtime. The role involves leveraging software engineering expertise to develop platforms and tools that improve operational and engineering efficiency for ClickHouse Cloud.
High-speed column-oriented database management system
ClickHouse provides a high-speed, column-oriented database management system designed for developers and businesses that manage large-scale data. Its primary product processes analytical queries quickly by storing data from the same columns together, making it significantly faster than traditional row-oriented databases, especially in Online Analytical Processing (OLAP) scenarios. ClickHouse stands out from competitors by offering a free, open-source database that can be deployed on local machines or in the cloud, along with a fully managed service on platforms like AWS, GCP, and Microsoft Azure. The company's goal is to deliver a cost-effective solution that simplifies data management for its clients, as evidenced by user feedback highlighting substantial cost savings.