Software Engineer, SRE (Staff/Senior Levels)
Kustomer, Full Time
Expert & Leadership (9+ years), Senior (5 to 8 years)
Candidates should possess a Bachelor's or Master's degree in Computer Science or a related field, with at least 8 years of experience in Site Reliability Engineering. Prior experience using ClickHouse in production is required, along with coding proficiency in Go and/or Python. Also essential are strong knowledge of cloud platforms such as AWS, Azure, or Google Cloud Platform; hands-on experience with container orchestration tools such as Kubernetes or Docker Swarm; and expertise in automation and configuration management tools such as Ansible, Terraform, or Puppet. Excellent problem-solving and production debugging skills, together with a passion for efficiency, availability, scalability, and data governance, are also necessary. The ideal candidate thrives in a fast-paced, global team environment, demonstrates a high level of responsibility, ownership, and accountability, and possesses excellent communication and interpersonal skills. Experience with distributed databases and SQL, particularly ClickHouse, is a significant advantage.
The Senior Site Reliability Engineer will be responsible for building and leading processes to ensure the reliability, availability, scalability, and performance of cloud infrastructure running ClickHouse databases. This includes collaborating with various engineering teams to design and implement scalable, secure, and highly available systems, and establishing and managing service level objectives (SLOs) and service level agreements (SLAs). The role involves ensuring that monitoring and alerting are in place for all infrastructure components, enhancing incident response processes, and conducting post-mortem analyses. Key duties also include continuously improving the reliability and performance of ClickHouse services, planning and driving chaos engineering initiatives, and managing on-call processes to respond to performance and reliability issues, while establishing best practices for escalation and minimizing downtime. Leveraging software engineering expertise to develop platforms and tools that improve operational and engineering efficiency is also a core responsibility.
High-speed column-oriented database management system
ClickHouse provides a high-speed, column-oriented database management system designed for developers and businesses that manage large-scale data. Its primary product processes analytical queries quickly by storing the values of each column together, which makes it significantly faster than traditional row-oriented databases, especially in Online Analytical Processing (OLAP) scenarios. ClickHouse stands out from competitors by offering a free, open-source database that can be deployed on local machines or in the cloud, along with a fully managed service on platforms like AWS, GCP, and Microsoft Azure. The company's goal is to deliver a cost-effective solution that simplifies data management for its clients, as evidenced by user feedback highlighting substantial cost savings.