Senior Data Engineer (Remote)
KUBRA
Full Time
Senior (5 to 8 years)
Candidates should possess strong proficiency in Python and be familiar with data science libraries and data processing frameworks such as Pandas, Polars, Scikit-learn, and PyTorch. Experience working with databases, query engines, or analytical tools like Snowflake, BigQuery, ClickHouse, or DuckDB is required, along with an understanding of distributed systems and OLAP databases. Hands-on experience with data engineering workflows and integrating databases with machine learning or analytics pipelines is necessary, as is prior contribution to open-source projects or experience in an open-source development environment. Strong problem-solving, cross-functional teamwork, and communication skills are essential.
The Senior Python Engineer will enhance ClickHouse's Python ecosystem to simplify data ingestion, transformation, and analysis for data scientists. Responsibilities include improving existing Python integrations for a seamless data science experience, contributing to open-source repositories to ensure compatibility with popular data science toolkits, and optimizing query execution and data handling when interfacing with Python. The engineer will also collaborate with product managers, engineers, and the data science community to gather feedback and refine features, and educate users on best practices for leveraging ClickHouse in data science applications through examples and reference architectures.
High-speed column-oriented database management system
ClickHouse provides a high-speed, column-oriented database management system for developers and businesses that manage large-scale data. It answers analytical queries quickly by storing the values of each column together, so a query reads only the columns it needs. This makes it significantly faster than traditional row-oriented databases, especially for Online Analytical Processing (OLAP) workloads. ClickHouse stands out from competitors by offering a free, open-source database that can be deployed on local machines or in the cloud, along with a fully managed service on platforms like AWS, GCP, and Microsoft Azure. The company's goal is to deliver a cost-effective solution that simplifies data management for its clients, and user feedback reports substantial cost savings.