Site Reliability Engineer - Observability
Second Front Systems
Full Time
Senior (5 to 8 years)
Common questions about this position
Yes, this is a remote position on an async-friendly team.
Candidates need 5+ years of experience building and running production systems at scale; proficiency in Golang; experience with Kubernetes, Helm, ArgoCD, and Terraform or similar IaC tools; comfort with at least one major cloud provider (AWS, GCP, or Azure); and experience with observability tools such as OpenTelemetry, Prometheus, or Grafana.
The team works at a fast pace and emphasizes hands-on incident response, automation, and close collaboration with product and infrastructure teams, all in a remote, async-friendly setting.
Strong candidates demonstrate a bias for action and ownership; excellent production debugging and problem-solving skills; strong communication; experience balancing performance, reliability, and cost; and the ability to iterate quickly on MVPs.
High-speed column-oriented database management system
ClickHouse provides a high-speed, column-oriented database management system for developers and businesses that manage large-scale data. Its primary product answers analytical queries quickly by storing the values of each column together, so a query reads only the columns it needs. This makes it significantly faster than traditional row-oriented databases, especially for Online Analytical Processing (OLAP) workloads. ClickHouse stands out from competitors by offering a free, open-source database that can run on local machines or in the cloud, along with a fully managed service on AWS, GCP, and Microsoft Azure. The company aims to deliver a cost-effective solution that simplifies data management for its clients, and user feedback highlights substantial cost savings.
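The row-versus-column distinction above can be illustrated with a minimal in-memory sketch in Go. The `Event` type, its fields, and the helper functions are hypothetical and exist only to show the layout difference; this is an analogy, not a description of how ClickHouse actually stores, compresses, or scans data on disk.

```go
package main

import "fmt"

// Event is a hypothetical record in an analytics table (illustrative only).
type Event struct {
	UserID  int64
	Country string
	Latency float64
}

// ColumnStore keeps each field in its own contiguous slice, mirroring the
// "store a column's values together" idea behind column-oriented databases.
type ColumnStore struct {
	UserID  []int64
	Country []string
	Latency []float64
}

// ToColumns converts row-oriented records into the columnar layout.
func ToColumns(rows []Event) ColumnStore {
	cs := ColumnStore{
		UserID:  make([]int64, 0, len(rows)),
		Country: make([]string, 0, len(rows)),
		Latency: make([]float64, 0, len(rows)),
	}
	for _, r := range rows {
		cs.UserID = append(cs.UserID, r.UserID)
		cs.Country = append(cs.Country, r.Country)
		cs.Latency = append(cs.Latency, r.Latency)
	}
	return cs
}

// AvgLatencyRows walks every full record even though it needs only one field.
func AvgLatencyRows(rows []Event) float64 {
	if len(rows) == 0 {
		return 0
	}
	var sum float64
	for _, r := range rows {
		sum += r.Latency
	}
	return sum / float64(len(rows))
}

// AvgLatencyColumns touches only the Latency column, skipping the others.
func AvgLatencyColumns(cs ColumnStore) float64 {
	if len(cs.Latency) == 0 {
		return 0
	}
	var sum float64
	for _, v := range cs.Latency {
		sum += v
	}
	return sum / float64(len(cs.Latency))
}

func main() {
	rows := []Event{
		{UserID: 1, Country: "US", Latency: 12.0},
		{UserID: 2, Country: "DE", Latency: 18.0},
		{UserID: 3, Country: "US", Latency: 30.0},
	}
	cols := ToColumns(rows)
	// Both layouts give the same answer; the columnar one reads less data.
	fmt.Println(AvgLatencyRows(rows), AvgLatencyColumns(cols)) // prints "20 20"
}
```

Both functions compute the same average, but the columnar version scans a single contiguous slice of `float64` values instead of whole records. That narrower read is the intuition behind the OLAP speedups described above.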