Senior Site Reliability Engineer
Chainlink Labs
Full Time
Senior (5 to 8 years)
Candidates must possess a deep understanding of distributed systems, incident management, observability, and automation. Experience with AWS, Kubernetes, and Infrastructure-as-Code (Terraform preferred) is required. A strategic mindset and the ability to guide long-term reliability efforts are essential, along with hands-on experience implementing DevSecOps practices, including secure IaC and policy-as-code.
The Senior Site Reliability Engineer will design and implement fault-tolerant, scalable, and highly available systems on AWS. They will partner with engineering teams to define SLIs/SLOs, conduct root cause analyses, and lead post-incident reviews for systemic improvements. Responsibilities include building automation for deployment and incident response, embedding security controls in pipelines, developing comprehensive observability solutions, contributing to Terraform infrastructure management, and leading capacity planning and performance tuning efforts.
Unified platform for Apple device management
Kandji offers a platform designed specifically for managing and securing Apple devices in businesses. Its system lets companies deploy secure devices, update software, and address vulnerabilities across their entire fleet. A key feature is the MigrationAgent, which simplifies the transition from older Mobile Device Management (MDM) solutions to Kandji's platform with minimal user interaction. What sets Kandji apart from competitors is its deep knowledge of the Apple ecosystem and its dedicated customer support, staffed by engineers with Mac administration experience. Kandji's goal is to help businesses strengthen their IT infrastructure and support their growth through effective device management.