Senior Site Reliability Engineer
Chainlink Labs
Full Time
Senior (5 to 8 years)
Candidates should have 5+ years of experience in SRE or DevOps production operations supporting a highly available environment for a SaaS software or cloud service provider. Experience with cloud infrastructure environments, preferably AWS, and with Infrastructure as Code, preferably Terraform, is required. Familiarity with containerization technology and/or Kubernetes is necessary, as is familiarity with metrics, tracing, and logging observability tools such as Prometheus, Grafana, Honeycomb, and Kibana. Experience with incident management, including conducting incident reviews, and with programming languages (Java, Python, Go, etc.) is also required. A strong understanding of Linux, software development, systems, networking, and cloud concepts is essential, along with strong interpersonal and teamwork skills and excellent communication skills; English fluency of C1 or higher is preferred. A Bachelor's degree in Computer Science or another technical discipline, or equivalent experience, is preferred but not required.
The Site Reliability Engineer will be responsible for making it easy for everyone to create, consume, manage, and scale reliable cloud production services. They will work independently or collaboratively on SailPoint SaaS services to design, develop, and improve end-to-end reliability and maintainability. This role involves coaching engineering teams on observability best practices, leading post-incident reviews to define preventive actions, and collaborating with developers to increase system reliability through short-term embedding programs. The engineer will enable engineering teams to scale enterprise operations by providing guidance, best practices, and support as part of an SRE Center of Excellence, and will manage cross-functional requirements with Engineering, Product, Services, and other departments. Key duties include developing and implementing automation tools and processes to streamline operations and enhance system performance, and mentoring others on quality in design reviews, code, test cases, automation, observability, root cause analysis, and self-healing. The engineer is also expected to influence architectural design, implementation, consolidation, and simplification for global scale, and to keep expanding their own skills while improving those of teammates. Driving operational excellence to deliver frictionless operations, a happy on-call experience, and an optimal customer experience is a core responsibility.
Provides identity security solutions for enterprises
SailPoint provides identity security solutions that help organizations manage and protect digital identities. Its main products, including IdentityIQ, IdentityNow, and File Access Manager, help businesses comply with regulations, reduce risk, and control access to sensitive information. These products give organizations visibility into who has access to what data, allowing them to manage permissions effectively. SailPoint stands out from competitors by using advanced technologies such as artificial intelligence and machine learning to enhance its identity governance capabilities. The company's goal is to be a trusted partner for enterprises in navigating the complexities of identity security, so that they can securely manage access to their critical information.