Site Reliability Engineer
- Full Time
- Junior (1 to 2 years)
Candidates should have a Bachelor's degree in Computer Science or equivalent work experience, and 5+ years of experience as a Site Reliability Engineer or in a similar role maintaining high-performance, scalable, and reliable high-throughput web systems. Proficiency with Kubernetes for container orchestration and management is required, along with 5+ years of experience with cloud providers (AWS, GCP) and a deep understanding of cloud services and architectures. Candidates should be proficient in scripting and programming languages such as Python, Bash, Go, or Node.js, and comfortable with Linux systems and the command line. A solid understanding of infrastructure as code (IaC) principles and experience with tools like Terraform is essential. Experience with continuous integration and deployment (CI/CD) pipelines is also necessary, as are excellent problem-solving and troubleshooting skills and the communication and collaboration skills needed to work effectively in a team.
The Senior Site Reliability Engineer will manage multi-TB PostgreSQL clusters in the public cloud, optimizing parameters, storage settings, and data structure. They will operate RabbitMQ and Redis at tens of thousands of operations per second and manage 10+ full-featured GKE clusters worldwide, serving over 10,000 tenants. The role involves adopting new technologies such as Kafka, Debezium, and Apache Flink, and facilitating rollout strategies at scale using GitLab CI and ArgoCD. The engineer will roll out best practices around Kubernetes/Helm/Operators, SLIs/SLOs, Incident Management, Observability, Security, and Disaster Recovery to all Product-Engineering teams and drive their adoption. They will also automate complex infrastructure components across the worldwide footprint, applying IaC best practices with Terraform and strong scripting in Python or Golang.
AI-powered customer service for e-commerce
Gorgias provides an AI-powered customer service solution tailored for Shopify stores in the e-commerce sector. Its main service automates the customer support process by sorting, prioritizing, and tagging customer inquiries, known as "tickets," as they arrive. This automation minimizes the need for manual ticket management, allowing businesses to save time and resources. Gorgias' AI can also deliver instant resolutions to customer issues, enhancing the efficiency of support operations. Additionally, the platform enables businesses to create detailed customer profiles that include order history, loyalty status, and reviews, which helps support teams offer personalized service. Gorgias operates on a subscription model, generating consistent revenue while helping businesses improve their customer service and reduce support workloads. The company caters to a diverse range of clients, from small businesses to large corporations, all within the competitive e-commerce landscape.