Site Reliability Engineer
Keeper Security - Full Time
- Junior (1 to 2 years)
Candidates should possess a Bachelor's degree in Computer Science, Engineering, or a related field, and have experience with AWS, Kubernetes (K8s), and service mesh technologies. Strong experience building and operating high-availability systems in public cloud environments with tools such as Terraform, Helm, ArgoCD, and Spinnaker is required, as is experience deploying containerized applications to Kubernetes. Experience with Zabbix, Prometheus, OpenTelemetry, Elasticsearch, and Grafana Mimir for logging, monitoring, and automation is desired. Experience with SLO/SLI-based service improvement and problem-solving is crucial.
The Site Reliability Engineer will be responsible for building and operating high-availability infrastructure on AWS, deploying applications to Kubernetes environments via Spinnaker, automating logging and monitoring with tools like Zabbix and Prometheus, and using OpenTelemetry and Elasticsearch for application monitoring. They will also lead service incident response, conduct post-mortem analysis, and develop strategies to prevent future issues. Furthermore, they will identify and act on service improvement points based on SLO/SLI metrics, contribute to proofs of concept (PoC) and production adoption of new technologies, and actively participate in developing and improving processes and tools to support a robust and reliable environment.
Provides online dating and social discovery
Match Group leverages the Swipe feature® and social discovery, facilitating deeper connections through its portfolio of online dating brands, with a global presence and availability in over 40 languages.