Senior Site Reliability Engineer (Events SRE Team) at RingCentral

Bulgaria


Compensation: Not Specified

Experience Level: Senior (5 to 8 years)

Job Type: Full Time

Visa: Unknown

Industries: Technology, Events

Requirements

Familiarity with cloud-native services and architectures, and experience with cloud providers (our infrastructure is built on AWS)
Experience running mission-critical services at scale without disruption
Hands-on experience with Kubernetes and infrastructure as code (IaC) using Terraform, focusing on scalability and efficient infrastructure management
Proficiency in designing and maintaining CI/CD pipelines, with a preference for GitLab CI
Experience with monitoring, APM, logging, and analytics tools
Strong problem-solving skills with the ability to analyze and debug complex distributed systems, tracing requests and data flows from the kernel to the web to identify root causes
Ability to spot, address, and optimize performance bottlenecks
Proactive approach, favoring iterative action over waiting for things to happen or to be perfect
Familiarity with incident, problem, and change management processes and best practices

Responsibilities

Manage cloud infrastructure on AWS and EKS, leveraging IaC and GitOps to ensure scalability
Participate in service capacity planning, software performance analysis, and system tuning
Design, consult on, re-platform, and refactor observability for the current cloud infrastructure
Participate in release management, working closely with engineering teams to bring GitOps principles to our release process and manage CI/CD pipelines using GitLab CI
Take part in 24/7 on-call responsibilities (~2 days/month based on rotation schedule) to ensure continuous availability and quick response to issues in production
Conduct blameless post-mortems to learn from incidents and prevent future ones
Develop and test disaster recovery plans and runbooks to ensure business continuity
Implement security best practices and controls within the infrastructure to meet compliance standards and prepare for audits

Skills

Key technologies and capabilities for this role

AWS, EKS, IaC, GitOps, CI/CD, GitLab CI, Observability, Capacity Planning, Release Management, Disaster Recovery, Post-Mortems, Security Best Practices

Questions & Answers

Common questions about this position

What are the key responsibilities for this Senior Site Reliability Engineer role?

Responsibilities include managing AWS and EKS infrastructure with IaC and GitOps, participating in capacity planning and release management with GitLab CI, handling 24/7 on-call (~2 days/month), conducting blameless post-mortems, developing disaster recovery plans, and implementing security best practices.

What are the required skills and experience for this position?

Key requirements include familiarity with AWS cloud-native services, hands-on Kubernetes and Terraform IaC experience, proficiency in GitLab CI/CD pipelines, experience with monitoring, APM, logging, and analytics tools, strong problem-solving skills for distributed systems, performance optimization skills, and knowledge of incident, problem, and change management processes.

Is this a remote position, or is there a required office location?

The listing gives Bulgaria as the location but does not specify whether the role is remote or office-based.

What is the compensation or salary for this Senior SRE role?

This information is not specified in the job description.

What are the on-call expectations for this role?

The role involves 24/7 on-call responsibilities for approximately 2 days per month, based on the rotation schedule, to ensure continuous availability and a quick response to production issues.