Senior Site Reliability Engineer (Events SRE Team) at RingCentral

Bulgaria

Compensation: Not Specified
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Technology, Events

Requirements

  • Familiarity with cloud-native services and architectures, plus experience with cloud providers (our infrastructure is built on AWS)
  • Experience in running mission critical services at scale without disruption
  • Hands-on experience with Kubernetes and infrastructure as code (IaC) using Terraform, focusing on scalability and efficient infrastructure management
  • Proficiency in designing and maintaining CI/CD pipelines, with a preference for GitLab CI
  • Experience with monitoring, APM, logging, and analytics tools
  • Strong problem-solving skills with the ability to analyze and debug complex distributed systems, tracing requests and data flows from the kernel to the web to identify root causes
  • Ability to spot, address, and optimize performance bottlenecks
  • Proactive approach, favoring iterative action over waiting for things to happen or to be perfect
  • Familiarity with incident, problem and change management processes and best practices

Responsibilities

  • Manage cloud infrastructure on AWS and EKS, leveraging IaC and GitOps to ensure scalability
  • Participate in service capacity planning, software performance analysis, and system tuning
  • Design, consult on, re-platform, and refactor the observability of the current cloud infrastructure
  • Participate in release management, working closely with engineering teams to bring GitOps principles to our release process and manage CI/CD pipelines using GitLab CI
  • Take part in 24/7 on-call responsibilities (~2 days/month based on rotation schedule) to ensure continuous availability and quick response to issues in production
  • Conduct blameless post-mortems to learn from incidents and prevent future ones
  • Develop and test disaster recovery plans and runbooks to ensure business continuity
  • Implement security best practices and controls within the infrastructure to meet compliance standards and prepare for audits

Skills

Key technologies and capabilities for this role

AWS, EKS, IaC, GitOps, CI/CD, GitLab CI, Observability, Capacity Planning, Release Management, Disaster Recovery, Post-Mortems, Security Best Practices

Questions & Answers

Common questions about this position

What are the key responsibilities for this Senior Site Reliability Engineer role?

Responsibilities include managing AWS and EKS infrastructure with IaC and GitOps, participating in capacity planning and release management with GitLab CI, handling 24/7 on-call (~2 days/month), conducting blameless post-mortems, developing disaster recovery plans, and implementing security best practices.

What are the required skills and experience for this position?

Key requirements include familiarity with cloud-native services and architectures on AWS, hands-on Kubernetes and Terraform IaC experience, proficiency in designing and maintaining CI/CD pipelines (GitLab CI preferred), experience with monitoring, APM, logging, and analytics tools, strong problem-solving skills for distributed systems, performance optimization skills, and familiarity with incident, problem, and change management processes.

Is this a remote position, or is there a required office location?

This information is not specified in the job description.

What is the compensation or salary for this Senior SRE role?

This information is not specified in the job description.

What are the on-call expectations for this role?

The role includes taking part in a 24/7 on-call rotation, approximately 2 days per month depending on the rotation schedule, to ensure continuous availability and quick response to production issues.

RingCentral

Phone and video system

About RingCentral

Headquarters: N/A
Year Founded: N/A
Company Stage: N/A
