Staff Site Reliability Engineer (Staff SRE) at The Walt Disney Company

Vancouver, British Columbia, Canada

Compensation: Not Specified
Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: Animation, Entertainment, Media

Requirements

  • Expertise in Linux systems administration
  • Experience with software development (e.g. Python, Go, Java, Node)
  • Experience with CI pipeline tools (e.g. Jenkins)
  • Experience with Git source control
  • Experience with cloud hosting (AWS, GCP & Azure)
  • Experience with container computing (e.g. Docker, OCI)
  • Experience with web technologies
  • Aptitude for working collaboratively with a technical team
  • Clear understanding of SRE principles and best practices
  • Passion for continuous learning and for applying technology to solve complex problems; a highly motivated, optimistic, proactive, and creative thought leader and project manager

Responsibilities

  • Translate ideas into tangible products, with a focus on automation, resiliency, efficiency, stability, security, performance, capacity management, and documentation
  • Serve as a subject matter expert in multiple areas and articulate SRE principles and best practices
  • Uphold and improve the reliability of large-scale user-facing and internal services, with a focus on service level indicators (SLIs) and service level objectives (SLOs)
  • Maintain an understanding of stakeholder workflows and requirements, and translate them into end-to-end architectural designs
  • Work with engineering, creative, and production teams to brainstorm, architect, gather requirements, troubleshoot, and provide customer support
  • Support a wide range of on-premises and cloud deployments using infrastructure-as-code, self-healing, and security automation patterns, and help others adopt infrastructure as code
  • Deploy and manage a wide array of on-premises and cloud environments
  • Develop useful telemetry, alerting, and incident-response processes to reduce mean time to repair (MTTR)
  • Collaborate within and across teams, bringing technical excellence to shared work
  • Consult on best practices and develop tools to enable smooth adoption of good service reliability practices and methods
  • Identify areas for improvement

Skills

Linux
Python
Go
Java
Node
Jenkins
Git
AWS
GCP
Azure
Docker
OCI
CI/CD
Configuration Management
DevOps

The Walt Disney Company

A leading producer and provider of entertainment and information

About The Walt Disney Company

Headquarters: N/A
Year Founded: 1923
Company Stage: N/A
Employees: 10,001+
