Sr Systems Reliability Engineer at The Walt Disney Company

Nicasio, California, United States

Compensation: Not Specified
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Entertainment, Media

Requirements

  • BS Degree in Computer Science
  • 5+ years of experience in DevOps, Site Reliability Engineering, or a related field
  • Extensive AWS knowledge: EC2, ECS/EKS, Lambda, ELB, ASGs, Route53, KMS, SSM, IAM, S3, ACM, VPC, RDS, Elasticache
  • Proficiency with modern observability practices: application monitoring, tracing, and profiling tools (e.g. Datadog, New Relic, OpenTelemetry, Splunk)
  • Proficiency with GitLab CI, Terraform, Helm, and Packer
  • Demonstrated experience designing and managing CI/CD pipelines for complex software platforms
  • In-depth knowledge of Containers and Container Orchestration technologies: Docker, Kubernetes
  • Experience with Terraform or other infrastructure as code tooling
  • Strong scripting skills in Python, Bash, or similar languages
  • Familiarity with modern security practices for protecting sensitive assets in distributed systems
  • Exceptional problem-solving skills, with a proactive and collaborative mindset

Preferred Qualifications

  • Experience working with media and entertainment pipelines or pre-release content workflows
  • Proficiency with Golang, Python, or C++
  • Experience with modern AI/ML frameworks (e.g., TensorFlow, PyTorch, Hugging Face) and their integration into operational workflows
  • Knowledge of container security tools and systems, such as Falco or Aqua Security
  • Experience with emerging deployment systems like ArgoCD or Flux for GitOps workflows
  • Familiarity with serverless computing paradigms and technologies such as AWS Lambda or Google Cloud Run/Functions
  • Understanding of high-performance computing systems in cloud environments
  • Experience administering VMware vSphere clusters

Responsibilities

  • Design, manage and maintain critical infrastructure for both software development and deployed global production resources
  • Collaborate on the provisioning of cloud infrastructure in AWS using Terraform to ensure consistency and scalability
  • Maintain and manage multiple Kubernetes clusters across both cloud and on-premises environments
  • Implement and enforce best practices for secure software development and deployment in alignment with industry standards
  • Monitor, troubleshoot, and optimize build and deployment processes to maximize efficiency and minimize downtime
  • Collaborate with cross-functional teams, including developers and security experts, to ensure systems meet operational requirements
  • Develop, maintain, and enhance CI/CD pipelines using GitLab to support build automation, unit testing, and integration testing
  • Continuously evaluate and implement tools and technologies to improve workflows and platform reliability

Skills

Key technologies and capabilities for this role

AWS, Terraform, Kubernetes, Cloud Infrastructure, Security, Monitoring, Troubleshooting, CI/CD, Infrastructure as Code

Questions & Answers

Common questions about this position

Is this role remote or hybrid?

This role is hybrid, requiring 2-3 days onsite at the Nicasio, CA office and occasional work from home.

What salary or compensation is offered for this position?

This information is not specified in the job description.

What key skills and experience are required for this Sr Systems Reliability Engineer role?

Candidates need a BS in Computer Science, 5+ years in DevOps or SRE, extensive AWS knowledge (EC2, ECS/EKS, etc.), proficiency in observability tools like Datadog and OpenTelemetry, GitLab CI, Terraform, Helm, Packer, Kubernetes, Docker, and strong scripting skills.

What is the team environment like at Skywalker Sound Development Group?

The team collaborates closely with master audio content creators to develop next-generation tools for audio soundtracks, working with cross-functional teams including developers and security experts in a hybrid setup.

What makes a strong candidate for this position?

A strong candidate will have 5+ years in DevOps or SRE, deep AWS expertise, hands-on experience with Kubernetes, Terraform, GitLab CI, observability tools, and a proven track record designing CI/CD pipelines for complex platforms.

The Walt Disney Company

Leading producers & providers of entertainment and information

About The Walt Disney Company

Headquarters: N/A
Year Founded: 1923
Company Stage: N/A
Employees: 10,001+
