Sr Systems Reliability Engineer at The Walt Disney Company

Nicasio, California, United States

Compensation: Not Specified
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Entertainment, Media

Requirements

  • BS Degree in Computer Science
  • 5+ years of experience in DevOps, Site Reliability Engineering, or a related field
  • Extensive AWS knowledge: EC2, ECS/EKS, Lambda, ELB, ASGs, Route 53, KMS, SSM, IAM, S3, ACM, VPC, RDS, ElastiCache
  • Proficiency with modern observability practices: application monitoring, tracing, and profiling tools (e.g. Datadog, New Relic, OpenTelemetry, Splunk)
  • Proficiency with GitLab CI, Terraform, Helm, and Packer
  • Demonstrated experience designing and managing CI/CD pipelines for complex software platforms
  • In-depth knowledge of Containers and Container Orchestration technologies: Docker, Kubernetes
  • Experience with Terraform or other infrastructure as code tooling
  • Strong scripting skills in Python, Bash, or similar languages
  • Familiarity with modern security practices for protecting sensitive assets in distributed systems
  • Exceptional problem-solving skills, with a proactive and collaborative mindset

Preferred Qualifications

  • Experience working with media and entertainment pipelines or pre-release content workflows
  • Proficiency with Golang, Python, or C++
  • Experience with modern AI/ML frameworks (e.g., TensorFlow, PyTorch, Hugging Face) and their integration into operational workflows
  • Knowledge of container security tools and systems, such as Falco or Aqua Security
  • Experience with emerging deployment systems like ArgoCD or Flux for GitOps workflows
  • Familiarity with serverless computing paradigms and technologies such as AWS Lambda or Google Cloud Run/Functions
  • Understanding of high-performance computing systems in cloud environments
  • Experience administering VMware vSphere clusters

Responsibilities

  • Design, manage, and maintain critical infrastructure for both software development and deployed global production resources
  • Collaborate on the provisioning of cloud infrastructure in AWS using Terraform to ensure consistency and scalability
  • Maintain and manage multiple Kubernetes clusters across both cloud and on-premises environments
  • Implement and enforce best practices for secure software development and deployment in alignment with industry standards
  • Monitor, troubleshoot, and optimize build and deployment processes to maximize efficiency and minimize downtime
  • Collaborate with cross-functional teams, including developers and security experts, to ensure systems meet operational requirements
  • Develop, maintain, and enhance CI/CD pipelines using GitLab to support build automation, unit testing, and integration testing
  • Continuously evaluate and implement tools and technologies to improve workflows and platform reliability

Skills

AWS
Terraform
Kubernetes
Cloud Infrastructure
Security
Monitoring
Troubleshooting
CI/CD
Infrastructure as Code

The Walt Disney Company

Leading producers & providers of entertainment and information

About The Walt Disney Company

Headquarters: N/A
Year Founded: 1923
Company Stage: N/A
Employees: 10,001+
