Sr ML Ops Engineer at The Walt Disney Company
Nicasio, California, United States
Compensation: Not specified
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Entertainment, Media
Requirements
Bachelor’s degree in Computer Science, Engineering, or a related field; a Master’s degree is preferred
5+ years of experience in DevOps, Site Reliability Engineering, or a related role, with at least 2 years focused on ML Ops
Expertise in building and maintaining CI/CD pipelines for machine learning applications
Strong proficiency with containerization (Docker) and orchestration tools (Kubernetes)
Proficiency in deploying machine learning models using frameworks such as TensorFlow Serving, TorchServe, or custom APIs
Deep understanding of cloud infrastructure and services (AWS, GCP, or Azure) for ML workloads, including GPU and TPU utilization
Experience managing large-scale distributed training workflows and optimizing resource allocation
Familiarity with tools like MLflow, DVC, Weights & Biases, or similar for data and model tracking and versioning
Solid understanding of security best practices for machine learning systems and sensitive data handling
Strong scripting and programming skills in Python, Bash, or Go
Preferred Qualifications
Experience with data orchestration tools such as DataChain, Weights & Biases, etc., for managing ML workflows
Hands-on experience with automated hyperparameter tuning and optimization frameworks
Familiarity with model monitoring tools like Prometheus, Grafana, or custom solutions for model drift and data quality checks
Experience integrating pre-trained foundation models and managing their deployment at scale
Contributions to open-source ML Ops projects or relevant research publications
Responsibilities
Develop, deploy, and maintain scalable infrastructure for machine learning model training, retraining, and inference
Design and optimize CI/CD pipelines specifically tailored for machine learning workflows, ensuring efficient delivery from research to production
Implement robust monitoring and logging systems to track model performance and identify potential issues in production environments
Collaborate with AI researchers and data scientists to ensure infrastructure aligns with project requirements and supports iterative experimentation
Manage compute resources (cloud and on-premises) to enable large-scale distributed training and inference tasks
Containerize machine learning models and applications using Docker and deploy them via Kubernetes or equivalent orchestration systems
Automate deployment workflows for serving ML models using frameworks such as TorchServe, TensorFlow Serving, and FastAPI
Implement model versioning, rollback strategies, and governance for maintaining production stability
Optimize cost efficiency and performance of machine learning workflows in cloud environments such as AWS, GCP, or Azure
Stay updated with emerging ML Ops tools and practices, integrating them into existing workflows to improve performance and reliability
Skills
Docker
Kubernetes
TorchServe
TensorFlow Serving
FastAPI
CI/CD
MLOps
DevOps
Model Deployment
Distributed Training
Monitoring
Logging
Model Versioning
About The Walt Disney Company
Leading producers & providers of entertainment and information
Headquarters: N/A
Year Founded: 1923
Company Stage: N/A
Employees: 10,001+