Master’s degree (PhD preferred) in Data Engineering/Science, Computer Science, Signal Processing, or a related field
8+ years of experience in data engineering or data science with a focus on building pipelines for AI/ML applications
Proficiency in Python, with expertise in data manipulation libraries such as Pandas, NumPy, and PyTorch’s data utilities
Hands-on experience with audio processing libraries and tools (e.g., Librosa, FFmpeg, SoX) for handling complex audio formats (see the illustrative sketch after this list)
Familiarity with scalable data pipeline and workflow orchestration tools such as Apache Spark, Airflow, or Luigi, CI/CD pipelines (e.g., GitLab CI), and experience with containerized workflows (Docker, Kubernetes)
Strong understanding of data pipeline requirements for model training, retraining, and evaluation in iterative research workflows
Experience with immersive and multichannel audio formats
Knowledge of cloud-based platforms and tools for storage and processing, such as AWS S3, Redshift, or Google BigQuery
Strong problem-solving skills, with a proactive mindset for addressing evolving data challenges
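As an illustration of the audio handling referenced in the Librosa/FFmpeg item above, here is a minimal sketch, assuming librosa and NumPy only and a hypothetical local file (example.wav): decode, resample, and compute log-mel features.

```python
import librosa
import numpy as np


def load_and_featurize(path: str, target_sr: int = 16_000) -> np.ndarray:
    """Load an audio file, resample it, and return log-mel features."""
    # librosa.load resamples on the fly when sr is given; mono=True downmixes channels
    audio, sr = librosa.load(path, sr=target_sr, mono=True)
    # 64-band mel spectrogram, converted from power to decibels (log-mel)
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=64)
    return librosa.power_to_db(mel, ref=np.max)


if __name__ == "__main__":
    feats = load_and_featurize("example.wav")  # hypothetical input file
    print(feats.shape)  # (n_mels, n_frames)
```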
Preferred Qualifications
Experience integrating data pipelines with AI/ML workflows, including active learning and model retraining
Familiarity with audio-specific datasets and metadata management strategies
Knowledge of machine learning principles and how data quality impacts model performance
Experience with distributed training pipelines and large-scale dataset processing
Contributions to open-source projects or published research in the fields of data science or audio processing
Experience with visualization tools (e.g., Tableau, Matplotlib) for quality assurance and exploratory data analysis (a brief sketch follows this list)
Expertise in designing systems to support AI/ML model monitoring and retraining over time
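As an illustration of the visualization item above, a minimal quality-assurance sketch, assuming Matplotlib and NumPy; the clip-duration data and output file name are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_clip_durations(durations_s: list[float]) -> None:
    """Histogram of clip durations, a common first QA/EDA check on an audio dataset."""
    plt.figure(figsize=(6, 3))
    plt.hist(durations_s, bins=50)
    plt.xlabel("clip duration (s)")
    plt.ylabel("count")
    plt.title("Audio clip duration distribution")
    plt.tight_layout()
    plt.savefig("duration_hist.png")  # hypothetical output path


if __name__ == "__main__":
    # Synthetic durations stand in for real dataset metadata
    rng = np.random.default_rng(0)
    plot_clip_durations(rng.lognormal(mean=1.5, sigma=0.5, size=1_000).tolist())
```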
Responsibilities
Design, implement, and maintain scalable, automated data pipelines for the ingestion, preprocessing, and transformation of large-scale audio datasets (see the pipeline sketch after this list)
Ensure pipelines support efficient model training and retraining workflows, enabling continuous improvement of AI/ML models
Collaborate with AI/ML researchers to define data requirements and integrate feedback to improve data pipeline functionality
Develop advanced preprocessing techniques for immersive and multichannel audio formats (e.g., Dolby Atmos, higher-order ambisonics)
Automate data cleaning, normalization, and augmentation processes to prepare datasets for various model architectures, including foundation models and transformers
Integrate external datasets and APIs while ensuring compliance with legal and ethical data usage standards
Monitor and optimize pipeline performance to handle complex and dynamic data structures effectively
Create tools and workflows for annotating, labeling, and curating datasets, including the use of active learning methods
Perform exploratory data analysis to uncover trends, validate dataset quality, and identify data gaps
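As an illustration of the ingestion, preprocessing, and transformation responsibilities above, a minimal pipeline sketch, assuming a recent Airflow 2.x release with the TaskFlow API; the object keys and task bodies are hypothetical placeholders.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False, tags=["audio"])
def audio_dataset_pipeline():
    @task
    def ingest() -> list[str]:
        # In practice: list newly landed audio objects in cloud storage (e.g., an S3 prefix)
        return ["s3://bucket/raw/clip_0001.wav"]  # hypothetical object key

    @task
    def preprocess(paths: list[str]) -> list[str]:
        # In practice: decode, resample, loudness-normalize, and write intermediate files
        return [p.replace("/raw/", "/processed/") for p in paths]

    @task
    def transform(paths: list[str]) -> None:
        # In practice: compute features and shard into a training-ready format
        print(f"transformed {len(paths)} clips")

    transform(preprocess(ingest()))


audio_dataset_pipeline()
```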
Skills
Data Pipelines
ETL
Data Ingestion
Data Preprocessing
Data Transformation
Audio Processing
Dolby Atmos
Ambisonics
Data Cleaning
Data Normalization
Data Augmentation
Machine Learning
Transformers
API Integration
Active Learning
The Walt Disney Company
A leading producer and provider of entertainment and information