MSc or PhD in Computer Science, Electrical Engineering, Applied Math, or a related field with a focus on AI/ML and multi-modal signal processing
5 years of professional experience in applied ML, with a deep focus on audio-centric AI/ML research and deployment
Expertise in building and scaling models using PyTorch, with fluency in training, fine-tuning, and inference for deep neural networks
Demonstrated experience developing generative models such as VAEs, GANs, diffusion models, or neural vocoders (e.g., HiFi-GAN, WaveNet)
Deep understanding of audio-specific ML domains, including source separation, speech enhancement, music processing, and cross-modal tasks
Experience with MLOps tooling (e.g., Weights & Biases, MLflow, DataChain), Docker-based containerization, and scalable infrastructure for distributed training
Fluency in audio signal processing fundamentals and the integration of DSP into ML pipelines
Proven ability to contribute to architectural planning, research strategy, and production deployment in complex, multi-stakeholder environments
Preferred Qualifications
Familiarity with audio/text/video multi-modal frameworks and cross-domain representations
Experience implementing real-time or near-real-time inference pipelines in cloud or edge environments (e.g., AWS, GCP, on-prem GPUs)
Working knowledge of latent diffusion audio models (e.g., Stable Audio, AudioLDM, AudioGen)
Strong knowledge of industry-standard audio datasets and benchmarks (LibriSpeech, VCTK, MUSDB, etc.)
Responsibilities
Lead the research, design, and implementation of state-of-the-art machine learning algorithms for speech processing, voice transfer, source separation, and upmixing in media post-production environments
Drive the architecture and deployment of scalable model training pipelines using PyTorch and distributed computing frameworks
Develop novel generative audio models, including latent diffusion, flow-based models, variational autoencoders, and neural vocoders, optimized for professional soundtrack production
Own end-to-end model lifecycle management: pretraining, fine-tuning, validation, inference optimization, and CI/CD integration
Guide the development of personalized model adaptation workflows to support per-user tuning, cross-project continuity, and flexible deployment
Collaborate with product, platform, and engineering leads to define integration strategies within a secure, cloud-optimized SaaS environment
Stay at the forefront of generative audio, multi-modal modeling, and self-supervised learning, translating emerging research into applied innovation
Contribute to internal tooling and infrastructure that improves iteration speed, reproducibility, and explainability of deployed models
Mentor junior researchers and engineers, and contribute to a culture of rigorous experimentation, collaboration, and continuous improvement
Skills
PyTorch
Machine Learning
Speech Processing
Source Separation
Upmixing
Neural Vocoders
Latent Diffusion Models
Variational Autoencoders
Flow-based Models
Distributed Computing
CI/CD
Generative Audio
Model Training Pipelines
The Walt Disney Company
Leading producers & providers of entertainment and information