Research Scientist - Voice AI Foundations
Deepgram · Full Time
Mid-level (3 to 4 years) or Senior (5 to 8 years)
Candidates must have strong programming skills in Python and PyTorch, along with experience in large-scale dataset management and multimodal data processing pipelines. A solid understanding of computer vision, audio processing, and/or natural language processing techniques is essential. Preferred qualifications include expertise in working with interleaved multimodal data and hands-on experience with Vision Language Models, Audio Language Models, or generative video models.
The Research Scientist/Engineer will identify capability gaps and research solutions that improve multimodal AI systems. Responsibilities include designing datasets and data-mixture ablations, developing evaluation frameworks and benchmarks for multimodal AI capabilities, and building prototypes and demonstrations that showcase new multimodal capabilities.
Develops multimodal AI technologies for creativity
Luma AI develops multimodal artificial intelligence technologies that enhance human creativity and capabilities. Its flagship product, Dream Machine, lets users interact with various types of data, enabling creative professionals, businesses, and developers to explore innovative applications of AI. Unlike many competitors, Luma AI focuses on integrating multiple modes of interaction, which broadens what users can build. The company operates on a subscription model, providing access to its AI tools and services, and aims to lead in AI-driven creativity and productivity.