Senior ML Research Scientist (Speech)
Rad AI - Full Time
- Senior (5 to 8 years)
Deepgram is the leading voice AI platform for developers building speech-to-text (STT), text-to-speech (TTS) and full speech-to-speech (STS) offerings. 200,000+ developers build with Deepgram’s voice-native foundational models – accessed through APIs or as self-managed software – due to our unmatched accuracy, latency and pricing. Customers include software companies building voice products, co-sell partners working with large enterprises, and enterprises solving internal voice AI use cases. The company ended 2024 cash-flow positive with 400+ enterprise customers, 3.3x annual usage growth across the past 4 years, over 50,000 years of audio processed and over 1 trillion words transcribed. There is no organization in the world that understands voice better than Deepgram.
Voice AI stands at the cusp of a paradigm shift. Current approaches to voice interaction are fundamentally limited by the scarcity and diversity of audio data, combined with the prohibitive computational costs of processing high-dimensional audio at scale. These challenges have created a gap between the promise of universal voice interaction and today's reality. While our research scientists pioneer new approaches in Latent Space Models (LSMs) to solve these fundamental challenges, we need exceptional program leadership to transform these breakthroughs into real-world impact. The complexity of our mission—spanning audio compression, generative modeling, and multimodal systems—demands strategic orchestration of multiple research workstreams, careful resource allocation, and relentless focus on outcomes that advance our vision of making voice interaction universally accessible.
You will lead the strategic execution of our Voice AI Foundations research program, working at the intersection of cutting-edge research and practical implementation. Your mission is to accelerate our path to breakthrough voice AI capabilities by:
Speech recognition APIs for audio transcription
Deepgram specializes in artificial intelligence for speech recognition, offering a set of APIs that developers can use to transcribe and understand audio content. Its technology allows clients, ranging from startups to large organizations like NASA, to process millions of audio minutes daily. Deepgram's APIs are designed to be fast, accurate, scalable, and cost-effective, making them suitable for businesses that need to handle large volumes of audio data. The company operates on a pay-per-use model: clients are charged based on the amount of audio they transcribe, so Deepgram's revenue grows alongside client usage. With a focus on the high-growth market of speech recognition, Deepgram is positioned for future success.