Research Scientist – Speech and Audio Understanding (Large Models & Multimodal Systems) at Tencent

Bellevue, Washington, United States

Compensation: Not Specified
Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: Technology, Artificial Intelligence

Requirements

  • Ph.D. in Computer Science, Electrical Engineering, Artificial Intelligence, Linguistics, or a related field; or Master’s degree with several years of relevant experience
  • Solid understanding of speech and audio signal processing, acoustic modeling, language modeling, and large model architectures
  • Proficient in one or more core speech system development pipelines such as ASR, TTS, or speech translation; experience with multilingual, multitask, or end-to-end systems is a plus
  • Proficient in deep learning frameworks such as PyTorch or TensorFlow; experience with large-scale training and distributed systems is a plus
  • Familiar with Transformer-based architectures and their applications in speech and multimodal training/inference
  • In-depth research or practical experience in speech representation pretraining (e.g., HuBERT, Wav2Vec, Whisper) strongly preferred
  • Experience in multimodal alignment and cross-modal modeling (e.g., audio-visual-text) strongly preferred
  • Experience driving state-of-the-art (SOTA) performance on audio understanding tasks with large models strongly preferred

Responsibilities

  • Develop general-purpose, end-to-end large speech models covering multilingual automatic speech recognition (ASR), speech translation, speech synthesis, paralinguistic understanding, and general audio understanding
  • Advance research on speech representation learning and encoder/decoder architectures to build unified acoustic representations for multi-task and multimodal applications
  • Explore representation alignment and fusion mechanisms between audio/speech and other modalities in large multimodal models, enabling joint modeling with image and text
  • Build and maintain high-quality multimodal speech datasets, including automatic annotation and data synthesis technologies

Skills

PyTorch
TensorFlow
ASR
TTS
Speech Translation
HuBERT
Wav2Vec
Whisper
Speech Representation Learning
Multimodal Alignment
Acoustic Modeling
Language Modeling
Deep Learning
End-to-End Systems

Tencent

Internet platform for social, gaming, fintech

About Tencent

Tencent is a technology company that focuses on enhancing the daily lives of internet users and assisting businesses in their digital transformation. It operates in various sectors, including social networking, entertainment, fintech, and cloud computing. Tencent's main products include WeChat, a messaging and mobile payment app with over a billion users, and Tencent Games, which produces popular video games like Honor of Kings and PUBG Mobile. The company generates revenue through online advertising, subscription services, in-app purchases, mobile payments, and cloud services. Unlike many competitors, Tencent has a diverse business model that allows it to serve both individual users and enterprises effectively. The goal of Tencent is to enrich user experiences and support businesses in their digital journeys.

Headquarters: Shenzhen, China
Year Founded: 1998
Total Funding: $31.5M
Company Stage: IPO
Industries: Consumer Software, Enterprise Software, Fintech, AI & Machine Learning, Gaming
Employees: 10,001+

Benefits

Professional Development Budget

Risks

Tencent's addition to the US blacklist may affect its operations and partnerships.
Developing a mobile version of Call of Duty may lead to competitive tensions with Microsoft.
Investment in blockchain exposes Tencent to volatile regulatory environments.

Differentiation

Tencent's WeChat app integrates messaging, social media, and mobile payments seamlessly.
Tencent Games is a global leader with popular titles like Honor of Kings and PUBG Mobile.
Tencent Cloud offers scalable solutions for businesses, enhancing digital transformation efforts.

Upsides

Tencent's investment in blockchain technology could enhance its fintech and cloud services.
The Hunyuan-Large language model advances Tencent's AI capabilities in social networking and gaming.
Collaboration with DYXnet on AI solutions opens new avenues in digital transformation services.
