Machine Learning Researcher
ElevenLabs
Full Time
Mid-level (3 to 4 years), Senior (5 to 8 years)
Candidates should have an ML research background with an interest in hardware co-design; experience with Python and PyTorch and/or JAX; and familiarity with transformer model architectures, inference serving stacks (vLLM, SGLang, etc.), or distributed inference/training environments. Strong candidates may also have an ML systems research or hardware co-design background, published research on inference-time compute or efficient ML, and experience with Rust.
The Machine Learning Research Engineer will:
- Propose and conduct novel research to achieve results on Sohu that are unviable on GPUs.
- Translate the core mathematical operations of the most popular transformer-based models into maximally performant instruction sequences for Sohu.
- Develop deep architectural knowledge to inform best-in-the-world software performance on Sohu hardware, collaborating with hardware architects and designers.
- Co-design and fine-tune emerging model architectures for the highest efficiency on Sohu.
- Guide and contribute to the Sohu software stack, performance characterization tools, and runtime abstractions by implementing frontier models in Python and Rust.
- Propose and implement novel test-time compute algorithms that leverage Sohu's unique capabilities.
- Implement diffusion models on Sohu to achieve latencies that are impossible on GPUs.
- Optimize model instructions and scheduling algorithms, and implement model-specific inference-time acceleration techniques.
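For context on the "core mathematical operations" referenced above: the central computation of transformer models is scaled dot-product attention. A minimal PyTorch sketch (illustrative only, not Sohu-specific) of the operation a researcher in this role would map onto custom instruction sequences:

```python
import math

import torch


def scaled_dot_product_attention(q: torch.Tensor,
                                 k: torch.Tensor,
                                 v: torch.Tensor) -> torch.Tensor:
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    q, k, v have shape (batch, heads, seq_len, head_dim).
    """
    # Similarity scores between every query and key position,
    # scaled by sqrt(head_dim) to keep softmax inputs well-conditioned.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    # Normalize scores into attention weights over key positions.
    weights = torch.softmax(scores, dim=-1)
    # Weighted sum of value vectors; output shape matches q.
    return weights @ v


# Example: batch of 1, 2 heads, sequence length 4, head dimension 8.
q = torch.randn(1, 2, 4, 8)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)
out = scaled_dot_product_attention(q, k, v)
```

The matmul-softmax-matmul structure of this kernel is exactly the kind of fixed computation pattern that a transformer-specialized accelerator can exploit.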
Develops servers for transformer inference
The company specializes in building servers for transformer inference: the transformer architecture is integrated directly into its chips, which is what enables the product's efficiency. Its core technologies are the transformer architecture itself and this chip-level integration.