AI Hardware Architect
d-Matrix · Full Time
Expert & Leadership (9+ years)
Candidates should have hardware/software co-design expertise, strong software engineering skills with systems programming experience, and deep knowledge of transformer model architectures and/or inference serving stacks such as vLLM and SGLang. Strong mathematical skills, particularly in linear algebra, are required, along with the ability to reason about performance bottlenecks and optimization opportunities. Experience working cross-functionally with both software and hardware organizations is a plus.
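The kind of bottleneck reasoning described above can be illustrated with a back-of-the-envelope roofline estimate for a single matrix multiply. This is a generic sketch, not a model of Sohu. The function names are hypothetical. It assumes 2-byte elements and that each operand is moved to or from memory exactly once. The peak-compute and bandwidth figures in the usage lines are made-up placeholders.

```python
# Back-of-the-envelope roofline estimate for C[M, N] = A[M, K] @ B[K, N].
# Assumptions (illustrative only): 2-byte elements (e.g. BF16), each input
# read from memory once and the output written once.

def gemm_flops(m: int, n: int, k: int) -> int:
    # M*N*K multiply-accumulates, counted as 2 FLOPs each.
    return 2 * m * n * k

def gemm_bytes(m: int, n: int, k: int, bytes_per_elem: int = 2) -> int:
    # Minimum memory traffic: read A and B, write C.
    return bytes_per_elem * (m * k + k * n + m * n)

def arithmetic_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    # FLOPs performed per byte moved.
    return gemm_flops(m, n, k) / gemm_bytes(m, n, k, bytes_per_elem)

def attainable_tflops(intensity: float, peak_tflops: float, mem_bw_tb_s: float) -> float:
    # Roofline: capped by peak compute or by bandwidth (TB/s) * intensity (FLOP/byte).
    return min(peak_tflops, mem_bw_tb_s * intensity)

if __name__ == "__main__":
    PEAK_TFLOPS, MEM_BW_TB_S = 1000.0, 3.0  # hypothetical accelerator, not Sohu specs
    for batch in (1, 256):
        # One decode step through a 4096 x 4096 projection: M = batch size.
        ai = arithmetic_intensity(batch, 4096, 4096)
        print(batch, round(ai, 1), round(attainable_tflops(ai, PEAK_TFLOPS, MEM_BW_TB_S), 1))
```

With these placeholder numbers, a batch of 1 yields an intensity of about 1 FLOP/byte and is bandwidth-bound at roughly 3 TFLOP/s. A batch of 256 reaches about 228 FLOP/byte. This is why batching decode requests matters so much for chip utilization.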
The Machine Learning Co-Design Researcher will translate the core mathematical operations of transformer models into optimized operation sequences for Sohu. They will build a deep understanding of Sohu in order to co-design both its hardware instructions and model architecture operations, and will implement high-performance software components for the Model Toolkit. They will also collaborate with hardware engineers to maximize chip utilization and minimize latency. Other responsibilities include implementing efficient batching strategies and execution plans for inference workloads, designing and implementing state-of-the-art inference-time compute scaling methods, modifying and fine-tuning model architectures or inference-time compute algorithms, and contributing to the evolution of the company's system architecture and programming model.
Specific tasks include optimizing operation sequences to make full use of Sohu’s computational resources, researching and implementing efficient memory management techniques, and developing algorithms for continuous batching and batch interleaving. The role also involves researching and implementing model-specific inference-time acceleration algorithms such as speculative decoding and KV cache sharing.
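Continuous batching, one of the tasks listed above, schedules work per decode step rather than per batch. Finished sequences leave immediately, and queued requests take their slots. The sketch below is a toy, model-free simulation of this idea, in the style used by serving stacks such as vLLM. It is not Sohu's scheduler or the company's. `Request`, `continuous_batching`, and `static_batching_steps` are hypothetical names, and every request is assumed to need at least one new token. Real schedulers must also budget KV cache memory and interleave prefill with decode.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: str
    max_new_tokens: int  # assumed >= 1 in this sketch
    generated: int = 0

    @property
    def done(self) -> bool:
        return self.generated >= self.max_new_tokens


def continuous_batching(requests: list[Request], max_batch: int) -> tuple[dict[str, int], int]:
    """Toy iteration-level scheduler; returns (finish step per request, total steps)."""
    waiting = deque(requests)
    running: list[Request] = []
    finish_step: dict[str, int] = {}
    step = 0
    while waiting or running:
        # Refill free slots from the queue before every decode step.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        step += 1
        # One decode step: each running request emits one token (model call elided).
        for req in running:
            req.generated += 1
        # Retire finished requests immediately so their slots free up.
        for req in running:
            if req.done:
                finish_step[req.rid] = step
        running = [req for req in running if not req.done]
    return finish_step, step


def static_batching_steps(lengths: list[int], max_batch: int) -> int:
    """Baseline: each fixed batch runs until its longest request finishes."""
    return sum(max(lengths[i:i + max_batch]) for i in range(0, len(lengths), max_batch))


if __name__ == "__main__":
    lengths = [8, 2, 2, 2, 2, 2]
    reqs = [Request(rid, n) for rid, n in zip("ABCDEF", lengths)]
    finish, total = continuous_batching(reqs, max_batch=2)
    print(finish, total, static_batching_steps(lengths, max_batch=2))
```

Consider six requests needing 8, 2, 2, 2, 2, and 2 new tokens, with two batch slots. The continuous scheduler finishes in 10 decode steps, compared with 12 for static batching, because the short requests no longer wait behind the 8-token one.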
Develops servers for transformer inference
The company builds servers for transformer inference around chips that integrate the transformer architecture directly into the hardware, with the goal of running inference highly efficiently. The product's core technologies are the transformer architecture and this chip-level integration.