AI Engineer & Researcher, Inference
Speechify · Full Time
Junior (1 to 2 years)
Candidates should have:
- Proficiency in Rust and/or C++, along with good familiarity with PyTorch and/or JAX.
- Experience porting applications to non-standard or accelerator hardware platforms.
- Solid systems knowledge, including Linux internals, accelerator architectures (e.g., GPUs, TPUs), and high-speed interconnects (e.g., NVLink, InfiniBand).
- (Strong candidates) Experience building low-latency, high-performance applications on both kernel-level and user-space networking stacks, and a deep understanding of distributed systems concepts, algorithms, and challenges.
The Inference Software Engineer will:
- Support porting state-of-the-art models to the company's architecture.
- Help build programming abstractions and testing capabilities to iterate rapidly on model porting.
- Scale and enhance Sohu's runtime, including multi-node inference, intra-node execution, state management, and robust error handling.
- Optimize routing and communication layers using Sohu's collectives.
- Develop tools for performance profiling and debugging that identify bottlenecks and correctness issues.

They may also analyze performance traces and logs from distributed systems and ML workloads, build applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths, and work with cluster orchestration tools (e.g., Kubernetes, Slurm) and ML platforms (e.g., Ray, Kubeflow).
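To give a flavor of the collectives work mentioned above, here is a toy, single-process sketch in Rust of a ring all-reduce, the collective commonly used to combine tensors across nodes. Everything here is illustrative (node count, buffer shapes, and the in-memory "send"); a real implementation such as Sohu's would move chunks over interconnects like NVLink or InfiniBand rather than copying vectors.

```rust
/// Toy ring all-reduce over `n` simulated nodes. Afterwards every node's
/// buffer holds the element-wise sum of all the original buffers.
fn ring_all_reduce(nodes: &mut [Vec<f32>]) {
    let n = nodes.len();
    let len = nodes[0].len();
    assert!(len % n == 0, "toy version assumes the buffer splits evenly");
    let chunk = len / n;
    let slice = |c: usize| c * chunk..(c + 1) * chunk;

    // Reduce-scatter: after n-1 steps, node i owns the fully summed
    // chunk (i + 1) % n. Each step, node i sends chunk (i - s) mod n
    // to its ring neighbor, which accumulates it.
    for s in 0..n - 1 {
        let sends: Vec<(usize, usize, Vec<f32>)> = (0..n)
            .map(|i| {
                let c = (i + n - s) % n;
                ((i + 1) % n, c, nodes[i][slice(c)].to_vec())
            })
            .collect();
        for (dst, c, data) in sends {
            for (x, y) in nodes[dst][slice(c)].iter_mut().zip(data) {
                *x += y;
            }
        }
    }

    // All-gather: circulate each node's completed chunk around the ring
    // so every node ends up with every reduced chunk.
    for s in 0..n - 1 {
        let sends: Vec<(usize, usize, Vec<f32>)> = (0..n)
            .map(|i| {
                let c = (i + 1 + n - s) % n;
                ((i + 1) % n, c, nodes[i][slice(c)].to_vec())
            })
            .collect();
        for (dst, c, data) in sends {
            nodes[dst][slice(c)].copy_from_slice(&data);
        }
    }
}

fn main() {
    // Three "nodes" with six elements each; every element should end
    // up as the sum 1 + 2 + 3 = 6.
    let mut nodes = vec![vec![1.0; 6], vec![2.0; 6], vec![3.0; 6]];
    ring_all_reduce(&mut nodes);
    for buf in &nodes {
        assert!(buf.iter().all(|&x| (x - 6.0).abs() < 1e-6));
    }
    println!("{:?}", nodes[0]);
}
```

The ring layout is why this collective scales well: each step, every node sends and receives exactly one chunk, so per-node bandwidth stays constant as the node count grows.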
Develops servers for transformer inference
The company specializes in servers for transformer inference: the transformer architecture is integrated directly into its chips, specializing the hardware for these workloads rather than for general-purpose computation. Its core technologies are the transformer architecture and this specialized chip integration.
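For context on the workload such chips specialize in, below is a minimal scaled dot-product attention sketch in plain Rust, the core operation of the transformer architecture. Shapes and values are illustrative only; this is a reference-level loop, not how specialized hardware computes it.

```rust
/// Numerically stable softmax over one row of attention scores.
fn softmax_in_place(row: &mut [f32]) {
    let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0;
    for x in row.iter_mut() {
        *x = (*x - max).exp();
        sum += *x;
    }
    for x in row.iter_mut() {
        *x /= sum;
    }
}

/// Single-head scaled dot-product attention:
/// out = softmax(Q K^T / sqrt(d)) V, with rows as Vec<f32>.
fn attention(q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let scale = 1.0 / (q[0].len() as f32).sqrt();
    q.iter()
        .map(|qi| {
            // scores[j] = (qi . kj) / sqrt(d)
            let mut scores: Vec<f32> = k
                .iter()
                .map(|kj| qi.iter().zip(kj).map(|(a, b)| a * b).sum::<f32>() * scale)
                .collect();
            softmax_in_place(&mut scores);
            // Output row: attention-weighted sum of the value rows.
            let mut out = vec![0.0; v[0].len()];
            for (w, vj) in scores.iter().zip(v) {
                for (o, x) in out.iter_mut().zip(vj) {
                    *o += w * x;
                }
            }
            out
        })
        .collect()
}

fn main() {
    // Zero queries give uniform attention weights, so each output row
    // is the mean of the value rows: ([2,0] + [0,4]) / 2 = [1, 2].
    let q = vec![vec![0.0; 2]; 2];
    let k = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let v = vec![vec![2.0, 0.0], vec![0.0, 4.0]];
    let out = attention(&q, &k, &v);
    assert!((out[0][0] - 1.0).abs() < 1e-6 && (out[0][1] - 2.0).abs() < 1e-6);
    println!("{:?}", out);
}
```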