Senior Tech Lead Manager, Model Efficiency
Cohere - Full Time
- Senior (5 to 8 years)
Candidates must have a Bachelor's degree in Computer Science with at least 7 years of relevant industry experience, or a Master's degree in Computer Science with at least 5 years of relevant industry experience. Proficiency in delivering production-quality code in modern C++ is required, along with experience in modern compiler infrastructures such as LLVM or MLIR. Familiarity with machine learning frameworks and interchange formats such as ONNX, TensorFlow, and PyTorch is essential, as is experience in production compiler development. Preferred qualifications include algorithm design skills and experience with relevant open-source ML projects such as Torch-MLIR, ONNX-MLIR, Caffe, or TVM.
The ML Compiler Engineer will develop the compiler backend, focusing on assigning hardware resources in a spatial architecture to execute low-level instructions. The role involves solving algorithmic compiler problems and learning the intricate details of the underlying hardware and software architectures. The engineer will work on model partitioning, tiling, resource allocation, memory management, and scheduling, and will optimize for latency, bandwidth, and throughput, in collaboration with a team of experienced compiler developers.
AI compute platform for datacenters
d-Matrix focuses on improving the efficiency of AI computing for large datacenter customers. Its main product is the digital in-memory compute (DIMC) engine, which places compute directly within programmable memory. This design helps reduce power consumption and increases data processing speed while preserving accuracy. d-Matrix differentiates itself from competitors with a modular, scalable approach built on low-power chiplets that can be tailored to different applications. The company's goal is to provide high-performance, energy-efficient AI inference solutions to large-scale datacenter operators.