Performance Modeling Engineer
Groq · Full Time
Expert & Leadership (9+ years)
Candidates should hold a degree in Electrical Engineering, Computer Engineering, Computer Science, or a related field, along with over 10 years of relevant experience in AI/ML hardware, software, and infrastructure. A strong background in deep learning and neural networks, particularly in generative AI, is essential, as is academic experience in computer architecture and performance modeling. Proven experience analyzing and tuning inference performance on GPUs is required, along with familiarity with common deep learning software packages such as PyTorch and an understanding of the model compilation and execution stack. Proficiency in C++ and Python is necessary, and excellent communication and presentation skills are expected. Experience in customer engineering and field support for enterprise-level AI and datacenter products is preferred.
The Application Performance Engineer will provide expert guidance to customers on workload performance analysis, debugging, profiling, and optimization on d-Matrix hardware and software. They will develop tools and technical materials that simplify the user experience of workload analysis and optimization. The role involves profiling and analyzing existing and emerging workloads on simulators and state-of-the-art hardware, as well as benchmarking performance for generative AI model inference. Additionally, the engineer will profile and analyze workloads in potential new product areas to guide roadmap decisions.
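To make the benchmarking duties above concrete, here is a minimal, hedged sketch of an inference-latency harness in plain Python. The `dummy_generate` function is a hypothetical stand-in for a model's token-by-token decode loop (a real harness would invoke the model's forward pass, e.g. via PyTorch); the metrics it reports, per-token latency and tokens per second, are the kind commonly tracked when benchmarking generative AI inference.

```python
import time
import statistics

def dummy_generate(prompt_len, gen_len):
    """Hypothetical stand-in for a model's decode loop.

    A real harness would run one forward pass per generated token here;
    we simulate per-token work and record its wall-clock latency.
    """
    per_token_latencies = []
    for _ in range(gen_len):
        t0 = time.perf_counter()
        sum(i * i for i in range(10_000))  # simulated per-token compute
        per_token_latencies.append(time.perf_counter() - t0)
    return per_token_latencies

def summarize(latencies):
    """Reduce raw per-token latencies to common inference metrics."""
    mean_s = statistics.mean(latencies)
    p99_s = sorted(latencies)[int(0.99 * (len(latencies) - 1))]
    return {
        "mean_token_ms": mean_s * 1e3,
        "p99_token_ms": p99_s * 1e3,
        "tokens_per_sec": 1.0 / mean_s,
    }

stats = summarize(dummy_generate(prompt_len=128, gen_len=100))
print(stats)
```

In practice the p99 (tail) latency often matters as much as the mean for datacenter inference, since it governs worst-case user-visible response times.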
AI compute platform for datacenters
d-Matrix focuses on improving the efficiency of AI computing for large datacenter customers. Its main product is the digital in-memory compute (DIMC) engine, which integrates computation directly within programmable memory. This design reduces power consumption and increases data-processing speed while preserving accuracy. d-Matrix differentiates itself from competitors with a modular, scalable approach built on low-power chiplets that can be tailored to different applications. The company's goal is to provide high-performance, energy-efficient AI inference solutions to large-scale datacenter operators.