Application Performance Engineer, Technical Lead / Principal at d-Matrix

Santa Clara, California, United States

Compensation: $196,000 – $300,000
Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: Artificial Intelligence, Machine Learning, Hardware

Requirements

Candidates should possess:

An engineering degree in Electrical Engineering, Computer Engineering, Computer Science, or a related field
10+ years of relevant experience in AI/ML hardware, software, and infrastructure
A strong background in deep learning and neural networks, particularly generative AI
Academic experience in computer architecture and performance modeling
Proven experience analyzing and tuning inference performance on GPUs
Familiarity with common deep learning frameworks such as PyTorch, and an understanding of the model compilation and execution stack
Proficiency in C++ and Python
Excellent communication and presentation skills
Preferred: experience in customer engineering and field support for enterprise-level AI and datacenter products

Responsibilities

The Application Performance Engineer will provide expert guidance and support to customers on workload performance analysis, debugging, profiling, and optimization on d-Matrix hardware and software. They will develop tools and technical materials that simplify the user experience around workload analysis and optimization. The role involves profiling and analyzing existing and emerging workloads on simulators and state-of-the-art hardware, and benchmarking generative AI model inference performance. Additionally, the engineer will profile and analyze workloads in potential new product areas to guide roadmap decisions.
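The inference benchmarking described above typically reduces to measuring per-request latency percentiles and token throughput. A minimal, framework-agnostic sketch of that workflow (the `generate` stub and its timings are hypothetical placeholders, not d-Matrix or PyTorch APIs):

```python
import time
import statistics

def generate(prompt_tokens: int, output_tokens: int) -> None:
    """Stand-in for a real model inference call (hypothetical stub)."""
    time.sleep(0.001 * output_tokens)  # pretend ~1 ms per generated token

def benchmark(runs: int = 20, prompt_tokens: int = 128, output_tokens: int = 64) -> dict:
    """Time repeated inference calls and summarize latency and throughput."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt_tokens, output_tokens)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "p50_s": statistics.median(latencies),
        "p99_s": latencies[int(0.99 * (runs - 1))],  # nearest-rank p99
        "tokens_per_s": output_tokens / statistics.mean(latencies),
    }

stats = benchmark()
print(stats)
```

In practice the stub would be replaced by a real model call, and a production harness would also separate time-to-first-token from inter-token latency and sweep batch sizes, but the measurement loop is the same shape.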

Skills

C++
Python
PyTorch
vLLM
CUDA
OpenAI Triton
GPUs
deep learning
neural networks
generative AI
computer architecture
hardware-software co-design
performance modeling

d-Matrix

AI compute platform for datacenters

About d-Matrix

d-Matrix focuses on improving the efficiency of AI computing for large datacenter customers. Its main product is the digital in-memory compute (DIMC) engine, which embeds compute capability directly within programmable memory. This design reduces power consumption and increases data processing speed while preserving accuracy. d-Matrix differentiates itself from competitors through a modular, scalable approach built on low-power chiplets that can be tailored to different applications. The company's goal is to provide high-performance, energy-efficient AI inference solutions to large-scale datacenter operators.

Headquarters: Santa Clara, California
Year Founded: 2019
Total Funding: $149.8M
Company Stage: Series B
Industries: Enterprise Software, AI & Machine Learning
Employees: 201-500

Benefits

Hybrid Work Options

Risks

Competition from Nvidia, AMD, and Intel may pressure d-Matrix's market share.
Complex AI chip design could lead to delays or increased production costs.
Rapid AI innovation may render d-Matrix's technology obsolete if not updated.

Differentiation

d-Matrix's DIMC engine integrates compute into memory, enhancing efficiency and accuracy.
The company offers scalable AI solutions through modular, low-power chiplets.
d-Matrix focuses on brain-inspired AI compute engines for diverse inferencing workloads.

Upsides

Growing demand for energy-efficient AI solutions boosts d-Matrix's low-power chiplets appeal.
Partnerships with companies like Microsoft could lead to strategic alliances.
Increasing adoption of modular AI hardware in data centers benefits d-Matrix's offerings.
