Inference Runtime Systems - Software Engineer, Staff

Santa Clara, California, United States


Compensation: $142,500 – $230,000

Experience Level: Mid-level (3 to 4 years), Senior (5 to 8 years)

Job Type: Full Time

Visa: Unknown

Industries: AI & Machine Learning, Hardware

Requirements

Candidates should have a Bachelor's degree in computer science, engineering, or a related field with at least 6 years of professional software development experience focused on C++, or a Master's degree (preferred) with at least 3 years of relevant experience. A strong background in architecting and building complex software systems is required, along with experience in distributed systems or high-performance computing (HPC) applications. Familiarity with PyTorch internals or similar machine learning frameworks is a significant advantage. Technical skills should include strong proficiency in modern C++ (C++11 and later) and Python, a solid understanding of software design patterns and best practices, experience with parallel and concurrent programming, and proficiency with CMake and Pytest. Knowledge of GPU programming and acceleration techniques is a plus.

Responsibilities

The Staff Software Engineer will lead the design and implementation of a high-performance inference runtime that leverages d-Matrix's hardware capabilities. They will integrate the inference runtime with PyTorch to enable upstream software capabilities such as inference and fine-tuning. The role involves collaborating closely with cross-functional teams, including hardware engineers, data scientists, and product managers, to define requirements and deliver integrated solutions. Additionally, they will develop and implement optimization techniques that ensure low latency and high throughput in distributed and HPC environments, maintain code quality and performance through rigorous testing and code reviews, and write technical documentation to support development, deployment, and maintenance.

Skills

C++

Python

CMake

Pytest

GPU Programming

Distributed Systems

HPC

PyTorch

Software Design Patterns

Parallel Programming

Concurrent Programming

d-Matrix

AI compute platform for datacenters

About d-Matrix

d-Matrix focuses on improving the efficiency of AI computing for large datacenter customers. Its main product is the digital in-memory compute (DIMC) engine, which integrates computation directly into programmable memory. This design helps reduce power consumption and increase data processing speed while maintaining accuracy. d-Matrix differentiates itself from competitors with a modular, scalable approach built on low-power chiplets that can be tailored to different applications. The company's goal is to provide high-performance, energy-efficient AI inference solutions to large-scale datacenter operators.

Headquarters: Santa Clara, California

Year Founded: 2019

Total Funding: $149.8M

Company Stage: Series B

Industries: Enterprise Software, AI & Machine Learning

Employees: 201-500

Benefits

Hybrid Work Options

Risks

Competition from Nvidia, AMD, and Intel may pressure d-Matrix's market share.

Complex AI chip design could lead to delays or increased production costs.

Rapid AI innovation may render d-Matrix's technology obsolete if not updated.

Differentiation

d-Matrix's DIMC engine integrates compute into memory, enhancing efficiency and accuracy.

The company offers scalable AI solutions through modular, low-power chiplets.

d-Matrix focuses on brain-inspired AI compute engines for diverse inferencing workloads.

Upsides

Growing demand for energy-efficient AI solutions boosts the appeal of d-Matrix's low-power chiplets.

Partnerships with companies like Microsoft could lead to strategic alliances.

Increasing adoption of modular AI hardware in data centers benefits d-Matrix's offerings.
