AI Infrastructure Solution Architect, Principal at d-Matrix

Santa Clara, California, United States

Compensation: $175,000 – $260,000
Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: Artificial Intelligence, Technology, Hardware

Requirements

  • Bachelor's or Master's degree in Computer Science or a related technical field
  • 10+ years of experience in infrastructure solution architecture, systems management, DevOps, or platform engineering roles
  • Experience working with GPUs, custom AI accelerators or heterogeneous compute environments
  • Proven expertise in building, managing, and monitoring full-stack AI infrastructure at scale
  • Strong scripting/automation skills: Python, Bash, Ansible, Terraform, Helm, Docker/Kubernetes
  • Deep understanding of orchestration technologies (Kubernetes, Ray, KServe, etc.), containerization, server clusters, and multi-tenant serving
  • Experience with observability stacks (Prometheus, Grafana, OpenTelemetry, etc.)
  • Familiarity with model serving and orchestration platforms (e.g., Triton Inference Server, Ray Serve, Kubeflow)
  • Strong system debugging and incident response skills
  • Outstanding collaboration and communication skills
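As a hedged illustration of the scripting and system-debugging skills listed above, the sketch below parses the JSON that a command like `kubectl get nodes -o json` emits and flags nodes whose Ready condition is not True. The input shape follows the Kubernetes NodeList API; the helper name and sample node names are invented for this example.

```python
def unready_nodes(nodelist: dict) -> list[str]:
    """Return names of nodes whose Ready condition is not True.

    `nodelist` is a parsed Kubernetes NodeList object, e.g. the output
    of `kubectl get nodes -o json` after json.load().
    """
    bad = []
    for item in nodelist.get("items", []):
        name = item["metadata"]["name"]
        conditions = item.get("status", {}).get("conditions", [])
        # Each condition is a dict with "type" and "status" keys per the
        # Kubernetes API; we only care about the Ready condition here.
        ready = next((c for c in conditions if c["type"] == "Ready"), None)
        if ready is None or ready["status"] != "True":
            bad.append(name)
    return bad

# Minimal NodeList fragment (shape per the Kubernetes API; names are made up)
sample = {
    "items": [
        {"metadata": {"name": "dmx-node-0"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "dmx-node-1"},
         "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    ]
}
print(unready_nodes(sample))  # ['dmx-node-1']
```

In practice a script like this would be one building block in an incident-response or health-check pipeline rather than a standalone tool.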

Responsibilities

  • Develop end-to-end AI infrastructure reference solutions optimized for d-Matrix servers including compute, networking, storage, and orchestration layers, in collaboration with various internal teams
  • Create reference blueprints that integrate smoothly into cloud-native and on-prem environments
  • Develop infrastructure-as-code templates and examples using Ansible, Terraform, and Helm for provisioning d-Matrix-based nodes and clusters
  • Integrate with Kubernetes-based systems to enable model deployment, auto-scaling, and fault-tolerant execution
  • Design and deploy telemetry and monitoring frameworks to support real-time visibility into d-Matrix cluster health, job status, and system performance
  • Integrate with industry-standard observability stacks (e.g., Prometheus, Grafana, OpenTelemetry) for data collection, visualization, and alerting
  • Develop dashboards, health check systems, and metric pipelines that track performance, availability, and operational KPIs
  • Collaborate with performance and software teams to validate infrastructure using real-world workloads and benchmarks
  • Incorporate telemetry hooks for benchmark reporting and feedback-driven tuning
  • Create and publish detailed infrastructure deployment guides, monitoring configuration templates, and operational best practices
  • Collaborate with customers and the OEM/ISV ecosystem, enabling them to adopt and customize reference solutions for their specific datacenter environments and/or software stacks
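As a minimal sketch of the "metric pipelines" responsibility above: rendering cluster-health gauges into the Prometheus text exposition format so an existing Prometheus/Grafana stack can scrape them. The metric and label names here are hypothetical; a production exporter would normally use an official client library such as prometheus_client rather than hand-formatting lines.

```python
def render_prometheus(metrics: dict, labels: dict) -> str:
    """Render a flat dict of gauge values as Prometheus text exposition format."""
    # Labels are sorted for a deterministic, scrape-friendly output.
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

# Hypothetical health gauges for one accelerator node
text = render_prometheus(
    {"dmatrix_accel_utilization": 0.82, "dmatrix_jobs_running": 3},
    {"node": "dmx-node-0", "cluster": "lab"},
)
print(text)
```

Serving this string from an HTTP endpoint (e.g. `/metrics`) is all a Prometheus scrape target needs; dashboards and alerting rules are then layered on top in Grafana and Alertmanager.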

Skills

Key technologies and capabilities for this role

AI Infrastructure, Solution Architecture, Kubernetes, Terraform, Ansible, Helm, Infrastructure as Code, Telemetry, Monitoring, Gen AI Inference, Networking, Storage, Orchestration, Auto-scaling

Questions & Answers

Common questions about this position

What is the salary range for this position?

The salary range is $175K – $260K.

Is this role remote or hybrid, and what are the location requirements?

This is a hybrid role requiring onsite work at the Santa Clara, CA headquarters 3-5 days per week.

What key skills and experience are required for this role?

Candidates need a Bachelor's or Master's degree in Computer Science or a related field, plus 10+ years of experience in infrastructure solution architecture and systems management. Key technical skills include Ansible, Terraform, Helm, Kubernetes, and observability tools like Prometheus, Grafana, and OpenTelemetry.

What is the company culture like at d-Matrix?

The culture emphasizes respect, collaboration, humility, direct communication, inclusivity, and diverse perspectives for better solutions. They seek passionate individuals driven by execution to tackle AI challenges.

What makes a strong candidate for this AI Infrastructure Solution Architect role?

Strong candidates have 10+ years in infrastructure solution architecture, expertise in tools like Ansible, Terraform, Helm, Kubernetes, and observability stacks, plus a passion for AI challenges and collaborative execution.

d-Matrix

AI compute platform for datacenters

About d-Matrix

d-Matrix focuses on improving the efficiency of AI computing for large datacenter customers. Its main product is the digital in-memory compute (DIMC) engine, which combines computing capabilities directly within programmable memory. This design helps reduce power consumption and enhances data processing speed while ensuring accuracy. d-Matrix differentiates itself from competitors by offering a modular and scalable approach, utilizing low-power chiplets that can be tailored for different applications. The company's goal is to provide high-performance, energy-efficient AI inference solutions to large-scale datacenter operators.

Headquarters: Santa Clara, California
Year Founded: 2019
Total Funding: $149.8M
Company Stage: Series B
Industries: Enterprise Software, AI & Machine Learning
Employees: 201-500

Benefits

Hybrid Work Options

Risks

Competition from Nvidia, AMD, and Intel may pressure d-Matrix's market share.
Complex AI chip design could lead to delays or increased production costs.
Rapid AI innovation may render d-Matrix's technology obsolete if not updated.

Differentiation

d-Matrix's DIMC engine integrates compute into memory, enhancing efficiency and accuracy.
The company offers scalable AI solutions through modular, low-power chiplets.
d-Matrix focuses on brain-inspired AI compute engines for diverse inferencing workloads.

Upsides

Growing demand for energy-efficient AI solutions boosts d-Matrix's low-power chiplets appeal.
Partnerships with companies like Microsoft could lead to strategic alliances.
Increasing adoption of modular AI hardware in data centers benefits d-Matrix's offerings.
