AI Infrastructure Solution Architect, Principal at d-Matrix

Santa Clara, California, United States

Compensation: $175,000 – $260,000
Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: Artificial Intelligence, Technology, Hardware

Requirements

  • Bachelor's or Master's degree in Computer Science or a related technical field
  • 10+ years of experience in infrastructure solution architecture, systems management, DevOps, or platform engineering roles
  • Experience working with GPUs, custom AI accelerators or heterogeneous compute environments
  • Proven expertise in building, managing, and monitoring full-stack AI infrastructure at scale
  • Strong scripting/automation skills: Python, Bash, Ansible, Terraform, Helm, Docker/Kubernetes
  • Deep understanding of orchestration technologies (Kubernetes, Ray, KServe, etc.), containerization, server clusters, and multi-tenant serving
  • Experience with observability stacks (Prometheus, Grafana, OpenTelemetry, etc.)
  • Familiarity with model serving and orchestration platforms (e.g., Triton Inference Server, Ray Serve, Kubeflow)
  • Strong system debugging and incident response skills
  • Outstanding collaboration and communication skills
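As a hedged illustration of the scripting and system-debugging skills listed above, the sketch below parses the JSON that a command like `kubectl get nodes -o json` emits and flags nodes whose Ready condition is not True. The input shape follows the Kubernetes NodeList API; the helper name and sample node names are invented for this example.

```python
def unready_nodes(nodelist: dict) -> list[str]:
    """Return names of nodes whose Ready condition is not True.

    `nodelist` is a parsed Kubernetes NodeList object, e.g. the output
    of `kubectl get nodes -o json` after json.load().
    """
    bad = []
    for item in nodelist.get("items", []):
        name = item["metadata"]["name"]
        conditions = item.get("status", {}).get("conditions", [])
        # Each condition is a dict with "type" and "status" keys per the
        # Kubernetes API; we only care about the Ready condition here.
        ready = next((c for c in conditions if c["type"] == "Ready"), None)
        if ready is None or ready["status"] != "True":
            bad.append(name)
    return bad

# Minimal NodeList fragment (shape per the Kubernetes API; names are made up)
sample = {
    "items": [
        {"metadata": {"name": "dmx-node-0"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "dmx-node-1"},
         "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    ]
}
print(unready_nodes(sample))  # ['dmx-node-1']
```

In practice a script like this would be one building block in an incident-response or health-check pipeline rather than a standalone tool.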

Responsibilities

  • Develop end-to-end AI infrastructure reference solutions optimized for d-Matrix servers including compute, networking, storage, and orchestration layers, in collaboration with various internal teams
  • Create reference blueprints that integrate smoothly into cloud-native and on-prem environments
  • Develop infrastructure-as-code templates and examples using Ansible, Terraform, and Helm for provisioning d-Matrix-based nodes and clusters
  • Integrate with Kubernetes-based systems to enable model deployment, auto-scaling, and fault-tolerant execution
  • Design and deploy telemetry and monitoring frameworks to support real-time visibility into d-Matrix cluster health, job status, and system performance
  • Integrate with industry-standard observability stacks (e.g., Prometheus, Grafana, OpenTelemetry) for data collection, visualization, and alerting
  • Develop dashboards, health check systems, and metric pipelines that track performance, availability, and operational KPIs
  • Collaborate with performance and software teams to validate infrastructure using real-world workloads and benchmarks
  • Incorporate telemetry hooks for benchmark reporting and feedback-driven tuning
  • Create and publish detailed infrastructure deployment guides, monitoring configuration templates, and operational best practices
  • Collaborate with customers and the OEM/ISV ecosystem, enabling them to adopt and customize reference solutions for their specific datacenter environments and/or software stacks
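As a minimal sketch of the "metric pipelines" responsibility above: rendering cluster-health gauges into the Prometheus text exposition format so an existing Prometheus/Grafana stack can scrape them. The metric and label names here are hypothetical; a production exporter would normally use an official client library such as prometheus_client rather than hand-formatting lines.

```python
def render_prometheus(metrics: dict, labels: dict) -> str:
    """Render a flat dict of gauge values as Prometheus text exposition format."""
    # Labels are sorted for a deterministic, scrape-friendly output.
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

# Hypothetical health gauges for one accelerator node
text = render_prometheus(
    {"dmatrix_accel_utilization": 0.82, "dmatrix_jobs_running": 3},
    {"node": "dmx-node-0", "cluster": "lab"},
)
print(text)
```

Serving this string from an HTTP endpoint (e.g. `/metrics`) is all a Prometheus scrape target needs; dashboards and alerting rules are then layered on top in Grafana and Alertmanager.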

Skills

Key technologies and capabilities for this role

AI Infrastructure, Solution Architecture, Kubernetes, Terraform, Ansible, Helm, Infrastructure as Code, Telemetry, Monitoring, Gen AI Inference, Networking, Storage, Orchestration, Auto-scaling

Questions & Answers

Common questions about this position

What is the salary range for this position?

The salary range is $175K – $260K.

Is this role remote or hybrid, and what are the location requirements?

This is a hybrid role requiring onsite work at the Santa Clara, CA headquarters 3-5 days per week.

What key skills and experience are required for this role?

Candidates need a Bachelor's or Master's degree in Computer Science or a related field, plus 10+ years of experience in infrastructure solution architecture and systems management. Key technical skills include Ansible, Terraform, Helm, Kubernetes, and observability tools like Prometheus, Grafana, and OpenTelemetry.

What is the company culture like at d-Matrix?

The culture emphasizes respect, collaboration, humility, direct communication, inclusivity, and diverse perspectives for better solutions. They seek passionate individuals driven by execution to tackle AI challenges.

What makes a strong candidate for this AI Infrastructure Solution Architect role?

Strong candidates have 10+ years in infrastructure solution architecture, expertise in tools like Ansible, Terraform, Helm, Kubernetes, and observability stacks, plus a passion for AI challenges and collaborative execution.

d-Matrix

AI compute platform for datacenters

About d-Matrix

d-Matrix focuses on improving the efficiency of AI computing for large datacenter customers. Its main product is the digital in-memory compute (DIMC) engine, which combines computing capabilities directly within programmable memory. This design helps reduce power consumption and enhances data processing speed while ensuring accuracy. d-Matrix differentiates itself from competitors by offering a modular and scalable approach, utilizing low-power chiplets that can be tailored for different applications. The company's goal is to provide high-performance, energy-efficient AI inference solutions to large-scale datacenter operators.

Headquarters: Santa Clara, California
Year Founded: 2019
Total Funding: $149.8M
Company Stage: Series B
Industries: Enterprise Software, AI & Machine Learning
Employees: 201-500

Benefits

Hybrid Work Options

Risks

Competition from Nvidia, AMD, and Intel may pressure d-Matrix's market share.
Complex AI chip design could lead to delays or increased production costs.
Rapid AI innovation may render d-Matrix's technology obsolete if not updated.

Differentiation

d-Matrix's DIMC engine integrates compute into memory, enhancing efficiency and accuracy.
The company offers scalable AI solutions through modular, low-power chiplets.
d-Matrix focuses on brain-inspired AI compute engines for diverse inferencing workloads.

Upsides

Growing demand for energy-efficient AI solutions boosts d-Matrix's low-power chiplets appeal.
Partnerships with companies like Microsoft could lead to strategic alliances.
Increasing adoption of modular AI hardware in data centers benefits d-Matrix's offerings.
