AI Engineer & Researcher, Inference
Speechify · Full Time
Junior (1 to 2 years)
Candidates should have:
- Proficiency in Rust and/or C++, along with good familiarity with PyTorch and/or JAX.
- Experience porting applications to non-standard or accelerator hardware platforms.
- Solid systems knowledge, including Linux internals, accelerator architectures (e.g., GPUs, TPUs), and high-speed interconnects (e.g., NVLink, InfiniBand).
- (Strong candidates) Experience building low-latency, high-performance applications on both kernel-level and user-space networking stacks, and a deep understanding of distributed systems concepts, algorithms, and challenges.
The Inference Software Engineer will:
- Support porting state-of-the-art models to the company's architecture.
- Help build programming abstractions and testing capabilities to iterate rapidly on model porting.
- Scale and enhance Sohu's runtime, including multi-node inference, intra-node execution, state management, and robust error handling.
- Optimize routing and communication layers using Sohu's collectives.
- Develop tools for performance profiling and debugging that identify bottlenecks and correctness issues.

They may also analyze performance traces and logs from distributed systems and ML workloads, build applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths, and work with cluster orchestration tools (e.g., Kubernetes, Slurm) and ML platforms (e.g., Ray, Kubeflow).
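To give a flavor of the collectives work mentioned above, here is a toy, single-process sketch in Rust of a ring all-reduce, the collective commonly used to combine tensors across nodes. Everything here is illustrative (node count, buffer shapes, and the in-memory "send"); a real implementation such as Sohu's would move chunks over interconnects like NVLink or InfiniBand rather than copying vectors.

```rust
/// Toy ring all-reduce over `n` simulated nodes. Afterwards every node's
/// buffer holds the element-wise sum of all the original buffers.
fn ring_all_reduce(nodes: &mut [Vec<f32>]) {
    let n = nodes.len();
    let len = nodes[0].len();
    assert!(len % n == 0, "toy version assumes the buffer splits evenly");
    let chunk = len / n;
    let slice = |c: usize| c * chunk..(c + 1) * chunk;

    // Reduce-scatter: after n-1 steps, node i owns the fully summed
    // chunk (i + 1) % n. Each step, node i sends chunk (i - s) mod n
    // to its ring neighbor, which accumulates it.
    for s in 0..n - 1 {
        let sends: Vec<(usize, usize, Vec<f32>)> = (0..n)
            .map(|i| {
                let c = (i + n - s) % n;
                ((i + 1) % n, c, nodes[i][slice(c)].to_vec())
            })
            .collect();
        for (dst, c, data) in sends {
            for (x, y) in nodes[dst][slice(c)].iter_mut().zip(data) {
                *x += y;
            }
        }
    }

    // All-gather: circulate each node's completed chunk around the ring
    // so every node ends up with every reduced chunk.
    for s in 0..n - 1 {
        let sends: Vec<(usize, usize, Vec<f32>)> = (0..n)
            .map(|i| {
                let c = (i + 1 + n - s) % n;
                ((i + 1) % n, c, nodes[i][slice(c)].to_vec())
            })
            .collect();
        for (dst, c, data) in sends {
            nodes[dst][slice(c)].copy_from_slice(&data);
        }
    }
}

fn main() {
    // Three "nodes" with six elements each; every element should end
    // up as the sum 1 + 2 + 3 = 6.
    let mut nodes = vec![vec![1.0; 6], vec![2.0; 6], vec![3.0; 6]];
    ring_all_reduce(&mut nodes);
    for buf in &nodes {
        assert!(buf.iter().all(|&x| (x - 6.0).abs() < 1e-6));
    }
    println!("{:?}", nodes[0]);
}
```

The ring layout is why this collective scales well: each step, every node sends and receives exactly one chunk, so per-node bandwidth stays constant as the node count grows.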
Develops servers for transformer inference
The company specializes in servers for transformer inference: the transformer architecture is integrated directly into its chips, specializing the hardware for these workloads rather than for general-purpose computation. Its core technologies are the transformer architecture and this specialized chip integration.
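For context on the workload such chips specialize in, below is a minimal scaled dot-product attention sketch in plain Rust, the core operation of the transformer architecture. Shapes and values are illustrative only; this is a reference-level loop, not how specialized hardware computes it.

```rust
/// Numerically stable softmax over one row of attention scores.
fn softmax_in_place(row: &mut [f32]) {
    let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0;
    for x in row.iter_mut() {
        *x = (*x - max).exp();
        sum += *x;
    }
    for x in row.iter_mut() {
        *x /= sum;
    }
}

/// Single-head scaled dot-product attention:
/// out = softmax(Q K^T / sqrt(d)) V, with rows as Vec<f32>.
fn attention(q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let scale = 1.0 / (q[0].len() as f32).sqrt();
    q.iter()
        .map(|qi| {
            // scores[j] = (qi . kj) / sqrt(d)
            let mut scores: Vec<f32> = k
                .iter()
                .map(|kj| qi.iter().zip(kj).map(|(a, b)| a * b).sum::<f32>() * scale)
                .collect();
            softmax_in_place(&mut scores);
            // Output row: attention-weighted sum of the value rows.
            let mut out = vec![0.0; v[0].len()];
            for (w, vj) in scores.iter().zip(v) {
                for (o, x) in out.iter_mut().zip(vj) {
                    *o += w * x;
                }
            }
            out
        })
        .collect()
}

fn main() {
    // Zero queries give uniform attention weights, so each output row
    // is the mean of the value rows: ([2,0] + [0,4]) / 2 = [1, 2].
    let q = vec![vec![0.0; 2]; 2];
    let k = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let v = vec![vec![2.0, 0.0], vec![0.0, 4.0]];
    let out = attention(&q, &k, &v);
    assert!((out[0][0] - 1.0).abs() < 1e-6 && (out[0][1] - 2.0).abs() < 1e-6);
    println!("{:?}", out);
}
```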