Inference Software Engineer at Etched.ai

Cupertino, California, United States

Compensation: Not Specified
Experience Level: Junior (1 to 2 years)
Job Type: Full Time
Visa: Unknown
Industries: Semiconductor, Artificial Intelligence

Requirements

Candidates should have experience with C++ and Python, familiarity with transformer model architectures and inference serving stacks (vLLM, SGLang, etc.), and experience working in distributed inference/training environments. Strong candidates may also possess experience with Rust and familiarity with GPU kernels, the CUDA compilation stack, or other hardware accelerators, as well as an understanding of distributed systems, networking, and parallel programming.

Responsibilities

The Inference Software Engineer will contribute to the architecture and design of the Sohu host software stack, implement high-performance, modular code across the Etched software stack using Rust, C++, and Python, interface with firmware and drivers teams, and work with AI model researchers and product-facing teams building the Etched serving front-end. They will also build scheduling logic for continuous batching and real-time inference, implement inference-time acceleration techniques, and implement distributed networking primitives for efficient multi-server inference.

Skills

C++
Python
Transformer model architectures
Inference serving stacks
Rust
Distributed systems
Networking
Parallel programming

Etched.ai

Develops servers for transformer inference

About Etched.ai

Etched.ai develops servers for transformer inference. Its chips integrate the transformer architecture directly into the hardware to make inference more efficient.

Headquarters: Cupertino, CA, USA
Year Founded: 2022
Total Funding: $5.4M
Company Stage: Seed
Industries: Hardware
Employees: 11-50
