Machine Learning Co-Design Researcher at Etched.ai

San Jose, California, United States


Compensation: Not Specified

Experience Level: Junior (1 to 2 years)

Job Type: Full Time

Visa: Unknown

Industries: Artificial Intelligence, Semiconductor, Hardware

Requirements

Candidates should possess co-design expertise across both software and hardware domains, strong software engineering skills with systems programming experience, and a deep knowledge of transformer model architectures and/or inference serving stacks such as vLLM and SGLang. Strong mathematical skills, particularly in linear algebra, are required, along with the ability to reason about performance bottlenecks and optimization opportunities. Experience working cross-functionally in diverse software and hardware organizations is also beneficial.

Responsibilities

The Machine Learning Co-Design Researcher will translate core mathematical operations from transformer models into optimized operation sequences for Sohu, develop a deep understanding of Sohu and use it to co-design both hardware instructions and model architecture operations, and implement high-performance software components for the Model Toolkit. The role involves collaborating with hardware engineers to maximize chip utilization and minimize latency, implementing efficient batching strategies and execution plans for inference workloads, designing and implementing cutting-edge inference-time compute scaling methods, altering and fine-tuning model architectures or inference-time compute algorithms, and contributing to the evolution of the company's system architecture and programming model. Specific tasks include optimizing operation sequences to make full use of Sohu's computational resources, researching and implementing efficient memory management techniques, developing algorithms for continuous batching and batch interleaving, and researching and implementing model-specific inference-time acceleration algorithms such as speculative decoding and KV cache sharing.

Skills

Transformer model architectures

Inference serving stacks

Systems programming

Linear algebra

Performance optimization

Software engineering

Mathematical skills

Hardware design

Batching strategies

Memory management

Etched.ai

Develops servers for transformer inference

About Etched.ai

The company develops servers for transformer inference. It integrates the transformer architecture into its chips to make inference more efficient. The main technologies used in the product are the transformer architecture and advanced chip integration.

Headquarters: Cupertino, CA, USA

Year Founded: 2022

Total Funding: $5.4M

Company Stage: Seed

Industries: Hardware

Employees: 11-50