Machine Learning Research Engineer at Etched.ai

Cupertino, California, United States

Compensation: Not Specified
Experience Level: Mid-level (3 to 4 years)
Job Type: Full Time
Visa: Unknown
Industries: Artificial Intelligence, Semiconductor, Software

Requirements

Candidates should have an ML research background with an interest in hardware co-design; experience with Python, PyTorch, and/or JAX; and familiarity with transformer model architectures, inference serving stacks (vLLM, SGLang, etc.), and/or distributed inference/training environments. Strong candidates may also have an ML systems research and hardware co-design background, published inference-time compute and/or efficient ML research, and experience with Rust.

Responsibilities

The Machine Learning Research Engineer will:

Propose and conduct novel research to achieve results on Sohu that are not viable on GPUs
Translate core mathematical operations from the most popular Transformer-based models into maximally performant instruction sequences for Sohu
Develop deep architectural knowledge that informs best-in-the-world software performance on Sohu hardware
Collaborate with hardware architects and designers
Co-design and fine-tune emerging model architectures for highest efficiency on Sohu
Guide and contribute to the Sohu software stack, performance characterization tools, and runtime abstractions by implementing frontier models in Python and Rust
Propose and implement novel test-time compute algorithms that leverage Sohu's unique capabilities
Implement diffusion models on Sohu to achieve latencies not possible on GPUs
Optimize model instructions and scheduling algorithms
Implement model-specific inference-time acceleration techniques
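For context on the kind of core Transformer operation referenced above, the sketch below shows a reference scaled dot-product attention in PyTorch. It is an illustrative example only, not part of the posting; the function name, tensor shapes, and any mapping of this operation onto Sohu instruction sequences are assumptions.

# Illustrative sketch only: a reference scaled dot-product attention in PyTorch.
# Shapes and names are assumptions for illustration, not from the posting.
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        # Positions where mask == 0 are excluded from attention.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

if __name__ == "__main__":
    batch, heads, seq_len, head_dim = 2, 8, 128, 64
    q = torch.randn(batch, heads, seq_len, head_dim)
    k = torch.randn(batch, heads, seq_len, head_dim)
    v = torch.randn(batch, heads, seq_len, head_dim)
    out = scaled_dot_product_attention(q, k, v)
    print(out.shape)  # torch.Size([2, 8, 128, 64])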

Skills

Machine Learning
Research
Python
Rust
Transformer Models
Model Architecture
Performance Optimization
Diffusion Models
Inference-time Acceleration
Algorithm Design

Etched.ai

Develops servers for transformer inference

About Etched.ai

The company develops servers for transformer inference, integrating the transformer architecture directly into its chips to achieve highly efficient performance. Its core technologies are the transformer architecture and advanced chip integration.

Headquarters: Cupertino, CA, USA
Year Founded: 2022
Total Funding: $5.4M
Company Stage: Seed
Industries: Hardware
Employees: 11-50
