Machine Learning Researcher
ElevenLabs
Full Time
Mid-level (3 to 4 years), Senior (5 to 8 years)
Candidates should have an ML research background with an interest in hardware co-design; experience with Python and PyTorch and/or JAX; and familiarity with transformer model architectures, inference serving stacks (vLLM, SGLang, etc.), or distributed inference/training environments. Strong candidates may also have an ML systems research or hardware co-design background, published research on inference-time compute or efficient ML, and experience with Rust.
The Machine Learning Research Engineer will:
- Propose and conduct novel research to achieve results on Sohu that are unviable on GPUs.
- Translate the core mathematical operations of the most popular transformer-based models into maximally performant instruction sequences for Sohu.
- Develop deep architectural knowledge to inform best-in-the-world software performance on Sohu hardware, collaborating with hardware architects and designers.
- Co-design and fine-tune emerging model architectures for the highest efficiency on Sohu.
- Guide and contribute to the Sohu software stack, performance characterization tools, and runtime abstractions by implementing frontier models in Python and Rust.
- Propose and implement novel test-time compute algorithms that leverage Sohu's unique capabilities.
- Implement diffusion models on Sohu to achieve latencies that are impossible on GPUs.
- Optimize model instructions and scheduling algorithms, and implement model-specific inference-time acceleration techniques.
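For context on the "core mathematical operations" referenced above: the central computation of transformer models is scaled dot-product attention. A minimal PyTorch sketch (illustrative only, not Sohu-specific) of the operation a researcher in this role would map onto custom instruction sequences:

```python
import math

import torch


def scaled_dot_product_attention(q: torch.Tensor,
                                 k: torch.Tensor,
                                 v: torch.Tensor) -> torch.Tensor:
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    q, k, v have shape (batch, heads, seq_len, head_dim).
    """
    # Similarity scores between every query and key position,
    # scaled by sqrt(head_dim) to keep softmax inputs well-conditioned.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    # Normalize scores into attention weights over key positions.
    weights = torch.softmax(scores, dim=-1)
    # Weighted sum of value vectors; output shape matches q.
    return weights @ v


# Example: batch of 1, 2 heads, sequence length 4, head dimension 8.
q = torch.randn(1, 2, 4, 8)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)
out = scaled_dot_product_attention(q, k, v)
```

The matmul-softmax-matmul structure of this kernel is exactly the kind of fixed computation pattern that a transformer-specialized accelerator can exploit.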
Develops servers for transformer inference
The company specializes in building servers for transformer inference: the transformer architecture is integrated directly into its chips, which is what enables the product's efficiency. Its core technologies are the transformer architecture itself and this chip-level integration.