Senior Software Engineer - Distributed Inference
NVIDIA | Full Time
Senior (5 to 8 years)
Candidates should have experience running ML inference at large scale with high throughput and low latency, and a solid understanding of distributed systems and the challenges of ML inference. Familiarity with deep learning and frameworks such as PyTorch is also required. Bonus qualifications include ML systems knowledge, experience with Ray, close work with the community on LLM engines such as vLLM or TensorRT-LLM, and contributions to deep learning frameworks or compilers.
The Distributed LLM Inference Engineer will iterate quickly with product teams to ship end-to-end solutions for batch and online inference at high scale. The role spans the stack: integrating Ray Data with LLM engines, delivering optimizations for cost-effective large-scale ML inference, contributing improvements to open-source projects such as vLLM, and implementing and extending best practices from the latest state of the art in the open-source and research communities.
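The main throughput lever in batch LLM inference is grouping prompts into micro-batches so each engine call amortizes its overhead across many sequences. A minimal stdlib-only sketch of that batching pattern, where `run_model` is a hypothetical stand-in for a real engine call (e.g. a vLLM `generate()`), not part of any library named in this posting:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def run_model(batch):
    # Hypothetical stand-in for an LLM engine call: here we just
    # "complete" each prompt by appending a marker.
    return [prompt + " -> <completion>" for prompt in batch]

def batched(items, size):
    # Yield fixed-size micro-batches from an iterable of prompts.
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

def batch_inference(prompts, batch_size=4, workers=2):
    # Send micro-batches to the engine, with a small worker pool
    # to keep multiple engine replicas busy; map() preserves order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(run_model, batched(prompts, batch_size))
    return [out for batch in results for out in batch]

prompts = [f"prompt-{i}" for i in range(10)]
print(batch_inference(prompts))
```

In a real deployment the worker pool and batching would be handled by Ray Data (e.g. `map_batches` over a dataset of prompts), with the engine running on GPU workers; this sketch only illustrates the batching structure.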
Platform for scaling AI workloads
Anyscale provides a platform designed to scale and productionize artificial intelligence (AI) and machine learning (ML) workloads. Its core technology, Ray, is an open-source framework that helps developers manage and scale AI applications across various fields, including generative AI, large language models (LLMs), and computer vision. Ray allows companies to improve the performance, fault tolerance, and scalability of their AI systems, with some users reporting over 90% improvements in efficiency, latency, and cost-effectiveness. Anyscale primarily serves clients in the AI and ML sectors, including major companies like OpenAI and Ant Group, who rely on Ray for training large models. The company operates on a software-as-a-service (SaaS) model, charging clients a subscription fee for access to its managed Ray platform. Anyscale's goal is to empower organizations to effectively scale their AI workloads and optimize their operations.