Software Engineer, Infrastructure at Anyscale

San Francisco, California, United States

Compensation: Not Specified
Experience Level: Mid-level (3 to 4 years), Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Technology, AI/ML, Cloud Computing

Requirements

Candidates should possess a Bachelor’s degree in Computer Science or a related field, and have at least 3 years of experience in software engineering, with a focus on control plane and data plane development. Strong expertise in Kubernetes, container orchestration, and cloud-native infrastructure is required, along with experience in designing, implementing, and optimizing scalable services.

Responsibilities

The Software Engineer will design, build, and scale services that orchestrate Ray clusters across cloud and on-prem environments, supporting both VM-based and Kubernetes-based deployments. They will also optimize control plane components for large-scale, distributed AI/ML workloads, build intelligent scheduling and resource management systems, and develop features that improve infrastructure reliability and performance. Other duties include supporting accelerator integration, managing container images, participating in code reviews, and providing on-call support for troubleshooting infrastructure issues.

Skills

Kubernetes
Ray
container orchestration
distributed systems
GPUs
TPUs
resource management
scheduling
cloud-native
observability

Anyscale

Platform for scaling AI workloads

About Anyscale

Anyscale provides a platform designed to scale and productionize artificial intelligence (AI) and machine learning (ML) workloads. The platform is built on Ray, an open-source framework that helps developers manage and scale AI applications across various fields, including Generative AI, Large Language Models (LLMs), and computer vision. Ray allows companies to enhance the performance, fault tolerance, and scalability of their AI systems, with some users reporting over 90% improvements in efficiency, latency, and cost-effectiveness. Anyscale primarily serves clients in the AI and ML sectors, including major companies like OpenAI and Ant Group, which rely on Ray for training large models. The company operates on a software-as-a-service (SaaS) model, charging clients a subscription fee for access to its managed Ray platform. Anyscale's goal is to empower organizations to effectively scale their AI workloads and optimize their operations.

Headquarters: San Francisco, California
Year Founded: 2019
Total Funding: $252.5M
Company Stage: Series C
Industries: Enterprise Software, AI & Machine Learning
Employees: 201-500

Benefits

Medical, Dental, and Vision insurance
401(k) retirement savings
Flexible time off
FSA and Commuter benefits
Parental and family leave
Office & phone plan reimbursement

Risks

The ShadowRay vulnerability in the Ray framework poses a significant security risk, with no patch available.
OctoML's OctoAI service increases competition in AI infrastructure market.
Dependency on Nvidia's technology could be risky if Nvidia faces issues.

Differentiation

Anyscale's Ray framework scales AI applications seamlessly from a laptop to the cloud.
Ray is widely used in Generative AI, LLMs, and computer vision.
Anyscale's SaaS model provides recurring revenue through subscription fees for the Ray platform.

Upsides

Anyscale's $100M Series C funding indicates strong investor confidence and growth potential.
Partnership with Nvidia enhances performance and cost-efficiency for AI deployments.
Anyscale Endpoints offers 10X cost-efficiency for popular open-source LLMs.
