Software Engineer, Infrastructure at Anyscale

Bengaluru, Karnataka, India

Compensation: Not Specified
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: AI, Machine Learning, Technology, Cloud Computing

Requirements

  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
  • 3+ years of experience writing high-quality production code
  • Hands-on experience in building and maintaining highly available, scalable, and performant distributed systems
  • Expertise in cloud-native technologies (AWS, Azure, GCP) and Kubernetes-based deployments
  • Deep understanding of networking, security, and authentication mechanisms in cloud environments
  • Familiarity with observability stacks (Prometheus, Grafana, etc.)
  • Proficiency in Go and Python
  • Knowledge of low-level operating system foundations (Linux kernel, file systems, containers)

Responsibilities

  • Design, build, and scale services that orchestrate Ray clusters across cloud and on-prem environments, supporting both VM-based and Kubernetes-based deployments
  • Optimize control plane components for large-scale, distributed AI/ML workloads
  • Build intelligent scheduling and resource management systems for heterogeneous compute clusters
  • Develop features to enhance the reliability, performance, scalability, and observability of Anyscale-managed Ray workloads
  • Support and optimize accelerator integration (e.g., GPUs, TPUs)
  • Handle container image management and dependency resolution for distributed workloads
  • Participate in code reviews and in design and architecture discussions
  • Provide on-call support, working closely with customer and field teams to troubleshoot infrastructure issues
  • Collaborate with leading distributed systems and machine learning experts to push the boundaries of AI infrastructure

Skills

Kubernetes
Ray
Container Orchestration
Cloud-Native Infrastructure
Distributed Systems
Control Plane
Data Plane
Scalable Services

Anyscale

Platform for scaling AI workloads

About Anyscale

Anyscale provides a platform designed to scale and productionize artificial intelligence (AI) and machine learning (ML) workloads. Its main product, Ray, is an open-source framework that helps developers manage and scale AI applications across various fields, including Generative AI, Large Language Models (LLMs), and computer vision. Ray allows companies to enhance the performance, fault tolerance, and scalability of their AI systems, with some users reporting over 90% improvements in efficiency, latency, and cost-effectiveness. Anyscale primarily serves clients in the AI and ML sectors, including major companies like OpenAI and Ant Group, who rely on Ray for training large models. The company operates on a software-as-a-service (SaaS) model, charging clients a subscription fee for access to the Ray platform. Anyscale's goal is to empower organizations to effectively scale their AI workloads and optimize their operations.

Headquarters: San Francisco, California
Year Founded: 2019
Total Funding: $252.5M
Company Stage: Series C
Industries: Enterprise Software, AI & Machine Learning
Employees: 201-500

Benefits

Medical, Dental, and Vision insurance
401K retirement savings
Flexible time off
FSA and Commuter benefits
Parental and family leave
Office & phone plan reimbursement

Risks

The ShadowRay vulnerability in the Ray framework poses a significant security risk, with no patch available.
OctoML's OctoAI service increases competition in AI infrastructure market.
Dependency on Nvidia's technology could be risky if Nvidia faces issues.

Differentiation

Anyscale's Ray framework scales AI applications from laptops to cloud seamlessly.
Ray is widely used in Generative AI, LLMs, and computer vision fields.
Anyscale's SaaS model provides recurring revenue through subscription fees for Ray platform.

Upsides

Anyscale's $100M Series C funding indicates strong investor confidence and growth potential.
Partnership with Nvidia enhances performance and cost-efficiency for AI deployments.
Anyscale Endpoints offers 10X cost-efficiency for popular open-source LLMs.
