Infrastructure Software Engineer
Baseten · Full Time
Junior (1 to 2 years)
Candidates should hold a Bachelor’s degree in Computer Science or a related field and have at least 3 years of software engineering experience, with a focus on control plane and data plane development. Strong expertise in Kubernetes, container orchestration, and cloud-native infrastructure is required, along with experience designing, implementing, and optimizing scalable services.
The Software Engineer will design, build, and scale services that orchestrate Ray clusters across cloud and on-prem environments, supporting both VM-based and Kubernetes-based deployments. The role also includes optimizing control plane components for large-scale, distributed AI/ML workloads; building intelligent scheduling and resource management systems; developing features that improve infrastructure reliability and performance; supporting accelerator integration; managing container images; participating in code reviews; and providing on-call support to troubleshoot infrastructure issues.
Platform for scaling AI workloads
Anyscale provides a platform for scaling and productionizing artificial intelligence (AI) and machine learning (ML) workloads. The platform is built on Ray, an open-source framework that helps developers build and scale AI applications in areas such as Generative AI, Large Language Models (LLMs), and computer vision. Ray helps companies improve the performance, fault tolerance, and scalability of their AI systems, and some users report improvements of over 90% in efficiency, latency, and cost. Anyscale primarily serves clients in the AI and ML sectors, including major companies such as OpenAI and Ant Group, which rely on Ray to train large models. The company operates on a software-as-a-service (SaaS) model, charging clients a subscription fee for access to its managed Ray platform. Anyscale's goal is to help organizations scale their AI workloads effectively and optimize their operations.