Staff Software Engineer - SRE, Backend (Reliability Engineering)
Affirm - Full Time - Senior (5 to 8 years)
Candidates must have at least 5 years of experience as a reliability engineer, production engineer, infrastructure software engineer, or in a similar role at a fast-paced, rapidly scaling company. Strong proficiency in GPU cloud infrastructure is required, along with familiarity with scheduling, scaling, cloud storage, networking, and security. Proficiency in programming and scripting languages and experience with containerization technologies and orchestration platforms such as Kubernetes are essential. Knowledge of Infrastructure as Code tools such as Terraform or CloudFormation is necessary, as are excellent problem-solving skills and strong communication abilities. Familiarity with observability tools such as Datadog, Prometheus, and Grafana is also required, as is familiarity with security best practices in cloud environments. Experience as an SRE in the AI/ML space is preferred.
The Senior Software Engineer - Reliability will work with researchers and engineers to define the availability, performance, correctness, and efficiency requirements of the GPU infrastructure. They will manage GPU cloud providers to scale and monitor thousands of GPUs across multiple clusters. The role includes designing and implementing solutions for infrastructure scalability, managing monitoring systems that surface production issues, and implementing fault-tolerant design patterns. The engineer will also build automation tools that improve reliability, participate in an on-call rotation for critical incidents, and define service level objectives (SLOs) and service level indicators (SLIs) for the systems they support.
Develops multimodal AI technologies for creativity
Luma AI develops multimodal artificial intelligence technologies that enhance human creativity and capabilities. Its main product, Dream Machine, lets users interact with various types of data, so creative professionals, businesses, and developers can explore new applications of AI. Unlike many competitors, Luma AI focuses on integrating multiple modes of interaction, which gives users a wider range of possibilities. The company operates on a subscription model, providing access to its AI tools and services, and aims to lead in AI-driven creativity and productivity.