Software Engineer - Model Performance
Baseten - Full Time
- Mid-level (3 to 4 years), Senior (5 to 8 years)
Luma AI
Candidates must have significant experience optimizing PyTorch code for memory, latency, and throughput, along with experience benchmarking and profiling GPU and CPU code for optimal device utilization. Experience with non-NVIDIA systems, with torch.compile or PyTorch/XLA (torch_xla), and with transformer models and attention implementations is required. Familiarity with high-performance Triton/CUDA code and with writing custom PyTorch kernels is preferred, as is experience with parallel inference, particularly tensor parallelism and pipeline parallelism. Experience writing high-performance parallel C++ in an ML context and building inference or demo prototype code is a bonus.
The Senior Machine Learning Engineer will ensure efficient implementation of models and systems by designing, maintaining, and writing abstractions that scale beyond NVIDIA/CUDA hardware. They will identify and remedy efficiency bottlenecks by profiling and implementing high-performance PyTorch code. Responsibilities also include benchmarking products across various hardware and software to inform optimal tradeoffs, collaborating with partners to identify bottlenecks, and working closely with the research team to ensure systems are efficient from start to finish while addressing potential hardware integration issues.
Develops multimodal AI technologies for creativity
Luma AI develops multimodal artificial intelligence technologies that enhance human creativity and capabilities. Its main product, the Dream Machine, allows users to interact with various types of data, enabling creative professionals, businesses, and developers to explore innovative applications of AI. Unlike many competitors, Luma AI focuses on integrating multiple modes of interaction, which broadens the possibilities for users. The company operates on a subscription model, providing access to its AI tools and services, and aims to lead the way in AI-driven creativity and productivity.