Senior Software Engineer - Distributed Inference
NVIDIA - Full Time
- Senior (5 to 8 years)
Candidates must have significant experience training large models with Python and PyTorch, including hands-on work across the full development pipeline from data processing to training and inference. They should be skilled at profiling GPU and CPU code in PyTorch for optimal device utilization, writing and improving highly parallel, distributed PyTorch code for large generative models, and working with transformer models and attention implementations. Familiarity with high-performance Triton/CUDA and custom PyTorch kernels is preferred, as is experience with high-performance parallel C++ and building inference or demo prototypes.
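As an illustration of the profiling skill listed above (and not part of the posting itself), here is a minimal sketch of inspecting GPU and CPU time for a single PyTorch transformer layer with torch.profiler; the layer, tensor shapes, and sort key are arbitrary placeholders chosen for the example.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Toy workload: one transformer encoder layer and a random batch.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True).to(device)
x = torch.randn(8, 128, 512, device=device)

activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, record_shapes=True) as prof:
    out = model(x)          # forward pass through the layer
    out.sum().backward()    # backward pass, so both directions show up in the trace

# Rank operators by total device time to spot where utilization is lost.
sort_key = "cuda_time_total" if device == "cuda" else "cpu_time_total"
print(prof.key_averages().table(sort_by=sort_key, row_limit=10))
```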
In this role, the engineer will ensure efficient implementation of models and systems, with a focus on large-scale training. They will identify and implement optimization techniques for massively parallel and distributed systems, remedy efficiency bottlenecks by profiling and writing high-performance PyTorch code, and work closely with the research team to ensure systems are designed for efficiency from the outset. They will also conduct research and experiments on state-of-the-art large-scale generative AI models to improve latency and throughput for both training and inference.
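For context on the distributed PyTorch work described above, the following is a minimal, hypothetical sketch of one DistributedDataParallel training step; it assumes a torchrun launch (so RANK, LOCAL_RANK, and WORLD_SIZE are set in the environment) and uses a toy model and objective in place of a real generative model.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun populates RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend)
    local_rank = int(os.environ.get("LOCAL_RANK", 0))

    if torch.cuda.is_available():
        torch.cuda.set_device(local_rank)
        device = torch.device("cuda", local_rank)
        model = DDP(torch.nn.Linear(4096, 4096).to(device), device_ids=[local_rank])
    else:
        device = torch.device("cpu")
        model = DDP(torch.nn.Linear(4096, 4096))

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(32, 4096, device=device)
    loss = model(x).square().mean()  # toy objective standing in for a real loss
    loss.backward()                  # DDP all-reduces gradients across ranks here
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched, for example, with `torchrun --nproc_per_node=8 train_sketch.py`, where `train_sketch.py` is a placeholder filename for this snippet.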
Develops multimodal AI technologies for creativity
Luma AI develops multimodal artificial intelligence technologies that enhance human creativity and capabilities. Their main product, the Dream Machine, allows users to interact with various types of data, enabling creative professionals, businesses, and developers to explore innovative applications of AI. Unlike many competitors, Luma AI focuses on integrating multiple modes of interaction, which broadens the possibilities for users. The company operates on a subscription model, providing access to its AI tools and services, and aims to lead the way in AI-driven creativity and productivity.