Senior Software Engineer - Distributed Inference
NVIDIA - Full Time
- Senior (5 to 8 years)
Candidates should have at least 5 years of work experience in distributed systems and machine learning. Experience with multimodal ML pipelines and high-performance computing is required, along with strong Python and software development skills, particularly with PyTorch. Familiarity with high-performance C++ or CUDA is a plus, as is a passion for understanding how systems are implemented in order to improve their performance and maintainability.
The Senior Distributed Systems Engineer will collaborate with researchers to scale systems for next-generation models trained on clusters of several thousand GPUs. They will profile and optimize the model training codebase for hardware efficiency, design and implement training methods that remain robust in the presence of hardware failures, and build tooling to diagnose issues in large training jobs.
Develops multimodal AI technologies for creativity
Luma AI develops multimodal artificial intelligence technologies that enhance human creativity and capabilities. Its main product, Dream Machine, allows users to interact with various types of data, enabling creative professionals, businesses, and developers to explore innovative applications of AI. Unlike many competitors, Luma AI focuses on integrating multiple modes of interaction, which broadens what users can do with its tools. The company operates on a subscription model, providing access to its AI tools and services, and aims to lead the way in AI-driven creativity and productivity.