Senior Solutions Architect, HPC Systems Engineer
NVIDIA - Full Time - Senior (5 to 8 years)
Candidates must have at least 8 years of experience as an infrastructure or DevOps engineer working on large, complex distributed systems. A deep understanding of networking is essential; experience with HPC networking is a plus. Experience developing high-quality software in a general-purpose programming language, preferably Python, is required. Candidates should have excellent problem-solving skills, strong attention to detail, and experience with GPUs in large-scale clusters. Strong knowledge of observability and monitoring in distributed systems is necessary, along with the ability to troubleshoot hardware and network topology failures. Candidates are expected to be self-driven and able to own problems and build solutions end to end, and to bring experience with large-scale data center operations and proficiency with cloud orchestration and system tools.
The Senior HPC Engineer will design systems that maximize performance for running large-scale AI models, working at the intersection of hardware and software. They will manage Luma's HPC training clusters from provisioning through performance tuning, with a focus on observability, distributed job tracing, GPU diagnostics, and software environment management. The role involves writing code to automate the monitoring and healing of systems, directly accelerating the work of machine learning researchers, and identifying and solving critical problems in a self-directed manner.
Develops multimodal AI technologies for creativity
Luma AI develops multimodal artificial intelligence technologies that enhance human creativity and capabilities. Its main product, Dream Machine, allows users to interact with various types of data, enabling creative professionals, businesses, and developers to explore innovative applications of AI. Unlike many competitors, Luma AI focuses on integrating multiple modes of interaction, which broadens the possibilities for users. The company operates on a subscription model that provides access to its AI tools and services, and it aims to lead the way in AI-driven creativity and productivity.