Luma AI

Senior Research Engineer - Training Efficiency

Remote

Compensation: $220,000 – $300,000
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: AI & Machine Learning

Position Overview

  • Location Type: Remote
  • Job Type: Full-Time
  • Salary: $220K - $300K

Luma's mission is to build multimodal AI to expand human imagination and capabilities. We believe that multimodality is critical for intelligence. To go beyond language models and build more aware, capable, and useful systems, the next step-function change will come from vision. We are therefore training and scaling up multimodal foundation models for systems that can see and understand, show and explain, and eventually interact with our world to effect change.

We are looking for engineers with significant experience solving hard problems in PyTorch, CUDA, and distributed systems. You will work alongside the rest of the research team to build and train cutting-edge foundation models, designed from the ground up to scale, on thousands of GPUs.

Requirements

  • Significant experience with PyTorch
  • Strong experience with CUDA
  • Experience with distributed systems
  • Experience training large models using Python & PyTorch, including practical experience with the full development pipeline (data processing, preparation & dataloading, training, and inference).
  • Experience profiling GPU & CPU code in PyTorch for optimal device utilization (e.g., torch profiler, NVIDIA Nsight systems/compute, memory profilers, trace viewers, custom tooling).
  • Experience writing & improving highly parallel & distributed PyTorch code for large generative models (FSDP, Tensor Parallel, Sequence/Context Parallel, Pipeline Parallel, etc.).
  • Experience with transformer models and attention implementations.

Responsibilities

  • Ensure efficient implementation of models & systems with a focus on large-scale training.
  • Identify and implement optimization techniques for massively parallel and distributed systems, including the underlying communication layer.
  • Identify and remedy efficiency bottlenecks (memory, speed, utilization, communication) by profiling and implementing high-performance PyTorch code, dropping down to Triton, CUDA, and lower levels as necessary.
  • Work closely with the rest of the research team to ensure systems are planned to be as efficient as possible from start to finish.
  • Conduct research & experiments on state-of-the-art large-scale generative AI models with the goal of improving latency & throughput for training and inference.

Additional Experience (Good to Have)

  • Experience with high-performance Triton/CUDA and writing custom PyTorch kernels and ops.
  • Top candidates will be able to write fused kernels for common hot paths, understand when to make use of lower level features like tensor cores or warp intrinsics, and will understand where these tools can be most impactful.
  • Experience writing high-performance parallel C++, ideally within an ML context with PyTorch (e.g., for data loading, data processing, inference code).
  • Experience building inference / demo prototype code (incl. Gradio, Docker, etc.).

Skills

PyTorch
CUDA
Distributed Systems
Python
Transformer Models
Attention Implementations
GPU Profiling
CPU Profiling
FSDP
Tensor Parallel
Sequence Parallel
Pipeline Parallel

Luma AI

Develops multimodal AI technologies for creativity

About Luma AI

Luma AI develops multimodal artificial intelligence technologies that enhance human creativity and capabilities. Their main product, the Dream Machine, allows users to interact with various types of data, enabling creative professionals, businesses, and developers to explore innovative applications of AI. Unlike many competitors, Luma AI focuses on integrating multiple modes of interaction, which broadens the possibilities for users. The company operates on a subscription model, providing access to its AI tools and services, and aims to lead the way in AI-driven creativity and productivity.

Headquarters: San Francisco, California
Year Founded: 2021
Total Funding: $84.9M
Company Stage: Late-stage VC
Industries: Consumer Software, AI & Machine Learning
Employees: 51-200

Benefits

Company Equity
Stock Options

Risks

Competition from Google's Veo 2 and OpenAI's Sora Turbo challenges Luma AI's market position.
AWS's cost-reducing features may make Luma AI's services less competitive.
AI-generated content controversies could harm Luma AI's reputation if outputs are misleading.

Differentiation

Luma AI transforms text into 3D models, enhancing user interaction with digital content.
The Dream Machine integrates multimodal AI, offering a unique creative platform for users.
Luma AI's Photon models provide high-quality, personalized image generation capabilities.

Upsides

Luma AI's $90 million funding boosts its capacity for innovation and market expansion.
Dream Machine 1.5's advanced text-to-video features attract creative professionals and businesses.
Expanding Dream Machine into a mobile app increases accessibility and user engagement.