Luma AI

Senior Machine Learning Engineer - Hardware Abstractions & Performance Optimization

Remote

Compensation: $220,000 – $300,000
Experience Level: Senior (5 to 8 years), Mid-level (3 to 4 years)
Job Type: Full Time
Visa: Unknown
Industries: AI & Machine Learning, Hardware, Enterprise Software

Position Overview

  • Location Type: Remote
  • Job Type: Full Time
  • Salary: $220K - $300K

Luma's mission is to build multimodal AI to expand human imagination and capabilities. We believe that multimodality is critical for intelligence. To go beyond language models and build more aware, capable, and useful systems, the next step-change will come from vision. So, we are working on training and scaling up multimodal foundation models for systems that can see and understand, show and explain, and eventually interact with our world to effect change.

We are looking for engineers with significant experience designing and maintaining highly efficient systems and code that can be optimized to run on multiple hardware platforms, bringing our state-of-the-art models to as many people as possible at the best performance per dollar.

Requirements

  • Significant experience maintaining & designing highly efficient systems and code.
  • Experience optimizing for memory, latency, and throughput in PyTorch.
  • Experience benchmarking and profiling GPU & CPU code in PyTorch for optimal device utilization (e.g., torch profiler, memory profilers, trace viewers, custom tooling).
  • Experience with parallel inference, particularly with tensor parallelism and pipeline parallelism.
  • Experience writing high-performance parallel C++.

Responsibilities

  • Ensure efficient implementation of models & systems with a focus on designing, maintaining, and writing abstractions that scale beyond NVIDIA/CUDA hardware.
  • Identify and remedy efficiency bottlenecks (memory, speed, utilization, communication) by profiling and implementing high-performance PyTorch code, deferring to Triton or similar kernel-level languages as necessary.
  • Benchmark our products across a variety of hardware & software to help the product team understand the optimal tradeoffs between latency, throughput, and cost at various degrees of parallelism.
  • Work together with our partners to help them identify bottlenecks and push forward new iterations of hardware and software.
  • Work closely together with the rest of the research team to ensure systems are planned to be as efficient as possible from start to finish and raise potential issues for hardware integration.
  • Write fused kernels for common hot paths, understand when to make use of lower-level features like tensor cores or warp intrinsics, and understand where these tools can be most impactful.

Bonus Experience

  • Experience with non-NVIDIA systems.
  • Experience using torch.compile or PyTorch/XLA (torch_xla).
  • Experience building tools & abstractions to ensure models run optimally on different hardware and software stacks.
  • Experience working with transformer models and attention implementations.
  • Experience building inference / demo prototype code (incl. Gradio, Docker, etc.).

Skills

PyTorch
Tensor Parallelism
Pipeline Parallelism
GPU Profiling
CPU Profiling
Memory Optimization
Latency Optimization
Throughput Optimization
Triton
CUDA

Luma AI

Develops multimodal AI technologies for creativity

About Luma AI

Luma AI develops multimodal artificial intelligence technologies that enhance human creativity and capabilities. Its main product, the Dream Machine, allows users to interact with various types of data, enabling creative professionals, businesses, and developers to explore innovative applications of AI. Unlike many competitors, Luma AI focuses on integrating multiple modes of interaction, which broadens the possibilities for users. The company operates on a subscription model, providing access to its AI tools and services, and aims to lead the way in AI-driven creativity and productivity.

Headquarters: San Francisco, California
Year Founded: 2021
Total Funding: $84.9M
Company Stage: Late-stage VC
Industries: Consumer Software, AI & Machine Learning
Employees: 51-200

Benefits

Company Equity
Stock Options

Risks

Competition from Google's Veo 2 and OpenAI's Sora Turbo challenges Luma AI's market position.
AWS's cost-reducing features may make Luma AI's services less competitive.
AI-generated content controversies could harm Luma AI's reputation if outputs are misleading.

Differentiation

Luma AI transforms text into 3D models, enhancing user interaction with digital content.
The Dream Machine integrates multimodal AI, offering a unique creative platform for users.
Luma AI's Photon models provide high-quality, personalized image generation capabilities.

Upsides

Luma AI's $90 million funding boosts its capacity for innovation and market expansion.
Dream Machine 1.5's advanced text-to-video features attract creative professionals and businesses.
Expanding Dream Machine into a mobile app increases accessibility and user engagement.
