AI Performance Optimization Engineer at Lightning AI

San Francisco, California, United States

Compensation: Not Specified
Experience Level: Junior (1 to 2 years)
Job Type: Full Time
Visa: Unknown
Industries: Artificial Intelligence, AI & Machine Learning

Requirements

Candidates should have strong experience with deep learning frameworks such as PyTorch, JAX, or TensorFlow, together with expertise in compiler development or in performance optimizations for distributed training and inference workflows. A proven track record of contributing to open-source projects, particularly in machine learning or high-performance computing, is highly valued, as is experience collaborating with external partners.

Responsibilities

The AI Performance Optimization Engineer will develop the Thunder compiler, an open-source project built in collaboration with a strategic partner, focusing on performance-oriented model optimizations for distributed training and inference. Additional responsibilities include developing optimized kernels in CUDA or Triton, integrating Thunder throughout the PyTorch Lightning ecosystem, engaging with the community to champion Thunder's growth, and supporting its adoption across the industry, all while working closely with the Lightning team and the strategic partner.
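To make the compiler work described above concrete: one classic optimization that trace-based deep learning compilers such as Thunder perform is operator fusion, combining several elementwise operations into a single kernel so intermediate buffers never hit memory. The sketch below is a toy pure-Python illustration of that idea only; the function names are hypothetical and this is not Thunder's actual API (a real compiler would emit fused CUDA or Triton kernels, not Python loops).

```python
def unfused(xs, scale, bias):
    # Two passes over the data: each op materializes an intermediate list,
    # mirroring the extra memory traffic between two separate GPU kernels.
    scaled = [x * scale for x in xs]      # "kernel" 1
    return [s + bias for s in scaled]     # "kernel" 2

def fused(xs, scale, bias):
    # One pass: both ops applied per element with no intermediate buffer,
    # mirroring a single fused kernel launch.
    return [x * scale + bias for x in xs]

if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    # Fusion must preserve semantics while reducing memory traffic.
    assert unfused(data, 2.0, 1.0) == fused(data, 2.0, 1.0) == [3.0, 5.0, 7.0]
```

The correctness requirement shown by the assertion is the crux of the role: the compiler must deliver the fused, faster version while producing bitwise- or numerically-equivalent results to the naive computation.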

Skills

PyTorch
JAX
TensorFlow
Compiler Development
Distributed Training
Inference Optimization
CUDA
Triton
Open-Source Projects
Machine Learning
High-Performance Computing

Lightning AI

AI development platform for coding and deployment

About Lightning AI

Lightning AI provides a platform for developing artificial intelligence applications, supporting users throughout the entire AI lifecycle from initial ideas to final deployment. The platform is accessible via web browsers, allowing developers and data scientists to easily code, prototype, and train AI models using GPUs without needing extensive setup. It operates on a subscription model, offering a cloud-based AI Studio that functions like a virtual laptop with persistent storage and environments. This setup enables users to code on CPUs, debug on GPUs, and scale their projects across multiple nodes. Key features of the platform include tools like PyTorch Lightning, Fabric, Lit-GPT, and torchmetrics, which help in optimizing and scaling AI models. Lightning AI aims to provide a user-friendly and comprehensive solution for both enterprises and individual developers looking to enhance their AI development capabilities.

Headquarters: New York City, New York
Year Founded: 2015
Total Funding: $105.6M
Company Stage: Late Stage VC
Industries: AI & Machine Learning
Employees: 51-200

Benefits

Health Insurance
Dental Insurance
Vision Insurance
Life Insurance
Flexible Paid Time Off
Paid Family Leave
Phone/Internet Stipend
Home Office Stipend
Professional Development Budget
Gym Membership
Mental Health Support
Stock Options

Risks

Open-source models like Dolly and Alpaca challenge Lightning AI's proprietary offerings.
Rapid AI development may outpace Lightning AI's innovation capabilities.
Dependence on AWS could affect cost structure if service terms change.

Differentiation

Lightning AI offers a comprehensive AI lifecycle platform from ideation to deployment.
The platform's integration with AWS Marketplace simplifies enterprise procurement processes.
PyTorch Lightning's popularity supports Lightning AI's open-source framework approach.

Upsides

Recent $50M funding round indicates strong market confidence in Lightning AI.
Thunder compiler speeds up AI model training, reducing costs significantly.
Collaboration with AWS enhances AI model performance using cutting-edge hardware.
