AI Performance Optimization Engineer at Lightning AI

San Francisco, California, United States

Compensation: Not Specified
Experience Level: Junior (1 to 2 years)
Job Type: Full Time
Visa: Unknown
Industries: Artificial Intelligence, AI & Machine Learning

Requirements

Candidates should have strong experience with deep learning frameworks such as PyTorch, JAX, or TensorFlow, together with expertise in compiler development or in performance optimizations for distributed training and inference workflows. A proven track record of contributing to open-source projects, particularly in machine learning or high-performance computing, is highly valued, as is experience collaborating with external partners.

Responsibilities

The AI Performance Optimization Engineer will develop the Thunder compiler, an open-source project built in collaboration with a strategic partner, focusing on performance-oriented model optimizations for distributed training and inference. Additional responsibilities include developing optimized kernels in CUDA or Triton, integrating Thunder throughout the PyTorch Lightning ecosystem, engaging with the community to champion Thunder's growth, and supporting its adoption across the industry, all while working closely with the Lightning team and the strategic partner.
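To make the compiler work described above concrete: one classic optimization that trace-based deep learning compilers such as Thunder perform is operator fusion, combining several elementwise operations into a single kernel so intermediate buffers never hit memory. The sketch below is a toy pure-Python illustration of that idea only; the function names are hypothetical and this is not Thunder's actual API (a real compiler would emit fused CUDA or Triton kernels, not Python loops).

```python
def unfused(xs, scale, bias):
    # Two passes over the data: each op materializes an intermediate list,
    # mirroring the extra memory traffic between two separate GPU kernels.
    scaled = [x * scale for x in xs]      # "kernel" 1
    return [s + bias for s in scaled]     # "kernel" 2

def fused(xs, scale, bias):
    # One pass: both ops applied per element with no intermediate buffer,
    # mirroring a single fused kernel launch.
    return [x * scale + bias for x in xs]

if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    # Fusion must preserve semantics while reducing memory traffic.
    assert unfused(data, 2.0, 1.0) == fused(data, 2.0, 1.0) == [3.0, 5.0, 7.0]
```

The correctness requirement shown by the assertion is the crux of the role: the compiler must deliver the fused, faster version while producing bitwise- or numerically-equivalent results to the naive computation.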

Skills

PyTorch
JAX
TensorFlow
Compiler Development
Distributed Training
Inference Optimization
CUDA
Triton
Open-Source Projects
Machine Learning
High-Performance Computing

Lightning AI

AI development platform for coding and deployment

About Lightning AI

Lightning AI provides a platform for developing artificial intelligence applications, supporting users throughout the entire AI lifecycle from initial ideas to final deployment. The platform is accessible via web browsers, allowing developers and data scientists to easily code, prototype, and train AI models using GPUs without needing extensive setup. It operates on a subscription model, offering a cloud-based AI Studio that functions like a virtual laptop with persistent storage and environments. This setup enables users to code on CPUs, debug on GPUs, and scale their projects across multiple nodes. Key features of the platform include tools like PyTorch Lightning, Fabric, Lit-GPT, and torchmetrics, which help in optimizing and scaling AI models. Lightning AI aims to provide a user-friendly and comprehensive solution for both enterprises and individual developers looking to enhance their AI development capabilities.

Headquarters: New York City, New York
Year Founded: 2015
Total Funding: $105.6M
Company Stage: Late Stage VC
Industries: AI & Machine Learning
Employees: 51-200

Benefits

Health Insurance
Dental Insurance
Vision Insurance
Life Insurance
Flexible Paid Time Off
Paid Family Leave
Phone/Internet Stipend
Home Office Stipend
Professional Development Budget
Gym Membership
Mental Health Support
Stock Options

Risks

Open-source models like Dolly and Alpaca challenge Lightning AI's proprietary offerings.
Rapid AI development may outpace Lightning AI's innovation capabilities.
Dependence on AWS could affect cost structure if service terms change.

Differentiation

Lightning AI offers a comprehensive AI lifecycle platform from ideation to deployment.
The platform's integration with AWS Marketplace simplifies enterprise procurement processes.
PyTorch Lightning's popularity supports Lightning AI's open-source framework approach.

Upsides

Recent $50M funding round indicates strong market confidence in Lightning AI.
Thunder compiler speeds up AI model training, reducing costs significantly.
Collaboration with AWS enhances AI model performance using cutting-edge hardware.
