Captions

Engineering Manager, Machine Learning

New York, New York, United States

Compensation: $200,000 – $300,000
Experience Level: Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: Video AI, Artificial Intelligence, Biotechnology

About Captions

Captions is the leading video AI company, building the future of video creation. Over 10 million creators and businesses have used Captions to create videos for social media, marketing, sales, and more. We're on a mission to serve the next billion. We are a rapidly growing team of ambitious, experienced, and devoted engineers, researchers, designers, marketers, and operators based in NYC. You'll join an early team and have an outsized impact on the product and the company's culture.

We’re very fortunate to have some of the best investors and entrepreneurs backing us, including Index Ventures (Series C lead), Kleiner Perkins (Series B lead), Sequoia Capital (Series A and Seed co-lead), Andreessen Horowitz (Series A and Seed co-lead), Uncommon Projects, Kevin Systrom, Mike Krieger, Lenny Rachitsky, Antoine Martin, Julie Zhuo, Ben Rubin, Jaren Glover, SVAngel, 20VC, Ludlow Ventures, Chapter One, and more.

Check out our latest financing milestone and some other coverage:

  • The Information: 50 Most Promising Startups
  • Fast Company: Next Big Things in Tech
  • The New York Times: When A.I. Bridged a Language Gap, They Fell in Love
  • Business Insider: 34 most promising AI startups
  • Time: The Best Inventions of 2024

Please note that all of our roles require you to work in person at our NYC HQ (located in Union Square). We do not work with third-party recruiting agencies; please do not contact us.

About the Role

Captions is seeking an ML Engineering Lead to manage a small, high-impact team of ML engineers who bring large-scale multimodal video diffusion models into production. The ML engineering team is responsible for optimizing and deploying state-of-the-art generative models (tens to hundreds of billions of parameters) to deliver low-latency, high-throughput inference at scale. This is a unique opportunity to work on cutting-edge AI, spanning audio-video generation, diffusion architectures, and temporal modeling, and to ensure these innovations reach millions of creators worldwide.

Responsibilities

Technical Leadership

  • Drive the technical vision for deploying large-scale multimodal diffusion models (tens to hundreds of billions of parameters) in production.
  • Oversee and contribute to core ML pipelines—from GPU-based inference to model optimization.
  • Collaborate with researchers to adapt state-of-the-art generative models for real-world performance and reliability.

Inference & Deployment

  • Develop high-performance GPU-based inference pipelines for large multimodal diffusion models.
  • Build, optimize, and maintain serving infrastructure to deliver low-latency predictions at large scale.
  • Collaborate with software engineering teams to containerize models, manage autoscaling, and ensure uptime SLAs.

Model Optimization & Fine-Tuning

  • Leverage techniques like quantization, pruning, and distillation to reduce latency and memory footprint without compromising quality.
  • Implement continuous fine-tuning workflows to adapt models based on real-world data and feedback.

Production MLOps, Performance, Scaling

  • Design and maintain automated CI/CD pipelines for model deployment, versioning, and rollback.
  • Implement robust monitoring (latency, throughput, concept drift) and alerting for critical production systems.
  • Explore cutting-edge GPU acceleration frameworks (e.g., TensorRT, Triton, TorchServe) to continuously improve throughput and reduce costs.

Requirements

Technical Expertise

  • Proven experience deploying deep learning models on GPU-based infrastructure (NVIDIA GPUs, CUDA, TensorRT, etc.).
  • Strong knowledge of containerization (Docker, Kubernetes) and microservice architectures for ML model serving.
  • Proficiency with Python and at least one deep learning framework (PyTorch, TensorFlow).

Model Optimization

  • Familiarity with compression techniques (quantization, pruning, distillation) for large-scale models.
  • Experience profiling and optimizing model inference (batching, concurrency, hardware utilization).

Infrastructure

  • Hands-on experie

Skills

Machine Learning
ML Engineering
Multimodal Video Diffusion Models
Generative Models
Low-latency Inference
High-throughput Inference
Audio-video Generation
Diffusion Architectures
Temporal Modeling
AI

Captions

Video captioning and translation services

About Captions

Captions.ai enhances video content by providing captioning and translation services tailored for content creators, social media influencers, marketing agencies, and businesses. Their main offerings include automatic subtitle generation, translation into 28 languages, and video compression to improve performance. These tools simplify the video production process, allowing users to produce professional-quality videos with ease. Unlike many competitors, Captions.ai uses a freemium model, offering basic services for free while charging for advanced features, which helps attract a large user base and convert free users into paying customers. The company's goal is to make high-quality video content accessible to a wider audience, and recent funding will support their growth and product development.

Headquarters: New York City, New York
Year Founded: 2021
Total Funding: $82.7M
Company Stage: Series C
Industries: Consumer Software, Entertainment
Employees: 51-200

Benefits

Health Insurance
Dental Insurance
Vision Insurance
401(k) Retirement Plan
401(k) Company Match
Commuter Benefits
Wellness Program
Unlimited Paid Time Off
Flexible Work Hours

Risks

Increased competition from startups like Beeble AI could challenge Captions' market position.
Integration challenges from the AlpacaML acquisition may delay product enhancements.
Rapid expansion may stretch resources, potentially affecting service quality.

Differentiation

Captions offers AI-powered video editing with automatic subtitle generation and language dubbing.
The platform supports video compression for optimized performance and accessibility.
Captions uses a freemium model to attract a wide user base and convert free users to paid plans.

Upsides

Captions secured $60 million in Series C funding, indicating strong investor confidence.
The acquisition of AlpacaML enhances Captions' creative tools with AI rendering capabilities.
Expansion to web and desktop platforms increases accessibility and user engagement.
