Member of Technical Staff, Large Generative Models at Captions

New York, New York, United States

Compensation: $215,000 – $340,000

Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)

Job Type: Full Time

Visa: Unknown

Industries: AI, Video Technology, Machine Learning

Requirements

Master's or PhD in Computer Science, Machine Learning, or related field
Track record of research contributions

Responsibilities

Design and implement novel architectures for large-scale video and multimodal diffusion models
Develop new approaches to multimodal fusion, temporal modeling, and video control
Research temporal video editing techniques and controllable generation
Research and validate scaling laws for video generation models
Create new loss functions and training objectives for improved generation quality
Drive rapid experimentation with model architectures and training strategies
Validate research directly through product deployment and user feedback
Train and optimize models at massive scale (tens to hundreds of billions of parameters)
Develop sophisticated distributed training approaches using FSDP, DeepSpeed, and Megatron-LM
Design and implement model surgery techniques (pruning, distillation, quantization)
Create new approaches to memory optimization and training efficiency
Research techniques for improving training stability at scale
Conduct systematic empirical studies of architecture and optimization choices
Advance the state of the art in video model architecture design and optimization
Develop new approaches to temporal modeling for video generation
Create novel solutions for multimodal learning and cross-modal alignment
Research and implement new optimization techniques for generative modeling and sampling
Design and validate new evaluation metrics for generation quality
Systematically analyze and improve model behavior across different regimes

Skills

Key technologies and capabilities for this role

diffusion models, generative modeling, multimodal models, video generation, large-scale training, model scaling, ML research, AI research

Questions & Answers

Common questions about this position

What is the salary range for this position?

The salary range is $215K - $340K.

Is this role remote or does it require in-person work?

All roles require in-person work at the NYC HQ located in Union Square.

What key skills or experiences are required for this role?

The role requires expertise in designing novel architectures for large-scale video and multimodal diffusion models, developing training techniques, scaling models to billions of parameters, and experience in multimodal fusion, temporal modeling, video control, and generative modeling.

What is the company culture like?

The company has a rapidly growing team of ambitious, experienced, and devoted engineers, researchers, designers, marketers, and operators based in NYC, where early members have an outsized impact on products and company culture.

What makes a strong candidate for this role?

Strong candidates are exceptional research engineers with experience advancing large-scale multimodal video diffusion models, conducting novel research in generative architectures, and driving rapid experimentation that has product impact.

Captions

Video captioning and translation services

About Captions

Captions.ai enhances video content by providing captioning and translation services tailored for content creators, social media influencers, marketing agencies, and businesses. Its main offerings include automatic subtitle generation, translation into 28 languages, and video compression to improve performance. These tools simplify the video production process, allowing users to produce professional-quality videos with ease. Unlike many competitors, Captions.ai uses a freemium model, offering basic services for free while charging for advanced features, which helps it attract a large user base and convert free users into paying customers. The company's goal is to make high-quality video content accessible to a wider audience, and recent funding will support its growth and product development.

Headquarters: New York City, New York

Year Founded: 2021

Total Funding: $82.7M

Company Stage: Series C

Industries: Consumer Software, Entertainment

Employees: 51-200

Benefits

Health Insurance

Dental Insurance

Vision Insurance

401(k) Retirement Plan

401(k) Company Match

Commuter Benefits

Wellness Program

Unlimited Paid Time Off

Flexible Work Hours

Risks

Increased competition from startups like Beeble AI could challenge Captions' market position.

Integration challenges from AlpacaML acquisition may delay product enhancements.

Rapid expansion may stretch resources, potentially affecting service quality.

Differentiation

Captions offers AI-powered video editing with automatic subtitle generation and language dubbing.

The platform supports video compression for optimized performance and accessibility.

Captions uses a freemium model to attract a wide user base and convert to paid plans.

Upsides

Captions secured $60 million in Series C funding, indicating strong investor confidence.

The acquisition of AlpacaML enhances Captions' creative tools with AI rendering capabilities.

Expansion to web and desktop platforms increases accessibility and user engagement.