Backend Engineer – Inference Optimization at Vercel

Seattle, Washington, United States

Compensation: $150,000 – $250,000
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: AI, Machine Learning, Technology

Requirements

Must have:
  • Deep experience in optimizing model inference pipelines, model quantization and KV caching
  • Proficiency in backend systems and high-performance programming (Python, C++, or Rust)
  • Familiarity with distributed serving, GPU acceleration, and large-scale systems
  • Ability to debug complex performance issues across model, runtime, and hardware layers
  • Comfort working in fast-moving environments with ambitious technical goals

Nice to have:
  • Hands-on experience with vLLM or similar inference frameworks
  • Background in GPU kernel optimization (CUDA, Triton, ROCm)
  • Experience scaling inference across multi-node or heterogeneous clusters
  • Prior work in model compilation (e.g., TensorRT, TVM, ONNX Runtime)
  • Hands-on experience with model quantization

Responsibilities

  • Own the design and optimization of inference pipelines for large-scale models
  • Work closely with researchers and infrastructure engineers to identify bottlenecks
  • Implement advanced techniques like quantization and KV caching
  • Deploy high-performance serving systems in production

Skills

Python
C++
Rust
model quantization
KV caching
distributed serving
GPU acceleration
vLLM
CUDA
Triton
ROCm

Vercel

Platform for building and deploying web applications

About Vercel

Vercel provides a platform for developers and businesses to build, deploy, and manage modern web applications. Its services include tools that enhance image and video workflows using AI features like smart cropping and object detection. Vercel simplifies the complexities of serverless architecture, allowing for global content delivery without extra infrastructure. The company ensures high security and uptime with features such as automatic HTTPS and DDoS protection. Unlike competitors, Vercel focuses on a managed global rendering layer and offers a subscription-based model tailored to various client needs, from individual developers to large enterprises. The goal of Vercel is to empower developers to create efficient and secure web applications.

Headquarters: San Francisco, California
Year Founded: 2015
Total Funding: $547.6M
Company Stage: Series E
Industries: Consumer Software, Enterprise Software, AI & Machine Learning
Employees: 501-1,000

Benefits

Health Insurance
Stock Options
Company Equity
Professional Development Budget
Unlimited Paid Time Off
Remote Work Options
Home Office Stipend

Risks

Increased competition in the cloud application platform space threatens Vercel's market share.
Rapid AI evolution may outpace Vercel's current offerings, eroding its competitive edge.
Reliance on a subscription model could be risky during economic downturns.

Differentiation

Vercel offers a managed global rendering layer for modern web applications.
The company provides advanced AI-powered tools for image and video optimization.
Vercel's platform supports full lifecycle media management with auto-tagging and access control.

Upsides

Vercel secured $250 million in Series E funding for growth and platform development.
The introduction of v0 enhances Vercel's offerings in AI-driven web development.
Recognition as a Visionary in Gartner's Magic Quadrant boosts Vercel's market position.
