Software Engineer - Model APIs at Baseten

San Francisco, California, United States

Compensation: $150,000 – $230,000

Experience Level: Senior (5 to 8 years)

Job Type: Full Time

Visa: Unknown

Industries: AI, Machine Learning

Requirements

3+ years of experience building and operating distributed systems or large-scale APIs
Proven track record of owning low-latency, reliable backend services (rate-limiting, auth, quotas, metering, migrations)
Infra instincts with performance sensibilities: profiling, tracing, capacity planning, and SLO management
Comfortable debugging complex systems, from runtime internals to GPU execution traces
Strong written communication; able to produce clear design docs and collaborate across functions

Responsibilities

Design, build, and operate the Model APIs surface with a focus on advanced inference capabilities: structured outputs (JSON mode, grammar-constrained generation), tool/function calling, and multi-modal serving
Profile and optimize TensorRT-LLM kernels, analyze CUDA kernel performance, implement custom CUDA operators, tune memory allocation patterns for maximum throughput and optimize communication patterns across multi-GPU setups
Productionize performance improvements across runtimes (e.g. TensorRT, TensorRT-LLM) with a deep understanding of their internals: speculative decoding, guided generation for structured outputs, quantization, batching, KV-cache reuse, and custom scheduling and routing algorithms for high-performance serving
Build comprehensive benchmarking frameworks that measure real-world performance across different model architectures, batch sizes, sequence lengths, and hardware configurations
Instrument deep observability (metrics, traces, logs) and build repeatable benchmarks to measure speed, reliability, and quality
Implement platform fundamentals: API versioning, validation, usage metering, quotas, and authentication
Collaborate closely with other teams to deliver robust, developer-friendly model serving experiences

Skills

Key technologies and capabilities for this role

TensorRT-LLM, CUDA, distributed systems, model serving, multi-GPU, speculative decoding, structured outputs, tool calling, multi-modal serving, benchmarking, custom CUDA operators, memory allocation, custom scheduling, routing algorithms

Questions & Answers

Common questions about this position

What is the salary range for the Software Engineer - Model APIs position?

The salary range is $150K - $230K.

Is this a remote or hybrid role?

This is a hybrid position.

What skills and experience are required for this role?

Candidates need 3+ years of experience building and operating distributed systems or large-scale APIs; a proven track record of owning low-latency, reliable backend services, including rate-limiting, auth, quotas, and metering; infra instincts with performance sensibilities such as profiling, tracing, and SLO management; and comfort debugging complex systems.

What is the team structure like at Baseten for this role?

You'll join the Model Performance team, a small, high-impact group operating at the intersection of product, model performance, and infra.

What makes a strong candidate for this Software Engineer position?

Strong candidates have 3+ years in distributed systems or large-scale APIs, experience owning low-latency backend services with rate-limiting and auth, performance optimization instincts including profiling and SLOs, and the ability to debug complex systems while producing clear design docs.

Baseten

Platform for deploying and managing ML models

About Baseten

Baseten provides a platform for deploying and managing machine learning (ML) models, aimed at simplifying the process for businesses. Users can select from a library of open-source foundation models and deploy them with just two clicks, making it easier to implement ML solutions. The platform features autoscaling, which adjusts resources based on demand, and comprehensive monitoring tools for tracking performance and troubleshooting. A key differentiator is Baseten's open-source model packaging framework, Truss, which allows users to package and deploy custom models easily. The company operates on a usage-based pricing model, where clients pay only for the time their models are actively deployed, helping them manage costs effectively.

Headquarters: San Francisco, California

Year Founded: 2019

Total Funding: $58.4M

Company Stage: Series B

Industries: AI & Machine Learning

Employees: 51-200

Benefits

💰 Competitive compensation: We aim to provide 90th percentile (or better) salaries and equity grants for every team member commensurate with their experience.

🌎 Remote-first work environment: The Baseten team is welcome to work from wherever they want; fully remote, in our San Francisco office, or a mix of both. We provide a $1,000 stipend for you to make your home office comfortable and productive.

🏓 Regular in-person team summits: We get together as a team three times a year to plan, workshop, and most importantly, get to know each other better.

🌴 Unlimited PTO: We ask that everyone take at least 4 weeks of vacation. And we have a company-wide break between Christmas and New Year's Day.

🏥 Full healthcare coverage: Medical, dental and vision insurance for you and your family.

🍼 Paid parental leave: 16 weeks of fully paid parental leave (adoptive and non-birth parents included) and flexibility with schedules while returning to work.

📈 401(k): Company-sponsored 401(k) for you to contribute to.

🧠 Learning and development budget: We encourage you to take classes, attend conferences, and invest in your craft, and we’ll cover the expenses to make it happen.

Risks

Increased competition from specialized AI models tailored for specific industries.

Potential over-reliance on Google Cloud Marketplace may limit flexibility and control.

Rapid AI model development could render Baseten's offerings obsolete without continuous innovation.

Differentiation

Baseten offers a serverless backend for machine-learning applications with auto-scaling.

Truss, an open-source model packaging framework, allows seamless deployment of custom models.

Baseten's platform provides comprehensive monitoring tools for efficient model performance tracking.

Upsides

Integration with Google Cloud Marketplace boosts visibility and customer acquisition potential.

$40M Series B funding enhances Baseten's platform capabilities and market reach.

Chains framework positions Baseten for complex AI workflows, attracting sophisticated projects.
