LLM Inference Engineer at Hippocratic AI

Palo Alto, California, United States

Compensation: Not Specified
Experience Level: Mid-level (3 to 4 years)
Job Type: Full Time
Visa: Unknown
Industries: Healthcare

Requirements

  • Extensive hands-on experience with state-of-the-art inference optimization techniques
  • A track record of deploying efficient, scalable LLM systems in production environments
  • 2+ years of experience optimizing LLM inference systems at scale
  • Proven expertise with distributed serving architectures for large language models
  • Hands-on experience implementing quantization techniques for transformer models
  • Strong understanding of modern inference optimization methods, including speculative decoding with draft models and EAGLE-style speculative decoding
  • Proficiency in Python and C++
  • Familiarity with CUDA programming and GPU optimization (expert-level knowledge not required)
  • Contributions to open-source inference frameworks such as vLLM, SGLang, or TensorRT-LLM (preferred)
  • Experience with custom CUDA kernels (preferred)
  • Track record of deploying inference systems in production environments (preferred)
  • Deep understanding of performance optimization systems (preferred)

Responsibilities

  • Design and implement multi-node serving architectures for distributed LLM inference
  • Optimize multi-LoRA serving systems
  • Apply advanced quantization techniques (FP4/FP6) to reduce model footprint while preserving quality
  • Implement speculative decoding and other latency optimization strategies
  • Develop disaggregated serving solutions with optimized caching strategies for prefill and decoding phases
  • Continuously benchmark and improve system performance across various deployment scenarios and GPU types

Skills

Python
C++
CUDA
LLM Inference
Quantization
Speculative Decoding
Distributed Serving
Multi-LoRA
GPU Optimization
Transformer Models

About Hippocratic AI

Headquarters: N/A
Year Founded: N/A
Company Stage: N/A