Key technologies and capabilities for this role
Common questions about this position
Yes, this is a remote position.
Required skills include 2+ years of experience optimizing LLM inference systems at scale, expertise with distributed serving architectures, hands-on experience with quantization techniques for transformer models, proficiency in Python and C++, and experience with CUDA programming and GPU optimization.
The team comprises former researchers from Microsoft, Meta, Nvidia, Apple, Stanford, Johns Hopkins, and HuggingFace, and is focused on building safety-focused LLMs for healthcare, with visionary leadership and strong backing from top investors.
A strong candidate has extensive hands-on experience with state-of-the-art inference optimization techniques and a track record of deploying efficient, scalable LLM systems in production. Experience with open-source frameworks such as vLLM, or with writing custom CUDA kernels, is preferred.