Distributed LLM Inference Engineer
Anyscale
Candidates should have familiarity with running ML inference at large scale with high throughput and low latency, and a solid understanding of distributed systems and ML inference challenges. Familiarity with deep learning and deep learning frameworks like PyTorch is also required. Bonus points include ML systems knowledge, experience using Ray, working closely with the community on LLM engines like vLLM or TensorRT-LLM, and contributions to deep learning frameworks or compilers.
$170,100 - $247,000/year
Full Time
Junior (1 to 2 years)