Senior Software Engineer - Distributed Inference
NVIDIA - Full Time
- Senior (5 to 8 years)
Candidates must hold a Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or a related field. Experience with one or more general-purpose programming languages, such as Python or C++, is required. Familiarity with LLM optimization techniques, including quantization and speculative decoding, is essential, as is strong familiarity with ML libraries, particularly PyTorch, TensorRT, or TensorRT-LLM. Demonstrated interest in and experience with large language models are also required, along with a deep understanding of GPU architecture.
The Software Engineer will implement, refine, and productionize advanced techniques for ML model inference and infrastructure. They will dig into the underlying codebases of various libraries to debug ML performance issues. The role involves applying and scaling optimization techniques across a wide range of ML models, particularly large language models. The engineer is expected to collaborate with a diverse team to design and implement innovative solutions, and to own projects from idea to production.
Platform for deploying and managing ML models
Baseten provides a platform for deploying and managing machine learning (ML) models, aimed at simplifying the process for businesses. Users can select from a library of open-source foundation models and deploy them with just two clicks, making it easier to implement ML solutions. The platform features autoscaling, which adjusts resources based on demand, and comprehensive monitoring tools for tracking performance and troubleshooting. A key differentiator is Baseten's open-source model packaging framework, Truss, which allows users to package and deploy custom models easily. The company operates on a usage-based pricing model, where clients pay only for the time their models are actively deployed, helping them manage costs effectively.