Senior Tech Lead Manager, Model Efficiency
Cohere - Full Time
- Senior (5 to 8 years)
Candidates must have:
- A Bachelor's, Master's, or Ph.D. in Computer Science, Engineering, or a related field
- 5+ years of professional software engineering experience, including at least 2 years in a technical leadership role
- Proven experience managing and mentoring engineering teams
- Expertise in programming languages such as Python, C++, or Go
- An in-depth understanding of ML model performance optimization using PyTorch, TensorRT, and CUDA
- Strong knowledge of containerization (Docker) and orchestration (Kubernetes)
- Experience scaling and deploying large models in production AI/ML systems
- The ability to balance hands-on technical work with team leadership and project management
The Engineering Manager will:
- Lead, mentor, and manage a team of engineers focused on developing and optimizing ML model inference and performance
- Oversee technical strategy and architecture decisions, driving improvements across the engineering organization
- Collaborate with cross-functional teams to ensure seamless integration and scalability of ML models in production environments
- Dive into the codebases of frameworks such as TensorRT, PyTorch, and CUDA to identify and resolve complex performance bottlenecks
- Drive the development and deployment of large-scale optimization techniques for ML models, particularly large language models (LLMs)
- Own the full lifecycle of projects from inception through delivery, including planning, execution, and resource management
- Foster a collaborative and inclusive team environment that encourages continuous learning and growth
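To make the performance-optimization side of the role concrete, here is a minimal illustrative sketch in PyTorch. It is not code from the posting: the TinyMLP stand-in model, the input shape, and the specific techniques shown (GPU placement, fp16 weights, torch.compile, inference_mode) are assumptions chosen only to suggest the kind of inference tuning involved.

    import torch
    import torch.nn as nn

    class TinyMLP(nn.Module):
        # Hypothetical stand-in for a real inference workload such as an LLM block.
        def __init__(self, dim: int = 1024):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x)

    def optimized_inference(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
        model.eval()
        if torch.cuda.is_available():
            # Move to GPU and cast to fp16: a common first pass at reducing latency.
            model, x = model.half().cuda(), x.half().cuda()
        # torch.compile (PyTorch 2.x) fuses ops and trims Python overhead.
        compiled = torch.compile(model)
        with torch.inference_mode():
            return compiled(x)

    if __name__ == "__main__":
        print(optimized_inference(TinyMLP(), torch.randn(8, 1024)).shape)

In practice this kind of work extends to TensorRT engine builds and custom CUDA kernels, as the responsibilities above note; those are omitted here for brevity.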
Platform for deploying and managing ML models
Baseten provides a platform for deploying and managing machine learning (ML) models, aimed at simplifying deployment for businesses. Users can select from a library of open-source foundation models and deploy them with just two clicks. The platform features autoscaling, which adjusts resources based on demand, and comprehensive monitoring tools for tracking performance and troubleshooting. A key differentiator is Baseten's open-source model packaging framework, Truss, which lets users package and deploy custom models easily. The company operates on a usage-based pricing model, where clients pay only for the time their models are actively deployed, helping them manage costs effectively.
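As an illustration of how Truss packaging typically works, here is a minimal sketch following Truss's documented convention of a Model class with load() and predict() methods. The sentiment-analysis pipeline, the model/model.py path, and the shape of model_input are assumptions made for the example rather than details from this description.

    # model/model.py -- the entry point a Truss package conventionally provides
    # alongside a config.yaml describing dependencies and resources.
    from transformers import pipeline

    class Model:
        def __init__(self, **kwargs):
            self._pipeline = None

        def load(self):
            # Runs once at server startup; load weights here so requests stay fast.
            self._pipeline = pipeline("sentiment-analysis")

        def predict(self, model_input):
            # Runs per request; assumes a request body like {"text": "..."} (illustrative).
            return self._pipeline(model_input["text"])

Once packaged this way, the model can be pushed to Baseten with the Truss CLI (truss push in recent versions), after which the autoscaling, monitoring, and usage-based billing described above apply to the deployment.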