Software Engineer - Model Performance
Baseten - Full Time
- Mid-level (3 to 4 years), Senior (5 to 8 years)
Candidates should possess 3+ years of experience managing engineering teams with a demonstrable impact on system performance metrics and team growth, along with extensive experience in transformer architecture optimizations. They should also have a strong understanding of ML accelerator architectures (GPUs, TPUs, custom ASICs), memory hierarchies, and hardware-aware optimization techniques including tensor core utilization and parallel computation patterns.
As a Senior Tech Lead Manager, Model Efficiency, you will architect comprehensive technical roadmaps with quantifiable performance metrics aligned with product requirements, and identify and address technical competency gaps through strategic hiring. You will also demonstrate expert-level understanding of ML accelerator architectures and inference frameworks, collaborate with MLOps and infrastructure teams, implement agile methodologies, provide technical mentorship, and integrate optimizations into production systems.
Provides NLP tools and LLMs via API
Cohere provides advanced Natural Language Processing (NLP) tools and Large Language Models (LLMs) through a user-friendly API. Their services cater to a wide range of clients, including businesses that want to improve their content generation, summarization, and search functions. Cohere's business model focuses on offering scalable and affordable generative AI tools, generating revenue by granting API access to pre-trained models that can handle tasks like text classification, sentiment analysis, and semantic search in multiple languages. The platform is customizable, enabling businesses to create smarter and faster solutions. With multilingual support, Cohere effectively addresses language barriers, making it suitable for international use.