HPC Support Engineer
Lambda · Full Time
Senior (5 to 8 years)
Key technologies and capabilities for this role
Common questions about this position
The position is hybrid.
Yes, all infrastructure roles require participating in a 24x7 on-call rotation, and you are compensated for your on-call schedule.
Key skills include deploying and managing Kubernetes-based GPU/TPU superclusters; optimizing AI/ML training with technologies such as RDMA, NCCL, and high-speed interconnects; and experience with JAX, PyTorch, distributed training, observability, automation, and infrastructure-as-code (a brief distributed-training sketch follows these questions).
Cohere obsesses over what it builds, works hard and moves fast to serve customers, values a team of the best in the world who are passionate about their craft, and believes diverse perspectives are required to build great products.
Strong candidates have experience building and scaling ML-optimized HPC infrastructure, optimizing AI/ML training across clouds, troubleshooting complex issues, and driving innovation alongside AI researchers; the role suits staff-level engineers, with growth opportunities at multiple levels.
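For illustration only, and not part of the posting: a minimal sketch of the distributed-training path referenced in the skills answer above, using PyTorch's DistributedDataParallel over the NCCL backend. The script name, toy model, and tensor sizes are arbitrary placeholders.

# Illustrative sketch: one DistributedDataParallel training step over NCCL.
# Launch with torchrun, e.g.: torchrun --nproc_per_node=8 ddp_sketch.py
# (ddp_sketch.py is a hypothetical file name.)
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # NCCL handles the GPU-to-GPU collectives (all-reduce over the cluster fabric).
    dist.init_process_group(backend="nccl")

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # One toy training step; DDP all-reduces gradients across ranks in backward().
    inputs = torch.randn(32, 1024, device=f"cuda:{local_rank}")
    loss = model(inputs).pow(2).mean()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()

Launched with torchrun, each process drives one GPU, and NCCL carries the gradient all-reduce over the cluster interconnect.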
Provides NLP tools and LLMs via API
Cohere provides advanced Natural Language Processing (NLP) tools and Large Language Models (LLMs) through a user-friendly API. Their services cater to a wide range of clients, including businesses that want to improve their content generation, summarization, and search functions. Cohere's business model focuses on offering scalable and affordable generative AI tools, generating revenue by granting API access to pre-trained models that can handle tasks like text classification, sentiment analysis, and semantic search in multiple languages. The platform is customizable, enabling businesses to create smarter and faster solutions. With multilingual support, Cohere effectively addresses language barriers, making it suitable for international use.
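As a hedged illustration of the API access described above, the sketch below embeds documents and ranks them against a query for semantic search using Cohere's Python SDK. The client interface, method and parameter names, and the model identifier follow the classic cohere.Client usage and are assumptions to verify against current documentation; the API key is a placeholder.

# Hedged sketch: semantic search with Cohere embeddings via the Python SDK.
# Method names, parameters, and the model identifier may differ across SDK
# versions; verify against the current Cohere documentation before use.
import cohere
import numpy as np

co = cohere.Client("YOUR_API_KEY")  # placeholder key

docs = [
    "Our quarterly report summarizes revenue growth across regions.",
    "The support portal lets customers reset passwords and track tickets.",
]
query = "How do customers reset their password?"

# Embed the documents and the query (model name is an assumption).
doc_emb = np.array(
    co.embed(texts=docs, model="embed-english-v3.0",
             input_type="search_document").embeddings
)
query_emb = np.array(
    co.embed(texts=[query], model="embed-english-v3.0",
             input_type="search_query").embeddings[0]
)

# Rank documents by cosine similarity to the query.
scores = doc_emb @ query_emb / (
    np.linalg.norm(doc_emb, axis=1) * np.linalg.norm(query_emb)
)
print(docs[int(np.argmax(scores))])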