Cohere

Member of Technical Staff, Model Serving Infrastructure

San Francisco, California, United States

  • Compensation: Not specified
  • Experience Level: Senior (5 to 8 years)
  • Job Type: Full-Time
  • Visa: Unknown
  • Industries: AI & Machine Learning, Enterprise Software

Position Overview

  • Location Type: Remote
  • Employment Type: Full-Time
  • Salary: Not provided

Cohere is dedicated to scaling intelligence to serve humanity by training and deploying frontier models for developers and enterprises. The company fosters a culture of hard work, rapid iteration, and customer focus, and is seeking Members of Technical Staff for the Model Serving team. This team develops, deploys, and operates the AI platform that delivers Cohere’s large language models via API endpoints. The role involves collaborating with teams across the company to deploy optimized NLP models into production with low latency, high throughput, and high availability, and includes direct customer interaction and the creation of customized deployments.

Requirements

  • Experience: Minimum of 5 years of engineering experience managing production infrastructure at a large scale.
  • System Design: Proven experience in designing large, highly available distributed systems, with a focus on Kubernetes and GPU workloads.
  • Cloud Platforms: Hands-on experience with Kubernetes development and production support across platforms such as GCP, Azure, AWS, OCI, multi-cloud environments, and on-prem/hybrid setups.
  • Linux Expertise: Demonstrated experience in designing, deploying, supporting, and troubleshooting complex Linux-based computing environments.
  • Resource Management: Experience in managing compute, storage, and network resources, including cost optimization.
  • Collaboration & Troubleshooting: Excellent skills in collaboration and troubleshooting, essential for building mission-critical systems and ensuring smooth operations.
  • Adaptability: Demonstrated grit and adaptability to effectively solve complex technical challenges.
  • Accelerator Knowledge: Familiarity with the computational characteristics of accelerators (GPUs, TPUs, and/or custom accelerators) and their impact on inference latency and throughput.
  • Distributed Systems Understanding: A strong understanding or practical experience with distributed systems.
  • Programming Languages: Proficiency in Golang, C++, or other languages suitable for high-performance, scalable server development.

Responsibilities

  • Develop, deploy, and maintain the AI platform that serves Cohere’s large language models.
  • Collaborate with cross-functional teams to deploy optimized NLP models into production environments.
  • Ensure the low latency, high throughput, and high availability of deployed AI models.
  • Engage directly with customers and develop customized deployment solutions to meet their specific requirements.
  • Manage and optimize compute, storage, and network resources and associated costs.
  • Troubleshoot and resolve complex technical issues to maintain system stability and performance.
  • Contribute to the enhancement of Cohere’s models and the value they deliver to customers.

Application Instructions

  • Not provided in the description.

Company Information

  • Mission: To scale intelligence to serve humanity.
  • Focus: Training and deploying frontier models for developers and enterprises.
  • Culture: Emphasizes building, responsibility for increasing model capabilities, working hard and moving fast, and valuing diverse perspectives.

Skills

Kubernetes
GPU workloads
GCP
Azure
AWS
OCI
Linux
Resource Management
Accelerator Knowledge
Distributed Systems

Cohere

Provides NLP tools and LLMs via API

About Cohere

Cohere provides advanced Natural Language Processing (NLP) tools and Large Language Models (LLMs) through a user-friendly API. Their services cater to a wide range of clients, including businesses that want to improve their content generation, summarization, and search functions. Cohere's business model focuses on offering scalable and affordable generative AI tools, generating revenue by granting API access to pre-trained models that can handle tasks like text classification, sentiment analysis, and semantic search in multiple languages. The platform is customizable, enabling businesses to create smarter and faster solutions. With multilingual support, Cohere effectively addresses language barriers, making it suitable for international use.

  • Headquarters: Toronto, Canada
  • Year Founded: 2019
  • Total Funding: $914.4M
  • Company Stage: Series D
  • Industries: AI & Machine Learning
  • Employees: 501-1,000

Risks

  • Competitors like Google and Microsoft may overshadow Cohere with seamless enterprise system integration.
  • Reliance on Nvidia chips poses risks if supply-chain issues arise or strategic focus shifts.
  • The high cost of an AI data center could strain financial resources if government funding is delayed.

Differentiation

  • Cohere's North platform outperforms Microsoft Copilot and Google Vertex AI in enterprise functions.
  • The Rerank 3.5 model processes queries in over 100 languages, enhancing multilingual search capabilities.
  • The Command R7B model excels in RAG, math, and coding, outperforming competitors like Google's Gemma.

Upsides

  • Cohere's AI data center project positions it as a key player in Canadian AI.
  • The North platform offers secure AI deployment for regulated industries, enhancing privacy-focused enterprise solutions.
  • Cohere's multilingual support breaks language barriers, expanding its global market reach.
