Cohere

Member of Technical Staff, Model Serving

San Francisco, California, United States

Compensation: Not Specified
Experience Level: Junior (1 to 2 years)
Job Type: Full Time
Visa: Unknown
Industries: Artificial Intelligence, AI & Machine Learning, Software Development

Position Overview

  • Location Type: Remote
  • Employment Type: Full-Time
  • Salary: (Salary not provided in the description)

Cohere’s mission is to scale intelligence to serve humanity. They train and deploy frontier models for developers and enterprises building AI systems that power experiences like content generation, semantic search, RAG, and agents. They believe their work is instrumental to the widespread adoption of AI. They obsess over what they build, and each person is responsible for increasing the capabilities of their models and the value those models drive for customers. Cohere is a team of researchers, engineers, designers, and more who are passionate about their craft.

Why this Role?

  • The Model Serving team is responsible for developing, deploying, and operating the AI platform delivering Cohere’s large language models through easy-to-use API endpoints.
  • This role involves working closely with many teams to deploy optimized LLMs to production in low-latency, high-throughput, and high-availability environments.
  • Opportunity to interface with customers and create customized deployments to meet their specific needs.

Requirements

  • Experience with serving ML models in production
  • Experience designing, implementing, and maintaining a production service at scale
  • Strong intuition for system behavior and resource estimation under different workloads
  • Familiarity with inference characteristics of deep learning models, specifically Transformer-based architectures
  • Familiarity with computational characteristics of accelerators (GPUs, TPUs, and/or Inferentia), especially how they improve inference latency and throughput
  • Strong understanding of, or working experience with, distributed systems
  • Experience in performance benchmarking, profiling, and optimization
  • Experience with cloud infrastructure (e.g., AWS, GCP)
  • Experience in Golang (or other languages designed for high-performance, scalable servers)

Responsibilities

  • Develop and deploy machine learning serving infrastructure.
  • Optimize LLMs for performance and scalability.
  • Maintain and operate production AI platforms.
  • Collaborate with various teams to integrate LLMs into customer applications.
  • Contribute to the design and implementation of new features and improvements.

Company Information

  • Mission: To scale intelligence to serve humanity.
  • Focus: Training and deploying frontier models for developers and enterprises.
  • Values: Obsession over building, working hard, moving fast, and prioritizing customer value.
  • Diversity & Inclusion: Cohere values and celebrates diversity and strives to create an inclusive work environment for all. They welcome applicants from all backgrounds and are committed to providing equal opportunities.

Skills

ML model serving
production service design
system behavior analysis
resource estimation
Transformer architectures
accelerator hardware (GPUs, TPUs, Inferentia)
distributed systems
performance benchmarking
profiling
optimization
cloud infrastructure (AWS, GCP)
Golang

Cohere

Provides NLP tools and LLMs via API

About Cohere

Cohere provides advanced Natural Language Processing (NLP) tools and Large Language Models (LLMs) through a user-friendly API. Their services cater to a wide range of clients, including businesses that want to improve their content generation, summarization, and search functions. Cohere's business model focuses on offering scalable and affordable generative AI tools, generating revenue by granting API access to pre-trained models that can handle tasks like text classification, sentiment analysis, and semantic search in multiple languages. The platform is customizable, enabling businesses to create smarter and faster solutions. With multilingual support, Cohere effectively addresses language barriers, making it suitable for international use.

Headquarters: Toronto, Canada
Year Founded: 2019
Total Funding: $914.4M
Company Stage: Series D
Industries: AI & Machine Learning
Employees: 501-1,000

Risks

  • Competitors like Google and Microsoft may overshadow Cohere with seamless enterprise system integration.
  • Reliance on Nvidia chips poses risks if supply chain issues arise or strategic focus shifts.
  • The high cost of its AI data center could strain financial resources if government funding is delayed.

Differentiation

  • Cohere's North platform outperforms Microsoft Copilot and Google Vertex AI in enterprise functions.
  • The Rerank 3.5 model processes queries in over 100 languages, enhancing multilingual search capabilities.
  • The Command R7B model excels in RAG, math, and coding, outperforming competitors like Google's Gemma.

Upsides

  • Cohere's AI data center project positions it as a key player in Canadian AI.
  • The North platform offers secure AI deployment for regulated industries, enhancing privacy-focused enterprise solutions.
  • Cohere's multilingual support breaks language barriers, expanding its global market reach.
