[Remote] Staff Research Engineer, Model Efficiency at Cohere

New York, New York, United States

Compensation: Not Specified
Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: AI, Machine Learning

Requirements

  • PhD in Machine Learning or a related field
  • Understand LLM architecture and how to optimize LLM inference under resource constraints
  • Significant experience with one or more techniques that enhance model efficiency
  • Strong software engineering skills
  • An appetite to work in a fast-paced, high-ambiguity start-up environment
  • Publications at top-tier conferences and venues (ICLR, ACL, NeurIPS)
  • A passion for mentoring others
  • Preferred location in EST or PST time zones

Responsibilities

  • Develop, prototype, and deploy techniques that materially improve how fast and efficiently models run in production
  • Explore and ship breakthroughs across the model execution stack, including:
      • Model architecture and MoE routing optimization
      • Decoding and inference-time algorithm improvements
      • Software/hardware co-design for GPU acceleration
      • Performance optimization without compromising model quality

Skills

Key technologies and capabilities for this role

LLM, Model Efficiency, Inference Optimization, MoE Routing, GPU Acceleration, Performance Optimization, Model Architecture, Decoding Algorithms, Software/Hardware Co-design

Questions & Answers

Common questions about this position

Is this position remote?

Yes. This is a remote position in a remote-friendly environment; the Model Efficiency team is concentrated in the EST and PST time zones, which are the preferred locations.

What qualifications are needed for this Staff Research Engineer role?

Candidates should have a PhD in Machine Learning or a related field, understand LLM architecture and optimization under resource constraints, significant experience with model efficiency techniques, strong software engineering skills, publications at top-tier conferences, and passion to mentor others.

What is the company culture like at Cohere?

Cohere has a fast-paced, high-ambiguity startup environment where the team obsesses over what they build, works hard and moves fast for customers, and values diverse perspectives from top experts in their fields.

What salary or compensation does this role offer?

This information is not specified in the job description.

What makes a strong candidate for this position?

A strong candidate has a PhD in ML, deep LLM optimization experience, top conference publications, strong engineering skills, and thrives in fast-paced environments; even if not a perfect match, applicants from diverse backgrounds are encouraged to apply.

Cohere

Provides NLP tools and LLMs via API

About Cohere

Cohere provides advanced Natural Language Processing (NLP) tools and Large Language Models (LLMs) through a user-friendly API. Their services cater to a wide range of clients, including businesses that want to improve their content generation, summarization, and search functions. Cohere's business model focuses on offering scalable and affordable generative AI tools, generating revenue by granting API access to pre-trained models that can handle tasks like text classification, sentiment analysis, and semantic search in multiple languages. The platform is customizable, enabling businesses to create smarter and faster solutions. With multilingual support, Cohere effectively addresses language barriers, making it suitable for international use.

Headquarters: Toronto, Canada
Year Founded: 2019
Total Funding: $914.4M
Company Stage: Series D
Industries: AI & Machine Learning
Employees: 501-1,000

Risks

  • Competitors like Google and Microsoft may overshadow Cohere with seamless enterprise system integration.
  • Reliance on Nvidia chips poses risks if supply chain issues arise or strategic focus shifts.
  • The high cost of its AI data center project could strain financial resources if government funding is delayed.

Differentiation

  • Cohere's North platform outperforms Microsoft Copilot and Google Vertex AI in enterprise functions.
  • The Rerank 3.5 model processes queries in over 100 languages, enhancing multilingual search capabilities.
  • The Command R7B model excels in RAG, math, and coding, outperforming competitors like Google's Gemma.

Upsides

  • Cohere's AI data center project positions it as a key player in Canadian AI.
  • The North platform offers secure AI deployment for regulated industries, enhancing privacy-focused enterprise solutions.
  • Cohere's multilingual support breaks language barriers, expanding its global market reach.
