[Remote] Member of Technical Staff, Data Engineering at Cohere

Toronto, Ontario, Canada

Compensation: Not Specified
Experience Level: Mid-level (3 to 4 years), Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: AI, Machine Learning, Technology

Requirements

  • Strong software engineering skills, with proficiency in Python and experience building data pipelines
  • Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools
  • Experience working with large-scale web datasets like CommonCrawl
  • A passion for bridging research and engineering to solve complex data-related challenges in AI model training
  • Bonus: papers published at top-tier venues (such as NeurIPS, ICML, ICLR, AISTATS, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP)

Responsibilities

  • Design and build scalable data pipelines to ingest, parse, filter, and optimize diverse web datasets
  • Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance
  • Develop robust data modeling techniques to ensure datasets are structured and formatted for optimal training efficiency
  • Research and implement innovative data curation methods, leveraging Cohere’s infrastructure to drive advancements in natural language processing
  • Collaborate with cross-functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting-edge language models

Skills

Key technologies and capabilities for this role

Data Engineering, Data Pipelines, Data Ingestion, Data Cleaning, Data Filtering, Data Optimization, Data Modeling, Web Data, Code Data, Multilingual Data, NLP, AI Models, Pretraining Data

Questions & Answers

Common questions about this position

Is this role remote-friendly?

Yes. The role is remote, with no restrictions on location within time zones between EST and the EU. The company also has offices in London, Paris, Toronto, San Francisco, and New York.

What is the salary for this position?

This information is not specified in the job description.

What skills are required for this Data Engineering role?

Strong software engineering skills are required, along with experience in designing scalable data pipelines, data modeling, data curation, and conducting data ablations for model performance.

What is the company culture like at Cohere?

Cohere obsesses over what it builds, works hard and moves fast for its customers, values top talent who are the best at their craft, and believes diverse perspectives are needed to build great products.

What makes a strong candidate for this role?

A strong candidate combines solid software engineering skills with experience in data pipelines, quality assessment, modeling, and curation for AI language models, plus the ability to collaborate cross-functionally.

Cohere

Provides NLP tools and LLMs via API

About Cohere

Cohere provides advanced Natural Language Processing (NLP) tools and Large Language Models (LLMs) through a user-friendly API. Their services cater to a wide range of clients, including businesses that want to improve their content generation, summarization, and search functions. Cohere's business model focuses on offering scalable and affordable generative AI tools, generating revenue by granting API access to pre-trained models that can handle tasks like text classification, sentiment analysis, and semantic search in multiple languages. The platform is customizable, enabling businesses to create smarter and faster solutions. With multilingual support, Cohere effectively addresses language barriers, making it suitable for international use.

Headquarters: Toronto, Canada
Year Founded: 2019
Total Funding: $914.4M
Company Stage: Series D
Industries: AI & Machine Learning
Employees: 501-1,000

Risks

Competitors like Google and Microsoft may overshadow Cohere with seamless enterprise system integration.
Reliance on Nvidia chips poses risks if supply chain issues arise or strategic focus shifts.
The high cost of its AI data center could strain financial resources if government funding is delayed.

Differentiation

Cohere's North platform outperforms Microsoft Copilot and Google Vertex AI in enterprise functions.
Rerank 3.5 model processes queries in over 100 languages, enhancing multilingual search capabilities.
Command R7B model excels in RAG, math, and coding, outperforming competitors like Google's Gemma.

Upsides

Cohere's AI data center project positions it as a key player in Canadian AI.
North platform offers secure AI deployment for regulated industries, enhancing privacy-focused enterprise solutions.
Cohere's multilingual support breaks language barriers, expanding its global market reach.
