LLM Data Researcher
Turing - Full Time
- Junior (1 to 2 years)
Candidates should possess extremely strong software engineering skills and strong expertise in designing and conducting data collection tasks, including working with human annotators. They should also have strong statistical skills and experience evaluating scientific experiments related to data collection and model performance; experience analysing datasets with respect to their quality, biases, and suitability for training ML models; hands-on experience training large language models (LLMs) on distributed training infrastructure; and familiarity with evaluating and improving the generalisability and robustness of ML systems. Proficiency in programming languages such as Python and in ML frameworks (e.g., PyTorch, TensorFlow, JAX) is required, along with excellent communication skills for presenting findings and a research background evidenced by one or more papers at top-tier venues (such as NeurIPS, ICML, ICLR, AISTATS, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP).
As a Member of Technical Staff, you will design and oversee data collection tasks, including supporting human annotators and ensuring data quality, and you will develop and apply statistical methods to evaluate the quality and reliability of datasets. You will analyse and assess the generalisability and robustness of ML systems across diverse use cases, and collaborate with teams to improve dataset quality and model performance. You will also train and fine-tune large language models (LLMs) on distributed training infrastructure and conduct experiments to evaluate model performance and identify areas for improvement.
Provides NLP tools and LLMs via API
Cohere provides advanced Natural Language Processing (NLP) tools and Large Language Models (LLMs) through a user-friendly API. Its services cater to a wide range of clients, including businesses that want to improve their content generation, summarization, and search functions. Cohere's business model focuses on offering scalable and affordable generative AI tools, generating revenue by granting API access to pre-trained models that can handle tasks like text classification, sentiment analysis, and semantic search in multiple languages. The platform is customizable, enabling businesses to create smarter and faster solutions. With multilingual support, Cohere effectively addresses language barriers, making it suitable for international use.