Senior Software Engineer, Vision Language Models
Motional - Full Time
- Junior (1 to 2 years)
Candidates must possess exceptional software engineering skills with a proven track record of building robust and scalable systems. A strong command of Python is required, as is familiarity with popular deep learning frameworks such as JAX, PyTorch, and TensorFlow, particularly their multimodal capabilities. Knowledge of distributed training strategies for large-scale multimodal models is essential, as is familiarity with autoregressive models for tasks like image captioning and speech-to-text generation. Bonus points for publications in top-tier venues related to multimodal AI research and for experience writing efficient GPU kernels in CUDA.
As a Senior Member of Technical Staff focused on Multimodal AI, you will design and develop cutting-edge multimodal AI systems integrating text, speech, and vision. You will conduct research and experiments on advanced compute infrastructure, exploring novel ideas in multimodal representation learning and transfer learning. Collaborating closely with world-class teams, you will learn from and contribute to their expertise in the field.
Provides NLP tools and LLMs via API
Cohere provides advanced Natural Language Processing (NLP) tools and Large Language Models (LLMs) through a user-friendly API. Their services cater to a wide range of clients, including businesses that want to improve their content generation, summarization, and search functions. Cohere's business model focuses on offering scalable and affordable generative AI tools, generating revenue by granting API access to pre-trained models that can handle tasks like text classification, sentiment analysis, and semantic search in multiple languages. The platform is customizable, enabling businesses to create smarter and faster solutions. With multilingual support, Cohere effectively addresses language barriers, making it suitable for international use.