Senior Software Engineer - Distributed Inference
NVIDIA - Full Time
- Senior (5 to 8 years)
Candidates should have extremely strong software engineering skills and proficiency in Python, along with related ML frameworks such as JAX, PyTorch, and XLA/MLIR. Experience writing GPU kernels in CUDA and Triton is required, as is experience with large-scale distributed training strategies. Familiarity with autoregressive sequence models, such as Transformers, is also necessary.
As a Performance Engineer, you will design and write high-performance, scalable software for training. You will understand architectural modifications and their effects on training throughput and quality. The role involves writing low-level CUDA and Triton kernels to optimize performance, researching and implementing ideas on supercompute and data infrastructure, and learning from and collaborating with top researchers in the field.
Provides NLP tools and LLMs via API
Cohere provides advanced Natural Language Processing (NLP) tools and Large Language Models (LLMs) through a user-friendly API. Their services cater to a wide range of clients, including businesses that want to improve their content generation, summarization, and search functions. Cohere's business model focuses on offering scalable and affordable generative AI tools, generating revenue by granting API access to pre-trained models that can handle tasks like text classification, sentiment analysis, and semantic search in multiple languages. The platform is customizable, enabling businesses to create smarter and faster solutions. With multilingual support, Cohere effectively addresses language barriers, making it suitable for international use.