Member of Technical Staff, Agent Infrastructure Engineer
Cohere - Full Time
- Mid-level (3 to 4 years), Senior (5 to 8 years)
Candidates should have 3+ years of highly relevant infrastructure engineering experience, including demonstrated expertise in large-scale distributed systems; strong knowledge of performance optimization techniques and system architectures for high-throughput ML workloads; experience with containerization and orchestration technologies (Docker, Kubernetes) at scale; and a proven track record of building large-scale data pipelines and distributed storage systems. Familiarity with language model training, evaluation, and inference is highly encouraged, as is experience with GPU/TPU architectures and language model inference optimization.
As a Staff Infrastructure Engineer, you will design and implement large-scale infrastructure systems to support AI scientist training, evaluation, and deployment across distributed environments. You will identify and resolve infrastructure bottlenecks impeding progress toward scientific capabilities, develop robust and reliable evaluation frameworks for measuring progress toward scientific AGI, and build scalable and performant VM, sandboxing, and container architectures to safely execute long-horizon AI tasks and scientific workflows. You will also collaborate to translate experimental requirements into production-ready infrastructure, develop large-scale data pipelines to handle advanced language model training requirements, and optimize large-scale training and inference pipelines for stable and efficient reinforcement learning.
Develops reliable and interpretable AI systems
Anthropic focuses on creating reliable and interpretable AI systems. Its main product, Claude, serves as an AI assistant that can manage tasks for clients across various industries. Claude utilizes advanced techniques in natural language processing, reinforcement learning, and code generation to perform its functions effectively. What sets Anthropic apart from its competitors is its emphasis on making AI systems that are not only powerful but also understandable and controllable by users. The company's goal is to enhance operational efficiency and improve decision-making for its clients through the deployment and licensing of its AI technologies.