Software Engineer, ML Infrastructure
Serve Robotics - Full Time
- Junior (1 to 2 years)
Candidates should have a strong technical background in data technologies, distributed systems, and reliable software development. Deep expertise in ML system optimization, distributed systems, or full-stack development of internal tools is required, and experience collaborating with ML researchers in an applied setting is highly valued. Proficiency in debugging ML systems, experience with reinforcement learning and transformers, and familiarity with Python, Kubernetes, distributed infrastructure, GPUs, and large-scale data processing systems such as Beam or Spark are essential.
The Research Infrastructure Engineer will ensure that the systems powering ChatGPT training and development run smoothly. They will dive into large ML codebases to understand and debug system issues, work with researchers to build tools for data management, model configuration, and evaluation, and create reusable Python libraries for ML projects. Additional responsibilities include profiling reinforcement learning training for large models, diagnosing experiment failures on new research clusters, redesigning data pipelines to handle diverse multimodal data, and building front-end evaluation tooling for company-wide use.
Develops safe and beneficial AI technologies
OpenAI develops and deploys artificial intelligence technologies aimed at benefiting humanity. The company creates advanced AI models capable of performing a wide range of tasks, from automating processes to enhancing creative work. OpenAI's products, such as Sora, let users generate videos from text descriptions, showcasing the versatility of its AI applications. Unlike many competitors, OpenAI operates under a capped-profit model, which limits the returns it can generate and redistributes excess earnings to maximize the social benefits of AI. This commitment to safety and ethical considerations is central to its mission of ensuring that artificial general intelligence (AGI) benefits all of humanity.