Technical Program Manager, GPU Fleet Management at OpenAI

San Francisco, California, United States

Compensation: $230,000 – $285,000
Experience Level: Mid-level (3 to 4 years), Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: AI, Technology, Cloud Computing

Requirements

  • Possess a degree in a hard science, or have a demonstrated track record of engineering expertise
  • Have 5+ years of experience in program management for major projects including capital projects or hyperscaler infrastructure deployment
  • Have a demonstrated ability to serve as the go-to person solely responsible for driving and delivering complex projects
  • Are comfortable managing cross-functional and cross-company teams, and have experience driving information and decision hygiene
  • Have an extensive track record of successfully delivering high-profile, technical projects against tight deadlines
  • Are technically adept and have effectively partnered with engineering or fundamental research teams of the highest caliber
  • Have experience interfacing with and leading external vendors, including engineering firms, equipment suppliers, and/or construction firms
  • Have expertise in designing and implementing simple, scalable processes that solve complex problems
  • Have experience managing complicated dependencies such as logistics and/or supply chains
  • Are relentlessly resourceful and thrive in ambiguous, fast-paced environments
  • Are interested in and thoughtful about the impacts of AGI

Responsibilities

  • Guide the roadmap for automation for a fleet that can grow an order of magnitude in size or more
  • Ensure that the capacity demand signal is operationally materialized within Fleet tooling and scheduling systems
  • Build net-new systems that allow Fleet to better manage supply vs. demand compute lifecycles across training and inference workstreams
  • Uplevel our autoscaling and global scheduling infrastructure by improving its reliability and ergonomics and expanding its capabilities
  • Work with Fleet Turnup engineers on executing on tight timelines and iteratively improving the process, tooling, and automation
  • Work with external partners to unlock bleeding-edge compute and make it available as a turnkey resource for scheduling workloads
  • Collaborate closely with a broad set of stakeholders, including product engineering, inference, security, research and finance
  • Manage and coordinate the overall body of work across many parallel programs and projects, ensuring cohesive communication and consistent alignment across all platform teams, with all cross-functional teams, and up to leadership
  • Coordinate with engineers to manage the capacity footprint for OpenAI’s training and inference workloads while upleveling the demand-materialization lifecycle for all compute capacity

Skills

Program Management
GPU Fleet Management
Capacity Planning
Autoscaling
Scheduling Systems
Automation
Infrastructure Reliability
Cross-functional Coordination
Roadmap Planning
Tooling Development

OpenAI

Develops safe and beneficial AI technologies

About OpenAI

OpenAI develops and deploys artificial intelligence technologies aimed at benefiting humanity. The company creates advanced AI models capable of performing various tasks, such as automating processes and enhancing creativity. OpenAI's products, like Sora, allow users to generate videos from text descriptions, showcasing the versatility of its AI applications. Unlike many competitors, OpenAI operates under a capped profit model, which limits the profits it can make and ensures that excess earnings are redistributed to maximize the social benefits of AI. This commitment to safety and ethical considerations is central to its mission of ensuring that artificial general intelligence (AGI) serves all of humanity.

Headquarters: San Francisco, California
Year Founded: 2015
Total Funding: $18,433.2M
Company Stage: Late-stage VC
Industries: AI & Machine Learning
Employees: 1,001-5,000

Benefits

Health insurance
Dental and vision insurance
Flexible spending account for healthcare and dependent care
Mental healthcare service
Fertility treatment coverage
401(k) with generous matching
20-week paid parental leave
Life insurance (complimentary)
AD&D insurance (complimentary)
Short-term/long-term disability insurance (complimentary)
Optional buy-up life insurance
Flexible work hours and unlimited paid time off (we encourage 4+ weeks per year)
Annual learning & development stipend
Regular team happy hours and outings
Daily catered lunch and dinner
Travel to domestic conferences

Risks

Elon Musk's legal battle may pose financial and reputational challenges for OpenAI.
Customizable ChatGPT personas could lead to privacy and ethical concerns.
Competitors like Anthropic raising capital may intensify market competition.

Differentiation

OpenAI's capped profit model prioritizes ethical AI development over unlimited profit.
OpenAI's AI models, like Sora, offer unique video creation from text descriptions.
OpenAI's focus on AGI aims to create AI systems smarter than humans.

Upsides

OpenAI's $6.6 billion funding boosts its AI research and computational capacity.
Customizable ChatGPT personas enhance user engagement and satisfaction.
OpenAI's 'Operator' AI agent could revolutionize workforce automation by 2025.
