Operations Engineering Manager, Fleet Reliability at CoreWeave

Livingston, New Jersey, United States


Compensation: Not Specified

Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)

Job Type: Full Time

Visa: Unknown

Industries: AI, Cloud Computing, Data Centers

Requirements

Seven or more years of experience in software or infrastructure engineering, including at least two years in a leadership capacity

Responsibilities

Build and lead a 24/7 team of process-oriented engineers focused on reliability and observability
Lead the socialization and documentation of clear and consistent processes for provisioning, validating, and troubleshooting nodes in our server fleet
Think critically about and advocate for process and automation improvements, with event-driven automated remediation as the end goal
Provide a 24/7 engineering support function for high-criticality, time-sensitive node delivery and maintenance
Drive and improve our program of onboarding, documentation, enablement, and performance management to help your team members achieve new heights of personal growth and capability
Set the culture and tone for how your team keeps score, both in how they communicate with and support each other and in how they enable the rest of CoreWeave

Skills

Key technologies and capabilities for this role

Automation, Observability, Server Provisioning, Hardware Triage, Fleet Management, Team Leadership, Onboarding, Training, Process Leadership, Reliability Engineering

Questions & Answers

Common questions about this position

What is the salary for this Operations Engineering Manager position?

This information is not specified in the job description.

Is this role remote or hybrid, and where is it located?

The position is hybrid and is listed in Livingston, New Jersey, United States.

What experience is required for this role?

Seven or more years of experience in software or infrastructure engineering is required, including at least two years in a leadership capacity.

What is the company culture like at CoreWeave?

CoreWeave thrives in a dynamic environment where adaptability and resilience are key, offering career-defining opportunities for those who excel amid change and challenge.

What makes a strong candidate for this Operations Engineering Manager role?

Strong candidates thrive in dynamic environments, enjoy solving complex problems, have process and thought leadership experience, and can build and lead a 24/7 team focused on reliability and automation.

CoreWeave

Cloud service for GPU-accelerated workloads

About CoreWeave

CoreWeave provides cloud computing services that focus on GPU-accelerated workloads, which are essential for tasks requiring high computational power. Their services cater to industries such as artificial intelligence, machine learning, visual effects rendering, and data processing. Clients can access powerful computing resources on a pay-as-you-go basis, allowing them to avoid the costs of purchasing expensive hardware. CoreWeave's infrastructure utilizes a bare metal serverless Kubernetes platform, which enhances performance while minimizing operational complexity for users. This setup enables clients to optimize their computing needs with a variety of NVIDIA GPUs, ensuring they can balance performance and cost effectively. The company's goal is to offer flexible and scalable computing solutions that meet the demands of diverse clients, from tech companies to film studios.

Headquarters: New York City, New York

Year Founded: 2017

Total Funding: $1,625.4M

Company Stage: Secondary

Industries: Enterprise Software, AI & Machine Learning

Employees: 501-1,000

Benefits

Health Insurance

Dental Insurance

Vision Insurance

Life Insurance

Disability Insurance

Health Savings Account/Flexible Spending Account

Tuition Reimbursement

Mental Health Support

Family Planning Benefits

Paid Parental Leave

Hybrid Work Options

401(k) Company Match

Unlimited Paid Time Off

Catered lunch each day in our office and data center locations

A casual work environment

Risks

Increased competition from Vultr, backed by AMD, may impact CoreWeave's market share.

The planned IPO in 2025 could expose CoreWeave to market volatility and scrutiny.

Reliance on NVIDIA GPUs poses risks if supply chain issues arise.

Differentiation

CoreWeave specializes in GPU-accelerated workloads, catering to AI and high-performance computing.

Their infrastructure uses a bare metal serverless Kubernetes platform for high performance.

CoreWeave offers a flexible pay-as-you-go pricing model, appealing to cost-conscious clients.

Upsides

CoreWeave's partnership with Dell enhances infrastructure, attracting clients seeking advanced AI solutions.

The $600M data center funding in Virginia supports CoreWeave's expansion in the AI cloud market.

Experienced leadership, like CIO Sandy Venugopal, positions CoreWeave for strategic growth.
