Senior Site Reliability Engineer — GPU Infrastructure at Genmo

San Francisco, California, United States

Compensation: Not Specified
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Artificial Intelligence, Machine Learning

Requirements

  • BS/MS/PhD in CS, EE, or related field
  • 3+ years of SRE/DevOps experience in production
  • 2+ years managing large Kubernetes fleets
  • Expert-level Kubernetes experience
  • Proficient in Python, Bash, and IaC tools (Terraform, Helm, Ansible)
  • Track record of shipping and operating large-scale infrastructure with high reliability and clear communication

Responsibilities

  • Own the design and day-to-day operation of GPU clusters that train and serve frontier generative models
  • Lead production Kubernetes operations: GPU scheduling, cluster upgrades, multi-cluster federation
  • Define and implement Infrastructure-as-Code (Terraform, Helm, Ansible) and GitOps workflows with Argo CD or Flux
  • Build CI/CD pipelines, automated testing, and rollout strategies for infra changes
  • Develop an observability stack (Prometheus, Grafana, OpenTelemetry, eBPF) plus GPU telemetry with NVIDIA DCGM
  • Optimize high-performance networking (InfiniBand/RDMA) and debug performance bottlenecks
  • Run and continuously improve the 24×7 on-call rotation; lead post-incident reviews
  • Partner with researchers and engineers, communicate crisply, and ship with a high-ownership mindset

Skills

Kubernetes
Terraform
Helm
Ansible
Python
Bash
Prometheus
Grafana
OpenTelemetry
eBPF
NVIDIA DCGM
InfiniBand
RDMA
Argo CD
Flux
Slurm
Kueue
AWS
GCP
Azure

Genmo

AI tools for multimedia content creation

About Genmo

Genmo.ai specializes in providing AI tools for generating and editing multimedia content, including images, videos, and presentations. Users can upload images and animate specific parts, like transforming a static sky into a timelapse, or create entire movies by refining ideas, generating scenes, and selecting transitions. The platform caters to both individual content creators and businesses, operating on a subscription model with various service tiers. Genmo.ai differentiates itself by continuously enhancing its technology and focusing on user intent, ensuring that clients have powerful tools to realize their creative projects.

Headquarters: San Francisco, California
Year Founded: N/A
Total Funding: $29.2M
Company Stage: Early VC
Industries: Consumer Software, AI & Machine Learning
Employees: 1-10

Risks

Server crashes during Mochi-1 launch could harm customer trust and satisfaction.
Open-source nature of Mochi-1 may lead to increased competition from developers.
Major tech players entering generative AI market could overshadow Genmo's offerings.

Differentiation

Genmo.ai offers unique AI tools for animating images and generating entire movies.
The platform supports both B2B and B2C models, catering to diverse client needs.
Genmo.ai's subscription model provides flexible access to advanced multimedia editing features.

Upsides

Launch of Mochi-1 model positions Genmo as a competitor to industry leaders.
Rising demand for AI-driven video editing boosts Genmo's market potential.
Subscription-based revenue model ensures steady income and opportunities for upselling.
