Software Engineer Lead - Cloud Engineering at Kumo

Mountain View, California, United States

Compensation: Not Specified
Experience Level: Expert & Leadership (9+ years), Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: AI, Big Data, Cloud Computing

Requirements

  • 8-10+ years of experience managing large-scale Kubernetes clusters (EKS, GKE, AKS, or open source) in production. Deep expertise in Kubernetes internals, including controllers, operators, scheduling, networking (CNI), and security policies
  • 8-10+ years of experience building cloud-native Kubernetes-based infrastructure across AWS, Azure, and GCP
  • 8-10+ years of experience building Kubernetes service meshes (Istio/Envoy, Traefik), networking policies (Calico/Tigera), and distributed ingress/egress control
  • Proven experience in optimizing, scaling, and maintaining Kubernetes clusters across multi-cloud environments, ensuring high availability and performance
  • 8-10+ years of experience writing production-grade controllers and operators in Python, Go, or Rust to extend Kubernetes functionality
  • Hands-on experience with Terraform, CloudFormation, Ansible, Bash, and Make scripting to automate Kubernetes cluster provisioning and management
  • Expertise in building and operating large-scale distributed systems for cloud-native B2B SaaS applications running on Kubernetes
  • Deep expertise in building container orchestration, workload scheduling, and runtime optimizations

Responsibilities

  • Design, build, and scale Kubernetes-based infrastructure to support Kumo’s multi-cloud AI platform, ensuring high availability, resilience, and performance
  • Architect and optimize large-scale Kubernetes clusters, improving scheduling, networking (CNI), and workload orchestration for production environments
  • Develop and extend Kubernetes controllers and operators to automate cluster management, lifecycle operations, and scaling strategies
  • Enhance observability, diagnostics, and monitoring by building tools for real-time cluster health tracking, alerting, and performance tuning
  • Lead efforts to automate fleet management, optimizing node pools, autoscaling, and multi-cluster deployments across AWS, GCP, and Azure
  • Define and implement Kubernetes security policies, RBAC models, and best practices to ensure compliance and platform integrity
  • Collaborate with ML engineers and platform teams to optimize Kubernetes for machine learning workloads, ensuring seamless resource allocation for AI/ML models
  • Drive commit-to-production automation, cloud connectivity, and deployment orchestration, ensuring seamless application rollouts, zero-downtime upgrades, and global infrastructure reliability

Skills

Kubernetes
Multi-Cloud
CNI
Kubernetes Operators
Controllers
CI/CD
MLOps
Observability
Networking
Performance Tuning
Cluster Management
Workload Orchestration

Kumo

Generates and deploys predictive models

About Kumo

Kumo.ai specializes in creating and implementing accurate predictive models for organizations that need reliable forecasts for critical operations. Their platform uses Graph Neural Networks to analyze raw relational data, which removes the need for manual data preparation and enhances prediction accuracy and efficiency. Unlike many competitors, Kumo.ai's platform streamlines the entire Machine Learning lifecycle, from data preparation to model deployment, while also optimizing costs by eliminating unnecessary infrastructure. The company aims to provide a quick return on investment for its clients, which range from small businesses to large enterprises, by offering flexible deployment options through Software as a Service (SaaS) and Private Cloud models. Kumo.ai is built by experienced professionals from top tech companies and has already gained the trust of leading organizations globally.

Headquarters: Mountain View, California
Year Founded: 2021
Total Funding: $35.5M
Company Stage: Series B
Industries: Fintech, AI & Machine Learning
Employees: 51-200

Benefits

Stock Options
Medical Insurance
Dental Insurance

Risks

Increased competition from Databricks' Marketplace may divert potential customers.
The rise of multimodal AI could overshadow Kumo's current offerings.
Rapid AI advancements by tech giants may set new industry standards Kumo must meet.

Differentiation

Kumo.ai uses Graph Neural Networks for predictive modeling, eliminating manual feature engineering.
The platform offers a SQL-like Predictive Querying Language for rapid AI model creation.
Kumo.ai integrates with Snowflake's Native App Framework, enhancing model performance and scalability.

Upsides

Kumo's $18M Series B funding will expand its platform and market reach.
Integration with Snowpark Container Services enhances deep learning capabilities within Snowflake Data Cloud.
Kumo's platform supports both SaaS and Private Cloud models, offering client flexibility.
