Shakudo

Head of Site Reliability Engineering

Toronto, Ontario, Canada

Compensation: Not Specified
Experience Level: Senior (5 to 8 years), Mid-level (3 to 4 years)
Job Type: Full Time
Visa: Unknown
Industries: Cloud Computing, DevOps, Software Engineering

Requirements

Candidates should have 8+ years of experience in infrastructure, DevOps, or Site Reliability Engineering roles with increasing responsibility, along with proven experience scaling distributed systems in high-availability production environments. Expertise with Kubernetes, Terraform, containerization, and at least one major cloud provider (AWS preferred) is required, as is strong knowledge of system design, networking, and reliability principles.

Responsibilities

The Head of Site Reliability Engineering will build and lead the SRE function at Shakudo, setting goals and technical direction and driving team culture. They will own uptime, reliability, and incident response for the platform; architect scalable infrastructure using Kubernetes, cloud-native tooling, and automation frameworks; and create and enforce best practices for CI/CD, disaster recovery, and service-level objectives (SLOs). They will also partner closely with engineering and product to ensure new features are reliable and production-ready, mentor engineers, and instill a culture of operational excellence.

Skills

Kubernetes
Terraform
Containerization
AWS
Prometheus
Grafana
Datadog
System Design
Networking
Reliability
Incident Response
CI/CD
Disaster Recovery
SLOs

Shakudo

End-to-end platform for AI projects

About Shakudo

Shakudo offers a platform designed to support organizations in developing and managing AI and data-intensive products. Their main product, the Hyperplane platform, facilitates the entire workflow of AI projects, from initial ideas to deployment. It automates the optimization of resources, helping teams select the best configurations without the need for complex setups. This makes it easier for data scientists and AI teams to focus on building and maintaining their models efficiently. Shakudo differentiates itself from competitors by providing a subscription-based service with tiered pricing, allowing organizations to choose the level of access that suits their needs. The goal of Shakudo is to simplify the AI development process, enabling organizations to implement their AI projects more quickly and reliably.

Key Metrics

Headquarters: Toronto, Canada
Year Founded: 2021
Total Funding: $9.8M
Company Stage: Series A
Industries: Enterprise Software, AI & Machine Learning
Employees: 11-50

Risks

Increased competition from established AI platform providers like DataRobot.
Potential customer resistance due to learning curve and integration challenges.
Rapid AI advancements may outpace Shakudo's platform updates, risking obsolescence.

Differentiation

Shakudo offers compatibility across best-of-breed data tools.
Their Hyperplane platform automates resource optimization for AI projects.
Shakudo provides DevOps-friendly GraphQL APIs for seamless AI solution interaction.

Upsides

Increased demand for AI model interpretability tools enhances Shakudo's platform.
Rise of AI model marketplaces boosts Hyperplane's user engagement and revenue.
Growing trend of federated learning aligns with Shakudo's data stack compatibility.
