Head of Site Reliability Engineering at Shakudo

Toronto, Ontario, Canada

Compensation: Not Specified

Experience Level: Senior (5 to 8 years), Mid-level (3 to 4 years)

Job Type: Full Time

Visa: Unknown

Industries: Cloud Computing, DevOps, Software Engineering

Requirements

Candidates should have 8+ years of experience in infrastructure, DevOps, or Site Reliability Engineering roles with increasing responsibility, along with proven experience scaling distributed systems in high-availability production environments. Expertise with Kubernetes, Terraform, containerization, and at least one major cloud provider (AWS preferred) is required, as is strong knowledge of system design, networking, and reliability principles.

Responsibilities

The Head of Site Reliability Engineering will build and lead the SRE function at Shakudo, setting goals and technical direction and shaping team culture. They will own uptime, reliability, and incident response for the platform; architect scalable infrastructure using Kubernetes, cloud-native tooling, and automation frameworks; and create and enforce best practices for CI/CD, disaster recovery, and service-level objectives (SLOs). They will also partner closely with engineering and product to ensure new features are reliable and production-ready, mentor engineers, and instill a culture of operational excellence.

Skills

Kubernetes

Terraform

Containerization

AWS

Prometheus

Grafana

Datadog

System Design

Networking

Reliability

Incident Response

CI/CD

Disaster Recovery

SLOs

Shakudo

End-to-end platform for AI projects

About Shakudo

Shakudo offers a platform designed to support organizations in developing and managing AI and data-intensive products. Their main product, the Hyperplane platform, facilitates the entire workflow of AI projects, from initial ideas to deployment. It automates the optimization of resources, helping teams select the best configurations without the need for complex setups. This makes it easier for data scientists and AI teams to focus on building and maintaining their models efficiently. Shakudo differentiates itself from competitors by providing a subscription-based service with tiered pricing, allowing organizations to choose the level of access that suits their needs. The goal of Shakudo is to simplify the AI development process, enabling organizations to implement their AI projects more quickly and reliably.

Headquarters: Toronto, Canada

Year Founded: 2021

Total Funding: $9.8M

Company Stage: Series A

Industries: Enterprise Software, AI & Machine Learning

Employees: 11-50

Risks

Increased competition from established AI platform providers like DataRobot.

Potential customer resistance due to learning curve and integration challenges.

Rapid AI advancements may outpace Shakudo's platform updates, risking obsolescence.

Differentiation

Shakudo offers compatibility across best-of-breed data tools.

Their Hyperplane platform automates resource optimization for AI projects.

Shakudo provides DevOps-friendly GraphQL APIs for seamless AI solution interaction.

Upsides

Increased demand for AI model interpretability tools enhances Shakudo's platform.

Rise of AI model marketplaces boosts Hyperplane's user engagement and revenue.

Growing trend of federated learning aligns with Shakudo's data stack compatibility.
