Site Reliability Engineer
Stitch Fix | Full Time
Mid-level (3 to 4 years)
Candidates must have at least 6 years of experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role involving software and infrastructure. Proficiency in Python or Bash is required, along with hands-on experience with Azure. Familiarity with CI/CD pipelines and infrastructure as code (IaC) tools like Terraform and Ansible is essential, as is a demonstrated ability to triage and prioritize incidents effectively. Candidates also need a history of engaging with cross-functional teams during incidents and post-mortems, and a track record of proactively tailoring infrastructure to product needs. Bonus qualifications include experience with application deployment, data pipelines, Snowflake, and relational databases, as well as knowledge of AWS or GCP.
The Senior Site Reliability Engineer will collaborate with software engineers on iterative processes and task planning, taking ownership of service availability, scalability, and performance. They will proactively identify issues, implement automation to prevent recurrence, and participate in the on-call rotation to respond to incidents and restore service. Responsibilities include automating infrastructure provisioning, configuration, and management using IaC tools like Terragrunt and Ansible, and enhancing monitoring, logging, and alerting systems. The role also involves participating in blameless post-mortems, documenting issues, following up on action items, and fostering collaboration with other engineering teams to promote framework reuse. Staying current with industry trends and best practices in SRE, DevOps, and automation is also a key responsibility.
Cloud cost management and optimization platform
Vantage.sh is a platform designed to help businesses manage and optimize their cloud costs. It provides tools for building detailed reports on cloud expenditure, letting users filter and group costs by various dimensions, set monthly budgets, and receive alerts when spending exceeds those budgets. The platform supports multiple cloud providers and offers in-depth resource-level analytics, so users can track costs across different subscriptions and projects. A standout feature is its Kubernetes cost optimization, which helps users allocate costs by service and identify areas for efficiency. Vantage.sh operates on a self-serve model, making it easy for businesses to start using the service, and charges a fee of 5% of the savings it generates. Its goal is to give businesses a clear understanding of their cloud spending and help them find opportunities for cost reduction.