Site Reliability Engineer
Stitch Fix
Full Time
Mid-level (3 to 4 years)
Candidates must be able to own projects independently from inception to delivery, including planning, requirements gathering, collaboration with Product, milestone planning, and production releases. Deep knowledge of building and operating highly available systems at scale is essential. Strong skills in a systems-level language such as Go, C++, Java, C#, or Rust are required; fluency in any one of these matters more than experience with a particular language. A strong interest in how things are built, a creative, pragmatic, customer-centered mindset, and a flexible, collaborative working style are also necessary.
The Member of Technical Staff will architect, build, and operate scalable, reliable, and cost-efficient backend systems across the entire technology stack. Responsibilities include designing, implementing, and operating high-volume services in Go on Kubernetes/GCP/AWS; building operational tooling for the code they write; and collaborating with the cloud infrastructure team to manage that code in production. The role also involves working with stakeholders to design and deliver features that meet urgent customer needs while adhering to engineering best practices, and improving software development processes by contributing to build systems, test frameworks, deployment tools, and monitoring.
Cloud-native infrastructure and application monitoring
Chronosphere offers a platform that helps businesses monitor their cloud-native infrastructure and applications, so they can identify and resolve issues quickly, before customers are affected. The platform filters out unnecessary data so users can focus on critical information, which is especially useful for companies still relying on outdated monitoring tools. Unlike its competitors, Chronosphere emphasizes reducing observability costs, which can be a major expense for engineering teams, and helps decrease on-call alerts by up to 90%. The company's goal is to improve business efficiency and productivity while providing a strong return on investment.