Site Reliability Engineer
Stitch Fix
Full Time
Mid-level (3 to 4 years)
Candidates should possess 4-7 years of experience in large-scale production environments, with expertise in both on-premises and cloud deployments. Required technical skills include experience with artifact management services (Artifactory, Nexus, Quay.io), CI/CD tools (GitHub Actions, GitLab, Jenkins), IaC provisioning tools (Ansible, Chef, Puppet, Salt, Terraform), Kubernetes, source code management (Bitbucket, GitLab, GitHub), and observability stacks (Prometheus, Grafana, Thanos, ELK/EFK, Datadog). Familiarity with GitOps and Kubernetes orchestration tools like ArgoCD/FluxCD is necessary. The role also requires a proven ability to work with remote teams, a track record of technical documentation, attention to detail, good decision-making skills, the ability to balance short-term and long-term goals, self-learning capabilities, and a security-minded approach.
The Engineer II - Site Reliability will build and maintain automation for platform infrastructure and applications, provide support for artifact management and CI/CD infrastructure, and administer and optimize source control. Responsibilities include assisting in developing cloud-agnostic solutions, implementing best practices, monitoring availability, assessing system health, supporting reliability improvements, and creating service documentation. The role also involves contributing to continuous improvement and developer experience initiatives, communicating with stakeholders, analyzing metrics for performance tuning, partnering with other engineers, and participating in on-call rotations and incident response.
Cloud-native endpoint security solutions provider
CrowdStrike specializes in cybersecurity, focusing on protecting businesses from cyber threats through cloud-native endpoint security solutions. Their main product, the Falcon platform, includes services like Falcon Pro, which replaces traditional antivirus with next-generation antivirus that integrates threat intelligence, Falcon Insight for endpoint detection and response, and Falcon Device Control to manage connected devices. Unlike many competitors, CrowdStrike's services are subscription-based, allowing clients to choose different levels of protection based on their needs. The company serves a diverse clientele, including many Fortune 100 companies, and is recognized as a leader in the cybersecurity field, known for its effectiveness in threat detection and response.