Site Reliability Engineer
Position Overview
Customer.io is seeking a Site Reliability Engineer (SRE) to join our team and help scale our infrastructure, reduce operational toil, and increase reliability as we grow. We are looking for individuals with experience in high-scale systems who are passionate about improving platforms for both developers and customers.
About Customer.io
Over 7,500 companies, from startups to global brands, use our platform to send billions of emails, push notifications, in-app messages, and SMS messages every day. Customer.io powers automated communication that users appreciate. We enable teams to send smarter, more relevant messages by leveraging real-time behavioral data. Our technology stack includes Go, React, Ember, and AI, allowing us to ship fast and scale with confidence.
What We Value
- Ownership: You take responsibility for problems end-to-end, act with an owner's mindset, and thrive in ambiguity. You have a track record of leading complex projects.
- Engineers with Product Taste: You approach problems from the user's perspective, considering performance, reliability, and the overall customer experience.
- Healthy Skepticism: You bring rigor and creativity, valuing best practices while prioritizing forward motion.
Responsibilities
- Build and scale infrastructure to support billions of messages per day and real-time events.
- Automate deployments, alerting, and incident response processes.
- Improve the on-call experience through clear alerts, robust documentation, and faster incident resolution.
- Tune MySQL and other datastore performance, and enhance reliability across distributed systems.
- Collaborate with cross-functional teams to debug, deploy, and support production systems.
- Share knowledge and elevate the team's capabilities through public progress updates (videos, writing) and mentorship.
- Leverage AI tools for prototyping, increasing efficiency, and improving decision-making.
What We're Looking For
- 7+ years of experience in SRE or infrastructure roles, with a focus on improving production systems at scale.
- Deep experience with MySQL, including schema design, performance tuning, and operational tooling.
- Fluency in cloud-native technologies (GCP experience is a plus) and Terraform.
- Proficiency in Go and Bash for scripting and systems programming.
- Skill in observability, incident response, and debugging distributed systems.
- A preference for action over perfection and pride in owning technical decisions.
Compensation & Benefits
- Salary: $140,000 - $180,000 USD (or local currency equivalent), depending on experience and market rate.
- Benefits:
  - 100% coverage of medical, dental, vision, mental health, and supplemental insurance premiums for you and your family.
  - 16 weeks paid parental leave.
  - Unlimited Paid Time Off (PTO).
  - Stipends for remote work and wellness.
  - Professional development budget.
Our Process
Our hiring process is designed to be clear, human, and informative for both candidates and the company.
- Application: Submit your application and explain your interest in the role.
- Recruiter Call (30 mins): A conversation to discuss your career goals and how we operate.
- Behavioral Interview (60 mins): Discuss topics like ownership, product thinking, and collaboration with a hiring manager.
- Take-Home Assignment: Complete a short, realistic task representative of the work done at Customer.io.
- Technical Interview + Assignment Review Call (90 mins): Review your take-home project, discuss your decisions, and tackle a system design problem focusing on scaling challenges and tradeoffs.
All final candidates will undergo a background check and employment verification.
Customer.io is committed to fostering an inclusive environment and recognizes the impact of systemic injustice.