Senior Chaos Engineer at Goodnotes

Hong Kong, Hong Kong

Compensation: Not Specified
Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: Technology

Requirements

  • Proven experience with chaos engineering or fault injection, ideally in distributed, production-scale environments
  • Comfortable with iOS platforms and mobile networking, with an understanding of how client-side failures impact backend systems
  • Strong experience with Swift programming
  • Strong understanding of resilience patterns (e.g., circuit breakers, bulkheads, timeouts, retries) and system failure modes
  • Prior involvement in incident postmortems, war games, or reliability reviews
  • Comfortable building tools or scripts to automate chaos experiments and analyze system behavior under stress
  • A scientific mindset: you love forming hypotheses, testing limits, and uncovering how systems really behave at the edge
  • Excited to build a program from scratch, not just join one

Responsibilities

  • Define the chaos engineering strategy at Goodnotes, including tools, safety practices, and long-term roadmap
  • Design and run fault injection experiments across mobile and backend systems, targeting failure points in user flows, APIs, and infrastructure components to surface hidden risks
  • Simulate real-world issues like latency spikes, dependency outages, cascading failures, and resource exhaustion
  • Build and scale tooling for automating experiments, tracking outcomes, and improving observability
  • Establish clear guardrails and blast radius controls to ensure experiments are safe, measured, and reversible
  • Collaborate across engineering teams to identify critical flows, formulate hypotheses, and stress-test assumptions
  • Facilitate resilience drills and chaos game days, driving cross-team engagement and response readiness
  • Document findings, communicate insights, translate chaos learnings into actionable improvements, and influence our engineering teams to enact recommended changes
  • Help shape the future of the chaos engineering function, including mentoring and hiring as the team grows

Skills

Chaos Engineering
Fault Injection
Gremlins
Zombie Services
Network Blackholes
Latency Spikes
Dependency Outages
Mobile Systems
Backend Systems
APIs
Infrastructure
Reliability Engineering
Experiment Design
