Strong experience with UNIX/Linux systems and using the CLI for troubleshooting
Good understanding of networking protocols, including SIP
Strong hands-on experience with Kubernetes (k8s) and containerized environments
Proven track record of working in production environments, with a careful and methodical approach to changes (testing before deployment, rollback planning, risk mitigation)
Understanding of high-availability systems, fault tolerance, and performance optimization
Experience automating tasks with Python, Golang, or Shell scripts
Mindset of an SRE: you treat operations as an engineering discipline and continuously look for ways to make systems more reliable and efficient
Good command of English (B2 or higher): ability to communicate effectively, both in writing and in speech, with distributed international teams
Responsibilities
Support and maintain Linux-based servers and telephony services in production
Investigate and resolve incidents in a high-load, distributed environment
Participate in on-call shifts and ensure the stability of systems under strict SLAs
Analyze service performance, reliability, and architecture bottlenecks; propose improvements
Work with development teams to safely deliver and validate changes before production deployment
Contribute ideas and help evolve team processes, automation, and monitoring practices