Site Reliability Engineer
Full Time
Junior (1 to 2 years)
Candidates should have experience with Google Cloud and Infrastructure as Code tools like Terraform. A strong understanding of microservices, containers (Kubernetes, Docker), and networking is required, along with hands-on experience in PKI, service mesh, and Linux systems administration. A Site Reliability Engineering mindset focused on automation, scalability, and reliability is essential.
The Site Reliability Engineer will operate and optimize Kubernetes clusters, Istio service mesh, and Linux-based systems. They will automate workflows using Go, Python, and Shell scripting, and build monitoring and observability solutions with Prometheus, Grafana, and Loki. Responsibilities include troubleshooting complex system issues, partnering with AI/ML teams, and participating in on-call rotations and postmortem reviews.
Cloud migration and data management services
Pythian helps businesses manage and optimize their data and IT infrastructure through services such as cloud migration, managed services, and advanced analytics. It helps companies move their data to cloud platforms such as Google Cloud, AWS, and Microsoft Azure, and provides ongoing support to keep operations running smoothly. Pythian differentiates itself by offering specialized services in machine learning and data science, enabling businesses to turn their data into valuable insights. Its goal is to empower organizations to use cloud computing and advanced analytics to improve operations and drive growth.