Senior Site Reliability Engineer
GoFundMe, Full Time
Senior (5 to 8 years)
Candidates must have experience with Google Cloud and AWS, including infrastructure-as-code deployment with CloudFormation, Terraform, or OpsWorks. Proficiency in scripting and automation using Python and Scala is essential, along with a solid understanding of microservices architecture and container technologies such as Kubernetes and Docker. Also required are a clear understanding of software development lifecycles and best practices from an infrastructure perspective; comprehensive systems hardware and network troubleshooting experience; and experience with the installation, configuration, performance tuning, and cloud migration of common Linux distributions. Candidates need experience with TCP/IP networking, NIC bonding, and network services configuration (DNS, NTP, DHCP, SMTP); virtual infrastructure administration with at least one hypervisor (VMware, Hyper-V, KVM); and administration of web servers and supporting technologies, including network load balancers. Experience with the design, development, and deployment of Puppet; system and application error investigation; troubleshooting of access and availability issues, including deep multi-system root cause analysis; and managing networking devices is also required. Finally, candidates need a solid understanding of DevOps tools, processes, and culture, the ability to pick up new technologies quickly, and the ability to provide accurate work scheduling and task estimates.
The Site Reliability Consultant will operate, maintain, and administer solutions that improve the operational efficiency, availability, and visibility of customer infrastructure. Responsibilities include planning maintenance activities and producing design documentation and standard procedures. The consultant will provide Root Cause Analysis reports for outages and incidents, observe client infrastructure and give feedback on improvement opportunities, and automate repetitive tasks. They will also contribute to and improve team documentation, audit client environments to gather and analyze information that reveals further improvement opportunities, and work collaboratively with teammates on continuous improvement. Acting as a technology leader for clients, they will drive discussions on technology roadmaps and participate in an on-call rotation in an escalation capacity.
Cloud migration and data management services
Pythian helps businesses manage and optimize their data and IT infrastructure through services such as cloud migration, managed services, and advanced analytics. It helps companies move their data to cloud platforms such as Google Cloud, AWS, and Microsoft Azure, and provides ongoing support to keep operations running smoothly. Pythian differentiates itself by offering specialized services in machine learning and data science that enable businesses to turn their data into valuable insights. Its goal is to empower organizations to use cloud computing and advanced analytics to improve operations and drive growth.