Job Description
Position Overview
This role is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of systems. The position bridges development and operations by applying a software engineering mindset to system administration problems. Time is split between operations and on-call duties and developing systems and software that increase site reliability and performance.
Employment Type
Full time
Company Information
Every day, Global Payments makes it possible for millions of people to move money between buyers and sellers using our payments solutions for credit, debit, prepaid and merchant services. Our worldwide team helps over 3 million companies, more than 1,300 financial institutions and over 600 million cardholders grow with confidence and achieve amazing results. We are driven by our passion for success and we are proud to deliver best-in-class payment technology and software solutions. Join our dynamic team and make your mark on the payments technology landscape of tomorrow.
Responsibilities
- Work with various teams including DevOps, Development, and Business partners to gather requirements for application changes needed in build-outs.
- Participate in architecture and R&D discussions for new technology or processes to increase the performance and reliability of our systems.
- Engage in chaos engineering: think laterally about potential system failures, design tests to demonstrate behavior, and formulate and implement remediation plans.
- Push systems to their limits and design solutions to achieve higher performance tiers.
- Utilize DevOps and GitOps practices to improve automation and processes for self-service.
- Safeguard reliability by ensuring services are highly available, resilient against disasters, self-monitoring, and self-healing.
- Run “game days” to test reliability assumptions and identify potential failure points before they impact customers.
- Review designs with an eye toward increasing the holistic stability of the platform and identifying potential risks.
- Build systems to proactively monitor the health, performance, and security of production and non-production virtualized infrastructure.
- Improve monitoring and alerting systems to ensure timely and relevant notifications for engineers.
- Troubleshoot systems and network issues alongside the Technical Operations Team.
- Mentor other engineers in reliability-related skills.
- Evolve the SDLC, practices, and tooling to incorporate Site Reliability considerations and best practices.
- Develop runbooks and improve documentation.
Minimum Qualifications
- BS in Computer Science, Information Technology, Business / Management Information Systems, or related field.
- Typically a minimum of 8 years of professional experience in coding, designing, developing, and analyzing data.
- Broad, advanced knowledge of multiple front-end and back-end languages and technologies, including:
  - Two or more modern programming languages used in the enterprise.
  - Experience working with various APIs and external services.
  - Experience with both relational and NoSQL databases.
  - Experience with public and private clouds, Jenkins, Terraform, Ansible, OpenShift, Kubernetes, or AWS EKS.
Preferred Qualifications
- BS in Computer Science, Information Technology, Business / Management Information Systems, or related field.
- 10+ years of professional experience in coding, designing, developing, and analyzing data.
- Experience with IBM Rational Tools.
Desired Skills and Capabilities
- Skills / Knowledge: Possess broad expertise or unique knowledge, using skills to contribute to the development of company objectives and principles, and to achieve goals in creative and effective ways.
Salary
(Information not provided)
Location Type
(Information not provided)