Global Payments

Principal Site Reliability Engineer

Dallas, Texas, United States

Compensation: Not Specified
Experience Level: Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: Payments Technology, Financial Services

Requirements

Candidates should possess a BS in Computer Science, Information Technology, Business/Management Information Systems, or a related field. At least 8 years of professional experience in coding, designing, developing, and analyzing data is required, along with proficiency in at least two modern enterprise programming languages, experience with various APIs and external services, and familiarity with both relational and NoSQL databases. Experience with public and private clouds, Jenkins, Terraform, Ansible, OpenShift, Kubernetes, or AWS EKS is also required. Preferred qualifications include 10+ years of professional experience and experience with IBM Rational Tools.

Responsibilities

The Principal Site Reliability Engineer is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of systems. The engineer applies a software engineering mindset to system administration, splitting time between operations/on-call duties and developing systems that improve site reliability and performance. The role involves collaborating with DevOps, Development, and Business partners to gather requirements, participating in architecture and R&D discussions for new technologies, and implementing chaos engineering practices to identify and remediate system failures. The engineer will push systems to their performance limits, design solutions for improvement, and use DevOps and GitOps practices for automation and self-service.

Key duties include safeguarding reliability through high availability, disaster resilience, and self-monitoring, self-healing systems; running game days to test reliability assumptions; reviewing designs for platform stability; building systems for proactive monitoring; improving monitoring and alerting; troubleshooting system and network issues; mentoring other engineers in reliability skills; evolving the SDLC and tooling to support SRE best practices; and developing runbooks and documentation.

Skills

Site Reliability Engineering
Availability
Latency
Performance
Efficiency
Change Management
Monitoring
Emergency Response
Capacity Planning
Software Engineering
System Administration
DevOps
GitOps
Chaos Engineering
Automation
Self-service
High Availability
Resilience
Self-monitoring
Self-healing
Game Days

Global Payments

Payment technologies and software solutions

About Global Payments

Headquarters: N/A
Year Founded: N/A
Company Stage: N/A
