Site Reliability Engineer at Global Payments

Lindon, Utah, United States

Compensation: Not Specified

Experience Level: Mid-level (3 to 4 years)

Job Type: Full Time

Visa: Unknown

Industries: FinTech, Payments

3+ years of experience in production support, site reliability engineering (SRE), or DevOps, preferably supporting Apigee APIs
Strong understanding of cloud infrastructure (AWS, GCP) and observability tools
Proficiency in Python or shell scripting for automation and troubleshooting
Proficiency in programming languages such as Python and JavaScript
Strong analytical, communication, and incident management skills
Excellent problem-solving and analytical skills
Excellent communication and collaboration skills
Ability to work proactively with a high level of initiative and accuracy
Ability to manage multiple assignments effectively and meet established deadlines
Strong interpersonal skills to interact professionally with staff and stakeholders
Excellent organizational skills and attention to detail
Critical thinking ability applicable to tasks ranging from moderately to highly complex
Flexibility in adapting to changing business needs and priorities
Ability to work creatively and independently with minimal supervision
Ability to utilize experience and judgment in accomplishing goals
Experience in navigating organizational structures and collaborating across teams

Serve as the first line of defense for production incidents, ensuring rapid triage, root cause analysis, and resolution
Monitor the health and performance of deployed APIs and the applications that integrate with them
Track and investigate issues related to latency, failures, or broken integrations, escalating to the API engineering group where appropriate
Collaborate with platform engineers to implement observability, logging, and alerting best practices for API services
Build diagnostic tools, runbooks, and automated workflows to improve incident response time and reduce manual intervention
Maintain knowledge bases and playbooks for repeatable troubleshooting and knowledge transfer
Partner with governance and compliance teams to ensure incidents are documented and remediated in line with internal policy
Contribute to retrospectives and continuous improvement efforts to harden production systems