Experience Level: Mid-level (3 to 4 years), Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Technology, Media, Telecommunications
Requirements
In-depth experience, knowledge, and skills in own discipline (Site Reliability Engineering)
Ability to work with limited supervision and direction
Experience independently determining/developing approaches for non-routine solutions
Ability to determine own work priorities
Availability for on-call shifts, nights, weekends, and variable schedules
Regular, consistent, and punctual attendance
Responsibilities
Engineer technical solutions for infrastructure and application management, monitoring, and operations, with a focus on standardization and automation
Collaborate with cross-functional teams to identify and address reliability and performance issues
Provide cybersecurity support such as vulnerability cleanup, secure server configuration, testing and validation, technical controls implementation, and incident remediation
Work closely with developers to ensure software releases are well-designed, planned, implemented, released, and monitored
Measure and improve reliability, quality, and efficiency of platforms
Work on-call shifts and support incident prevention, response, and retrospectives
Perform complex analytical duties in the planning, deployment, testing, and evaluation of products
Contribute to the design and implementation of reliable and scalable infrastructure solutions, applying best practices, appropriate tooling, and quality assurance
Monitor system performance and implement improvements to optimize reliability, availability, production quality, operational efficiency, and engineering productivity
Develop and maintain tools for monitoring, deployment, and operations
Provide subject matter expertise, resolve complex break/fix scenarios, and engage broader teams as necessary
Partner with engineering, vendors, and client services to deliver successful technical solutions
Ensure availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning for platforms
Design, analyze, and troubleshoot large-scale distributed systems; debug and optimize code; automate routine tasks
Follow operational practices and exercise independent judgment and discretion
Understand Operating Principles and apply them to job performance
Own the customer experience by prioritizing customers and providing seamless digital options
Be an enthusiastic learner, user, and advocate of technology, products, and services
Win as a team by collaborating and being open to new ideas
Participate actively in the Net Promoter System through huddles, call backs, and feedback elevation
Perform other duties and responsibilities as assigned
Skills
SRE
Distributed Systems
Monitoring
Automation
Linux
Python
AWS
Capacity Planning
Troubleshooting
Change Management
Infrastructure as Code
Performance Optimization
Comcast
Comcast Corporation is a global media and technology company.