6+ years of technical experience in software engineering, site reliability, or production operations
Proven track record of managing the full software development lifecycle (SDLC), from requirements gathering to production release
Hands-on understanding of full-stack components: frontend/UI frameworks and client experience; APIs and service layers; the database layer (SQL/NoSQL, data modeling, performance tuning); backend servers and distributed systems; and big data and ETL pipelines (batch and streaming)
Strong knowledge of incident management and its supporting tools (PagerDuty, Jira, Datadog, Splunk, ServiceNow)
Confidence to dive deep with engineers while also translating technical details into clear business context for executives and clients
Experience operating in global, multi-time-zone environments with diverse customer and platform needs
Understand our Operating Principles; make them the guidelines for how you do your job
Own the customer experience; think and act in ways that prioritize customer needs
Responsibilities
Own the Escalations lifecycle within Engineering, from initial escalation through resolution
Lead root cause analysis (RCA) sessions that dig deeper than symptoms and deliver long-lasting fixes
Facilitate retrospectives and follow-ups, turning lessons learned into clear improvement plans
Define and track metrics (incident frequency, resolution times, client impact), and make them visible through dashboards and reports
Partner with teams to strengthen systems through tooling, automation, and platform hardening
Keep a cross-platform perspective (TV, Data, Beeswax, Strata) to spot patterns and systemic issues
Lead quality of service (QoS) reviews and improvement sessions with leadership, highlighting what happened, why, and how we’ll prevent it next time
Support a culture of learning and transparency by running training, knowledge-sharing, and quality workshops
Act as the single voice for Engineering in incident management, making sure communication is consistent and clear at all levels
Collaborate with Engineering (Tier 2/3) to resolve incidents quickly and share learnings across teams
Partner with Operations (Tier 1) to fine-tune escalation paths and help reduce unnecessary hand-offs
Work closely with the COO team to analyze client impact and provide crisp, timely updates during incidents
Skills
QoS
Software Engineering
Root Cause Analysis
RCA
Retrospectives
Incident Management
Metrics
System Hardening
Escalations
Comcast
Comcast Corporation is a global media and technology company.