Senior Software Engineer - Observability
TetraScience
Full Time
Senior (5 to 8 years)
Candidates should have 3-5 years of experience in an IT or MSP environment with a focus on monitoring tools, a strong background in monitoring platforms such as LogicMonitor, and experience with Event Management, Event Correlation, and AIOps tools such as BigPanda. Proficiency with automation and scripting tools including Python, Ansible, PowerShell, and Go is required, along with knowledge of ITIL processes and integration with ITSM platforms like ServiceNow. Familiarity with customer reporting, SLA management, and service-level dashboards is essential, as are strong problem-solving and troubleshooting abilities, excellent communication skills, and the ability to work independently. Experience with Azure concepts, Log Analytics, Azure Monitor, Infrastructure as Code tools like Bicep, Terraform, and Ansible, and ticketing systems such as ServiceNow is necessary. A Bachelor's degree or higher in Information Technology is desired, and certifications such as Azure Administrator Associate (AZ-104), ITIL v4 Foundation, or any monitoring tool certification are highly desired.
The Senior Observability Engineer will design, administer, and support observability solutions that provide visibility into customer environments across cloud, hybrid, and on-premises platforms. Responsibilities include standardizing and maintaining observability configurations such as dashboards and alert thresholds, defining and supporting SLAs, SLOs, and KPIs in collaboration with various support teams, and defining alerting, dashboards, and monitoring baselines during customer onboarding. The role involves continuously improving noise reduction, event correlation, and escalation processes, participating in incident investigations using observability data for root cause analysis, and designing and configuring automated incident resolution activities. Key duties also include ensuring observability solutions align with compliance and security requirements, configuring and optimizing integration with ITSM platforms like ServiceNow, and assisting with client reporting. The engineer will also collaborate with senior engineers on issue escalation, develop and maintain technical knowledge base articles, and interact with clients, and may need to work after hours or on weekends.
Global consulting firm for diverse sectors
RPS North America provides consulting and advisory services across various sectors, including property, energy, transport, defense, government services, and water resources. The company helps clients, which range from large corporations to non-profits, navigate complex challenges and achieve regulatory compliance through a wide array of services. These services include management consulting, environmental consulting, project management, and digital solutions, among others. RPS North America stands out from its competitors by leveraging its deep expertise and global presence to deliver tailored solutions that meet specific client needs. The company's goal is to drive sustainable growth and address critical issues such as infrastructure development and risk management, ensuring a consistent demand for its high-value services.