Senior Software Engineer - Observability
TetraScience
Full Time
Senior (5 to 8 years)
The Principal Observability Architect must have experience leading the strategic architecture, evolution, and operationalization of a modern, multi-tenant Observability Platform-as-a-Service (OPaaS) for hybrid on-prem and cloud-native SaaS products. They should be skilled in architecting cloud-agnostic, federated observability platforms supporting real-time monitoring, advanced telemetry pipelines, and AI-powered insights. Experience with Kubernetes, service meshes, serverless components, VictoriaMetrics, Prometheus, Grafana, Alertmanager, Cribl Stream and Edge, VictoriaLogs, OpenTelemetry, and AI/ML for observability is required. Familiarity with integrating alerting and incident response tools like PagerDuty, Slack, and ServiceNow, as well as CI/CD workflows (GitHub Actions, GitLab CI), is also necessary.
The Principal Observability Architect will lead the architecture and roadmap for a multi-region, multi-cloud, multi-tenant observability platform and design near real-time telemetry ingestion pipelines with low-latency guarantees. They will define observability blueprints, including telemetry SLAs, data contracts, tenant data isolation, and cost-aware retention strategies. Responsibilities include ensuring that observability systems are cloud-native and container-aware; designing and implementing real-time pipelines for metrics, logs, traces, and events; and embedding real-time anomaly detection and signal correlation with context-aware alerting. The role also involves integrating with alerting and incident response tools; ensuring observability of synthetic probes, end-user transactions, and critical SLOs; standardizing OpenTelemetry instrumentation; architecting OpenTelemetry deployment patterns; and embedding observability validation gates into CI/CD workflows. Additionally, they will provide self-service tools and training to enable developer teams; leverage AI/ML for anomaly detection, predictive incident detection, and correlation; and build or integrate LLM-powered tools for natural language querying, AI-assisted debugging, and generative runbooks.
Cloud-based platform for digital workflows
ServiceNow offers a cloud-based platform that helps businesses automate and manage their operations, improving efficiency and enhancing customer and employee experiences. The Now Platform includes applications for IT operations, customer service, human resources, and security operations, all accessible over the internet. Targeting large enterprises across various industries, ServiceNow operates on a software-as-a-service (SaaS) model, generating revenue through subscription fees and professional services. The company's goal is to streamline business processes and drive digital transformation for its clients.