Staff Site Reliability Engineer at Altana AI

Brooklyn, New York, United States

Compensation: Not Specified
Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: Supply Chain, AI

Requirements

  • Deep understanding of observability practices for comprehensive insights into system behavior
  • Strong focus on cloud-native environments (e.g., Kubernetes, microservices) and data pipelines
  • Application of Google-style SRE principles, including automation, proactive monitoring, and toil reduction
  • Hands-on experience influencing system design for operability and developing self-healing infrastructure
  • Ability to participate in an on-call rotation for incident response

Responsibilities

  • Champion and implement SRE principles, including establishing and monitoring Service Level Objectives (SLOs) and error budgets for critical services; drive initiatives to improve system reliability, availability, performance, and efficiency
  • Design, implement, and maintain advanced monitoring, logging, and tracing solutions for cloud-native applications and infrastructure; develop dashboards, alerts, and runbooks for system health insights
  • Identify and automate repetitive operational tasks and manual processes; develop tools and scripts to enhance operations, deployment pipelines, and incident response
  • Participate in the incident response lifecycle (detection, triage, mitigation, resolution); lead blameless postmortems to identify root causes and implement preventative measures
  • Collaborate with development teams to influence the design of new services for operability, reliability, and cost-efficiency; identify and address performance bottlenecks and architectural weaknesses
  • Participate in a periodic on-call rotation, responding to critical alerts for rapid incident resolution
  • Implement and maintain reliability and observability for critical data pipelines and data infrastructure

Skills

Key technologies and capabilities for this role

SRE, Cloud Native, Data Pipelines, Automation, Proactive Monitoring, Observability, Self-Healing Infrastructure, System Design

Questions & Answers

Common questions about this position

What are the main responsibilities of the Staff Site Reliability Engineer?

The role involves championing SRE principles like establishing SLOs and error budgets, designing observability solutions for cloud-native apps and Kubernetes, and automating repetitive operational tasks to reduce toil.

What skills and expertise are required for this SRE position?

Key skills include applying Google-style SRE principles, deep knowledge of observability practices, experience with cloud-native environments like Kubernetes and microservices, and proficiency in automation, monitoring, logging, and tracing.

Is this a remote position or does it require office work?

This information is not specified in the job description.

What is the compensation or salary for this role?

This information is not specified in the job description.

What is the company culture like at Altana AI?

Altana fosters a vibrant, collaborative team environment focused on solving complex problems with global impact, guided by values like value creation, diversity, embracing reality, getting things done, and amazing clients.

Altana AI

AI-powered platform for supply chain management

About Altana AI

Altana.ai offers a platform called Altana Atlas that provides an intelligent view of the global supply chain using artificial intelligence. This platform automates tasks, assists in searches, and enables real-time collaboration among stakeholders. It helps businesses like manufacturers and retailers improve their resilience, sustainability, and compliance while preventing disruptions and addressing forced labor issues. Altana.ai stands out by offering unique insights from over 120 million supply chain links, with a significant portion coming from proprietary data.

Headquarters: New York City, New York
Year Founded: 2018
Total Funding: $314.7M
Company Stage: Series C
Industries: Data & Analytics, AI & Machine Learning
Employees: 201-500

Benefits

Comprehensive healthcare package
Paid parental leave of 3 months for the primary caregiver and 1 month for the secondary caregiver

Risks

Increased competition from AI-driven platforms may erode Altana's market share.
Rapid AI advancements require continuous R&D investment, straining financial resources.
Stringent data privacy regulations could limit access to proprietary data.

Differentiation

Altana AI offers a unique, intelligent map of the global supply chain.
The platform integrates AI for real-time collaboration and decision-making.
Altana's proprietary data enhances supply chain visibility and risk management.

Upsides

Altana raised $200M, reaching a $1B valuation, indicating strong investor confidence.
Partnerships with Maersk and UK Government expand Altana's market reach and credibility.
Growing demand for supply chain visibility boosts Altana's business potential.
