Site Reliability Engineering (SRE) Architect - CRL - Germany at Infosys

Munich, Bavaria, Germany

Compensation: Not Specified

Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)

Job Type: Full Time

Visa: Unknown

Industries: Technology

Requirements

10+ years of experience in software engineering, DevOps, or systems engineering, with at least 5 years in a senior SRE or systems architecture role
Expert-level knowledge of at least one major cloud provider (AWS, GCP, or Azure), including core services like compute, storage, networking, and managed databases
Deep, hands-on experience designing and managing large-scale Kubernetes clusters and container-based microservices architectures
Proven expertise in architecting infrastructure with Terraform
Proficiency with configuration management tools like Ansible, Chef, or Puppet
Extensive experience designing and implementing monitoring and observability solutions using tools like Prometheus, Grafana, OpenTelemetry, Jaeger, and the ELK Stack (Elasticsearch, Logstash, Kibana) or similar commercial tools (e.g., Datadog, New Relic)
Strong proficiency in a high-level programming language such as Go or Python

Responsibilities

Architectural Design & Strategy: Design and architect robust, scalable, and fault-tolerant infrastructure and application services on public cloud platforms (AWS, GCP, Azure). Define the long-term vision for system reliability and performance
Reliability Frameworks: Establish and govern the standards for Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets across all engineering teams
Observability & Telemetry: Architect a comprehensive observability strategy. Design the systems for logging, metrics, tracing, and alerting to provide deep insights into system health and facilitate rapid incident response
Automation & Infrastructure as Code (IaC): Lead the strategy for automation and IaC. Design reusable patterns and frameworks using tools like Terraform and Ansible to ensure consistent, repeatable, and secure infrastructure provisioning
Resilience & Chaos Engineering: Proactively identify and mitigate reliability risks. Design and champion the implementation of resilience patterns, disaster recovery plans, and chaos engineering experiments to validate system robustness
Technical Leadership & Mentoring: Act as a thought leader and subject matter expert in reliability engineering. Mentor SREs and developers, evangelize best practices, and lead architectural review sessions to ensure reliability is a core component of every feature
Incident Management Evolution: Analyze major incidents to identify architectural weaknesses and drive the necessary design changes to prevent recurrence. Help evolve postmortem culture and incident response capabilities

Skills

Key technologies and capabilities for this role

AWS, GCP, Azure, Terraform, SLOs, SLIs, Error Budgets, Observability, Logging, Metrics, Tracing, Alerting, IaC

Questions & Answers

Common questions about this position

What is the work arrangement for this SRE Architect role?

The position is hybrid.

What experience level is required for this position?

Candidates need 10+ years of experience in software engineering, DevOps, or systems engineering, with at least 5 years in a senior SRE or systems architecture role.

What key technical skills are needed for the SRE Architect role?

The role requires proficiency in designing scalable infrastructure on public cloud platforms (AWS, GCP, Azure), in Infrastructure as Code tools like Terraform and Ansible, and in observability systems for logging, metrics, tracing, and alerting.

What is the company culture like at Infosys?

Infosys offers an entrepreneurial, high-growth environment with 300,000 employees, encouraging collaboration with expert colleagues, diverse thinking, and contributions across functional business pillars.

What makes a strong candidate for this SRE Architect position?

A strong candidate is a visionary leader with 10+ years in software engineering or DevOps, strong technical leadership skills, and deep expertise in cloud architecture, reliability frameworks such as SLOs and SLIs, observability, IaC, and chaos engineering.