[Remote] Senior Site Reliability Engineer at NVIDIA

Switzerland

Compensation: Not Specified
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Technology, Artificial Intelligence, Cloud Computing

Requirements

  • BS or MS in Computer Science or equivalent program from an accredited University/College
  • 8+ years of hands-on software engineering or equivalent experience
  • Demonstrate understanding of cloud design in the areas of virtualization and global infrastructure, distributed systems, and security
  • Expertise in Kubernetes (K8s) & KubeVirt and building RESTful web services
  • Understanding of building agentic AI solutions, preferably with NVIDIA open-source AI solutions
  • Demonstrated working experience with SRE principles such as emitting metrics for observability, and monitoring and alerting using logs, traces, and metrics
  • Hands-on experience with Docker, containers, Infrastructure as Code such as Terraform, and CI/CD deployment
  • Knowledge of working with cloud service providers (CSPs), for example AWS (Fargate, EC2, IAM, ECR, EKS, Route 53, etc.), Azure, etc.

Responsibilities

  • Design, build, and implement scalable cloud-based systems for PaaS/IaaS
  • Work closely with other teams on new products or features/improvements of existing products
  • Develop, maintain and improve cloud deployment of our software
  • Participate in the triage & resolution of complex infra-related issues
  • Collaborate with developers, QA, and Product teams to establish, refine, and streamline our software release process and software observability, ensuring service operability, reliability, and availability
  • Maintain services once live by measuring and monitoring availability, latency, and overall system health using metrics, logs, and traces
  • Develop, maintain and improve automation tools that can help improve efficiency of SRE operations
  • Practice balanced incident response and blameless postmortems
  • Be part of an on-call rotation to support production systems
  • Play a crucial role in the success of the Omniverse on DGX Cloud platform by helping to build deployment infrastructure processes, creating world-class SRE measurement and automation tools that improve operational efficiency, and maintaining a high standard of service operability and reliability

Skills

Site Reliability Engineering
Cloud Infrastructure
PaaS
IaaS
Deployment Automation
Scalable Systems
Monitoring
Triage
GPU Cloud
Automation Tools

NVIDIA

Designs GPUs and AI computing solutions

About NVIDIA

NVIDIA designs and manufactures graphics processing units (GPUs) and system-on-a-chip units (SoCs) for various markets, including gaming, professional visualization, data centers, and automotive. Its products include GPUs tailored for gaming and professional use, as well as platforms for artificial intelligence (AI) and high-performance computing (HPC) that cater to developers, data scientists, and IT administrators. NVIDIA generates revenue through the sale of hardware, software solutions, and cloud-based services, such as NVIDIA CloudXR and NGC, which enhance experiences in AI, machine learning, and computer vision. What sets NVIDIA apart from competitors is its strong focus on research and development, which allows it to maintain a leadership position in a competitive market. The company's goal is to drive innovation and provide advanced solutions that meet the needs of a diverse clientele, including gamers, researchers, and enterprises.

Headquarters: Santa Clara, California
Year Founded: 1993
Total Funding: $19.5M
Company Stage: IPO
Industries: Automotive & Transportation, Enterprise Software, AI & Machine Learning, Gaming
Employees: 10,001+

Benefits

Company Equity
401(k) Company Match

Risks

Increased competition from AI startups like xAI could challenge NVIDIA's market position.
Serve Robotics' expansion may divert resources from NVIDIA's core GPU and AI businesses.
Integration of VinBrain may pose challenges and distract from NVIDIA's primary operations.

Differentiation

NVIDIA leads in AI and HPC solutions with cutting-edge GPU technology.
The company excels in diverse markets, including gaming, data centers, and autonomous vehicles.
NVIDIA's cloud services, like CloudXR, offer scalable solutions for AI and machine learning.

Upsides

Acquisition of VinBrain enhances NVIDIA's AI capabilities in the healthcare sector.
Investment in Nebius Group boosts NVIDIA's AI infrastructure and cloud platform offerings.
Serve Robotics' expansion, backed by NVIDIA, highlights growth in autonomous delivery services.
