Senior Site Reliability Engineer - GPU Cloud at NVIDIA

Bengaluru, Karnataka, India

Compensation: Not Specified
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Technology, Artificial Intelligence, High-Performance Computing, Cloud Computing

Requirements

  • Minimum of 8 years of experience in automating and handling large-scale distributed system software deployments in on-prem/cloud environments
  • Proficiency in any language - Go/Python/Perl/C++/Java/C
  • Strong command of Terraform, Kubernetes, and cloud infrastructure administration
  • Excellent debugging and troubleshooting skills
  • Ability to design simple and reliable systems that can work without much support
  • Outstanding teammate who can collaborate and influence in a multifaceted environment
  • Excellent interpersonal and written communication skills
  • M.Sc or B.E in Computer Science or a related technical field involving coding (e.g., physics or mathematics)

Responsibilities

  • Providing scalable and robust service-oriented infrastructure automation, monitoring, and analytics solutions for NVIDIA's on-prem and cloud-based GPU infrastructure
  • Owning the whole life cycle of new tools and services, from requirements gathering and design documentation to validation and deployment
  • Providing customer support on a rotation basis

Skills

Key technologies and capabilities for this role

SRE, Infrastructure Automation, Monitoring, Analytics, GPU Infrastructure, Cloud Computing, On-Prem Infrastructure, High-Performance Computing, Distributed Computing, Customer Support

Questions & Answers

Common questions about this position

What is the salary for this Senior Site Reliability Engineer position?

This information is not specified in the job description.

Is this Senior Site Reliability Engineer role remote or onsite?

This information is not specified in the job description.

What skills are required for the Senior Site Reliability Engineer role?

Candidates need a minimum of 8 years of experience in automating large-scale distributed systems, proficiency in languages like Go/Python/Perl/C++/Java/C, strong command of Terraform, Kubernetes, and cloud infrastructure administration, plus excellent debugging skills.

What is the company culture like for this SRE team at NVIDIA?

The team is fast-paced, dynamic, and dedicated, working closely with development teams on cloud and on-prem infrastructure for high-performance computing, with an emphasis on collaboration in a multifaceted environment.

What makes a candidate stand out for this Senior SRE position?

Candidates stand out by demonstrating the ability to decompose complex requirements into simple tasks, a proven record of maintaining platform SLAs, experience integrating unit testing and benchmarking into code, and skill in choosing optimal algorithms for scaling and availability.

NVIDIA

Designs GPUs and AI computing solutions

About NVIDIA

NVIDIA designs and manufactures graphics processing units (GPUs) and system-on-a-chip units (SoCs) for various markets, including gaming, professional visualization, data centers, and automotive. Its products include GPUs tailored for gaming and professional use, as well as platforms for artificial intelligence (AI) and high-performance computing (HPC) that cater to developers, data scientists, and IT administrators. NVIDIA generates revenue through the sale of hardware, software solutions, and cloud-based services, such as NVIDIA CloudXR and NGC, which enhance experiences in AI, machine learning, and computer vision. What sets NVIDIA apart from competitors is its strong focus on research and development, allowing it to maintain a leadership position in a competitive market. The company's goal is to drive innovation and provide advanced solutions that meet the needs of a diverse clientele, including gamers, researchers, and enterprises.

Headquarters: Santa Clara, California
Year Founded: 1993
Total Funding: $19.5M
Company Stage: IPO
Industries: Automotive & Transportation, Enterprise Software, AI & Machine Learning, Gaming
Employees: 10,001+

Benefits

Company Equity
401(k) Company Match

Risks

Increased competition from AI startups like xAI could challenge NVIDIA's market position.
Serve Robotics' expansion may divert resources from NVIDIA's core GPU and AI businesses.
Integration of VinBrain may pose challenges and distract from NVIDIA's primary operations.

Differentiation

NVIDIA leads in AI and HPC solutions with cutting-edge GPU technology.
The company excels in diverse markets, including gaming, data centers, and autonomous vehicles.
NVIDIA's cloud services, like CloudXR, offer scalable solutions for AI and machine learning.

Upsides

Acquisition of VinBrain enhances NVIDIA's AI capabilities in the healthcare sector.
Investment in Nebius Group boosts NVIDIA's AI infrastructure and cloud platform offerings.
Serve Robotics' expansion, backed by NVIDIA, highlights growth in autonomous delivery services.
