Senior Site Reliability Engineer - GPU Cloud at NVIDIA

Bengaluru, Karnataka, India

Compensation: Not Specified
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Technology, Artificial Intelligence, High-Performance Computing, Cloud Computing

Requirements

  • Minimum of 8 years of experience in automating and handling large-scale distributed system software deployments in on-prem/cloud environments
  • Proficiency in any language - Go/Python/Perl/C++/Java/C
  • Strong command of Terraform, Kubernetes, and cloud infrastructure administration
  • Excellent debugging and troubleshooting skills
  • Ability to design simple and reliable systems that can work without much support
  • Outstanding teammate who can collaborate and influence in a multifaceted environment
  • Excellent interpersonal and written communication skills
  • M.Sc or B.E in Computer Science or a related technical field involving coding (e.g., physics or mathematics)

Responsibilities

  • Providing scalable and robust service-oriented infrastructure automation, monitoring, and analytics solutions for NVIDIA's on-prem and cloud-based GPU infrastructure
  • Owning the whole life cycle of new tools and services - from requirements gathering, to design documentation, validation and deployment
  • Providing customer support on a rotation basis

Skills

SRE
Infrastructure Automation
Monitoring
Analytics
GPU Infrastructure
Cloud Computing
On-Prem Infrastructure
High-Performance Computing
Distributed Computing
Customer Support

NVIDIA

Designs GPUs and AI computing solutions

About NVIDIA

NVIDIA designs and manufactures graphics processing units (GPUs) and system-on-a-chip units (SoCs) for various markets, including gaming, professional visualization, data centers, and automotive. Its products include GPUs tailored for gaming and professional use, as well as platforms for artificial intelligence (AI) and high-performance computing (HPC) that cater to developers, data scientists, and IT administrators. NVIDIA generates revenue through the sale of hardware, software solutions, and cloud-based services, such as NVIDIA CloudXR and NGC, which support workloads in AI, machine learning, and computer vision. What sets NVIDIA apart from competitors is its strong focus on research and development, which allows it to maintain a leadership position in a competitive market. The company's goal is to drive innovation and provide advanced solutions that meet the needs of a diverse clientele, including gamers, researchers, and enterprises.

Headquarters: Santa Clara, California
Year Founded: 1993
Total Funding: $19.5M
Company Stage: IPO
Industries: Automotive & Transportation, Enterprise Software, AI & Machine Learning, Gaming
Employees: 10,001+

Benefits

Company Equity
401(k) Company Match

Risks

Increased competition from AI startups like xAI could challenge NVIDIA's market position.
Serve Robotics' expansion may divert resources from NVIDIA's core GPU and AI businesses.
Integration of VinBrain may pose challenges and distract from NVIDIA's primary operations.

Differentiation

NVIDIA leads in AI and HPC solutions with cutting-edge GPU technology.
The company excels in diverse markets, including gaming, data centers, and autonomous vehicles.
NVIDIA's cloud services, like CloudXR, offer scalable solutions for AI and machine learning.

Upsides

Acquisition of VinBrain enhances NVIDIA's AI capabilities in the healthcare sector.
Investment in Nebius Group boosts NVIDIA's AI infrastructure and cloud platform offerings.
Serve Robotics' expansion, backed by NVIDIA, highlights growth in autonomous delivery services.
