Senior System Software Engineer, Enterprise MODS at NVIDIA

Santa Clara, California, United States

Compensation: Not Specified
Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: AI, HPC, Cloud Computing, Data Center

Requirements

  • Proven experience architecting diagnostics for complex server systems, especially at the SW/HW interface
  • Deep systems knowledge: x86/ARM architectures, Linux/Windows OS internals, firmware (UEFI/BIOS), BMC, and platform security
  • Ability to weigh tradeoffs in system development and drive optimal solutions with customers and multi-disciplinary teams
  • Expertise in programming languages like C, C++, and Python for tool development and automation
  • Familiarity with high-speed interconnects such as PCIe, InfiniBand, NVLink, and Ethernet
  • Strong communication skills to engage with technical and executive teams
  • BS/MS or equivalent experience in Computer Science, Electrical Engineering, or related field
  • 12+ years of engineering experience in diagnostics, embedded systems, or cloud platforms

Responsibilities

  • Develop diagnostic systems for NVIDIA data center platforms, including hardware and software tools that generate worst-case stress workloads for CPUs, GPUs, memory, storage, and interconnects
  • Lead platform bring-up and integration, ensuring diagnostics are embedded early and effectively across the server lifecycle
  • Drive hardware validation strategy in collaboration with architecture and hardware teams, crafting robust validation plans for new server generations
  • Analyze root causes of complex failures, acting as a Level 2 engineering contact for critical issues and offering scalable solutions across the stack
  • Develop diagnostics software to ensure quality and performance at scale across ODM and partner production lines
  • Mentor and grow engineering teams, providing technical leadership and encouraging a culture of innovation and excellence
  • Influence the long-term strategy by developing diagnostic architecture and roadmaps for the upcoming products of NVIDIA and its partners

Skills

Key technologies and capabilities for this role

Diagnostic Systems, Hardware Validation, Platform Bring-up, Stress Workloads, Root Cause Analysis, GPU Diagnostics, CPU Testing, Memory Diagnostics, Storage Testing, Interconnect Validation, ODM Integration, Server Platforms

Questions & Answers

Common questions about this position

What programming languages are required for this role?

Expertise in C, C++, and Python is required for tool development and automation.

What experience level is needed for this position?

Candidates need 12+ years of engineering experience in diagnostics, embedded systems, or cloud platforms, along with a BS/MS or equivalent in Computer Science, Electrical Engineering, or related field.

What is the salary or compensation for this role?

This information is not specified in the job description.

Is this a remote position, or is there a location requirement?

This information is not specified in the job description.

What does NVIDIA's company culture look like for this role?

NVIDIA offers a diverse, supportive environment where everyone is inspired to do their best work, with emphasis on mentoring teams and encouraging a culture of innovation and excellence.

What skills or experience make a candidate stand out?

Experience driving diagnostics across rack-level or cluster-level deployments, along with a background in cloud-scale infrastructure and partner ecosystems, will help candidates stand out.

NVIDIA

Designs GPUs and AI computing solutions

About NVIDIA

NVIDIA designs and manufactures graphics processing units (GPUs) and system-on-a-chip units (SoCs) for various markets, including gaming, professional visualization, data centers, and automotive. Its products include GPUs tailored for gaming and professional use, as well as platforms for artificial intelligence (AI) and high-performance computing (HPC) that cater to developers, data scientists, and IT administrators. NVIDIA generates revenue through the sale of hardware, software solutions, and cloud-based services, such as NVIDIA CloudXR and NGC, which enhance experiences in AI, machine learning, and computer vision. What sets NVIDIA apart from competitors is its strong focus on research and development, allowing it to maintain a leadership position in a competitive market. The company's goal is to drive innovation and provide advanced solutions that meet the needs of a diverse clientele, including gamers, researchers, and enterprises.

Headquarters: Santa Clara, California
Year Founded: 1993
Total Funding: $19.5M
Company Stage: IPO
Industries: Automotive & Transportation, Enterprise Software, AI & Machine Learning, Gaming
Employees: 10,001+

Benefits

Company Equity
401(k) Company Match

Risks

Increased competition from AI startups like xAI could challenge NVIDIA's market position.
Serve Robotics' expansion may divert resources from NVIDIA's core GPU and AI businesses.
Integration of VinBrain may pose challenges and distract from NVIDIA's primary operations.

Differentiation

NVIDIA leads in AI and HPC solutions with cutting-edge GPU technology.
The company excels in diverse markets, including gaming, data centers, and autonomous vehicles.
NVIDIA's cloud services, like CloudXR, offer scalable solutions for AI and machine learning.

Upsides

Acquisition of VinBrain enhances NVIDIA's AI capabilities in the healthcare sector.
Investment in Nebius Group boosts NVIDIA's AI infrastructure and cloud platform offerings.
Serve Robotics' expansion, backed by NVIDIA, highlights growth in autonomous delivery services.
