Senior Platform Telemetry Engineer at NVIDIA

Santa Clara, California, United States

Compensation: Not Specified
Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: Artificial Intelligence, High Performance Computing, Semiconductors

Requirements

  • BS, MS, or PhD in EE/CS or related field (or equivalent experience)
  • 5+ years of hands-on coding experience
  • Strong knowledge of time-series databases such as InfluxDB and Prometheus
  • Strong knowledge of building and consuming REST APIs (Redfish is a big plus)
  • Strong knowledge of telemetry visualization solutions like Grafana & Influx
  • Strong knowledge of firmware architecture and of optimizing firmware for low-latency APIs
  • Strong knowledge of analyzing algorithms for time and space complexity and of projecting system resource requirements
  • Proven track record of delivering scalable solutions
  • Strong and demonstrable skill in C/C++ and Python
  • Experience programming and debugging server platforms
  • Experience with SCM tools (e.g., Git, Perforce) and project management tools such as Jira
  • Excellent written and oral communication skills, a strong work ethic, a great sense of teamwork, a love of producing quality work, and a commitment to finishing tasks every single day
  • Self-starter who loves finding creative solutions to complicated problems and stays hands-on with coding

Responsibilities

  • Drive next-generation fleet management solutions for scaling AI infrastructure built on NVIDIA GPUs and Grace
  • Work with customers, product management, and other architects to narrow down implementation requirements and enable speed-of-light product development
  • Bring clarity to the architecture for fleet health monitoring and fault remediation at scale
  • Work with customers and other architects to understand health-monitoring requirements, making the best use of available in-band and out-of-band capabilities
  • Detail the architecture and build proofs of concept (POCs) to validate it
  • Educate customers about the product architecture and incorporate their feedback to make necessary changes
  • Write architecture specs and design documents, and own end-to-end delivery of the product by working across teams
  • Review the code implemented from the architecture specs
  • Ensure the product is properly tested by working with the development team to strengthen unit testing and put a proper test plan in place
  • Drive product life cycles with QA teams to productize the code, taking responsibility as product owner
  • Articulate requirements in Jira and other bug-management tools, and work out an end-to-end execution plan in collaboration with other managers
  • Contribute to all phases of product development, from product definition, architecture, and design, through implementation, debugging, testing, and early customer support

Skills

GPU
NVIDIA Grace
HPC
Generative AI
Fleet Management
Telemetry
Health Monitoring
Fault Remediation
System Architecture
Proof of Concepts
Code Review
Unit Testing
QA

NVIDIA

Designs GPUs and AI computing solutions

About NVIDIA

NVIDIA designs and manufactures graphics processing units (GPUs) and systems on a chip (SoCs) for various markets, including gaming, professional visualization, data centers, and automotive. Its products include GPUs tailored for gaming and professional use, as well as platforms for artificial intelligence (AI) and high-performance computing (HPC) that cater to developers, data scientists, and IT administrators. NVIDIA generates revenue through sales of hardware, software solutions, and cloud-based services, such as NVIDIA CloudXR and NGC, which enhance experiences in AI, machine learning, and computer vision. What sets NVIDIA apart from competitors is its strong focus on research and development, which allows it to maintain a leadership position in a competitive market. The company's goal is to drive innovation and provide advanced solutions that meet the needs of a diverse clientele, including gamers, researchers, and enterprises.

Headquarters: Santa Clara, California
Year Founded: 1993
Total Funding: $19.5M
Company Stage: IPO
Industries: Automotive & Transportation, Enterprise Software, AI & Machine Learning, Gaming
Employees: 10,001+

Benefits

Company Equity
401(k) Company Match

Risks

Increased competition from AI startups like xAI could challenge NVIDIA's market position.
Serve Robotics' expansion may divert resources from NVIDIA's core GPU and AI businesses.
Integration of VinBrain may pose challenges and distract from NVIDIA's primary operations.

Differentiation

NVIDIA leads in AI and HPC solutions with cutting-edge GPU technology.
The company excels in diverse markets, including gaming, data centers, and autonomous vehicles.
NVIDIA's cloud services, like CloudXR, offer scalable solutions for AI and machine learning.

Upsides

Acquisition of VinBrain enhances NVIDIA's AI capabilities in the healthcare sector.
Investment in Nebius Group boosts NVIDIA's AI infrastructure and cloud platform offerings.
Serve Robotics' expansion, backed by NVIDIA, highlights growth in autonomous delivery services.
