Senior Platform Telemetry Engineer at NVIDIA

Santa Clara, California, United States

Compensation: Not specified
Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: Artificial Intelligence, High Performance Computing, Semiconductors

Requirements

  • BS, MS, or PhD in EE/CS or related field (or equivalent experience)
  • 5+ years hands-on coding experience
  • Strong knowledge of time series databases like InfluxDB & Prometheus
  • Strong knowledge of building and consuming REST APIs (Redfish is a big plus)
  • Strong knowledge of telemetry visualization solutions like Grafana & Influx
  • Strong knowledge of firmware architecture and of optimizing firmware for low-latency APIs
  • Strong knowledge of analyzing algorithms for time & space complexity and projecting system resource requirements
  • Proven record of delivering scalable solutions
  • Strong and demonstrable skill in C/C++ and Python
  • Experience programming and debugging server platforms
  • Experience in SCM (e.g., Git, Perforce) and project management tools like Jira
  • Excellent written and oral communication skills, an excellent work ethic, a great sense of teamwork, a love of producing quality work, and a commitment to finishing tasks every single day
  • Self-starter who loves finding creative solutions to complicated problems and is hands-on with coding

Responsibilities

  • Drive next-generation fleet management solutions for scaling AI infrastructure built on NVIDIA GPUs and Grace solutions
  • Work with customers, product management, and other architects to narrow down implementation requirements and ensure speed-of-light product development
  • Bring clarity to the architecture for fleet health monitoring and fault-remediation solutions at scale
  • Work with customers and other architects to understand health monitoring requirements, making best use of available in-band and out-of-band capabilities
  • Produce detailed architecture and build POCs to validate it
  • Educate customers about product architecture and take feedback to make necessary changes
  • Write architecture specs, design documents, and own end-to-end delivery of product by working across teams
  • Review the code produced from the architecture specs
  • Ensure the product is properly tested by working with the development team to enhance unit testing and put a proper test plan in place
  • Drive product life cycles with QA teams to productize the code and be responsible as a product owner
  • Articulate requirements in Jira and bug management tools, and work out an end-to-end execution plan in collaboration with other managers
  • Contribute to all phases of product development, from product definition, architecture, and design, through implementation, debugging, testing, and early customer support

Skills

Key technologies and capabilities for this role

GPU, NVIDIA Grace, HPC, Generative AI, Fleet Management, Telemetry, Health Monitoring, Fault Remediation, System Architecture, Proof of Concepts, Code Review, Unit Testing, QA

Questions & Answers

Common questions about this position

What education and experience are required for this role?

A BS, MS, or PhD in EE/CS or related field (or equivalent experience) and 5+ years of hands-on coding experience are required.

What key technical skills are needed for the Senior Platform Telemetry Engineer position?

The role requires strong knowledge of time series databases like InfluxDB & Prometheus, building and consuming REST APIs (Redfish is a plus), telemetry visualization tools like Grafana & Influx, firmware architecture for low-latency APIs, and analyzing algorithms for time & space complexity.

What is the salary or compensation for this position?

This information is not specified in the job description.

Is this a remote position or does it require working in an office?

This information is not specified in the job description.

What makes a strong candidate for this role at NVIDIA?

Candidates with a proven record of delivering scalable solutions, expertise in the listed technical areas, and experience driving end-to-end product development from architecture to delivery stand out.

NVIDIA

Designs GPUs and AI computing solutions

About NVIDIA

NVIDIA designs and manufactures graphics processing units (GPUs) and system on a chip units (SoCs) for various markets, including gaming, professional visualization, data centers, and automotive. Their products include GPUs tailored for gaming and professional use, as well as platforms for artificial intelligence (AI) and high-performance computing (HPC) that cater to developers, data scientists, and IT administrators. NVIDIA generates revenue through the sale of hardware, software solutions, and cloud-based services, such as NVIDIA CloudXR and NGC, which enhance experiences in AI, machine learning, and computer vision. What sets NVIDIA apart from competitors is its strong focus on research and development, allowing it to maintain a leadership position in a competitive market. The company's goal is to drive innovation and provide advanced solutions that meet the needs of a diverse clientele, including gamers, researchers, and enterprises.

Headquarters: Santa Clara, California
Year Founded: 1993
Total Funding: $19.5M
Company Stage: IPO
Industries: Automotive & Transportation, Enterprise Software, AI & Machine Learning, Gaming
Employees: 10,001+

Benefits

Company Equity
401(k) Company Match

Risks

Increased competition from AI startups like xAI could challenge NVIDIA's market position.
Serve Robotics' expansion may divert resources from NVIDIA's core GPU and AI businesses.
Integration of VinBrain may pose challenges and distract from NVIDIA's primary operations.

Differentiation

NVIDIA leads in AI and HPC solutions with cutting-edge GPU technology.
The company excels in diverse markets, including gaming, data centers, and autonomous vehicles.
NVIDIA's cloud services, like CloudXR, offer scalable solutions for AI and machine learning.

Upsides

Acquisition of VinBrain enhances NVIDIA's AI capabilities in the healthcare sector.
Investment in Nebius Group boosts NVIDIA's AI infrastructure and cloud platform offerings.
Serve Robotics' expansion, backed by NVIDIA, highlights growth in autonomous delivery services.
