DL System Software Engineer - AI Platform at NVIDIA

Toronto, Ontario, Canada

Compensation: Not Specified
Experience Level: Mid-level (3 to 4 years), Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Technology, AI

Requirements

  • Bachelor's degree or equivalent experience in Computer Science, Computer Engineering, or relevant technical field
  • 5+ years of experience
  • Experience building large scale systems from scratch
  • Prior experience in container-based deployment systems like Kubernetes is beneficial
  • Strong coding skills in programming languages like Python, Go, Rust and/or C/C++
  • Solid foundation in computer science and computer engineering topics: algorithms and data structures, operating systems, computer architecture, etc.
  • Strong understanding of AI and related technologies is a huge plus
  • Ability to quickly grasp new concepts and thrive in evolving situations

Ways to stand out

  • Graduate-level education or relevant practical background, particularly in research, is beneficial
  • Practical experience in building and optimizing AI applications is highly desired
  • Proficiency in container software such as containerd, CRI-O, Linux namespaces, and CRIU, and in NVIDIA GPU technology such as CUDA graphs and the driver/runtime, is greatly advantageous

Responsibilities

  • Taking part in the development of NVIDIA's AI platform for training, fine-tuning, and serving the latest AI models with the best performance and efficiency
  • Designing and building solutions for scheduling large-scale AI training and inference workloads on GPU clusters across many cloud infrastructures
  • Exploring and finding solutions for open problems like industry-scale resource management, GPU scheduling, performance prediction, and live workload migration
  • Working with and contributing to adjacent teams like the TensorRT/Dynamo inference engine, ML compiler, KAI/Grove scheduler, Lepton cloud, etc.

Skills

Key technologies and capabilities for this role

Python, Go, Rust, C/C++, Kubernetes, TensorRT, ML Compiler, GPU Clusters, Cluster Scheduler, Operating Systems, Algorithms, Data Structures, Computer Architecture

Questions & Answers

Common questions about this position

What is the base salary range for this position?

The base salary range is 116,250 CAD - 201,500 CAD, determined based on location, experience, and pay of employees in similar positions.

What benefits are offered with this role?

You will be eligible for equity and benefits.

Is this a remote position or does it require office work?

This information is not specified in the job description.

What skills and experience are required for this role?

Required qualifications include a Bachelor's degree or equivalent experience in Computer Science or a related field, 5+ years of experience, experience building large-scale systems from scratch, strong coding skills in Python, Go, Rust and/or C/C++, and a solid foundation in algorithms, data structures, operating systems, and computer architecture. Experience with Kubernetes is beneficial, and a strong understanding of AI technologies is a plus.

What makes a candidate stand out for this position?

Candidates stand out with graduate-level education or a research background, practical experience building and optimizing AI applications, and proficiency in container software like containerd, CRI-O, Linux namespaces, and CRIU, as well as NVIDIA GPU technologies such as CUDA graphs and the driver/runtime.

NVIDIA

Designs GPUs and AI computing solutions

About NVIDIA

NVIDIA designs and manufactures graphics processing units (GPUs) and system-on-a-chip units (SoCs) for various markets, including gaming, professional visualization, data centers, and automotive. Its products include GPUs tailored for gaming and professional use, as well as platforms for artificial intelligence (AI) and high-performance computing (HPC) that cater to developers, data scientists, and IT administrators. NVIDIA generates revenue through the sale of hardware, software solutions, and cloud-based services, such as NVIDIA CloudXR and NGC, which support work in AI, machine learning, and computer vision. What sets NVIDIA apart from competitors is its strong focus on research and development, allowing it to maintain a leadership position in a competitive market. The company's goal is to drive innovation and provide advanced solutions that meet the needs of a diverse clientele, including gamers, researchers, and enterprises.

Headquarters: Santa Clara, California
Year Founded: 1993
Total Funding: $19.5M
Company Stage: IPO
Industries: Automotive & Transportation, Enterprise Software, AI & Machine Learning, Gaming
Employees: 10,001+

Benefits

Company Equity
401(k) Company Match

Risks

Increased competition from AI startups like xAI could challenge NVIDIA's market position.
Serve Robotics' expansion may divert resources from NVIDIA's core GPU and AI businesses.
Integration of VinBrain may pose challenges and distract from NVIDIA's primary operations.

Differentiation

NVIDIA leads in AI and HPC solutions with cutting-edge GPU technology.
The company excels in diverse markets, including gaming, data centers, and autonomous vehicles.
NVIDIA's cloud services, like CloudXR, offer scalable solutions for AI and machine learning.

Upsides

Acquisition of VinBrain enhances NVIDIA's AI capabilities in the healthcare sector.
Investment in Nebius Group boosts NVIDIA's AI infrastructure and cloud platform offerings.
Serve Robotics' expansion, backed by NVIDIA, highlights growth in autonomous delivery services.
