Senior AI Software Engineer, LLM Inference Performance Analysis at NVIDIA

Santa Clara, California, United States

Compensation: Not Specified
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Technology, AI

Requirements

  • Master’s or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience
  • 5+ years relevant experience
  • Strong hands-on programming expertise in C++ and Python, with solid software engineering fundamentals
  • Skilled in modern LLM architectures, including inference optimization, profiling, and compiler-level performance tuning
  • Significant background in kernel optimization and code generation, including graph transformations, fusion, scheduling, and building custom kernel generation frameworks such as OpenAI Triton or other compiler-based code generation pipelines (see the short fusion sketch after this section)
  • Hands-on experience with LLM inference frameworks and compilers such as TensorRT-LLM, vLLM, SGLang, JAX/XLA, or related compiler/runtime environments
  • Proven ability to analyze and optimize LLM performance bottlenecks across model development, kernel execution, and runtime systems
  • Excellent communication and collaboration skills, with the ability to work independently and effectively across distributed teams in a fast-paced environment
  • Strong drive to continuously improve software and hardware performance through profiling, analysis, and optimization
  • Proficiency in CUDA programming and familiarity with GPU-accelerated deep learning frameworks and performance tuning techniques

Ways to stand out

  • Demonstrated, innovative use of agentic AI tools to improve productivity and automate workflows
  • Proven background in LLVM, MLIR, and/or Clang compiler development
  • Active engagement with the open-source LLVM or MLIR community to ensure tighter integration and alignment with upstream efforts
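
To make the kernel fusion and Triton requirement above concrete, here is a minimal sketch of a fused elementwise kernel, assuming the OpenAI Triton Python DSL and PyTorch tensors on a CUDA device. The kernel name, the fused add-plus-ReLU operation, and the block size are illustrative choices for this example, not details taken from the role description.

    # Illustrative sketch only; not NVIDIA code and not specific to this role.
    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def fused_add_relu_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one contiguous block of elements.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        # Fusing the add and the ReLU avoids a second round trip to global memory.
        tl.store(out_ptr + offsets, tl.maximum(x + y, 0.0), mask=mask)

    def fused_add_relu(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = out.numel()
        # Launch one program per BLOCK_SIZE elements; 1024 is an arbitrary example value.
        grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
        fused_add_relu_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

Fusing the two elementwise operations into a single kernel is the simplest instance of the memory-traffic reductions that graph transformations, fusion, and scheduling aim for at larger scale.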

Responsibilities

  • Analyze the performance of LLMs on NVIDIA GPUs using advanced profiling and projection tools (see the timing sketch after this list)
  • Identify opportunities for performance improvement in the IR-based compiler middle-end optimizer and/or in precompiled kernel optimizations driven by Graph IR transformations
  • Design and implement new compiler passes and optimization techniques that deliver robust, maintainable compiler infrastructure and tools
  • Collaborate closely with architecture teams to influence and co-design future hardware features that improve compiler and runtime efficiency
  • Work with geographically distributed teams across compiler, hardware, kernel, and framework domains to drive performance improvements and resolve complex issues
  • Contribute to a core team at the forefront of deep learning and LLM inference technology, spanning hardware architecture development, kernel optimization, and integration with higher-level deep learning frameworks
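
As a rough illustration of the profiling work in the first responsibility, the sketch below times a GPU operation with CUDA events through PyTorch. The helper name time_gpu_op and the warmup and iteration counts are assumptions made for this example, not tooling prescribed by the role.

    # Illustrative sketch only; warmup/iteration counts are arbitrary example values.
    import torch

    def time_gpu_op(fn, *args, warmup: int = 10, iters: int = 100) -> float:
        """Average GPU time of fn(*args) in milliseconds, measured with CUDA events."""
        for _ in range(warmup):      # warm up: JIT compilation, caches, clocks
            fn(*args)
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        torch.cuda.synchronize()
        start.record()
        for _ in range(iters):
            fn(*args)
        end.record()
        torch.cuda.synchronize()     # wait for all queued work before reading the timer
        return start.elapsed_time(end) / iters

In practice, microbenchmarks like this complement full profiler traces (for example from Nsight Systems or Nsight Compute) when comparing a fused kernel against its unfused baseline or chasing end-to-end LLM inference bottlenecks.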

Skills

C++
Python
LLM Inference
Performance Profiling
GPU Optimization
Compiler IR
Kernel Optimization
Graph IR
Compiler Passes
Deep Learning

NVIDIA

Designs GPUs and AI computing solutions

About NVIDIA

NVIDIA designs and manufactures graphics processing units (GPUs) and system on a chip units (SoCs) for various markets, including gaming, professional visualization, data centers, and automotive. Their products include GPUs tailored for gaming and professional use, as well as platforms for artificial intelligence (AI) and high-performance computing (HPC) that cater to developers, data scientists, and IT administrators. NVIDIA generates revenue through the sale of hardware, software solutions, and cloud-based services, such as NVIDIA CloudXR and NGC, which enhance experiences in AI, machine learning, and computer vision. What sets NVIDIA apart from competitors is its strong focus on research and development, allowing it to maintain a leadership position in a competitive market. The company's goal is to drive innovation and provide advanced solutions that meet the needs of a diverse clientele, including gamers, researchers, and enterprises.

Headquarters: Santa Clara, California
Year Founded: 1993
Total Funding: $19.5M
Company Stage: IPO
Industries: Automotive & Transportation, Enterprise Software, AI & Machine Learning, Gaming
Employees: 10,001+

Benefits

Company Equity
401(k) Company Match

Risks

Increased competition from AI startups like xAI could challenge NVIDIA's market position.
Serve Robotics' expansion may divert resources from NVIDIA's core GPU and AI businesses.
Integration of VinBrain may pose challenges and distract from NVIDIA's primary operations.

Differentiation

NVIDIA leads in AI and HPC solutions with cutting-edge GPU technology.
The company excels in diverse markets, including gaming, data centers, and autonomous vehicles.
NVIDIA's cloud services, like CloudXR, offer scalable solutions for AI and machine learning.

Upsides

Acquisition of VinBrain enhances NVIDIA's AI capabilities in the healthcare sector.
Investment in Nebius Group boosts NVIDIA's AI infrastructure and cloud platform offerings.
Serve Robotics' expansion, backed by NVIDIA, highlights growth in autonomous delivery services.
