AI Inference Engineer at Quadric

Burlingame, California, United States

Compensation: Not Specified
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: AI, Semiconductors, Edge Computing

Requirements

  • Bachelor’s or Master’s in Computer Science and/or Electrical Engineering
  • 5+ years of experience in AI/LLM model inference and deployment frameworks/tools
  • Experience with model quantization techniques (PTQ, QAT) and related tools
  • Experience with model accuracy evaluation metrics
  • Experience with model inference performance profiling
  • Experience with at least one of the following frameworks: ONNX Runtime, PyTorch, vLLM, Hugging Face Transformers, Neural Compressor, llama.cpp
  • Proficiency in C/C++ and Python
  • Strong problem-solving, debugging, and communication skills

Responsibilities

  • Quantize, prune and convert models for deployment
  • Port models to Quadric platform using Quadric toolchain
  • Optimize inference deployments for latency and speed
  • Benchmark and profile model performance and accuracy
  • Develop tools to scale and speed up the deployment
  • Make improvements to SDK and runtime
  • Provide technical support and documentation to customers and the developer community

Skills

Key technologies and capabilities for this role

AI Inference, Model Quantization, PTQ, QAT, Model Pruning, ONNX Runtime, PyTorch, vLLM, Hugging Face Transformers, Neural Compressor, llama.cpp, C/C++, Python, Model Profiling, Benchmarking, SDK Development

Questions & Answers

Common questions about this position

What benefits does Quadric offer?

Quadric provides a comprehensive benefits package including Health Care Plan (Medical, Dental & Vision), Retirement Plan (401k, IRA), Life Insurance, Paid Time Off, Family Leave, Short Term & Long Term Disability, Training & Development, Work From Home, Free Food & Snacks, and Stock Option Plan.

Is this position remote or hybrid?

The position is hybrid: some in-office presence is required, as fully remote work is not supported.

What skills and experience are required for the AI Inference Engineer role?

Candidates need 5+ years of experience in AI/LLM model inference and deployment; experience with model quantization (PTQ, QAT), accuracy measurement, and performance profiling; experience with at least one framework such as ONNX Runtime, PyTorch, vLLM, Hugging Face Transformers, Neural Compressor, or llama.cpp; and proficiency in C/C++ and Python.

What is the salary range for this position?

This information is not specified in the job description.

What makes a strong candidate for this AI Inference Engineer role?

A strong candidate holds a Bachelor’s or Master’s in Computer Science or Electrical Engineering, has 5+ years of AI/LLM inference experience, expertise in model optimization techniques like quantization, proficiency in C/C++ and Python, and strong problem-solving, debugging, and communication skills.

Quadric

Simplifies SoC design for machine learning

About Quadric

Quadric focuses on simplifying the design and programming of System on Chips (SoCs) specifically for machine learning applications. Their main product is the Chimera, a General-Purpose Neural Processing Unit (GPNPU) that combines matrix and vector operations with scalar control code in a single execution pipeline. This design allows developers to avoid splitting application code across different processors, making the development process more efficient. Quadric serves clients in the semiconductor industry, including SoC developers and manufacturers, who need to improve their machine learning capabilities. Unlike competitors, Quadric offers a comprehensive solution that includes both hardware and software tools, such as the Chimera LLVM C Compiler and the Chimera Instruction Set Simulator, enabling developers to design, simulate, and deploy their applications effectively. The goal of Quadric is to enhance the performance and ease of development for machine learning applications on SoCs.

Headquarters: Burlingame, California
Year Founded: 2017
Total Funding: $42.1M
Company Stage: Debt
Industries: Hardware, Enterprise Software, AI & Machine Learning
Employees: 51-200

Benefits

Health Insurance
Dental Insurance
Vision Insurance
401(k) Retirement Plan
Company Equity

Risks

Increased competition from established semiconductor companies threatens market share.
Rapid AI advancements may outpace Quadric's current product offerings.
Potential IP disputes could lead to costly legal battles.

Differentiation

Chimera GPNPU integrates matrix, vector, and scalar operations in one pipeline.
Quadric's GPNPU runs C code, enhancing versatility for neural network operations.
Quadric offers a unified architecture for ML inference and C++ processing.

Upsides

Quadric's Series B funding boosts expansion of engineering and commercial teams.
Partnership with ams OSRAM enhances smart sensing solutions for edge applications.
Quadric Developer Studio simplifies AI and ML application development.
