Head of Engineering
LiveFlow · Full Time
Expert & Leadership (9+ years)
Candidates must have a proven track record of shipping a production SDK and compiler stack for an ML accelerator, GPU, or NPU. A minimum of 5 years of experience managing teams that build compilers or ML systems, along with over 10 years leading multi-disciplinary teams of 30 or more engineers, is required. Excellent communication and stakeholder management skills are essential, including comfort with customer escalations and field feedback loops. Demonstrated success in hiring, coaching, and performance management of managers and senior individual contributors, along with a track record as a culture builder, is necessary. Familiarity with graph compilers and kernel development, including MLIR/TVM/StableHLO/HLO, LLVM, scheduling, and codegen, is expected. Knowledge of quantization techniques such as PTQ/QAT, per-tensor/per-channel schemes, symmetric/asymmetric quantization, calibration datasets, and accuracy/performance trade-offs is required. Strong C++ and Python fundamentals, including performance profiling, vectorization, memory hierarchies, and concurrency, are essential. Experience integrating PyTorch backends (Dynamo/FX/Inductor/ONNX) and model export is considered a plus.
The Director of Software will own the end-to-end delivery of the software stack, including the SDK, graph compiler, kernel libraries, and developer tooling. They will set the technical direction and a multi-release roadmap for the compiler, kernels, and SDK, ensuring alignment with silicon and architecture. Responsibilities include owning the inference optimization stack (Quantization -> Compile -> Accelerated Performance) on the Chimera architecture for various model types. The role involves building and tracking execution plans, milestones, and KPIs, such as operator coverage, latency/throughput, accuracy deltas, and compile times. The Director will manage and mentor an engineering organization, grow the team, and develop leaders. They will also be hands-on for critical designs, reviews, and code, particularly around IR design, codegen, and kernels. Partnering with customers and field teams to unblock POCs, prioritize the roadmap, and drive production wins is a key responsibility. Establishing quality bars and release engineering processes, including CI/CD, testing, benchmarking, reproducible builds, documentation, and samples, is also expected.
Simplifies SoC design for machine learning
Quadric focuses on simplifying the design and programming of systems on chip (SoCs) for machine learning applications. Its main product is the Chimera, a General-Purpose Neural Processing Unit (GPNPU) that combines matrix and vector operations with scalar control code in a single execution pipeline. This design lets developers avoid splitting application code across different processors, which makes development more efficient. Quadric serves clients in the semiconductor industry, including SoC developers and manufacturers, who need to improve their machine learning capabilities. Quadric offers both hardware and software tools, such as the Chimera LLVM C Compiler and the Chimera Instruction Set Simulator, so that developers can design, simulate, and deploy their applications. The company's goal is to improve the performance and ease of development of machine learning applications on SoCs.