Senior System Software Engineer - Dynamo and Triton Inference Server
NVIDIA
Full Time
Senior (5 to 8 years)
Candidates must have a minimum of 7 years of work experience in MLOps or a similar role, with a strong focus on high-performance machine learning systems. Specifically, the role requires:
- Proven experience with inference optimization techniques such as quantization, pruning, and model distillation.
- Deep hands-on experience with hardware acceleration for machine learning, including familiarity with GPUs, TPUs, NPUs, and related software ecosystems.
- Strong experience with AI compilers and runtime environments such as TensorRT, OpenVINO, and TVM.
- Proven experience deploying and managing ML models on edge devices.
- Strong experience designing and building distributed systems, with proficiency in inter-process communication protocols such as gRPC, message queuing systems such as MQTT, and efficient data-handling techniques such as buffering and callbacks.
- Hands-on experience with popular ML frameworks such as PyTorch, TensorFlow, TFLite, and ONNX.
- Proficiency in programming languages including Python and C++.
- A solid understanding of machine learning concepts, the ML development lifecycle, and the challenges of deploying models at scale.
- Proficiency with containerization technologies (Docker, Kubernetes) and cloud platforms (AWS, Azure).
- Expertise in CI/CD principles and tools applied to machine learning workflows.
- A Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.
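As a concrete illustration of the optimization workflow these requirements describe, here is a minimal sketch that exports a PyTorch model to ONNX, applies post-training dynamic quantization with ONNX Runtime, and runs the quantized model. The model architecture, shapes, and file and tensor names are all placeholders, not anything specific to this role.

```python
# Minimal sketch: PyTorch -> ONNX export, dynamic quantization, inference.
# The model, shapes, and file/tensor names are placeholders.
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort
from onnxruntime.quantization import quantize_dynamic, QuantType

# Stand-in model; a production role would target real architectures.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
dummy = torch.randn(1, 128)

# Export to ONNX with a dynamic batch dimension.
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
)

# Post-training dynamic quantization: weights stored as int8,
# activations quantized on the fly at inference time.
quantize_dynamic("model.onnx", "model.int8.onnx", weight_type=QuantType.QInt8)

# Run the quantized model through an ONNX Runtime session.
session = ort.InferenceSession("model.int8.onnx", providers=["CPUExecutionProvider"])
batch = np.random.randn(4, 128).astype(np.float32)
logits = session.run(["logits"], {"input": batch})[0]
print(logits.shape)  # (4, 10)
```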
The engineer in this role will own the full lifecycle of model inference and hardware acceleration, from initial optimization to large-scale deployment. Responsibilities include:
- Designing, building, and maintaining robust pipelines and runtime environments for deploying and serving machine learning models at the edge, ensuring high availability, low latency, and efficient resource utilization.
- Collaborating with researchers and hardware engineers to optimize models for performance, latency, and power consumption on target hardware, using AI compilers and specialized software stacks to accelerate model execution.
- Designing, building, and maintaining MLOps pipelines for deploying models to a variety of edge devices under performance and efficiency constraints.
- Implementing and maintaining monitoring and alerting systems.
- Working with cloud platforms and on-device environments to provision and manage infrastructure.
- Proactively identifying and resolving issues related to model performance, deployment failures, and data discrepancies.
- Working closely with Machine Learning Engineers, Software Engineers, and Product Managers to bring models from design to high-performance production systems.
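On the serving side, the sketch below queries a Triton Inference Server over gRPC, one of the protocols named above. The server address, model name ("my_model"), and tensor names ("input", "logits") are assumptions for illustration; in a real deployment they come from the model repository's config.pbtxt.

```python
# Minimal sketch of a Triton Inference Server gRPC client. The URL, model
# name, and tensor names are hypothetical stand-ins.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Describe the request tensor and attach the payload.
batch = np.random.randn(4, 128).astype(np.float32)
infer_input = grpcclient.InferInput("input", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

# Run inference and read back the named output tensor.
response = client.infer(model_name="my_model", inputs=[infer_input])
logits = response.as_numpy("logits")
print(logits.shape)
```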
Platform for software-defined vehicle development
Sonatus provides a platform for developing software-defined vehicles, centered on a no-code solution that lets automotive companies create flexible software architectures. The platform enables collection and analysis of real-time diagnostic data from vehicles, helping manufacturers such as Hyundai Motor Group continuously improve vehicle quality and the ownership experience. Unlike competitors, Sonatus offers a comprehensive solution spanning the entire vehicle lifecycle, from design to after-sales services, enabling faster innovation and cost reduction. Sonatus aims to drive continuous improvement in automotive software, capitalizing on growing global demand for software-defined vehicles.