Senior Software Engineer, NIM Production
NVIDIA - Full Time
- Senior (5 to 8 years)
Candidates should hold a Bachelor’s degree in Computer Science, Electrical Engineering, or a related field, or have equivalent experience. They must have proven experience managing and supporting HPC infrastructure, particularly in GPU-intensive environments. Strong familiarity with Linux operating systems, container technologies such as Singularity and Docker, and host management tools like Ansible is required. Experience with HPC job schedulers (Slurm, LSF) and monitoring tools (Prometheus, NVIDIA DCGM) is also required. Knowledge of configuring and optimizing RDMA networks and NVMe-backed storage for high-performance computing is beneficial.
The HPC Cluster Engineer will provide ongoing, on-call technical support to resolve issues and minimize downtime, including maintenance tasks such as draining impacted nodes and rebooting problematic ones. They will monitor and report GPU node failures, respond swiftly to limit their impact, and manage user access to the cluster. The role involves updating system settings and software to maintain security and efficiency, and proactively scheduling and managing node reboots to optimize performance and stability. They will also improve the performance and stability of GPU container solutions, set up and manage monitoring solutions, and actively monitor GPU health. In addition, the engineer will configure and optimize RDMA networks and NVMe-backed storage and contribute to the overall stability and performance of the HPC cluster.
AI tools for multimedia content creation
Genmo.ai specializes in providing AI tools for generating and editing multimedia content, including images, videos, and presentations. Users can upload images and animate specific parts, like transforming a static sky into a timelapse, or create entire movies by refining ideas, generating scenes, and selecting transitions. The platform caters to both individual content creators and businesses, operating on a subscription model with various service tiers. Genmo.ai differentiates itself by continuously enhancing its technology and focusing on user intent, ensuring that clients have powerful tools to realize their creative projects.