Software Engineer - Model Performance
Baseten - Full Time
- Mid-level (3 to 4 years), Senior (5 to 8 years)
Candidates should have:
- At least 5 years of experience writing high-quality, high-performance code
- Experience with high-level ML frameworks and inference engines such as torch, vLLM, or TensorRT
- Familiarity with Nvidia GPU architecture and CUDA, along with experience in ML performance engineering, particularly improving GPU performance and debugging performance issues
- Familiarity with low-level operating system foundations such as the Linux kernel, file systems, and containers (a plus)
- Ability to work in person at the NYC, San Francisco, or Stockholm office
The Member of Technical Staff will focus on enhancing the performance of ML systems at scale. They will contribute to open-source projects and Modal’s container runtime, aiming to improve throughput and reduce latency for language and diffusion models.
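For context on the kind of GPU performance work described above, the sketch below shows one way to micro-benchmark inference latency on an Nvidia GPU with torch and CUDA events. It is illustrative only; the layer size, batch shape, and iteration counts are assumptions, not details from this posting.

```python
import torch

# Hypothetical workload: a single fp16 attention layer on an Nvidia GPU.
device = "cuda"
layer = torch.nn.MultiheadAttention(embed_dim=1024, num_heads=16,
                                    batch_first=True).to(device).half()
x = torch.randn(8, 512, 1024, device=device, dtype=torch.float16)

# Warm-up so kernel selection and caching do not skew the measurement.
with torch.no_grad():
    for _ in range(10):
        layer(x, x, x)
torch.cuda.synchronize()

# CUDA events measure device-side time, excluding host-side overhead.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
with torch.no_grad():
    for _ in range(100):
        layer(x, x, x)
end.record()
torch.cuda.synchronize()
print(f"mean attention-layer latency: {start.elapsed_time(end) / 100:.3f} ms")
```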
Employee training and skill development platform
Modal Learning helps businesses improve employee performance through skill development. Its main product, the Modal Mastery Platform, uses active learning techniques, including live cohort sessions, labs, and one-on-one coaching, to help employees engage effectively with the material. Unlike competitors, Modal Learning offers a subscription model with structured eight-week training programs that align skill development with organizational goals. The company's goal is to empower employees and help organizations retain talent by providing clear career development paths.