Senior Platform Engineer, Machine Learning
Fieldguide - Full Time
- Senior (5 to 8 years), Mid-level (3 to 4 years)
Candidates must have a Bachelor's degree or higher in Computer Science or a related field and at least 5 years of experience building production infrastructure systems. Expert-level proficiency in Go is required; Python experience is a plus. Deep expertise running Kubernetes in production and extensive experience with major cloud providers such as AWS and GCP are essential. The role also requires an advanced understanding of distributed systems concepts and performance tuning, proven experience designing observability systems, and a track record of leading technical initiatives and mentoring engineers. Experience with ML/AI workloads and MLOps platforms is highly valued.
The Senior Infrastructure Software Engineer will design and architect scalable infrastructure systems for the ML inference platform and lead the optimization of Kubernetes deployments for efficient model serving. They will drive enhancements to the inference orchestration layer for complex model deployments and define monitoring strategies for model performance, latency, and resource utilization. The engineer will develop advanced solutions for GPU capacity management and throughput optimization, establish infrastructure automation standards to streamline ML deployment workflows, and partner with other engineers to translate complex inference requirements into technical solutions. They will also make critical architectural decisions that balance performance with system reliability, lead technical discussions, mentor junior engineers on infrastructure best practices, and contribute to the long-term technical strategy and infrastructure roadmap.
Platform for deploying and managing ML models
Baseten provides a platform for deploying and managing machine learning (ML) models, aimed at simplifying the process for businesses. Users can select from a library of open-source foundation models and deploy them with just two clicks. The platform features autoscaling, which adjusts resources based on demand, and comprehensive monitoring tools for tracking performance and troubleshooting. A key differentiator is Truss, Baseten's open-source model packaging framework, which lets users package and deploy custom models easily. The company uses usage-based pricing: clients pay only for the time their models are actively deployed, which helps them control costs.