AI Engineer & Researcher, Inference
Speechify
Full Time
Junior (1 to 2 years)
Candidates should possess a Bachelor’s or Master’s degree in Computer Science or a related field, along with a minimum of four years of Machine Learning engineering experience, with at least two years specifically focused on model serving. Production experience with high-performance model serving frameworks such as vLLM, SGLang, or TensorRT-LLM is required, alongside strong Python proficiency and experience with PyTorch. Familiarity with model compilation and optimization techniques, including TensorRT, ONNX, and quantization, is also necessary. A proven track record of building inference systems at scale, handling 10,000+ queries per second (QPS), is expected. Knowledge of attention mechanisms and transformer architectures, as well as GPU architecture and memory management, is valued.
The ML Systems Engineer will design and implement high-performance model serving infrastructure that handles millions of requests daily and supports streaming, batching, and multi-modal inputs. They will develop automated model compilation and optimization pipelines and tune serving systems for throughput, latency, and GPU utilization. The role also involves building monitoring and observability for model-specific metrics, collaborating with researchers to move models into production, and implementing A/B testing and canary deployments. The engineer will also integrate the serving layer with platform infrastructure and contribute to advanced serving optimizations, including continuous batching and streaming generation patterns.
AI tools for multimedia content creation
Genmo.ai specializes in providing AI tools for generating and editing multimedia content, including images, videos, and presentations. Users can upload images and animate specific parts, like transforming a static sky into a timelapse, or create entire movies by refining ideas, generating scenes, and selecting transitions. The platform caters to both individual content creators and businesses, operating on a subscription model with various service tiers. Genmo.ai differentiates itself by continuously enhancing its technology and focusing on user intent, ensuring that clients have powerful tools to realize their creative projects.