Groq

Machine Learning Engineer, Post Training

Toronto, Ontario, Canada

Compensation: Not Specified
Experience Level: Senior (5 to 8 years)
Job Type: Full Time
Visa: Unknown
Industries: Artificial Intelligence, AI & Machine Learning, Cloud Computing

Requirements

Candidates should have:

5+ years of experience in machine learning, with a focus on model training
Proven experience with transformer-based architectures such as LLaMA, Mistral, or Gemma
A deep understanding of speculative decoding and draft-model usage
Hands-on experience with quantization-aware training (QAT) using frameworks such as PyTorch QAT (see the sketch after this list)
Familiarity with open-weight foundation models and continued/pre-training techniques
Proficiency in Python and ML frameworks such as PyTorch, JAX, or TensorFlow

Preferred qualifications:

Experience optimizing models for fast inference and sampling in production
Exposure to distributed training, low-level kernel optimizations, and inference-time system constraints
Publications or contributions to open-source ML projects
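
To make the QAT requirement concrete, here is a minimal sketch using PyTorch's eager-mode quantization flow (torch.ao.quantization). The toy model, random data, and hyperparameters are illustrative assumptions, not Groq's actual pipeline: fake-quantization observers are attached during training so the optimizer sees quantization noise, then folded into a real int8 model for inference.

```python
# Minimal QAT sketch with PyTorch's eager-mode flow.
# Toy model and random data are assumptions for illustration only.
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qat_qconfig, prepare_qat, convert

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # QuantStub/DeQuantStub mark the float <-> int8 boundary.
        self.quant = torch.ao.quantization.QuantStub()
        self.fc1 = nn.Linear(32, 64)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(64, 10)
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return self.dequant(x)

model = TinyNet()
model.train()
# Attach fake-quantization observers so training sees quantization noise.
model.qconfig = get_default_qat_qconfig("fbgemm")
prepare_qat(model, inplace=True)

opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(10):  # stand-in for a real training loop
    x = torch.randn(8, 32)
    y = torch.randint(0, 10, (8,))
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

model.eval()
int8_model = convert(model)  # fold fake-quant into a real int8 model
```

In a production pipeline the same prepare/train/convert sequence would wrap a transformer checkpoint and its fine-tuning loop; the point is that quantization error is visible to the optimizer during training rather than only at export time.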

Responsibilities

The Machine Learning Engineer will:

Lead pre-training and post-training of draft models tailored to speculative decoding architectures
Conduct continued training and post-training of open-weight models for standard inference scenarios
Implement and optimize quantization-aware training pipelines for low-precision inference
Collaborate with the model architecture, inference, and systems teams to evaluate model readiness across training and deployment stages
Develop tooling and evaluation metrics for training effectiveness, draft-model fidelity, and speculative hit-rate optimization (see the sketch after this list)
Contribute to experimental designs for novel training regimes and speculative decoding strategies
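
Speculative hit rate is the efficiency metric behind the draft-model work: at each verification step the target model accepts drafted tokens up to the first position where it disagrees, so a higher hit rate means more tokens emitted per expensive target pass. Below is a minimal sketch assuming greedy (exact-match) acceptance and toy token ids; production systems typically use probabilistic acceptance over the draft and target distributions.

```python
# Minimal sketch of a speculative hit-rate metric under greedy acceptance.
# Token ids and tensor shapes here are illustrative assumptions.
import torch

def speculative_hit_rate(draft_tokens: torch.Tensor,
                         target_tokens: torch.Tensor) -> float:
    """Fraction of the k drafted tokens accepted before the first mismatch.

    draft_tokens:  (k,) token ids proposed by the draft model
    target_tokens: (k,) token ids the target model would emit at those positions
    """
    mismatch = (draft_tokens != target_tokens).nonzero()
    accepted = int(mismatch[0]) if mismatch.numel() > 0 else draft_tokens.numel()
    return accepted / draft_tokens.numel()

# Toy example: the target agrees with the first two of four drafted tokens.
draft = torch.tensor([5, 9, 3, 7])
target = torch.tensor([5, 9, 1, 7])
print(speculative_hit_rate(draft, target))  # 0.5
```

This is the quantity a draft model is trained to maximize: the more closely the draft tracks the target, the more tokens each verification step accepts at once.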

Skills

Machine Learning
Model Training
Speculative Decoding
Quantization-Aware Training (QAT)
Transformer Architectures
LLaMA
Mistral
Gemma
PyTorch
Performance Tuning
Inference Optimization
Tooling Development
Evaluation Metrics

Groq

AI inference technology for scalable solutions

About Groq

Groq specializes in AI inference technology, providing the Groq LPU™, which is known for its high compute speed, quality, and energy efficiency. The LPU is built to run AI inference workloads quickly and efficiently in both cloud and on-premises deployments. Unlike many competitors, Groq designs, fabricates, and assembles its products in North America, which helps it maintain high standards of quality and performance. The company targets clients across industries that require fast, efficient AI processing, with the goal of delivering scalable AI inference solutions for the growing demands of the AI and machine learning market.

Headquarters: Mountain View, California
Year Founded: 2016
Total Funding: $1,266.5M
Company Stage: Series D
Industries: AI & Machine Learning
Employees: 201-500

Benefits

Remote Work Options
Company Equity

Risks

Increased competition from SambaNova Systems and Gradio in high-speed AI inference.
Geopolitical risks in the MENA region may affect the Saudi Arabia data center project.
Rapid expansion could strain Groq's operational capabilities and supply chain.

Differentiation

Groq's LPU offers exceptional compute speed and energy efficiency for AI inference.
The company's products are designed and assembled in North America, ensuring high quality.
Groq emphasizes deterministic performance, providing predictable outcomes in AI computations.

Upsides

Groq secured $640M in Series D funding, boosting its expansion capabilities.
Partnership with Aramco Digital aims to build the world's largest inferencing data center.
Integration with Touchcast's Cognitive Caching enhances Groq's hardware for hyper-speed inference.
