Truveta

Machine Learning Engineer - LLMs & Generative AI

Seattle, Washington, United States

Compensation: Not Specified
Experience Level: Entry Level & New Grad
Job Type: Full Time
Visa: Unknown
Industries: Healthcare

Requirements

Candidates should possess at least 6 years of experience in software engineering or machine learning, or 3+ years for candidates holding a PhD. They must have experience designing and training large language models or large-scale generative models, demonstrating deep expertise in NLP, sequence modeling, and transformer architectures. Proficiency in Python and ML libraries like PyTorch or TensorFlow, coupled with strong engineering skills in building scalable ML pipelines, is required. Experience with RL-based fine-tuning and evaluating generative systems is also necessary, along with the ability to lead technical projects and collaborate effectively across teams. A Bachelor’s degree in Computer Science, Engineering, or a related field is a minimum qualification.

Responsibilities

The Machine Learning Engineer will lead the development, training, and deployment of large language and multimodal foundation models specifically tailored to clinical and biomedical domains. They will apply and refine state-of-the-art techniques such as supervised fine-tuning, reinforcement learning-based methods, parameter-efficient fine-tuning, prompt tuning, and retrieval-augmented generation. This role involves collaborating cross-functionally with researchers, clinicians, and engineers to design ML-driven solutions for healthcare, building scalable infrastructure for distributed training of large models, and designing and evaluating models for robustness, bias mitigation, factual consistency, and explainability within healthcare contexts. Furthermore, the engineer will stay current with the latest research in generative AI and contribute back to the community through publications and open-source initiatives.

Skills

Generative AI
Large Language Models (LLMs)
Reinforcement Learning
Supervised Fine-Tuning (SFT)
Reinforcement Learning from Human Feedback (RLHF)
Reinforcement Learning with Verifiable Rewards (RLVR)
Parameter-Efficient Fine-Tuning (PEFT)
Prompt Tuning
Retrieval-Augmented Generation (RAG)
Clinical Data
Multimodal Foundation Models
Healthcare Domain

Truveta

Healthcare data platform for research analytics

About Truveta

Truveta provides a platform that allows researchers to access and analyze patient data to enhance patient care and study the safety and effectiveness of treatments. The platform, known as Truveta Studio, offers immediate and compliant access to patient-level data, which is sourced from over 30 health systems and includes information from more than 100 million patients across the United States. This data is updated daily and comes from over 800 hospitals and 20,000 clinics. Truveta Studio is designed to simplify the data access process, making it cost-effective for researchers by charging them only for the data and analytics they use. Unlike many competitors, Truveta focuses on providing transparent pricing and efficient access to comprehensive healthcare data. The company's goal is to empower researchers in the healthcare and life sciences sectors to gain valuable insights that can lead to improved patient outcomes.

Headquarters: Seattle, Washington
Year Founded: 2020
Total Funding: $189.7M
Company Stage: Late-stage VC
Industries: Data & Analytics, Biotechnology, Healthcare
Employees: 201-500

Benefits

Competitive Compensation
Comprehensive Benefits
401(k)
Professional Development
Work/Life Autonomy
Flexible Time Off
Generous Parental Leave
Team Activities

Risks

Data privacy concerns may arise from expanding datasets with sensitive information.
Rapid community expansion could strain resources and affect data quality.
Non-peer-reviewed studies may expose Truveta to criticism on scientific rigor.

Differentiation

Truveta offers the most comprehensive EHR data from over 100 million patients.
Truveta Studio provides cost-effective, compliant access to patient-level data and analytics.
Truveta's AI extracts complex clinical concepts from unstructured data, enhancing research capabilities.

Upsides

Truveta's partnership with Panalgo accelerates insights through integrated regulatory-grade EHR data.
The mother-child EHR dataset positions Truveta as a leader in maternal health research.
Truveta's real-world EHR data enables valuable drug comparisons ahead of clinical trials.
