Staff Machine Learning Research Engineer, Agent Post-training - Enterprise GenAI at Scale AI

New York, New York, United States

Compensation: Not Specified
Experience Level: Senior (5 to 8 years), Expert & Leadership (9+ years)
Job Type: Full Time
Visa: Unknown
Industries: AI, Generative AI, Enterprise AI

Requirements

  • 5+ years of LLM training in a production environment
  • Experience with post-training methods such as RLHF/RLVR and related algorithms such as PPO/GRPO
  • Publications in top conferences such as NeurIPS, ICLR, or ICML within the last two years
  • PhD or Master's in Computer Science or a related field

Responsibilities

  • Train state-of-the-art models, developed both internally and by the community, for deployment to our enterprise customers
  • Research cutting-edge algorithms to integrate directly into our training stack
  • Design solutions that enable complex multi-agent systems to learn directly from both process-based and outcome-based rewards

Skills

Key technologies and capabilities for this role

LLM training, RLHF, RLVR, PPO, GRPO, multi-agent systems, reinforcement learning, post-training algorithms

Questions & Answers

Common questions about this position

What does the compensation package include?

Compensation packages include base salary, equity, and benefits such as comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, generous PTO, and possibly a commuter stipend.

What is the salary range for this position?

The specific salary range is determined by work location and additional factors, and your recruiter can share more details for your preferred location during the hiring process.

Is this role remote or does it require being in the office?

The job description does not say whether the role is remote or in-office. The listed location is New York, New York, United States.

What skills and experience are required for this role?

Ideal candidates have 5+ years of LLM training in a production environment, experience with post-training methods such as RLHF/RLVR and algorithms such as PPO/GRPO, publications in top conferences such as NeurIPS, ICLR, or ICML within the last two years, and a PhD or Master's in Computer Science or a related field.

What makes a strong candidate for this position?

Strong candidates will have extensive production experience in LLM training, expertise in advanced post-training methods such as RLHF and PPO, recent top-tier publications, and an advanced degree in Computer Science, along with enthusiasm for building next-generation agent training platforms.

Scale AI

AI platform for data and models

About Scale AI

Scale AI provides a platform that helps businesses develop AI applications by using their enterprise data to customize generative models. The platform includes tools for collecting, curating, and annotating data, as well as features for evaluating and optimizing models. Scale works with a variety of clients, including major tech companies such as Microsoft and Meta, government agencies such as the U.S. Army and Air Force, and startups such as Brex and OpenSea. What sets Scale apart from its competitors is its comprehensive suite of tools and services focused on safely unlocking the value of AI. The company's goal is to improve the performance of advanced language models and generative models, making AI more accessible and effective for its clients.

Headquarters: San Francisco, California
Year Founded: 2016
Total Funding: $1,558.9M
Company Stage: Series F
Industries: Data & Analytics, AI & Machine Learning
Employees: 1,001-5,000

Benefits

Health, Dental & Vision Coverage - Our health plans give you the flexibility to select the right coverage for you and eligible family members through a variety of plan options.
Easy-to-Use 401(k) - Plan and invest for the future with a 401(k) via Guideline. Scale’s 401(k) plan gives you the opportunity to defer compensation for long-term savings.
Wellness Fund - We care about the physical, mental, and emotional wellbeing of all Scaliens. Our $100/month wellness stipend can be used for gym memberships, acupuncture, meditation apps, and so much more.
Virtual Social Activities - Being remote has not stopped us from hosting fun virtual events. From trivia night to candle making, we ensure employees are fostering connections & building strong relationships.
Learning & Development - We know how important career growth is for Scaliens, so we offer a $500/year L&D stipend to help support continued development throughout your journey.
Flexible Hours - Work when you are most productive, and plan your daily schedule together with your manager.
Generous Paid Time Off - Enjoy time to travel or plan a staycation. We encourage employees to take time off to recharge and prevent burnout. Our flexible PTO policy lets each employee take planned time off as needed.
Commuter Benefits - Set aside pre-tax dollars to use on qualified transportation expenses to help ease your commute.
Parental Leave - Balancing work and family is essential, and Scale understands the importance of having adequate leave policies in place to promote a healthy home and work life.

Risks

Legal issues over labor practices could harm Scale AI's reputation and financial standing.
Controversial stance on DEI may deter potential talent and partnerships, affecting growth.
Expansion in San Francisco could increase operational costs amid uncertain economic conditions.

Differentiation

Scale AI offers comprehensive data annotation for AI, unlike competitors focusing on single modalities.
The platform supports diverse clients, from tech giants to government agencies, enhancing its versatility.
Scale AI's commitment to AI safety initiatives sets it apart in responsible AI development.

Upsides

Growing demand for AI data annotation boosts Scale AI's market potential and client base.
Expansion into defense applications opens new revenue streams and strategic partnerships.
Partnerships with AI safety organizations enhance Scale AI's reputation as a responsible AI leader.
