Research Scientist, RL Training

Worldwide Salaried Open

ABOUT THE ROLE

We're looking for a Research Scientist to work on reinforcement learning for training and aligning large language models. This is a foundational research role focused on one of the most consequential open data problems in AI: how to generate the data, reward signals, and training procedures that steer LLM behavior in reliable and generalizable directions. Solving this problem is also a core capability that directly differentiates Snorkel's data-as-a-service offering.

You'll work closely with Snorkel's research, engineering, and delivery teams to advance our RL data capabilities. That means translating research ideas into the preference datasets, reward models, and RL-ready corpora we produce for frontier AI labs, and contributing to a research agenda that is central to Snorkel's long-term differentiation as a provider of bespoke training data.

MAIN RESPONSIBILITIES

  • Research and implement reinforcement learning techniques - including GRPO, RLHF, RLAIF, DPO, and reward modeling - and translate them into data products (preference datasets, reward signals, verifiable rewards) that customers can use to train and fine-tune large language models.
  • Design and build data pipelines that generate high-quality training signal for RL workflows, including AI-assisted annotation and curation pipelines that improve model generalization to unseen benchmarks.
  • Prototype and iterate on end-to-end RL training recipes that inform what data Snorkel ships as part of its data-as-a-service deliveries.
  • Work closely with research scientists, ML engineers, and delivery teams to translate RL research into customer-ready data products.
  • Stay current with the latest developments in large-scale multi-node LLM training, alignment research, and scalable RL methods for complex environments such as Terminal-Bench, bringing relevant advances into Snorkel's data-as-a-service approach.
  • Contribute to Snorkel's research publications and internal knowledge base in RL and model training.

PREFERRED QUALIFICATIONS

  • Deep expertise in reinforcement learning from human or AI feedback, reward modeling, and credit attribution, ideally with a clear perspective on what data makes these techniques work.
  • Experience training or fine-tuning large language models with 30B+ parameters at scale, including familiarity with distributed training infrastructure.
  • Strong proficiency in Python and ML frameworks, especially PyTorch and Hugging Face, and hands-on experience with RL frameworks such as Verl and SkyRL.
  • Solid software engineering fundamentals - you can build research prototypes that others can run, extend, and integrate into data production workflows.
  • Familiarity with ML infrastructure and cloud platforms and tools (AWS, GCP, Kubernetes, Slurm, etc.); experience with large-scale RL training pipelines is a strong plus.
  • Comfort operating in a high-iteration environment with open-ended research questions and shifting, customer-driven technical constraints.
  • Ph.D. in machine learning, reinforcement learning, or a related field strongly preferred; candidates with exceptional industry experience will also be considered.

Salary Range $200,000-$275,000 USD
