Starling-7B: Increasing LLM Helpfulness & Harmlessness with RLAIF


Researchers introduce Starling-7B, a large language model trained with Reinforcement Learning from AI Feedback (RLAIF). The model, which the team reports outperforms most existing models, was trained on Nectar, a new ranking dataset, together with a new reward-model training and policy-tuning pipeline. The researchers have released the ranking dataset, the reward model, and the language model on HuggingFace, along with an online demo. They are continuing to explore various training methodologies and will update their findings.

Read more…