Starling-7B: Increasing LLM Helpfulness & Harmlessness with RLAIF


Researchers introduce Starling-7B, a large language model trained with Reinforcement Learning from AI Feedback (RLAIF). The model, which the team reports outperforms most existing models, was trained on Nectar, a new ranking dataset, together with a new reward-model training and policy-tuning pipeline. The researchers have released the ranking dataset, the reward model, and the language model on HuggingFace, along with an online demo. They are continuing to explore various training methodologies and will update their findings.

Read more…