A new technique called “instruction backtranslation” allows large language models (LLMs) to improve their ability to follow instructions without requiring additional human annotations or external models. Researchers from Meta AI published a paper detailing this self-alignment method.
The approach has an LLM generate “instructions” for unlabeled text from the web, and then select high-quality instruction-output pairs to fine-tune itself. After two iterations, the resulting model, called Humpback, outperforms all other non-distilled instruction-following LLMs on the AlpacaEval leaderboard.
Instruction Backtranslation Method
The method described in the paper looks like this:
- Start with a base language model (like LLaMA), a small seed set of human-annotated instruction-output example pairs, and a large corpus of unlabeled text data (web pages).
- Finetune the base model on the seed data to create a “backward” model that can generate an instruction given an output text.
- Use the backward model to generate instruction prompts for each unlabeled web page, creating candidate augmented training data pairs.
- Prompt the base model to score each candidate pair for quality, and keep only the high-scoring pairs.
- Finetune the base model on the augmented high-quality data plus the original seed data.
- Iterate: use the improved model to rescore the candidates and select better data, then fine-tune on it again.
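The loop above can be sketched as follows. This is a minimal, runnable illustration of the control flow only: `generate_instruction`, `score_pair`, and `finetune` are hypothetical stand-ins for real LLM inference and training calls, stubbed here with trivial placeholder logic.

```python
# Sketch of one round of instruction backtranslation (self-augmentation,
# self-curation, fine-tuning). All model calls are stubbed placeholders.

SCORE_THRESHOLD = 4  # keep pairs rated 4 or 5 on a 1-5 quality scale


def generate_instruction(backward_model, output_text):
    # Stand-in for the "backward" model: given an output, propose an instruction.
    return f"Write a passage that covers: {output_text[:30]}"


def score_pair(model, instruction, output_text):
    # Stand-in for prompting the model to rate pair quality on a 1-5 scale.
    # A trivial length heuristic keeps the sketch executable.
    return 5 if len(output_text) > 40 else 2


def finetune(model, pairs):
    # Stand-in for a fine-tuning step; returns a tag describing the new model.
    return f"{model}+ft({len(pairs)} pairs)"


def backtranslation_round(model, backward_model, seed_pairs, web_corpus):
    # Self-augmentation: label each unlabeled document with a candidate instruction.
    candidates = [(generate_instruction(backward_model, doc), doc)
                  for doc in web_corpus]
    # Self-curation: keep only pairs the model itself rates highly.
    curated = [(ins, doc) for ins, doc in candidates
               if score_pair(model, ins, doc) >= SCORE_THRESHOLD]
    # Fine-tune on the curated data plus the original seed pairs.
    return finetune(model, curated + seed_pairs), curated


if __name__ == "__main__":
    seed = [("Summarize photosynthesis.",
             "Plants convert light into chemical energy through photosynthesis.")]
    corpus = ["A long web document about the history of aviation and early flight.",
              "short text"]
    model, kept = backtranslation_round("base", "backward", seed, corpus)
    print(model, len(kept))
```

Iterating simply means feeding the returned model back into `backtranslation_round`, so each pass rescores candidates with a stronger curator.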
- Humpback achieves an 83.7% win rate on the AlpacaEval leaderboard, surpassing prior best results from models like Guanaco (71.8%) and LIMA (62.7%) that use more human-labeled data.
- With only 3,200 labeled examples, Humpback reaches performance near proprietary models like Claude. This demonstrates highly sample-efficient learning.
- Analysis shows the diversity of generated instructions complements human-written seed data. Iterative self-curation is critical for success.
This technique could greatly reduce the annotation cost for developing capable LLMs. As models become more proficient at instruction following, they can self-improve with less human involvement.
The results also suggest LLMs have some capacity for self-critique and quality control when properly prompted. This could open doors for models learning broader skills in a self-supervised fashion.
However, the authors point out potential issues around bias amplification from web data, safety, and limitations on tasks requiring structured outputs. Rigorous testing is still needed before deploying such systems responsibly.
Overall, instruction backtranslation provides an exciting new paradigm for aligning LLMs. If methods like this can be scaled effectively, we may see models take on more autonomous self-improvement while still respecting human preferences. The future possibilities for beneficial, widely capable AI assistants look increasingly within reach.