Orca 2 has splashed!

Microsoft researchers have developed a new technique called “Cautious Reasoning” that allows smaller AI models to match the reasoning capabilities of much larger models. The technique was implemented in a new model called Orca 2, built on top of the LLaMA architecture.

The key innovation is training the smaller models not just to mimic the outputs of larger models, but to learn which reasoning strategy is most effective for each type of task. Strategies include thinking step-by-step, providing explanations, or direct answer generation. During training, Orca 2 is shown the outputs of larger models like GPT-4 demonstrating these strategies. But the original complex prompts are erased, forcing Orca 2 to learn when to apply each technique.

This “Prompt Erasing” results in more flexible reasoning, allowing Orca 2 to choose the best approach rather than blindly imitating larger models. In tests across more than 15 diverse benchmarks, Orca 2 significantly outperformed other models of similar size and matched or exceeded models with 5-10x more parameters.
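The prompt-erasing step described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual data pipeline: the function name `erase_prompt` and the generic placeholder prompt are assumptions. The idea is that the strategy-specific instruction used to elicit the teacher's answer is dropped before the pair is stored, so the student model must learn when to apply each strategy from the task itself.

```python
# Hypothetical sketch of "Prompt Erasing": the strategy-specific system
# prompt shown to the teacher model is replaced with a generic one before
# the example is stored, so the student never sees which strategy was used.

GENERIC_PROMPT = "You are a helpful assistant."  # assumed placeholder

def erase_prompt(strategy_prompt: str, question: str, teacher_answer: str) -> dict:
    """Build a student training example with the teacher's strategy prompt removed."""
    return {
        "system": GENERIC_PROMPT,     # strategy instruction erased
        "user": question,             # original task, unchanged
        "assistant": teacher_answer,  # teacher demonstration kept verbatim
    }

# The teacher was told to reason step by step; the student is not.
example = erase_prompt(
    strategy_prompt="Think step by step, then give the final answer.",
    question="If a train travels 60 km in 45 minutes, what is its speed in km/h?",
    teacher_answer="45 minutes is 0.75 hours, so speed = 60 / 0.75 = 80 km/h.",
)
print(example["system"])  # the step-by-step instruction is gone
```

Trained on many such pairs spanning different strategies, the student is rewarded for producing strategy-appropriate answers without being told which strategy to use.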

Results comparing Orca 2 (7B & 13B) to LLaMA-2-Chat (13B & 70B) and WizardLM (13B & 70B) on a variety of benchmarks (in a 0-shot setting).

For example, on zero-shot reasoning tasks like AGIEval, BigBench, and GSM8K, the 13 billion parameter Orca 2 matched or exceeded the performance of 70 billion parameter models like LLaMA-2-Chat and WizardLM. It also exceeded these larger models on language understanding benchmarks like MMLU and ARC.

Macro-average Performance of different models on reasoning benchmarks.

Researchers believe teaching reasoning strategies, rather than just mimicking outputs, is key to unlocking the potential of smaller models. While not yet matching the very largest 175 billion parameter models like GPT-3.5, Orca 2 demonstrates that smaller models can reach impressive reasoning abilities given the right training approach.

The researchers now aim to continue improving reasoning across diverse tasks while working to align models for safety. They plan to open source Orca 2 to enable further research into optimizing and evaluating smaller but capable AI models. If techniques like Cautious Reasoning succeed, they could enable a new wave of specialized and efficient AI applications.
