Large Language Models Learn to Self-Compose Reasoning Structures

Researchers from Google DeepMind and University of Southern California have developed a new technique called SELF-DISCOVER that allows large language models (LLMs) like GPT-4 and PaLM to dynamically compose reasoning structures to solve complex problems.

The key innovation in SELF-DISCOVER is enabling LLMs to select relevant reasoning skills, adapt them to a specific task, and combine them into an executable structure – all without any training data or human involvement. For example, when faced with a mathematical word problem, the LLM may choose skills like “break into subproblems”, “propose and verify”, and “step-by-step reasoning”, then adapt them into a structured plan to decompose the problem, verify intermediate steps, and methodically reach the solution.

SELF-DISCOVER guides LLMs to self-discover and compose atomic reasoning modules into a reasoning structure for solving challenging tasks. On challenging reasoning benchmarks including BIG-Bench Hard (BBH), agent reasoning (T4D), and MATH, the researchers report that SELF-DISCOVER outperforms direct answering on 23 of 25 tasks and chain-of-thought prompting on 21 of 25 tasks in the zero-shot setting with PaLM 2-L.

Through a set of meta-prompts, the researchers guide the LLM through three steps on a given task: selecting, adapting, and implementing reasoning skills. This lets the model uncover the intrinsic reasoning structure needed to solve that task efficiently.
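The three-stage discovery process can be sketched in Python. This is a minimal illustration only: `call_llm` is a stub standing in for a real model API, and the prompts, seed modules, and canned responses below are placeholders, not the paper's actual meta-prompts.

```python
# Sketch of the SELF-DISCOVER task-level stages: SELECT, ADAPT, IMPLEMENT.
# call_llm is a placeholder; a real implementation would query GPT-4,
# PaLM 2, or similar. The canned responses keep the control flow runnable.

SEED_MODULES = [
    "How could I break down this problem into smaller subproblems?",
    "Let's think step by step.",
    "How can I verify each intermediate result?",
]

def call_llm(prompt: str) -> str:
    # Placeholder for a real model API call.
    if "Select" in prompt:
        return "1, 3"
    if "Adapt" in prompt:
        return "Decompose the word problem; verify each intermediate sum."
    return '{"Step 1": "", "Step 2": ""}'

def self_discover(task_description: str, modules: list[str]) -> str:
    # SELECT: pick the reasoning modules relevant to this task.
    selected = call_llm(
        f"Select reasoning modules useful for: {task_description}\n"
        + "\n".join(f"{i + 1}. {m}" for i, m in enumerate(modules))
    )
    # ADAPT: rephrase the selected modules as task-specific descriptions.
    adapted = call_llm(f"Adapt these modules to the task: {selected}")
    # IMPLEMENT: turn the adapted descriptions into a key-value reasoning
    # structure (a JSON plan with empty values, filled in per instance).
    return call_llm(f"Implement as a JSON reasoning structure: {adapted}")

structure = self_discover("solve a math word problem", SEED_MODULES)
print(structure)
```

Note that all three calls happen once per task, not once per instance, which is where the method's efficiency comes from.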

Experiments across challenging benchmarks like BIG-Bench Hard, agent reasoning tests, and mathematical word problems show SELF-DISCOVER substantially boosts the reasoning capabilities of GPT-4 and PaLM 2-L. On 25 complex reasoning tasks, it improved accuracy by 11% over chain-of-thought prompting and up to 29% over direct answering using GPT-4. The discovered reasoning structures also transferred well from GPT-4 to other models like Llama2-70B, demonstrating universality.


Illustration of using SELF-DISCOVER for problem-solving. Given a generative LM, a task, and seed reasoning module descriptions, the LM is guided to generate a reasoning structure in key-value format to solve the task. The model can then follow the self-discovered structure to solve every instance of the task by filling in the JSON values step by step.
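The instance-level stage can be sketched the same way. The structure keys and the fill-in helper below are illustrative placeholders, not the paper's actual prompts or model outputs:

```python
import json

# Sketch of the instance-level stage: follow a self-discovered reasoning
# structure by filling in each JSON value in order. fill_step is a stub
# standing in for an LLM call; a real version would pass the instance and
# the partially filled structure to the model.

REASONING_STRUCTURE = {
    "Identify the subproblems": "",
    "Solve each subproblem step by step": "",
    "Verify intermediate results": "",
    "Final answer": "",
}

def fill_step(instance: str, step: str, partial: dict) -> str:
    # Placeholder for a model call that fills in one step's value given
    # the instance and the steps completed so far.
    return f"<model output for '{step}'>"

def solve_instance(instance: str, structure: dict) -> dict:
    filled = {}
    for step in structure:
        filled[step] = fill_step(instance, step, filled)
    return filled

answer = solve_instance(
    "If 3 pens cost $6, what do 7 pens cost?", REASONING_STRUCTURE
)
print(json.dumps(answer, indent=2))
```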

Compared to inference-heavy methods like self-consistency, SELF-DISCOVER achieved superior accuracy with 10-40x fewer inference calls. It also outperformed prompt optimization techniques that require training data. This makes the approach highly efficient.
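A back-of-the-envelope calculation shows where the savings come from. Self-consistency samples many reasoning chains per instance, while SELF-DISCOVER pays a small fixed cost per task plus one call per instance. The numbers below are illustrative, not figures from the paper:

```python
# Illustrative inference-call comparison (numbers are assumptions, not
# measurements from the paper).

def self_consistency_calls(n_instances: int, samples_per_instance: int) -> int:
    # Self-consistency samples multiple chains for every instance.
    return n_instances * samples_per_instance

def self_discover_calls(n_instances: int, structure_calls: int = 3) -> int:
    # Three task-level calls (SELECT, ADAPT, IMPLEMENT), then one per instance.
    return structure_calls + n_instances

n = 100  # hypothetical task with 100 instances
print(self_consistency_calls(n, 40))  # 4000 calls
print(self_discover_calls(n))         # 103 calls
```

With 40 samples per instance, the per-task structure discovery amortizes to almost nothing, which matches the 10-40x efficiency gap reported above.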

SELF-DISCOVER significantly improves LLM reasoning across a diverse set of 25 complex tasks: BBH, T4D, and MATH. CoT: zero-shot chain-of-thought; PS: plan-and-solve prompting.

The researchers suggest that by self-composing reasoning rather than relying on a fixed prompting style, LLMs can better adapt to diverse real-world problems. Just like programmers combine basic constructs, LLMs can learn to choose and integrate reasoning skills dynamically.

This work opens exciting avenues for structured reasoning with LLMs. The human-like reasoning composition in SELF-DISCOVER could enable collaborative problem-solving between humans and AI. With further research, it may be possible to build LLMs that learn richer reasoning strategies and unlock their full potential on complex cognitive tasks.
