Large Language Models Learn to Self-Compose Reasoning Structures

Researchers from Google DeepMind and the University of Southern California have developed a new technique called SELF-DISCOVER that allows large language models (LLMs) such as GPT-4 and PaLM 2 to dynamically compose reasoning structures for solving complex problems.

The key innovation in SELF-DISCOVER is enabling LLMs to select relevant reasoning skills, adapt them to a specific task, and combine them into an executable structure – all without any training data or human involvement. For example, when faced with a mathematical word problem, the LLM may choose skills like “break into subproblems”, “propose and verify”, and “step-by-step reasoning”, then adapt them into a structured plan to decompose the problem, verify intermediate steps, and methodically reach the solution.

SELF-DISCOVER guides LLMs to self-discover and compose atomic reasoning modules into a reasoning structure for solving challenging tasks. Tested on challenging reasoning benchmarks including BIG-Bench Hard (BBH), agent reasoning (T4D), and MATH, SELF-DISCOVER outperforms direct answering on 23/25 tasks and chain-of-thought (CoT) prompting on 21/25 tasks in the zero-shot setting with PaLM 2-L.

Through meta-prompts, the researchers guide the LLM through these three steps of selecting, adapting, and implementing reasoning skills on a given task. This lets the model uncover the intrinsic reasoning structure needed to solve that task efficiently.
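To make the pipeline concrete, here is a minimal sketch in Python. It assumes a generic `llm(prompt)` completion function standing in for whatever API you use, and the prompt wording paraphrases the paper's SELECT, ADAPT, and IMPLEMENT actions rather than reproducing its exact meta-prompts:

```python
# Minimal sketch of the two-stage SELF-DISCOVER pipeline. `llm(prompt)` is a
# stand-in for any text-completion API (an assumption, not part of the paper).
def llm(prompt: str) -> str:
    raise NotImplementedError("wire up your LLM provider here")

# A few of the paper's seed reasoning modules, paraphrased.
SEED_MODULES = [
    "How could I break down this problem into smaller subproblems?",
    "Let's think step by step.",
    "How can I propose a solution and then verify it?",
    "What are the core assumptions underlying this problem?",
]

def self_discover_structure(task_examples: list[str]) -> str:
    """Stage 1 (once per task): SELECT, ADAPT, and IMPLEMENT a structure."""
    modules = "\n".join(SEED_MODULES)
    examples = "\n".join(task_examples)

    # SELECT: pick the reasoning modules relevant to this task.
    selected = llm(
        "Select several reasoning modules that are crucial to solving the "
        f"tasks below.\nModules:\n{modules}\nTasks:\n{examples}"
    )
    # ADAPT: rephrase the selected modules to be task-specific.
    adapted = llm(
        "Rephrase and specify each selected reasoning module so it better "
        f"helps solve these tasks.\nModules:\n{selected}\nTasks:\n{examples}"
    )
    # IMPLEMENT: operationalize the adapted modules into a key-value (JSON)
    # reasoning plan that can be filled in at solve time.
    return llm(
        "Operationalize the reasoning modules into a step-by-step reasoning "
        f"plan in JSON format.\nModules:\n{adapted}\nTasks:\n{examples}"
    )

def solve(task_instance: str, structure: str) -> str:
    """Stage 2 (once per instance): fill in the structure's values."""
    return llm(
        "Follow the step-by-step reasoning plan in JSON to solve the task, "
        f"filling in the value of each key.\nPlan:\n{structure}\n"
        f"Task: {task_instance}"
    )
```

Stage 1 runs only once per task, costing three extra LLM calls; Stage 2 then reuses the discovered structure for every instance of that task.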

Experiments across challenging benchmarks, including BIG-Bench Hard, agent reasoning tests, and mathematical word problems, show that SELF-DISCOVER substantially boosts the reasoning capabilities of GPT-4 and PaLM 2-L. Across 25 complex reasoning tasks, it improved accuracy by 11% over chain-of-thought prompting and by up to 29% over direct answering with GPT-4. The discovered reasoning structures also transferred well from GPT-4 to other models such as Llama2-70B, demonstrating their universality.

Illustration of using SELF-DISCOVER for problem-solving: given a generative LM, a task, and seed reasoning module descriptions, the LM is guided to generate a reasoning structure in key-value format for the task. The model can then follow the self-discovered structure to solve every instance of the task by filling in the JSON values step by step.
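To make the key-value format concrete, here is an illustrative reasoning structure of the kind a model might self-discover for a math word problem. The keys below are invented for this example (real structures are generated per task by the model itself); at solve time the model fills in each empty value in order:

```python
# Illustrative self-discovered reasoning structure (keys invented for this
# example; actual structures are generated by the model for each task).
reasoning_structure = {
    "Identify the quantities and relationships given in the problem": "",
    "Break the problem into smaller subproblems": "",
    "Solve each subproblem step by step": "",
    "Propose a final answer": "",
    "Verify the answer against the original constraints": "",
}
```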

Compared to inference-heavy methods like self-consistency, SELF-DISCOVER achieved superior accuracy while requiring 10-40x fewer inference calls. It also outperformed prompt-optimization techniques that require training data, making the approach both accurate and compute-efficient.
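As a back-of-the-envelope illustration of where the savings come from (the instance and sample counts below are assumptions chosen for the arithmetic, not figures from the paper): self-consistency draws many samples for every instance, while SELF-DISCOVER pays a small one-time discovery cost per task and then a single call per instance:

```python
# Rough per-task comparison of inference calls (counts are illustrative).
instances_per_task = 100
sc_samples_per_instance = 40  # self-consistency: many sampled decodes/instance

self_consistency_calls = instances_per_task * sc_samples_per_instance  # 4000
self_discover_calls = 3 + instances_per_task  # 3 discovery calls + 1/instance

print(self_consistency_calls / self_discover_calls)  # ~38.8x fewer calls
```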

SELF-DISCOVER significantly improves LLM reasoning across a diverse set of 25 complex tasks spanning BBH, T4D, and MATH. CoT: zero-shot chain-of-thought prompting; PS: plan-and-solve prompting.

The researchers suggest that by self-composing reasoning structures rather than relying on a fixed prompting style, LLMs can better adapt to diverse real-world problems. Just as programmers combine basic programming constructs, LLMs can learn to choose and integrate reasoning skills dynamically.

This work opens exciting avenues for structured reasoning with LLMs. The human-like reasoning composition in SELF-DISCOVER could enable collaborative problem-solving between humans and AI. With further research, it may be possible to build LLMs that learn richer reasoning strategies and unlock their full potential on complex cognitive tasks.
