How does Alpaca follow your instructions? Stanford Researchers Discover How the Alpaca AI Model Uses Causal Models and Interpretable Variables for Numerical Reasoning

GPT-4: Researchers at Stanford University have developed Boundless DAS, an extension of Distributed Alignment Search (DAS) that uses the principle of causal abstraction to identify the representations in large language models (LLMs) responsible for specific causal effects. The method scales this kind of interpretability analysis to LLMs with billions of parameters. Applied to the Alpaca model on a numerical reasoning task, it reveals that the model implements a causal model with interpretable intermediate variables, offering insight into the model's inner workings.
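
Causal abstraction is usually tested with interchange interventions: run a model on one input, swap in an intermediate variable computed from a different input, and check whether the output changes the way the hypothesized causal model predicts. The sketch below is a toy illustration of that idea on a hand-written causal model, not the researchers' code. The price-range task and the names `Example`, `high_level_model`, and `interchange_intervention` are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    amount: float
    lower: float
    upper: float


def high_level_model(x: Example,
                     left_override: Optional[bool] = None,
                     right_override: Optional[bool] = None) -> str:
    """Toy causal model with two boolean intermediate variables (hypothetical)."""
    # Each intermediate variable is computed from the input unless overridden.
    left = (x.amount >= x.lower) if left_override is None else left_override
    right = (x.amount <= x.upper) if right_override is None else right_override
    return "yes" if (left and right) else "no"


def interchange_intervention(base: Example, source: Example, variable: str) -> str:
    """Run the model on `base`, taking one intermediate variable from `source`."""
    if variable == "left":
        return high_level_model(base, left_override=source.amount >= source.lower)
    if variable == "right":
        return high_level_model(base, right_override=source.amount <= source.upper)
    raise ValueError(f"unknown variable: {variable}")


base = Example(amount=5.0, lower=1.0, upper=8.0)    # inside the range
source = Example(amount=0.5, lower=1.0, upper=8.0)  # below the lower bound

print(high_level_model(base))                           # yes
print(interchange_intervention(base, source, "left"))   # no
print(interchange_intervention(base, source, "right"))  # yes
```

Swapping in the source's lower-bound check flips the answer, while swapping in its upper-bound check does not. An alignment search such as DAS looks for a representation inside the network that, when swapped between inputs in the same way, reproduces these output changes.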
