GitHub – openai/transformer-debugger

OpenAI’s Superalignment team has released Transformer Debugger (TDB), a tool for investigating specific behaviors of small language models. TDB combines automated interpretability techniques with sparse autoencoders to help explain model decisions, such as why a particular token is output or why an attention head focuses where it does. It supports exploration before any code needs to be written: users can intervene in the model’s forward pass and trace the connections between model components.

The release includes a neuron viewer React app, an activation server that performs model inference, a simple inference library for GPT-2 models, and datasets of top-activating examples. Setup requires Python, pip, Node, and npm. The repository also provides instructions for running the TDB app, which involves starting the activation server backend and the neuron viewer frontend.
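Before following the setup instructions, it can help to confirm that the four prerequisites are actually on the PATH. The snippet below is a minimal sketch of such a check and is not part of the repository: the function name `find_missing_tools` is hypothetical, and the command names in `REQUIRED_TOOLS` are assumptions (some systems expose Python as `python` rather than `python3`).

```python
import shutil

# Tools named in the setup requirements. The exact command names are an
# assumption and may differ by platform (e.g. "python" vs "python3").
REQUIRED_TOOLS = ["python3", "pip", "node", "npm"]


def find_missing_tools(tools, which=shutil.which):
    """Return the subset of `tools` that cannot be located on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

The lookup function is passed in as a parameter so the check can be exercised without depending on what is installed on a given machine.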

For those making changes to TDB, the repository outlines steps to validate updates: running pytest and mypy, and confirming that the server and viewer still work. The tool and its components can be cited in research using the citation format and BibTeX entry provided in the repository.
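The automated part of that validation could be scripted roughly as follows. This is only a sketch under stated assumptions: the source names pytest and mypy but not their arguments, so the invocations in `CHECKS` are guesses, and `run_checks` is a hypothetical helper rather than anything shipped with TDB. Confirming that the server and viewer work remains a manual step.

```python
import subprocess
import sys

# Assumed invocations: the source names the tools but not their arguments,
# so the repository may call for different flags or config files.
CHECKS = [
    [sys.executable, "-m", "pytest"],
    [sys.executable, "-m", "mypy", "."],
]


def run_checks(checks, runner=subprocess.run):
    """Run each command in order and return those that exited non-zero."""
    failed = []
    for cmd in checks:
        result = runner(cmd)
        if result.returncode != 0:
            failed.append(cmd)
    return failed
```

Calling `run_checks(CHECKS)` from the repository root would run both tools and return an empty list when everything passes.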