Tiny Recursive Model: How a 7M-Parameter Net Outsmarts Giants with Latent Scratchpads and Iterative Self-Critique


Merit first: a 7-million-parameter model, the Tiny Recursive Model (TRM) from Samsung, reportedly outperformed giants like DeepSeek-R1, Gemini 2.5 Pro, and o3-mini on hard reasoning benchmarks (ARC-AGI-1 and ARC-AGI-2). That’s huge: a model roughly 10,000x smaller coming out ahead on reasoning shows scale isn’t the only path to competence.

Why it works (clean, practical ideas):

  • Draft-first, then think. TRM produces a quick complete draft answer (one-shot, not token-by-token). That draft is the starting hypothesis.
  • Latent scratchpad. Instead of writing thoughts as text, TRM keeps an internal latent “scratchpad” z made of several slots (code defaults to six). These latent vectors hold the model’s internal reasoning state.
  • Intense inner-loop critique. The model repeatedly refines the scratchpad — the code runs a small net across the scratchpad slots multiple times (six updates per cycle) to self-criticize and improve the logic. That’s where reasoning happens.
  • Revise the draft. After refining z, the model produces an improved draft from the updated latent state.
  • Repeat until confident. The draft→think→revise loop is repeated many times (reports say up to 16 cycles) to converge on a solid answer, and a learned confidence head (q_hat) lets it stop early when ready (a minimal sketch follows this list).

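To make the loop concrete, here is a minimal PyTorch sketch of the draft→think→revise cycle with a latent scratchpad and a confidence head. Every module name, dimension, and threshold below is an illustrative assumption for exposition, not the actual TRM code.

```python
import torch
import torch.nn as nn

class TinyRecursiveSketch(nn.Module):
    """Draft -> think -> revise loop with a latent scratchpad.

    All names, sizes, and thresholds here are illustrative assumptions,
    not the actual TRM implementation.
    """

    def __init__(self, d_model: int = 256, n_slots: int = 6,
                 n_think: int = 6, max_cycles: int = 16):
        super().__init__()
        self.n_slots = n_slots                          # latent scratchpad slots
        self.n_think = n_think                          # scratchpad updates per cycle
        self.max_cycles = max_cycles                    # outer draft->think->revise cycles
        self.embed = nn.Linear(d_model, d_model)        # encode the problem
        self.draft_head = nn.Linear(d_model, d_model)   # one-shot complete draft
        self.think = nn.GRUCell(d_model, d_model)       # small net that updates each slot
        self.revise = nn.Linear(2 * d_model, d_model)   # fold scratchpad back into the draft
        self.q_head = nn.Linear(d_model, 1)             # learned confidence ("q_hat")

    def forward(self, x: torch.Tensor, threshold: float = 0.9):
        h = self.embed(x)                               # problem representation
        y = self.draft_head(h)                          # initial draft, produced in one shot
        z = h.new_zeros(x.size(0), self.n_slots, h.size(-1))   # latent scratchpad

        for _ in range(self.max_cycles):
            ctx = h + y                                 # condition thinking on problem + current draft
            # Inner critique loop: refine every scratchpad slot several times.
            for _ in range(self.n_think):
                z = torch.stack(
                    [self.think(ctx, z[:, i]) for i in range(self.n_slots)], dim=1)
            # Revise the draft from the updated latent state.
            y = self.revise(torch.cat([y, z.mean(dim=1)], dim=-1))
            # Learned confidence head: stop early once the whole batch looks done
            # (a simplification; per-example halting is also possible).
            q_hat = torch.sigmoid(self.q_head(y))
            if bool((q_hat > threshold).all()):
                break
        return y, q_hat
```

Running `y, q = TinyRecursiveSketch()(torch.randn(8, 256))` would execute up to 16 draft→think→revise cycles and return the final draft plus the confidence that decided whether to halt early.
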
Smart engineering choices amplify impact:
– Most inner thinking steps run under no_grad (and their outputs are detached), so the model avoids storing huge backprop graphs, letting a tiny net perform many iterative steps without exploding memory or compute (sketched after this list).
– Weight reuse via recurrence gives effective depth far beyond the parameter count.
– A binary quality head decides when to stop, making thinking adaptive.
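
As a rough illustration of that memory trick, here is what the no_grad/detach split could look like around a generic refinement step; the `refine` function and cycle count are assumptions, and only the gradient structure is the point.

```python
import torch

def deep_recursion_step(refine, y, z, n_cycles: int = 16):
    """Run most refinement cycles without gradients; keep only the last one
    on the autograd graph. `refine` stands in for any (y, z) -> (y, z) update;
    the gradient structure, not the function, is what matters here."""
    with torch.no_grad():
        for _ in range(n_cycles - 1):
            y, z = refine(y, z)          # cheap: no activations stored
    # Ensure nothing upstream is carried into the graph of the final cycle.
    y, z = y.detach(), z.detach()
    y, z = refine(y, z)                  # only this cycle stores activations for backprop
    return y, z
```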

Opinion: this is a brilliantly economical approach — trading raw parameter count for iterative, latent computation and smart training tricks. For reasoning-heavy tasks, doing more internal thought with a small model can beat brute-force scale. Expect more research applying latent scratchpads and iterative critique to get more mileage out of tiny models.