Large language models can do jaw-dropping things. But nobody knows exactly why.

Harvard computer scientist Boaz Barak, currently with OpenAI, likens the current state of machine learning to physics in the early 20th century, a field then filled with unexpected experimental results. A key phenomenon in machine learning is generalization: a model learns from specific examples and applies that knowledge to new, unseen situations. This ability is particularly pronounced in large language models like OpenAI’s GPT-4 and Google DeepMind’s Gemini, which can, for instance, solve math problems in a language they were not explicitly trained on.

The field has progressed rapidly, largely through trial and error rather than a deep understanding of the underlying principles. Researchers have developed a vast array of techniques and “recipes” to improve model performance, but the process is still more art than science, akin to alchemy. One challenge is overfitting, where a model performs well on its training data but fails to generalize to new data. Classical statistics suggests that this problem should get worse as models grow, yet the largest models, with up to a trillion parameters, keep improving with size, challenging conventional wisdom about model performance.
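
To make the idea of overfitting concrete, here is a minimal sketch in plain Python. It is an illustration only, not anything taken from the article, and it says nothing about how large language models behave. It assumes toy data drawn from y = 2x plus noise, and compares a simple straight-line fit against a "memorizing" model that just returns the label of the nearest training point. All function names (`fit_line`, `nearest_neighbor`, `mse`, `make_data`) are hypothetical helpers defined for this example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def nearest_neighbor(xs, ys, x):
    """Memorizing 'model': return the label of the closest training point."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
    return ys[i]

def mse(preds, ys):
    """Mean squared error between predictions and labels."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def make_data(rng, n):
    """Toy data (an assumption of this sketch): y = 2x + Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(rng, 200)
test_x, test_y = make_data(rng, 200)

# Simple model: a straight line with two parameters.
a, b = fit_line(train_x, train_y)
line_train = mse([a * x + b for x in train_x], train_y)
line_test = mse([a * x + b for x in test_x], test_y)

# Memorizing model: reproduces the training labels, noise included.
memo_train = mse([nearest_neighbor(train_x, train_y, x) for x in train_x], train_y)
memo_test = mse([nearest_neighbor(train_x, train_y, x) for x in test_x], test_y)

print(f"line:     train MSE {line_train:.3f}, test MSE {line_test:.3f}")
print(f"memorize: train MSE {memo_train:.3f}, test MSE {memo_test:.3f}")
```

The memorizing model scores a perfect zero error on the data it has seen, because it has stored the noise along with the signal, but it does worse than the straight line on fresh data. That gap between training and test error is what overfitting means in the classical picture.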
Read more at MIT Technology Review…
