Large language models can do jaw-dropping things. But nobody knows exactly why.

Harvard computer scientist Boaz Barak, currently with OpenAI, likens the current state of machine learning to early 20th-century physics, filled with unexpected experimental results. A key phenomenon in machine learning is generalization, where models learn from specific examples and apply that knowledge to new, unseen situations. This ability is particularly pronounced in large language models like OpenAI’s GPT-4 and Google DeepMind’s Gemini, which can, for instance, solve math problems in a language they weren’t explicitly trained in.

The field has progressed rapidly, largely through trial and error rather than a deep understanding of the underlying principles. Researchers have developed a vast array of techniques and “recipes” to improve model performance, but the process remains more art than science, closer to alchemy. One central puzzle is overfitting: classical statistical theory predicts that a model with far more parameters than training examples should memorize its training data and fail to generalize to new data. Yet the largest language models, with up to a trillion parameters, defy that expectation, improving as they grow and challenging conventional wisdom about model performance.
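The classical overfitting picture described above can be sketched in a few lines. This is a toy illustration, not anything from the article: a degree-9 polynomial has enough parameters to fit 10 noisy training points exactly, while a simpler degree-2 fit tracks the true signal. All data and degrees here are made up for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a parabola, the 'true' signal."""
    x = np.linspace(-1, 1, n)
    y = x**2 + rng.normal(scale=0.1, size=n)
    return x, y

x_train, y_train = make_data(10)   # small training set
x_test, y_test = make_data(50)     # held-out data

def fit_and_eval(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

simple_train, simple_test = fit_and_eval(2)
flexible_train, flexible_test = fit_and_eval(9)

# A degree-9 polynomial interpolates 10 points, driving training error
# to (numerically) zero -- it memorizes the noise along with the signal.
assert flexible_train < simple_train

print(f"degree 2: train={simple_train:.4f}, test={simple_test:.4f}")
print(f"degree 9: train={flexible_train:.4f}, test={flexible_test:.4f}")
```

The surprise the article points to is that trillion-parameter language models sit far into the "flexible" regime of this sketch, yet keep generalizing better as they grow rather than collapsing into memorization.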
Read more at MIT Technology Review…