Large language models can do jaw-dropping things. But nobody knows exactly why.

Harvard computer scientist Boaz Barak, currently with OpenAI, likens the current state of machine learning to early 20th-century physics, filled with unexpected experimental results. A key phenomenon in machine learning is generalization, where models learn from specific examples and apply that knowledge to new, unseen situations. This ability is particularly pronounced in large language models like OpenAI’s GPT-4 and Google DeepMind’s Gemini, which can, for instance, solve math problems in a language they weren’t explicitly trained in.

The field has progressed rapidly, largely through trial and error rather than a deep understanding of the underlying principles. Researchers have developed a vast array of techniques and “recipes” to improve model performance, but the process remains more art than science, closer to alchemy. One central puzzle is overfitting: classical statistical theory predicts that a model with far more parameters than training examples should memorize its training data and fail to generalize to new data. Yet the largest language models, with up to a trillion parameters, defy that expectation, improving as they grow and challenging conventional wisdom about model performance.
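The classical overfitting picture described above can be sketched in a few lines. This is a toy illustration, not anything from the article: a degree-9 polynomial has enough parameters to fit 10 noisy training points exactly, while a simpler degree-2 fit tracks the true signal. All data and degrees here are made up for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a parabola, the 'true' signal."""
    x = np.linspace(-1, 1, n)
    y = x**2 + rng.normal(scale=0.1, size=n)
    return x, y

x_train, y_train = make_data(10)   # small training set
x_test, y_test = make_data(50)     # held-out data

def fit_and_eval(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

simple_train, simple_test = fit_and_eval(2)
flexible_train, flexible_test = fit_and_eval(9)

# A degree-9 polynomial interpolates 10 points, driving training error
# to (numerically) zero -- it memorizes the noise along with the signal.
assert flexible_train < simple_train

print(f"degree 2: train={simple_train:.4f}, test={simple_test:.4f}")
print(f"degree 9: train={flexible_train:.4f}, test={flexible_test:.4f}")
```

The surprise the article points to is that trillion-parameter language models sit far into the "flexible" regime of this sketch, yet keep generalizing better as they grow rather than collapsing into memorization.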
Read more at MIT Technology Review…