Stanford Researchers Introduce Sophia: A Scalable Second-Order Optimizer For Language Model Pre-Training


GPT-4: Researchers have developed a novel optimizer called Sophia, which can train large language models (LLMs) roughly twice as fast as the widely used Adam optimizer. By using a lightweight estimate of the diagonal Hessian as a pre-conditioner, together with element-wise clipping of the update, Sophia reduces the high up-front cost of training LLMs, potentially cutting a pre-training budget from $2M to $1M. The optimizer yields consistent loss reduction across parameter dimensions and is straightforward to implement in PyTorch. This development highlights the potential for academics to explore LLM pre-training and create effective algorithms with limited resources.
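To make the idea concrete, here is a minimal sketch of a Sophia-style parameter update in PyTorch: an exponential moving average of the gradient is divided by a smoothed diagonal-Hessian estimate and the result is clipped element-wise before the step. The function name, hyperparameter values, and the assumption that the Hessian estimate `h` is refreshed periodically elsewhere are illustrative choices, not the authors' reference implementation.

```python
import torch


def sophia_style_step(param, grad, m, h, lr=1e-4, beta1=0.96,
                      rho=0.04, eps=1e-12):
    """Illustrative Sophia-style update (not the official code).

    m: exponential moving average of the gradient (momentum)
    h: smoothed estimate of the diagonal Hessian, assumed to be
       refreshed every k steps elsewhere, e.g.
       h.mul_(beta2).add_(hessian_diag_estimate, alpha=1 - beta2)
    """
    # Update the gradient EMA in place.
    m.mul_(beta1).add_(grad, alpha=1 - beta1)

    # Pre-condition by the diagonal Hessian estimate (floored at eps),
    # then clip each coordinate to [-rho, rho] to bound the step size.
    update = torch.clamp(m / torch.clamp(h, min=eps), -rho, rho)

    # Apply the clipped, pre-conditioned step.
    param.add_(update, alpha=-lr)
    return param
```

In this sketch, the clipping is what keeps the second-order step cheap and stable: coordinates with a tiny or noisy curvature estimate cannot blow up the update, which is one reason the method can be run with an inexpensive, infrequently refreshed Hessian estimate.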
Read more at MarkTechPost…
