Stanford Researchers Introduce Sophia: A Scalable Second-Order Optimizer For Language Model Pre-Training

GPT-4: Stanford researchers have developed Sophia, a second-order optimizer that they report trains large language models (LLMs) about twice as fast as the widely used Adam optimizer. Sophia uses a lightweight estimate of the diagonal Hessian as a preconditioner, which could cut the cost of a pre-training run roughly in half, for example from $2M to $1M. The optimizer is described as reducing loss consistently across all parameter dimensions and as straightforward to implement in PyTorch. The work also suggests that academics with limited resources can explore LLM pre-training and develop effective algorithms.
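To make the idea of a diagonal-Hessian preconditioner concrete, here is a minimal sketch in plain Python. It is not Sophia itself: the toy quadratic loss, the exact diagonal Hessian, the clipping bound, and every function name below are assumptions chosen for illustration, and the real optimizer works from a cheap stochastic estimate of the diagonal. The sketch only shows why dividing each gradient coordinate by its curvature lets steep and flat dimensions shrink at the same rate, while plain gradient descent has to pick one learning rate for both.

```python
# Illustrative only: a toy diagonal-preconditioned update, not the Sophia algorithm.
# Loss: f(x) = 0.5 * sum(curv[i] * x[i]**2), so the Hessian diagonal is exactly curv.

def loss(x, curv):
    return 0.5 * sum(c * xi * xi for c, xi in zip(curv, x))

def grad(x, curv):
    return [c * xi for c, xi in zip(curv, x)]

def clip(value, bound):
    # Limit the per-coordinate step to [-bound, bound].
    return max(-bound, min(bound, value))

def preconditioned_step(x, g, h_diag, lr=0.5, bound=1.0, eps=1e-12):
    # Divide each gradient coordinate by its curvature estimate, then clip.
    # lr, bound and eps are hypothetical values picked for this toy problem.
    return [xi - lr * clip(gi / max(hi, eps), bound)
            for xi, gi, hi in zip(x, g, h_diag)]

def gd_step(x, g, lr):
    # Plain gradient descent: one learning rate shared by all coordinates.
    return [xi - lr * gi for xi, gi in zip(x, g)]

def run(steps=20):
    curv = [100.0, 1.0]          # one steep and one flat dimension
    x_pre = [1.0, 1.0]
    x_gd = [1.0, 1.0]
    for _ in range(steps):
        x_pre = preconditioned_step(x_pre, grad(x_pre, curv), curv)
        # The learning rate must stay below 2 / 100 to be stable on the steep axis.
        x_gd = gd_step(x_gd, grad(x_gd, curv), lr=0.01)
    return loss(x_pre, curv), loss(x_gd, curv)

if __name__ == "__main__":
    loss_pre, loss_gd = run()
    print(f"preconditioned loss: {loss_pre:.3e}")
    print(f"plain GD loss:       {loss_gd:.3e}")
```

In this toy setting the preconditioned run halves both coordinates on every step, whereas plain gradient descent is held back on the flat dimension by the small learning rate that the steep dimension requires.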
Read more at MarkTechPost…

