Microsoft Unveils DeepSpeed-Chat to Democratize Training of Large Conversational AI Models

DeepSpeed-Chat is a new system from Microsoft researchers that aims to make training large conversational AI models like ChatGPT fast, affordable, and accessible. It combines DeepSpeed’s training and inference optimizations into a unified framework called Hybrid Engine, which speeds up RLHF (reinforcement learning from human feedback) training.

Key highlights:

Enables training 13B- and 30B-parameter conversational models on Azure cloud for under $300 and $600 respectively, with end-to-end times of 9 hours and 18 hours. Microsoft reports this as over 15x faster than existing systems.
Supports training models with hundreds of billions of parameters. A 175B model can be trained in under a day on a small cluster.
Makes RLHF more accessible, with the ability to train 13B models on a single GPU.
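
As a quick sanity check on the cost figures above, the sketch below derives the hourly spending ceiling each quoted run implies. It uses only the numbers given in the highlights; the article does not state the number of GPUs or the instance type, so the result is a ceiling for the whole job, not a per-GPU price. The names `QUOTED_RUNS` and `implied_hourly_ceiling` are illustrative.

```python
# Figures quoted in the highlights: cost ceiling (USD) and end-to-end hours.
QUOTED_RUNS = {
    "13B": {"max_cost_usd": 300, "hours": 9},
    "30B": {"max_cost_usd": 600, "hours": 18},
}


def implied_hourly_ceiling(max_cost_usd, hours):
    """Return the upper bound on hourly spend implied by a cost ceiling."""
    return max_cost_usd / hours


if __name__ == "__main__":
    for name, run in QUOTED_RUNS.items():
        rate = implied_hourly_ceiling(**run)
        print(f"{name}: under ${rate:.2f} per hour for the whole job")
```

Both runs work out to the same hourly ceiling, which suggests the 30B figure reflects a run twice as long on comparable hardware and not a larger cluster.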

The system provides an easy-to-use interface that lets users train ChatGPT-like models from a single script. It replicates the full InstructGPT training pipeline in three stages: supervised finetuning, reward model finetuning, and RLHF.
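
The three stages depend on one another: the RLHF stage needs both the finetuned model and the reward model. The sketch below models that dependency as plain data and checks that a given stage order is valid. It is not DeepSpeed-Chat's API. The `Stage` class, the `check_order` helper, and the input names (such as "demonstration data") are assumptions based on the general InstructGPT recipe, not details from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    inputs: tuple  # artifacts that must exist before the stage runs
    output: str    # artifact the stage produces


# Hypothetical description of the three stages named in the article.
PIPELINE = [
    Stage("supervised finetuning",
          ("pretrained model", "demonstration data"), "actor model"),
    Stage("reward model finetuning",
          ("pretrained model", "human preference data"), "reward model"),
    Stage("RLHF",
          ("actor model", "reward model", "prompts"), "aligned actor model"),
]

INITIAL_ARTIFACTS = (
    "pretrained model", "demonstration data", "human preference data", "prompts",
)


def check_order(stages, initial):
    """Walk the stages in order; raise ValueError if an input is missing."""
    available = set(initial)
    for stage in stages:
        missing = [item for item in stage.inputs if item not in available]
        if missing:
            raise ValueError(f"{stage.name} is missing {missing}")
        available.add(stage.output)
    return available


if __name__ == "__main__":
    artifacts = check_order(PIPELINE, INITIAL_ARTIFACTS)
    print("aligned actor model" in artifacts)
```

Running RLHF first would fail this check, because neither the actor model nor the reward model exists yet.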

DeepSpeed-Chat reports over 10x higher training throughput than existing PyTorch-based systems. The gains come from Hybrid Engine’s ability to switch between an optimized inference mode, used to generate responses, and a training mode, used to update the model, within the RLHF loop.

This development is a significant step towards democratizing access to large conversational AI models. Researchers and startups with limited resources can now train high-quality models without expensive infrastructure. The open-sourcing also enables broader innovation in this rapidly evolving field.

Possible implications:

Wider adoption of conversational AI across consumer apps and enterprise services, thanks to easier access to high-quality models
More research into techniques such as instruction tuning and prompt programming to improve the robustness and capabilities of LLMs
Small companies and startups better able to compete with tech giants in the conversational AI space

In summary, DeepSpeed-Chat’s ability to train ChatGPT-scale models efficiently opens up exciting possibilities for both research and practical applications of conversational AI. The democratization of access can greatly accelerate progress in this transformative technology.
