Microsoft Unveils DeepSpeed-Chat to Democratize Training of Large Conversational AI Models

DeepSpeed-Chat is a new system from Microsoft researchers that makes training large ChatGPT-style conversational AI models fast, affordable, and accessible. It combines DeepSpeed's training and inference optimizations into a unified framework called Hybrid Engine, which improves the efficiency of RLHF (reinforcement learning from human feedback) training.

Key highlights:

  • Trains 13B- and 30B-parameter conversational models on Azure for under $300 and $600 respectively, with end-to-end times of 9 and 18 hours. This is over 15x faster than existing systems.
  • Supports models with hundreds of billions of parameters. A 175B model can be trained in under a day on a small cluster.
  • Makes RLHF accessible, with the ability to train 13B models on a single GPU.
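
As a quick sanity check on the cost figures above, the short Python sketch below derives the hourly price each run implies. It uses only the numbers quoted in the highlights, which are upper bounds ("under $300", "under $600"), so the results are upper bounds too. The function and variable names are illustrative.

```python
# Figures quoted in the highlights above; costs are upper bounds, not exact prices.
runs = {
    "13B": {"hours": 9, "max_cost_usd": 300},
    "30B": {"hours": 18, "max_cost_usd": 600},
}


def implied_hourly_rate(hours, max_cost_usd):
    """Upper bound on the hourly price implied by a total cost and a duration."""
    return max_cost_usd / hours


rates = {name: implied_hourly_rate(**run) for name, run in runs.items()}

for name, rate in rates.items():
    print(f"{name}: under ${rate:.2f} per hour")
```

Both runs imply the same hourly ceiling. That would fit the 30B run using the same kind of hardware for twice as long, though the summary above does not state the hardware.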

The system provides an easy-to-use interface that lets users train ChatGPT-like models from a single script. It replicates the full InstructGPT training pipeline in three stages: supervised fine-tuning, reward model fine-tuning, and RLHF.
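
To make the data flow between the three stages concrete, here is a minimal Python sketch. It is not DeepSpeed-Chat's API: every function, class, and model name below is a hypothetical placeholder. It only shows the InstructGPT-style ordering, in which the RLHF stage consumes the outputs of the two earlier stages.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Hypothetical handle for a trained model; no real weights are involved."""
    name: str
    stage: str


def supervised_finetune(base_model: str) -> Checkpoint:
    # Stage 1: fine-tune a pretrained model on demonstration data.
    return Checkpoint(name=f"{base_model}-sft", stage="sft")


def reward_model_finetune(base_model: str) -> Checkpoint:
    # Stage 2: fine-tune a model to score responses using preference data.
    return Checkpoint(name=f"{base_model}-rm", stage="reward")


def rlhf(actor: Checkpoint, reward: Checkpoint) -> Checkpoint:
    # Stage 3: optimize the stage-1 model against the stage-2 reward model.
    assert actor.stage == "sft" and reward.stage == "reward"
    return Checkpoint(name=f"{actor.name}-rlhf", stage="rlhf")


def run_pipeline(actor_base: str, reward_base: str) -> Checkpoint:
    """Run the three stages in order, as one script would."""
    actor = supervised_finetune(actor_base)
    reward = reward_model_finetune(reward_base)
    return rlhf(actor, reward)


final = run_pipeline("my-actor-base", "my-reward-base")
print(final)
```

The single-script interface described above would wrap an ordering like this one, so the user does not have to launch each stage by hand.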

DeepSpeed-Chat achieves over 10x higher training throughput than existing PyTorch-based systems. The gains come from Hybrid Engine’s ability to switch between optimized inference and training modes.
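
The reason mode switching matters is that each RLHF iteration alternates between two phases: generating responses (an inference workload) and updating the model on them (a training workload). The toy Python sketch below shows only that alternation. `ToyHybridEngine` and its methods are invented for illustration and say nothing about how the real Hybrid Engine is implemented.

```python
class ToyHybridEngine:
    """Toy stand-in for an engine that runs one model in two modes."""

    def __init__(self):
        self.mode = "train"
        self.log = []  # records which phase ran, in order

    def eval(self):
        self.mode = "inference"

    def train(self):
        self.mode = "train"

    def generate(self, prompt):
        # Generation is expected to run in the inference-optimized mode.
        assert self.mode == "inference", "call eval() before generate()"
        self.log.append("generate")
        return prompt + " <response>"

    def train_step(self, sample):
        # Parameter updates are expected to run in training mode.
        assert self.mode == "train", "call train() before train_step()"
        self.log.append("train_step")


def rlhf_iteration(engine, prompts):
    """One simplified iteration: generate experience, then train on it."""
    engine.eval()
    experience = [engine.generate(p) for p in prompts]
    engine.train()
    for sample in experience:
        engine.train_step(sample)
    return experience


engine = ToyHybridEngine()
samples = rlhf_iteration(engine, ["Hello", "Explain RLHF"])
print(engine.log)
```

In a system without such an engine, both phases would typically run through the same training-oriented code path. The article attributes the speedup to handling the generation phase in an inference-optimized mode.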

This development is a significant step towards democratizing access to large conversational AI models. Researchers and startups with limited resources can now train high-quality models without expensive infrastructure. The open-sourcing also enables broader innovation in this rapidly evolving field.

Possible implications:

  • Wider adoption of conversational AI across consumer apps and enterprise services, thanks to easier access to high-quality models
  • More research into techniques such as instruction tuning and prompt programming to improve the robustness and capabilities of LLMs
  • Small companies and startups able to compete with tech giants in conversational AI

In summary, DeepSpeed-Chat’s ability to train ChatGPT-scale models efficiently opens up exciting possibilities for both research and practical applications of conversational AI. The democratization of access can greatly accelerate progress in this transformative technology.
