Microsoft Unveils DeepSpeed-Chat to Democratize Training of Large Conversational AI Models

DeepSpeed-Chat is a new system from Microsoft researchers that makes training large conversational AI models like ChatGPT fast, affordable, and accessible. It combines DeepSpeed's training and inference optimizations into a unified framework called Hybrid Engine, which makes RLHF (reinforcement learning from human feedback) training far more efficient.

Key highlights:

  • Trains 13B- and 30B-parameter conversational models on Azure cloud for under $300 and under $600, with end-to-end times of 9 hours and 18 hours respectively. This is over 15x faster than existing systems.
  • Scales to models with hundreds of billions of parameters. A 175B model can be trained in under a day on a small cluster.
  • Makes RLHF accessible by supporting training of 13B models on a single GPU.
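
As a quick sanity check on the cost figures above, the quoted price ceilings and run times can be turned into an implied hourly rate. This is only arithmetic on the numbers in the highlights; the function name is made up for illustration, and the result is a ceiling on the blended hourly cost, not an actual Azure price.

```python
# Figures quoted in the highlights: (cost ceiling in USD, end-to-end hours).
quoted = {
    "13B": (300, 9),
    "30B": (600, 18),
}


def implied_hourly_rate(cost_usd, hours):
    """Return the cost ceiling divided by the run time, in USD per hour."""
    return cost_usd / hours


rates = {name: implied_hourly_rate(cost, hours) for name, (cost, hours) in quoted.items()}

for name, rate in rates.items():
    print(f"{name}: under ${rate:.2f} per hour")
```

Both runs work out to the same ceiling of roughly $33 per hour, so the doubling in cost from 13B to 30B reflects the doubling in run time rather than a higher hourly spend.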

The system provides an easy-to-use interface that lets users train ChatGPT-like models from a single script. It replicates the full InstructGPT training pipeline in three stages: supervised fine-tuning, reward model fine-tuning, and RLHF.
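
The order and dependencies of the three stages can be sketched in a few lines of Python. This is a toy illustration, not DeepSpeed-Chat's actual interface: the `Checkpoint` class, the function names, and the checkpoint names are all hypothetical, and the assumption that the reward model starts from its own base checkpoint is made only for the example. No training happens here; each stage just records that it ran.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical stand-in for a model checkpoint and the stages applied to it."""
    name: str
    stages: tuple = ()


def supervised_finetune(base):
    """Stage 1: fine-tune the base model on demonstration data (placeholder)."""
    return Checkpoint(base.name, base.stages + ("sft",))


def reward_model_finetune(base):
    """Stage 2: fine-tune a model to score responses (placeholder)."""
    return Checkpoint(base.name, base.stages + ("reward",))


def rlhf(actor, reward):
    """Stage 3: RLHF needs the outputs of both earlier stages."""
    if "sft" not in actor.stages:
        raise ValueError("actor must go through supervised fine-tuning first")
    if "reward" not in reward.stages:
        raise ValueError("reward model must be fine-tuned first")
    return Checkpoint(actor.name, actor.stages + ("rlhf",))


def run_pipeline(actor_base, reward_base):
    """Run the three stages in order and return the final actor checkpoint."""
    actor = supervised_finetune(actor_base)
    reward = reward_model_finetune(reward_base)
    return rlhf(actor, reward)


final = run_pipeline(Checkpoint("actor-base"), Checkpoint("reward-base"))
print(final.stages)
```

The point of the sketch is the dependency: the RLHF stage consumes both the supervised fine-tuned model and the reward model, which is why a single script has to run all three stages in sequence.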

DeepSpeed-Chat reports over 10x higher training throughput than existing PyTorch-based systems. The gains come from Hybrid Engine’s ability to switch seamlessly between an optimized inference mode, used when the model generates responses during RLHF, and a training mode, used when the model is updated.

This development is a significant step towards democratizing access to large conversational AI models. Researchers and startups with limited resources can now train high-quality models without expensive infrastructure. Because the system is open source, it also enables broader innovation in this rapidly evolving field.

Possible implications:

  • Wider adoption of conversational AI across consumer apps, enterprise services, and other products, thanks to easy access to high-quality models
  • More research into techniques such as instruction tuning and prompt programming to improve the robustness and capabilities of LLMs
  • A chance for small companies and startups to compete with tech giants in the conversational AI space

In summary, DeepSpeed-Chat’s ability to train ChatGPT-scale models efficiently opens up exciting possibilities for both research and practical applications of conversational AI. The democratization of access can greatly accelerate progress in this transformative technology.
