Beyond ChatGPT: NExT-GPT is an Open-Source Model That Lets You Master AI With Audio, Video and Text

NExT-GPT, a multimodal large language model developed by the National University of Singapore and Tsinghua University, can process and generate combinations of text, images, audio, and video. Pitched as an “any-to-any” system, this open-source model allows for more natural interactions than text-only models. It uses a technique called “modality-switching instruction tuning” to improve its cross-modal reasoning, and it uses unique tokens to handle the different input types. NExT-GPT offers an open-source alternative to multimodal AI products from tech giants like Google and OpenAI.
