Stable Diffusion 3.0 debuts new diffusion transformer architecture to reinvent text-to-image gen AI


Stability AI has unveiled an early preview of Stable Diffusion 3.0, a cutting-edge text-to-image generative AI model boasting enhanced image quality and performance. This new iteration is particularly adept at handling multi-subject prompts and displays significantly improved typography, addressing a previous weakness and aligning with advancements made by competitors like DALL-E 3 and Midjourney. The Stable Diffusion 3.0 suite, ranging from 800M to 8B parameters, is built on a novel architecture known as a diffusion transformer, similar to the one underpinning OpenAI’s Sora, and marks a departure from the company’s previous models.

The diffusion transformer architecture replaces the U-Net backbone commonly used in diffusion models with a transformer that operates on latent image patches, leading to more efficient compute usage and superior performance. Additionally, the model incorporates flow matching, a new training method for Continuous Normalizing Flows that enhances training speed and model performance.
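The two ideas are easy to sketch together. Below is a minimal, illustrative PyTorch example, not Stability AI's actual implementation: a `TinyDiT` module patchifies a latent image into tokens for a plain transformer encoder, and a `flow_matching_loss` trains it with a straight-line conditional flow matching objective. All names, dimensions, and the choice of interpolation path are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class TinyDiT(nn.Module):
    """Sketch of a diffusion transformer: patchify a latent, run
    transformer blocks, unpatchify back to a velocity prediction.
    (Positional embeddings and text conditioning omitted for brevity.)"""
    def __init__(self, latent_ch=4, patch=2, dim=256, depth=4, heads=4):
        super().__init__()
        self.patch = patch
        self.embed = nn.Linear(latent_ch * patch * patch, dim)   # patch -> token
        self.t_embed = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, latent_ch * patch * patch)    # token -> patch

    def forward(self, z, t):
        B, C, H, W = z.shape
        p = self.patch
        # Patchify the latent: (B, C, H, W) -> (B, H*W/p^2, C*p*p) tokens.
        tok = z.unfold(2, p, p).unfold(3, p, p)
        tok = tok.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)
        x = self.embed(tok) + self.t_embed(t.view(B, 1))[:, None, :]
        x = self.blocks(x)
        out = self.head(x)
        # Unpatchify back to the latent's shape.
        out = out.reshape(B, H // p, W // p, C, p, p)
        out = out.permute(0, 3, 1, 4, 2, 5).reshape(B, C, H, W)
        return out

def flow_matching_loss(model, z1):
    """Conditional flow matching with a straight-line path (an assumed,
    rectified-flow-style choice): z_t = (1-t)*z0 + t*z1, target v = z1 - z0."""
    z0 = torch.randn_like(z1)                        # noise endpoint
    t = torch.rand(z1.size(0), device=z1.device)     # random time in [0, 1]
    zt = (1 - t.view(-1, 1, 1, 1)) * z0 + t.view(-1, 1, 1, 1) * z1
    v_pred = model(zt, t)
    return (v_pred - (z1 - z0)).pow(2).mean()        # regress onto the velocity

model = TinyDiT()
latents = torch.randn(2, 4, 16, 16)                  # stand-in for VAE latents
loss = flow_matching_loss(model, latents)
loss.backward()
```

Because every latent patch becomes a token, the transformer scales compute with sequence length rather than a fixed convolutional hierarchy, which is the efficiency argument behind replacing the U-Net backbone.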

Beyond text-to-image capabilities, Stability AI is expanding Stable Diffusion 3.0’s utility to include 3D and video generation, aiming to create versatile open models adaptable to various applications. This development marks a significant step forward in the evolution of generative AI models and their applications in visual media.
Read more at VentureBeat…
