Simplifying Vision Transformers with ReLU Attention

A new paper from researchers at DeepMind explores replacing the softmax function in transformer attention with a ReLU activation. The work shows that this simple change allows vision transformers to…
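
To make the swap concrete, here is a minimal sketch in plain Python comparing standard softmax attention with a ReLU variant. The function names and the single-head, unbatched layout are illustrative. Dividing the ReLU weights by the sequence length is an assumption made here to keep the output scale bounded; the excerpt above does not state how the paper normalizes.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def attention_scores(q, k):
    """Scaled dot-product scores: q @ k^T / sqrt(d)."""
    d = len(q[0])
    return [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]

def softmax_attention(q, k, v):
    """Standard attention: each row of weights is a softmax over the keys."""
    weights = []
    for row in attention_scores(q, k):
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return matmul(weights, v)

def relu_attention(q, k, v):
    """ReLU variant: negative scores are zeroed instead of exponentiated.

    Assumption: weights are divided by the number of keys (sequence length).
    Rows no longer sum to 1, unlike softmax.
    """
    seq_len = len(k)
    weights = [[max(s, 0.0) / seq_len for s in row] for row in attention_scores(q, k)]
    return matmul(weights, v)
```

Because ReLU is applied elementwise, each weight depends only on its own score; no row-wise maximum or sum over the keys is needed, which is the main structural difference from the softmax version.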

Simple Auto-Regressive Models Shown to be Powerful Universal Learners

Large language models such as GPT-3 and GPT-4 have demonstrated remarkable capabilities in logical reasoning and mathematical tasks. This has sparked debate over whether we are close to…


