Simplifying Vision Transformers with ReLU Attention
A new paper from researchers at Google DeepMind explores replacing the softmax function in transformer attention with a ReLU activation divided by the sequence length. The work shows that this simple change lets vision transformers approach or match the scaling behavior of standard softmax attention as a function of compute, while removing the row-wise normalization that softmax requires and opening the door to parallelizing attention over the sequence-length dimension.
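In code, the change amounts to swapping the row-wise softmax over the attention scores for a point-wise ReLU scaled by one over the sequence length. The sketch below is a minimal single-head version in JAX; the function name, shapes, and absence of masking or multi-head logic are illustrative choices, not taken from the paper's implementation.

```python
import jax
import jax.numpy as jnp

def relu_attention(q, k, v):
    """Single-head attention with ReLU in place of softmax.

    q, k, v: float arrays of shape (seq_len, head_dim).
    Attention weights are relu(q @ k.T / sqrt(head_dim)) / seq_len,
    i.e. a point-wise activation with no row-wise normalization.
    """
    seq_len, head_dim = q.shape
    scores = q @ k.T / jnp.sqrt(head_dim)      # (seq_len, seq_len)
    weights = jax.nn.relu(scores) / seq_len    # elementwise; rows need not sum to 1
    return weights @ v                          # (seq_len, head_dim)


# For comparison, standard softmax attention differs only in the weights line:
#     weights = jax.nn.softmax(scores, axis=-1)
```

Because each ReLU weight depends only on its own query-key score (and the sequence length), an output row can be accumulated from chunks of keys and values independently, with no need to first reduce over the full row to compute a softmax normalizer. That is what makes parallelizing over the sequence dimension simpler than with softmax.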