Alibaba Unveils Groundbreaking gte-v1.5 Series: A Leap in Text Embedding Technology

Alibaba Group’s Institute for Intelligent Computing has released the gte-v1.5 series, an upgrade to its gte embedding models that now supports context lengths of up to 8192 tokens. The models are built on the transformer++ encoder backbone, which combines BERT with RoPE (rotary position embeddings) and GLU (gated linear units). The gte-v1.5 series sets a new state of the art on the MTEB benchmark for its model size category and is competitive on the LoCo long-context retrieval tests. A standout model, gte-Qwen1.5-7B-instruct, excels in multi-lingual embedding, securing top positions on both the MTEB and C-MTEB leaderboards.
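To make the two building blocks named above more concrete, here is a minimal pure-Python sketch of rotary position embeddings and a gated linear unit. It is an illustration under stated assumptions, not the gte-v1.5 implementation: the function names, the frequency base of 10000, and the plain sigmoid gate are all chosen here for demonstration, and the actual models may use different variants.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate each consecutive pair of dimensions by a position-dependent angle.

    `base` is an assumed default; the real models may use another value.
    """
    assert len(vec) % 2 == 0, "RoPE needs an even number of dimensions"
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Lower dimensions rotate quickly, higher dimensions slowly.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def glu(x, gate):
    """Gated linear unit on already-projected inputs: x * sigmoid(gate)."""
    return [xi / (1.0 + math.exp(-gi)) for xi, gi in zip(x, gate)]

# Hypothetical query/key vectors, used only to demonstrate the property below.
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]
near = dot(rope(q, 3), rope(k, 7))
far = dot(rope(q, 103), rope(k, 107))
print(abs(near - far) < 1e-9)
```

The final comparison shows the property that makes RoPE attractive for long inputs: the attention score between a rotated query and key depends only on their relative offset (4 in both cases), not on where in the sequence they sit.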

Built specifically for text embeddings, the series went through a training regimen that included masked language modeling and weakly supervised contrastive pre-training, designed to support extended context lengths. Training was multi-stage, with the context length increased gradually from stage to stage to optimize performance across benchmarks.
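The article does not say which contrastive objective was used. As a rough illustration of the general idea, the sketch below implements an InfoNCE-style loss with in-batch negatives, a common choice for embedding models. The function names, the cosine scoring, and the temperature of 0.05 are assumptions made for this example, not details confirmed for gte-v1.5.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce(queries, docs, temperature=0.05):
    """Mean cross-entropy where docs[i] is the positive for queries[i].

    Every other document in the batch serves as a negative.
    `temperature` is an assumed value chosen for illustration.
    """
    total = 0.0
    for i, query in enumerate(queries):
        logits = [cosine(query, doc) / temperature for doc in docs]
        # Log-sum-exp with the max subtracted for numerical stability.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(queries)

# Toy embeddings: the loss is small when each query matches its own document.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

Minimizing a loss of this shape pulls each query embedding toward its paired text and pushes it away from the other texts in the batch, which is the behavior contrastive pre-training relies on.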

In summary, the gte-v1.5 series represents a significant step forward in the development of text embeddings, offering enhanced performance for processing long-context information and setting new standards in multi-lingual embedding capabilities.
