Claude takes the top spot in AI chatbot ranking — finally knocking GPT-4 down to second place

Claude 3 Opus, developed by Anthropic, has surpassed OpenAI’s GPT-4 in the LMSYS Chatbot Arena, a leaderboard that ranks large language models based on blind human votes. This marks the first time GPT-4 has been dethroned since its launch. The Chatbot Arena, which began in May 2023, has collected over 400,000 votes and features models from Anthropic, OpenAI, and Google, as well as newcomers like Mistral and Alibaba.

Claude 3 Opus’s victory is significant, but the margin is narrow, and with OpenAI’s GPT-5 on the horizon, Anthropic’s lead may be short-lived. The arena ranks the chatbots with the Elo rating system, the same approach used in chess. The leaderboard has limitations, such as missing models and occasional loading issues, but it remains a closely contested benchmark for AI models.
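For readers unfamiliar with Elo, the sketch below shows how a single head-to-head vote could move two ratings under the textbook chess formula. It is illustrative only: the K-factor of 32, the starting ratings, and the function names are assumptions, and the arena's actual parameters and aggregation method are not described here.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (A, B) ratings after one comparison.

    score_a is 1.0 if A wins the vote, 0.0 if B wins, 0.5 for a tie.
    k (assumed 32 here) controls how far one result moves the ratings.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two hypothetical models with equal ratings; A wins one blind vote.
    print(update_elo(1000.0, 1000.0, 1.0))
```

Because the update depends on the gap between the expected and actual result, a win over a much higher-rated opponent moves the ratings more than a win over an equal one, which is why a narrow lead at the top can change hands after relatively few votes.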

Even Anthropic’s smallest model, Claude 3 Haiku, has performed impressively, rivaling much larger models in blind tests. All three Claude 3 variants (Opus, Sonnet, and Haiku) appear in the top ten. The leaderboard is dominated by proprietary models, which suggests open-source AI still has ground to make up. However, developments in open-source and decentralized AI, such as Meta’s upcoming Llama 3 and initiatives by Stability AI, point to a dynamic future for AI competition.
Read more at Tom’s Guide…
