AI Model Leaderboards

Track and compare the performance of leading AI models based on community-driven evaluation.

Total Models: 216
Total Votes: 2,787,691
Last Updated: 2025-03-16
Rank | Model                               | Organization | Score | CI | Votes  | License
1 🥇 | Grok-3-Preview-02-24                | xAI          | 1406  | ±7 | 9,109  | Proprietary
1 🥈 | GPT-4.5-Preview                     | OpenAI       | 1400  | ±6 | 8,596  | Proprietary
3 🥉 | Gemini-2.0-Flash-Thinking-Exp-01-21 | Google       | 1383  | ±5 | 21,124 | Proprietary
3    | Gemini-2.0-Pro-Exp-02-05            | Google       | 1380  | ±4 | 19,038 | Proprietary
3    | ChatGPT-4o-latest (2025-01-29)      | OpenAI       | 1375  | ±5 | 20,936 | Proprietary
6    | DeepSeek-R1                         | DeepSeek     | 1360  | ±6 | 11,507 | MIT
6    | Gemini-2.0-Flash-001                | Google       | 1355  | ±5 | 16,845 | Proprietary
6    | o1-2024-12-17                       | OpenAI       | 1352  | ±5 | 23,441 | Proprietary
8    | Gemma-3-27B-it                      | Google       | 1340  | ±8 | 5,028  | Gemma
9    | Qwen2.5-Max                         | Alibaba      | 1339  | ±5 | 15,607 | Proprietary

About the Leaderboard

This leaderboard is built from community-driven evaluation: human votes on head-to-head model comparisons are aggregated with the Bradley-Terry model, which estimates a latent strength for each model from its pairwise win rates.
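As a concrete illustration, here is a minimal Python sketch of Bradley-Terry fitting on a toy battle log, using the classic MM (Zermelo) iteration. The toy data, the model names, and the 1000 + 400·log10 score anchoring are assumptions for illustration, not the leaderboard's exact pipeline.

import math
from collections import defaultdict

def bradley_terry(battles, iters=100):
    # battles: list of (winner, loser) model-name pairs.
    # MM (Zermelo) update: p_i <- W_i / sum_j n_ij / (p_i + p_j),
    # where W_i is i's win count and n_ij the i-vs-j battle count.
    models = {m for pair in battles for m in pair}
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in battles:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1

    p = dict.fromkeys(models, 1.0)
    for _ in range(iters):
        p_new = {}
        for i in models:
            denom = sum(games[frozenset((i, j))] / (p[i] + p[j])
                        for j in models if j != i)
            p_new[i] = wins[i] / denom
        mean = sum(p_new.values()) / len(p_new)  # fix the arbitrary scale
        p = {m: v / mean for m, v in p_new.items()}
    return p

# Toy battle log; the 1000 + 400*log10 anchoring below is an assumed
# Elo-like scale, not necessarily the leaderboard's convention.
battles = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C"),
           ("C", "B"), ("A", "C"), ("B", "C")]
for m, s in sorted(bradley_terry(battles).items(), key=lambda kv: -kv[1]):
    print(m, round(1000 + 400 * math.log10(s)))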

Ranking Methodology

Ranking is determined by each model's Arena Score, which reflects raw head-to-head performance together with style-control adjustments. Confidence intervals are estimated by bootstrapping: the battle log is repeatedly resampled with replacement, scores are refit on each resample, and the spread of the refit scores yields the ± interval shown in the table. Models whose intervals overlap are treated as statistically tied, which is why several ranks repeat above.
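A percentile bootstrap over the battle log is one standard way to obtain such intervals. The sketch below reuses bradley_terry from the previous sketch; the function name, round count, and handling of degenerate resamples are illustrative assumptions, not the leaderboard's actual code.

import math
import random
from collections import defaultdict

def bootstrap_ci(battles, rounds=200, alpha=0.05, seed=0):
    # Percentile bootstrap: resample the battle log with replacement,
    # refit bradley_terry() on each resample, and read off the
    # alpha/2 and 1 - alpha/2 quantiles of the refit ratings.
    rng = random.Random(seed)
    samples = defaultdict(list)
    for _ in range(rounds):
        resample = [rng.choice(battles) for _ in range(len(battles))]
        for m, s in bradley_terry(resample).items():
            if s > 0:  # a model may lose every resampled battle; skip it
                samples[m].append(1000 + 400 * math.log10(s))
    ci = {}
    for m, vals in samples.items():
        vals.sort()
        lo = vals[int(len(vals) * alpha / 2)]
        hi = vals[int(len(vals) * (1 - alpha / 2)) - 1]
        ci[m] = (round(lo), round(hi))
    return ci

print(bootstrap_ci(battles))  # battles as in the previous sketch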

Style Control Rank

This secondary ranking adjusts for presentation factors such as response length and markdown usage, so that a model's score reflects the quality of its answers rather than voters' formatting preferences.
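One common way to implement style control is to fit the pairwise logistic (Bradley-Terry) model with extra style covariates, so that stylistic effects are absorbed by their own coefficients rather than the model strengths. The sketch below assumes that formulation; the feature choices, the function name style_controlled_ratings, and the scikit-learn usage are illustrative, not the leaderboard's actual code.

import numpy as np
from sklearn.linear_model import LogisticRegression

def style_controlled_ratings(battles, style_diffs, outcomes, n_models):
    # battles: (n, 2) int array of (model_a, model_b) indices per battle.
    # style_diffs: (n, k) float array of style-feature differences
    #   between the two responses (e.g. length, markdown header count).
    # outcomes: (n,) array, 1 if model_a won, else 0.
    n = len(battles)
    X_models = np.zeros((n, n_models))
    X_models[np.arange(n), battles[:, 0]] = 1.0   # +1 for model_a
    X_models[np.arange(n), battles[:, 1]] = -1.0  # -1 for model_b
    X = np.hstack([X_models, style_diffs])
    # The default L2 penalty also anchors the otherwise scale-free
    # model coefficients near zero mean.
    lr = LogisticRegression(fit_intercept=False)
    lr.fit(X, outcomes)
    model_coefs = lr.coef_[0][:n_models]  # style-adjusted strengths
    style_coefs = lr.coef_[0][n_models:]  # e.g. a length-preference bias
    ratings = 1000 + 400 / np.log(10) * model_coefs  # Elo-like scale
    return ratings, style_coefs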

Citation

Please cite the following paper if you find our leaderboard or dataset helpful.

@misc{chiang2024chatbot,
    title={Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference},
    author={Wei-Lin Chiang and Lianmin Zheng and Ying Sheng and Anastasios Nikolas Angelopoulos and Tianle Li and Dacheng Li and Hao Zhang and Banghua Zhu and Michael Jordan and Joseph E. Gonzalez and Ion Stoica},
    year={2024},
    eprint={2403.04132},
    archivePrefix={arXiv},
    primaryClass={cs.AI}
}