Chat · Vision · Overall Leaderboard

Ranking for Vision / Overall, based on public preference data.

Selection guide

Overall model ranking guide

Top 5: claude-opus-4-7, claude-opus-4-7-thinking, claude-opus-4-6-thinking, claude-opus-4-6, muse-spark
Current directory: Chat · Vision · Overall
Models: 126
Published: 2026/05/18
Source: Arena public preference evaluation (original leaderboard: Vision / Overall)
Leaderboard dataset: LMArena latest parquet

| Rank | Model | Provider | Score | Samples | Context | Price (input / output) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | claude-opus-4-7 | Anthropic | 100.0 | 6.8K | 1M | ¥36 / ¥180 |
| 2 | claude-opus-4-7-thinking | Anthropic | 99.2 | 6.5K | 1M | ¥36 / ¥180 |
| 3 | claude-opus-4-6-thinking | Anthropic | 98.4 | 7.1K | 1M | ¥36 / ¥180 |
| 4 | claude-opus-4-6 | Anthropic | 97.6 | 8.5K | 1M | ¥36 / ¥180 |
| 5 | muse-spark | Meta | 96.8 | 4.5K | - | - |
| 6 | gemini-3-pro | Google | 96.0 | 13.2K | 1.05M | ¥14.4 / ¥86.4 |
| 7 | gemini-3.1-pro-preview | Google | 95.2 | 16.8K | 1.05M | ¥14.4 / ¥86.4 |
| 8 | gpt-5.4-high | OpenAI | 94.4 | 5.9K | 1.05M | ¥18 / ¥108 |
| 9 | gpt-5.5 | OpenAI | 93.6 | 4.7K | 1.05M | ¥36 / ¥216 |
| 10 | gpt-5.5-high | OpenAI | 92.8 | 4.2K | 1.05M | ¥36 / ¥216 |
| 11 | gpt-5.4 | OpenAI | 92.0 | 5.6K | 1.05M | ¥18 / ¥108 |
| 12 | gemini-3-flash | Google | 91.2 | 21.2K | 1.05M | ¥3.6 / ¥21.6 |
| 13 | claude-sonnet-4-6 | Anthropic | 90.4 | 8.9K | 1M | ¥21.6 / ¥108 |
| 14 | kimi-k2.6 | Moonshot | 89.6 | 5.6K | 262K | ¥6.84 / ¥28.8 |
| 15 | dola-seed-2.0-pro | ByteDance | 88.8 | 8.9K | - | - |
| 16 | qwen3.7-plus-preview | Alibaba | 88.0 | 3.6K | 131K | ¥3.6 / ¥21.6 |
| 17 | gpt-5.2-chat-latest-20260210 | OpenAI | 87.2 | 12.5K | 400K | ¥12.6 / ¥101 |
| 18 | kimi-k2.5-thinking | Moonshot | 86.4 | 14.4K | 262K | ¥4.32 / ¥21.6 |
| 19 | gemini-3-flash (thinking-minimal) | Google | 85.6 | 19.3K | 1.05M | ¥3.6 / ¥21.6 |
| 20 | glm-5v-turbo | Z.ai | 84.8 | 7.5K | 200K | ¥0 / ¥0 |
| 21 | gemini-2.5-pro | Google | 84.0 | 87.2K | 1.05M | ¥9 / ¥72 |
| 22 | gemma-4-31b | Google | 83.2 | 17.3K | 262K | ¥3.24 / ¥7.2 |
| 23 | qwen3.5-397b-a17b | Alibaba | 82.4 | 11.8K | 262K | ¥3.1 / ¥18.6 |
| 24 | gemma-4-26b-a4b | Google | 81.6 | 10.7K | 262K | ¥0.94 / ¥2.88 |
| 25 | grok-4.20-beta-0309-reasoning | xAI | 80.8 | 9.6K | 2M | ¥14.4 / ¥43.2 |
| 26 | kimi-k2.5-instant | Moonshot | 80.0 | 3.9K | 262K | ¥4.32 / ¥21.6 |
| 27 | grok-4.20-multi-agent-beta-0309 | xAI | 79.2 | 8.9K | 2M | ¥14.4 / ¥43.2 |
| 28 | gemini-2.5-flash-preview-09-2025 | Google | 78.4 | 4.7K | 1M | ¥2.16 / ¥18 |
| 29 | gpt-5.1-high | OpenAI | 77.6 | 9.2K | 400K | ¥9 / ¥72 |
| 30 | ernie-5.0-preview-1220 | Baidu | 76.8 | 3.2K | 128K | ¥7.92 / ¥14.4 |
| 31 | gpt-5.5-instant | OpenAI | 76.0 | 3.6K | 400K | ¥9 / ¥72 |
| 32 | qwen3-vl-235b-a22b-instruct | Alibaba | 75.2 | 12.2K | 128K | ¥2.16 / ¥8.64 |
| 33 | gpt-5.2-high | OpenAI | 74.4 | 15.2K | 400K | ¥12.6 / ¥101 |
| 34 | gpt-5.4-mini-high | OpenAI | 73.6 | 8.6K | 400K | ¥5.4 / ¥32.4 |
| 35 | chatgpt-4o-latest-20250326 | OpenAI | 72.8 | 23.4K | 128K | ¥18 / ¥72 |
| 36 | mimo-v2.5 | Xiaomi | 72.0 | 6.9K | 1.05M | ¥2.88 / ¥14.4 |
| 37 | gemini-3.1-flash-lite-preview | Google | 71.2 | 14K | 1.05M | ¥1.8 / ¥10.8 |
| 38 | grok-4.3 | xAI | 70.4 | 3.9K | 1M | ¥9 / ¥18 |
| 39 | qwen3.5-122b-a10b | Alibaba | 69.6 | 10.1K | 262K | ¥2.88 / ¥23 |
| 40 | qwen3.5-27b | Alibaba | 68.8 | 9.4K | 262K | ¥2.16 / ¥17.3 |
| 41 | gemini-2.5-flash | Google | 68.0 | 57.5K | 1.05M | ¥2.16 / ¥18 |
| 42 | gpt-5.1 | OpenAI | 67.2 | 10.1K | 400K | ¥9 / ¥72 |
| 43 | gpt-5-chat | OpenAI | 66.4 | 40.8K | 400K | ¥9 / ¥72 |
| 44 | gpt-5.2 | OpenAI | 65.6 | 16.1K | 400K | ¥12.6 / ¥101 |
| 45 | mimo-v2-omni | Xiaomi | 64.8 | 7.5K | 262K | ¥2.88 / ¥14.4 |
| 46 | o3-2025-04-16 | OpenAI | 64.0 | 46.7K | 200K | ¥14.4 / ¥57.6 |
| 47 | qwen-vl-max-2025-08-13 | Alibaba | 63.2 | 3.2K | 131K | ¥1.66 / ¥4.13 |
| 48 | gpt-5-high | OpenAI | 62.4 | 35.1K | 400K | ¥9 / ¥72 |
| 49 | gpt-4.1-2025-04-14 | OpenAI | 61.6 | 42.1K | 1.05M | ¥14.4 / ¥57.6 |
| 50 | grok-4-0709 | xAI | 60.8 | 32.8K | 256K | ¥21.6 / ¥108 |
| 51 | qwen3-vl-235b-a22b-thinking | Alibaba | 60.0 | 2.4K | 131K | ¥2.06 / ¥8.26 |
| 52 | gpt-5-mini-high | OpenAI | 59.2 | 29.6K | 400K | ¥1.8 / ¥14.4 |
| 53 | gemini-2.5-flash-lite-preview-09-2025-no-thinking | Google | 58.4 | 4.8K | 1.05M | ¥0.72 / ¥2.88 |
| 54 | grok-4-1-fast-reasoning | xAI | 57.6 | 10.6K | 2M | ¥1.44 / ¥3.6 |
| 55 | gpt-5.4-nano-high | OpenAI | 56.8 | 8.6K | 400K | ¥1.44 / ¥9 |
| 56 | gpt-4.5-preview-2025-02-27 | OpenAI | 56.0 | 2.9K | 8.19K | ¥216 / ¥432 |
| 57 | o4-mini-2025-04-16 | OpenAI | 55.2 | 42.4K | 200K | ¥7.92 / ¥31.7 |
| 58 | claude-sonnet-4-20250514-thinking-32k | Anthropic | 54.4 | 1.2K | 200K | ¥21.6 / ¥108 |
| 59 | claude-opus-4-20250514-thinking-16k | Anthropic | 53.6 | 1.3K | 200K | ¥108 / ¥540 |
| 60 | gemini-2.5-flash-lite-preview-06-17-thinking | Google | 52.8 | 36.9K | 65.5K | ¥0.72 / ¥2.88 |
| 61 | step-1o-turbo-202506 | StepFun | 52.0 | 1.9K | - | - |
| 62 | hunyuan-vision-1.5-thinking | Tencent | 51.2 | 2.4K | - | - |
| 63 | gpt-4.1-mini-2025-04-14 | OpenAI | 50.4 | 41.4K | 1.05M | ¥2.88 / ¥11.5 |
| 64 | hunyuan-large-vision | Tencent | 49.6 | 1.3K | - | - |
| 65 | step-3 | StepFun | 48.8 | 3.3K | 65.5K | ¥1.8 / ¥4.68 |
| 66 | claude-opus-4-20250514 | Anthropic | 48.0 | 2.2K | 200K | ¥108 / ¥540 |
| 67 | claude-sonnet-4-20250514 | Anthropic | 47.2 | 1.9K | 200K | ¥21.6 / ¥108 |
| 68 | mistral-medium-2508 | Mistral | 46.4 | 42.2K | 262K | ¥2.88 / ¥14.4 |
| 69 | claude-3-7-sonnet-20250219-thinking-32k | Anthropic | 45.6 | 1.5K | - | - |
| 70 | o1-2024-12-17 | OpenAI | 44.8 | 3.7K | 128K | ¥108 / ¥432 |
| 71 | glm-4.6v | Z.ai | 44.0 | 2.3K | 128K | ¥2.16 / ¥6.48 |
| 72 | gemma-3-27b-it | Google | 43.2 | 17.4K | 128K | ¥2.15 / ¥2.15 |
| 73 | gpt-5-nano-high | OpenAI | 42.4 | 4K | 400K | ¥0.36 / ¥2.88 |
| 74 | gemini-1.5-pro-002 | Google | 41.6 | 8.9K | - | - |
| 75 | gemini-2.0-flash-001 | Google | 40.8 | 10.2K | 1.05M | ¥1.08 / ¥4.32 |
| 76 | mistral-medium-2505 | Mistral | 40.0 | 11K | 262K | ¥2.88 / ¥14.4 |
| 77 | glm-4.5v | Z.ai | 39.2 | 3.3K | 64K | ¥4.32 / ¥13 |
| 78 | qwen2.5-vl-32b-instruct | Alibaba | 38.4 | 1.5K | 16.4K | ¥0.39 / ¥1.57 |
| 79 | claude-3-7-sonnet-20250219 | Anthropic | 37.6 | 4.6K | 200K | ¥21.6 / ¥108 |
| 80 | llama-4-maverick-17b-128e-instruct | Meta | 36.8 | 7.1K | 1M | ¥1.8 / ¥6.26 |
| 81 | mistral-small-2506 | Mistral | 36.0 | 11.2K | 262K | ¥2.88 / ¥14.4 |
| 82 | gemini-1.5-flash-002 | Google | 35.2 | 7.2K | 2M | ¥0.54 / ¥2.2 |
| 83 | mistral-small-3.1-24b-instruct-2503 | Mistral | 34.4 | 29.4K | 262K | ¥2.88 / ¥14.4 |
| 84 | gpt-4o-2024-05-13 | OpenAI | 33.6 | 23.3K | 128K | ¥36 / ¥108 |
| 85 | claude-3-5-sonnet-20241022 | Anthropic | 32.8 | 10.4K | 200K | ¥21.6 / ¥108 |
| 86 | step-1o-vision-32k-highres | StepFun | 32.0 | 2.8K | - | - |
| 87 | claude-3-5-sonnet-20240620 | Anthropic | 31.2 | 21.6K | 200K | ¥21.6 / ¥108 |
| 88 | llama-4-scout-17b-16e-instruct | Meta | 30.4 | 6.6K | 128K | ¥1.44 / ¥5.62 |
| 89 | qwen2.5-vl-72b-instruct | Alibaba | 29.6 | 3.8K | 131K | ¥16.5 / ¥49.5 |
| 90 | gemini-2.0-flash-lite-preview-02-05 | Google | 28.8 | 4K | 1.05M | ¥0.54 / ¥2.16 |
| 91 | claude-3-5-haiku-20241022 | Anthropic | 28.0 | 1.4K | 200K | ¥5.76 / ¥28.8 |
| 92 | gpt-4-turbo-2024-04-09 | OpenAI | 27.2 | 13.4K | 128K | ¥72 / ¥216 |
| 93 | pixtral-large-2411 | Mistral | 26.4 | 5.4K | 128K | ¥14.4 / ¥43.2 |
| 94 | gemini-1.5-pro-001 | Google | 25.6 | 16.7K | - | - |
| 95 | molmo-2-8b | AllenAI | 24.8 | 1.2K | - | - |
| 96 | gpt-4o-mini-2024-07-18 | OpenAI | 24.0 | 17.3K | 128K | ¥1.08 / ¥4.32 |
| 97 | gpt-4o-2024-08-06 | OpenAI | 23.2 | 3.4K | 128K | ¥18 / ¥72 |
| 98 | gpt-4.1-nano-2025-04-14 | OpenAI | 22.4 | 1.2K | 1.05M | ¥14.4 / ¥57.6 |
| 99 | qwen-vl-max-1119 | Alibaba | 21.6 | 1.4K | 131K | ¥1.66 / ¥4.13 |
| 100 | qwen2-vl-72b | Alibaba | 20.8 | 5.9K | - | - |
| 101 | step-1v-32k | StepFun | 20.0 | 1.5K | 32.8K | ¥14.8 / ¥69 |
| 102 | gemini-1.5-flash-8b-001 | Google | 19.2 | 6.2K | 2M | ¥0.54 / ¥2.2 |
| 103 | claude-3-opus-20240229 | Anthropic | 18.4 | 15.6K | 200K | ¥108 / ¥540 |
| 104 | molmo-72b-0924 | AllenAI | 17.6 | 3K | - | - |
| 105 | pixtral-12b-2409 | Mistral | 16.8 | 7.5K | 128K | ¥1.08 / ¥1.08 |
| 106 | gemini-1.5-flash-001 | Google | 16.0 | 13.3K | 2M | ¥0.54 / ¥2.2 |
| 107 | internvl2-26b | - | 15.2 | 5.1K | - | - |
| 108 | llama-3.2-vision-90b-instruct | Meta | 14.4 | 8.7K | 131K | ¥2.48 / ¥2.48 |
| 109 | hunyuan-standard-vision-2024-12-31 | Tencent | 13.6 | 809 | - | - |
| 110 | c4ai-aya-vision-32b | Cohere | 12.8 | 847 | - | - |
| 111 | amazon-nova-lite-v1.0 | Amazon | 12.0 | 1.9K | 300K | ¥0.43 / ¥1.73 |
| 112 | qwen2-vl-7b-instruct | Alibaba | 11.2 | 5.8K | 131K | ¥2.07 / ¥5.16 |
| 113 | claude-3-sonnet-20240229 | Anthropic | 10.4 | 12.3K | 200K | ¥21.6 / ¥108 |
| 114 | amazon-nova-pro-v1.0 | Amazon | 9.6 | 2.3K | 300K | ¥5.76 / ¥23 |
| 115 | yi-vision | - | 8.8 | 1.2K | - | - |
| 116 | llama-3.2-vision-11b-instruct | Meta | 8.0 | 4.8K | 131K | ¥2.48 / ¥2.48 |
| 117 | molmo-7b-d-0924 | AllenAI | 7.2 | 2.8K | - | - |
| 118 | claude-3-haiku-20240307 | Anthropic | 6.4 | 13.4K | 200K | ¥1.8 / ¥9 |
| 119 | internvl2-4b | - | 5.6 | 3.7K | - | - |
| 120 | nvila-internal-15b-v1 | NVIDIA | 4.8 | 1.1K | - | - |
| 121 | llava-v1.6-34b | - | 4.0 | 4.5K | - | - |
| 122 | cogvlm2-llama3-chat-19b | Z.ai | 3.2 | 2K | - | - |
| 123 | llava-onevision-qwen2-72b-ov | - | 2.4 | 1.3K | - | - |
| 124 | minicpm-v-2_6 | - | 1.6 | 2K | - | - |
| 125 | phi-3.5-vision-instruct | Microsoft | 0.8 | 2.6K | 128K | ¥1.15 / ¥4.61 |
| 126 | phi-3-vision-128k-instruct | Microsoft | 0.0 | 1.4K | 128K | ¥1.08 / ¥4.32 |

Top model analysis

Why claude-opus-4-7 ranks first

claude-opus-4-7 ranks first with a percent score of 100.0, based on 6.8K samples. Treat it as the default first candidate for this leaderboard, then compare its price, context length, and availability against the models ranked just below it.
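
The percent scores listed on this page fall in even steps of 0.8 from 100.0 (rank 1) to 0.0 (rank 126), which is consistent with a linear rescaling of rank. The sketch below reproduces that pattern. It is an observation drawn from the published numbers, not a formula the page documents, and the function name `percent_score` is hypothetical.

```python
def percent_score(rank: int, n_models: int) -> float:
    """Rescale a 1-based rank to 0-100 (assumed formula, inferred from the listed scores)."""
    if n_models < 2 or not 1 <= rank <= n_models:
        raise ValueError("rank must lie in 1..n_models and n_models must be at least 2")
    # Rank 1 maps to 100.0, the last rank maps to 0.0, with equal steps in between.
    return round(100 * (n_models - rank) / (n_models - 1), 1)


# Spot-check against a few rows listed on this page (126 models in total).
for rank in (1, 2, 64, 126):
    print(rank, percent_score(rank, 126))
```

If the assumption holds, the gap between adjacent scores carries no information beyond rank order, so a 0.8-point difference should not be read as a measured performance margin.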

How to choose

Do not look only at rank #1

Start with the leaderboard closest to your task. Compare the top models by score and sample size, then check price, context length, open or closed access, and provider availability.
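
The selection steps above can be sketched as a simple filter over leaderboard rows. This is a minimal illustration, assuming hypothetical names (`Entry`, `shortlist`) and arbitrary thresholds; the six rows are copied from this leaderboard, with `None` standing in for the "-" cells. Prices are the ¥ figures as listed, and the page does not state the unit they are quoted per.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    model: str
    score: float
    samples: int
    context: Optional[int]      # context length in tokens; None where the page shows "-"
    price_in: Optional[float]   # listed input price in ¥; None where the page shows "-"
    price_out: Optional[float]  # listed output price in ¥; None where the page shows "-"


ENTRIES = [
    Entry("claude-opus-4-7", 100.0, 6_800, 1_000_000, 36.0, 180.0),
    Entry("muse-spark", 96.8, 4_500, None, None, None),
    Entry("gemini-3-pro", 96.0, 13_200, 1_050_000, 14.4, 86.4),
    Entry("gpt-5.4-high", 94.4, 5_900, 1_050_000, 18.0, 108.0),
    Entry("gemini-3-flash", 91.2, 21_200, 1_050_000, 3.6, 21.6),
    Entry("kimi-k2.6", 89.6, 5_600, 262_000, 6.84, 28.8),
]


def shortlist(entries: List[Entry], min_score: float, min_samples: int,
              max_price_out: float, min_context: int) -> List[Entry]:
    """Keep entries that meet every constraint, best score first."""
    keep = [
        e for e in entries
        if e.score >= min_score
        and e.samples >= min_samples
        # Entries with unknown price or context are excluded rather than guessed at.
        and e.price_out is not None and e.price_out <= max_price_out
        and e.context is not None and e.context >= min_context
    ]
    return sorted(keep, key=lambda e: e.score, reverse=True)


picks = shortlist(ENTRIES, min_score=90.0, min_samples=5_000,
                  max_price_out=110.0, min_context=500_000)
print([e.model for e in picks])
# prints ['gemini-3-pro', 'gpt-5.4-high', 'gemini-3-flash']
```

With these example thresholds the top-ranked model drops out on output price, which is the point of the heading: rank alone does not decide the choice.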

FAQ

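Which metrics matter on the Overall leaderboard?

Look mainly at rank, percent score, sample size, and source. The score gives a quick comparison of models within the same leaderboard; the sample size indicates how stable the result is.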
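
Because sample size is the stability signal here, it helps to turn the abbreviated counts on this page ("6.8K", "809") into integers before comparing them. The sketch below does that and flags entries under a cutoff. The helper names and the 1,000-sample cutoff are assumptions for illustration, not values taken from the leaderboard.

```python
from typing import List, Tuple


def parse_count(text: str) -> int:
    """Convert an abbreviated count such as '6.8K', '1.05M', or '809' to an integer."""
    multipliers = {"K": 1_000, "M": 1_000_000}
    text = text.strip().upper()
    if text and text[-1] in multipliers:
        # round() absorbs floating-point noise such as 87.2 * 1000 = 87200.00000000001
        return round(float(text[:-1]) * multipliers[text[-1]])
    return int(text)


def low_sample_models(rows: List[Tuple[str, str]], threshold: int) -> List[str]:
    """Return model names whose sample count falls below the threshold."""
    return [name for name, samples in rows if parse_count(samples) < threshold]


# A few (model, samples) pairs as listed on this page.
rows = [
    ("claude-opus-4-7", "6.8K"),
    ("gemini-2.5-pro", "87.2K"),
    ("hunyuan-standard-vision-2024-12-31", "809"),
    ("c4ai-aya-vision-32b", "847"),
]

# The 1,000-sample cutoff is an arbitrary example value.
print(low_sample_models(rows, threshold=1_000))
# prints ['hunyuan-standard-vision-2024-12-31', 'c4ai-aya-vision-32b']
```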

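Why can't different leaderboards be merged directly into one total score?

Different leaderboards use different tasks, samples, and evaluation criteria. By default, 模力榜 ranks models only within a single leaderboard, to avoid forcing capabilities such as writing, coding, and image tasks into one number.

How should I choose a model from the Overall leaderboard?

Start with the leaderboard closest to your task, then factor in price, context length, open or closed source, and provider availability. A high rank does not mean a model suits every budget or deployment setup.

How often is the leaderboard updated?

The page shows the most recent successfully collected public leaderboard data. It currently prefers the LMArena leaderboard dataset and keeps the original links in the page's source section.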