Chat · Text · Longer Query Leaderboard

Ranking for Text / Longer Query, based on public preference data.

Top 5: claude-opus-4-6-thinking, claude-opus-4-6, claude-opus-4-7-thinking, claude-opus-4-7, gpt-5.5-high
Models: 338
Published: 2026/05/27
Source: Arena public preference evaluation
Original leaderboard: Text / Longer Query
Leaderboard dataset: LMArena latest parquet

Each entry below lists one value per line, in this order: rank, model, provider, percent score, samples, context length, and price (input / output). A "-" marks a value that is not listed.

1
claude-opus-4-6-thinking
Anthropic
100.0
11.9K
1M
¥36 / ¥180Input/Output
2
claude-opus-4-6
Anthropic
99.7
13.1K
1M
¥36 / ¥180Input/Output
3
claude-opus-4-7-thinking
Anthropic
99.4
8.3K
1M
¥36 / ¥180Input/Output
4
claude-opus-4-7
Anthropic
99.1
8.7K
1M
¥36 / ¥180Input/Output
5
gpt-5.5-high
Openai
98.8
6.8K
1.05M
¥36 / ¥216Input/Output
6
qwen3.7-max-preview
Alibaba
98.5
1.6K
1M
¥18 / ¥54Input/Output
7
gemini-3.1-pro-preview
Google
98.2
15.9K
1.05M
¥14.4 / ¥86.4Input/Output
8
claude-sonnet-4-6
Anthropic
97.9
10K
1M
¥21.6 / ¥108Input/Output
9
mimo-v2.5-pro
Xiaomi
97.6
6.3K
1.05M
¥7.2 / ¥21.6Input/Output
10
gemini-3.5-flash
Google
97.3
3.7K
1.05M
¥10.8 / ¥64.8Input/Output
11
gpt-5.4-high
Openai
97.0
10.6K
1.05M
¥18 / ¥108Input/Output
12
claude-opus-4-5-20251101
Anthropic
96.7
20.1K
200K
¥36 / ¥180Input/Output
13
claude-opus-4-5-20251101-thinking-32k
Anthropic
96.4
9.3K
200K
¥108 / ¥540Input/Output
14
claude-sonnet-4-5-20250929
Anthropic
96.1
22.4K
200K
¥21.6 / ¥108Input/Output
15
glm-5.1
Zai
95.8
5.4K
200K
¥0 / ¥0Input/Output
16
qwen3.5-max-preview
Alibaba
95.5
7.4K
-
-
17
gpt-5.5
Openai
95.3
7K
1.05M
¥36 / ¥216Input/Output
18
claude-sonnet-4-5-20250929-thinking-32k
Anthropic
95.0
22.1K
200K
¥21.6 / ¥108Input/Output
19
gemini-3-pro
Google
94.7
10.5K
1.05M
¥14.4 / ¥86.4Input/Output
20
gpt-5.4
Openai
94.4
11.3K
1.05M
¥18 / ¥108Input/Output
21
ernie-5.1
Baidu
94.1
5.5K
119K
¥5.4 / ¥21.6Input/Output
22
kimi-k2.6
Moonshot
93.8
6.1K
262K
¥6.84 / ¥28.8Input/Output
23
claude-opus-4-1-20250805-thinking-16k
Anthropic
93.5
11.1K
200K
¥108 / ¥540Input/Output
24
mimo-v2-pro
Xiaomi
93.2
8.2K
1.05M
¥7.2 / ¥21.6Input/Output
25
qwen3.6-max-preview
Alibaba
92.9
1.8K
246K
¥9.5 / ¥56.9Input/Output
26
deepseek-v4-pro-thinking
Deepseek
92.6
6.4K
1M
¥3.13 / ¥6.26Input/Output
27
deepseek-v4-pro
Deepseek
92.3
6.8K
1M
¥3.13 / ¥6.26Input/Output
28
gemini-3-flash
Google
92.0
7.9K
1.05M
¥3.6 / ¥21.6Input/Output
29
gemini-2.5-pro
Google
91.7
31.2K
1.05M
¥9 / ¥72Input/Output
30
glm-5
Zai
91.4
7.2K
205K
¥7.2 / ¥23Input/Output
31
muse-spark
Meta
91.1
4.4K
-
-
32
gpt-5.1-high
Openai
90.8
10.3K
400K
¥9 / ¥72Input/Output
33
mimo-v2.5
Xiaomi
90.5
6.7K
1.05M
¥2.88 / ¥14.4Input/Output
34
claude-opus-4-1-20250805
Anthropic
90.2
17.7K
200K
¥108 / ¥540Input/Output
35
gemma-4-31b
Google
89.9
1.6K
262K
¥3.24 / ¥7.2Input/Output
36
qwen3.5-397b-a17b
Alibaba
89.6
11.6K
262K
¥3.1 / ¥18.6Input/Output
37
kimi-k2.5-thinking
Moonshot
89.3
12K
262K
¥4.32 / ¥21.6Input/Output
38
grok-3-preview-02-24
Xai
89.0
4.9K
1M
¥9 / ¥18Input/Output
39
qwen3-max-preview
Alibaba
88.7
6K
262K
¥6.2 / ¥24.8Input/Output
40
qwen3.6-plus
Alibaba
88.4
7.2K
1M
¥3.6 / ¥21.6Input/Output
41
kimi-k2.5-instant
Moonshot
88.1
2.2K
262K
¥4.32 / ¥21.6Input/Output
42
amazon-nova-experimental-chat-26-02-10
Amazon
87.8
947
-
-
43
glm-4.7
Zai
87.5
3.1K
205K
¥0 / ¥0Input/Output
44
grok-4.20-beta-0309-reasoning
Xai
87.2
10.9K
2M
¥14.4 / ¥43.2Input/Output
45
gemini-3-flash (thinking-minimal)
Google
86.9
17.4K
1.05M
¥3.6 / ¥21.6Input/Output
46
gpt-5.1
Openai
86.6
11.5K
400K
¥9 / ¥72Input/Output
47
grok-4.20-multi-agent-beta-0309
Xai
86.4
10.7K
2M
¥14.4 / ¥43.2Input/Output
48
deepseek-v4-flash
Deepseek
86.1
6.9K
1M
¥1.01 / ¥2.02Input/Output
49
gemma-4-26b-a4b
Google
85.8
1.5K
262K
¥0.94 / ¥2.88Input/Output
50
dola-seed-2.0-pro
Bytedance
85.5
13.1K
-
-
51
claude-haiku-4-5-20251001
Anthropic
85.2
22.9K
200K
¥7.2 / ¥36Input/Output
52
gpt-5.2-chat-latest-20260210
Openai
84.9
11.5K
400K
¥12.6 / ¥101Input/Output
53
deepseek-v3.2
Deepseek
84.6
12.2K
128K
¥2.09 / ¥3.1Input/Output
54
qwen3-235b-a22b-instruct-2507
Alibaba
84.3
24.6K
128K
¥2.09 / ¥8.23Input/Output
55
grok-4.20-beta1
Xai
84.0
8.2K
2M
¥14.4 / ¥43.2Input/Output
56
deepseek-v3.2-thinking
Deepseek
83.7
10.4K
128K
¥2.09 / ¥3.1Input/Output
57
glm-4.6
Zai
83.4
8.8K
205K
¥4.32 / ¥15.8Input/Output
58
deepseek-v4-flash-thinking
Deepseek
83.1
6.8K
1M
¥1.01 / ¥2.02Input/Output
59
ernie-5.0-0110
Baidu
82.8
10.3K
128K
¥7.92 / ¥14.4Input/Output
60
longcat-flash-chat-2602-exp
Meituan
82.5
8.5K
128K
¥1.08 / ¥10.8Input/Output
61
qwen3-vl-235b-a22b-instruct
Alibaba
82.2
2.2K
128K
¥2.16 / ¥8.64Input/Output
62
deepseek-v3.1-thinking
Deepseek
81.9
2.4K
128K
¥1.44 / ¥5.04Input/Output
63
gpt-5.5-instant
Openai
81.6
11.1K
400K
¥9 / ¥72Input/Output
64
claude-opus-4-20250514-thinking-16k
Anthropic
81.3
6.9K
200K
¥108 / ¥540Input/Output
65
deepseek-v3.1-terminus-thinking
Deepseek
81.0
680
128K
¥1.8 / ¥5.04Input/Output
66
mimo-v2-omni
Xiaomi
80.7
1.3K
262K
¥2.88 / ¥14.4Input/Output
67
gemini-2.5-flash
Google
80.4
30.5K
1.05M
¥2.16 / ¥18Input/Output
68
deepseek-v3.2-exp
Deepseek
80.1
2.9K
128K
¥0 / ¥0Input/Output
69
grok-4-fast-chat
Xai
79.8
1.5K
2M
¥1.44 / ¥3.6Input/Output
70
minimax-m2.1-preview
Minimax
79.5
4.3K
205K
¥0 / ¥0Input/Output
71
minimax-m2.7
Minimax
79.2
8.4K
205K
¥0 / ¥0Input/Output
72
grok-4.1
Xai
78.9
18.6K
200K
¥14.4 / ¥72Input/Output
73
kimi-k2-thinking-turbo
Moonshot
78.6
17.5K
262K
¥17.3 / ¥72Input/Output
74
chatgpt-4o-latest-20250326
Openai
78.3
17.3K
128K
¥18 / ¥72Input/Output
75
gpt-5.2
Openai
78.0
15.7K
400K
¥12.6 / ¥101Input/Output
76
qwen3-max-2025-09-23
Alibaba
77.7
2.1K
258K
¥6.19 / ¥24.7Input/Output
77
qwen3.5-27b
Alibaba
77.4
9.2K
262K
¥2.16 / ¥17.3Input/Output
78
mistral-large-3
Mistral
77.2
11.9K
262K
¥3.6 / ¥10.8Input/Output
79
glm-4.5
Zai
76.9
5K
131K
¥4.32 / ¥15.8Input/Output
80
hunyuan-hy3-preview
Tencent
76.6
2.3K
256K
¥0 / ¥0Input/Output
81
grok-4-fast-reasoning
Xai
76.3
4.5K
2M
¥1.44 / ¥3.6Input/Output
82
gpt-5.2-high
Openai
76.0
14.9K
400K
¥12.6 / ¥101Input/Output
83
grok-4-0709
Xai
75.7
8.8K
256K
¥21.6 / ¥108Input/Output
84
grok-4.1-thinking
Xai
75.4
18.6K
200K
¥14.4 / ¥72Input/Output
85
ernie-5.0-preview-1203
Baidu
75.1
2.5K
128K
¥7.92 / ¥14.4Input/Output
86
gpt-5.4-mini-high
Openai
74.8
10.2K
400K
¥5.4 / ¥32.4Input/Output
87
mimo-v2-flash (non-thinking)
Xiaomi
74.5
14K
262K
¥0.72 / ¥2.16Input/Output
88
mistral-medium-2508
Mistral
74.2
25.2K
262K
¥2.88 / ¥14.4Input/Output
89
deepseek-v3.2-exp-thinking
Deepseek
73.9
2.1K
128K
¥0 / ¥0Input/Output
90
ernie-5.0-preview-1022
Baidu
73.6
1.2K
128K
¥7.92 / ¥14.4Input/Output
91
qwen3.5-122b-a10b
Alibaba
73.3
9.6K
262K
¥2.88 / ¥23Input/Output
92
gemini-2.5-flash-preview-09-2025
Google
73.0
7.9K
1M
¥2.16 / ¥18Input/Output
93
gpt-4.5-preview-2025-02-27
Openai
72.7
1.8K
8.19K
¥216 / ¥432Input/Output
94
step-3.5-flash
Stepfun
72.4
11.3K
256K
¥0.69 / ¥2.07Input/Output
95
qwen3-next-80b-a3b-instruct
Alibaba
72.1
5.2K
131K
¥1.04 / ¥4.13Input/Output
96
grok-4.3
Xai
71.8
6.6K
1M
¥9 / ¥18Input/Output
97
claude-opus-4-20250514
Anthropic
71.5
8.4K
200K
¥108 / ¥540Input/Output
98
amazon-nova-experimental-chat-26-01-10
Amazon
71.2
886
-
-
99
gpt-5-chat
Openai
70.9
6.7K
400K
¥9 / ¥72Input/Output
100
deepseek-v3.1
Deepseek
70.6
3.2K
128K
¥1.44 / ¥5.04Input/Output
101
hunyuan-vision-1.5-thinking
Tencent
70.3
506
-
-
102
qwen3-235b-a22b-thinking-2507
Alibaba
70.0
1.6K
131K
¥2.07 / ¥8.26Input/Output
103
claude-sonnet-4-20250514-thinking-32k
Anthropic
69.7
6.7K
200K
¥21.6 / ¥108Input/Output
104
amazon-nova-experimental-chat-11-10
Amazon
69.4
6.1K
-
-
105
deepseek-v3.1-terminus
Deepseek
69.1
846
128K
¥1.8 / ¥5.04Input/Output
106
mimo-v2-flash (thinking)
Xiaomi
68.8
2.8K
262K
¥0.72 / ¥2.16Input/Output
107
gemini-3.1-flash-lite-preview
Google
68.5
12.7K
1.05M
¥1.8 / ¥10.8Input/Output
108
gpt-5.3-chat-latest
Openai
68.2
11K
128K
¥12.6 / ¥101Input/Output
109
qwen3-vl-235b-a22b-thinking
Alibaba
68.0
1.7K
131K
¥2.06 / ¥8.26Input/Output
110
qwen3.5-35b-a3b
Alibaba
67.7
9.9K
262K
¥1.8 / ¥14.4Input/Output
111
qwen3.5-flash
Alibaba
67.4
10.5K
1M
¥1.24 / ¥12.4Input/Output
112
deepseek-r1-0528
Deepseek
67.1
3.1K
164K
¥3.6 / ¥15.5Input/Output
113
grok-4-1-fast-reasoning
Xai
66.8
16K
2M
¥1.44 / ¥3.6Input/Output
114
longcat-flash-chat
Meituan
66.5
2.3K
128K
¥1.08 / ¥10.8Input/Output
115
qwen3-235b-a22b-no-thinking
Alibaba
66.2
7.1K
131K
¥2.07 / ¥8.26Input/Output
116
amazon-nova-experimental-chat-12-10
Amazon
65.9
760
-
-
117
gpt-5-high
Openai
65.6
6.8K
400K
¥9 / ¥72Input/Output
118
hunyuan-t1-20250711
Tencent
65.3
889
131K
¥0 / ¥0Input/Output
119
gpt-4.1-2025-04-14
Openai
65.0
9.8K
1.05M
¥14.4 / ¥57.6Input/Output
120
claude-sonnet-4-20250514
Anthropic
64.7
7.7K
200K
¥21.6 / ¥108Input/Output
121
qwen3-coder-480b-a35b-instruct
Alibaba
64.4
4.9K
262K
¥6.2 / ¥24.8Input/Output
122
qwen3-30b-a3b-instruct-2507
Alibaba
64.1
4.9K
262K
¥2.16 / ¥3.6Input/Output
123
o1-2024-12-17
Openai
63.8
3.6K
128K
¥108 / ¥432Input/Output
124
gemini-2.5-flash-lite-preview-06-17-thinking
Google
63.5
6.3K
65.5K
¥0.72 / ¥2.88Input/Output
125
claude-3-7-sonnet-20250219-thinking-32k
Anthropic
63.2
6.2K
-
-
126
grok-3-mini-high
Xai
62.9
3.3K
128K
¥0 / ¥0Input/Output
127
gemini-2.5-flash-lite-preview-09-2025-no-thinking
Google
62.6
11.5K
1.05M
¥0.72 / ¥2.88Input/Output
128
o3-2025-04-16
Openai
62.3
11.5K
200K
¥14.4 / ¥57.6Input/Output
129
amazon-nova-experimental-chat-10-20
Amazon
62.0
2.6K
-
-
130
nvidia-nemotron-3-super-120b-a12b
Nvidia
61.7
2K
262K
¥1.44 / ¥5.76Input/Output
131
gpt-5.4-nano-high
Openai
61.4
10K
400K
¥1.44 / ¥9Input/Output
132
glm-4.5-air
Zai
61.1
6.6K
131K
¥0 / ¥0Input/Output
133
minimax-m2.5
Minimax
60.8
12.6K
205K
¥0 / ¥0Input/Output
134
hunyuan-turbos-20250416
Tencent
60.5
1.5K
131K
¥0 / ¥0Input/Output
135
mistral-medium-2505
Mistral
60.2
5.9K
262K
¥2.88 / ¥14.4Input/Output
136
glm-4.6v
Zai
59.9
663
128K
¥2.16 / ¥6.48Input/Output
137
qwen2.5-max
Alibaba
59.6
4.5K
32K
¥11.5 / ¥46Input/Output
138
grok-3-mini-beta
Xai
59.3
4.1K
1M
¥9 / ¥18Input/Output
139
gpt-5-mini-high
Openai
59.1
5.8K
400K
¥1.8 / ¥14.4Input/Output
140
deepseek-r1
Deepseek
58.8
2.3K
164K
¥5.04 / ¥18Input/Output
141
claude-3-7-sonnet-20250219
Anthropic
58.5
6.7K
200K
¥21.6 / ¥108Input/Output
142
trinity-large-preview
-
58.2
9.7K
262K
¥1.8 / ¥6.48Input/Output
143
deepseek-v3-0324
Deepseek
57.9
8.5K
75K
¥1.44 / ¥5.76Input/Output
144
qwen3-next-80b-a3b-thinking
Alibaba
57.6
2.8K
131K
¥1.04 / ¥10.3Input/Output
145
qwen3-235b-a22b
Alibaba
57.3
4.6K
131K
¥2.07 / ¥8.26Input/Output
146
kimi-k2-0905-preview
Moonshot
57.0
2.5K
262K
¥4.32 / ¥18Input/Output
147
glm-4.7-flash
Zai
56.7
3.2K
200K
¥0 / ¥0Input/Output
148
step-1o-turbo-202506
Stepfun
56.4
1.3K
-
-
149
gemini-2.0-flash-001
Google
56.1
6.5K
1.05M
¥1.08 / ¥4.32Input/Output
150
o1-preview
Openai
55.8
4.6K
128K
¥108 / ¥432Input/Output
151
gpt-4.1-mini-2025-04-14
Openai
55.5
7K
1.05M
¥2.88 / ¥11.5Input/Output
152
deepseek-v3
Deepseek
55.2
3.1K
128K
¥0 / ¥0Input/Output
153
o3-mini-high
Openai
54.9
2.2K
200K
¥7.92 / ¥31.7Input/Output
154
hunyuan-large-2025-02-10
Tencent
54.6
371
-
-
155
nova-2-lite
Amazon
54.3
2.9K
128K
¥2.38 / ¥19.8Input/Output
156
minimax-m2
Minimax
54.0
1.6K
197K
¥0 / ¥0Input/Output
157
command-a-03-2025
Cohere
53.7
10.5K
256K
¥18 / ¥72Input/Output
158
gemma-3-27b-it
Google
53.4
6.6K
128K
¥2.15 / ¥2.15Input/Output
159
amazon-nova-experimental-chat-10-09
Amazon
53.1
575
-
-
160
mistral-small-2506
Mistral
52.8
3.4K
262K
¥2.88 / ¥14.4Input/Output
161
minimax-m1
Minimax
52.5
7K
1M
¥0.95 / ¥9.03Input/Output
162
intellect-3
-
52.2
1.3K
131K
¥1.44 / ¥7.92Input/Output
163
mercury-2
Inception Ai
51.9
844
128K
¥1.8 / ¥5.4Input/Output
164
qwen3-32b
Alibaba
51.6
504
131K
¥2.07 / ¥8.26Input/Output
165
o3-mini
Openai
51.3
9.2K
200K
¥7.92 / ¥31.7Input/Output
166
step-3
Stepfun
51.0
1.3K
65.5K
¥1.8 / ¥4.68Input/Output
167
kimi-k2-0711-preview
Moonshot
50.7
5.5K
131K
¥4.32 / ¥18Input/Output
168
qwen-plus-0125
Alibaba
50.4
690
1M
¥0.83 / ¥2.07Input/Output
169
ling-flash-2.0
Ant Group
50.1
1.3K
131K
¥1.01 / ¥4.1Input/Output
170
trinity-large-thinking
-
49.9
9.5K
262K
¥1.8 / ¥6.48Input/Output
171
gemini-2.0-flash-lite-preview-02-05
Google
49.6
2.9K
1.05M
¥0.54 / ¥2.16Input/Output
172
o1-mini
Openai
49.3
7.9K
128K
¥7.92 / ¥31.7Input/Output
173
hunyuan-turbos-20250226
Tencent
49.0
287
131K
¥0 / ¥0Input/Output
174
ring-flash-2.0
Ant Group
48.7
1.3K
131K
¥1.01 / ¥4.1Input/Output
175
gpt-oss-120b
Openai
48.4
6.5K
131K
¥1.08 / ¥4.32Input/Output
176
gemma-3-12b-it
Google
48.1
371
128K
¥1.96 / ¥1.96Input/Output
177
glm-4-plus-0111
Zai
47.8
765
128K
¥72 / ¥72Input/Output
178
nvidia-llama-3.3-nemotron-super-49b-v1.5
Nvidia
47.5
642
131K
¥2.88 / ¥2.88Input/Output
179
o4-mini-2025-04-16
Openai
47.2
8.5K
200K
¥7.92 / ¥31.7Input/Output
180
qwen3-30b-a3b
Alibaba
46.9
4.5K
128K
¥0.79 / ¥7.78Input/Output
181
olmo-3.1-32b-instruct
Allenai
46.6
2.9K
200K
¥14.4 / ¥57.6Input/Output
182
claude-3-5-sonnet-20241022
Anthropic
46.3
13.8K
200K
¥21.6 / ¥108Input/Output
183
gpt-5-nano-high
Openai
46.0
1.6K
400K
¥0.36 / ¥2.88Input/Output
184
qwq-32b
Alibaba
45.7
4K
131K
¥2.07 / ¥6.2Input/Output
185
hunyuan-turbo-0110
Tencent
45.4
313
-
-
186
gemini-1.5-pro-002
Google
45.1
8.4K
-
-
187
step-2-16k-exp-202412
Stepfun
44.8
693
16.4K
¥37.5 / ¥118Input/Output
188
glm-4.5v
Zai
44.5
1.1K
64K
¥4.32 / ¥13Input/Output
189
nvidia-nemotron-3-nano-30b-a3b-bf16
Nvidia
44.2
3.9K
131K
¥0 / ¥0Input/Output
190
deepseek-v2.5-1210
Deepseek
43.9
1K
1M
¥1.01 / ¥2.02Input/Output
191
hunyuan-standard-2025-02-10
Tencent
43.6
432
-
-
192
llama-3.3-nemotron-49b-super-v1
Nvidia
43.3
317
131K
¥0 / ¥0Input/Output
193
mistral-small-3.1-24b-instruct-2503
Mistral
43.0
6.3K
262K
¥2.88 / ¥14.4Input/Output
194
llama-3.1-nemotron-ultra-253b-v1
Nvidia
42.7
321
128K
¥4.32 / ¥13Input/Output
195
yi-lightning
-
42.4
3.6K
12K
¥1.44 / ¥1.44Input/Output
196
magistral-medium-2506
Mistral
42.1
2.4K
128K
¥14.4 / ¥36Input/Output
197
olmo-3-32b-think
Allenai
41.8
1.3K
128K
¥2.16 / ¥3.24Input/Output
198
qwen2.5-plus-1127
Alibaba
41.5
1.6K
-
-
199
gemini-1.5-pro-001
Google
41.2
9.8K
-
-
200
athene-v2-chat
-
40.9
3.7K
-
-
201
gpt-4o-mini-2024-07-18
Openai
40.7
9K
128K
¥1.08 / ¥4.32Input/Output
202
gpt-4o-2024-05-13
Openai
40.4
14.9K
128K
¥36 / ¥108Input/Output
203
qwen-max-0919
Alibaba
40.1
2.4K
131K
¥2.48 / ¥9.91Input/Output
204
granite-4.1-8b
Ibm
39.8
1.3K
131K
¥0.36 / ¥0.72Input/Output
205
glm-4-plus
Zai
39.5
4K
128K
¥54 / ¥54Input/Output
206
gemini-1.5-flash-002
Google
39.2
5.5K
2M
¥0.54 / ¥2.2Input/Output
207
gpt-4o-2024-08-06
Openai
38.9
6.2K
128K
¥18 / ¥72Input/Output
208
gpt-4.1-nano-2025-04-14
Openai
38.6
730
1.05M
¥14.4 / ¥57.6Input/Output
209
qwen2.5-72b-instruct
Alibaba
38.3
6K
131K
¥4.13 / ¥12.4Input/Output
210
llama-4-maverick-17b-128e-instruct
Meta
38.0
7.1K
1M
¥1.8 / ¥6.26Input/Output
211
hunyuan-large-vision
Tencent
37.7
877
-
-
212
gemma-3n-e4b-it
Google
37.4
2.9K
128K
¥0 / ¥0Input/Output
213
grok-2-2024-08-13
Xai
37.1
8.9K
1M
¥9 / ¥18Input/Output
214
claude-3-5-sonnet-20240620
Anthropic
36.8
10.6K
200K
¥21.6 / ¥108Input/Output
215
gemma-3-4b-it
Google
36.5
444
128K
¥1.44 / ¥1.44Input/Output
216
deepseek-v2.5
Deepseek
36.2
3.5K
1M
¥1.01 / ¥2.02Input/Output
217
llama-4-scout-17b-16e-instruct
Meta
35.9
5.8K
128K
¥1.44 / ¥5.62Input/Output
218
grok-2-mini-2024-08-13
Xai
35.6
7K
1M
¥9 / ¥18Input/Output
219
llama-3.1-405b-instruct-bf16
Meta
35.3
5.5K
128K
¥0 / ¥0Input/Output
220
olmo-3.1-32b-think
Allenai
35.0
2K
200K
¥14.4 / ¥57.6Input/Output
221
mercury
Inception Ai
34.7
493
128K
¥1.8 / ¥5.4Input/Output
222
mistral-large-2411
Mistral
34.4
3.6K
128K
¥14.4 / ¥43.2Input/Output
223
claude-3-5-haiku-20241022
Anthropic
34.1
11K
200K
¥5.76 / ¥28.8Input/Output
224
llama-3.1-405b-instruct-fp8
Meta
33.8
8K
128K
¥0 / ¥0Input/Output
225
claude-3-opus-20240229
Anthropic
33.5
23.4K
200K
¥108 / ¥540Input/Output
226
mistral-large-2407
Mistral
33.2
5.9K
131K
¥14.4 / ¥43.2Input/Output
227
llama-3.3-70b-instruct
Meta
32.9
8.3K
128K
¥0 / ¥0Input/Output
228
amazon-nova-pro-v1.0
Amazon
32.6
3.4K
300K
¥5.76 / ¥23Input/Output
229
gpt-4-turbo-2024-04-09
Openai
32.3
11.6K
128K
¥72 / ¥216Input/Output
230
gemini-advanced-0514
Google
32.0
5.5K
-
-
231
gemini-1.5-flash-001
Google
31.8
7.8K
2M
¥0.54 / ¥2.2Input/Output
232
qwen2.5-coder-32b-instruct
Alibaba
31.5
804
131K
¥2.07 / ¥6.2Input/Output
233
gpt-oss-20b
Openai
31.2
2.1K
131K
¥0.32 / ¥1.3Input/Output
234
mistral-small-24b-instruct-2501
Mistral
30.9
1.8K
262K
¥2.88 / ¥14.4Input/Output
235
gpt-4-0125-preview
Openai
30.6
9.7K
8.19K
¥216 / ¥432Input/Output
236
ibm-granite-h-small
Ibm
30.3
1.3K
-
-
237
llama-3.1-70b-instruct
Meta
30.0
7.6K
131K
¥2.88 / ¥2.88Input/Output
238
llama-3.1-nemotron-70b-instruct
Nvidia
29.7
1K
128K
¥0 / ¥0Input/Output
239
gpt-4-1106-preview
Openai
29.4
9.1K
8.19K
¥216 / ¥432Input/Output
240
amazon-nova-lite-v1.0
Amazon
29.1
2.8K
300K
¥0.43 / ¥1.73Input/Output
241
athene-70b-0725
-
28.8
2.1K
-
-
242
gemma-2-27b-it
Google
28.5
9.9K
8.19K
¥0.58 / ¥0.58Input/Output
243
command-r-plus-08-2024
Cohere
28.2
1.4K
128K
¥18 / ¥72Input/Output
244
c4ai-aya-expanse-32b
Cohere
27.9
4.2K
-
-
245
gemma-2-9b-it-simpo
-
27.6
841
8.19K
¥1.44 / ¥1.44Input/Output
246
nemotron-4-340b-instruct
Nvidia
27.3
2.1K
-
-
247
hunyuan-standard-256k
Tencent
27.0
414
-
-
248
llama-3.1-tulu-3-70b
Allenai
26.7
463
-
-
249
reka-core-20240904
-
26.4
951
-
-
250
deepseek-coder-v2
Deepseek
26.1
1.8K
1M
¥1.01 / ¥2.02Input/Output
251
gemini-1.5-flash-8b-001
Google
25.8
5.6K
2M
¥0.54 / ¥2.2Input/Output
252
phi-4
Microsoft
25.5
2.9K
128K
¥0.9 / ¥3.6Input/Output
253
ministral-8b-2410
Mistral
25.2
720
128K
¥0.72 / ¥0.72Input/Output
254
claude-3-sonnet-20240229
Anthropic
24.9
11.9K
200K
¥21.6 / ¥108Input/Output
255
glm-4-0520
Zai
24.6
1.2K
128K
¥108 / ¥108Input/Output
256
jamba-1.5-large
-
24.3
1K
256K
¥0 / ¥0Input/Output
257
amazon-nova-micro-v1.0
Amazon
24.0
2.6K
128K
¥0.25 / ¥1.01Input/Output
258
llama-3.1-nemotron-51b-instruct
Nvidia
23.7
587
128K
¥0 / ¥0Input/Output
259
command-r-plus
Cohere
23.4
9.1K
128K
¥18 / ¥72Input/Output
260
command-r-08-2024
Cohere
23.1
1.4K
128K
¥18 / ¥72Input/Output
261
gemma-2-9b-it
Google
22.8
6.9K
8.19K
¥1.44 / ¥1.44Input/Output
262
olmo-2-0325-32b-instruct
Allenai
22.6
261
-
-
263
qwen2-72b-instruct
Alibaba
22.3
4.6K
131K
¥4.13 / ¥12.4Input/Output
264
reka-flash-20240904
-
22.0
992
65.5K
¥0.72 / ¥1.44Input/Output
265
c4ai-aya-expanse-8b
Cohere
21.7
1.5K
-
-
266
claude-3-haiku-20240307
Anthropic
21.4
14K
200K
¥1.8 / ¥9Input/Output
267
gpt-4-0314
Openai
21.1
4.3K
8.19K
¥216 / ¥432Input/Output
268
gpt-4-0613
Openai
20.8
7.5K
8.19K
¥216 / ¥432Input/Output
269
llama-3.1-8b-instruct
Meta
20.5
6.7K
131K
¥0.79 / ¥0.79Input/Output
270
llama-3.1-tulu-3-8b
Allenai
20.2
454
-
-
271
llama-3-70b-instruct
Meta
19.9
17.5K
8.19K
¥3.67 / ¥5.33Input/Output
272
mistral-large-2402
Mistral
19.6
6.2K
262K
¥2.88 / ¥14.4Input/Output
273
qwq-32b-preview
Alibaba
19.3
475
131K
¥2.07 / ¥6.2Input/Output
274
granite-3.1-8b-instruct
Ibm
19.0
461
-
-
275
command-r
Cohere
18.7
6.2K
128K
¥18 / ¥72Input/Output
276
qwen1.5-110b-chat
Alibaba
18.4
3.3K
-
-
277
qwen1.5-72b-chat
Alibaba
18.1
3.5K
-
-
278
jamba-1.5-mini
-
17.8
1K
256K
¥0 / ¥0Input/Output
279
granite-3.1-2b-instruct
Ibm
17.5
489
-
-
280
mistral-medium
Mistral
17.2
2.7K
262K
¥2.88 / ¥14.4Input/Output
281
internlm2_5-20b-chat
-
16.9
1.3K
-
-
282
qwen1.5-32b-chat
Alibaba
16.6
2.6K
-
-
283
mixtral-8x22b-instruct-v0.1
Mistral
16.3
5.9K
64K
¥14.4 / ¥43.2Input/Output
284
yi-1.5-34b-chat
-
16.0
2.1K
-
-
285
reka-flash-21b-20240226
-
15.7
2.7K
-
-
286
reka-flash-21b-20240226-online
-
15.4
1.7K
-
-
287
gemini-pro-dev-api
Google
15.1
1.4K
1.05M
¥14.4 / ¥86.4Input/Output
288
gemma-2-2b-it
Google
14.8
5.9K
128K
¥0 / ¥0Input/Output
289
llama-3-8b-instruct
Meta
14.5
11.2K
8.19K
¥0.29 / ¥0.29Input/Output
290
granite-3.0-8b-instruct
Ibm
14.2
603
-
-
291
gpt-3.5-turbo-0125
Openai
13.9
6.7K
16.4K
¥3.6 / ¥10.8Input/Output
292
phi-3-medium-4k-instruct
Microsoft
13.6
2.2K
4.1K
¥1.22 / ¥4.9Input/Output
293
qwen1.5-14b-chat
Alibaba
13.4
2.1K
-
-
294
dbrx-instruct-preview
-
13.1
3.5K
-
-
295
tulu-2-dpo-70b
-
12.8
280
-
-
296
zephyr-orpo-141b-A35b-v0.1
-
12.5
448
200K
¥108 / ¥432Input/Output
297
starling-lm-7b-beta
-
12.2
1.7K
200K
¥5.4 / ¥18.7Input/Output
298
mixtral-8x7b-instruct-v0.1
Mistral
11.9
6.6K
32K
¥5.04 / ¥5.04Input/Output
299
llama-3.2-3b-instruct
Meta
11.6
1K
131K
¥0.22 / ¥0.35Input/Output
300
wizardlm-70b
Microsoft
11.3
355
-
-
301
yi-34b-chat
-
11.0
1.1K
-
-
302
openchat-3.5
-
10.7
414
-
-
303
deepseek-llm-67b-chat
Deepseek
10.4
243
1M
¥1.01 / ¥2.02Input/Output
304
qwen1.5-7b-chat
Alibaba
10.1
372
-
-
305
phi-3-small-8k-instruct
Microsoft
9.8
2.2K
8.19K
¥1.08 / ¥4.32Input/Output
306
openchat-3.5-0106
-
9.5
1.1K
-
-
307
starling-lm-7b-alpha
-
9.2
666
200K
¥5.4 / ¥18.7Input/Output
308
granite-3.0-2b-instruct
Ibm
8.9
647
-
-
309
gpt-3.5-turbo-1106
Openai
8.6
933
16.4K
¥7.2 / ¥14.4Input/Output
310
llama-2-13b-chat
Meta
8.3
1.2K
-
-
311
llama-2-70b-chat
Meta
8.0
2.8K
-
-
312
mistral-7b-instruct-v0.2
Mistral
7.7
1.6K
262K
¥2.88 / ¥14.4Input/Output
313
smollm2-1.7b-instruct
-
7.4
328
-
-
314
gemma-1.1-7b-it
Google
7.1
2.6K
-
-
315
openhermes-2.5-mistral-7b
-
6.8
270
1M
¥36 / ¥180Input/Output
316
wizardlm-13b
Microsoft
6.5
203
-
-
317
llama-3.2-1b-instruct
Meta
6.2
1.1K
16.4K
¥0.07 / ¥0.08Input/Output
318
vicuna-13b
-
5.9
756
-
-
319
nous-hermes-2-mixtral-8x7b-dpo
-
5.6
230
1M
¥36 / ¥180Input/Output
320
phi-3-mini-4k-instruct
Microsoft
5.3
1.7K
4.1K
¥0.94 / ¥3.74Input/Output
321
zephyr-7b-beta
-
5.0
461
-
-
322
phi-3-mini-4k-instruct-june-2024
Microsoft
4.7
1K
4.1K
¥0.94 / ¥3.74Input/Output
323
vicuna-7b
-
4.5
162
-
-
324
vicuna-33b
-
4.2
1K
-
-
325
qwen-14b-chat
Alibaba
3.9
214
32.8K
¥1.04 / ¥3.1Input/Output
326
gemma-7b-it
Google
3.6
676
-
-
327
codellama-34b-instruct
Meta
3.3
320
-
-
328
palm-2
Google
3.0
251
-
-
329
gemma-1.1-2b-it
Google
2.7
1.2K
-
-
330
mistral-7b-instruct
Mistral
2.4
448
262K
¥2.88 / ¥14.4Input/Output
331
snowflake-arctic-instruct
-
2.1
2.8K
-
-
332
llama-2-7b-chat
Meta
1.8
835
128K
¥4.03 / ¥48Input/Output
333
llama2-70b-steerlm-chat
Nvidia
1.5
208
-
-
334
phi-3-mini-128k-instruct
Microsoft
1.2
2.1K
128K
¥0.94 / ¥3.74Input/Output
335
qwen1.5-4b-chat
Alibaba
0.9
692
-
-
336
stripedhyena-nous-7b
-
0.6
334
-
-
337
gemma-2b-it
Google
0.3
377
-
-
338
chatglm3-6b
-
0.0
205
200K
¥5.4 / ¥18.7Input/Output
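
The entries above follow a fixed seven-line pattern (rank, model, provider, score, samples, context, price), with "-" standing in for a missing value. As a minimal sketch, assuming that pattern holds for every entry, the hypothetical `parse_rows` helper below groups such lines into records. The `Row` type and its field names are illustrative and are not part of the leaderboard's own tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    rank: int
    model: str
    provider: Optional[str]
    score: float
    samples: str            # kept as listed, e.g. "11.9K"
    context: Optional[str]  # kept as listed, e.g. "1M"; None if "-"
    price: Optional[str]    # "input / output" as listed; None if "-"


def parse_rows(lines: List[str]) -> List[Row]:
    """Group lines into seven-line records; "-" marks a missing value."""
    suffix = "Input/Output"  # label fused onto the price line in the page text

    def opt(value: str) -> Optional[str]:
        return None if value == "-" else value

    rows = []
    usable = len(lines) - len(lines) % 7  # ignore a trailing partial record
    for i in range(0, usable, 7):
        rank, model, provider, score, samples, context, price = lines[i:i + 7]
        if price.endswith(suffix):
            price = price[: -len(suffix)]
        rows.append(Row(int(rank), model, opt(provider), float(score),
                        samples, opt(context), opt(price)))
    return rows


# Two entries copied from the listing above (ranks 1 and 16).
sample = """1
claude-opus-4-6-thinking
Anthropic
100.0
11.9K
1M
¥36 / ¥180Input/Output
16
qwen3.5-max-preview
Alibaba
95.5
7.4K
-
-""".splitlines()

rows = parse_rows(sample)
```

Sample counts and context lengths are kept as strings here because the page abbreviates them (for example "11.9K"). Converting them to numbers would be a separate step.
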
Top model analysis

Why claude-opus-4-6-thinking ranks first

claude-opus-4-6-thinking ranks first with a percent score of 100.0 across 11.9K samples. Treat it as the first candidate for this leaderboard, then compare price, context length, and availability before settling on it.

How to choose

Do not look only at rank #1

Start with the leaderboard closest to your task. Compare the top models by score and sample size, then check price, context length, open or closed access, and provider availability.
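
The selection steps above can be sketched as a simple filter: keep models that clear a score and sample-size floor, drop those over budget or with no listed price, then sort by score. The `Entry` type, the `shortlist` function, and all three thresholds are hypothetical and chosen only for illustration. The five rows are copied from the listing, with sample counts expanded from the abbreviated form (for example 11.9K to 11900).

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    model: str
    score: float                  # percent score as listed
    samples: int                  # sample count, expanded from e.g. "11.9K"
    input_price: Optional[float]  # listed input price in yen; None if not listed


def shortlist(entries: List[Entry], min_score: float, min_samples: int,
              max_input_price: float) -> List[Entry]:
    """Return entries that meet every threshold, best score first."""
    keep = [
        e for e in entries
        if e.score >= min_score
        and e.samples >= min_samples
        and e.input_price is not None
        and e.input_price <= max_input_price
    ]
    return sorted(keep, key=lambda e: e.score, reverse=True)


entries = [
    Entry("claude-opus-4-6-thinking", 100.0, 11900, 36.0),
    Entry("qwen3.7-max-preview", 98.5, 1600, 18.0),
    Entry("gemini-3.1-pro-preview", 98.2, 15900, 14.4),
    Entry("mimo-v2.5-pro", 97.6, 6300, 7.2),
    Entry("qwen3.5-max-preview", 95.5, 7400, None),
]

# Hypothetical requirements: score >= 95, at least 5000 samples, input price <= 20 yen.
picks = shortlist(entries, min_score=95.0, min_samples=5000, max_input_price=20.0)
print([e.model for e in picks])
# prints ['gemini-3.1-pro-preview', 'mimo-v2.5-pro']
```

With these thresholds the top-ranked model drops out on price and qwen3.7-max-preview drops out on sample size, which illustrates why rank alone is not enough.
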

FAQ

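Which metrics matter on the Longer Query leaderboard?

Look mainly at rank, percent score, sample size, and source. The score gives a quick comparison of models within the same leaderboard; the sample size indicates how stable the result is.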

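Why can't different leaderboards be merged directly into one overall score?

Different leaderboards use different tasks, samples, and evaluation criteria. By default, 模力榜 ranks models only within a single leaderboard, to avoid forcing abilities such as writing, coding, and image work into one combined number.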

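How should I choose a Longer Query model?

Start with the leaderboard closest to your task, then weigh price, context length, open-source versus closed-source access, and provider availability. A high rank does not mean a model suits every budget or deployment method.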

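How often is the leaderboard updated?

The page shows the most recent successfully collected public leaderboard data. It currently gives priority to the LMArena leaderboard dataset and keeps the original link in the page's source section.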