Chat · Text · Hard Prompts Leaderboard

Ranking for Text / Hard Prompts, based on public preference data.

Selection guide

Hard Prompts model ranking guide

Top models: claude-opus-4-6-thinking, claude-opus-4-6, claude-opus-4-7-thinking, claude-opus-4-7, gpt-5.4-high
Models: 360
Published: 2026/05/27
Source: Arena public preference evaluation
Original leaderboard: Text / Hard Prompts
Leaderboard dataset: LMArena latest parquet
Rank | Model | Provider | Score | Samples | Context | Price (input / output)
1 | claude-opus-4-6-thinking | Anthropic | 100.0 | 20.4K | 1M | ¥36 / ¥180
2 | claude-opus-4-6 | Anthropic | 99.7 | 22.3K | 1M | ¥36 / ¥180
3 | claude-opus-4-7-thinking | Anthropic | 99.4 | 13K | 1M | ¥36 / ¥180
4 | claude-opus-4-7 | Anthropic | 99.2 | 13.6K | 1M | ¥36 / ¥180
5 | gpt-5.4-high | Openai | 98.9 | 17.4K | 1.05M | ¥18 / ¥108
6 | gemini-3.1-pro-preview | Google | 98.6 | 26.7K | 1.05M | ¥14.4 / ¥86.4
7 | gpt-5.5-high | Openai | 98.3 | 10.5K | 1.05M | ¥36 / ¥216
8 | gemini-3.5-flash | Google | 98.1 | 5.9K | 1.05M | ¥10.8 / ¥64.8
9 | mimo-v2.5-pro | Xiaomi | 97.8 | 10.1K | 1.05M | ¥7.2 / ¥21.6
10 | qwen3.5-max-preview | Alibaba | 97.5 | 12.6K | - | -
11 | claude-sonnet-4-6 | Anthropic | 97.2 | 16.9K | 1M | ¥21.6 / ¥108
12 | gemini-3-pro | Google | 96.9 | 22.5K | 1.05M | ¥14.4 / ¥86.4
13 | qwen3.7-max-preview | Alibaba | 96.7 | 2.6K | 1M | ¥18 / ¥54
14 | glm-5.1 | Zai | 96.4 | 8.8K | 200K | ¥0 / ¥0
15 | ernie-5.1 | Baidu | 96.1 | 9.3K | 119K | ¥5.4 / ¥21.6
16 | gpt-5.5 | Openai | 95.8 | 10.9K | 1.05M | ¥36 / ¥216
17 | claude-opus-4-5-20251101 | Anthropic | 95.5 | 37.9K | 200K | ¥36 / ¥180
18 | muse-spark | Meta | 95.3 | 7.6K | - | -
19 | gpt-5.4 | Openai | 95.0 | 18.7K | 1.05M | ¥18 / ¥108
20 | kimi-k2.6 | Moonshot | 94.7 | 10K | 262K | ¥6.84 / ¥28.8
21 | claude-opus-4-5-20251101-thinking-32k | Anthropic | 94.4 | 19.8K | 200K | ¥108 / ¥540
22 | gemini-3-flash | Google | 94.2 | 16.6K | 1.05M | ¥3.6 / ¥21.6
23 | claude-sonnet-4-5-20250929 | Anthropic | 93.9 | 43.1K | 200K | ¥21.6 / ¥108
24 | claude-sonnet-4-5-20250929-thinking-32k | Anthropic | 93.6 | 43.6K | 200K | ¥21.6 / ¥108
25 | amazon-nova-experimental-chat-26-02-10 | Amazon | 93.3 | 1.9K | - | -
26 | mimo-v2-pro | Xiaomi | 93.0 | 14.2K | 1.05M | ¥7.2 / ¥21.6
27 | dola-seed-2.0-pro | Bytedance | 92.8 | 23K | - | -
28 | deepseek-v4-pro-thinking | Deepseek | 92.5 | 10.4K | 1M | ¥3.13 / ¥6.26
29 | gemini-2.5-pro | Google | 92.2 | 63K | 1.05M | ¥9 / ¥72
30 | gpt-5.1-high | Openai | 91.9 | 21.8K | 400K | ¥9 / ¥72
31 | deepseek-v4-pro | Deepseek | 91.6 | 10.9K | 1M | ¥3.13 / ¥6.26
32 | kimi-k2.5-thinking | Moonshot | 91.4 | 22K | 262K | ¥4.32 / ¥21.6
33 | grok-4.20-beta-0309-reasoning | Xai | 91.1 | 18.3K | 2M | ¥14.4 / ¥43.2
34 | qwen3.6-max-preview | Alibaba | 90.8 | 2.9K | 246K | ¥9.5 / ¥56.9
35 | glm-5 | Zai | 90.5 | 13K | 205K | ¥7.2 / ¥23
36 | qwen3.5-397b-a17b | Alibaba | 90.3 | 19.7K | 262K | ¥3.1 / ¥18.6
37 | mimo-v2.5 | Xiaomi | 90.0 | 10.5K | 1.05M | ¥2.88 / ¥14.4
38 | qwen3-max-preview | Alibaba | 89.7 | 13.6K | 262K | ¥6.2 / ¥24.8
39 | grok-4.20-multi-agent-beta-0309 | Xai | 89.4 | 17.8K | 2M | ¥14.4 / ¥43.2
40 | qwen3.6-plus | Alibaba | 89.1 | 11.9K | 1M | ¥3.6 / ¥21.6
41 | ernie-5.0-0110 | Baidu | 88.9 | 19.9K | 128K | ¥7.92 / ¥14.4
42 | gemma-4-31b | Google | 88.6 | 3.3K | 262K | ¥3.24 / ¥7.2
43 | gpt-5.2-chat-latest-20260210 | Openai | 88.3 | 19.9K | 400K | ¥12.6 / ¥101
44 | kimi-k2.5-instant | Moonshot | 88.0 | 4.5K | 262K | ¥4.32 / ¥21.6
45 | glm-4.7 | Zai | 87.7 | 6.6K | 205K | ¥0 / ¥0
46 | claude-opus-4-1-20250805-thinking-16k | Anthropic | 87.5 | 24.7K | 200K | ¥108 / ¥540
47 | grok-4.20-beta1 | Xai | 87.2 | 14.7K | 2M | ¥14.4 / ¥43.2
48 | claude-opus-4-1-20250805 | Anthropic | 86.9 | 38.9K | 200K | ¥108 / ¥540
49 | gemini-3-flash (thinking-minimal) | Google | 86.6 | 31.4K | 1.05M | ¥3.6 / ¥21.6
50 | glm-4.6 | Zai | 86.4 | 19.1K | 205K | ¥4.32 / ¥15.8
51 | deepseek-v4-flash | Deepseek | 86.1 | 10.9K | 1M | ¥1.01 / ¥2.02
52 | gemma-4-26b-a4b | Google | 85.8 | 3.3K | 262K | ¥0.94 / ¥2.88
53 | longcat-flash-chat-2602-exp | Meituan | 85.5 | 14.7K | 128K | ¥1.08 / ¥10.8
54 | ernie-5.0-preview-1203 | Baidu | 85.2 | 5.2K | 128K | ¥7.92 / ¥14.4
55 | grok-4.1 | Xai | 85.0 | 36.7K | 200K | ¥14.4 / ¥72
56 | grok-4.1-thinking | Xai | 84.7 | 35.8K | 200K | ¥14.4 / ¥72
57 | gpt-5.1 | Openai | 84.4 | 23.6K | 400K | ¥9 / ¥72
58 | qwen3-235b-a22b-instruct-2507 | Alibaba | 84.1 | 50.8K | 128K | ¥2.09 / ¥8.23
59 | deepseek-v3.2 | Deepseek | 83.8 | 25.3K | 128K | ¥2.09 / ¥3.1
60 | grok-3-preview-02-24 | Xai | 83.6 | 10.7K | 1M | ¥9 / ¥18
61 | longcat-flash-chat | Meituan | 83.3 | 5.6K | 128K | ¥1.08 / ¥10.8
62 | mistral-large-3 | Mistral | 83.0 | 23.6K | 262K | ¥3.6 / ¥10.8
63 | deepseek-v4-flash-thinking | Deepseek | 82.7 | 10.7K | 1M | ¥1.01 / ¥2.02
64 | deepseek-v3.2-exp | Deepseek | 82.5 | 6.5K | 128K | ¥0 / ¥0
65 | qwen3-vl-235b-a22b-instruct | Alibaba | 82.2 | 5.7K | 128K | ¥2.16 / ¥8.64
66 | deepseek-v3.2-thinking | Deepseek | 81.9 | 21.7K | 128K | ¥2.09 / ¥3.1
67 | glm-4.5 | Zai | 81.6 | 11.2K | 131K | ¥4.32 / ¥15.8
68 | gpt-5.4-mini-high | Openai | 81.3 | 16.7K | 400K | ¥5.4 / ¥32.4
69 | gpt-5.2-high | Openai | 81.1 | 27K | 400K | ¥12.6 / ¥101
70 | amazon-nova-experimental-chat-12-10 | Amazon | 80.8 | 1.9K | - | -
71 | qwen3-next-80b-a3b-instruct | Alibaba | 80.5 | 11.9K | 131K | ¥1.04 / ¥4.13
72 | mistral-medium-2508 | Mistral | 80.2 | 50K | 262K | ¥2.88 / ¥14.4
73 | deepseek-v3.1-terminus-thinking | Deepseek | 79.9 | 1.7K | 128K | ¥1.8 / ¥5.04
74 | kimi-k2-thinking-turbo | Moonshot | 79.7 | 34.3K | 262K | ¥17.3 / ¥72
75 | gpt-5.5-instant | Openai | 79.4 | 16.7K | 400K | ¥9 / ¥72
76 | deepseek-v3.2-exp-thinking | Deepseek | 79.1 | 4.7K | 128K | ¥0 / ¥0
77 | gpt-5.2 | Openai | 78.8 | 27.8K | 400K | ¥12.6 / ¥101
78 | qwen3-max-2025-09-23 | Alibaba | 78.6 | 4.9K | 258K | ¥6.19 / ¥24.7
79 | chatgpt-4o-latest-20250326 | Openai | 78.3 | 38.3K | 128K | ¥18 / ¥72
80 | gemini-2.5-flash | Google | 78.0 | 62.1K | 1.05M | ¥2.16 / ¥18
81 | mimo-v2-omni | Xiaomi | 77.7 | 1.9K | 262K | ¥2.88 / ¥14.4
82 | amazon-nova-experimental-chat-11-10 | Amazon | 77.4 | 13.6K | - | -
83 | hunyuan-hy3-preview | Tencent | 77.2 | 3.7K | 256K | ¥0 / ¥0
84 | ernie-5.0-preview-1022 | Baidu | 76.9 | 2.5K | 128K | ¥7.92 / ¥14.4
85 | mimo-v2-flash (non-thinking) | Xiaomi | 76.6 | 26.3K | 262K | ¥0.72 / ¥2.16
86 | qwen3.5-122b-a10b | Alibaba | 76.3 | 16.3K | 262K | ¥2.88 / ¥23
87 | amazon-nova-experimental-chat-26-01-10 | Amazon | 76.0 | 1.9K | - | -
88 | hunyuan-vision-1.5-thinking | Tencent | 75.8 | 1.1K | - | -
89 | minimax-m2.7 | Minimax | 75.5 | 14.6K | 205K | ¥0 / ¥0
90 | deepseek-v3.1-thinking | Deepseek | 75.2 | 5.2K | 128K | ¥1.44 / ¥5.04
91 | claude-haiku-4-5-20251001 | Anthropic | 74.9 | 44.4K | 200K | ¥7.2 / ¥36
92 | deepseek-r1-0528 | Deepseek | 74.7 | 7K | 164K | ¥3.6 / ¥15.5
93 | qwen3-235b-a22b-thinking-2507 | Alibaba | 74.4 | 3.8K | 131K | ¥2.07 / ¥8.26
94 | deepseek-v3.1 | Deepseek | 74.1 | 6.8K | 128K | ¥1.44 / ¥5.04
95 | gpt-5-high | Openai | 73.8 | 15K | 400K | ¥9 / ¥72
96 | grok-4-fast-chat | Xai | 73.5 | 3.2K | 2M | ¥1.44 / ¥3.6
97 | qwen3.5-27b | Alibaba | 73.3 | 15.8K | 262K | ¥2.16 / ¥17.3
98 | minimax-m2.1-preview | Minimax | 73.0 | 9.1K | 205K | ¥0 / ¥0
99 | step-3.5-flash | Stepfun | 72.7 | 20.4K | 256K | ¥0.69 / ¥2.07
100 | gemini-2.5-flash-preview-09-2025 | Google | 72.4 | 17.5K | 1M | ¥2.16 / ¥18
101 | grok-4-0709 | Xai | 72.1 | 19.7K | 256K | ¥21.6 / ¥108
102 | qwen3-vl-235b-a22b-thinking | Alibaba | 71.9 | 4K | 131K | ¥2.06 / ¥8.26
103 | grok-4-1-fast-reasoning | Xai | 71.6 | 30.9K | 2M | ¥1.44 / ¥3.6
104 | grok-4.3 | Xai | 71.3 | 10.3K | 1M | ¥9 / ¥18
105 | gemini-3.1-flash-lite-preview | Google | 71.0 | 21.7K | 1.05M | ¥1.8 / ¥10.8
106 | deepseek-v3.1-terminus | Deepseek | 70.8 | 1.9K | 128K | ¥1.8 / ¥5.04
107 | gpt-5-chat | Openai | 70.5 | 15.2K | 400K | ¥9 / ¥72
108 | mimo-v2-flash (thinking) | Xiaomi | 70.2 | 6K | 262K | ¥0.72 / ¥2.16
109 | gpt-4.5-preview-2025-02-27 | Openai | 69.9 | 3.4K | 8.19K | ¥216 / ¥432
110 | grok-4-fast-reasoning | Xai | 69.6 | 9.9K | 2M | ¥1.44 / ¥3.6
111 | qwen3.5-flash | Alibaba | 69.4 | 18.3K | 1M | ¥1.24 / ¥12.4
112 | qwen3.5-35b-a3b | Alibaba | 69.1 | 16.8K | 262K | ¥1.8 / ¥14.4
113 | o3-2025-04-16 | Openai | 68.8 | 25.8K | 200K | ¥14.4 / ¥57.6
114 | hunyuan-t1-20250711 | Tencent | 68.5 | 2K | 131K | ¥0 / ¥0
115 | claude-opus-4-20250514-thinking-16k | Anthropic | 68.2 | 16.4K | 200K | ¥108 / ¥540
116 | qwen3-30b-a3b-instruct-2507 | Alibaba | 68.0 | 11.1K | 262K | ¥2.16 / ¥3.6
117 | gpt-5.3-chat-latest | Openai | 67.7 | 18.8K | 128K | ¥12.6 / ¥101
118 | amazon-nova-experimental-chat-10-20 | Amazon | 67.4 | 6K | - | -
119 | qwen3-235b-a22b-no-thinking | Alibaba | 67.1 | 16.6K | 131K | ¥2.07 / ¥8.26
120 | nvidia-nemotron-3-super-120b-a12b | Nvidia | 66.9 | 4.1K | 262K | ¥1.44 / ¥5.76
121 | kimi-k2-0905-preview | Moonshot | 66.6 | 5.6K | 262K | ¥4.32 / ¥18
122 | gpt-4.1-2025-04-14 | Openai | 66.3 | 22.2K | 1.05M | ¥14.4 / ¥57.6
123 | gpt-5.4-nano-high | Openai | 66.0 | 16.2K | 400K | ¥1.44 / ¥9
124 | gpt-5-mini-high | Openai | 65.7 | 12.8K | 400K | ¥1.8 / ¥14.4
125 | glm-4.5-air | Zai | 65.5 | 14.9K | 131K | ¥0 / ¥0
126 | gemini-2.5-flash-lite-preview-09-2025-no-thinking | Google | 65.2 | 25.1K | 1.05M | ¥0.72 / ¥2.88
127 | claude-opus-4-20250514 | Anthropic | 64.9 | 19.2K | 200K | ¥108 / ¥540
128 | grok-3-mini-high | Xai | 64.6 | 7.6K | 128K | ¥0 / ¥0
129 | gemini-2.5-flash-lite-preview-06-17-thinking | Google | 64.3 | 14.6K | 65.5K | ¥0.72 / ¥2.88
130 | hunyuan-turbos-20250416 | Tencent | 64.1 | 3.9K | 131K | ¥0 / ¥0
131 | qwen3-coder-480b-a35b-instruct | Alibaba | 63.8 | 11.6K | 262K | ¥6.2 / ¥24.8
132 | o1-2024-12-17 | Openai | 63.5 | 6.5K | 128K | ¥108 / ¥432
133 | minimax-m2.5 | Minimax | 63.2 | 22.1K | 205K | ¥0 / ¥0
134 | claude-sonnet-4-20250514-thinking-32k | Anthropic | 63.0 | 15.6K | 200K | ¥21.6 / ¥108
135 | qwen3-next-80b-a3b-thinking | Alibaba | 62.7 | 6.7K | 131K | ¥1.04 / ¥10.3
136 | glm-4.6v | Zai | 62.4 | 1.5K | 128K | ¥2.16 / ¥6.48
137 | deepseek-v3-0324 | Deepseek | 62.1 | 18.7K | 75K | ¥1.44 / ¥5.76
138 | o3-mini-high | Openai | 61.8 | 4.4K | 200K | ¥7.92 / ¥31.7
139 | nova-2-lite | Amazon | 61.6 | 6.5K | 128K | ¥2.38 / ¥19.8
140 | kimi-k2-0711-preview | Moonshot | 61.3 | 12.2K | 131K | ¥4.32 / ¥18
141 | mistral-medium-2505 | Mistral | 61.0 | 13.7K | 262K | ¥2.88 / ¥14.4
142 | gpt-oss-120b | Openai | 60.7 | 14.8K | 131K | ¥1.08 / ¥4.32
143 | ling-flash-2.0 | Ant Group | 60.4 | 3.5K | 131K | ¥1.01 / ¥4.1
144 | grok-3-mini-beta | Xai | 60.2 | 9.6K | 1M | ¥9 / ¥18
145 | qwen3-235b-a22b | Alibaba | 59.9 | 10.5K | 131K | ¥2.07 / ¥8.26
146 | deepseek-r1 | Deepseek | 59.6 | 4.1K | 164K | ¥5.04 / ¥18
147 | mercury-2 | Inception Ai | 59.3 | 1.8K | 128K | ¥1.8 / ¥5.4
148 | qwen2.5-max | Alibaba | 59.1 | 9.7K | 32K | ¥11.5 / ¥46
149 | minimax-m2 | Minimax | 58.8 | 3.7K | 197K | ¥0 / ¥0
150 | glm-4.7-flash | Zai | 58.5 | 6.5K | 200K | ¥0 / ¥0
151 | o1-preview | Openai | 58.2 | 8.5K | 128K | ¥108 / ¥432
152 | claude-sonnet-4-20250514 | Anthropic | 57.9 | 17.7K | 200K | ¥21.6 / ¥108
153 | step-3 | Stepfun | 57.7 | 3K | 65.5K | ¥1.8 / ¥4.68
154 | amazon-nova-experimental-chat-10-09 | Amazon | 57.4 | 1.4K | - | -
155 | o4-mini-2025-04-16 | Openai | 57.1 | 19.5K | 200K | ¥7.92 / ¥31.7
156 | nvidia-nemotron-3-nano-30b-a3b-bf16 | Nvidia | 56.8 | 8.2K | 131K | ¥0 / ¥0
157 | intellect-3 | - | 56.5 | 2.7K | 131K | ¥1.44 / ¥7.92
158 | gpt-4.1-mini-2025-04-14 | Openai | 56.3 | 16.2K | 1.05M | ¥2.88 / ¥11.5
159 | trinity-large-preview | - | 56.0 | 17.1K | 262K | ¥1.8 / ¥6.48
160 | gemini-2.0-flash-001 | Google | 55.7 | 14.1K | 1.05M | ¥1.08 / ¥4.32
161 | trinity-large-thinking | - | 55.4 | 15.3K | 262K | ¥1.8 / ¥6.48
162 | ring-flash-2.0 | Ant Group | 55.2 | 3.6K | 131K | ¥1.01 / ¥4.1
163 | minimax-m1 | Minimax | 54.9 | 15.8K | 1M | ¥0.95 / ¥9.03
164 | gemma-3-27b-it | Google | 54.6 | 17.9K | 128K | ¥2.15 / ¥2.15
165 | glm-4.5v | Zai | 54.3 | 2.4K | 64K | ¥4.32 / ¥13
166 | mistral-small-2506 | Mistral | 54.0 | 7.8K | 262K | ¥2.88 / ¥14.4
167 | step-1o-turbo-202506 | Stepfun | 53.8 | 3.7K | - | -
168 | qwen3-32b | Alibaba | 53.5 | 1.2K | 131K | ¥2.07 / ¥8.26
169 | o1-mini | Openai | 53.2 | 13.9K | 128K | ¥7.92 / ¥31.7
170 | nvidia-llama-3.3-nemotron-super-49b-v1.5 | Nvidia | 52.9 | 1.4K | 131K | ¥2.88 / ¥2.88
171 | claude-3-7-sonnet-20250219-thinking-32k | Anthropic | 52.6 | 13.9K | - | -
172 | o3-mini | Openai | 52.4 | 20.1K | 200K | ¥7.92 / ¥31.7
173 | gpt-5-nano-high | Openai | 52.1 | 3.8K | 400K | ¥0.36 / ¥2.88
174 | command-a-03-2025 | Cohere | 51.8 | 23.8K | 256K | ¥18 / ¥72
175 | qwq-32b | Alibaba | 51.5 | 8.8K | 131K | ¥2.07 / ¥6.2
176 | hunyuan-turbos-20250226 | Tencent | 51.3 | 531 | 131K | ¥0 / ¥0
177 | gemini-2.0-flash-lite-preview-02-05 | Google | 51.0 | 6.2K | 1.05M | ¥0.54 / ¥2.16
178 | olmo-3.1-32b-instruct | Allenai | 50.7 | 6.4K | 200K | ¥14.4 / ¥57.6
179 | qwen-plus-0125 | Alibaba | 50.4 | 1.5K | 1M | ¥0.83 / ¥2.07
180 | llama-3.1-nemotron-ultra-253b-v1 | Nvidia | 50.1 | 713 | 128K | ¥4.32 / ¥13
181 | qwen3-30b-a3b | Alibaba | 49.9 | 10.7K | 128K | ¥0.79 / ¥7.78
182 | claude-3-7-sonnet-20250219 | Anthropic | 49.6 | 15.3K | 200K | ¥21.6 / ¥108
183 | deepseek-v3 | Deepseek | 49.3 | 5.4K | 128K | ¥0 / ¥0
184 | llama-3.3-nemotron-49b-super-v1 | Nvidia | 49.0 | 510 | 131K | ¥0 / ¥0
185 | gemma-3-12b-it | Google | 48.7 | 977 | 128K | ¥1.96 / ¥1.96
186 | hunyuan-turbo-0110 | Tencent | 48.5 | 496 | - | -
187 | claude-3-5-sonnet-20241022 | Anthropic | 48.2 | 27.3K | 200K | ¥21.6 / ¥108
188 | yi-lightning | - | 47.9 | 7K | 12K | ¥1.44 / ¥1.44
189 | olmo-3-32b-think | Allenai | 47.6 | 3K | 128K | ¥2.16 / ¥3.24
190 | granite-4.1-8b | Ibm | 47.4 | 2.3K | 131K | ¥0.36 / ¥0.72
191 | qwen2.5-plus-1127 | Alibaba | 47.1 | 2.7K | - | -
192 | step-2-16k-exp-202412 | Stepfun | 46.8 | 1.2K | 16.4K | ¥37.5 / ¥118
193 | gemini-1.5-pro-002 | Google | 46.5 | 14.9K | - | -
194 | molmo-2-8b | Allenai | 46.2 | 438 | - | -
195 | glm-4-plus-0111 | Zai | 46.0 | 1.5K | 128K | ¥72 / ¥72
196 | athene-v2-chat | - | 45.7 | 6.5K | - | -
197 | deepseek-v2.5-1210 | Deepseek | 45.4 | 1.8K | 1M | ¥1.01 / ¥2.02
198 | hunyuan-large-2025-02-10 | Tencent | 45.1 | 889 | - | -
199 | mercury | Inception Ai | 44.8 | 1K | 128K | ¥1.8 / ¥5.4
200 | gpt-4.1-nano-2025-04-14 | Openai | 44.6 | 1.7K | 1.05M | ¥14.4 / ¥57.6
201 | gemma-3n-e4b-it | Google | 44.3 | 8.6K | 128K | ¥0 / ¥0
202 | llama-4-maverick-17b-128e-instruct | Meta | 44.0 | 16K | 1M | ¥1.8 / ¥6.26
203 | gpt-4o-2024-05-13 | Openai | 43.7 | 32.4K | 128K | ¥36 / ¥108
204 | mistral-small-3.1-24b-instruct-2503 | Mistral | 43.5 | 14.7K | 262K | ¥2.88 / ¥14.4
205 | claude-3-5-sonnet-20240620 | Anthropic | 43.2 | 23.2K | 200K | ¥21.6 / ¥108
206 | gpt-oss-20b | Openai | 42.9 | 4.8K | 131K | ¥0.32 / ¥1.3
207 | glm-4-plus | Zai | 42.6 | 7.1K | 128K | ¥54 / ¥54
208 | olmo-3.1-32b-think | Allenai | 42.3 | 4.4K | 200K | ¥14.4 / ¥57.6
209 | grok-2-2024-08-13 | Xai | 42.1 | 17.3K | 1M | ¥9 / ¥18
210 | qwen2.5-72b-instruct | Alibaba | 41.8 | 10.5K | 131K | ¥4.13 / ¥12.4
211 | deepseek-v2.5 | Deepseek | 41.5 | 6.9K | 1M | ¥1.01 / ¥2.02
212 | qwen-max-0919 | Alibaba | 41.2 | 4.4K | 131K | ¥2.48 / ¥9.91
213 | llama-3.1-405b-instruct-bf16 | Meta | 40.9 | 10.7K | 128K | ¥0 / ¥0
214 | magistral-medium-2506 | Mistral | 40.7 | 5.7K | 128K | ¥14.4 / ¥36
215 | llama-4-scout-17b-16e-instruct | Meta | 40.4 | 13K | 128K | ¥1.44 / ¥5.62
216 | gpt-4o-mini-2024-07-18 | Openai | 40.1 | 18.3K | 128K | ¥1.08 / ¥4.32
217 | llama-3.1-nemotron-70b-instruct | Nvidia | 39.8 | 2K | 128K | ¥0 / ¥0
218 | hunyuan-standard-2025-02-10 | Tencent | 39.6 | 873 | - | -
219 | gpt-4o-2024-08-06 | Openai | 39.3 | 12.7K | 128K | ¥18 / ¥72
220 | llama-3.1-405b-instruct-fp8 | Meta | 39.0 | 16.2K | 128K | ¥0 / ¥0
221 | llama-3.3-70b-instruct | Meta | 38.7 | 17K | 128K | ¥0 / ¥0
222 | gemini-1.5-flash-002 | Google | 38.4 | 9.3K | 2M | ¥0.54 / ¥2.2
223 | hunyuan-large-vision | Tencent | 38.2 | 2.2K | - | -
224 | mistral-large-2411 | Mistral | 37.9 | 7K | 128K | ¥14.4 / ¥43.2
225 | mistral-large-2407 | Mistral | 37.6 | 12.7K | 131K | ¥14.4 / ¥43.2
226 | grok-2-mini-2024-08-13 | Xai | 37.3 | 14.2K | 1M | ¥9 / ¥18
227 | gemini-1.5-pro-001 | Google | 37.0 | 22.5K | - | -
228 | gemma-3-4b-it | Google | 36.8 | 1.1K | 128K | ¥1.44 / ¥1.44
229 | qwen2.5-coder-32b-instruct | Alibaba | 36.5 | 1.4K | 131K | ¥2.07 / ¥6.2
230 | gpt-4-turbo-2024-04-09 | Openai | 36.2 | 28.3K | 128K | ¥72 / ¥216
231 | claude-3-5-haiku-20241022 | Anthropic | 35.9 | 23.8K | 200K | ¥5.76 / ¥28.8
232 | gemini-advanced-0514 | Google | 35.7 | 14.1K | - | -
233 | claude-3-opus-20240229 | Anthropic | 35.4 | 55.1K | 200K | ¥108 / ¥540
234 | amazon-nova-pro-v1.0 | Amazon | 35.1 | 6.3K | 300K | ¥5.76 / ¥23
235 | athene-70b-0725 | - | 34.8 | 5.4K | - | -
236 | llama-3.1-70b-instruct | Meta | 34.5 | 15K | 131K | ¥2.88 / ¥2.88
237 | gpt-4-1106-preview | Openai | 34.3 | 26.2K | 8.19K | ¥216 / ¥432
238 | ibm-granite-h-small | Ibm | 34.0 | 3K | - | -
239 | gpt-4-0125-preview | Openai | 33.7 | 25.6K | 8.19K | ¥216 / ¥432
240 | mistral-small-24b-instruct-2501 | Mistral | 33.4 | 3.6K | 262K | ¥2.88 / ¥14.4
241 | llama-3.1-tulu-3-70b | Allenai | 33.1 | 779 | - | -
242 | amazon-nova-lite-v1.0 | Amazon | 32.9 | 4.9K | 300K | ¥0.43 / ¥1.73
243 | gemini-1.5-flash-001 | Google | 32.6 | 18.2K | 2M | ¥0.54 / ¥2.2
244 | phi-4 | Microsoft | 32.3 | 5.7K | 128K | ¥0.9 / ¥3.6
245 | hunyuan-standard-256k | Tencent | 32.0 | 700 | - | -
246 | jamba-1.5-large | - | 31.8 | 2.4K | 256K | ¥0 / ¥0
247 | glm-4-0520 | Zai | 31.5 | 3K | 128K | ¥108 / ¥108
248 | reka-core-20240904 | - | 31.2 | 2.1K | - | -
249 | gemini-1.5-flash-8b-001 | Google | 30.9 | 9.7K | 2M | ¥0.54 / ¥2.2
250 | olmo-2-0325-32b-instruct | Allenai | 30.6 | 789 | - | -
251 | deepseek-coder-v2 | Deepseek | 30.4 | 4.4K | 1M | ¥1.01 / ¥2.02
252 | llama-3.1-nemotron-51b-instruct | Nvidia | 30.1 | 1K | 128K | ¥0 / ¥0
253 | nemotron-4-340b-instruct | Nvidia | 29.8 | 5.6K | - | -
254 | gpt-4-0314 | Openai | 29.5 | 13.9K | 8.19K | ¥216 / ¥432
255 | gemma-2-27b-it | Google | 29.2 | 20.5K | 8.19K | ¥0.58 / ¥0.58
256 | claude-3-sonnet-20240229 | Anthropic | 29.0 | 30.7K | 200K | ¥21.6 / ¥108
257 | gemma-2-9b-it-simpo | - | 28.7 | 2.6K | 8.19K | ¥1.44 / ¥1.44
258 | llama-3-70b-instruct | Meta | 28.4 | 46K | 8.19K | ¥3.67 / ¥5.33
259 | c4ai-aya-expanse-32b | Cohere | 28.1 | 7.2K | - | -
260 | ministral-8b-2410 | Mistral | 27.9 | 1.3K | 128K | ¥0.72 / ¥0.72
261 | amazon-nova-micro-v1.0 | Amazon | 27.6 | 4.9K | 128K | ¥0.25 / ¥1.01
262 | qwen2-72b-instruct | Alibaba | 27.3 | 10.8K | 131K | ¥4.13 / ¥12.4
263 | command-r-plus-08-2024 | Cohere | 27.0 | 2.8K | 128K | ¥18 / ¥72
264 | reka-flash-20240904 | - | 26.7 | 2.3K | 65.5K | ¥0.72 / ¥1.44
265 | llama-3.1-8b-instruct | Meta | 26.5 | 13.5K | 131K | ¥0.79 / ¥0.79
266 | gpt-4-0613 | Openai | 26.2 | 23K | 8.19K | ¥216 / ¥432
267 | llama-3.1-tulu-3-8b | Allenai | 25.9 | 728 | - | -
268 | claude-3-haiku-20240307 | Anthropic | 25.6 | 33.7K | 200K | ¥1.8 / ¥9
269 | gemma-2-9b-it | Google | 25.3 | 14.8K | 8.19K | ¥1.44 / ¥1.44
270 | qwen1.5-110b-chat | Alibaba | 25.1 | 7.7K | - | -
271 | mistral-large-2402 | Mistral | 24.8 | 17.2K | 262K | ¥2.88 / ¥14.4
272 | jamba-1.5-mini | - | 24.5 | 2.4K | 256K | ¥0 / ¥0
273 | command-r-plus | Cohere | 24.2 | 22.4K | 128K | ¥18 / ¥72
274 | command-r-08-2024 | Cohere | 24.0 | 3K | 128K | ¥18 / ¥72
275 | qwq-32b-preview | Alibaba | 23.7 | 852 | 131K | ¥2.07 / ¥6.2
276 | yi-1.5-34b-chat | - | 23.4 | 6.7K | - | -
277 | internlm2_5-20b-chat | - | 23.1 | 2.6K | - | -
278 | c4ai-aya-expanse-8b | Cohere | 22.8 | 2.6K | - | -
279 | mixtral-8x22b-instruct-v0.1 | Mistral | 22.6 | 14.7K | 64K | ¥14.4 / ¥43.2
280 | mistral-medium | Mistral | 22.3 | 8.7K | 262K | ¥2.88 / ¥14.4
281 | qwen1.5-72b-chat | Alibaba | 22.0 | 10.9K | - | -
282 | granite-3.1-8b-instruct | Ibm | 21.7 | 798 | - | -
283 | granite-3.1-2b-instruct | Ibm | 21.4 | 832 | - | -
284 | reka-flash-21b-20240226-online | - | 21.2 | 4.5K | - | -
285 | reka-flash-21b-20240226 | - | 20.9 | 7.4K | - | -
286 | llama-3-8b-instruct | Meta | 20.6 | 30.2K | 8.19K | ¥0.29 / ¥0.29
287 | qwen1.5-32b-chat | Alibaba | 20.3 | 6.1K | - | -
288 | phi-3-medium-4k-instruct | Microsoft | 20.1 | 6.8K | 4.1K | ¥1.22 / ¥4.9
289 | starling-lm-7b-beta | - | 19.8 | 4.5K | 200K | ¥5.4 / ¥18.7
290 | zephyr-orpo-141b-A35b-v0.1 | - | 19.5 | 1.2K | 200K | ¥108 / ¥432
291 | command-r | Cohere | 19.2 | 15.1K | 128K | ¥18 / ¥72
292 | mixtral-8x7b-instruct-v0.1 | Mistral | 18.9 | 19.5K | 32K | ¥5.04 / ¥5.04
293 | qwen1.5-14b-chat | Alibaba | 18.7 | 4.9K | - | -
294 | dbrx-instruct-preview | - | 18.4 | 8.9K | - | -
295 | gemma-2-2b-it | Google | 18.1 | 12.3K | 128K | ¥0 / ¥0
296 | gemini-pro-dev-api | Google | 17.8 | 4.5K | 1.05M | ¥14.4 / ¥86.4
297 | gpt-3.5-turbo-0125 | Openai | 17.5 | 18.5K | 16.4K | ¥3.6 / ¥10.8
298 | tulu-2-dpo-70b | - | 17.3 | 1.4K | - | -
299 | yi-34b-chat | - | 17.0 | 3.8K | - | -
300 | phi-3-small-8k-instruct | Microsoft | 16.7 | 5.2K | 8.19K | ¥1.08 / ¥4.32
301 | gpt-3.5-turbo-1106 | Openai | 16.4 | 3.8K | 16.4K | ¥7.2 / ¥14.4
302 | llama-3.2-3b-instruct | Meta | 16.2 | 2.2K | 131K | ¥0.22 / ¥0.35
303 | gemini-pro | Google | 15.9 | 1.4K | 1.05M | ¥14.4 / ¥86.4
304 | granite-3.0-8b-instruct | Ibm | 15.6 | 1.7K | - | -
305 | phi-3-mini-4k-instruct-june-2024 | Microsoft | 15.3 | 3.2K | 4.1K | ¥0.94 / ¥3.74
306 | openchat-3.5-0106 | - | 15.0 | 3.4K | - | -
307 | wizardlm-70b | Microsoft | 14.8 | 1.8K | - | -
308 | starling-lm-7b-alpha | - | 14.5 | 2.5K | 200K | ¥5.4 / ¥18.7
309 | granite-3.0-2b-instruct | Ibm | 14.2 | 1.7K | - | -
310 | llama-2-70b-chat | Meta | 13.9 | 9.7K | - | -
311 | phi-3-mini-4k-instruct | Microsoft | 13.6 | 5.9K | 4.1K | ¥0.94 / ¥3.74
312 | openhermes-2.5-mistral-7b | - | 13.4 | 1.1K | 1M | ¥36 / ¥180
313 | snowflake-arctic-instruct | - | 13.1 | 9.6K | - | -
314 | gemma-1.1-7b-it | Google | 12.8 | 7K | - | -
315 | deepseek-llm-67b-chat | Deepseek | 12.5 | 1.1K | 1M | ¥1.01 / ¥2.02
316 | mistral-7b-instruct-v0.2 | Mistral | 12.3 | 5.2K | 262K | ¥2.88 / ¥14.4
317 | vicuna-33b | - | 12.0 | 5.1K | - | -
318 | openchat-3.5 | - | 11.7 | 1.7K | - | -
319 | qwen1.5-7b-chat | Alibaba | 11.4 | 1.3K | - | -
320 | solar-10.7b-instruct-v1.0 | - | 11.1 | 932 | 128K | ¥0 / ¥0
321 | dolphin-2.2.1-mistral-7b | - | 10.9 | 359 | 262K | ¥2.88 / ¥14.4
322 | nous-hermes-2-mixtral-8x7b-dpo | - | 10.6 | 1.1K | 1M | ¥36 / ¥180
323 | smollm2-1.7b-instruct | - | 10.3 | 572 | - | -
324 | codellama-70b-instruct | Meta | 10.0 | 306 | - | -
325 | mpt-30b-chat | - | 9.7 | 434 | - | -
326 | llama-2-13b-chat | Meta | 9.5 | 4.5K | - | -
327 | llama2-70b-steerlm-chat | Nvidia | 9.2 | 787 | - | -
328 | llama-3.2-1b-instruct | Meta | 8.9 | 2.2K | 16.4K | ¥0.07 / ¥0.08
329 | gemma-7b-it | Google | 8.6 | 2.3K | - | -
330 | codellama-34b-instruct | Meta | 8.4 | 1.5K | - | -
331 | phi-3-mini-128k-instruct | Microsoft | 8.1 | 6.2K | 128K | ¥0.94 / ¥3.74
332 | qwen-14b-chat | Alibaba | 7.8 | 1K | 32.8K | ¥1.04 / ¥3.1
333 | zephyr-7b-beta | - | 7.5 | 2.2K | - | -
334 | zephyr-7b-alpha | - | 7.2 | 333 | - | -
335 | vicuna-13b | - | 7.0 | 4K | - | -
336 | wizardlm-13b | Microsoft | 6.7 | 1.3K | - | -
337 | llama-2-7b-chat | Meta | 6.4 | 3.3K | 128K | ¥4.03 / ¥48
338 | falcon-180b-chat | - | 6.1 | 231 | - | -
339 | guanaco-33b | - | 5.8 | 517 | 200K | ¥14.4 / ¥57.6
340 | gemma-1.1-2b-it | Google | 5.6 | 3.2K | - | -
341 | palm-2 | Google | 5.3 | 1.6K | - | -
342 | mistral-7b-instruct | Mistral | 5.0 | 1.9K | 262K | ¥2.88 / ¥14.4
343 | stripedhyena-nous-7b | - | 4.7 | 1.2K | - | -
344 | vicuna-7b | - | 4.5 | 1.3K | - | -
345 | olmo-7b-instruct | Allenai | 4.2 | 1.5K | - | -
346 | gemma-2b-it | Google | 3.9 | 1.2K | - | -
347 | qwen1.5-4b-chat | Alibaba | 3.6 | 2.1K | - | -
348 | chatglm3-6b | - | 3.3 | 999 | 200K | ¥5.4 / ¥18.7
349 | gpt4all-13b-snoozy | - | 3.1 | 347 | 1M | ¥36 / ¥216
350 | koala-13b | - | 2.8 | 1.3K | - | -
351 | chatglm2-6b | - | 2.5 | 490 | 200K | ¥5.4 / ¥18.7
352 | chatglm-6b | - | 2.2 | 848 | 200K | ¥5.4 / ¥18.7
353 | RWKV-4-Raven-14B | - | 1.9 | 876 | - | -
354 | mpt-7b-chat | - | 1.7 | 714 | - | -
355 | oasst-pythia-12b | - | 1.4 | 1.2K | - | -
356 | stablelm-tuned-alpha-7b | - | 1.1 | 575 | - | -
357 | alpaca-13b | - | 0.8 | 1.1K | - | -
358 | fastchat-t5-3b | - | 0.6 | 797 | - | -
359 | dolly-v2-12b | - | 0.3 | 615 | - | -
360 | llama-13b | Meta | 0.0 | 430 | - | -

Top model analysis

Why claude-opus-4-6-thinking ranks first

claude-opus-4-6-thinking ranks first with a percent score of 100.0 across 20.4K samples. Treat it as the first candidate on this leaderboard, then compare price, context length, and availability.
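
The percent scores listed on this page appear consistent with a simple linear mapping from rank onto a 0 to 100 scale (rank 1 gives 100.0, rank 360 gives 0.0). This is an observation drawn from the published numbers, not a formula the page documents. The sketch below checks that reading; the function name `rank_to_percent` is hypothetical.

```python
def rank_to_percent(rank: int, total: int) -> float:
    """Map rank 1..total linearly onto 100..0, rounded to one decimal.

    Assumption: this mirrors how the page's percent score seems to be
    derived; the page itself does not state the formula.
    """
    if total < 2 or not 1 <= rank <= total:
        raise ValueError("rank must lie in 1..total and total must be at least 2")
    return round(100 * (1 - (rank - 1) / (total - 1)), 1)


TOTAL_MODELS = 360  # "Models: 360" on this leaderboard

# Compare against a few scores listed on the page.
for rank in (1, 2, 3, 180, 360):
    print(rank, rank_to_percent(rank, TOTAL_MODELS))
```

If that reading holds, a gap of about 0.3 points between neighbouring models reflects one rank position rather than a measured margin of preference, so nearby scores are best read as an ordering.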

How to choose

Do not look only at rank #1

Start with the leaderboard closest to your task. Compare the top models by score and sample size, then check price, context length, open or closed access, and provider availability.
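
As a rough illustration of that selection process, the sketch below filters a handful of rows copied from this leaderboard by sample size, context length, and output price, then orders the survivors by score. The `Entry` type, the `parse_count` and `shortlist` helpers, and the thresholds are all hypothetical, and prices are treated as plain numbers in ¥ because the page does not state the billing unit.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    model: str
    score: float
    samples: str              # as shown on the page, e.g. "20.4K"
    context: str              # e.g. "1.05M", or "-" when missing
    price_in: Optional[float]
    price_out: Optional[float]


def parse_count(text: str) -> Optional[int]:
    """Turn '20.4K' / '1.05M' / '531' into an integer; '-' becomes None."""
    text = text.strip()
    if text == "-":
        return None
    multipliers = {"K": 1_000, "M": 1_000_000}
    suffix = text[-1].upper()
    if suffix in multipliers:
        return round(float(text[:-1]) * multipliers[suffix])
    return round(float(text))


def shortlist(entries: List[Entry], min_samples: int,
              min_context: int, max_price_out: float) -> List[Entry]:
    """Keep entries that meet every threshold, best score first."""
    kept = []
    for e in entries:
        samples = parse_count(e.samples)
        context = parse_count(e.context)
        if samples is None or samples < min_samples:
            continue
        if context is None or context < min_context:
            continue
        if e.price_out is None or e.price_out > max_price_out:
            continue
        kept.append(e)
    return sorted(kept, key=lambda e: e.score, reverse=True)


# A few rows copied from the leaderboard on this page.
ROWS = [
    Entry("claude-opus-4-6-thinking", 100.0, "20.4K", "1M", 36.0, 180.0),
    Entry("gpt-5.4-high", 98.9, "17.4K", "1.05M", 18.0, 108.0),
    Entry("gemini-3.1-pro-preview", 98.6, "26.7K", "1.05M", 14.4, 86.4),
    Entry("gemini-3.5-flash", 98.1, "5.9K", "1.05M", 10.8, 64.8),
    Entry("mimo-v2.5-pro", 97.8, "10.1K", "1.05M", 7.2, 21.6),
    Entry("qwen3.5-max-preview", 97.5, "12.6K", "-", None, None),
]

# Hypothetical requirements: at least 10K samples, 1M context, output price <= 110.
picks = shortlist(ROWS, min_samples=10_000, min_context=1_000_000, max_price_out=110.0)
for e in picks:
    print(e.model, e.score, e.price_out)
```

With these example thresholds the rank #1 model drops out on price and gemini-3.5-flash drops out on sample size, which is the point of checking more than the top rank.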

FAQ

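Which metrics matter on the Hard Prompts leaderboard?

Look mainly at rank, percent score, sample size, and source. The score gives a quick comparison of model performance within a single leaderboard; the sample size indicates how stable the result is.

Why can't different leaderboards be combined directly into one overall score?

Different leaderboards use different tasks, samples, and evaluation criteria. By default, 模力榜 ranks models only within a single leaderboard, to avoid forcing abilities such as writing, coding, and image generation into one number.

How should I choose a Hard Prompts model?

Start with the leaderboard closest to your task, then weigh price, context length, open versus closed source, and provider availability. A high rank does not mean a model suits every budget or deployment method.

How often is the leaderboard updated?

The page shows the most recently collected public leaderboard data that was fetched successfully. It currently prefers the LMArena leaderboard dataset and keeps the original link in the page's source information.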