Chat · Text · Instruction Following Leaderboard

Ranking for Text / Instruction Following, based on public preference data.

Top 5: claude-opus-4-6-thinking, claude-opus-4-6, claude-opus-4-7-thinking, claude-opus-4-7, mimo-v2.5-pro
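Models: 360
Published: 2026/05/27
Source: Arena public preference evaluation. Original leaderboard: Text / Instruction Following. Leaderboard dataset: LMArena latest parquet.

Each entry below lists, in order: rank, model, provider, score (percent scale), sample count, context length, and input / output price. A "-" marks a value that is not listed.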
1
claude-opus-4-6-thinking
Anthropic
100.0
10K
1M
¥36 / ¥180Input/Output
2
claude-opus-4-6
Anthropic
99.7
11.2K
1M
¥36 / ¥180Input/Output
3
claude-opus-4-7-thinking
Anthropic
99.4
6.6K
1M
¥36 / ¥180Input/Output
4
claude-opus-4-7
Anthropic
99.2
7K
1M
¥36 / ¥180Input/Output
5
mimo-v2.5-pro
Xiaomi
98.9
5.1K
1.05M
¥7.2 / ¥21.6Input/Output
6
gpt-5.5-high
Openai
98.6
5.4K
1.05M
¥36 / ¥216Input/Output
7
claude-sonnet-4-6
Anthropic
98.3
8.5K
1M
¥21.6 / ¥108Input/Output
8
claude-opus-4-5-20251101-thinking-32k
Anthropic
98.1
9.5K
200K
¥108 / ¥540Input/Output
9
gpt-5.4-high
Openai
97.8
8.9K
1.05M
¥18 / ¥108Input/Output
10
claude-opus-4-5-20251101
Anthropic
97.5
18.9K
200K
¥36 / ¥180Input/Output
11
gpt-5.5
Openai
97.2
5.6K
1.05M
¥36 / ¥216Input/Output
12
gemini-3.1-pro-preview
Google
96.9
13.5K
1.05M
¥14.4 / ¥86.4Input/Output
13
qwen3.5-max-preview
Alibaba
96.7
6.3K
-
-
14
ernie-5.1
Baidu
96.4
4.4K
119K
¥5.4 / ¥21.6Input/Output
15
gemini-3.5-flash
Google
96.1
3K
1.05M
¥10.8 / ¥64.8Input/Output
16
qwen3.7-max-preview
Alibaba
95.8
1.3K
1M
¥18 / ¥54Input/Output
17
gemini-3-pro
Google
95.5
11.2K
1.05M
¥14.4 / ¥86.4Input/Output
18
claude-sonnet-4-5-20250929
Anthropic
95.3
21.5K
200K
¥21.6 / ¥108Input/Output
19
gpt-5.4
Openai
95.0
9.6K
1.05M
¥18 / ¥108Input/Output
20
claude-sonnet-4-5-20250929-thinking-32k
Anthropic
94.7
21.9K
200K
¥21.6 / ¥108Input/Output
21
glm-5.1
Zai
94.4
4.5K
200K
¥0 / ¥0Input/Output
22
kimi-k2.6
Moonshot
94.2
5K
262K
¥6.84 / ¥28.8Input/Output
23
mimo-v2-pro
Xiaomi
93.9
7.2K
1.05M
¥7.2 / ¥21.6Input/Output
24
gpt-5.1-high
Openai
93.6
10.8K
400K
¥9 / ¥72Input/Output
25
muse-spark
Meta
93.3
3.9K
-
-
26
gemini-3-flash
Google
93.0
8.1K
1.05M
¥3.6 / ¥21.6Input/Output
27
claude-opus-4-1-20250805-thinking-16k
Anthropic
92.8
13K
200K
¥108 / ¥540Input/Output
28
gemini-2.5-pro
Google
92.5
33.4K
1.05M
¥9 / ¥72Input/Output
29
deepseek-v4-pro
Deepseek
92.2
5.5K
1M
¥3.13 / ¥6.26Input/Output
30
deepseek-v4-pro-thinking
Deepseek
91.9
5.3K
1M
¥3.13 / ¥6.26Input/Output
31
claude-opus-4-1-20250805
Anthropic
91.6
20.3K
200K
¥108 / ¥540Input/Output
32
qwen3.6-max-preview
Alibaba
91.4
1.5K
246K
¥9.5 / ¥56.9Input/Output
33
mimo-v2.5
Xiaomi
91.1
5.3K
1.05M
¥2.88 / ¥14.4Input/Output
34
gemma-4-31b
Google
90.8
1.6K
262K
¥3.24 / ¥7.2Input/Output
35
kimi-k2.5-instant
Moonshot
90.5
2.2K
262K
¥4.32 / ¥21.6Input/Output
36
kimi-k2.5-thinking
Moonshot
90.3
10.8K
262K
¥4.32 / ¥21.6Input/Output
37
glm-5
Zai
90.0
6.8K
205K
¥7.2 / ¥23Input/Output
38
amazon-nova-experimental-chat-26-02-10
Amazon
89.7
935
-
-
39
qwen3.6-plus
Alibaba
89.4
6K
1M
¥3.6 / ¥21.6Input/Output
40
qwen3.5-397b-a17b
Alibaba
89.1
9.9K
262K
¥3.1 / ¥18.6Input/Output
41
grok-4.20-beta-0309-reasoning
Xai
88.9
9.4K
2M
¥14.4 / ¥43.2Input/Output
42
deepseek-v4-flash
Deepseek
88.6
5.6K
1M
¥1.01 / ¥2.02Input/Output
43
gemma-4-26b-a4b
Google
88.3
1.6K
262K
¥0.94 / ¥2.88Input/Output
44
qwen3-max-preview
Alibaba
88.0
7.3K
262K
¥6.2 / ¥24.8Input/Output
45
grok-4.20-multi-agent-beta-0309
Xai
87.7
9.1K
2M
¥14.4 / ¥43.2Input/Output
46
gpt-5.1
Openai
87.5
11.9K
400K
¥9 / ¥72Input/Output
47
gemini-3-flash (thinking-minimal)
Google
87.2
15.8K
1.05M
¥3.6 / ¥21.6Input/Output
48
gpt-5.2-chat-latest-20260210
Openai
86.9
10K
400K
¥12.6 / ¥101Input/Output
49
deepseek-v4-flash-thinking
Deepseek
86.6
5.6K
1M
¥1.01 / ¥2.02Input/Output
50
grok-4.20-beta1
Xai
86.4
7.4K
2M
¥14.4 / ¥43.2Input/Output
51
ernie-5.0-0110
Baidu
86.1
10K
128K
¥7.92 / ¥14.4Input/Output
52
glm-4.7
Zai
85.8
3.2K
205K
¥0 / ¥0Input/Output
53
dola-seed-2.0-pro
Bytedance
85.5
11.6K
-
-
54
glm-4.6
Zai
85.2
10K
205K
¥4.32 / ¥15.8Input/Output
55
deepseek-v3.2
Deepseek
85.0
12.5K
128K
¥2.09 / ¥3.1Input/Output
56
claude-haiku-4-5-20251001
Anthropic
84.7
22.4K
200K
¥7.2 / ¥36Input/Output
57
longcat-flash-chat-2602-exp
Meituan
84.4
7.5K
128K
¥1.08 / ¥10.8Input/Output
58
grok-3-preview-02-24
Xai
84.1
9.7K
1M
¥9 / ¥18Input/Output
59
qwen3-vl-235b-a22b-instruct
Alibaba
83.8
2.9K
128K
¥2.16 / ¥8.64Input/Output
60
gpt-5.2-high
Openai
83.6
13.7K
400K
¥12.6 / ¥101Input/Output
61
qwen3-235b-a22b-instruct-2507
Alibaba
83.3
26.2K
128K
¥2.09 / ¥8.23Input/Output
62
gpt-5.5-instant
Openai
83.0
8.7K
400K
¥9 / ¥72Input/Output
63
deepseek-v3.2-thinking
Deepseek
82.7
10.7K
128K
¥2.09 / ¥3.1Input/Output
64
claude-opus-4-20250514-thinking-16k
Anthropic
82.5
9.1K
200K
¥108 / ¥540Input/Output
65
mistral-large-3
Mistral
82.2
11.8K
262K
¥3.6 / ¥10.8Input/Output
66
gemini-2.5-flash
Google
81.9
33K
1.05M
¥2.16 / ¥18Input/Output
67
deepseek-v3.1-terminus-thinking
Deepseek
81.6
864
128K
¥1.8 / ¥5.04Input/Output
68
gpt-5.4-mini-high
Openai
81.3
8.6K
400K
¥5.4 / ¥32.4Input/Output
69
gpt-4.5-preview-2025-02-27
Openai
81.1
5.5K
8.19K
¥216 / ¥432Input/Output
70
chatgpt-4o-latest-20250326
Openai
80.8
22.8K
128K
¥18 / ¥72Input/Output
71
glm-4.5
Zai
80.5
6.2K
131K
¥4.32 / ¥15.8Input/Output
72
deepseek-v3.2-exp
Deepseek
80.2
3.3K
128K
¥0 / ¥0Input/Output
73
deepseek-v3.1-thinking
Deepseek
79.9
2.9K
128K
¥1.44 / ¥5.04Input/Output
74
gpt-5.2
Openai
79.7
14.2K
400K
¥12.6 / ¥101Input/Output
75
kimi-k2-thinking-turbo
Moonshot
79.4
17.1K
262K
¥17.3 / ¥72Input/Output
76
ernie-5.0-preview-1203
Baidu
79.1
2.6K
128K
¥7.92 / ¥14.4Input/Output
77
minimax-m2.7
Minimax
78.8
7.1K
205K
¥0 / ¥0Input/Output
78
deepseek-v3.2-exp-thinking
Deepseek
78.6
2.5K
128K
¥0 / ¥0Input/Output
79
grok-4.1
Xai
78.3
17.8K
200K
¥14.4 / ¥72Input/Output
80
mistral-medium-2508
Mistral
78.0
25.9K
262K
¥2.88 / ¥14.4Input/Output
81
qwen3-max-2025-09-23
Alibaba
77.7
2.6K
258K
¥6.19 / ¥24.7Input/Output
82
minimax-m2.1-preview
Minimax
77.4
4.5K
205K
¥0 / ¥0Input/Output
83
longcat-flash-chat
Meituan
77.2
3K
128K
¥1.08 / ¥10.8Input/Output
84
grok-4.1-thinking
Xai
76.9
17.6K
200K
¥14.4 / ¥72Input/Output
85
qwen3.5-122b-a10b
Alibaba
76.6
8.3K
262K
¥2.88 / ¥23Input/Output
86
mimo-v2-omni
Xiaomi
76.3
958
262K
¥2.88 / ¥14.4Input/Output
87
hunyuan-vision-1.5-thinking
Tencent
76.0
626
-
-
88
ernie-5.0-preview-1022
Baidu
75.8
1.3K
128K
¥7.92 / ¥14.4Input/Output
89
gemini-2.5-flash-preview-09-2025
Google
75.5
9.2K
1M
¥2.16 / ¥18Input/Output
90
hunyuan-hy3-preview
Tencent
75.2
1.9K
256K
¥0 / ¥0Input/Output
91
amazon-nova-experimental-chat-12-10
Amazon
74.9
921
-
-
92
grok-4-fast-chat
Xai
74.7
1.7K
2M
¥1.44 / ¥3.6Input/Output
93
mimo-v2-flash (non-thinking)
Xiaomi
74.4
13.1K
262K
¥0.72 / ¥2.16Input/Output
94
qwen3.5-27b
Alibaba
74.1
7.9K
262K
¥2.16 / ¥17.3Input/Output
95
qwen3-next-80b-a3b-instruct
Alibaba
73.8
6.3K
131K
¥1.04 / ¥4.13Input/Output
96
deepseek-v3.1
Deepseek
73.5
3.7K
128K
¥1.44 / ¥5.04Input/Output
97
amazon-nova-experimental-chat-11-10
Amazon
73.3
6.7K
-
-
98
gpt-5-high
Openai
73.0
8.3K
400K
¥9 / ¥72Input/Output
99
grok-4-0709
Xai
72.7
10.8K
256K
¥21.6 / ¥108Input/Output
100
step-3.5-flash
Stepfun
72.4
10.2K
256K
¥0.69 / ¥2.07Input/Output
101
qwen3-235b-a22b-thinking-2507
Alibaba
72.1
2.1K
131K
¥2.07 / ¥8.26Input/Output
102
amazon-nova-experimental-chat-26-01-10
Amazon
71.9
926
-
-
103
gpt-5-chat
Openai
71.6
8.1K
400K
¥9 / ¥72Input/Output
104
qwen3.5-35b-a3b
Alibaba
71.3
8.5K
262K
¥1.8 / ¥14.4Input/Output
105
deepseek-r1-0528
Deepseek
71.0
4K
164K
¥3.6 / ¥15.5Input/Output
106
grok-4-fast-reasoning
Xai
70.8
5.3K
2M
¥1.44 / ¥3.6Input/Output
107
deepseek-v3.1-terminus
Deepseek
70.5
1K
128K
¥1.8 / ¥5.04Input/Output
108
gemini-3.1-flash-lite-preview
Google
70.2
11K
1.05M
¥1.8 / ¥10.8Input/Output
109
grok-4.3
Xai
69.9
5.3K
1M
¥9 / ¥18Input/Output
110
mimo-v2-flash (thinking)
Xiaomi
69.6
3K
262K
¥0.72 / ¥2.16Input/Output
111
hunyuan-t1-20250711
Tencent
69.4
1.1K
131K
¥0 / ¥0Input/Output
112
grok-4-1-fast-reasoning
Xai
69.1
15.7K
2M
¥1.44 / ¥3.6Input/Output
113
claude-sonnet-4-20250514-thinking-32k
Anthropic
68.8
8.6K
200K
¥21.6 / ¥108Input/Output
114
gpt-5.3-chat-latest
Openai
68.5
9.6K
128K
¥12.6 / ¥101Input/Output
115
qwen3-vl-235b-a22b-thinking
Alibaba
68.2
2.1K
131K
¥2.06 / ¥8.26Input/Output
116
claude-opus-4-20250514
Anthropic
68.0
10.7K
200K
¥108 / ¥540Input/Output
117
qwen3.5-flash
Alibaba
67.7
9.1K
1M
¥1.24 / ¥12.4Input/Output
118
o3-2025-04-16
Openai
67.4
15.5K
200K
¥14.4 / ¥57.6Input/Output
119
o1-2024-12-17
Openai
67.1
10.2K
128K
¥108 / ¥432Input/Output
120
gemini-2.5-flash-lite-preview-06-17-thinking
Google
66.9
8.1K
65.5K
¥0.72 / ¥2.88Input/Output
121
gpt-4.1-2025-04-14
Openai
66.6
13.3K
1.05M
¥14.4 / ¥57.6Input/Output
122
amazon-nova-experimental-chat-10-20
Amazon
66.3
3K
-
-
123
qwen3-30b-a3b-instruct-2507
Alibaba
66.0
6K
262K
¥2.16 / ¥3.6Input/Output
124
qwen3-235b-a22b-no-thinking
Alibaba
65.7
9.3K
131K
¥2.07 / ¥8.26Input/Output
125
gpt-5.4-nano-high
Openai
65.5
8.3K
400K
¥1.44 / ¥9Input/Output
126
gpt-5-mini-high
Openai
65.2
6.9K
400K
¥1.8 / ¥14.4Input/Output
127
deepseek-r1
Deepseek
64.9
6.4K
164K
¥5.04 / ¥18Input/Output
128
grok-3-mini-high
Xai
64.6
4.2K
128K
¥0 / ¥0Input/Output
129
qwen3-coder-480b-a35b-instruct
Alibaba
64.3
6.2K
262K
¥6.2 / ¥24.8Input/Output
130
gemini-2.5-flash-lite-preview-09-2025-no-thinking
Google
64.1
12.9K
1.05M
¥0.72 / ¥2.88Input/Output
131
minimax-m2.5
Minimax
63.8
11.2K
205K
¥0 / ¥0Input/Output
132
glm-4.5-air
Zai
63.5
8.1K
131K
¥0 / ¥0Input/Output
133
glm-4.6v
Zai
63.2
733
128K
¥2.16 / ¥6.48Input/Output
134
claude-3-7-sonnet-20250219-thinking-32k
Anthropic
63.0
10.1K
-
-
135
claude-sonnet-4-20250514
Anthropic
62.7
10K
200K
¥21.6 / ¥108Input/Output
136
nvidia-nemotron-3-super-120b-a12b
Nvidia
62.4
2K
262K
¥1.44 / ¥5.76Input/Output
137
kimi-k2-0905-preview
Moonshot
62.1
2.9K
262K
¥4.32 / ¥18Input/Output
138
deepseek-v3-0324
Deepseek
61.8
12.4K
75K
¥1.44 / ¥5.76Input/Output
139
grok-3-mini-beta
Xai
61.6
5.5K
1M
¥9 / ¥18Input/Output
140
qwen3-next-80b-a3b-thinking
Alibaba
61.3
3.5K
131K
¥1.04 / ¥10.3Input/Output
141
o1-preview
Openai
61.0
12.8K
128K
¥108 / ¥432Input/Output
142
hunyuan-turbos-20250416
Tencent
60.7
2.4K
131K
¥0 / ¥0Input/Output
143
mistral-medium-2505
Mistral
60.4
7.9K
262K
¥2.88 / ¥14.4Input/Output
144
o3-mini-high
Openai
60.2
6.7K
200K
¥7.92 / ¥31.7Input/Output
145
gemini-2.0-flash-001
Google
59.9
13.6K
1.05M
¥1.08 / ¥4.32Input/Output
146
nova-2-lite
Amazon
59.6
3.3K
128K
¥2.38 / ¥19.8Input/Output
147
qwen2.5-max
Alibaba
59.3
11K
32K
¥11.5 / ¥46Input/Output
148
trinity-large-preview
-
59.1
8.5K
262K
¥1.8 / ¥6.48Input/Output
149
minimax-m2
Minimax
58.8
2K
197K
¥0 / ¥0Input/Output
150
qwen3-235b-a22b
Alibaba
58.5
6.2K
131K
¥2.07 / ¥8.26Input/Output
151
gpt-4.1-mini-2025-04-14
Openai
58.2
10.1K
1.05M
¥2.88 / ¥11.5Input/Output
152
step-3
Stepfun
57.9
1.6K
65.5K
¥1.8 / ¥4.68Input/Output
153
glm-4.7-flash
Zai
57.7
3.2K
200K
¥0 / ¥0Input/Output
154
trinity-large-thinking
-
57.4
7.9K
262K
¥1.8 / ¥6.48Input/Output
155
claude-3-7-sonnet-20250219
Anthropic
57.1
12.3K
200K
¥21.6 / ¥108Input/Output
156
kimi-k2-0711-preview
Moonshot
56.8
6.8K
131K
¥4.32 / ¥18Input/Output
157
gemma-3-27b-it
Google
56.5
12.5K
128K
¥2.15 / ¥2.15Input/Output
158
mercury-2
Inception Ai
56.3
836
128K
¥1.8 / ¥5.4Input/Output
159
o4-mini-2025-04-16
Openai
56.0
11.9K
200K
¥7.92 / ¥31.7Input/Output
160
gpt-oss-120b
Openai
55.7
7.8K
131K
¥1.08 / ¥4.32Input/Output
161
ling-flash-2.0
Ant Group
55.4
1.8K
131K
¥1.01 / ¥4.1Input/Output
162
hunyuan-turbos-20250226
Tencent
55.2
886
131K
¥0 / ¥0Input/Output
163
amazon-nova-experimental-chat-10-09
Amazon
54.9
720
-
-
164
deepseek-v3
Deepseek
54.6
8.6K
128K
¥0 / ¥0Input/Output
165
o3-mini
Openai
54.3
17K
200K
¥7.92 / ¥31.7Input/Output
166
glm-4.5v
Zai
54.0
1.3K
64K
¥4.32 / ¥13Input/Output
167
ring-flash-2.0
Ant Group
53.8
1.9K
131K
¥1.01 / ¥4.1Input/Output
168
intellect-3
-
53.5
1.4K
131K
¥1.44 / ¥7.92Input/Output
169
minimax-m1
Minimax
53.2
8.7K
1M
¥0.95 / ¥9.03Input/Output
170
mistral-small-2506
Mistral
52.9
4.4K
262K
¥2.88 / ¥14.4Input/Output
171
step-1o-turbo-202506
Stepfun
52.6
2.1K
-
-
172
command-a-03-2025
Cohere
52.4
15.5K
256K
¥18 / ¥72Input/Output
173
llama-3.1-nemotron-ultra-253b-v1
Nvidia
52.1
660
128K
¥4.32 / ¥13Input/Output
174
gpt-5-nano-high
Openai
51.8
2K
400K
¥0.36 / ¥2.88Input/Output
175
qwen3-32b
Alibaba
51.5
858
131K
¥2.07 / ¥8.26Input/Output
176
gemini-2.0-flash-lite-preview-02-05
Google
51.3
9.2K
1.05M
¥0.54 / ¥2.16Input/Output
177
o1-mini
Openai
51.0
21.5K
128K
¥7.92 / ¥31.7Input/Output
178
nvidia-nemotron-3-nano-30b-a3b-bf16
Nvidia
50.7
4.2K
131K
¥0 / ¥0Input/Output
179
qwen-plus-0125
Alibaba
50.4
2.2K
1M
¥0.83 / ¥2.07Input/Output
180
gemma-3-12b-it
Google
50.1
1.1K
128K
¥1.96 / ¥1.96Input/Output
181
olmo-3.1-32b-instruct
Allenai
49.9
3.2K
200K
¥14.4 / ¥57.6Input/Output
182
nvidia-llama-3.3-nemotron-super-49b-v1.5
Nvidia
49.6
807
131K
¥2.88 / ¥2.88Input/Output
183
qwq-32b
Alibaba
49.3
7.2K
131K
¥2.07 / ¥6.2Input/Output
184
claude-3-5-sonnet-20241022
Anthropic
49.0
31.3K
200K
¥21.6 / ¥108Input/Output
185
gemini-1.5-pro-002
Google
48.7
22.8K
-
-
186
llama-3.3-nemotron-49b-super-v1
Nvidia
48.5
803
131K
¥0 / ¥0Input/Output
187
glm-4-plus-0111
Zai
48.2
2.2K
128K
¥72 / ¥72Input/Output
188
step-2-16k-exp-202412
Stepfun
47.9
2K
16.4K
¥37.5 / ¥118Input/Output
189
qwen3-30b-a3b
Alibaba
47.6
6.1K
128K
¥0.79 / ¥7.78Input/Output
190
deepseek-v2.5-1210
Deepseek
47.4
3K
1M
¥1.01 / ¥2.02Input/Output
191
hunyuan-turbo-0110
Tencent
47.1
842
-
-
192
yi-lightning
-
46.8
10.9K
12K
¥1.44 / ¥1.44Input/Output
193
gpt-4o-2024-05-13
Openai
46.5
43.8K
128K
¥36 / ¥108Input/Output
194
hunyuan-large-2025-02-10
Tencent
46.2
1.3K
-
-
195
granite-4.1-8b
Ibm
46.0
1.2K
131K
¥0.36 / ¥0.72Input/Output
196
molmo-2-8b
Allenai
45.7
217
-
-
197
qwen2.5-plus-1127
Alibaba
45.4
4.2K
-
-
198
olmo-3-32b-think
Allenai
45.1
1.5K
128K
¥2.16 / ¥3.24Input/Output
199
athene-v2-chat
-
44.8
10.2K
-
-
200
grok-2-2024-08-13
Xai
44.6
25.7K
1M
¥9 / ¥18Input/Output
201
claude-3-5-sonnet-20240620
Anthropic
44.3
32.1K
200K
¥21.6 / ¥108Input/Output
202
llama-4-maverick-17b-128e-instruct
Meta
44.0
10.5K
1M
¥1.8 / ¥6.26Input/Output
203
gpt-4o-2024-08-06
Openai
43.7
18.3K
128K
¥18 / ¥72Input/Output
204
gpt-4.1-nano-2025-04-14
Openai
43.5
2K
1.05M
¥14.4 / ¥57.6Input/Output
205
mistral-small-3.1-24b-instruct-2503
Mistral
43.2
8K
262K
¥2.88 / ¥14.4Input/Output
206
glm-4-plus
Zai
42.9
10.7K
128K
¥54 / ¥54Input/Output
207
qwen-max-0919
Alibaba
42.6
6.9K
131K
¥2.48 / ¥9.91Input/Output
208
llama-3.1-405b-instruct-bf16
Meta
42.3
16.2K
128K
¥0 / ¥0Input/Output
209
llama-3.1-405b-instruct-fp8
Meta
42.1
23.6K
128K
¥0 / ¥0Input/Output
210
gpt-4o-mini-2024-07-18
Openai
41.8
26.7K
128K
¥1.08 / ¥4.32Input/Output
211
gemini-1.5-flash-002
Google
41.5
14.6K
2M
¥0.54 / ¥2.2Input/Output
212
magistral-medium-2506
Mistral
41.2
3.1K
128K
¥14.4 / ¥36Input/Output
213
gemma-3n-e4b-it
Google
40.9
5K
128K
¥0 / ¥0Input/Output
214
qwen2.5-72b-instruct
Alibaba
40.7
16.4K
131K
¥4.13 / ¥12.4Input/Output
215
gemini-1.5-pro-001
Google
40.4
29.8K
-
-
216
gemini-advanced-0514
Google
40.1
18.5K
-
-
217
llama-3.1-nemotron-70b-instruct
Nvidia
39.8
3K
128K
¥0 / ¥0Input/Output
218
llama-4-scout-17b-16e-instruct
Meta
39.6
7.5K
128K
¥1.44 / ¥5.62Input/Output
219
gpt-4-turbo-2024-04-09
Openai
39.3
36.3K
128K
¥72 / ¥216Input/Output
220
hunyuan-large-vision
Tencent
39.0
1.2K
-
-
221
deepseek-v2.5
Deepseek
38.7
10.2K
1M
¥1.01 / ¥2.02Input/Output
222
mistral-large-2411
Mistral
38.4
11K
128K
¥14.4 / ¥43.2Input/Output
223
claude-3-opus-20240229
Anthropic
38.2
72K
200K
¥108 / ¥540Input/Output
224
mistral-large-2407
Mistral
37.9
18.3K
131K
¥14.4 / ¥43.2Input/Output
225
hunyuan-standard-2025-02-10
Tencent
37.6
1.3K
-
-
226
grok-2-mini-2024-08-13
Xai
37.3
21.1K
1M
¥9 / ¥18Input/Output
227
olmo-3.1-32b-think
Allenai
37.0
2.2K
200K
¥14.4 / ¥57.6Input/Output
228
llama-3.3-70b-instruct
Meta
36.8
18.8K
128K
¥0 / ¥0Input/Output
229
gpt-4-1106-preview
Openai
36.5
34.4K
8.19K
¥216 / ¥432Input/Output
230
claude-3-5-haiku-20241022
Anthropic
36.2
22K
200K
¥5.76 / ¥28.8Input/Output
231
gemma-3-4b-it
Google
35.9
1.2K
128K
¥1.44 / ¥1.44Input/Output
232
gpt-oss-20b
Openai
35.7
2.6K
131K
¥0.32 / ¥1.3Input/Output
233
mercury
Inception Ai
35.4
569
128K
¥1.8 / ¥5.4Input/Output
234
amazon-nova-pro-v1.0
Amazon
35.1
9.5K
300K
¥5.76 / ¥23Input/Output
235
gpt-4-0125-preview
Openai
34.8
33.3K
8.19K
¥216 / ¥432Input/Output
236
llama-3.1-tulu-3-70b
Allenai
34.5
1.2K
-
-
237
llama-3.1-70b-instruct
Meta
34.3
21.9K
131K
¥2.88 / ¥2.88Input/Output
238
athene-70b-0725
-
34.0
7.5K
-
-
239
qwen2.5-coder-32b-instruct
Alibaba
33.7
2.2K
131K
¥2.07 / ¥6.2Input/Output
240
ibm-granite-h-small
Ibm
33.4
1.6K
-
-
241
mistral-small-24b-instruct-2501
Mistral
33.1
5.5K
262K
¥2.88 / ¥14.4Input/Output
242
jamba-1.5-large
-
32.9
3.3K
256K
¥0 / ¥0Input/Output
243
gemini-1.5-flash-001
Google
32.6
23.7K
2M
¥0.54 / ¥2.2Input/Output
244
reka-core-20240904
-
32.3
3.1K
-
-
245
gemma-2-27b-it
Google
32.0
29.5K
8.19K
¥0.58 / ¥0.58Input/Output
246
amazon-nova-lite-v1.0
Amazon
31.8
7.8K
300K
¥0.43 / ¥1.73Input/Output
247
nemotron-4-340b-instruct
Nvidia
31.5
7.4K
-
-
248
glm-4-0520
Zai
31.2
3.8K
128K
¥108 / ¥108Input/Output
249
hunyuan-standard-256k
Tencent
30.9
1.1K
-
-
250
phi-4
Microsoft
30.6
9.2K
128K
¥0.9 / ¥3.6Input/Output
251
gpt-4-0314
Openai
30.4
18.1K
8.19K
¥216 / ¥432Input/Output
252
llama-3.1-nemotron-51b-instruct
Nvidia
30.1
1.5K
128K
¥0 / ¥0Input/Output
253
claude-3-sonnet-20240229
Anthropic
29.8
38.8K
200K
¥21.6 / ¥108Input/Output
254
gemini-1.5-flash-8b-001
Google
29.5
14.9K
2M
¥0.54 / ¥2.2Input/Output
255
command-r-plus-08-2024
Cohere
29.2
4K
128K
¥18 / ¥72Input/Output
256
c4ai-aya-expanse-32b
Cohere
29.0
11.3K
-
-
257
llama-3-70b-instruct
Meta
28.7
56.6K
8.19K
¥3.67 / ¥5.33Input/Output
258
gemma-2-9b-it-simpo
-
28.4
3.7K
8.19K
¥1.44 / ¥1.44Input/Output
259
gpt-4-0613
Openai
28.1
29.7K
8.19K
¥216 / ¥432Input/Output
260
olmo-2-0325-32b-instruct
Allenai
27.9
1.1K
-
-
261
reka-flash-20240904
-
27.6
3.2K
65.5K
¥0.72 / ¥1.44Input/Output
262
qwen2-72b-instruct
Alibaba
27.3
14.2K
131K
¥4.13 / ¥12.4Input/Output
263
deepseek-coder-v2
Deepseek
27.0
5.6K
1M
¥1.01 / ¥2.02Input/Output
264
command-r-plus
Cohere
26.7
28.1K
128K
¥18 / ¥72Input/Output
265
gemma-2-9b-it
Google
26.5
21.4K
8.19K
¥1.44 / ¥1.44Input/Output
266
llama-3.1-tulu-3-8b
Allenai
26.2
1.2K
-
-
267
amazon-nova-micro-v1.0
Amazon
25.9
7.7K
128K
¥0.25 / ¥1.01Input/Output
268
claude-3-haiku-20240307
Anthropic
25.6
43K
200K
¥1.8 / ¥9Input/Output
269
mistral-large-2402
Mistral
25.3
21.5K
262K
¥2.88 / ¥14.4Input/Output
270
command-r-08-2024
Cohere
25.1
4.2K
128K
¥18 / ¥72Input/Output
271
ministral-8b-2410
Mistral
24.8
1.9K
128K
¥0.72 / ¥0.72Input/Output
272
llama-3.1-8b-instruct
Meta
24.5
19.8K
131K
¥0.79 / ¥0.79Input/Output
273
qwen1.5-110b-chat
Alibaba
24.2
9.5K
-
-
274
qwq-32b-preview
Alibaba
24.0
1.3K
131K
¥2.07 / ¥6.2Input/Output
275
c4ai-aya-expanse-8b
Cohere
23.7
4K
-
-
276
mistral-medium
Mistral
23.4
11.5K
262K
¥2.88 / ¥14.4Input/Output
277
jamba-1.5-mini
-
23.1
3.3K
256K
¥0 / ¥0Input/Output
278
mixtral-8x22b-instruct-v0.1
Mistral
22.8
18.5K
64K
¥14.4 / ¥43.2Input/Output
279
qwen1.5-72b-chat
Alibaba
22.6
13.8K
-
-
280
yi-1.5-34b-chat
-
22.3
9K
-
-
281
internlm2_5-20b-chat
-
22.0
4.1K
-
-
282
granite-3.1-8b-instruct
Ibm
21.7
1.3K
-
-
283
reka-flash-21b-20240226-online
-
21.4
5.6K
-
-
284
llama-3-8b-instruct
Meta
21.2
37.7K
8.19K
¥0.29 / ¥0.29Input/Output
285
command-r
Cohere
20.9
19.1K
128K
¥18 / ¥72Input/Output
286
reka-flash-21b-20240226
-
20.6
9K
-
-
287
gpt-3.5-turbo-0125
Openai
20.3
23.5K
16.4K
¥3.6 / ¥10.8Input/Output
288
zephyr-orpo-141b-A35b-v0.1
-
20.1
1.6K
200K
¥108 / ¥432Input/Output
289
gemma-2-2b-it
Google
19.8
18.2K
128K
¥0 / ¥0Input/Output
290
granite-3.1-2b-instruct
Ibm
19.5
1.3K
-
-
291
qwen1.5-32b-chat
Alibaba
19.2
7.7K
-
-
292
gemini-pro-dev-api
Google
18.9
5.9K
1.05M
¥14.4 / ¥86.4Input/Output
293
phi-3-medium-4k-instruct
Microsoft
18.7
9.4K
4.1K
¥1.22 / ¥4.9Input/Output
294
dbrx-instruct-preview
-
18.4
11.3K
-
-
295
tulu-2-dpo-70b
-
18.1
2K
-
-
296
gemini-pro
Google
17.8
1.9K
1.05M
¥14.4 / ¥86.4Input/Output
297
mixtral-8x7b-instruct-v0.1
Mistral
17.5
25K
32K
¥5.04 / ¥5.04Input/Output
298
qwen1.5-14b-chat
Alibaba
17.3
6.2K
-
-
299
starling-lm-7b-beta
-
17.0
5.8K
200K
¥5.4 / ¥18.7Input/Output
300
wizardlm-70b
Microsoft
16.7
2.7K
-
-
301
gpt-3.5-turbo-1106
Openai
16.4
5.2K
16.4K
¥7.2 / ¥14.4Input/Output
302
yi-34b-chat
-
16.2
5.1K
-
-
303
llama-3.2-3b-instruct
Meta
15.9
3.2K
131K
¥0.22 / ¥0.35Input/Output
304
granite-3.0-8b-instruct
Ibm
15.6
2.6K
-
-
305
phi-3-small-8k-instruct
Microsoft
15.3
6.6K
8.19K
¥1.08 / ¥4.32Input/Output
306
deepseek-llm-67b-chat
Deepseek
15.0
1.5K
1M
¥1.01 / ¥2.02Input/Output
307
openchat-3.5-0106
-
14.8
4.4K
-
-
308
llama-2-70b-chat
Meta
14.5
12.6K
-
-
309
openhermes-2.5-mistral-7b
-
14.2
1.6K
1M
¥36 / ¥180Input/Output
310
openchat-3.5
-
13.9
2.4K
-
-
311
starling-lm-7b-alpha
-
13.6
3.3K
200K
¥5.4 / ¥18.7Input/Output
312
snowflake-arctic-instruct
-
13.4
11.7K
-
-
313
vicuna-33b
-
13.1
7K
-
-
314
llama2-70b-steerlm-chat
Nvidia
12.8
1.1K
-
-
315
mistral-7b-instruct-v0.2
Mistral
12.5
6.7K
262K
¥2.88 / ¥14.4Input/Output
316
qwen1.5-7b-chat
Alibaba
12.3
1.7K
-
-
317
phi-3-mini-4k-instruct-june-2024
Microsoft
12.0
4.4K
4.1K
¥0.94 / ¥3.74Input/Output
318
gemma-1.1-7b-it
Google
11.7
8.9K
-
-
319
nous-hermes-2-mixtral-8x7b-dpo
-
11.4
1.4K
1M
¥36 / ¥180Input/Output
320
granite-3.0-2b-instruct
Ibm
11.1
2.7K
-
-
321
mpt-30b-chat
-
10.9
718
-
-
322
dolphin-2.2.1-mistral-7b
-
10.6
497
262K
¥2.88 / ¥14.4Input/Output
323
phi-3-mini-4k-instruct
Microsoft
10.3
7.6K
4.1K
¥0.94 / ¥3.74Input/Output
324
wizardlm-13b
Microsoft
10.0
2K
-
-
325
falcon-180b-chat
-
9.7
389
-
-
326
solar-10.7b-instruct-v1.0
-
9.5
1.2K
128K
¥0 / ¥0Input/Output
327
llama-2-13b-chat
Meta
9.2
6.1K
-
-
328
vicuna-13b
-
8.9
5.7K
-
-
329
zephyr-7b-alpha
-
8.6
534
-
-
330
zephyr-7b-beta
-
8.4
3.1K
-
-
331
qwen-14b-chat
Alibaba
8.1
1.5K
32.8K
¥1.04 / ¥3.1Input/Output
332
llama-3.2-1b-instruct
Meta
7.8
3.2K
16.4K
¥0.07 / ¥0.08Input/Output
333
smollm2-1.7b-instruct
-
7.5
859
-
-
334
codellama-34b-instruct
Meta
7.2
2.3K
-
-
335
codellama-70b-instruct
Meta
7.0
358
-
-
336
phi-3-mini-128k-instruct
Microsoft
6.7
7.4K
128K
¥0.94 / ¥3.74Input/Output
337
gemma-7b-it
Google
6.4
2.8K
-
-
338
stripedhyena-nous-7b
-
6.1
1.7K
-
-
339
palm-2
Google
5.8
2.5K
-
-
340
llama-2-7b-chat
Meta
5.6
4.5K
128K
¥4.03 / ¥48Input/Output
341
mistral-7b-instruct
Mistral
5.3
2.8K
262K
¥2.88 / ¥14.4Input/Output
342
vicuna-7b
-
5.0
2K
-
-
343
gemma-1.1-2b-it
Google
4.7
3.9K
-
-
344
guanaco-33b
-
4.5
777
200K
¥14.4 / ¥57.6Input/Output
345
qwen1.5-4b-chat
Alibaba
4.2
2.6K
-
-
346
olmo-7b-instruct
Allenai
3.9
1.9K
-
-
347
gemma-2b-it
Google
3.6
1.5K
-
-
348
chatglm3-6b
-
3.3
1.3K
200K
¥5.4 / ¥18.7Input/Output
349
gpt4all-13b-snoozy
-
3.1
483
1M
¥36 / ¥216Input/Output
350
koala-13b
-
2.8
1.9K
-
-
351
mpt-7b-chat
-
2.5
1.1K
-
-
352
chatglm2-6b
-
2.2
762
200K
¥5.4 / ¥18.7Input/Output
353
chatglm-6b
-
1.9
1.3K
200K
¥5.4 / ¥18.7Input/Output
354
alpaca-13b
-
1.7
1.5K
-
-
355
oasst-pythia-12b
-
1.4
1.7K
-
-
356
RWKV-4-Raven-14B
-
1.1
1.4K
-
-
357
fastchat-t5-3b
-
0.8
1.1K
-
-
358
stablelm-tuned-alpha-7b
-
0.6
814
-
-
359
dolly-v2-12b
-
0.3
899
-
-
360
llama-13b
Meta
0.0
584
-
-
Top model analysis

Why claude-opus-4-6-thinking ranks first

claude-opus-4-6-thinking ranks first with a score of 100.0 on the percent scale, based on 10K samples. Treat it as the first candidate for this leaderboard, then compare price, context length, and availability before committing.

How to choose

Look beyond rank #1

Start with the leaderboard closest to your task. Compare the top models by score and sample size, then check price, context length, open or closed access, and provider availability.
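
The selection steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Row` type, `parse_count`, `shortlist`, and the threshold values are hypothetical names and numbers chosen for the example, and the five rows are copied from the table on this page. Prices are compared as the plain numbers shown in the table.

```python
from dataclasses import dataclass
from typing import List, Optional

# Suffixes used by the page for sample counts and context lengths.
_SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_count(text: str) -> Optional[float]:
    """Convert '10K', '1.05M' or '935' to a number; '-' means not listed."""
    text = text.strip()
    if text in ("", "-"):
        return None
    if text[-1] in _SUFFIXES:
        return float(text[:-1]) * _SUFFIXES[text[-1]]
    return float(text)


@dataclass
class Row:
    rank: int
    model: str
    score: float
    samples: str
    context: str
    price_out: Optional[float]  # output price as shown in the table, None if not listed


# A handful of rows copied from the leaderboard above.
ROWS = [
    Row(1, "claude-opus-4-6-thinking", 100.0, "10K", "1M", 180.0),
    Row(5, "mimo-v2.5-pro", 98.9, "5.1K", "1.05M", 21.6),
    Row(13, "qwen3.5-max-preview", 96.7, "6.3K", "-", None),
    Row(14, "ernie-5.1", 96.4, "4.4K", "119K", 21.6),
    Row(16, "qwen3.7-max-preview", 95.8, "1.3K", "1M", 54.0),
]


def shortlist(rows: List[Row], min_samples: float, min_context: float,
              max_price_out: float) -> List[Row]:
    """Keep rows that meet every threshold, best score first."""
    keep = []
    for row in rows:
        samples = parse_count(row.samples)
        context = parse_count(row.context)
        if samples is None or samples < min_samples:
            continue  # too few votes to trust the position
        if context is None or context < min_context:
            continue  # context too short, or not listed
        if row.price_out is None or row.price_out > max_price_out:
            continue  # over budget, or no price listed
        keep.append(row)
    return sorted(keep, key=lambda r: r.score, reverse=True)


# Hypothetical thresholds: adjust to your own task and budget.
picks = shortlist(ROWS, min_samples=3_000, min_context=200_000, max_price_out=60.0)
print([row.model for row in picks])
```

With these example thresholds only mimo-v2.5-pro survives: the top-ranked model is filtered out on price, and the others on context length, missing data, or sample count. That is the point of looking beyond rank #1.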

FAQ

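Which metrics matter on the Instruction Following leaderboard?

Mainly rank, the percent score, sample size, and source. The score gives a quick comparison of models within the same leaderboard; the sample size indicates how stable the result is.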
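
As an observation about the numbers on this page, not a documented formula: the listed percent scores appear consistent with an evenly spaced rescaling of rank across the 360 models, where rank 1 maps to 100.0 and rank 360 maps to 0.0. The sketch below checks that assumption against a few rank and score pairs copied from the table. The function name `rank_to_score` is hypothetical.

```python
def rank_to_score(rank: int, total: int) -> float:
    """Evenly spaced score: first place gets 100.0, last place gets 0.0."""
    return round(100.0 * (total - rank) / (total - 1), 1)


TOTAL = 360  # number of models listed on this page

# (rank, score) pairs copied from the leaderboard above.
LISTED = {1: 100.0, 2: 99.7, 5: 98.9, 100: 72.4, 360: 0.0}

for rank, shown in LISTED.items():
    print(rank, shown, rank_to_score(rank, TOTAL))

all_match = all(rank_to_score(r, TOTAL) == s for r, s in LISTED.items())
print(all_match)
```

If the assumption holds, the score carries the same information as the rank, so sample size, price, and context length are what actually differentiate neighboring models.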

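Why can't different leaderboards be merged directly into one overall score?

Different leaderboards use different tasks, samples, and evaluation criteria. By default, 模力榜 ranks models only within a single leaderboard, to avoid forcing writing, coding, image, and other capabilities into one number.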

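How should I choose an instruction-following model?

Start with the leaderboard closest to your task, then weigh price, context length, open versus closed source, and provider availability. A high rank does not mean a model suits every budget or deployment method.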

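How often is the leaderboard updated?

The page shows the most recently collected public leaderboard data. It currently prefers the LMArena leaderboard dataset and keeps the original link in the page's source section.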