Chat · Text · Mathematical Occupations Leaderboard

Ranking for Text / Mathematical Occupations, based on public preference data.

Selection guide

Mathematical Occupations model ranking guide

Top 5: claude-opus-4-6, claude-opus-4-6-thinking, gemini-3.5-flash, mimo-v2.5-pro, gpt-5.4-high
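Models: 342
Published: 2026/05/27
Source: Arena public preference evaluation (original leaderboard: Text / Industry Mathematical)
Leaderboard dataset: LMArena latest parquet

Each entry below lists, one value per line: rank, model, provider, score (0 to 100 scale), sample size, context window, and price in ¥ (input / output). A "-" means the value is not available.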
1
claude-opus-4-6
Anthropic
100.0
1.9K
1M
¥36 / ¥180Input/Output
2
claude-opus-4-6-thinking
Anthropic
99.7
1.7K
1M
¥36 / ¥180Input/Output
3
gemini-3.5-flash
Google
99.4
561
1.05M
¥10.8 / ¥64.8Input/Output
4
mimo-v2.5-pro
Xiaomi
99.1
839
1.05M
¥7.2 / ¥21.6Input/Output
5
gpt-5.4-high
Openai
98.8
1.6K
1.05M
¥18 / ¥108Input/Output
6
claude-opus-4-7-thinking
Anthropic
98.5
1.1K
1M
¥36 / ¥180Input/Output
7
claude-opus-4-7
Anthropic
98.2
1.2K
1M
¥36 / ¥180Input/Output
8
gpt-5.5
Openai
97.9
982
1.05M
¥36 / ¥216Input/Output
9
qwen3.5-max-preview
Alibaba
97.7
1.1K
-
-
10
gpt-5.5-high
Openai
97.4
987
1.05M
¥36 / ¥216Input/Output
11
ernie-5.1
Baidu
97.1
759
119K
¥5.4 / ¥21.6Input/Output
12
qwen3.6-max-preview
Alibaba
96.8
308
246K
¥9.5 / ¥56.9Input/Output
13
kimi-k2.6
Moonshot
96.5
838
262K
¥6.84 / ¥28.8Input/Output
14
gemini-3.1-pro-preview
Google
96.2
2.3K
1.05M
¥14.4 / ¥86.4Input/Output
15
claude-sonnet-4-6
Anthropic
95.9
1.5K
1M
¥21.6 / ¥108Input/Output
16
gemini-3-pro
Google
95.6
1.9K
1.05M
¥14.4 / ¥86.4Input/Output
17
kimi-k2.5-thinking
Moonshot
95.3
1.8K
262K
¥4.32 / ¥21.6Input/Output
18
glm-5.1
Zai
95.0
802
200K
¥0 / ¥0Input/Output
19
qwen3.7-max-preview
Alibaba
94.7
227
1M
¥18 / ¥54Input/Output
20
claude-opus-4-5-20251101
Anthropic
94.4
3.1K
200K
¥36 / ¥180Input/Output
21
gemma-4-26b-a4b
Google
94.1
296
262K
¥0.94 / ¥2.88Input/Output
22
gemini-3-flash
Google
93.8
1.4K
1.05M
¥3.6 / ¥21.6Input/Output
23
mimo-v2-pro
Xiaomi
93.5
1.3K
1.05M
¥7.2 / ¥21.6Input/Output
24
gemma-4-31b
Google
93.3
322
262K
¥3.24 / ¥7.2Input/Output
25
claude-sonnet-4-5-20250929-thinking-32k
Anthropic
93.0
3.6K
200K
¥21.6 / ¥108Input/Output
26
qwen3.5-397b-a17b
Alibaba
92.7
1.7K
262K
¥3.1 / ¥18.6Input/Output
27
gpt-5.4
Openai
92.4
1.6K
1.05M
¥18 / ¥108Input/Output
28
deepseek-v4-pro-thinking
Deepseek
92.1
836
1M
¥3.13 / ¥6.26Input/Output
29
muse-spark
Meta
91.8
717
-
-
30
claude-opus-4-5-20251101-thinking-32k
Anthropic
91.5
1.6K
200K
¥108 / ¥540Input/Output
31
grok-4.20-beta-0309-reasoning
Xai
91.2
1.6K
2M
¥14.4 / ¥43.2Input/Output
32
deepseek-v4-pro
Deepseek
90.9
973
1M
¥3.13 / ¥6.26Input/Output
33
glm-5
Zai
90.6
1.2K
205K
¥7.2 / ¥23Input/Output
34
mimo-v2.5
Xiaomi
90.3
914
1.05M
¥2.88 / ¥14.4Input/Output
35
gemini-2.5-pro
Google
90.0
6.5K
1.05M
¥9 / ¥72Input/Output
36
qwen3.6-plus
Alibaba
89.7
1K
1M
¥3.6 / ¥21.6Input/Output
37
grok-4.20-multi-agent-beta-0309
Xai
89.4
1.6K
2M
¥14.4 / ¥43.2Input/Output
38
glm-4.6
Zai
89.1
1.7K
205K
¥4.32 / ¥15.8Input/Output
39
qwen3-max-preview
Alibaba
88.9
1.3K
262K
¥6.2 / ¥24.8Input/Output
40
gpt-5.1-high
Openai
88.6
1.7K
400K
¥9 / ¥72Input/Output
41
claude-sonnet-4-5-20250929
Anthropic
88.3
3.7K
200K
¥21.6 / ¥108Input/Output
42
deepseek-v3.2
Deepseek
88.0
2.2K
128K
¥2.09 / ¥3.1Input/Output
43
deepseek-v4-flash
Deepseek
87.7
957
1M
¥1.01 / ¥2.02Input/Output
44
gpt-5.2-high
Openai
87.4
2.3K
400K
¥12.6 / ¥101Input/Output
45
hunyuan-hy3-preview
Tencent
87.1
360
256K
¥0 / ¥0Input/Output
46
longcat-flash-chat
Meituan
86.8
545
128K
¥1.08 / ¥10.8Input/Output
47
kimi-k2.5-instant
Moonshot
86.5
333
262K
¥4.32 / ¥21.6Input/Output
48
kimi-k2-thinking-turbo
Moonshot
86.2
2.9K
262K
¥17.3 / ¥72Input/Output
49
qwen3-235b-a22b-instruct-2507
Alibaba
85.9
4.8K
128K
¥2.09 / ¥8.23Input/Output
50
qwen3.5-27b
Alibaba
85.6
1.4K
262K
¥2.16 / ¥17.3Input/Output
51
gemini-3-flash (thinking-minimal)
Google
85.3
2.6K
1.05M
¥3.6 / ¥21.6Input/Output
52
qwen3-vl-235b-a22b-thinking
Alibaba
85.0
379
131K
¥2.06 / ¥8.26Input/Output
53
qwen3-235b-a22b-thinking-2507
Alibaba
84.8
492
131K
¥2.07 / ¥8.26Input/Output
54
gpt-5.4-mini-high
Openai
84.5
1.4K
400K
¥5.4 / ¥32.4Input/Output
55
deepseek-v3.1-thinking
Deepseek
84.2
565
128K
¥1.44 / ¥5.04Input/Output
56
longcat-flash-chat-2602-exp
Meituan
83.9
1.3K
128K
¥1.08 / ¥10.8Input/Output
57
grok-4.20-beta1
Xai
83.6
1.3K
2M
¥14.4 / ¥43.2Input/Output
58
qwen3-next-80b-a3b-instruct
Alibaba
83.3
1.1K
131K
¥1.04 / ¥4.13Input/Output
59
deepseek-v3.2-exp-thinking
Deepseek
83.0
393
128K
¥0 / ¥0Input/Output
60
grok-4-fast-chat
Xai
82.7
320
2M
¥1.44 / ¥3.6Input/Output
61
deepseek-v3.2-thinking
Deepseek
82.4
1.7K
128K
¥2.09 / ¥3.1Input/Output
62
gpt-5.2-chat-latest-20260210
Openai
82.1
1.7K
400K
¥12.6 / ¥101Input/Output
63
dola-seed-2.0-pro
Bytedance
81.8
2K
-
-
64
minimax-m2.7
Minimax
81.5
1.2K
205K
¥0 / ¥0Input/Output
65
claude-opus-4-1-20250805-thinking-16k
Anthropic
81.2
2.4K
200K
¥108 / ¥540Input/Output
66
qwen3.5-122b-a10b
Alibaba
80.9
1.4K
262K
¥2.88 / ¥23Input/Output
67
deepseek-v4-flash-thinking
Deepseek
80.6
886
1M
¥1.01 / ¥2.02Input/Output
68
ernie-5.0-0110
Baidu
80.4
1.6K
128K
¥7.92 / ¥14.4Input/Output
69
ernie-5.0-preview-1022
Baidu
80.1
267
128K
¥7.92 / ¥14.4Input/Output
70
amazon-nova-experimental-chat-11-10
Amazon
79.8
1.1K
-
-
71
glm-4.5
Zai
79.5
1.3K
131K
¥4.32 / ¥15.8Input/Output
72
claude-opus-4-1-20250805
Anthropic
79.2
3.8K
200K
¥108 / ¥540Input/Output
73
grok-4-0709
Xai
78.9
2.2K
256K
¥21.6 / ¥108Input/Output
74
mistral-large-3
Mistral
78.6
1.9K
262K
¥3.6 / ¥10.8Input/Output
75
gpt-5.4-nano-high
Openai
78.3
1.4K
400K
¥1.44 / ¥9Input/Output
76
o3-2025-04-16
Openai
78.0
3.5K
200K
¥14.4 / ¥57.6Input/Output
77
gemini-2.5-flash
Google
77.7
6.7K
1.05M
¥2.16 / ¥18Input/Output
78
gemini-2.5-flash-preview-09-2025
Google
77.4
1.6K
1M
¥2.16 / ¥18Input/Output
79
gpt-5.1
Openai
77.1
2K
400K
¥9 / ¥72Input/Output
80
qwen3-vl-235b-a22b-instruct
Alibaba
76.8
591
128K
¥2.16 / ¥8.64Input/Output
81
minimax-m2.1-preview
Minimax
76.5
652
205K
¥0 / ¥0Input/Output
82
gemini-3.1-flash-lite-preview
Google
76.2
1.9K
1.05M
¥1.8 / ¥10.8Input/Output
83
qwen3-32b
Alibaba
76.0
282
131K
¥2.07 / ¥8.26Input/Output
84
claude-haiku-4-5-20251001
Anthropic
75.7
3.8K
200K
¥7.2 / ¥36Input/Output
85
deepseek-r1-0528
Deepseek
75.4
949
164K
¥3.6 / ¥15.5Input/Output
86
grok-4.3
Xai
75.1
813
1M
¥9 / ¥18Input/Output
87
gpt-5.2
Openai
74.8
2.3K
400K
¥12.6 / ¥101Input/Output
88
gpt-5.5-instant
Openai
74.5
1.5K
400K
¥9 / ¥72Input/Output
89
step-3.5-flash
Stepfun
74.2
1.7K
256K
¥0.69 / ¥2.07Input/Output
90
mistral-medium-2508
Mistral
73.9
4.7K
262K
¥2.88 / ¥14.4Input/Output
91
deepseek-v3.1
Deepseek
73.6
839
128K
¥1.44 / ¥5.04Input/Output
92
mimo-v2-flash (non-thinking)
Xiaomi
73.3
2.1K
262K
¥0.72 / ¥2.16Input/Output
93
deepseek-v3.2-exp
Deepseek
73.0
634
128K
¥0 / ¥0Input/Output
94
amazon-nova-experimental-chat-10-20
Amazon
72.7
594
-
-
95
grok-4.1-thinking
Xai
72.4
2.9K
200K
¥14.4 / ¥72Input/Output
96
qwen3-max-2025-09-23
Alibaba
72.1
478
258K
¥6.19 / ¥24.7Input/Output
97
qwen3.5-flash
Alibaba
71.8
1.5K
1M
¥1.24 / ¥12.4Input/Output
98
glm-4.7
Zai
71.6
541
205K
¥0 / ¥0Input/Output
99
gpt-4.5-preview-2025-02-27
Openai
71.3
1.3K
8.19K
¥216 / ¥432Input/Output
100
qwen3-30b-a3b-instruct-2507
Alibaba
71.0
1.2K
262K
¥2.16 / ¥3.6Input/Output
101
ernie-5.0-preview-1203
Baidu
70.7
477
128K
¥7.92 / ¥14.4Input/Output
102
grok-4.1
Xai
70.4
3.2K
200K
¥14.4 / ¥72Input/Output
103
qwen3.5-35b-a3b
Alibaba
70.1
1.4K
262K
¥1.8 / ¥14.4Input/Output
104
chatgpt-4o-latest-20250326
Openai
69.8
4.8K
128K
¥18 / ¥72Input/Output
105
grok-3-preview-02-24
Xai
69.5
2.4K
1M
¥9 / ¥18Input/Output
106
gpt-5-high
Openai
69.2
1.7K
400K
¥9 / ¥72Input/Output
107
grok-4-1-fast-reasoning
Xai
68.9
2.6K
2M
¥1.44 / ¥3.6Input/Output
108
qwen3-235b-a22b-no-thinking
Alibaba
68.6
2.3K
131K
¥2.07 / ¥8.26Input/Output
109
nvidia-nemotron-3-super-120b-a12b
Nvidia
68.3
414
262K
¥1.44 / ¥5.76Input/Output
110
nvidia-llama-3.3-nemotron-super-49b-v1.5
Nvidia
68.0
183
131K
¥2.88 / ¥2.88Input/Output
111
grok-4-fast-reasoning
Xai
67.7
941
2M
¥1.44 / ¥3.6Input/Output
112
o3-mini-high
Openai
67.4
1.7K
200K
¥7.92 / ¥31.7Input/Output
113
kimi-k2-0905-preview
Moonshot
67.2
641
262K
¥4.32 / ¥18Input/Output
114
gpt-5-chat
Openai
66.9
1.6K
400K
¥9 / ¥72Input/Output
115
qwen3-235b-a22b
Alibaba
66.6
1.5K
131K
¥2.07 / ¥8.26Input/Output
116
glm-4.5-air
Zai
66.3
1.5K
131K
¥0 / ¥0Input/Output
117
claude-opus-4-20250514-thinking-16k
Anthropic
66.0
2K
200K
¥108 / ¥540Input/Output
118
o1-2024-12-17
Openai
65.7
2.6K
128K
¥108 / ¥432Input/Output
119
mimo-v2-flash (thinking)
Xiaomi
65.4
426
262K
¥0.72 / ¥2.16Input/Output
120
deepseek-r1
Deepseek
65.1
1.5K
164K
¥5.04 / ¥18Input/Output
121
hunyuan-t1-20250711
Tencent
64.8
225
131K
¥0 / ¥0Input/Output
122
qwen3-next-80b-a3b-thinking
Alibaba
64.5
671
131K
¥1.04 / ¥10.3Input/Output
123
gpt-oss-120b
Openai
64.2
1.6K
131K
¥1.08 / ¥4.32Input/Output
124
gpt-5.3-chat-latest
Openai
63.9
1.7K
128K
¥12.6 / ¥101Input/Output
125
gpt-5-mini-high
Openai
63.6
1.3K
400K
¥1.8 / ¥14.4Input/Output
126
minimax-m2.5
Minimax
63.3
1.9K
205K
¥0 / ¥0Input/Output
127
o1-preview
Openai
63.0
3.9K
128K
¥108 / ¥432Input/Output
128
claude-sonnet-4-20250514-thinking-32k
Anthropic
62.8
1.9K
200K
¥21.6 / ¥108Input/Output
129
o3-mini
Openai
62.5
4.4K
200K
¥7.92 / ¥31.7Input/Output
130
intellect-3
-
62.2
233
131K
¥1.44 / ¥7.92Input/Output
131
nvidia-nemotron-3-nano-30b-a3b-bf16
Nvidia
61.9
661
131K
¥0 / ¥0Input/Output
132
claude-opus-4-20250514
Anthropic
61.6
2.6K
200K
¥108 / ¥540Input/Output
133
o4-mini-2025-04-16
Openai
61.3
2.8K
200K
¥7.92 / ¥31.7Input/Output
134
glm-4.7-flash
Zai
61.0
481
200K
¥0 / ¥0Input/Output
135
step-3
Stepfun
60.7
306
65.5K
¥1.8 / ¥4.68Input/Output
136
gemini-2.5-flash-lite-preview-09-2025-no-thinking
Google
60.4
2.1K
1.05M
¥0.72 / ¥2.88Input/Output
137
ling-flash-2.0
Ant Group
60.1
371
131K
¥1.01 / ¥4.1Input/Output
138
nova-2-lite
Amazon
59.8
583
128K
¥2.38 / ¥19.8Input/Output
139
minimax-m2
Minimax
59.5
285
197K
¥0 / ¥0Input/Output
140
qwen2.5-max
Alibaba
59.2
2.9K
32K
¥11.5 / ¥46Input/Output
141
gemini-2.5-flash-lite-preview-06-17-thinking
Google
58.9
2K
65.5K
¥0.72 / ¥2.88Input/Output
142
o1-mini
Openai
58.7
6.3K
128K
¥7.92 / ¥31.7Input/Output
143
qwen3-coder-480b-a35b-instruct
Alibaba
58.4
1.5K
262K
¥6.2 / ¥24.8Input/Output
144
grok-3-mini-high
Xai
58.1
987
128K
¥0 / ¥0Input/Output
145
deepseek-v3-0324
Deepseek
57.8
2.9K
75K
¥1.44 / ¥5.76Input/Output
146
minimax-m1
Minimax
57.5
1.8K
1M
¥0.95 / ¥9.03Input/Output
147
qwen3-30b-a3b
Alibaba
57.2
1.6K
128K
¥0.79 / ¥7.78Input/Output
148
trinity-large-thinking
-
56.9
1.3K
262K
¥1.8 / ¥6.48Input/Output
149
hunyuan-turbos-20250416
Tencent
56.6
795
131K
¥0 / ¥0Input/Output
150
ring-flash-2.0
Ant Group
56.3
360
131K
¥1.01 / ¥4.1Input/Output
151
gpt-4.1-2025-04-14
Openai
56.0
3.1K
1.05M
¥14.4 / ¥57.6Input/Output
152
grok-3-mini-beta
Xai
55.7
1.5K
1M
¥9 / ¥18Input/Output
153
kimi-k2-0711-preview
Moonshot
55.4
1.6K
131K
¥4.32 / ¥18Input/Output
154
gemini-2.0-flash-001
Google
55.1
3.8K
1.05M
¥1.08 / ¥4.32Input/Output
155
claude-sonnet-4-20250514
Anthropic
54.8
2.3K
200K
¥21.6 / ¥108Input/Output
156
qwq-32b
Alibaba
54.5
1.6K
131K
¥2.07 / ¥6.2Input/Output
157
mistral-small-2506
Mistral
54.3
993
262K
¥2.88 / ¥14.4Input/Output
158
mistral-medium-2505
Mistral
54.0
2.2K
262K
¥2.88 / ¥14.4Input/Output
159
trinity-large-preview
-
53.7
1.5K
262K
¥1.8 / ¥6.48Input/Output
160
olmo-3.1-32b-instruct
Allenai
53.4
428
200K
¥14.4 / ¥57.6Input/Output
161
gpt-4.1-mini-2025-04-14
Openai
53.1
2.5K
1.05M
¥2.88 / ¥11.5Input/Output
162
llama-3.3-nemotron-49b-super-v1
Nvidia
52.8
207
131K
¥0 / ¥0Input/Output
163
step-1o-turbo-202506
Stepfun
52.5
569
-
-
164
claude-3-7-sonnet-20250219-thinking-32k
Anthropic
52.2
2.6K
-
-
165
gemma-3-12b-it
Google
51.9
333
128K
¥1.96 / ¥1.96Input/Output
166
granite-4.1-8b
Ibm
51.6
187
131K
¥0.36 / ¥0.72Input/Output
167
qwen-plus-0125
Alibaba
51.3
627
1M
¥0.83 / ¥2.07Input/Output
168
olmo-3-32b-think
Allenai
51.0
202
128K
¥2.16 / ¥3.24Input/Output
169
glm-4.5v
Zai
50.7
234
64K
¥4.32 / ¥13Input/Output
170
gemma-3-27b-it
Google
50.4
3.2K
128K
¥2.15 / ¥2.15Input/Output
171
gpt-5-nano-high
Openai
50.1
439
400K
¥0.36 / ¥2.88Input/Output
172
gemini-1.5-pro-002
Google
49.9
6.5K
-
-
173
gemini-2.0-flash-lite-preview-02-05
Google
49.6
2.4K
1.05M
¥0.54 / ¥2.16Input/Output
174
claude-3-7-sonnet-20250219
Anthropic
49.3
3.2K
200K
¥21.6 / ¥108Input/Output
175
deepseek-v3
Deepseek
49.0
2.3K
128K
¥0 / ¥0Input/Output
176
command-a-03-2025
Cohere
48.7
3.6K
256K
¥18 / ¥72Input/Output
177
step-2-16k-exp-202412
Stepfun
48.4
543
16.4K
¥37.5 / ¥118Input/Output
178
claude-3-5-sonnet-20241022
Anthropic
48.1
8.6K
200K
¥21.6 / ¥108Input/Output
179
yi-lightning
-
47.8
3.2K
12K
¥1.44 / ¥1.44Input/Output
180
athene-v2-chat
-
47.5
2.9K
-
-
181
gpt-oss-20b
Openai
47.2
583
131K
¥0.32 / ¥1.3Input/Output
182
qwen2.5-plus-1127
Alibaba
46.9
1.3K
-
-
183
hunyuan-large-2025-02-10
Tencent
46.6
431
-
-
184
olmo-3.1-32b-think
Allenai
46.3
340
200K
¥14.4 / ¥57.6Input/Output
185
hunyuan-turbo-0110
Tencent
46.0
216
-
-
186
hunyuan-large-vision
Tencent
45.7
364
-
-
187
llama-4-maverick-17b-128e-instruct
Meta
45.5
2.6K
1M
¥1.8 / ¥6.26Input/Output
188
claude-3-5-sonnet-20240620
Anthropic
45.2
10K
200K
¥21.6 / ¥108Input/Output
189
glm-4-plus-0111
Zai
44.9
638
128K
¥72 / ¥72Input/Output
190
qwen2.5-72b-instruct
Alibaba
44.6
4.6K
131K
¥4.13 / ¥12.4Input/Output
191
hunyuan-turbos-20250226
Tencent
44.3
224
131K
¥0 / ¥0Input/Output
192
hunyuan-standard-2025-02-10
Tencent
44.0
441
-
-
193
llama-3.1-nemotron-70b-instruct
Nvidia
43.7
874
128K
¥0 / ¥0Input/Output
194
llama-4-scout-17b-16e-instruct
Meta
43.4
1.8K
128K
¥1.44 / ¥5.62Input/Output
195
gpt-4o-2024-05-13
Openai
43.1
13.3K
128K
¥36 / ¥108Input/Output
196
grok-2-2024-08-13
Xai
42.8
7.6K
1M
¥9 / ¥18Input/Output
197
gpt-4o-2024-08-06
Openai
42.5
5.8K
128K
¥18 / ¥72Input/Output
198
glm-4-plus
Zai
42.2
3K
128K
¥54 / ¥54Input/Output
199
qwen-max-0919
Alibaba
41.9
1.8K
131K
¥2.48 / ¥9.91Input/Output
200
gemini-1.5-flash-002
Google
41.6
4.1K
2M
¥0.54 / ¥2.2Input/Output
201
deepseek-v2.5-1210
Deepseek
41.3
893
1M
¥1.01 / ¥2.02Input/Output
202
gemini-1.5-pro-001
Google
41.1
9.3K
-
-
203
llama-3.1-405b-instruct-fp8
Meta
40.8
7.2K
128K
¥0 / ¥0Input/Output
204
gemini-advanced-0514
Google
40.5
5.8K
-
-
205
llama-3.1-405b-instruct-bf16
Meta
40.2
4.4K
128K
¥0 / ¥0Input/Output
206
ibm-granite-h-small
Ibm
39.9
279
-
-
207
gpt-4.1-nano-2025-04-14
Openai
39.6
534
1.05M
¥14.4 / ¥57.6Input/Output
208
deepseek-v2.5
Deepseek
39.3
3K
1M
¥1.01 / ¥2.02Input/Output
209
gemma-3n-e4b-it
Google
39.0
1.4K
128K
¥0 / ¥0Input/Output
210
grok-2-mini-2024-08-13
Xai
38.7
6.3K
1M
¥9 / ¥18Input/Output
211
mistral-small-3.1-24b-instruct-2503
Mistral
38.4
1.9K
262K
¥2.88 / ¥14.4Input/Output
212
gpt-4o-mini-2024-07-18
Openai
38.1
8K
128K
¥1.08 / ¥4.32Input/Output
213
gpt-4-1106-preview
Openai
37.8
11.5K
8.19K
¥216 / ¥432Input/Output
214
gpt-4-turbo-2024-04-09
Openai
37.5
11.8K
128K
¥72 / ¥216Input/Output
215
llama-3.3-70b-instruct
Meta
37.2
5.2K
128K
¥0 / ¥0Input/Output
216
mistral-large-2407
Mistral
37.0
5.7K
131K
¥14.4 / ¥43.2Input/Output
217
claude-3-opus-20240229
Anthropic
36.7
23K
200K
¥108 / ¥540Input/Output
218
amazon-nova-pro-v1.0
Amazon
36.4
2.7K
300K
¥5.76 / ¥23Input/Output
219
gpt-4-0125-preview
Openai
36.1
10.8K
8.19K
¥216 / ¥432Input/Output
220
mistral-large-2411
Mistral
35.8
3K
128K
¥14.4 / ¥43.2Input/Output
221
gemma-3-4b-it
Google
35.5
365
128K
¥1.44 / ¥1.44Input/Output
222
magistral-medium-2506
Mistral
35.2
582
128K
¥14.4 / ¥36Input/Output
223
phi-4
Microsoft
34.9
2.4K
128K
¥0.9 / ¥3.6Input/Output
224
llama-3.1-70b-instruct
Meta
34.6
6.5K
131K
¥2.88 / ¥2.88Input/Output
225
claude-3-5-haiku-20241022
Anthropic
34.3
5.8K
200K
¥5.76 / ¥28.8Input/Output
226
llama-3.1-tulu-3-70b
Allenai
34.0
349
-
-
227
hunyuan-standard-256k
Tencent
33.7
292
-
-
228
qwen2.5-coder-32b-instruct
Alibaba
33.4
619
131K
¥2.07 / ¥6.2Input/Output
229
mistral-small-24b-instruct-2501
Mistral
33.1
1.4K
262K
¥2.88 / ¥14.4Input/Output
230
reka-core-20240904
-
32.8
1K
-
-
231
gemini-1.5-flash-001
Google
32.6
7.3K
2M
¥0.54 / ¥2.2Input/Output
232
amazon-nova-lite-v1.0
Amazon
32.3
2.2K
300K
¥0.43 / ¥1.73Input/Output
233
athene-70b-0725
-
32.0
2.6K
-
-
234
deepseek-coder-v2
Deepseek
31.7
1.7K
1M
¥1.01 / ¥2.02Input/Output
235
qwen2-72b-instruct
Alibaba
31.4
4.2K
131K
¥4.13 / ¥12.4Input/Output
236
glm-4-0520
Zai
31.1
1.1K
128K
¥108 / ¥108Input/Output
237
llama-3.1-nemotron-51b-instruct
Nvidia
30.8
422
128K
¥0 / ¥0Input/Output
238
gpt-4-0314
Openai
30.5
6.1K
8.19K
¥216 / ¥432Input/Output
239
jamba-1.5-large
-
30.2
962
256K
¥0 / ¥0Input/Output
240
gemini-1.5-flash-8b-001
Google
29.9
4.2K
2M
¥0.54 / ¥2.2Input/Output
241
qwq-32b-preview
Alibaba
29.6
422
131K
¥2.07 / ¥6.2Input/Output
242
gemma-2-27b-it
Google
29.3
8.9K
8.19K
¥0.58 / ¥0.58Input/Output
243
nemotron-4-340b-instruct
Nvidia
29.0
2.1K
-
-
244
llama-3-70b-instruct
Meta
28.7
18.7K
8.19K
¥3.67 / ¥5.33Input/Output
245
claude-3-sonnet-20240229
Anthropic
28.4
12.4K
200K
¥21.6 / ¥108Input/Output
246
amazon-nova-micro-v1.0
Amazon
28.2
2K
128K
¥0.25 / ¥1.01Input/Output
247
c4ai-aya-expanse-32b
Cohere
27.9
3.2K
-
-
248
reka-flash-20240904
-
27.6
1.1K
65.5K
¥0.72 / ¥1.44Input/Output
249
gpt-4-0613
Openai
27.3
9.9K
8.19K
¥216 / ¥432Input/Output
250
command-r-plus-08-2024
Cohere
27.0
1.2K
128K
¥18 / ¥72Input/Output
251
gemma-2-9b-it-simpo
-
26.7
1.1K
8.19K
¥1.44 / ¥1.44Input/Output
252
olmo-2-0325-32b-instruct
Allenai
26.4
336
-
-
253
gemma-2-9b-it
Google
26.1
6.3K
8.19K
¥1.44 / ¥1.44Input/Output
254
llama-3.1-tulu-3-8b
Allenai
25.8
304
-
-
255
qwen1.5-110b-chat
Alibaba
25.5
2.9K
-
-
256
mistral-large-2402
Mistral
25.2
7K
262K
¥2.88 / ¥14.4Input/Output
257
claude-3-haiku-20240307
Anthropic
24.9
13.5K
200K
¥1.8 / ¥9Input/Output
258
granite-3.1-2b-instruct
Ibm
24.6
341
-
-
259
yi-1.5-34b-chat
-
24.3
2.6K
-
-
260
ministral-8b-2410
Mistral
24.0
543
128K
¥0.72 / ¥0.72Input/Output
261
internlm2_5-20b-chat
-
23.8
1.1K
-
-
262
granite-3.1-8b-instruct
Ibm
23.5
320
-
-
263
llama-3.1-8b-instruct
Meta
23.2
6K
131K
¥0.79 / ¥0.79Input/Output
264
mixtral-8x22b-instruct-v0.1
Mistral
22.9
6K
64K
¥14.4 / ¥43.2Input/Output
265
mistral-medium
Mistral
22.6
3.8K
262K
¥2.88 / ¥14.4Input/Output
266
c4ai-aya-expanse-8b
Cohere
22.3
1.1K
-
-
267
phi-3-medium-4k-instruct
Microsoft
22.0
2.7K
4.1K
¥1.22 / ¥4.9Input/Output
268
qwen1.5-72b-chat
Alibaba
21.7
4.5K
-
-
269
reka-flash-21b-20240226-online
-
21.4
1.9K
-
-
270
command-r-08-2024
Cohere
21.1
1.4K
128K
¥18 / ¥72Input/Output
271
command-r-plus
Cohere
20.8
8.8K
128K
¥18 / ¥72Input/Output
272
qwen1.5-32b-chat
Alibaba
20.5
2.4K
-
-
273
jamba-1.5-mini
-
20.2
932
256K
¥0 / ¥0Input/Output
274
reka-flash-21b-20240226
-
19.9
3K
-
-
275
llama-3-8b-instruct
Meta
19.6
12.8K
8.19K
¥0.29 / ¥0.29Input/Output
276
phi-3-small-8k-instruct
Microsoft
19.4
1.8K
8.19K
¥1.08 / ¥4.32Input/Output
277
granite-3.0-8b-instruct
Ibm
19.1
675
-
-
278
phi-3-mini-4k-instruct-june-2024
Microsoft
18.8
1.4K
4.1K
¥0.94 / ¥3.74Input/Output
279
gemma-2-2b-it
Google
18.5
5.6K
128K
¥0 / ¥0Input/Output
280
zephyr-orpo-141b-A35b-v0.1
-
18.2
526
200K
¥108 / ¥432Input/Output
281
mixtral-8x7b-instruct-v0.1
Mistral
17.9
8.5K
32K
¥5.04 / ¥5.04Input/Output
282
gemini-pro
Google
17.6
844
1.05M
¥14.4 / ¥86.4Input/Output
283
qwen1.5-14b-chat
Alibaba
17.3
2K
-
-
284
dbrx-instruct-preview
-
17.0
3.6K
-
-
285
granite-3.0-2b-instruct
Ibm
16.7
751
-
-
286
starling-lm-7b-beta
-
16.4
1.8K
200K
¥5.4 / ¥18.7Input/Output
287
llama-3.2-3b-instruct
Meta
16.1
980
131K
¥0.22 / ¥0.35Input/Output
288
gpt-3.5-turbo-1106
Openai
15.8
1.7K
16.4K
¥7.2 / ¥14.4Input/Output
289
gpt-3.5-turbo-0125
Openai
15.5
7.5K
16.4K
¥3.6 / ¥10.8Input/Output
290
command-r
Cohere
15.2
5.9K
128K
¥18 / ¥72Input/Output
291
gemini-pro-dev-api
Google
15.0
2K
1.05M
¥14.4 / ¥86.4Input/Output
292
smollm2-1.7b-instruct
-
14.7
240
-
-
293
yi-34b-chat
-
14.4
1.7K
-
-
294
phi-3-mini-4k-instruct
Microsoft
14.1
2.2K
4.1K
¥0.94 / ¥3.74Input/Output
295
qwen1.5-7b-chat
Alibaba
13.8
578
-
-
296
wizardlm-70b
Microsoft
13.5
764
-
-
297
tulu-2-dpo-70b
-
13.2
751
-
-
298
llama-3.2-1b-instruct
Meta
12.9
976
16.4K
¥0.07 / ¥0.08Input/Output
299
gemma-1.1-7b-it
Google
12.6
2.7K
-
-
300
snowflake-arctic-instruct
-
12.3
4.2K
-
-
301
openhermes-2.5-mistral-7b
-
12.0
568
1M
¥36 / ¥180Input/Output
302
openchat-3.5-0106
-
11.7
1.5K
-
-
303
deepseek-llm-67b-chat
Deepseek
11.4
484
1M
¥1.01 / ¥2.02Input/Output
304
llama-2-70b-chat
Meta
11.1
4.1K
-
-
305
mistral-7b-instruct-v0.2
Mistral
10.9
2.2K
262K
¥2.88 / ¥14.4Input/Output
306
phi-3-mini-128k-instruct
Microsoft
10.6
2.6K
128K
¥0.94 / ¥3.74Input/Output
307
starling-lm-7b-alpha
-
10.3
1.1K
200K
¥5.4 / ¥18.7Input/Output
308
vicuna-33b
-
10.0
2.4K
-
-
309
qwen-14b-chat
Alibaba
9.7
467
32.8K
¥1.04 / ¥3.1Input/Output
310
openchat-3.5
-
9.4
826
-
-
311
llama-2-13b-chat
Meta
9.1
2K
-
-
312
llama2-70b-steerlm-chat
Nvidia
8.8
353
-
-
313
gemma-7b-it
Google
8.5
982
-
-
314
solar-10.7b-instruct-v1.0
-
8.2
529
128K
¥0 / ¥0Input/Output
315
dolphin-2.2.1-mistral-7b
-
7.9
201
262K
¥2.88 / ¥14.4Input/Output
316
codellama-34b-instruct
Meta
7.6
667
-
-
317
mpt-30b-chat
-
7.3
218
-
-
318
nous-hermes-2-mixtral-8x7b-dpo
-
7.0
464
1M
¥36 / ¥180Input/Output
319
zephyr-7b-beta
-
6.7
1.1K
-
-
320
palm-2
Google
6.5
801
-
-
321
gemma-1.1-2b-it
Google
6.2
1.2K
-
-
322
llama-2-7b-chat
Meta
5.9
1.4K
128K
¥4.03 / ¥48Input/Output
323
vicuna-13b
-
5.6
1.9K
-
-
324
stripedhyena-nous-7b
-
5.3
530
-
-
325
guanaco-33b
-
5.0
236
200K
¥14.4 / ¥57.6Input/Output
326
olmo-7b-instruct
Allenai
4.7
726
-
-
327
wizardlm-13b
Microsoft
4.4
567
-
-
328
mistral-7b-instruct
Mistral
4.1
814
262K
¥2.88 / ¥14.4Input/Output
329
gemma-2b-it
Google
3.8
525
-
-
330
qwen1.5-4b-chat
Alibaba
3.5
820
-
-
331
vicuna-7b
-
3.2
590
-
-
332
chatglm3-6b
-
2.9
466
200K
¥5.4 / ¥18.7Input/Output
333
koala-13b
-
2.6
692
-
-
334
RWKV-4-Raven-14B
-
2.3
488
-
-
335
chatglm-6b
-
2.1
466
200K
¥5.4 / ¥18.7Input/Output
336
mpt-7b-chat
-
1.8
397
-
-
337
oasst-pythia-12b
-
1.5
629
-
-
338
alpaca-13b
-
1.2
565
-
-
339
fastchat-t5-3b
-
0.9
400
-
-
340
dolly-v2-12b
-
0.6
343
-
-
341
stablelm-tuned-alpha-7b
-
0.3
294
-
-
342
llama-13b
Meta
0.0
228
-
-
Top model analysis

Why claude-opus-4-6 ranks first

claude-opus-4-6 ranks first, with a score of 100.0 based on 1.9K samples. Treat it as the first candidate on this leaderboard, then compare price, context length and availability.
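The score column appears to be a normalized rank rather than a measure of preference margin: every value spot-checked below matches 100 × (N - rank) / (N - 1) with N = 342, rounded to one decimal, so neighbouring models always sit about 0.29 points apart. This formula is inferred from the published numbers, not documented by the page. A minimal Python sketch to reproduce it, where `rank_to_score` is a hypothetical helper name:

```python
def rank_to_score(rank: int, total: int) -> float:
    """Hypothetical helper: map a 1-based rank onto a 0-100 scale.

    Assumes (inferred from the published table, not documented) that rank 1
    gets 100.0, the last rank gets 0.0, and the ranks in between are evenly spaced.
    """
    if total < 2:
        raise ValueError("need at least two ranked models")
    if not 1 <= rank <= total:
        raise ValueError(f"rank must be in 1..{total}")
    return 100.0 * (total - rank) / (total - 1)


TOTAL_MODELS = 342  # the page lists 342 models

# (rank, score as published on this page)
PUBLISHED = [(1, 100.0), (2, 99.7), (9, 97.7), (100, 71.0), (172, 49.9), (342, 0.0)]

for rank, shown in PUBLISHED:
    computed = round(rank_to_score(rank, TOTAL_MODELS), 1)
    print(f"rank {rank:>3}: published {shown:5.1f}, formula {computed:5.1f}")

# Gap between any two adjacent ranks, however close their votes actually were.
step = rank_to_score(1, TOTAL_MODELS) - rank_to_score(2, TOTAL_MODELS)
print(f"score step per rank: {step:.3f}")
```

If this reading is right, a 0.3-point difference only tells you two models are adjacent in the ordering; the sample size and the underlying LMArena data say more about how far apart they really are.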

How to choose

Don't look only at rank #1

Start with the leaderboard closest to your task. Compare the top models by score and sample size, then check price, context length, open or closed access, and provider availability.
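The selection steps above can be sketched as a filter-then-sort over the table. A minimal Python sketch, assuming a hand-copied subset of rows from this page; the thresholds (`max_output_price`, `min_context`, `min_samples`) are illustrative, the ¥ prices are taken as shown (the page does not state the billing unit), and rounded display values such as "1.9K" or "1.05M" are entered approximately:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    rank: int
    model: str
    provider: str
    score: float
    samples: int                   # displayed rounded, e.g. "1.9K" -> 1900
    context_tokens: Optional[int]  # None where the page shows "-"
    price_in: Optional[float]      # ¥, input
    price_out: Optional[float]     # ¥, output


# A few rows copied from this leaderboard.
ROWS = [
    Entry(1, "claude-opus-4-6", "Anthropic", 100.0, 1900, 1_000_000, 36.0, 180.0),
    Entry(3, "gemini-3.5-flash", "Google", 99.4, 561, 1_050_000, 10.8, 64.8),
    Entry(4, "mimo-v2.5-pro", "Xiaomi", 99.1, 839, 1_050_000, 7.2, 21.6),
    Entry(9, "qwen3.5-max-preview", "Alibaba", 97.7, 1100, None, None, None),
    Entry(11, "ernie-5.1", "Baidu", 97.1, 759, 119_000, 5.4, 21.6),
    Entry(13, "kimi-k2.6", "Moonshot", 96.5, 838, 262_000, 6.84, 28.8),
    Entry(28, "deepseek-v4-pro-thinking", "Deepseek", 92.1, 836, 1_000_000, 3.13, 6.26),
]


def shortlist(rows: List[Entry], max_output_price: float, min_context: int,
              min_samples: int) -> List[Entry]:
    """Keep rows within the budget, context and sample-size limits, best rank first.

    Rows with an unknown price or context window are dropped rather than guessed.
    """
    keep = [
        r for r in rows
        if r.price_out is not None and r.price_out <= max_output_price
        and r.context_tokens is not None and r.context_tokens >= min_context
        and r.samples >= min_samples
    ]
    return sorted(keep, key=lambda r: r.rank)


picks = shortlist(ROWS, max_output_price=30.0, min_context=200_000, min_samples=500)
for r in picks:
    print(r.rank, r.model, r.provider, f"¥{r.price_in} / ¥{r.price_out}")
```

With these example limits the rank-1 model drops out on output price and ernie-5.1 on context length, which is exactly why the choice should not stop at rank #1.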

FAQ

Which metrics matter on the Mathematical Occupations leaderboard?

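Mainly the rank, the score on a 0 to 100 scale, the sample size, and the source. The score is for quickly comparing models within the same leaderboard; the sample size helps you judge how stable the result is.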
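Why sample size matters for stability can be illustrated with a deliberately simplified model: treat each sample as an independent win/loss vote with a win rate near 50%. Under that assumption (the actual Arena rating fit is more involved), the standard error of the observed win rate shrinks as 1/√n. A minimal Python sketch using sample counts taken from this page, with `win_rate_standard_error` as a hypothetical helper:

```python
import math


def win_rate_standard_error(p: float, n: int) -> float:
    """Standard error of a win rate p estimated from n independent votes."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return math.sqrt(p * (1.0 - p) / n)


# Sample counts shown on this page (rounded display values entered approximately).
SAMPLES = {
    "qwen3.7-max-preview": 227,
    "claude-opus-4-6": 1_900,          # shown as "1.9K"
    "claude-3-opus-20240229": 23_000,  # shown as "23K"
}

for model, n in SAMPLES.items():
    # Approximate 95% interval half-width, in percentage points of win rate
    # (not leaderboard score points). The 50% win rate is a placeholder.
    half_width = 1.96 * win_rate_standard_error(0.5, n)
    print(f"{model:<24} n={n:>6}  win rate 50% +/- {100 * half_width:.1f} pct points")
```

The placeholder win rate does not matter much here; the point is the ratio: roughly 100 times more votes narrows the interval roughly 10 times.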

Why can't different leaderboards be merged into one overall score?

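Leaderboards differ in their tasks, samples, and evaluation criteria. By default, 模力榜 ranks models only within a single leaderboard, to avoid forcing writing, coding, image, and other capabilities into one number.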

How should I choose a model for mathematical-occupation tasks?

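Start with the leaderboard closest to your task, then weigh price, context length, open vs. closed source, and provider availability. A high rank does not mean a model suits every budget or deployment setup.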

How often is the leaderboard updated?

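The page shows the most recently collected public leaderboard data. It currently relies primarily on the LMArena leaderboard dataset, and the original link is kept in the page's source details.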