| Rank | Model | Provider | Score (0-100) | Samples | Context | Price / 1M tokens (input / output) |
|---|---|---|---|---|---|---|
| 1 | claude-opus-4-7 | Anthropic | 100.0 | 4.8K | 1M | ¥36 / ¥180 |
| 2 | claude-opus-4-6-thinking | Anthropic | 98.8 | 4.9K | 1M | ¥36 / ¥180 |
| 3 | claude-opus-4-7-thinking | Anthropic | 97.5 | 4.5K | 1M | ¥36 / ¥180 |
| 4 | claude-opus-4-6 | Anthropic | 96.3 | 6.1K | 1M | ¥36 / ¥180 |
| 5 | gemini-3-pro | Google | 95.1 | 8.1K | 1.05M | ¥14.4 / ¥86.4 |
| 6 | muse-spark | Meta | 93.8 | 3.2K | - | - |
| 7 | gemini-3.1-pro-preview | Google | 92.6 | 11.4K | 1.05M | ¥14.4 / ¥86.4 |
| 8 | gpt-5.4-high | OpenAI | 91.4 | 4.2K | 1.05M | ¥18 / ¥108 |
| 9 | gpt-5.5 | OpenAI | 90.1 | 3.3K | 1.05M | ¥36 / ¥216 |
| 10 | gpt-5.5-high | OpenAI | 88.9 | 3K | 1.05M | ¥36 / ¥216 |
| 11 | claude-sonnet-4-6 | Anthropic | 87.7 | 6.3K | 1M | ¥21.6 / ¥108 |
| 12 | gpt-5.4 | OpenAI | 86.4 | 3.9K | 1.05M | ¥18 / ¥108 |
| 13 | gemini-3-flash | Google | 85.2 | 14.2K | 1.05M | ¥3.6 / ¥21.6 |
| 14 | kimi-k2.6 | Moonshot | 84.0 | 4K | 262K | ¥6.84 / ¥28.8 |
| 15 | dola-seed-2.0-pro | ByteDance | 82.7 | 6.2K | - | - |
| 16 | qwen3.7-plus-preview | Alibaba | 81.5 | 2.7K | 131K | ¥3.6 / ¥21.6 |
| 17 | gpt-5.2-chat-latest-20260210 | OpenAI | 80.2 | 8.5K | 400K | ¥12.6 / ¥101 |
| 18 | kimi-k2.5-thinking | Moonshot | 79.0 | 9.7K | 262K | ¥4.32 / ¥21.6 |
| 19 | qwen3.5-397b-a17b | Alibaba | 77.8 | 8K | 262K | ¥3.1 / ¥18.6 |
| 20 | gemini-3-flash (thinking-minimal) | Google | 76.5 | 12.9K | 1.05M | ¥3.6 / ¥21.6 |
| 21 | gemini-2.5-pro | Google | 75.3 | 30.8K | 1.05M | ¥9 / ¥72 |
| 22 | gemma-4-31b | Google | 74.1 | 11.9K | 262K | ¥3.24 / ¥7.2 |
| 23 | gemma-4-26b-a4b | Google | 72.8 | 7.3K | 262K | ¥0.94 / ¥2.88 |
| 24 | glm-5v-turbo | Z.ai | 71.6 | 5.2K | 200K | ¥0 / ¥0 |
| 25 | kimi-k2.5-instant | Moonshot | 70.4 | 2.6K | 262K | ¥4.32 / ¥21.6 |
| 26 | grok-4.20-beta-0309-reasoning | xAI | 69.1 | 6.6K | 2M | ¥14.4 / ¥43.2 |
| 27 | gemini-2.5-flash-preview-09-2025 | Google | 67.9 | 2.8K | 1M | ¥2.16 / ¥18 |
| 28 | gpt-5.2-high | OpenAI | 66.7 | 10.2K | 400K | ¥12.6 / ¥101 |
| 29 | gpt-5.4-mini-high | OpenAI | 65.4 | 5.9K | 400K | ¥5.4 / ¥32.4 |
| 30 | qwen3-vl-235b-a22b-instruct | Alibaba | 64.2 | 7.6K | 128K | ¥2.16 / ¥8.64 |
| 31 | grok-4.20-multi-agent-beta-0309 | xAI | 63.0 | 6.1K | 2M | ¥14.4 / ¥43.2 |
| 32 | gpt-5.5-instant | OpenAI | 61.7 | 2.7K | 400K | ¥9 / ¥72 |
| 33 | mimo-v2.5 | Xiaomi | 60.5 | 4.9K | 1.05M | ¥2.88 / ¥14.4 |
| 34 | gpt-5.1-high | OpenAI | 59.3 | 5.7K | 400K | ¥9 / ¥72 |
| 35 | ernie-5.0-preview-1220 | Baidu | 58.0 | 1.9K | 128K | ¥7.92 / ¥14.4 |
| 36 | chatgpt-4o-latest-20250326 | OpenAI | 56.8 | 11.2K | 128K | ¥18 / ¥72 |
| 37 | grok-4.3 | xAI | 55.6 | 2.8K | 1M | ¥9 / ¥18 |
| 38 | gemini-3.1-flash-lite-preview | Google | 54.3 | 9.5K | 1.05M | ¥1.8 / ¥10.8 |
| 39 | qwen3.5-122b-a10b | Alibaba | 53.1 | 6.9K | 262K | ¥2.88 / ¥23 |
| 40 | gpt-5-chat | OpenAI | 51.9 | 10.8K | 400K | ¥9 / ¥72 |
| 41 | qwen3.5-27b | Alibaba | 50.6 | 6.5K | 262K | ¥2.16 / ¥17.3 |
| 42 | gpt-5.1 | OpenAI | 49.4 | 6.5K | 400K | ¥9 / ¥72 |
| 43 | gemini-2.5-flash | Google | 48.1 | 25K | 1.05M | ¥2.16 / ¥18 |
| 44 | qwen-vl-max-2025-08-13 | Alibaba | 46.9 | 1.2K | 131K | ¥1.66 / ¥4.13 |
| 45 | gpt-5.2 | OpenAI | 45.7 | 10.9K | 400K | ¥12.6 / ¥101 |
| 46 | mimo-v2-omni | Xiaomi | 44.4 | 5.2K | 262K | ¥2.88 / ¥14.4 |
| 47 | o3-2025-04-16 | OpenAI | 43.2 | 14.7K | 200K | ¥14.4 / ¥57.6 |
| 48 | gpt-5-high | OpenAI | 42.0 | 10.9K | 400K | ¥9 / ¥72 |
| 49 | gpt-4.1-2025-04-14 | OpenAI | 40.7 | 11.6K | 1.05M | ¥14.4 / ¥57.6 |
| 50 | qwen3-vl-235b-a22b-thinking | Alibaba | 39.5 | 1.4K | 131K | ¥2.06 / ¥8.26 |
| 51 | gpt-5.4-nano-high | OpenAI | 38.3 | 5.9K | 400K | ¥1.44 / ¥9 |
| 52 | gemini-2.5-flash-lite-preview-09-2025-no-thinking | Google | 37.0 | 2.8K | 1.05M | ¥0.72 / ¥2.88 |
| 53 | gpt-5-mini-high | OpenAI | 35.8 | 8.2K | 400K | ¥1.8 / ¥14.4 |
| 54 | o4-mini-2025-04-16 | OpenAI | 34.6 | 12.1K | 200K | ¥7.92 / ¥31.7 |
| 55 | claude-sonnet-4-20250514-thinking-32k | Anthropic | 33.3 | 744 | 200K | ¥21.6 / ¥108 |
| 56 | claude-opus-4-20250514-thinking-16k | Anthropic | 32.1 | 852 | 200K | ¥108 / ¥540 |
| 57 | grok-4-0709 | xAI | 30.9 | 10K | 256K | ¥21.6 / ¥108 |
| 58 | gpt-4.1-mini-2025-04-14 | OpenAI | 29.6 | 10.7K | 1.05M | ¥2.88 / ¥11.5 |
| 59 | gemini-2.5-flash-lite-preview-06-17-thinking | Google | 28.4 | 10.7K | 65.5K | ¥0.72 / ¥2.88 |
| 60 | grok-4-1-fast-reasoning | xAI | 27.2 | 7.3K | 2M | ¥1.44 / ¥3.6 |
| 61 | claude-3-7-sonnet-20250219-thinking-32k | Anthropic | 25.9 | 887 | - | - |
| 62 | hunyuan-vision-1.5-thinking | Tencent | 24.7 | 1.4K | - | - |
| 63 | claude-opus-4-20250514 | Anthropic | 23.5 | 1.4K | 200K | ¥108 / ¥540 |
| 64 | step-1o-turbo-202506 | StepFun | 22.2 | 1.3K | - | - |
| 65 | mistral-medium-2508 | Mistral | 21.0 | 13.5K | 262K | ¥2.88 / ¥14.4 |
| 66 | claude-sonnet-4-20250514 | Anthropic | 19.8 | 1.1K | 200K | ¥21.6 / ¥108 |
| 67 | glm-4.6v | Z.ai | 18.5 | 1.4K | 128K | ¥2.16 / ¥6.48 |
| 68 | step-3 | StepFun | 17.3 | 1.2K | 65.5K | ¥1.8 / ¥4.68 |
| 69 | hunyuan-large-vision | Tencent | 16.0 | 815 | - | - |
| 70 | gemma-3-27b-it | Google | 14.8 | 6.4K | 128K | ¥2.15 / ¥2.15 |
| 71 | claude-3-7-sonnet-20250219 | Anthropic | 13.6 | 915 | 200K | ¥21.6 / ¥108 |
| 72 | gpt-5-nano-high | OpenAI | 12.3 | 1.7K | 400K | ¥0.36 / ¥2.88 |
| 73 | mistral-medium-2505 | Mistral | 11.1 | 4.7K | 262K | ¥2.88 / ¥14.4 |
| 74 | glm-4.5v | Z.ai | 9.9 | 1.2K | 64K | ¥4.32 / ¥13 |
| 75 | gemini-2.0-flash-001 | Google | 8.6 | 3.8K | 1.05M | ¥1.08 / ¥4.32 |
| 76 | llama-4-maverick-17b-128e-instruct | Meta | 7.4 | 3.2K | 1M | ¥1.8 / ¥6.26 |
| 77 | claude-3-5-sonnet-20241022 | Anthropic | 6.2 | 967 | 200K | ¥21.6 / ¥108 |
| 78 | mistral-small-2506 | Mistral | 4.9 | 4.2K | 262K | ¥2.88 / ¥14.4 |
| 79 | mistral-small-3.1-24b-instruct-2503 | Mistral | 3.7 | 7.4K | 262K | ¥2.88 / ¥14.4 |
| 80 | llama-4-scout-17b-16e-instruct | Meta | 2.5 | 2.9K | 128K | ¥1.44 / ¥5.62 |
| 81 | claude-3-5-haiku-20241022 | Anthropic | 1.2 | 934 | 200K | ¥5.76 / ¥28.8 |
| 82 | molmo-2-8b | AllenAI | 0.0 | 791 | - | - |
Top model analysis
claude-opus-4-7: why it ranks first
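claude-opus-4-7 ranks first with a score of 100.0 on the 0-100 scale, based on 4.8K samples. Treat it as the default first choice for this leaderboard, then weigh its price (¥36 input / ¥180 output per 1M tokens), context length (1M) and availability against the alternatives.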
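The listed scores are evenly spaced, which looks consistent with a linear rescaling of rank onto 0-100 rather than a raw accuracy figure. This is an inference from the published numbers, not something the leaderboard states; the function name `rank_to_score` and the formula below are assumptions used only to check that reading against a few listed values.

```python
def rank_to_score(rank: int, total: int = 82) -> float:
    """Map a 1-based rank to a 0-100 score by linear rescaling (assumed formula)."""
    if total < 2 or not 1 <= rank <= total:
        raise ValueError("need total >= 2 and 1 <= rank <= total")
    return round(100 * (total - rank) / (total - 1), 1)


# Spot-check against scores copied from the leaderboard (rank -> listed score).
listed = {1: 100.0, 2: 98.8, 5: 95.1, 41: 50.6, 82: 0.0}
for rank, score in listed.items():
    assert rank_to_score(rank) == score
```

If that reading is right, a gap of roughly 1.2 points corresponds to a single rank position, so score differences alone do not show how far apart two models are in absolute quality.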
How to choose
Do not look only at rank #1
Start with the leaderboard that most closely matches your task. Compare the top models by score and sample size first, then check price, context length, open or closed access, and provider availability.
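The selection steps above can be sketched as a simple filter over leaderboard rows. This is a minimal illustration, not a tool the leaderboard provides: the `Entry` type, the `shortlist` function and the thresholds are all hypothetical, and the handful of rows is copied from the leaderboard with "K" and "M" values expanded to approximate integers. Rows with no listed context or price are dropped, which is one possible policy among several.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    model: str
    score: float
    samples: int                     # approximate, expanded from e.g. "4.8K"
    context_tokens: Optional[int]    # None where the leaderboard shows "-"
    input_price: Optional[float]     # yuan per 1M input tokens
    output_price: Optional[float]    # yuan per 1M output tokens


# A small excerpt of leaderboard rows, for illustration only.
ENTRIES = [
    Entry("claude-opus-4-7", 100.0, 4_800, 1_000_000, 36.0, 180.0),
    Entry("gemini-3-pro", 95.1, 8_100, 1_050_000, 14.4, 86.4),
    Entry("muse-spark", 93.8, 3_200, None, None, None),
    Entry("gpt-5.4-high", 91.4, 4_200, 1_050_000, 18.0, 108.0),
    Entry("gemini-3-flash", 85.2, 14_200, 1_050_000, 3.6, 21.6),
    Entry("kimi-k2.6", 84.0, 4_000, 262_000, 6.84, 28.8),
]


def shortlist(entries: List[Entry], min_context: int,
              max_output_price: float, min_samples: int) -> List[Entry]:
    """Keep entries that meet every constraint, highest score first."""
    keep = [
        e for e in entries
        if e.context_tokens is not None
        and e.output_price is not None
        and e.context_tokens >= min_context
        and e.output_price <= max_output_price
        and e.samples >= min_samples
    ]
    return sorted(keep, key=lambda e: e.score, reverse=True)


picks = shortlist(ENTRIES, min_context=1_000_000,
                  max_output_price=100.0, min_samples=4_000)
print([e.model for e in picks])  # ['gemini-3-pro', 'gemini-3-flash']
```

With these example thresholds the rank #1 model drops out on output price, which is the point of the heading: the best-scoring model is not necessarily the best fit once budget and context constraints are applied.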