Chat · Vision · Humor Leaderboard

Ranking for Vision / Humor, based on public preference data.

Selection guide

Humor model ranking guide

Top five: muse-spark, gemini-3-pro, claude-opus-4-6, gemini-3.1-pro-preview, claude-opus-4-6-thinking
Models: 60
Published: 2026/05/18
Source: Arena public preference evaluation
Original leaderboard: Vision / Humor
Leaderboard dataset: LMArena latest parquet
Rank | Model | Provider | Score | Samples | Context | Price (input / output)
--- | --- | --- | --- | --- | --- | ---
1 | muse-spark | Meta | 100.0 | 200 | - | -
2 | gemini-3-pro | Google | 98.3 | 565 | 1.05M | ¥14.4 / ¥86.4
3 | claude-opus-4-6 | Anthropic | 96.6 | 298 | 1M | ¥36 / ¥180
4 | gemini-3.1-pro-preview | Google | 94.9 | 637 | 1.05M | ¥14.4 / ¥86.4
5 | claude-opus-4-6-thinking | Anthropic | 93.2 | 255 | 1M | ¥36 / ¥180
6 | claude-opus-4-7 | Anthropic | 91.5 | 269 | 1M | ¥36 / ¥180
7 | claude-opus-4-7-thinking | Anthropic | 89.8 | 270 | 1M | ¥36 / ¥180
8 | gemini-3-flash | Google | 88.1 | 714 | 1.05M | ¥3.6 / ¥21.6
9 | glm-5v-turbo | Z.ai | 86.4 | 231 | 200K | ¥0 / ¥0
10 | kimi-k2.6 | Moonshot | 84.7 | 201 | 262K | ¥6.84 / ¥28.8
11 | kimi-k2.5-thinking | Moonshot | 83.1 | 482 | 262K | ¥4.32 / ¥21.6
12 | dola-seed-2.0-pro | ByteDance | 81.4 | 338 | - | -
13 | gemma-4-26b-a4b | Google | 79.7 | 428 | 262K | ¥0.94 / ¥2.88
14 | gemini-3-flash (thinking-minimal) | Google | 78.0 | 669 | 1.05M | ¥3.6 / ¥21.6
15 | gpt-5.4 | OpenAI | 76.3 | 195 | 1.05M | ¥18 / ¥108
16 | gemini-3.1-flash-lite-preview | Google | 74.6 | 494 | 1.05M | ¥1.8 / ¥10.8
17 | gpt-5.5 | OpenAI | 72.9 | 175 | 1.05M | ¥36 / ¥216
18 | qwen3.5-397b-a17b | Alibaba | 71.2 | 386 | 262K | ¥3.1 / ¥18.6
19 | mimo-v2.5 | Xiaomi | 69.5 | 275 | 1.05M | ¥2.88 / ¥14.4
20 | grok-4.20-beta-0309-reasoning | xAI | 67.8 | 302 | 2M | ¥14.4 / ¥43.2
21 | qwen3.5-27b | Alibaba | 66.1 | 301 | 262K | ¥2.16 / ¥17.3
22 | gemini-2.5-flash-preview-09-2025 | Google | 64.4 | 216 | 1M | ¥2.16 / ¥18
23 | grok-4.20-multi-agent-beta-0309 | xAI | 62.7 | 308 | 2M | ¥14.4 / ¥43.2
24 | gemma-4-31b | Google | 61.0 | 672 | 262K | ¥3.24 / ¥7.2
25 | qwen3.5-122b-a10b | Alibaba | 59.3 | 315 | 262K | ¥2.88 / ¥23
26 | gemini-2.5-pro | Google | 57.6 | 3K | 1.05M | ¥9 / ¥72
27 | gpt-5.4-high | OpenAI | 55.9 | 195 | 1.05M | ¥18 / ¥108
28 | gpt-5.4-mini-high | OpenAI | 54.2 | 276 | 400K | ¥5.4 / ¥32.4
29 | chatgpt-4o-latest-20250326 | OpenAI | 52.5 | 773 | 128K | ¥18 / ¥72
30 | gpt-5.1 | OpenAI | 50.8 | 333 | 400K | ¥9 / ¥72
31 | claude-sonnet-4-6 | Anthropic | 49.2 | 314 | 1M | ¥21.6 / ¥108
32 | gpt-5.2-chat-latest-20260210 | OpenAI | 47.5 | 387 | 400K | ¥12.6 / ¥101
33 | gpt-5.1-high | OpenAI | 45.8 | 290 | 400K | ¥9 / ¥72
34 | gpt-5.2-high | OpenAI | 44.1 | 510 | 400K | ¥12.6 / ¥101
35 | grok-4-1-fast-reasoning | xAI | 42.4 | 345 | 2M | ¥1.44 / ¥3.6
36 | gemini-2.5-flash | Google | 40.7 | 2.1K | 1.05M | ¥2.16 / ¥18
37 | o3-2025-04-16 | OpenAI | 39.0 | 1.7K | 200K | ¥14.4 / ¥57.6
38 | gpt-5-chat | OpenAI | 37.3 | 1.5K | 400K | ¥9 / ¥72
39 | gpt-5-high | OpenAI | 35.6 | 1.3K | 400K | ¥9 / ¥72
40 | gpt-5-mini-high | OpenAI | 33.9 | 1.1K | 400K | ¥1.8 / ¥14.4
41 | qwen3-vl-235b-a22b-instruct | Alibaba | 32.2 | 457 | 128K | ¥2.16 / ¥8.64
42 | grok-4-0709 | xAI | 30.5 | 1.2K | 256K | ¥21.6 / ¥108
43 | o4-mini-2025-04-16 | OpenAI | 28.8 | 1.5K | 200K | ¥7.92 / ¥31.7
44 | gpt-4.1-2025-04-14 | OpenAI | 27.1 | 1.4K | 1.05M | ¥14.4 / ¥57.6
45 | gpt-5.2 | OpenAI | 25.4 | 492 | 400K | ¥12.6 / ¥101
46 | gemini-2.5-flash-lite-preview-06-17-thinking | Google | 23.7 | 1.4K | 65.5K | ¥0.72 / ¥2.88
47 | ernie-5.0-preview-1220 | Baidu | 22.0 | 169 | 128K | ¥7.92 / ¥14.4
48 | gpt-4.1-mini-2025-04-14 | OpenAI | 20.3 | 1.4K | 1.05M | ¥2.88 / ¥11.5
49 | mimo-v2-omni | Xiaomi | 18.6 | 235 | 262K | ¥2.88 / ¥14.4
50 | gemini-2.5-flash-lite-preview-09-2025-no-thinking | Google | 16.9 | 214 | 1.05M | ¥0.72 / ¥2.88
51 | mistral-medium-2508 | Mistral | 15.3 | 1.5K | 262K | ¥2.88 / ¥14.4
52 | gemma-3-27b-it | Google | 13.6 | 780 | 128K | ¥2.15 / ¥2.15
53 | gpt-5.4-nano-high | OpenAI | 11.9 | 281 | 400K | ¥1.44 / ¥9
54 | gemini-2.0-flash-001 | Google | 10.2 | 228 | 1.05M | ¥1.08 / ¥4.32
55 | mistral-small-2506 | Mistral | 8.5 | 499 | 262K | ¥2.88 / ¥14.4
56 | mistral-medium-2505 | Mistral | 6.8 | 448 | 262K | ¥2.88 / ¥14.4
57 | mistral-small-3.1-24b-instruct-2503 | Mistral | 5.1 | 967 | 262K | ¥2.88 / ¥14.4
58 | gpt-5-nano-high | OpenAI | 3.4 | 179 | 400K | ¥0.36 / ¥2.88
59 | llama-4-scout-17b-16e-instruct | Meta | 1.7 | 196 | 128K | ¥1.44 / ¥5.62
60 | llama-4-maverick-17b-128e-instruct | Meta | 0.0 | 222 | 1M | ¥1.8 / ¥6.26

Top model analysis

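Why muse-spark ranks first

muse-spark ranks first with a percent score of 100.0 from 200 samples. Treat it as the first candidate for this leaderboard, then compare price, context length, and availability. The leaderboard lists neither a context length nor a price for muse-spark, so check those with the provider.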
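
The percent scores listed above are evenly spaced, which is consistent with a linear rank-to-percentile mapping: first place gets 100.0, last place gets 0.0, and every step in rank is worth the same amount. This is an inference from the published numbers, not a method the source states, and the function name `percent_score` is hypothetical. A minimal sketch under that assumption:

```python
def percent_score(rank: int, total: int) -> float:
    """Map a 1-based rank to a 0-100 score (assumed linear mapping)."""
    if total < 2:
        raise ValueError("need at least two models to spread scores")
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    # First place maps to 100.0, last place to 0.0, one decimal as displayed.
    return round(100 * (total - rank) / (total - 1), 1)


# Spot checks against rows of this leaderboard (60 models).
for rank, model in [
    (1, "muse-spark"),
    (2, "gemini-3-pro"),
    (31, "claude-sonnet-4-6"),
    (60, "llama-4-maverick-17b-128e-instruct"),
]:
    print(rank, model, percent_score(rank, 60))
```

If the assumption holds, the score carries the same information as the rank: a gap of 1.7 points is one rank position, not a measure of how much better one model is than the next.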

How to choose

Do not look only at rank #1

Start with the leaderboard closest to your task. Compare the top models by score and sample size, then check price, context length, open or closed access, and provider availability.
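
The selection steps above can be sketched as a simple filter. The rows below are copied from this leaderboard; the `Entry` type, the helper names, and the thresholds (at least 250 samples, input price of at most ¥15) are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    rank: int
    model: str
    score: float
    samples: str                  # as displayed, e.g. "565" or "3K"
    input_price: Optional[float]  # yen; None where the leaderboard shows "-"


def parse_count(text: str) -> int:
    """Convert a displayed count such as "565" or "2.1K" to an integer."""
    text = text.strip().upper()
    if text.endswith("K"):
        return int(round(float(text[:-1]) * 1000))
    return int(text)


def shortlist(entries: List[Entry], min_samples: int,
              max_input_price: float) -> List[Entry]:
    """Keep entries with enough samples and a known, affordable input price."""
    kept = [
        e for e in entries
        if parse_count(e.samples) >= min_samples
        and e.input_price is not None
        and e.input_price <= max_input_price
    ]
    return sorted(kept, key=lambda e: e.rank)


ENTRIES = [
    Entry(1, "muse-spark", 100.0, "200", None),
    Entry(2, "gemini-3-pro", 98.3, "565", 14.4),
    Entry(3, "claude-opus-4-6", 96.6, "298", 36.0),
    Entry(8, "gemini-3-flash", 88.1, "714", 3.6),
    Entry(26, "gemini-2.5-pro", 57.6, "3K", 9.0),
]

for entry in shortlist(ENTRIES, min_samples=250, max_input_price=15.0):
    print(entry.rank, entry.model, entry.score)
```

With these illustrative thresholds the top-ranked model drops out, because it has 200 samples and no listed price, which is the point of the advice: rank alone does not decide the choice.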

FAQ

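Which metrics matter on the Humor leaderboard?

Look mainly at rank, percent score, sample size, and source. The score gives a quick comparison of models within the same leaderboard; the sample size indicates how stable the result is.

Why can't different leaderboards be merged directly into one overall score?

Different leaderboards use different tasks, samples, and evaluation criteria. By default, 模力榜 ranks models only within a single leaderboard, so that writing, coding, image, and other capabilities are not forced into one combined number.

How should I choose a model for humor?

Start with the leaderboard closest to your task, then weigh price, context length, open versus closed source, and provider availability. A high rank does not mean a model fits every budget or deployment setup.

How often is the leaderboard updated?

The page shows the most recent successfully collected public leaderboard data. It currently prioritizes the LMArena leaderboard dataset and keeps the original link in the page's source section.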