Code · Web Development · HTML Leaderboard

Ranking for Web Development / HTML, based on public preference data.

Selection guide

HTML model ranking guide


Models: 80
Published: 2026/05/25
Source: Arena public preference evaluation
Original leaderboard: WebDev / Webdev-html
Leaderboard dataset: LMArena latest parquet
Rank | Model | Provider | Score (0-100) | Samples | Context | Price (input / output)
1 | claude-opus-4-7 | Anthropic | 100.0 | 593 | 1M | ¥36 / ¥180
2 | claude-opus-4-7-thinking | Anthropic | 98.7 | 688 | 1M | ¥36 / ¥180
3 | claude-opus-4-6 | Anthropic | 97.5 | 1.3K | 1M | ¥36 / ¥180
4 | claude-opus-4-6-thinking | Anthropic | 96.2 | 1.2K | 1M | ¥36 / ¥180
5 | qwen3.7-max-20260517 | Alibaba | 94.9 | 213 | 1M | ¥18 / ¥54
6 | glm-5.1 | Z.ai | 93.7 | 530 | 200K | ¥0 / ¥0
7 | muse-spark | Meta | 92.4 | 186 | - | -
8 | claude-sonnet-4-6 | Anthropic | 91.1 | 1.6K | 1M | ¥21.6 / ¥108
9 | gpt-5.5-xhigh (codex-harness) | OpenAI | 89.9 | 494 | 400K | ¥9 / ¥72
10 | kimi-k2.6 | Moonshot | 88.6 | 532 | 262K | ¥6.84 / ¥28.8
11 | gpt-5.5-high (codex-harness) | OpenAI | 87.3 | 523 | 400K | ¥9 / ¥72
12 | claude-opus-4-5-20251101-thinking-32k | Anthropic | 86.1 | 7.9K | 200K | ¥108 / ¥540
13 | gemini-3.1-pro-preview | Google | 84.8 | 1.3K | 1.05M | ¥14.4 / ¥86.4
14 | gpt-5.5 (codex-harness) | OpenAI | 83.5 | 530 | 400K | ¥9 / ¥72
15 | gemini-3.5-flash | Google | 82.3 | 324 | 1.05M | ¥10.8 / ¥64.8
16 | qwen3.6-max-preview | Alibaba | 81.0 | 311 | 246K | ¥9.5 / ¥56.9
17 | claude-opus-4-5-20251101 | Anthropic | 79.7 | 8.4K | 200K | ¥36 / ¥180
18 | mimo-v2.5-pro | Xiaomi | 78.5 | 562 | 1.05M | ¥7.2 / ¥21.6
19 | gpt-5.4-medium (codex-harness) | OpenAI | 77.2 | 165 | 400K | ¥9 / ¥72
20 | deepseek-v4-pro-thinking | DeepSeek | 75.9 | 507 | 1M | ¥3.13 / ¥6.26
21 | qwen3.6-plus | Alibaba | 74.7 | 786 | 1M | ¥3.6 / ¥21.6
22 | gemini-3-pro | Google | 73.4 | 13.8K | 1.05M | ¥14.4 / ¥86.4
23 | gpt-5.4-high (codex-harness) | OpenAI | 72.2 | 160 | 400K | ¥9 / ¥72
24 | glm-5 | Z.ai | 70.9 | 804 | 205K | ¥7.2 / ¥23
25 | glm-4.7 | Z.ai | 69.6 | 4.8K | 205K | ¥0 / ¥0
26 | mimo-v2-pro | Xiaomi | 68.4 | 794 | 1.05M | ¥7.2 / ¥21.6
27 | gemini-3-flash | Google | 67.1 | 9.2K | 1.05M | ¥3.6 / ¥21.6
28 | gpt-5.4-mini-high | OpenAI | 65.8 | 672 | 400K | ¥5.4 / ¥32.4
29 | mimo-v2.5 | Xiaomi | 64.6 | 451 | 1.05M | ¥2.88 / ¥14.4
30 | gpt-5.3-codex (codex-harness) | OpenAI | 63.3 | 394 | 400K | ¥9 / ¥72
31 | kimi-k2.5-thinking | Moonshot | 62.0 | 1.6K | 262K | ¥4.32 / ¥21.6
32 | gpt-5.3-codex (codex-harness) | OpenAI | 60.8 | 360 | 400K | ¥9 / ¥72
33 | kimi-k2.5-instant | Moonshot | 59.5 | 589 | 262K | ¥4.32 / ¥21.6
34 | gpt-5.2 | OpenAI | 58.2 | 1.5K | 400K | ¥12.6 / ¥101
35 | minimax-m2.7 | MiniMax | 57.0 | 786 | 205K | ¥0 / ¥0
36 | qwen3.5-397b-a17b | Alibaba | 55.7 | 1.2K | 262K | ¥3.1 / ¥18.6
37 | minimax-m2.5 | MiniMax | 54.4 | 1K | 205K | ¥0 / ¥0
38 | minimax-m2.1-preview | MiniMax | 53.2 | 6.8K | 205K | ¥0 / ¥0
39 | gpt-5-medium | OpenAI | 51.9 | 3.8K | 400K | ¥9 / ¥72
40 | gemini-3-flash (thinking-minimal) | Google | 50.6 | 6.5K | 1.05M | ¥3.6 / ¥21.6
41 | gpt-5.1-medium | OpenAI | 49.4 | 6.1K | 400K | ¥9 / ¥72
42 | claude-sonnet-4-5-20250929-thinking-32k | Anthropic | 48.1 | 11.3K | 200K | ¥21.6 / ¥108
43 | claude-opus-4-1-20250805 | Anthropic | 46.8 | 8.5K | 200K | ¥108 / ¥540
44 | claude-sonnet-4-5-20250929 | Anthropic | 45.6 | 12.9K | 200K | ¥21.6 / ¥108
45 | qwen3.5-27b | Alibaba | 44.3 | 928 | 262K | ¥2.16 / ¥17.3
46 | grok-4.20-beta-0309-reasoning | xAI | 43.0 | 856 | 2M | ¥14.4 / ¥43.2
47 | deepseek-v3.2-thinking | DeepSeek | 41.8 | 4K | 128K | ¥2.09 / ¥3.1
48 | gemma-4-31b | Google | 40.5 | 371 | 262K | ¥3.24 / ¥7.2
49 | glm-4.6 | Z.ai | 39.2 | 8.3K | 205K | ¥4.32 / ¥15.8
50 | grok-4.3 | xAI | 38.0 | 447 | 1M | ¥9 / ¥18
51 | mimo-v2-flash (non-thinking) | Xiaomi | 36.7 | 4.1K | 262K | ¥0.72 / ¥2.16
52 | gpt-5.1 | OpenAI | 35.4 | 10K | 400K | ¥9 / ¥72
53 | hunyuan-hy3-preview | Tencent | 34.2 | 189 | 256K | ¥0 / ¥0
54 | gemma-4-26b-a4b | Google | 32.9 | 202 | 262K | ¥0.94 / ¥2.88
55 | qwen3.5-122b-a10b | Alibaba | 31.6 | 990 | 262K | ¥2.88 / ¥23
56 | mimo-v2-flash (thinking) | Xiaomi | 30.4 | 1.2K | 262K | ¥0.72 / ¥2.16
57 | gpt-5.2-codex | OpenAI | 29.1 | 3.1K | 400K | ¥12.6 / ¥101
58 | gpt-5.1-codex | OpenAI | 27.8 | 6.2K | 400K | ¥9 / ¥72
59 | kimi-k2-thinking-turbo | Moonshot | 26.6 | 10K | 262K | ¥17.3 / ¥72
60 | qwen3.5-35b-a3b | Alibaba | 25.3 | 251 | 262K | ¥1.8 / ¥14.4
61 | minimax-m2 | MiniMax | 24.1 | 8.4K | 197K | ¥0 / ¥0
62 | claude-haiku-4-5-20251001 | Anthropic | 22.8 | 11.6K | 200K | ¥7.2 / ¥36
63 | deepseek-v3.2 | DeepSeek | 21.5 | 5.2K | 128K | ¥2.09 / ¥3.1
64 | qwen3.5-flash | Alibaba | 20.3 | 196 | 1M | ¥1.24 / ¥12.4
65 | deepseek-v3.2-exp | DeepSeek | 19.0 | 4.9K | 128K | ¥0 / ¥0
66 | qwen3-coder-480b-a35b-instruct | Alibaba | 17.7 | 10.8K | 262K | ¥6.2 / ¥24.8
67 | trinity-large-thinking | - | 16.5 | 197 | 262K | ¥1.8 / ¥6.48
68 | gemini-3.1-flash-lite-preview | Google | 15.2 | 1.1K | 1.05M | ¥1.8 / ¥10.8
69 | KAT-Coder-Pro-V1 | - | 13.9 | 1.9K | 256K | ¥0.22 / ¥8.64
70 | gpt-5.1-codex-mini | OpenAI | 12.7 | 1.4K | 400K | ¥1.8 / ¥14.4
71 | grok-4-1-fast-reasoning | xAI | 11.4 | 5.5K | 2M | ¥1.44 / ¥3.6
72 | mistral-large-3 | Mistral | 10.1 | 1K | 262K | ¥3.6 / ¥10.8
73 | granite-4.1-8b | IBM | 8.9 | 229 | 131K | ¥0.36 / ¥0.72
74 | grok-4.1-thinking | xAI | 7.6 | 1.2K | 200K | ¥14.4 / ¥72
75 | devstral-2 | Mistral | 6.3 | 1.3K | 262K | ¥2.88 / ¥14.4
76 | gemini-2.5-pro | Google | 5.1 | 3.3K | 1.05M | ¥9 / ¥72
77 | mercury-2 | Inception AI | 3.8 | 100 | 128K | ¥1.8 / ¥5.4
78 | grok-4-fast-reasoning | xAI | 2.5 | 933 | 2M | ¥1.44 / ¥3.6
79 | grok-code-fast-1 | xAI | 1.3 | 982 | 256K | ¥1.44 / ¥10.8
80 | devstral-medium-2507 | Mistral | 0.0 | 992 | 262K | ¥2.88 / ¥14.4
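The percent scores in this table appear to depend only on rank. Every published value matches a linear rescale of rank onto 0-100, with rank 1 at 100.0 and rank 80 at 0.0. The page does not document its formula, so the following is a reconstruction checked against a sample of rows, not the site's own code; `rank_to_percent` is a hypothetical name.

```python
def rank_to_percent(rank: int, n_models: int) -> float:
    """Map a 1-based rank linearly onto 0-100 (rank 1 -> 100, last -> 0).

    Hypothetical reconstruction: the leaderboard does not publish its formula.
    """
    if n_models < 2:
        raise ValueError("need at least two models")
    if not 1 <= rank <= n_models:
        raise ValueError("rank out of range")
    return round(100 * (n_models - rank) / (n_models - 1), 1)


# (rank, published score) pairs copied from the 80-model table above.
PUBLISHED = [
    (1, 100.0), (2, 98.7), (5, 94.9), (16, 81.0), (40, 50.6),
    (41, 49.4), (61, 24.1), (79, 1.3), (80, 0.0),
]

# Collect any rows where the reconstruction disagrees with the page.
mismatches = [
    (rank, score, rank_to_percent(rank, 80))
    for rank, score in PUBLISHED
    if rank_to_percent(rank, 80) != score
]
print("mismatches:", mismatches)
```

If this reconstruction holds, neighbouring models are always about 1.27 points apart (100 / 79), however close or far apart their underlying preference data is. A score difference then signals only a rank difference; sample size and the original Arena data are a better guide to how large the real gap is.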
Top model analysis

Why claude-opus-4-7 ranks first

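claude-opus-4-7 ranks first with a percent score of 100.0 from 593 samples. Treat it as the first candidate for HTML web development, then compare price, context length and availability. Its 593 samples are fewer than those of the models directly below it (688, 1.3K and 1.2K for ranks 2 to 4), so the order of the top few positions is less settled than the ranks alone suggest.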

How to choose

Don't look only at rank #1

Start with the leaderboard closest to your task. Compare the top models by score and sample size, then check price, context length, open or closed access, and provider availability.
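The procedure above can be sketched as a filter-then-sort over table rows. This is a minimal illustration under stated assumptions: the thresholds are arbitrary examples, the page does not state the unit of its prices (they are treated here only as comparable ¥ amounts), open or closed access and provider availability are not in the table and are left out, and `Entry`, `parse_size` and `shortlist` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    model: str
    score: float
    samples: str                 # as displayed, e.g. "1.3K"
    context: str                 # as displayed, e.g. "1.05M"; "-" if unknown
    price_out: Optional[float]   # output price in ¥ as displayed; None if unknown


def parse_size(text: str) -> Optional[int]:
    """Turn display values like '593', '1.3K', '1.05M' into ints; '-' -> None."""
    text = text.strip()
    if text == "-":
        return None
    multipliers = {"K": 1_000, "M": 1_000_000}
    if text[-1] in multipliers:
        return round(float(text[:-1]) * multipliers[text[-1]])
    return int(text)


def shortlist(entries, max_price_out, min_context, min_samples):
    """Keep entries that meet every constraint, best score first.
    Rows with unknown price, context or samples are dropped, not guessed."""
    keep = []
    for e in entries:
        ctx = parse_size(e.context)
        samples = parse_size(e.samples)
        if e.price_out is None or ctx is None or samples is None:
            continue
        if e.price_out <= max_price_out and ctx >= min_context and samples >= min_samples:
            keep.append(e)
    return sorted(keep, key=lambda e: e.score, reverse=True)


# A few rows copied from the table above.
ROWS = [
    Entry("claude-opus-4-7", 100.0, "593", "1M", 180.0),
    Entry("glm-5.1", 93.7, "530", "200K", 0.0),
    Entry("muse-spark", 92.4, "186", "-", None),
    Entry("kimi-k2.6", 88.6, "532", "262K", 28.8),
    Entry("deepseek-v4-pro-thinking", 75.9, "507", "1M", 6.26),
    Entry("gemini-3-flash", 67.1, "9.2K", "1.05M", 21.6),
]

for e in shortlist(ROWS, max_price_out=30.0, min_context=256_000, min_samples=500):
    print(e.model, e.score)
```

With an output-price cap of ¥30, at least 256K context and at least 500 samples, this subset yields kimi-k2.6, deepseek-v4-pro-thinking and gemini-3-flash, in that order. Filtering before sorting keeps the budget and context constraints hard, so a higher-ranked model never displaces one you can actually deploy.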

FAQ


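What metrics does the HTML web leaderboard use?

Mainly rank, percent score (0-100), sample size and source. The score gives a quick comparison of models within the same leaderboard; the sample size indicates how stable that result is.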
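To see why sample size matters for stability, here is a generic statistical illustration, not the method used by 模力榜 or LMArena. Assume each sample is an independent pairwise vote and a model's win rate is near 50%; the normal-approximation 95% margin of error then shrinks with the square root of the vote count. `win_rate_margin` is a hypothetical helper, and the counts come from the table (1.3K and 13.8K are rounded display values).

```python
import math


def win_rate_margin(n_votes: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error, in percentage points, for a win rate p
    estimated from n_votes independent votes (normal approximation to the
    binomial). Illustrative only; not the leaderboard's own method."""
    if n_votes <= 0:
        raise ValueError("n_votes must be positive")
    return 100 * z * math.sqrt(p * (1 - p) / n_votes)


# Sample counts from the table; 1.3K and 13.8K are rounded display values.
for model, n in [("claude-opus-4-7", 593),
                 ("claude-opus-4-6", 1_300),
                 ("gemini-3-pro", 13_800)]:
    print(f"{model:<16} n={n:>6}  ±{win_rate_margin(n):.1f} pts")
```

Under these assumptions the margins come out to roughly ±4.0, ±2.7 and ±0.8 percentage points, so a model with a few hundred samples leaves much more room for rank swaps than one with over ten thousand.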

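Why can't different leaderboards be combined into one overall score?

Different leaderboards use different tasks, samples and evaluation criteria. 模力榜 therefore ranks models only within a single leaderboard by default, rather than forcing writing, coding, image and other capabilities into one combined number.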

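How should I choose a model for HTML web pages?

Start with the leaderboard closest to your task, then weigh price, context length, open or closed source, and provider availability. A high rank does not mean a model suits every budget or deployment setup.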

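How often is the leaderboard updated?

The page shows the most recently collected public leaderboard data. It currently uses the LMArena leaderboard dataset as its primary source and keeps the original links in the page's source section.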