Tencent improves testing creative AI models with new benchmark

Posted on 2025-8-9 00:15:58
Getting it right, the way a human would
So, how does Tencent’s AI benchmark work? First, an AI is given a creative task from a catalogue of over 1,800 challenges, ranging from building data visualisations and web apps to making interactive mini-games.

Once the AI generates the code, ArtifactsBench gets to work. It automatically builds and runs the code in a safe, sandboxed environment.
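
The build-and-run step can be pictured with a minimal sketch. Everything here is an assumption for illustration: the function name run_isolated is hypothetical, the artifact is treated as a Python script rather than a web app, and a child process with a time limit is only a stand-in for a real sandbox (it provides no filesystem or network isolation). It is not ArtifactsBench's actual harness.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_isolated(code: str, timeout_s: float = 5.0) -> dict:
    """Run generated code in a throwaway directory with a time limit.

    Hypothetical helper: a subprocess is NOT a security sandbox, it only
    illustrates the "build, run, observe" shape of the step.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "artifact.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, "-I", str(script)],  # -I: isolated mode
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # The child is killed when the time limit is exceeded.
            return {"status": "timeout", "stdout": "", "stderr": ""}
        status = "ok" if proc.returncode == 0 else "error"
        return {"status": status, "stdout": proc.stdout, "stderr": proc.stderr}
```

A harness built along these lines would record the status and output for each of the generated artifacts before moving on to observation and judging.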

To see how the application behaves, it captures a series of screenshots over time. This allows it to check for things like animations, state changes after a button click, and other dynamic user feedback.
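
As a rough illustration of why a sequence of screenshots is more informative than a single one, the sketch below flags the frames that differ from the frame before them. The function name changed_frames is hypothetical, and the frames are plain byte strings standing in for real screenshot data; how ArtifactsBench actually captures or compares screenshots is not described in this post.

```python
import hashlib


def changed_frames(frames: list[bytes]) -> list[int]:
    """Return each index i where frames[i] differs from frames[i - 1].

    Assumption: each frame is the raw bytes of one screenshot, taken in
    chronological order. Byte-identical frames count as "no change".
    """
    digests = [hashlib.sha256(frame).hexdigest() for frame in frames]
    return [i for i in range(1, len(digests)) if digests[i] != digests[i - 1]]
```

An empty result would suggest a static page, while changes right after a simulated click would suggest the interface responded to the interaction.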

Finally, it hands over all this evidence (the original request, the AI’s code, and the screenshots) to a Multimodal LLM (MLLM), which acts as a judge.

This MLLM judge isn’t just giving a vague opinion; instead, it uses a detailed, per-task checklist to score the result across ten different metrics. Scoring includes functionality, user experience, and even aesthetic quality. This ensures the scoring is fair, consistent, and thorough.
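
To make the checklist idea concrete, here is a small sketch of how per-item scores might be rolled up into per-metric and overall scores. The 0 to 10 scale, the plain averaging, the metric labels, and the function name aggregate_checklist are all assumptions for illustration; the post names only three of the ten metrics and does not say how they are combined.

```python
from collections import defaultdict
from statistics import mean


def aggregate_checklist(items: list[tuple[str, float]]) -> tuple[dict, float]:
    """Average checklist scores per metric, then average the metrics.

    items: (metric_name, score) pairs, one per checklist item.
    Assumption: scores lie on a 0-10 scale and all metrics weigh equally.
    """
    by_metric = defaultdict(list)
    for metric, score in items:
        if not 0 <= score <= 10:
            raise ValueError(f"score out of range for {metric}: {score}")
        by_metric[metric].append(score)
    per_metric = {metric: mean(scores) for metric, scores in by_metric.items()}
    overall = mean(list(per_metric.values()))
    return per_metric, overall


# Hypothetical judge output for one task.
per_metric, overall = aggregate_checklist([
    ("functionality", 8),
    ("functionality", 6),
    ("user_experience", 9),
    ("aesthetics", 5),
])
```

Scoring each checklist item separately, rather than asking for one overall grade, is what makes the result easier to audit and compare across tasks.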

The big question is, does this automated judge actually have good taste? The results suggest it does.

When the rankings from ArtifactsBench were compared to WebDev Arena, the gold-standard platform where real humans vote on the best AI creations, they matched up with 94.4% consistency. This is a massive leap from older automated benchmarks, which only managed around 69.4% consistency.
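
One common way to put a number on how well two rankings match is pairwise agreement: the share of model pairs that both rankings place in the same order. The sketch below shows that calculation. Whether the 94.4% figure was computed exactly this way is not stated in this post, and the model names and the function name pairwise_agreement are made up for illustration.

```python
from itertools import combinations


def pairwise_agreement(rank_a: list[str], rank_b: list[str]) -> float:
    """Fraction of item pairs ordered the same way in both rankings.

    Each ranking lists items from best to worst. Only items present in
    both rankings are compared.
    """
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    common = [item for item in rank_a if item in pos_b]
    pairs = list(combinations(common, 2))
    if not pairs:
        raise ValueError("need at least two shared items to compare")
    agree = sum(
        (pos_a[x] < pos_a[y]) == (pos_b[x] < pos_b[y]) for x, y in pairs
    )
    return agree / len(pairs)


# Hypothetical rankings: the two lists disagree only on m2 versus m3.
benchmark_rank = ["m1", "m2", "m3", "m4"]
human_rank = ["m1", "m3", "m2", "m4"]
score = pairwise_agreement(benchmark_rank, human_rank)
```

With four models there are six pairs, so a single swapped pair leaves five of six in agreement.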

On top of this, the framework’s judgments showed over 90% agreement with professional human developers.
https://www.artificialintelligence-news.com/