Web Agent | fufu酱のNoteBook

WebCanvas: Benchmarking Web Agents in Online Environments

For web agents to be practically useful, they must adapt to the continuously evolving web environment.

An innovative online evaluation framework for web agents that effectively addresses the dynamic nature of web interactions.

Autonomous agents have the potential to carry out navigation and information-retrieval tasks in real time in web environments.

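The main existing challenges are data scarcity and a lack of knowledge and reasoning ability for advanced operations on certain websites.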

There is also no capability for real-time data collection and online benchmarking of web agents.

Progress-aware evaluation with key node annotation.

we introduce a novel concept termed “key nodes”

Collaborative platform for community-driven annotations.

Cost-effective maintenance to sustain evaluation validity.

Despite the availability of different paths to achieve the goal, entering the specific page and performing the genre and popularity sorting are essential steps in accomplishing the task


Using URL state, rather than element interactions, as the identifier for key nodes enhances the benchmark's robustness.
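To make the key-node idea concrete, here is a minimal sketch of progress-aware scoring that identifies key nodes by URL state. Everything in it is an assumption for illustration: the `KeyNode` fields, the `url_matches` and `progress` helpers, the in-order matching rule, and the `example.com` URLs are hypothetical and are not taken from the WebCanvas code.

```python
from dataclasses import dataclass
from urllib.parse import urlparse, parse_qs


@dataclass(frozen=True)
class KeyNode:
    """Hypothetical key node: a URL condition the agent must reach."""
    host: str                    # required host, e.g. "example.com"
    path_prefix: str = "/"       # required prefix of the URL path
    required_query: tuple = ()   # (key, value) pairs that must appear in the query


def url_matches(node: KeyNode, url: str) -> bool:
    """Return True if the URL satisfies the key node's host, path and query conditions."""
    parsed = urlparse(url)
    if parsed.netloc != node.host:
        return False
    if not parsed.path.startswith(node.path_prefix):
        return False
    query = parse_qs(parsed.query)
    return all(value in query.get(key, []) for key, value in node.required_query)


def progress(key_nodes, visited_urls):
    """Count key nodes reached in order (an assumption); return (completed, total)."""
    completed = 0
    for url in visited_urls:
        if completed < len(key_nodes) and url_matches(key_nodes[completed], url):
            completed += 1
    return completed, len(key_nodes)


# Hypothetical task: open the movies page, then filter by genre and sort by popularity.
key_nodes = [
    KeyNode("example.com", "/movies"),
    KeyNode("example.com", "/movies", (("genre", "comedy"), ("sort", "popularity"))),
]
visited = [
    "https://example.com/",
    "https://example.com/movies",
    "https://example.com/movies?genre=comedy&sort=popularity",
]
print(progress(key_nodes, visited))
```

Because the check looks only at the resulting URL, the agent may reach each state through any sequence of clicks, which is why this kind of identifier tolerates page redesigns better than matching specific elements.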

Mind2Web-Live: a Real-time Online Benchmark for Web Agents

What we need to select is:

“action types, selector paths, element value, and element coordinates at each step” (Pan et al., 2024, p. 5)
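The four recorded fields could be held in a per-step record like the sketch below. The class name `StepRecord`, the field types, the JSON-lines serialization, and the sample values are assumptions for illustration, not the paper's actual storage format.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional, Tuple


@dataclass
class StepRecord:
    """Hypothetical container for what is recorded at each annotated step."""
    action_type: str                           # e.g. "click" or "fill"
    selector_path: str                         # selector path of the target element
    element_value: Optional[str]               # text or value of the element, if any
    element_coordinates: Tuple[float, float]   # (x, y) position on the page


def to_json_line(step: StepRecord) -> str:
    """Serialize one step as a single JSON line, keeping non-ASCII text readable."""
    return json.dumps(asdict(step), ensure_ascii=False)


step = StepRecord(
    action_type="click",
    selector_path="#nav > a.movies",
    element_value="Movies",
    element_coordinates=(120.0, 48.5),
)
print(to_json_line(step))
```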

A new agent framework is proposed.

It comprises four stages: Planning, Observation, Memory, and Reward.

Planning