GUI Agent Evaluation
This thesis topic focuses on the systematic evaluation of intelligent GUI agents, that is, agents that operate graphical user interfaces in a human-like manner. These agents are often driven by machine learning or multimodal models. They perceive visual elements on the screen, interpret the structure of the interface, and execute actions such as clicking, typing, or navigating to complete tasks autonomously. The core objective of the thesis is to investigate how the performance, robustness, and generalization capabilities of such agents can be rigorously evaluated. This includes defining suitable benchmarks, metrics, and experimental protocols, as well as analyzing failure modes and limitations across different application contexts. Students will address questions about the usability, reproducibility, and scalability of evaluation methods. They may contribute new evaluation frameworks or empirical insights that improve the reliability and comparability of GUI agent systems.
Required knowledge: familiarity with GUI agents and LLM agent evaluation; Python programming and data processing; experience with CV/NLP and (multimodal) LLMs; experience with mining software repositories.