AI-Driven GUI Automation: A Comprehensive Review of Methods, Systems, and Challenges (2021–2025)
DOI:
https://doi.org/10.47392/IRJAEH.2026.0415
Keywords:
GUI Automation, Windows UI Agents, Robotic Process Automation (RPA), Multimodal Large Language Models (MLLM), Computer Vision, Software Testing, Autonomous Agents
Abstract
This paper reviews recent research on artificial intelligence techniques used for Graphical User Interface (GUI) automation. The analysis is based on 45 peer-reviewed studies published between 2021 and 2025. Over the past few years, the field has gradually moved away from rigid rule-based scripts and Reinforcement Learning (RL) approaches toward more intelligent and autonomous systems powered by Multimodal Large Language Models (MLLMs). To better understand current developments, this review organizes existing methods into three main categories: accessibility-tree–based approaches, vision-based methods, and hybrid neuro-symbolic techniques that integrate both structural interface data and visual information. The study also compares automation research across different platforms, highlighting the contrast between relatively stable mobile ecosystems and the more complex and fragmented Windows desktop environment, which is often considered an open-world setting. In addition, several research challenges are identified, including the limited availability of Windows-focused datasets and the difficulty of achieving efficient real-time inference. The paper concludes by outlining future research opportunities and suggesting directions for building privacy-aware and practical desktop automation agents.
License
Copyright (c) 2026 International Research Journal on Advanced Engineering Hub (IRJAEH)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.