Researchers often turn to reinforcement learning to teach AI agents new tasks, rewarding them as they make progress toward a goal, such as opening a kitchen cabinet. But designing these reward functions by hand can be a time-consuming and intricate process, especially for complex tasks involving multiple steps.
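To see why hand-designed rewards get intricate, consider a minimal sketch of a shaped reward for the cabinet example. Everything here is illustrative: the state fields (gripper_pos, handle_pos, door_angle), the threshold, and the term weights are assumptions, not taken from any particular environment.

```python
import numpy as np

def shaped_reward(state: dict) -> float:
    # Hypothetical shaped reward for a cabinet-opening task (illustration only).
    # Term 1: pull the gripper toward the handle (negative distance).
    reach = -np.linalg.norm(state["gripper_pos"] - state["handle_pos"])
    # Term 2: reward progress on the door's opening angle, in radians.
    opening = state["door_angle"]
    # Term 3: a bonus once the door is open past a threshold.
    bonus = 10.0 if state["door_angle"] > 0.5 else 0.0
    # The weights must be tuned by hand; mis-weighting a term can trap the
    # agent, e.g., hovering at the handle without ever pulling the door.
    return 1.0 * reach + 5.0 * opening + bonus

# A made-up state, just to show the call.
state = {"gripper_pos": np.array([0.10, 0.00, 0.30]),
         "handle_pos": np.array([0.10, 0.05, 0.30]),
         "door_angle": 0.2}
print(shaped_reward(state))
```

Every additional subtask adds more terms and more weights to tune, which is exactly the design burden the new approach aims to sidestep.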
To address this challenge, a team from MIT, Harvard University, and the University of Washington has introduced a novel approach to reinforcement learning. Unlike traditional methods that rely on expert-designed reward functions, this new approach uses crowdsourced feedback from non-experts to guide the AI agent’s learning process.
The new method, known as HuGE (Human Guided Exploration), capitalizes on non-expert feedback to direct the agent’s exploration, allowing it to learn more rapidly. Lead author Marcel Torne explains that earlier attempts to use non-expert feedback faltered because such feedback is noisy, and feeding it directly into a reward function caused those methods to fail. HuGE instead separates the process into two parts: a goal-selector algorithm, continuously updated with human feedback, guides the agent’s exploration, while the agent itself learns without directly optimizing a reward function.
In this system, non-experts provide feedback that incrementally steers the agent toward its goal but is never used directly as a reward. Decoupling guidance from reward lets the agent explore the most promising regions and keep learning autonomously, even when feedback is infrequent or delayed.
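The split can be pictured with a small sketch. What follows is one illustrative reading of the two-part idea, not the authors’ implementation; the pairwise-comparison feedback format and the simple scoring scheme are assumptions made for brevity.

```python
class GoalSelector:
    """Ranks visited states using occasional human comparisons (illustrative)."""

    def __init__(self):
        self.scores = {}  # state -> estimated progress toward the goal

    def update(self, state_a, state_b, human_prefers_a):
        # Feedback only nudges the ranking; it is never used directly as a
        # reward, so noisy or sparse labels cannot derail policy learning.
        better, worse = (state_a, state_b) if human_prefers_a else (state_b, state_a)
        self.scores[better] = self.scores.get(better, 0.0) + 1.0
        self.scores[worse] = self.scores.get(worse, 0.0) - 1.0

    def pick_goal(self, visited_states):
        # Part 1: point exploration at the most promising state seen so far.
        # Part 2 (not shown): the agent learns to reach it on its own.
        return max(visited_states, key=lambda s: self.scores.get(s, 0.0))

# Usage: states here are just labels; a real system would use observations.
selector = GoalSelector()
selector.update("near_handle", "far_corner", human_prefers_a=True)
print(selector.pick_goal(["far_corner", "near_handle"]))  # -> "near_handle"
```

Because the selector only ranks candidate goals, a late or missing batch of feedback leaves the agent exploring its current best guess rather than stalling.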
Torne and his collaborators tested HuGE on simulated tasks involving complex action sequences and on real-world experiments with robotic arms. In both settings, HuGE learned faster than traditional methods. Notably, feedback crowdsourced from non-experts outperformed synthetic feedback produced by the researchers, suggesting the method can scale effectively.
Moreover, the researchers extended HuGE so the agent can keep learning without a human resetting the environment between attempts: it learns, for instance, both to open and to close a cabinet, so each trial sets up the next. They emphasize the importance of aligning AI agents with human values, and they aim to refine HuGE by incorporating feedback through natural language and physical interaction, and by extending it to teach multiple agents simultaneously.
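The reset-free extension can be pictured as alternating between the task goal and its reverse. The snippet below is a hypothetical illustration of that structure, not the authors’ code.

```python
# Hypothetical illustration of reset-free practice: alternating between a
# task goal and a "reset" goal removes the need for human intervention.
goals = ["cabinet_open", "cabinet_closed"]
for episode in range(4):
    goal = goals[episode % 2]  # open, then close, so each trial resets the scene
    print(f"episode {episode}: practice reaching '{goal}'")
```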
This research, partially funded by the MIT-IBM Watson AI Lab, introduces a promising direction for accelerating AI learning through human-guided exploration, potentially simplifying the teaching process for complex tasks.