AI models invariably encounter ambiguous situations that they struggle to resolve from instructions alone. That’s problematic for autonomous agents tasked with, say, navigating an apartment, because they risk becoming stuck when presented with several possible paths.
To solve this, researchers at Amazon’s Alexa AI division developed a framework that endows agents with the ability to ask for help in certain situations. Using what’s called a model-confusion-based method, the agents ask questions based on their level of confusion as determined by a predefined confidence threshold, which the researchers claim boosts the agents’ success by at least 15%.
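As a rough illustration of threshold-based asking, here is a minimal Python sketch. It assumes the agent produces a probability for each candidate action and treats a low top probability as confusion. The function name, the 0.5 threshold, and the use of the maximum probability as the confidence measure are hypothetical choices for illustration, not details taken from the paper.

```python
from typing import Sequence


def should_ask_for_help(action_probs: Sequence[float], threshold: float = 0.5) -> bool:
    """Return True when the agent looks confused.

    Assumption: confidence is the probability of the most likely action,
    and the agent asks a question whenever it falls below `threshold`.
    """
    if not action_probs:
        raise ValueError("action_probs must not be empty")
    return max(action_probs) < threshold


# Two doors look almost equally plausible, so the agent asks.
print(should_ask_for_help([0.48, 0.47, 0.05]))  # True

# One path clearly dominates, so the agent proceeds on its own.
print(should_ask_for_help([0.90, 0.07, 0.03]))  # False
```

A higher threshold would make the agent ask more often, trading user effort for fewer wrong turns.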
“Consider the situation in which you want a robot assistant to get your wallet on the bed … with two doors in the scene and an instruction that only tells it to walk through the doorway,” wrote the team in a preprint paper describing their work. “In this situation, it is clearly difficult for the robot to know exactly through which door to enter. If, however, the robot is able to discuss the situation with the user, the situational ambiguity can be resolved.”
The team’s framework employs two agent models: Model Confusion, which mimics human user behavior under confusion, and Action Space Augmentation, a more sophisticated algorithm that automatically learns to ask only necessary questions at the right time during navigation. Human interaction data is used to fine-tune the second model further so that it becomes familiar with the environment.
Whenever the agent (in this study, a robot navigating a simulated home) becomes lost during navigation, it sends the signal “I am lost, please help me!” to a user and asks for help. As the user answers the robot’s subsequent questions, the Action Space Augmentation model corrects trajectories that were originally wrong and uses the feedback to prevent future mistakes of the same kind.
The researchers compiled a data set containing 21,567 navigation instructions (14,025 of which were used for training, 1,020 for validation in seen environments, and 2,349 for validation in unseen environments), with an instruction vocabulary of around 3,100 words. They evaluated the robot on both its success rate and the number of steps taken, where an episode counted as a “success” if its navigation error was less than 3 meters.
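The two evaluation measures can be sketched in a few lines of Python. This is an illustrative example only: the `Episode` record, the `evaluate` function, and the sample numbers are made up for the sketch, and only the 3-meter success criterion comes from the study as described.

```python
from dataclasses import dataclass
from typing import List, Tuple

SUCCESS_RADIUS_M = 3.0  # success means navigation error below 3 meters


@dataclass
class Episode:
    navigation_error_m: float  # distance from the final position to the goal
    steps: int                 # number of steps the agent took


def evaluate(episodes: List[Episode]) -> Tuple[float, float]:
    """Return (success rate, mean number of steps) over all episodes."""
    if not episodes:
        raise ValueError("episodes must not be empty")
    successes = sum(1 for e in episodes if e.navigation_error_m < SUCCESS_RADIUS_M)
    mean_steps = sum(e.steps for e in episodes) / len(episodes)
    return successes / len(episodes), mean_steps


# Hypothetical episodes; an error of exactly 3.0 m does not count as a success.
sample = [Episode(1.2, 10), Episode(4.5, 20), Episode(2.9, 12), Episode(3.0, 18)]
success_rate, mean_steps = evaluate(sample)
print(success_rate, mean_steps)  # 0.5 15.0
```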
The team reports that the robot managed to adjust dynamically to unclear and erroneous human responses and that their proposed strategy was “substantially” more data-efficient than previously proposed pre-exploration techniques that involve robots exploring environments on their own. “We are among the first to introduce human-agent interaction in the instruction-based navigation task,” they wrote. “[This] data augmentation method … is useful in a continual learning scenario [because the] agent can improve its performance continually in customers’ home[s].”
The work might inform Amazon’s long-rumored home robot, which Bloomberg described in a report last year as akin to the Echo Show — albeit with wheels. Code-named Vesta after the Roman goddess of the hearth, it’s said to pack far-field microphones and speakers that enable it to understand and respond to the thousands of commands Alexa recognizes and to be able to navigate through homes using computer vision and techniques like simultaneous localization and mapping.