Autonomous robot that interacts with humans using natural language and vision processing

Researchers Jared Johansen (left) and Thomas Ilyevsky (right) assess the autonomous robot, Hosh, reviewing its operating systems in the environment. The robot will autonomously locate a room, building or individual through its integrative vision and language software. Credit: Hope Sale / Purdue Research Foundation image

Purdue University researchers in the School of Electrical and Computer Engineering are developing integrative language and vision software that may enable an autonomous robot to interact with people in different environments and accomplish navigational goals.

"The project's overall goal is to tell the robot to find a particular person, room or building and have the robot interact with ordinary, untrained people to ask in natural language for directions toward a particular place," said Jeffrey Mark Siskind, an associate professor leading the research team. "To accomplish this task, the robot must operate safely in people's presence, encourage them to provide directions and use their information to find the goal."

Doctoral candidates Thomas Ilyevsky and Jared Johansen are working with Siskind to develop a robot named Hosh that can integrate visual and language data into its navigation process in order to locate a specific place or person. The team is developing the robot through a grant funded by the National Science Foundation's National Robotics Initiative.

This robot could help self-driving cars communicate with passengers and pedestrians, or it could complete small-scale tasks in an office setting, such as delivering mail. The robot would contribute to the expected $14 billion growth of the consumer robotics industry by 2025, as projected by the Boston Consulting Group.

The robot will receive a task to locate a specific room, building or individual in a known or unknown location. It will then unite novel language and visual processing to navigate the environment, ask for directions, request that doors be opened or elevator buttons be pushed, and reach its goal.

The researchers are developing high-level software to give the robot "common sense knowledge," the ability to understand objects and environments with human-level intuition, enabling it to recognize navigational conventions. For example, the robot will incorporate both spoken statements and physical gestures into its navigation process.

The autonomous robot, named Hosh, will navigate environments and interact with people. Shown in the top photo is the robot’s computer display including a map, camera view and additional operating software. The bottom shows researchers Jeffrey Mark Siskind (left), Thomas Ilyevsky (center) and Jared Johansen (right) through the robot’s computer vision. Credit: Hope Sale / Purdue Research Foundation image
"The robot needs human level intuition in order to understand navigational conventions," Ilyevsky said. "This is where common sense knowledge comes in. The robot should know that odd and even numbered rooms sit across from each other in a hallway or that Room 317 should be on the building's third floor."

To develop the robot's common sense knowledge, the researchers will create integrative natural language processing and computer vision software. Typically, natural language processing enables a robot to communicate with people, while computer vision software enables it to navigate its environment. Here, however, the researchers are advancing the two components so that they inform each other as the robot moves.
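As a loose illustration of that coupling (the placeholder functions and rules below are assumptions, not the project's architecture), a toy control loop might pass the visual scene into the dialogue step and feed the dialogue back into planning:

```python
# A toy loop showing vision and language informing each other; purely
# illustrative placeholders, not the researchers' software.

def perceive() -> dict:
    """Stand-in for the vision module: report what the robot currently sees."""
    return {"visible_rooms": ["317", "318"], "people_nearby": 1}

def converse(scene: dict) -> dict:
    """Stand-in for the language module: ask for directions only if someone is visible."""
    if scene["people_nearby"] > 0:
        return {"instruction": "Room 300 is one floor down"}
    return {}

def plan_step(scene: dict, dialogue: dict) -> str:
    """Choose the next action using both the visual scene and the latest dialogue."""
    if "one floor down" in dialogue.get("instruction", ""):
        return "find_elevator"
    return "keep_exploring"

scene = perceive()
dialogue = converse(scene)         # language interpreted in a visual context
print(plan_step(scene, dialogue))  # -> "find_elevator"
```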

"The robot needs to understand language in a visual context and vision in a language context," Siskind said. "For example, while locating a specific person, the robot might receive information in a comment or physical gesture and must understand both within the context of its navigational goals."

For instance, if the response is "Check for that person in Room 300," the robot will need to process the statement in a visual context, identify which room it is currently in and determine the best route to Room 300. If the response is "That person is over there," accompanied by a physical cue, the robot will need to integrate the visual cue with the statement's meaning in order to identify the person being pointed out.
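A hypothetical sketch of that grounding step (the parsing rules and data structure below are illustrative, not the researchers' actual models) might map an utterance and an optional pointing gesture to a single navigation goal:

```python
# Illustrative only: fuse a spoken direction with an optional pointing gesture
# into one navigation goal.
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class NavigationGoal:
    room: Optional[str] = None       # destination named in language, e.g. "300"
    heading: Optional[float] = None  # direction (radians) taken from a pointing gesture

def interpret(utterance: str, pointing_direction: Optional[float] = None) -> NavigationGoal:
    """Turn an utterance, plus any detected pointing gesture, into a goal."""
    match = re.search(r"\broom\s+(\d+)\b", utterance, flags=re.IGNORECASE)
    if match:
        return NavigationGoal(room=match.group(1))
    if "over there" in utterance.lower() and pointing_direction is not None:
        return NavigationGoal(heading=pointing_direction)
    return NavigationGoal()

print(interpret("Check for that person in Room 300"))  # NavigationGoal(room='300', heading=None)
print(interpret("That person is over there", 1.57))    # NavigationGoal(room=None, heading=1.57)
```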

"Interacting with humans is an unsolved problem in artificial intelligence," Johansen said. "For this project, we are trying to help the robot to understand certain conventions it might run into or to anticipate that a dozen different responses could all have the same meaning."

"We expect this technology to be really big, because the industry of autonomous robots and self-driving cars is becoming very big," Siskind said. "The technology could be adapted into self-driving cars, allowing the cars to ask for directions or passengers to request a specific destination, just like human drivers do."

The researchers expect to send the robot on autonomous missions with growing complexity as the technology progresses. First, the robot will learn to navigate indoors on a single floor. Then, to move to other floors and buildings, it will ask people to operate the elevator or open doors for it. The researchers hope to progress to outdoor missions in the spring.
