Digital simulations help robots learn real-world tasks
A novel approach to training robots using 3D scans of real environments paves the way for robust and accessible home robotics.
How does a robot learn to load a dishwasher?
One way is through trial and error, testing different behaviors until something works. This is the concept behind reinforcement learning, where robots interact with their environment and receive feedback in the form of rewards or penalties. The robot takes an action, observes the result, and adjusts its future actions based on whether the outcome was favorable or not. Running this in real life, however, would require a lot of training time and probably leave broken plates on the floor.
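To make that loop concrete, here is a minimal, self-contained sketch of trial-and-error learning, using tabular Q-learning on a toy task: steering a gripper along a rail to the slot where a plate belongs. The states, actions, and rewards are invented for illustration; real robot learning operates over far richer observations and action spaces.

```python
# A toy sketch of the reinforcement learning loop: act, observe, and adjust
# based on reward. The environment is hypothetical, not from the study.
import random

N_STATES = 10          # discrete gripper positions along a rail
TARGET = 7             # slot where the "plate" belongs
ACTIONS = [-1, +1]     # move left or move right

# Value estimates for every (state, action) pair, all starting at zero.
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    state = random.randrange(N_STATES)
    for _ in range(50):
        # Explore occasionally; otherwise exploit the best known action.
        if random.random() < 0.1:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        # Sparse feedback: a reward at the target, a small penalty elsewhere.
        reward = 1.0 if next_state == TARGET else -0.01
        # Nudge the estimate toward the observed reward plus future value.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += 0.1 * (reward + 0.9 * best_next - q[(state, action)])
        state = next_state
        if state == TARGET:
            break

print("Learned action at each position:",
      [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)])
```

After enough episodes, the table encodes a policy that moves toward the target from any starting position, without anyone ever telling the robot which direction is correct.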
Recently, a study submitted in March by the MIT Improbable AI Lab proposed a “real-to-sim-to-real” pipeline called RialTo: it scans the robot’s real-world environment using just a phone, rapidly builds a digital model with accurate geometry and kinematics, and conducts reinforcement learning in that simulation before transferring the results back into the physical world.
“There's this very famous bridge called Rialto. It's one of the most famous routes in Venice,” says Marcel Torné Villasevil, the study’s first author, now a PhD student at Stanford. “Real-to-sim-to-real is like that bridge, going from real to real again, but through the simulation.”
Many robots rely on imitation learning: a human manually controls the robot through a task, and the robot tries to copy those motions. This requires many demonstrations, though, and the resulting behavior is not robust to environmental disturbances. If a dish is slightly out of place, for example, an imitation-trained robot might grasp at empty air.
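For illustration, here is a minimal sketch of the copying behavior that imitation learning produces, implemented as a nearest-neighbor lookup over recorded demonstrations. The 2D states, actions, and demo data are all hypothetical; real systems typically train neural network policies on high-dimensional observations.

```python
# A toy behavior-cloning policy: replay the action a human demonstrator
# took in the most similar recorded state. Data here is invented.
import math

# Hypothetical demonstrations: (observed dish position) -> (grasp point).
demos = [((0.50, 0.20), (0.50, 0.18)),
         ((0.52, 0.21), (0.52, 0.19)),
         ((0.48, 0.19), (0.48, 0.17))]

def cloned_policy(state):
    # Find the demo state closest to the current observation and
    # copy the matching action.
    nearest_state, nearest_action = min(demos, key=lambda d: math.dist(d[0], state))
    return nearest_action

# Works well near the demonstrated states...
print(cloned_policy((0.51, 0.20)))
# ...but a dish far out of place still gets a grasp copied from the demos,
# which may close on empty air.
print(cloned_policy((0.90, 0.60)))
```

The second call shows the brittleness the article describes: the policy has no notion of success or failure, only of similarity to what it has seen.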
The RialTo pipeline starts with human demonstrations so the robot can understand its task. Instead of hundreds of demonstrations, it needs only fifteen. Then the simulation kicks in.
First, existing 3D reconstruction technology scans the real environment. Then the scene is edited, adding detailed meshes, joints, and objects with estimated physical properties, like a mug with mass or a cabinet drawer with friction. The virtual robot learns to interact with this reconstructed geometry, generating thousands of valid example motion paths that replace the demonstrations used in traditional imitation learning. These examples are then used to train the real robot in the physical environment.
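As a rough sketch of what such an edited scene description might contain, the snippet below pairs reconstructed meshes with estimated masses and articulated joints. Every field name and value is an illustrative assumption, not RialTo’s actual format.

```python
# A hypothetical scene description: geometry from the phone scan, plus
# hand-added joints and estimated physical properties for the simulator.
from dataclasses import dataclass, field

@dataclass
class Joint:
    kind: str            # e.g. "revolute" for a hinge, "prismatic" for a drawer
    friction: float      # estimated, since a phone scan cannot measure it

@dataclass
class SceneObject:
    name: str
    mesh_file: str       # geometry produced by the 3D reconstruction
    mass_kg: float       # estimated physical property
    joints: list = field(default_factory=list)

scene = [
    SceneObject("mug", "mug.obj", mass_kg=0.3),
    SceneObject("cabinet_drawer", "drawer.obj", mass_kg=2.0,
                joints=[Joint("prismatic", friction=0.4)]),
]

for obj in scene:
    print(obj.name, obj.mass_kg, [j.kind for j in obj.joints])
```

Because properties like mass and friction are only estimates, the simulated policy must learn behaviors that tolerate some mismatch with the real world.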
“So just providing a few demos and sparse rewards, you can already train robust policies, or strategies for completing tasks,” says Torné. With this approach, robots can adapt to new situations without extensive human effort.
It’s not a perfect system. Sometimes the robot exploits the simulation: for example, it jams itself into the bottom of a microwave to take advantage of a slightly misaligned hinge rather than opening the microwave door properly. But by using simulations alongside the initial real-world demonstrations, the robot can learn to correct this behavior. The full RialTo pipeline yields a 67% improvement in policy robustness over imitation learning.
Martin Huynh, an MIT graduate student studying autonomous robotics and path optimization who was not involved in the study, says that “this learning pipeline shows significant promise for teaching robots a variety of policies with reduced human intervention.”
Huynh notes that there are still human bottlenecks such as the effort required to manually define interactive objects, but says that RialTo represents an important step in turning reality into simulation. “We may be closer than we think to having robot assistants in our homes — a prospect that is both exciting and a little unsettling, to be honest.”