Policy Learning with Pool Shots and Chopsticks

Simple trial and error is one of the most common ways that we learn how to perform a task. While we can learn a great deal by watching and imitating others, it is often through our own repeated experimentation that we learn how -- or how not -- to perform a given task. Peter Pastor, a PhD student from the Computational Learning and Motor Control Lab at the University of Southern California (USC), has spent his last two internships here helping the PR2 learn new tasks by imitating humans, and then improving those skills through trial and error.

Last summer, Peter's work focused on imitation learning. The PR2 robot learned to grasp, pour, and place beverage containers by watching a person perform the task a single time. While it could perform certain tasks well with this technique, many tasks require information that cannot necessarily be seen. For example, when opening a door, the robot cannot tell from observation alone how hard to push or pull. This summer, Peter extended his work to use reinforcement learning algorithms that enable the PR2 to improve its skills over time.

Peter focused on teaching the PR2 two tasks with this technique: making a pool shot and flipping a box upright using chopsticks. For both tasks, the PR2 robot first learned the movement from demonstrations by a person. With the pool shot, the robot learned a more accurate and powerful shot after 20 minutes of experimentation. With the chopstick task, the robot improved its success rate from 3% to 86%. To illustrate the difficulty of the task, Peter conducted an informal user study in which 10 participants each performed 20 attempts at flipping the box by guiding the robot's gripper. Their success rate was only 15%.

Peter used Dynamic Movement Primitives (DMPs) to compactly encode movement plans. The parameters of a DMP can be learned efficiently from a single demonstration by guiding the robot's arms. These parameters then serve as the initialization for the reinforcement learning algorithm, which updates them until the system has minimized a task-specific cost function and satisfied the performance goals. This state-of-the-art reinforcement learning algorithm is called Policy Improvement with Path Integrals (PI^2). It can handle high-dimensional reinforcement learning problems and can deal with almost arbitrary cost functions.
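To give a feel for the representation, here is a minimal sketch of a one-dimensional discrete DMP in the common Ijspeert-style formulation: a critically damped spring-damper system pulled toward a goal, shaped by a learnable forcing term whose weights are exactly the parameters a PI^2-style algorithm would perturb and update. The class name, gains, and basis-function layout below are illustrative assumptions, not the actual implementation in the policy_learning stack.

```python
import numpy as np

class DMP1D:
    """Sketch of a discrete Dynamic Movement Primitive for a single joint."""

    def __init__(self, n_basis=10, K=100.0, tau=1.0):
        self.K = K
        self.D = 2.0 * np.sqrt(K)        # critically damped spring-damper
        self.tau = tau                   # temporal scaling of the movement
        self.alpha = 4.0                 # decay rate of the canonical system
        # Gaussian basis centers spaced along the phase variable x in (0, 1]
        self.c = np.exp(-self.alpha * np.linspace(0, 1, n_basis))
        self.h = 1.0 / np.diff(self.c, append=self.c[-1] * 0.5) ** 2
        self.w = np.zeros(n_basis)       # policy parameters to be learned

    def forcing(self, x):
        """Learnable forcing term: weighted sum of basis functions, gated by x."""
        psi = np.exp(-self.h * (x - self.c) ** 2)
        return x * (psi @ self.w) / (psi.sum() + 1e-10)

    def rollout(self, y0, g, dt=0.001, T=1.0):
        """Integrate the DMP from start y0 to goal g; returns the trajectory."""
        y, v, x = y0, 0.0, 1.0
        traj = [y]
        for _ in range(int(T / dt)):
            f = self.forcing(x)
            v += dt / self.tau * (self.K * (g - y) - self.D * v + (g - y0) * f)
            y += dt / self.tau * v
            x += dt / self.tau * (-self.alpha * x)  # phase decays from 1 to 0
            traj.append(y)
        return np.array(traj)
```

With all weights at zero, a rollout simply converges smoothly to the goal; fitting `w` to a demonstrated trajectory reproduces its shape, and because the behavior is linear in `w`, trial-and-error methods like PI^2 can explore by adding noise to those weights and averaging the low-cost perturbations.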

Programming robots by hand is challenging, even for experts. Peter's work aims to make it easy to encode new motor skills in a robot by guiding its arms, and then to let the robot improve those skills over time through trial and error. For more information about Peter's research with DMPs, see "Learning and Generalization of Motor Skills by Learning from Demonstration" from ICRA 2009. The work that Peter did this summer has been submitted to ICRA 2011 and is currently under review. Check out his presentation slides below (download PDF). The open source code was written in collaboration with Mrinal Kalakrishnan and is available in the ROS policy_learning stack.