Detecting Object Poses

Stefan Holzer, a PhD student at TUM, spent the past two and a half months enabling the PR2 to detect objects based on visual information. The goal was not only to detect objects within a scene, but also to recover useful information about the pose (position and orientation) of each detected object in order to simplify tasks like grasping.

Stefan's approach has the PR2 remember different views of an object during a learning phase and then search for appearances of these views during the detection phase.  The different views of an object can be learned either offline, using a pan-tilt unit that rotates the object so it is visible from all necessary angles, or online, while the object is visible to the robot.  The advantage of offline learning is that the environment can be easily controlled, which makes it easier to select only the useful information in the scene.  By segmenting and storing the 3D point cloud of the object for each training view, each detection can be associated with a specific pose.
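The idea of mapping a matched view back to a pose can be sketched as a simple lookup: each training view stores a segmented point cloud together with the pose it was captured from. This is only an illustrative sketch, not Stefan's actual code; the view names, pose format (a hypothetical x, y, z, roll, pitch, yaw tuple), and point values are made up.

```python
# Illustrative sketch (not the actual PR2 code): store a segmented
# 3D point cloud and a pose per training view, so a template match
# during detection can be mapped back to an object pose.

class ViewDatabase:
    def __init__(self):
        self.views = {}  # view_id -> (pose, point_cloud)

    def add_view(self, view_id, pose, point_cloud):
        """pose: hypothetical (x, y, z, roll, pitch, yaw) tuple;
        point_cloud: list of (x, y, z) points segmented from the scene."""
        self.views[view_id] = (pose, point_cloud)

    def pose_for_detection(self, view_id):
        """Return the stored pose for a matched template view."""
        pose, _ = self.views[view_id]
        return pose

db = ViewDatabase()
db.add_view("mug_front", (0.5, 0.0, 0.8, 0.0, 0.0, 1.57),
            [(0.50, 0.01, 0.80), (0.51, 0.02, 0.81)])
print(db.pose_for_detection("mug_front"))  # -> (0.5, 0.0, 0.8, 0.0, 0.0, 1.57)
```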

Stefan used Dominant Orientation Templates (DOT), which allow for an efficient and fast template search.  The high speed of the template search is achieved by discretizing and down-sampling the image data in an intelligent way and by exploiting modern SSE processor instructions.
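To give a flavor of how a DOT-style discretization works, the sketch below quantizes gradient orientations into a small number of bins, represents each down-sampled image region as a bitmask of its dominant orientations, and scores a template by counting regions that share at least one orientation bit. This is a simplified illustration under assumed parameters (7 orientation bins, a made-up magnitude threshold), not the actual DOT implementation; the real method vectorizes the bitwise comparisons with SSE.

```python
import math

N_BINS = 7  # assumed number of orientation bins for this sketch

def quantize_orientation(gx, gy, min_mag=1.0):
    """Map a gradient vector to a one-hot bitmask over N_BINS
    orientation bins (direction sign is ignored), or 0 if the
    gradient magnitude is below a threshold."""
    if math.hypot(gx, gy) < min_mag:
        return 0
    angle = math.atan2(gy, gx) % math.pi  # orientation, not direction
    bin_idx = min(int(angle / math.pi * N_BINS), N_BINS - 1)
    return 1 << bin_idx

def region_mask(gradients):
    """OR together the orientations found in a down-sampled region,
    keeping only which orientations dominate, not where."""
    mask = 0
    for gx, gy in gradients:
        mask |= quantize_orientation(gx, gy)
    return mask

def similarity(template_masks, image_masks):
    """Count regions where template and image share an orientation
    bit -- a single bitwise AND per region, which SSE can batch."""
    return sum(1 for t, i in zip(template_masks, image_masks) if t & i)

# Tiny example: the first two regions match in orientation, the third does not.
template = [region_mask([(1.0, 0.0)]), region_mask([(0.0, 1.0)]), region_mask([(1.0, 1.0)])]
image    = [region_mask([(1.0, 0.1)]), region_mask([(0.1, 1.0)]), region_mask([(1.0, -1.0)])]
print(similarity(template, image))  # -> 2
```

Because each region is reduced to a small bitmask, comparing a template against an image patch becomes a handful of bitwise operations rather than per-pixel arithmetic, which is where the speed comes from.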

To learn more about Stefan's work, click here.  You can also check out Stefan's presentation slides below or download the slides as PDF.