Unsupervised Object Discovery With Megaworldmodel

During his summer internship at Willow Garage, Julian "Mac" Mason, a Ph.D. candidate from Duke University, worked on the Megaworldmodel: a framework for large-scale, long-term semantic maps.  In contrast to occupancy maps (which model free and occupied space), semantic maps model the locations of objects and (when possible) their identities.  Determining these identities is difficult: object recognition remains an open problem.  For this reason, the Megaworldmodel provides a generic interface to object recognition systems, allowing existing tools to be easily integrated.  Two such tools have already been included: Willow Garage's textured_object_detection, and Hilton Bristow's implementation of Deva Ramanan's deformable-parts model.
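
The interface itself isn't shown in this post, but a minimal sketch of the idea might look like the following, assuming a simple plug-in design; every name here (ObjectRecognizer, SemanticMap, detect, and so on) is hypothetical rather than the Megaworldmodel's actual API.

    from abc import ABC, abstractmethod

    class ObjectRecognizer(ABC):
        """Hypothetical plug-in interface: any tool that can propose object
        detections from sensor data can back the semantic map."""

        @abstractmethod
        def detect(self, rgb_image, point_cloud):
            """Return a list of (label, confidence, pose) detections."""

    class SemanticMap:
        """Sketch of a map that stores object identities and locations."""

        def __init__(self, recognizers):
            # e.g. a textured-object detector and a deformable-parts detector
            self.recognizers = recognizers
            self.objects = []  # accumulated (label, confidence, pose) entries

        def update(self, rgb_image, point_cloud):
            # Every registered recognizer contributes detections to the map.
            for recognizer in self.recognizers:
                self.objects.extend(recognizer.detect(rgb_image, point_cloud))

        def query(self, label):
            # Where has an object with this identity been seen?
            return [pose for (l, conf, pose) in self.objects if l == label]

Under this kind of design, the map never needs to know how a particular recognizer works, only that it turns sensor data into detections, which is what makes it cheap to integrate new tools such as the two listed above.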

The Megaworldmodel cleanly encapsulates the capture, processing, and mapping of recognizable objects, and the querying of the resulting map.  However, not all objects are recognizable! State-of-the-art object recognition algorithms require extensive supervised training to recognize objects accurately.  In large, general environments, manual training is intractable: there are simply too many objects.  To enable large-scale semantic mapping, the Megaworldmodel includes tools for active object search (using a Kinect-equipped PR2) and for unsupervised object discovery.  While autonomously exploring an environment, the robot will encounter objects (which it cannot yet recognize) from many different viewpoints.  Using unsupervised segmentation, these objects can be detected and then clustered into training examples for existing object recognition techniques.  Although this does not provide semantic labels (you get "object 6," not "coffee cup"), it does allow object instances to be recognized in other locations and at other times.  Ongoing work seeks to scale this technique to extremely large datasets, permitting the entirely unsupervised creation of a large object database.
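
The post doesn't specify the segmentation features or the clustering algorithm, but the overall pipeline (segment each view into candidate objects, describe each segment with a feature vector, then group segments of the same physical object across views) can be sketched as follows; DBSCAN here is purely a stand-in for whatever clustering the real system uses, and discover_objects and its parameters are hypothetical.

    import numpy as np
    from sklearn.cluster import DBSCAN

    def discover_objects(segment_features, eps=0.5, min_views=3):
        """Group segments observed across many viewpoints into object instances.

        segment_features: one feature vector per segmented candidate object
        (the actual features used are not specified in the post).  Returns a
        dict mapping labels like "object 6" to the indices of the segments
        that can serve as that instance's training examples.
        """
        features = np.asarray(segment_features)
        # Density-based clustering: views of the same physical object should
        # have similar descriptors and land in the same cluster.
        labels = DBSCAN(eps=eps, min_samples=min_views).fit_predict(features)

        instances = {}
        for idx, label in enumerate(labels):
            if label == -1:  # noise: seen too rarely to trust as an object
                continue
            instances.setdefault(f"object {label}", []).append(idx)
        return instances

Each cluster's member views then become the training examples for an existing recognizer, so that "object 6" can be re-identified in other places and at other times even without a semantic name.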

More information about the Megaworldmodel is available here.