Depth-Encoded Hough Voting for Joint Object Detection and Shape Recovery

Title: Depth-Encoded Hough Voting for Joint Object Detection and Shape Recovery
Publication Type: Conference Paper
Year of Publication: 2010
Authors: Sun, Min; Xu, Bing-Xin; Bradski, Gary; and Savarese, Silvio
Conference Name: ECCV
Date Published: 09/2010
Conference Location: Crete, Greece
Keywords: 3D shape, AI, computer vision, perception
Abstract: Detecting objects, estimating their pose, and recovering 3D shape information are critical problems in many vision and robotics applications. This paper addresses these needs by proposing a new method called DEHV, a Depth-Encoded Hough Voting detection scheme. Inspired by the Hough voting scheme introduced in [13], DEHV incorporates depth information into the process of learning distributions of image features (patches) representing an object category. DEHV takes advantage of the interplay between the scale of each object patch in the image and the distance (depth) of the corresponding physical patch on the 3D object. In training, we use various views of an object, each consisting of a 2D image and its associated depth map (which we assume is available during learning). In testing, DEHV jointly detects objects, infers their categories, estimates their pose, and infers (decodes) their depth maps from either a single image (when no depth maps are available at test time) or a single image augmented with a depth map (when one is available at test time). Extensive quantitative and qualitative experimental analysis on existing datasets [6,9,22] and a newly proposed 3D table-top object category dataset shows that DEHV obtains competitive detection and pose estimation results on all the datasets. Most importantly, we demonstrate (with quantitative and qualitative evaluation) that DEHV is capable of reconstructing the 3D shape of an object from a single uncalibrated image. Finally, we demonstrate that our technique can be successfully employed as a key building block in two application scenarios: highly accurate 6-degree-of-freedom (6 DOF) pose estimation and 3D object modeling.
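The scale/depth interplay described above can be illustrated with a minimal sketch. It assumes a pinhole camera with a fixed focal length, under which the image scale of a physical patch is inversely proportional to its depth. A test patch matched to a training patch of known scale s_train and depth d_train therefore yields a depth estimate d_test = d_train * s_train / s_test, and its vote for the object center is rescaled by s_test / s_train. The `CodebookEntry` record, `decode_depth`, `hough_vote`, and the grid cell size are hypothetical names chosen for illustration. This is a simplified Hough voting toy, not the authors' implementation: it omits learning the codebook distributions, category and pose inference, and the use of test-time depth maps.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CodebookEntry:
    # Hypothetical training record (not from the paper): offset from the patch
    # to the object center in pixels, the patch scale in the training image,
    # and the depth of the corresponding physical patch (assumed meters).
    offset: tuple[float, float]
    train_scale: float
    train_depth: float


def decode_depth(entry: CodebookEntry, test_scale: float) -> float:
    """Pinhole assumption: image scale is inversely proportional to depth,
    so d_test = d_train * s_train / s_test for the same focal length."""
    return entry.train_depth * entry.train_scale / test_scale


def hough_vote(matches, cell: int = 10):
    """matches: list of ((x, y), test_scale, CodebookEntry).

    Each match votes for an object center, with its training offset rescaled
    by the ratio of test to training scale. Votes are binned on a coarse grid.
    Returns the grid cell with the most votes and, for each supporting patch,
    its image location and decoded depth (a sparse depth map)."""
    votes = defaultdict(list)
    for (x, y), s, entry in matches:
        ratio = s / entry.train_scale
        cx = x + entry.offset[0] * ratio
        cy = y + entry.offset[1] * ratio
        key = (int(cx // cell), int(cy // cell))
        votes[key].append(((x, y), decode_depth(entry, s)))
    best = max(votes, key=lambda k: len(votes[k]))
    return best, votes[best]


# Usage: the object appears at half its training scale (roughly twice as far).
left = CodebookEntry(offset=(20.0, 0.0), train_scale=16.0, train_depth=1.0)
right = CodebookEntry(offset=(-20.0, 0.0), train_scale=16.0, train_depth=1.2)
matches = [
    ((90.0, 50.0), 8.0, left),    # votes for center (100, 50)
    ((110.0, 50.0), 8.0, right),  # votes for center (100, 50)
    ((10.0, 10.0), 8.0, left),    # spurious match, votes for (20, 10)
]
best, support = hough_vote(matches)
print(best, [round(d, 3) for _, d in support])
```

Both consistent matches land in the same grid cell, and their decoded depths (2.0 and 2.4) are twice the training depths, as the halved image scale implies under the pinhole assumption.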