Person-following and Detection in an Indoor Environment

Ethan Dreyfuss, who recently received a master's degree from Stanford University, is continuing his work here on autonomous person-following and dataset collection and annotation.  The former project provides a useful building block for a wide variety of tasks. Consider a robot that helps you carry groceries.  This robot is vastly more useful if it can carry your bags to the house without requiring teleoperation; the robot can simply track you and follow behind.  At a high level, person-following comprises two principal tasks: person tracking and navigation.

The approach developed by Ethan and Caroline Pantofaru fuses a face detector with two weak person trackers: one for legs, and one for 3D blobs at person-height. None of these approaches is individually effective enough to provide robust tracking, but their strengths are complementary. The face detector is effective when the person is close to, and directly facing, the robot. While the leg tracker provides high accuracy when multiple people are present, it is often confused by non-human obstacles and therefore cannot work reliably from afar. Conversely, the height-based blob tracker can effectively track from farther away, yet it is easily confused by groups of people. By combining these techniques, Ethan and Caroline were able to develop a more robust person-tracking tool.
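To make the idea of complementary strengths concrete, here is a minimal sketch of one way such a fusion could work: a confidence-weighted average of position estimates, where each source is trusted more in the conditions it handles well. This is not the method Ethan and Caroline implemented; the `Estimate` type, the `range_weight` and `fuse` functions, and every threshold and weight below are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional, Sequence, Tuple


@dataclass
class Estimate:
    source: str        # "face", "legs", or "blob" (hypothetical labels)
    x: float           # person position in the robot frame, metres
    y: float
    confidence: float  # the detector's own score in [0, 1]


def range_weight(source: str, distance: float, crowded: bool) -> float:
    """Illustrative reliability weights; the numbers are assumptions."""
    if source == "face":
        # Faces are only trusted at close range.
        return 1.0 if distance < 2.0 else 0.0
    if source == "legs":
        # Leg tracking is down-weighted from afar.
        return 1.0 if distance < 3.0 else 0.2
    if source == "blob":
        # Blob tracking is down-weighted when a group of people is present.
        return 0.2 if crowded else 1.0
    return 0.0


def fuse(estimates: Sequence[Estimate],
         crowded: bool = False) -> Optional[Tuple[float, float]]:
    """Return the weighted mean position, or None if nothing is trusted."""
    total = wx = wy = 0.0
    for e in estimates:
        w = e.confidence * range_weight(e.source, hypot(e.x, e.y), crowded)
        total += w
        wx += w * e.x
        wy += w * e.y
    if total == 0.0:
        return None
    return wx / total, wy / total
```

In this sketch, a distant blob estimate dominates a distant leg estimate when the scene is uncrowded, while setting `crowded=True` shifts trust back toward the leg tracker. A real tracker would also need data association and temporal filtering, which are omitted here.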

Once the robot can track a designated person, the tracking information is passed on to the navigation stack. This same navigation software was used to complete Milestone 2, with some improvements made to help it deal more quickly and robustly with dynamically moving obstacles such as people.

In addition to the person-following project, Ethan is contributing to the collection and labeling of a large dataset of people in an indoor office environment. One of the major drivers of computer vision research is the availability of high-quality labeled data. The bulk of existing person datasets exclude indoor environments and instead focus on outdoor pedestrians. Indoor environments present numerous challenges for person detection, including poor lighting and environmental clutter. By automating as much as possible of the process of both collecting (using the robot) and labeling (using Amazon's Mechanical Turk and Alex Sorokin's CV Web Annotation Toolkit), Ethan's team will be able to provide a large, compelling dataset to encourage other researchers to tackle these challenging problems.

Ethan also picked up a number of side projects, including rapid neighborhood computation on point clouds and a package that uses the open-source video codec Theora to allow low-bandwidth video streaming within ROS.