Generic object and action detection with LARK

This summer, Hae Jong Seo, a PhD student from the Multidimensional Signal Processing Research Group at UC Santa Cruz, worked with us on object and action recognition using low-cost web cameras. In order for personal robots to interact with people, it is useful for robots to know where to look, locate and identify objects, and locate and identify human actions. To address these challenges, Hae Jong's implemented a fast and robust object and action detection system using features called locally adaptive regression kernels (LARK).

LARK features have many applications, such as saliency detection. Saliency detection determines which parts of an image are more significant, such as containing objects or people. You can then focus your object detection on the salient regions of the image in order to detect more quickly. Saliency detection can be extended to "space-time" for use with video streams.

LARK features can also be used for generic object and action detection. As you can see in the video, objects such as door knobs, the PR2 robot, and human faces and be detected using LARK. Space-time LARK can also detect human actions, such as waving, sitting down, and getting closer to the camera.

For more information, see the larks package on ROS.org or see Hae Jong's slides below (download PDF). You can also consult Peyman Milanfar's publications for more information on these techniques.