A Gestaltist approach to contour-based object recognition: Combining bottom-up and top-down cues

Ching L. Teo, Cornelia Fermüller, Yiannis Aloimonos

We present a novel mid-level, contour-based object recognition approach that exploits the "image-torque" operator [1] to recognize generic object shape categories, such as bottles, round objects and elongated objects. To do this, we introduce a robust shape matching descriptor, termed the torque shape-context, that embeds border ownership information via image torque into the shape-context descriptor of [2]. This improves the discriminative power of the descriptor when matching partial contour fragments. In addition, by applying a Fast Fourier Transform (FFT) over its angular components, we are able to handle changes in scale and rotation. We show, using several diverse datasets, that our approach reliably detects multiple object shape categories in complex scenes containing clutter, occlusions, and changes in scale and rotation.

Torque Shape-Context Descriptor

Given an image torque "fixation" point, edges that contribute to the torque are embedded with border ownership information: the side nearer to the fixation point is considered "foreground" while the other side is "background". We then apply a truncated Gaussian to weight the angular bins towards the foreground side. Combining the original shape-context with the weighted angular bins yields the torque shape-context.
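The weighting step can be illustrated with a minimal sketch. It is not the released Matlab code: the function names, the bin layout, the values of `sigma` and `cutoff`, and the choice to combine by multiplying each angular column of the histogram by its weight are all assumptions made for illustration. Here `fg_angle` stands for the direction from an edge point towards the torque fixation point.

```python
import math

def truncated_gaussian_weights(n_bins, fg_angle, sigma=math.pi / 4, cutoff=math.pi / 2):
    """Hypothetical helper: one weight per angular bin, peaked at fg_angle.

    Bins whose centers lie more than `cutoff` radians away from the
    foreground direction get weight 0 (the truncation).
    """
    weights = []
    for k in range(n_bins):
        center = (k + 0.5) * 2 * math.pi / n_bins
        # Circular angular distance between bin center and fg_angle, in [0, pi].
        d = abs((center - fg_angle + math.pi) % (2 * math.pi) - math.pi)
        if d <= cutoff:
            weights.append(math.exp(-d * d / (2 * sigma * sigma)))
        else:
            weights.append(0.0)
    return weights

def weight_histogram(hist, weights):
    """Assumed combination rule: scale every angular column of a
    (radial x angular) shape-context histogram by its weight."""
    return [[count * w for count, w in zip(row, weights)] for row in hist]

# Edge point at (10, 0), fixation point at the origin: foreground lies at angle pi.
fg = math.atan2(0 - 0, 0 - 10)
w = truncated_gaussian_weights(12, fg)
weighted = weight_histogram([[2] * 12, [1] * 12], w)
```

With 12 angular bins, the two bins adjacent to the foreground direction receive the largest weights, while bins on the background side are zeroed out.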

Matching in clutter

When matching contour fragments in clutter, border ownership increases the discriminatory power of the torque shape-context, producing more accurate matches.

Robust to deformations

Soft weights applied on the angular bins also make the descriptor robust against typical deformations and against noise in localizing the torque fixation point.

Rotation and scale estimation via FFT

By applying an FFT over angular bins of the torque shape-context centered at the torque fixation point, we are able to estimate the amount of rotation between the model and target, enabling us to compensate for rotation and scale changes during matching.
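One common way to recover a rotation from angular histograms is circular cross-correlation computed in the Fourier domain, and the sketch below illustrates that idea under stated assumptions. It is not taken from the paper's code: the function names are hypothetical, the angular profile is assumed to be a one-dimensional list whose length is a power of two, and a small recursive radix-2 FFT is included only to keep the example self-contained.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spectrum):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in spectrum])]

def estimate_shift(model, target):
    """Return the bin shift s that best explains target[k] = model[(k - s) % n].

    Uses the cross-correlation theorem: corr = ifft(conj(fft(model)) * fft(target)).
    """
    m_hat, t_hat = fft(model), fft(target)
    corr = ifft([m.conjugate() * t for m, t in zip(m_hat, t_hat)])
    return max(range(len(corr)), key=lambda s: corr[s].real)

# Hypothetical 8-bin angular profile and a copy rotated by 3 bins.
model = [0, 1, 3, 7, 2, 0, 0, 1]
target = [model[(k - 3) % 8] for k in range(8)]
shift = estimate_shift(model, target)
angle_deg = shift * 360 / len(model)
```

The estimated shift is in units of angular bins, so the recoverable rotation is quantized to the bin width (45 degrees in this toy example).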

Example detection results

We adapt the modulated torque-based detection method of [3], using matching scores derived from the torque shape-context. Example detections of "Mug" from the UMD Clutter dataset are shown below. Note that there are two instances of "Mug" in the sequences:

For each detection, we show, from left to right, the matching scores (red means higher), the modulated torque and the final detection as bounding boxes.

Resources

Paper [preprint]: The International Journal of Robotics Research (IJRR), vol 34(4-5), pp. 627–652, 2015. doi:10.1177/0278364914558493.
Download the UMD Hand Manipulation Dataset introduced in the paper.
Matlab code that demonstrates the entire approach with training script to extract model codons.

References

[1] M. Nishigaki, C. Fermüller and D. Dementhon. "The Image Torque Operator: A New Tool for Mid-level Vision". Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 502–509, 2012.

[2] S. Belongie, J. Malik and J. Puzicha. "Shape matching and object recognition using shape contexts". IEEE Trans. on Pattern Analysis and Machine Intelligence, vol 24(4), pp. 509–522, 2002.

[3] C. L. Teo, A. Myers, C. Fermüller and Y. Aloimonos. "Embedding High-Level Information into Low Level Vision: Efficient Object Search in Clutter". Proc. IEEE Conf. on Robotics and Automation (ICRA), pp. 126–132, 2013.

Acknowledgements

The support of the European Union under the Cognitive Systems program (project POETICON++), the National Science Foundation under the Cyberphysical Systems Program and the Qualcomm Innovation Fellowship (Ching L. Teo) is gratefully acknowledged.

Questions? Please contact cteo "at" umd dot edu