The bio-inspired, asynchronous event-based Dynamic Vision Sensor (DVS) records temporal changes in scene luminance at high temporal resolution. Since events are triggered only by significant luminance changes, most events occur at object boundaries. This paper presents an approach for learning the location of contours and their border ownership using Structured Random Forests (SRFs) on event-based features that encode motion, timing, texture, and spatial orientation. The classifier elegantly integrates information over time by reusing previously computed classification results. Finally, contour detection and boundary assignment are demonstrated in a layer segmentation of the scene.
The DVS [Lich_08] asynchronously records address-events corresponding to changes in scene luminance, with a maximum temporal resolution of 15 microseconds and a spatial resolution of 128 x 128 pixels. The sensor fires an event whenever the log intensity at a position (x, y) changes by a fixed amount. An event ev(x, y, t, p) encodes position and time, along with the polarity p of the event: +1 or -1 depending on whether the log intensity increases or decreases by a global threshold.
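For concreteness, here is a minimal Python sketch of this event model; it is illustrative only, and the threshold value and function names are assumptions of the sketch, not the sensor's actual parameters:

from typing import NamedTuple

class Event(NamedTuple):
    x: int      # pixel column, 0..127
    y: int      # pixel row, 0..127
    t: float    # timestamp (microseconds)
    p: int      # polarity: +1 (brighter) or -1 (darker)

THRESHOLD = 0.1  # illustrative global log-intensity threshold (assumption)

def maybe_fire(prev_log_i, log_i, x, y, t):
    """Return an Event if the log-intensity change exceeds the threshold."""
    delta = log_i - prev_log_i
    if abs(delta) >= THRESHOLD:
        return Event(x, y, t, 1 if delta > 0 else -1)
    return None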
Due to the nature of the DVS, visual frame-based features cannot be extracted. Instead, we selected event-based features on the basis of how their evolution in time helps detect occlusion boundaries.
Event-based Orientation. Following [Kan_76], we use the convexity of edges, encoded by the edge orientation, as a cue for border ownership. The intuition is that convex edges often belong to foreground objects. We consider spatio-temporal neighborhoods of 11 x 11 pixels and 20 ms, and 8 orientations (from 0 to pi). A normalized Histogram of Orientations with 8 bins is computed for patches of 15 x 15 pixels.
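A minimal sketch of the 8-bin histogram computation, assuming per-event orientation estimates are already available (how they are obtained is abstracted behind the `orientations` argument):

import numpy as np

def orientation_histogram(orientations, n_bins=8):
    """Normalized histogram over [0, pi) of the edge orientations
    observed in one 15 x 15 patch."""
    hist, _ = np.histogram(np.mod(orientations, np.pi),
                           bins=np.linspace(0.0, np.pi, n_bins + 1))
    total = hist.sum()
    return hist / total if total > 0 else hist.astype(float)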
Event temporal information. Timestamps provide information for tracking contours [Del_08], defining a surface that locally encodes the direction and speed of image motion. Changes of this time-surface also encode information about occlusion boundaries. We collect the following features: the number of events accumulated over different time intervals, the first and last timestamps of the events at every pixel, and the series of timestamps for all events per location.
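A hedged sketch of how these temporal features could be accumulated from an event list; the interval lengths below are illustrative placeholders, not the values used in the paper:

import numpy as np

def temporal_features(events, shape=(128, 128), intervals_us=(5000, 10000, 20000)):
    """Per-pixel event counts over trailing time intervals plus the first
    and last timestamp at every pixel; `events` is a list of (x, y, t, p)."""
    events = list(events)
    t_end = max(t for _, _, t, _ in events)
    first_ts = np.full(shape, np.inf)
    last_ts = np.zeros(shape)
    counts = np.zeros((len(intervals_us),) + shape)
    for x, y, t, _ in events:
        first_ts[y, x] = min(first_ts[y, x], t)
        last_ts[y, x] = max(last_ts[y, x], t)
        for k, dt in enumerate(intervals_us):
            if t >= t_end - dt:
                counts[k, y, x] += 1
    return counts, first_ts, last_ts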
Event-based Motion Estimation. Image motion encodes relative depth information useful for assigning border ownership. The idea is formalized in the so-called motion parallax constraint, used in previous works [Ste_09]: objects that are close to the camera move faster in the image than objects farther away.
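As a worked example of the constraint: under a pinhole model, the image speed induced by camera translation scales as f * T / Z, so the nearer side of a boundary projects faster image motion (all numbers below are illustrative):

# Image speed for a translating camera scales as f * T / Z,
# so the nearer side of a boundary moves faster and is assigned the border.
f, T = 100.0, 0.5            # focal length (pixels) and camera speed (m/s)
Z_fg, Z_bg = 1.0, 4.0        # foreground and background depths (m)
v_fg, v_bg = f * T / Z_fg, f * T / Z_bg
print(v_fg, v_bg)            # 50.0 vs. 12.5 pixels/s: foreground owns the border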
Event-based time texture. The aim here is to separate occlusion edges from texture edges. First, texture edges are surrounded by nearby edges of comparable contrast, making them not very salient [Ren_06]. Second, long smooth boundaries with strong texture gradients are more likely to occur at occlusions. Instead of the intensity texture gradients used on images, we use a map of the timestamps of the last event triggered at every pixel.
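A small sketch of this idea, assuming events arrive sorted by time; the gradient of the last-timestamp map stands in for the intensity texture gradient:

import numpy as np

def time_texture(events, shape=(128, 128)):
    """Map of the timestamp of the last event at every pixel, plus its
    gradient magnitude, used in place of an intensity texture gradient."""
    last_ts = np.zeros(shape)
    for x, y, t, _ in events:           # events sorted by time
        last_ts[y, x] = t
    gy, gx = np.gradient(last_ts)
    return last_ts, np.hypot(gx, gy)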
The SRF (see [Dol_13][Kon_11]) is trained for border ownership assignment using event-based features from random 16 x 16 patches. (A) Given the training data D, we learn an optimal splitting threshold Θi, associated with a binary split function hi, at every split node. (B) The leaves of each tree Tj encode a distribution of the ownership orientation, which we use during inference. Averaging the responses over all K trees produces the final boundary and ownership prediction EO = {EB, EFG, EBG}. (C) We then obtain Eowt by applying a watershed transformation over EB, from which we construct an initial segmentation Sowt.
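A minimal sketch of step (C) using off-the-shelf tools, assuming EB is a boundary-probability map in [0, 1]; the marker threshold is an illustrative choice, not the paper's:

import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

def initial_segmentation(EB, marker_thresh=0.1):
    """Flood the boundary-probability map EB from low-probability basins;
    the resulting regions form the initial segmentation S_owt."""
    markers, _ = ndi.label(EB < marker_thresh)   # connected low-boundary areas
    return watershed(EB, markers)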
(D-E) For the refinement and event-based segmentation, we augment the event-based features with the predictions computed for the previous time interval.
The original non-sequential SRF Rns produces predictions EOn, which are combined with the features for the next time interval (n+1) as input to the sequential SRF Rsq. Finally, for the refined segmentation: 1) an initial segmentation Sowt is estimated from the predictions EO of the SRF; 2) the segments are refined by enforcing motion coherence between them.
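A hedged sketch of this sequential loop; R_ns and R_sq stand in for trained SRF predictors with an assumed predict(features) interface, and channel-first feature stacking is an assumption of this sketch:

import numpy as np

def sequential_inference(features_seq, R_ns, R_sq):
    """The non-sequential forest R_ns seeds the first prediction; every
    later interval stacks the previous prediction EO_n onto its features
    (as extra channels) before querying the sequential forest R_sq."""
    EO = R_ns.predict(features_seq[0])
    outputs = [EO]
    for feats in features_seq[1:]:
        augmented = np.concatenate([feats, EO], axis=0)
        EO = R_sq.predict(augmented)
        outputs.append(EO)
    return outputs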
Below are some results and the PR curves for the F2 metric explained in the paper:
If you use any of our code or dataset files, please cite our paper as
F. Barranco, C. L. Teo, C. Fermuller, and Y. Aloimonos. "Contour Detection and Characterization for Asynchronous Event Sensors." In Proceedings of the IEEE International Conference on Computer Vision, pp. 486-494. 2015.
@inproceedings{barranco_contour_2015,
  author    = {Barranco, F. and Teo, C. L. and Fermuller, C. and Aloimonos, Y.},
  title     = {Contour Detection and Characterization for Asynchronous Event Sensors},
  booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
  pages     = {486--494},
  year      = {2015}
}
Questions? Please contact fbarranco "at" ugr dot es