Given an image with k grey levels on a p-processor machine, we wish
to develop efficient and portable
parallel algorithms to perform various useful primitive image
processing computations. Efficiency is a performance measure used to
evaluate parallel algorithms. This measure provides an indication of
the effective utilization of the p processors relative to the given
parallel algorithm. For example, an algorithm with an efficiency near
one runs approximately p times faster on p processors than the
same algorithm on a single processor. Portability refers to code that
is written independently of low-level primitives reflecting machine
architecture or size. Our goal is to develop portable algorithms that
are scalable in terms of both image size and number of processors,
when run on distributed memory multiprocessors.
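For concreteness, the efficiency measure described above can be stated
in the standard form (the symbols T_seq and T(p) are introduced here
only for illustration):

\[ E(p) \;=\; \frac{T_{\mathrm{seq}}}{p \cdot T(p)}, \]

where T_seq is the running time of the best sequential algorithm and
T(p) is the running time of the parallel algorithm on p processors, so
that an efficiency near one corresponds to a speedup of roughly p.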
Our first algorithm computes the histogram of an image; i.e., the
output consists of an array H[0..k-1] held in a single processor
such that H[i] is equal to the number of pixels in the image with
grey level i. Without loss of generality, k is assumed to be a
power of two. The second algorithm computes the connected components
of images ([1], [6], [8],
of images ([1], [6], [8],
[9], [12], [16], [17],
[20], [38]). The task of connected component
labeling is cited as an important object recognition problem in the
DARPA Image Understanding benchmarks ([37], [47]),
and also can be applied to several computational physics problems such
as percolation ([41], [5]) and various cluster
Monte Carlo algorithms for computing the spin models of magnets such
as the two-dimensional Ising spin model ([2],
[3], [4], [39], [40]). All
pixels with grey level (or `color') 0 are assumed to be background,
while pixels with color > 0 are foreground objects. A connected
component in the image is a maximal collection of pixels such that a
path exists between any pair of pixels in the component. Note that we
use the notion of 8-connectivity, meaning that two pixels are
adjacent if and only if one pixel lies in any of the eight positions
surrounding the other pixel, or that of 4-connectivity, in which only
the north, east, south, and west pixels are adjacent. Each pixel in the
image will receive a positive integer label; pixels will have the same
label if and only if they belong to the same connected component.
Also, all 0-pixels will receive a label of 0.
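As an illustration of this labeling convention (and not of the parallel
algorithm developed in Section 5), a minimal sequential sketch in C
using union-find over the 8-neighborhood might look as follows; the
image dimensions, the array names, and the treatment of every nonzero
pixel as a single foreground class are assumptions made only for the
example.

/* Sequential 8-connectivity labeling via union-find -- an illustrative
   sketch only, not the parallel algorithm of Section 5.  The image
   size and names below are assumptions; every nonzero pixel is treated
   as foreground. */
#include <stdlib.h>

#define W 8                      /* assumed image width  */
#define H 8                      /* assumed image height */

static int parent[W * H];

static int find(int x) {                 /* root with path halving */
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

static void unite(int a, int b) { parent[find(a)] = find(b); }

/* On return, label[r][c] is 0 for background pixels, and two
   foreground pixels share a label if and only if they belong to the
   same 8-connected component. */
void label_components(const int img[H][W], int label[H][W]) {
    for (int i = 0; i < W * H; i++) parent[i] = i;

    /* Merge each foreground pixel with its already-scanned neighbors;
       west, north-west, north, and north-east cover every 8-adjacency
       exactly once (for 4-connectivity, keep only west and north). */
    static const int dr[4] = { 0, -1, -1, -1 };
    static const int dc[4] = { -1, -1, 0, 1 };
    for (int r = 0; r < H; r++)
        for (int c = 0; c < W; c++) {
            if (img[r][c] == 0) continue;          /* background */
            for (int k = 0; k < 4; k++) {
                int nr = r + dr[k], nc = c + dc[k];
                if (nr >= 0 && nc >= 0 && nc < W && img[nr][nc] != 0)
                    unite(r * W + c, nr * W + nc);
            }
        }

    /* Hand out consecutive positive labels, one per component root;
       background pixels keep label 0. */
    int next = 1;
    int *root_label = calloc(W * H, sizeof(int));
    for (int r = 0; r < H; r++)
        for (int c = 0; c < W; c++) {
            if (img[r][c] == 0) { label[r][c] = 0; continue; }
            int root = find(r * W + c);
            if (root_label[root] == 0) root_label[root] = next++;
            label[r][c] = root_label[root];
        }
    free(root_label);
}

Restricting the scan to the four previously visited neighbors examines
each adjacent pair of pixels only once; dropping the north-west and
north-east entries yields the 4-connectivity variant instead.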
The majority of previous parallel histogramming algorithms are architecture- or machine-specific and do not port well to other platforms. Table 1 shows some previous running times for histogramming algorithms on parallel machines. Note that several of these machines are special purpose image processing machines. The last column corresponds to a normalized measure of the amount of work per pixel, where the total work is defined to be the product of the execution time and the number of processors. In order to normalize the results between fine- and coarse-grained machines, we divide the number of processors in the fine-grained machines by 32 to compute the work per pixel site.
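Stated as a formula (the symbols t, p, and w are introduced here only
as shorthand for the quantities described above), the normalized work
per pixel reported in the tables is

\[ w \;=\; \frac{t \cdot p'}{\text{number of pixels}}, \qquad
   p' = \begin{cases} p/32 & \text{for the fine-grained machines,}\\
                      p    & \text{otherwise,} \end{cases} \]

where t is the execution time and p is the number of processors.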
Table 1: Implementation Results of Parallel Histogramming Algorithms
As with the histogramming algorithms, most of the previous parallel connected components algorithms are also architecture- or machine-specific and do not port easily to other platforms. Table 2 shows some previous running times for implementations of connected components for images using parallel machines. Again, several of these machines are special purpose image processing machines. The second-to-last column corresponds to a normalized measure of the amount of work per pixel, where the total work is defined to be the product of the execution time and the number of processors. Note that only the entries labeled ``DARPA II Image'' use the benchmark image given in Figure 2. All other test images used are unknown to the authors.
Section 2 describes the algorithmic model used to analyze the algorithms whereas Section 3 describes the input images used, as well as the data layout on the Thinking Machines CM-5, IBM SP-1 and SP-2, Meiko CS-2, and Intel Paragon. The histogramming algorithm is presented in Section 4, and the binary connected components algorithm is described in Section 5. Section 6 generalizes the connected components algorithm to multi-grey level images.
The experimental data obtained reflect the execution times from implementations on the TMC CM-5, IBM SP-1 and SP-2, Meiko CS-2, and the Intel Paragon, with the number of parallel processing nodes ranging from 16 to 128 for each machine when possible. The algorithms are written in SPLIT-C [10], a shared-memory programming language following the SPMD (single program multiple data) model on these parallel machines, and the source code is available for distribution to interested parties.
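To make the SPMD structure concrete, the following is a minimal sketch
of the local-count-then-combine pattern underlying the histogramming
computation, written with C and MPI rather than the SPLIT-C primitives
actually used; the function and variable names, and the assumption that
each processor already holds a block of local_n pixels with grey levels
in the range 0..k-1, are illustrative only.

/* SPMD histogramming sketch: each of the p processors counts the grey
   levels of its own block of pixels, and the p partial histograms are
   then summed so that processor 0 holds H[0..k-1].  Illustrative MPI
   code, not the SPLIT-C implementation described in Section 4. */
#include <mpi.h>
#include <stdlib.h>

void histogram(const unsigned char *local_pixels, long local_n,
               int k, int *H /* length k; significant on rank 0 only */)
{
    int *local_H = calloc(k, sizeof(int));

    for (long i = 0; i < local_n; i++)        /* local counting phase */
        local_H[local_pixels[i]]++;

    /* combine the p partial histograms into H on processor 0 */
    MPI_Reduce(local_H, H, k, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    free(local_H);
}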
Table 2: Implementation Results of Parallel Connected Components of Images Algorithms