Parallel Complexity for Delta-Connected Components

Next: Implementation Notes Up:

-Connected Components Previous:

-Connected Components

Parallel Complexity for -Connected Components

Thus, for , the total complexities for the parallel -Connected Components algorithm are [4]

Clearly, the computational complexity is the best possible asymptotically. As for the communication complexity, intuitively a latency factor has to be incurred during each merge operation, and hence the factor .

Table I: Implementation Results of Parallel Connected Components of DARPA II Image ()

The majority of previous connected components parallel algorithms are architecture- or machine-specific, and do not port easily to other platforms. Table I shows some previous running times for parallel implementations of connected components on the ``DARPA II Image'' given in Figure 6. The second to last column corresponds to a normalized measure of the amount of work per pixel, where the total work is defined to be the product of the execution time and the number of processors. In order to normalize the results between fine- and coarse-grained machines, we divide the number of processors in the fine-grained machines by 32 to compute the work per pixel site.

Table II: Implementation Results of Segmentation Algorithm on Image 3 from [13], seven grey circles ()

Table III: Implementation Results of Segmentation Algorithm on Image 6 from [13], a binary tool ()

Our implementation also performs better compared with other recent parallel region growing codes ([13]). Note that this implementation uses data parallel Fortran on the TMC CM-2 and CM-5 machines, and lower-level implementations on the CM-5 using Fortran with several message passing schemes. For example, Figure 7 shows two of the more difficult images from [13] which are segmented by region growing. Image 3 is a 256-grey level image, containing seven homogeneous circles. Image 6 is a binary image of a tool. Tables II and III show the comparison of execution times for Images 3 and 6, respectively. Because these images are noise-free, our algorithm skips the image enhancement task.

Table IV: Segmentation Execution Time (in ms) for band 5 image

Table V: Segmentation Execution Time (in ms) for band 5 image

Figure 5: Scalability of the Segmentation Algorithm

Table VI: Total SNF Execution Time (in seconds) for band 5 image

Execution timings for segmentation of the Landsat TM band 5 subimage, shown in Figure 8, are given in Table IV. Corresponding results are given in Table V for a larger subimage of the same view. Note that the SNF and 1-Nearest Neighbor filters are iterative and data dependent, with timings that ramp down after the initial iteration; thus, only the slowest timing for a single iteration is reported. Figure 5 shows scalability of the segmentation algorithm running on the subimage, with various machine configurations of the CM-5, SP-2, and T3D. For this image, the first, second, and third phases of SNF iterate 4, 56, and 47 times, respectively. Also, the 1-Nearest Neighbor task contains 11 iterations. Table VI compares the best-known sequential code for SNF to that of the parallel implementation. Again, this test uses the image, and iterates with the counts specified above. The sequential tests are performed on fast workstations dedicated to a single user and reflect only the time spent doing the filter calculations. These empirical results show our segmentation algorithm scaling with machine and problem size, and exhibiting superior performance on several parallel machines when compared with state-of-the-art sequential platforms.

Next: Implementation Notes Up:

-Connected Components Previous:

-Connected Components

David A. Bader
dbader@umiacs.umd.edu