next up previous
Next: Sorting Up: Personalized Communication Previous: Algorithm

Summary of Experimental Results

   figure200
Figure 1: Final distribution of the keys corresponding to our input data sets

We examine the performance of our h-relation algorithm on various configurations of the IBM SP-2 and Cray T3D, using four values of h: tex2html_wrap_inline1456 , tex2html_wrap_inline1458 , tex2html_wrap_inline1460 , and tex2html_wrap_inline1462 . More results are given in [4]. The data sets used in these experiments are defined as follows. Our input of size n is initially distributed cyclically across the p processors such that each processor tex2html_wrap_inline1400 initially holds tex2html_wrap_inline1384 keys, for tex2html_wrap_inline1402 . For tex2html_wrap_inline1456 the input consists of tex2html_wrap_inline1476 keys labelled for tex2html_wrap_inline1478 , followed by tex2html_wrap_inline1480 keys labelled for tex2html_wrap_inline1482 , and so forth, (with tex2html_wrap_inline1484 keys labelled for tex2html_wrap_inline1400 ), with the last tex2html_wrap_inline1488 keys labelled for tex2html_wrap_inline1490 . Note that this results in the same data movement as the transpose primitivegif. For tex2html_wrap_inline1492 , instead of an equal number of elements destined for each processor, the function tex2html_wrap_inline1494 , for ( tex2html_wrap_inline1496 ), is characterized by

  equation232

The result of this data movement, shown in Figure 1, is that processor 0 receives the largest imbalance of elements, i.e. h, while other processors receive varying block sizes ranging from 0 to at most h. For tex2html_wrap_inline1512 , approximately tex2html_wrap_inline1514 processors receive no elements, and hence this represents an extremely unbalanced case.

   figure254
Figure 2: Performance of personalized communication ( tex2html_wrap_inline1516 ) with respect to machine and problem size.

As shown in Figure 2, the time to route an h-relation personalized communication for a given input size on a varying number of processors (p) scales inversely with p whenever n is large compared with p. However, for inputs which are small compared with the machine size, the communication time is dominated by O tex2html_wrap_inline1528 as shown in the case of the 128-processor Cray T3D with n = 256K.


next up previous
Next: Sorting Up: Personalized Communication Previous: Algorithm

David R. Helman
helman@umiacs.umd.edu