next up previous
Next: Conclusion Up: Performance Evaluation Previous: Experimental Results

Comparison with Previous Results

Despite the enormous theoretical interest in parallel sorting, we were able to locate relatively few empirical studies. Of these, only a few were done on machines which either were available to us for comparison or involved code which could be ported to these machines for comparison. In Tables ix and x, we compare the performance of our sample sort algorithm with two other sample sort algorithms. In all cases, the code was written in SPLIT-C. In the case of Alexandrov et al. [1], the times were determined by us directly on a 32 node CM-5 using code supplied by the authors which had been optimized for a Meiko CS-2. In the case of Dusseau [17], the times were obtained from the graphed results reported for a 64 node CM-5.

   table1061
Table: Total execution time (in seconds) required to sort a variety of benchmarks and problem sizes, comparing our version of sample sort (HBJ) with that of Alexandrov et al. (AIS) on a 32-node CM-5.
tex2html_wrap_inline2676 We were unable to run the (AIS) code on this input.

   table1101
Table x: Time required per element (in microseconds) to sample sort 64M integers, comparing our results (HBJ) with those obtained from the graphed results reported by Dusseau (DUS) on a 64 node CM-5.

   table1130
Table xi: Comparison of our execution time (in seconds) with the best reported times for the Class A and Class B NAS Parallel Benchmark for integer sorting. Note that while we actually place the integers in sorted order, the benchmark only requires that they be ranked without actually reordering.

Finally, there are the results for the NAS Parallel Benchmark [30] for Integer Sorting (IS). The name of this benchmark is somewhat misleading. Instead of requiring that the integers be placed in sorted order as we do, the benchmark only requires that they be ranked without any reordering, which is a significantly simpler task. Specifically, the Class A Benchmark requires ten repeated rankings of a Gaussian distributed random input consisting of tex2html_wrap_inline2680 integers ranging in value from 0 to tex2html_wrap_inline2684 . The Class B Benchmark is similar, except that the input consists of tex2html_wrap_inline2686 integers ranging in value from 0 to tex2html_wrap_inline2690 . Table xi compares our results on these two variations of the NAS Benchmark with the best reported times for the CM-5, the T3D, and the SP-2-WN. We believe that our results, which were obtained using high-level, portable code, compare favorably with the other reported times, which were obtained by the vendors using machine-specific implementations and perhaps system modifications.

   table1182
Table xii: Total execution time (in seconds) required to sort 8M signed integers, comparing our results (HBJ) with those of Tridgell and Brent (TB) on a 32 node CM-5.

The only performance studies we are aware of on similar platforms for generalized sorting are those of Tridgell and Brent [32], who report the performance of their algorithm using a 32 node CM-5 on a uniformly distributed random input of signed integers, as described in Table xii.


next up previous
Next: Conclusion Up: Performance Evaluation Previous: Experimental Results

David R. Helman
helman@umiacs.umd.edu