In this paper, we introduced a novel variation on sample sort and conducted an experimental study of its performance on a number of platforms using widely different benchmarks. Our results illustrate the efficiency and scalability of our algorithm across the different platforms and appear to improve on all similar results known to the authors. Our results also compare favorably with those reported for the simpler ranking problem posed by the NAS Integer Sorting (IS) Benchmark.
We have also studied several variations on our algorithm which use
differing strategies to ensure that every bucket in Step (1)
receives an equal number of elements. The results obtained for these
variations were very similar to those reported in this paper. On no
platform did the improvements exceed approximately , and in many
instances they actually ran more slowly. We believe that a significant
improvement of our algorithm would require the enhancement of the
sequential sorting and merging in Steps (3) and (8), and that
there is little room for significant improvement in either the
load balance or the communication efficiency.