For each experiment, the input contains a total of integers
distributed evenly across p processors. The output consists of the
sorted elements held in an array congruent with the input. Each
processor's output block of elements is in non-descending order, and
no element in processor i is greater than any element in processor
j, for all i < j. Note that we use 32-bit keys and sort using
all 32-bits, even when the input distribution is known to be more
restrictive, such as the N input which contains only 19
significant bits.
Figure 4: Performance is independent of key distribution
The performance of our radix sort is independent of input distribution, as shown in Figure 4.. This figure presents results from the IBM SP-2; results obtained from other machines, such as the CM-5, CS-2, and T3D, validate this claim as well.
As shown in Figure 5, the execution time of radix
sort using a fixed number of processors is linear in input size n.
Note that this figure is a log-log plot. Since and R
are constants for a given problem size, the running time is
O
, validating our prediction from the bounds in the
previous section. In addition, the execution time of radix sort for a
given input size on a varying number of processors (p) scales
inversely with p. Again, this was predicted by our earlier analysis.
Figure 5: Scalability of Radix Sort With Respect to Machine and Problem Size