For each experiment, the input contains a total of integers distributed evenly across p processors. The output consists of the sorted elements held in an array congruent with the input. Each processor's output block of elements is in non-descending order, and no element in processor i is greater than any element in processor j, for all i < j. Note that we use 32-bit keys and sort using all 32-bits, even when the input distribution is known to be more restrictive, such as the N input which contains only 19 significant bits.
Figure 4: Performance is independent of key distribution
The performance of our radix sort is independent of input distribution, as shown in Figure 4.. This figure presents results from the IBM SP-2; results obtained from other machines, such as the CM-5, CS-2, and T3D, validate this claim as well.
As shown in Figure 5, the execution time of radix sort using a fixed number of processors is linear in input size n. Note that this figure is a log-log plot. Since and R are constants for a given problem size, the running time is O, validating our prediction from the bounds in the previous section. In addition, the execution time of radix sort for a given input size on a varying number of processors (p) scales inversely with p. Again, this was predicted by our earlier analysis.
Figure 5: Scalability of Radix Sort With Respect to Machine and Problem Size