Consider the problem of sorting n integer keys in the range that are distributed equally over a p-processor distributed
memory machine. An efficient algorithm is radix sort that
decomposes each key into groups of r-bit blocks, for a suitably
chosen r, and sorts the keys by sorting on each of the r-bit
blocks beginning with the block containing the least significant bit
positions [30]. Let
. Assume (w.l.o.g.)
that the number of processors is a power of two, say
, and
hence
is an integer
. Our algorithm
demonstrates efficient uses of the transpose communication
primitive, as well as the h-relation communication scheme.