We start by describing the counting sort algorithm used to sort on
individual blocks of the keys. The Counting Sort algorithm sorts
n integers in the range by using R counters to
accumulate the number of keys equal to i in bucket
, for
, followed by determining the rank of the each
element. Once the rank of each element is known, we can use our
h-relation personalized communication to move each element into the
correct position; in this case
. Counting Sort is a
stable sorting routine, that is, if two keys are identical,
their relative order in the final sort remains the same as their
initial order.
In a practical integer sorting problem, we expect . The pseudocode for our Counting Sort algorithm uses six
major steps and is as follows.
The analysis of our counting sort algorithm is as follows.
Steps (1), (3), and (5) execute in O
local computation time with no communication. Steps (2),
(4), and (6) are communication supersets and have the following
analysis. Steps (2) and (4) are the transpose
primitive with block sizes
and
and hence
result in O
communication. Step (6) uses
the personalized communication primitive for n elements distributed
equally across p processors. Because this routing is a permutation
, it has the following complexity
provided that . Thus, the overall complexity of our
Counting Sort algorithm is given by
Notice that an obvious lower bound to sort the integers is
, and hence our algorithm
is asymptotically optimal when
and
.