We use a simple model to analyze the performance of our parallel algorithms. Our model is based on the observation that current hardware platforms can be viewed as a collection of powerful processors connected by a communication network. The network can be modeled as a complete graph on which communication is subject to the restrictions imposed by its latency and bandwidth properties. We view a parallel algorithm as a sequence of local computations interleaved with communication steps, where we allow computation and communication to overlap. We account for communication costs as follows.
Assuming no congestion, the transfer of a block consisting of $m$ contiguous words between two processors takes $\tau + \sigma m$ time, where $\tau$ is the latency of the network and $\sigma$ is the time per word at which a processor can inject or receive data from the network. Note that the bandwidth per processor is inversely proportional to $\sigma$. We assume that the bisection bandwidth is sufficiently high to support block permutation routing amongst the $p$ processors at the rate of $1/\sigma$. In particular, for any subset of $q$ processors, a block permutation amongst the $q$ processors takes $\tau + \sigma m$ time, where $m$ is the size of the largest block.
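As an illustrative instance of this cost (not a bound specific to any algorithm in this paper), consider a block permutation in which every processor sends and receives blocks of at most $n/p$ words. Under the model above, and assuming no congestion, this single communication step would take
\[
\tau + \sigma\,\frac{n}{p}
\]
time, since the largest block has size $m = n/p$.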
Using this cost model, we can evaluate the communication time $T_{\mathrm{comm}}(n,p)$ of an algorithm as a function of the input size $n$, the number of processors $p$, and the parameters $\tau$ and $\sigma$. The coefficient of $\tau$ gives the total number of times collective communication primitives are used, and the coefficient of $\sigma$ gives the maximum total amount of data exchanged between a processor and the remaining processors.
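For example, under these conventions, a hypothetical algorithm that performs a constant number $c$ of block permutations, each moving at most $n/p$ words out of (and into) every processor, would have communication time
\[
T_{\mathrm{comm}}(n,p) \;\le\; c\,\tau \;+\; c\,\sigma\,\frac{n}{p},
\]
so the coefficient of $\tau$ (here $c$) counts the collective communication steps, while the coefficient of $\sigma$ (here $c\,n/p$) bounds the total amount of data exchanged by any single processor.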
This communication model is similar to a number of models that have recently appeared in the literature (e.g., [16, 33, 1]) and seems well suited for designing parallel algorithms on current high-performance platforms.
We define the computation time $T_{\mathrm{comp}}(n,p)$ as the maximum time it takes a processor to perform all the local computation steps. In general, the overall performance $T_{\mathrm{comp}}(n,p) + T_{\mathrm{comm}}(n,p)$ involves a tradeoff between $T_{\mathrm{comm}}(n,p)$ and $T_{\mathrm{comp}}(n,p)$. In many cases, it is possible to minimize both $T_{\mathrm{comp}}(n,p)$ and $T_{\mathrm{comm}}(n,p)$ simultaneously, and sorting is such a case.
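To sketch what such a simultaneous minimization looks like (a hypothetical illustration, not the specific bounds derived later in the paper), a sorting algorithm that sorts its $O(n/p)$ local keys sequentially and redistributes them with a constant number of block permutations would achieve
\[
T_{\mathrm{comp}}(n,p) = O\!\left(\frac{n}{p}\log n\right)
\quad\text{and}\quad
T_{\mathrm{comm}}(n,p) = O\!\left(\tau + \sigma\,\frac{n}{p}\right),
\]
which, up to constant factors, matches the natural lower bounds on both terms for comparison-based sorting of $n$ keys distributed evenly over $p$ processors.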