Figure 2: Example of Dynamic Data Redistribution (Method A) with p=8 and n=63
A simple method for dynamic data redistribution ranks each element in
order across the p processors, and assigns each set of q
consecutively labeled elements to a processor, where . Note that when p does not
divide n evenly, the last processor will receive less than q
elements. We refer to this as Method A.
Figure 2 shows a dynamic data redistribution example for
Method A. This is a simple example for 8 processors and 63
elements, with an arbitrary initial distribution of . Here,
, for
, while
, since
receives the remainder of elements when p does not divide the total
number of elements evenly.
An algorithm for Method A first calls the
CONCAT communication primitive and assigns it to array
, a
shared array. Another
shared
array of prefix-sums of the values from N, say PS, is derived from
by simple local running sum calculations. Thus, every
processor contains local copies of all prefix-sums. Suppose elements
are logically ranked in consecutive order from 1 to n. In the
final layout, processor i will hold elements ranked from q i + 1
to
, inclusively. Using the prefix-sum information, each
processor easily determines where these elements are located and
issues READ primitives for the respective remote locations to
fill the
distributed
output array.
The analysis for the dynamic data redistribution algorithm using the
BDM model is as follows. The CONCAT primitive requires
communication and
(Eq. (2)). The local prefix-sum calculation
requires O
. Determining the location of elements to be read
using the prefix-sums has computational complexity of
. Assume that the maximum number of elements initially on a processor
is m, i.e.,
. The READ primitive for
actually issuing the remote read requests uses
and
since each processor fetches at most
elements, but in the worst case, a processor
is the source of m fetched elements. Since these requests are
pipelined, only a single latency
is incurred. Since
, the dynamic data redistribution algorithm has the
following complexity:
Note that the input distribution N for dynamic data redistribution
can range from already balanced data to the
case where all data is located on a single processor
. For a large class of
irregular problems such that data are distributed with a certain class
of distributions, it has been shown that the distribution is typically
closer to the first scenario,
[28].