Figure 2: Example of Dynamic Data Redistribution (Method A) with p=8 and n=63
A simple method for dynamic data redistribution ranks each element in order across the p processors, and assigns each set of q consecutively labeled elements to a processor, where . Note that when p does not divide n evenly, the last processor will receive less than q elements. We refer to this as Method A.
Figure 2 shows a dynamic data redistribution example for Method A. This is a simple example for 8 processors and 63 elements, with an arbitrary initial distribution of . Here, , for , while , since receives the remainder of elements when p does not divide the total number of elements evenly.
An algorithm for Method A first calls the CONCAT communication primitive and assigns it to array , a shared array. Another shared array of prefix-sums of the values from N, say PS, is derived from by simple local running sum calculations. Thus, every processor contains local copies of all prefix-sums. Suppose elements are logically ranked in consecutive order from 1 to n. In the final layout, processor i will hold elements ranked from q i + 1 to , inclusively. Using the prefix-sum information, each processor easily determines where these elements are located and issues READ primitives for the respective remote locations to fill the distributed output array.
The analysis for the dynamic data redistribution algorithm using the BDM model is as follows. The CONCAT primitive requires communication and (Eq. (2)). The local prefix-sum calculation requires O. Determining the location of elements to be read using the prefix-sums has computational complexity of . Assume that the maximum number of elements initially on a processor is m, i.e., . The READ primitive for actually issuing the remote read requests uses and since each processor fetches at most elements, but in the worst case, a processor is the source of m fetched elements. Since these requests are pipelined, only a single latency is incurred. Since , the dynamic data redistribution algorithm has the following complexity:
Note that the input distribution N for dynamic data redistribution can range from already balanced data to the case where all data is located on a single processor . For a large class of irregular problems such that data are distributed with a certain class of distributions, it has been shown that the distribution is typically closer to the first scenario, [28].