Next: Experimental Results for Up: Block Distributed Memory Previous: Block Distributed Memory

Analysis For Matrix Transpose Algorithm

The analysis for the matrix transpose algorithm is similar to that of the LogP model analysis [11]. The algorithm to perform a matrix transpose on a p-processor machine operates as follows. The data layout of matrix A is straightforward: each column i of q elements is stored on processor i, for 0 ≤ i ≤ p-1. Note that the first index of A gives the processor number, while the second index gives the element offset within that processor.

Processor i runs the following program:

 

Each prefetch in Step 1.2 requests a block of q/p elements. Since each processor prefetches p-1 blocks of q/p elements each, this matrix transpose algorithm will take communication complexity, or

 


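The block counting in this analysis can be checked with a small sequential simulation. The sketch below is not the program run by processor i; it is an illustration written under several assumptions: p divides q, the block size is q/p, processors are numbered 0 through p-1, the transposed data is stored column by column on each processor, and the source processors are visited in a staggered order. The names block_transpose, B, and remote_blocks are hypothetical.

```python
def block_transpose(A, p, q):
    """Sequentially simulate a block transpose on p processors.

    A[i][j] is element j (0 <= j < q) held by processor i (0 <= i < p).
    Assumption: p divides q, so every block has m = q // p elements.
    Returns (B, remote_blocks), where B[i] is the data held by processor i
    after the transpose and remote_blocks[i] counts the blocks that
    processor i fetched from other processors.
    """
    assert q % p == 0, "this sketch assumes p divides q"
    m = q // p                       # elements per block
    B = [[None] * q for _ in range(p)]
    remote_blocks = [0] * p

    for i in range(p):               # work done by processor i
        for step in range(p):
            # Staggered source order (an assumption of this sketch).
            k = (i + step) % p
            block = A[k][i * m:(i + 1) * m]   # block i of processor k
            if k != i:
                remote_blocks[i] += 1         # one remote block fetched
            # Element c of the block becomes entry k of local column c.
            for c, value in enumerate(block):
                B[i][c * p + k] = value
    return B, remote_blocks


if __name__ == "__main__":
    p, q = 4, 8
    A = [[(i, j) for j in range(q)] for i in range(p)]
    B, remote_blocks = block_transpose(A, p, q)
    print(remote_blocks)                 # [3, 3, 3, 3]
    print(remote_blocks[0] * (q // p))   # 6 remote elements per processor
```

With p = 4 and q = 8, each processor fetches p-1 = 3 remote blocks of q/p = 2 elements, for 6 remote elements in total, which matches the count used in the analysis.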




David A. Bader
dbader@umiacs.umd.edu