next up previous
Next: Broadcasting Up: Block Distributed Memory Previous: Block Distributed Memory

Matrix Transposition

Given a matrix on a p processor machine, where p divides q, the matrix transposition consists of rearranging the data such that the first rows of elements are moved to the first processor, the second rows to the second processor, and so on, with the last rows of the matrix moved to the last processor. An efficient matrix transposition algorithm consists of p iterations such that, during iteration i, , each processor prefetches the appropriate block of elements from processor . The BDM algorithm and analysis for the matrix transpose data movement is given in [4] and is similar to that of the LogP model [16]. This matrix transpose algorithm has the following complexity:

 



next up previous
Next: Broadcasting Up: Block Distributed Memory Previous: Block Distributed Memory



David A. Bader
dbader@umiacs.umd.edu