Given a q × p matrix on a p processor partition, where p
divides q, the TRANSPOSE Communication Library Primitive
rearranges the data in the array such that the first q/p
rows of elements are moved to the first processor, the second
q/p rows to the second processor, and so on, with the last
q/p rows of the matrix moved to the last processor. This
primitive is also known as the index operation ([8], [11]).
The BDM algorithm and analysis for the TRANSPOSE data movement
are given in [3] and are similar to those for the LogP model [17].
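The following Python sketch simulates the TRANSPOSE data movement sequentially. It rests on an assumption not stated above: column j of the q × p matrix initially resides on processor j. The function name `transpose_primitive` and the `inbox` layout are hypothetical. The sketch models only which elements move where. It does not perform real message passing and does not account for BDM costs.

```python
from typing import List


def transpose_primitive(columns: List[List[int]]) -> List[List[List[int]]]:
    """Sequentially simulate TRANSPOSE on p processors (illustrative sketch).

    Assumption (hypothetical layout): columns[j] is the column of q elements
    held by processor j before the operation. Returns, for each processor i,
    the q/p consecutive rows (each of length p) it holds afterwards.
    """
    p = len(columns)
    q = len(columns[0])
    if any(len(col) != q for col in columns):
        raise ValueError("every column must hold q elements")
    if q % p != 0:
        raise ValueError("p must divide q")
    b = q // p  # number of rows each processor receives

    # Processor j sends elements [i*b, (i+1)*b) of its column to processor i.
    # inbox[i][j] is the block of b elements processor i receives from j.
    inbox = [[columns[j][i * b:(i + 1) * b] for j in range(p)] for i in range(p)]

    # Processor i rebuilds its b rows: row r takes element r from each source j.
    return [[[inbox[i][j][r] for j in range(p)] for r in range(b)]
            for i in range(p)]


if __name__ == "__main__":
    q, p = 4, 2
    matrix = [[10 * r + c for c in range(p)] for r in range(q)]
    cols = [[matrix[r][c] for r in range(q)] for c in range(p)]
    for i, rows in enumerate(transpose_primitive(cols)):
        print(f"processor {i}: {rows}")
```

With q = 4 and p = 2, processor 0 ends up with rows 0 and 1 and processor 1 with rows 2 and 3. Each processor keeps one block of q/p elements and sends its other p − 1 blocks, so q − q/p elements leave every processor. This is the all-to-all personalized exchange pattern behind the index operation.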
This TRANSPOSE communication algorithm has the following
complexity: