Given a matrix with q rows on a p-processor partition, where p divides q, the TRANSPOSE Communication Library Primitive rearranges the data in the array so that the first q/p rows of elements are moved to the first processor, the second q/p rows to the second processor, and so on, with the last q/p rows of the matrix moved to the last processor. This primitive is also known as the index operation ([8], [11]). The BDM algorithm and analysis for the TRANSPOSE data movement are given in [3] and are similar to those for the LogP model [17]. This TRANSPOSE communication algorithm has the following complexity: