We finally consider the problem of multiplying two matrices A and B. We assume that A and B are both of size n × n. We partition each of the matrices A and B into submatrices, say A_{i,j} and B_{i,j}, where each of A_{i,j} and B_{i,j} is of size (n/p^{1/3}) × (n/p^{1/3}), assuming without loss of generality that p^{1/3} is an integer that divides n. For simplicity, we view the processor indices as arranged in a cube of size p^{1/3} × p^{1/3} × p^{1/3}; that is, they are given by triples (i, j, l), where 1 ≤ i, j, l ≤ p^{1/3}. We show that the product C = AB can be computed in O(n^3/p) computation time and O(n^2/p^{2/3}) communication time. The overall strategy is similar to the one used in [1,10]; related experimental results supporting the efficiency of the algorithm appear in [10]. Before presenting the algorithm, we establish the following lemma.
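Before turning to the lemma, the blockwise identity underlying the whole approach, C_{i,j} = Σ_l A_{i,l} B_{l,j}, can be checked with a small serial sketch. The sketch below is illustrative only (it is not the paper's code); it uses p = 8, so p^{1/3} = 2 blocks per dimension.

```python
# Hypothetical illustration: with p = 8 processors arranged as a 2 x 2 x 2
# cube (so p^(1/3) = 2), an n x n matrix splits into 2 x 2 blocks of size
# (n/2) x (n/2), and C = AB decomposes blockwise as C_{i,j} = sum_l A_{i,l} B_{l,j}.

n, q = 4, 2            # matrix size n; cube side q = p^(1/3)
b = n // q             # block size n / p^(1/3)

def matmul(X, Y):
    """Plain O(m^3) multiplication of square list-of-lists matrices."""
    m = len(X)
    return [[sum(X[r][t] * Y[t][c] for t in range(m)) for c in range(m)]
            for r in range(m)]

def block(M, i, j):
    """Extract the b x b block in block-position (i, j) of M (0-indexed)."""
    return [row[j*b:(j+1)*b] for row in M[i*b:(i+1)*b]]

A = [[r * n + c for c in range(n)] for r in range(n)]
B = [[(r + c) % 3 for c in range(n)] for r in range(n)]
C = matmul(A, B)

# Blockwise recomputation: each C_{i,j} equals the sum over l of A_{i,l} B_{l,j}.
for i in range(q):
    for j in range(q):
        S = [[0] * b for _ in range(b)]
        for l in range(q):
            P = matmul(block(A, i, l), block(B, l, j))
            S = [[S[r][c] + P[r][c] for c in range(b)] for r in range(b)]
        assert S == block(C, i, j)
```

Each of the q^3 = p block products A_{i,l} B_{l,j} costs O((n/p^{1/3})^3) = O(n^3/p) work, which is where the computation bound comes from once one product is assigned to each processor.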
Proof: We partition the p processors into groups of k processors each. Using the matrix transposition algorithm, we put the first set of rows of each matrix in a group into the first processor of that group, the second set of rows into the second processor, and so on. Each processor then adds its k submatrices locally. At this point, each of the k processors in a group holds a portion of the sum of the k matrices initially stored in that group. We continue with the same strategy, adding each set of k partial sums within a group of k processors; this time, however, the submatrices are partitioned along the columns. We repeat this procedure for log_k p rounds in total, at which point each processor holds a portion of the overall sum matrix. We then collect all the submatrices into a single processor. The complexity bounds follow as in the proof of Theorem 3.3.
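The grouping pattern in the proof can be sketched serially as follows. This is a minimal stand-in, assuming k divides the number of values at every level: scalars play the role of the submatrices, and one pass over the groups plays the role of one transpose-and-add round, so the round count log_k p is easy to observe.

```python
import math

def grouped_sum(values, k):
    """Repeatedly sum groups of k values; return (total, number of rounds).

    Stand-in for the lemma's scheme: each group of k 'processors' reduces
    its k values to one per round, so p values need log_k p rounds.
    """
    rounds = 0
    while len(values) > 1:
        values = [sum(values[g:g + k]) for g in range(0, len(values), k)]
        rounds += 1
    return values[0], rounds

p, k = 64, 4
total, rounds = grouped_sum(list(range(p)), k)
assert total == p * (p - 1) // 2          # all contributions survive
assert rounds == round(math.log(p, k))    # log_k p = 3 rounds here
```

In the actual lemma each round also alternates the partitioning direction (rows, then columns) so that the data held per processor keeps shrinking; the scalar sketch only captures the round structure.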
Algorithm Matrix_Multiplication
Input: Two n × n matrices A and B.
Initially, submatrices A_{i,j} and B_{i,j} are stored in processor
(i, j, 1), for each 1 ≤ i, j ≤ p^{1/3}.
Output: Processor (i, j, 1) holds the submatrix C_{i,j} of C = AB, where
C_{i,j} = Σ_{l=1}^{p^{1/3}} A_{i,l} B_{l,j}, and each C_{i,j}
is of size (n/p^{1/3}) × (n/p^{1/3}).
begin
Therefore we have the following theorem.
Remark: We could have used the second algorithm described in Theorem 3.3 to
execute Steps 1 and 3 of the matrix multiplication algorithm. The resulting communication
bound would be at most .
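The three-step structure the remark alludes to (Step 1: route blocks across the cube; Step 2: multiply locally; Step 3: sum along one cube dimension) can be simulated serially as follows. The cube coordinates and step numbering here are an assumption inferred from the remark, not the paper's exact algorithm.

```python
# Serial simulation of the standard three-dimensional scheme: "processor"
# (i, j, l) receives blocks A_{i,l} and B_{l,j} (Step 1, communication),
# multiplies them locally (Step 2, computation), and the partial products
# are summed over l to produce C_{i,j} (Step 3, communication).
# Assumed layout: p = 27 processors as a 3 x 3 x 3 cube, so q = p^(1/3) = 3.

n, q = 6, 3
b = n // q             # block size n / p^(1/3)

def matmul(X, Y):
    m = len(X)
    return [[sum(X[r][t] * Y[t][c] for t in range(m)) for c in range(m)]
            for r in range(m)]

def block(M, i, j):
    return [row[j*b:(j+1)*b] for row in M[i*b:(i+1)*b]]

A = [[(r - c) % 5 for c in range(n)] for r in range(n)]
B = [[(r * c) % 7 for c in range(n)] for r in range(n)]

# Steps 1-2: each of the p cube positions forms one b x b block product.
partial = {(i, j, l): matmul(block(A, i, l), block(B, l, j))
           for i in range(q) for j in range(q) for l in range(q)}

# Step 3: sum the q partial products along the l dimension of the cube.
C = [[0] * n for _ in range(n)]
for i in range(q):
    for j in range(q):
        for l in range(q):
            for r in range(b):
                for c in range(b):
                    C[i*b + r][j*b + c] += partial[(i, j, l)][r][c]

assert C == matmul(A, B)
```

Step 2 performs p independent products of size n/p^{1/3}, giving the O(n^3/p) computation bound; Steps 1 and 3 move and reduce O(n^2/p^{2/3}) data per processor, matching the stated communication bound.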