
Experimental Results for Broadcasting

Performance graphs for broadcasting with the prefetching matrix transposition on a 32-processor CM-5, SP-2, and CS-2, and an 8-processor Paragon are given in Figures 6, 7, 8, and 9, respectively, in Appendix A.1. As expected, these graphs show that the SPLIT-C broadcasting algorithm takes roughly twice as long as the SPLIT-C matrix transpose algorithm. The figures also show the attained data bandwidth per processor for the broadcast algorithm, which is approximately the same as that of the transpose algorithm on each machine.
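The relationship between the two observations (about twice the time, about the same per-processor bandwidth) can be illustrated with a small numerical sketch. It assumes, without confirmation from this section, that the broadcast behaves like two transpose-like communication phases and that one phase costs a latency term plus a per-element term. The function names, the cost model, and all parameter values (`tau`, `sigma`, `q`, the 4-byte element size) are hypothetical and are not measurements from any of the machines above.

```python
# Illustrative sketch only: cost model and all parameter values are assumptions.

def transpose_time(q, p, tau, sigma):
    """Assumed time for one transpose-like phase: a latency term tau plus
    sigma seconds for each of the q - q/p elements sent to other processors."""
    return tau + (q - q / p) * sigma


def broadcast_time(q, p, tau, sigma):
    """Assumed broadcast cost: two transpose-like phases back to back."""
    return 2 * transpose_time(q, p, tau, sigma)


def bandwidth_per_processor(elements_moved, seconds, bytes_per_element=4):
    """Bytes moved by one processor divided by the elapsed time."""
    return elements_moved * bytes_per_element / seconds


p = 32          # processors, as on the CM-5, SP-2, and CS-2 runs
q = 1 << 16     # hypothetical number of elements per processor
tau = 1e-4      # hypothetical latency in seconds
sigma = 1e-7    # hypothetical per-element transfer time in seconds

moved = q - q / p                      # elements one processor sends in one phase
t_tr = transpose_time(q, p, tau, sigma)
t_bc = broadcast_time(q, p, tau, sigma)

bw_tr = bandwidth_per_processor(moved, t_tr)
bw_bc = bandwidth_per_processor(2 * moved, t_bc)   # two phases, twice the data

print("time ratio (broadcast / transpose):", t_bc / t_tr)
print("bandwidth ratio (broadcast / transpose):", bw_bc / bw_tr)
```

Under these assumptions the broadcast moves twice the data in twice the time, so the per-processor bandwidth is unchanged, which matches the qualitative behavior reported in the figures.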






David A. Bader
dbader@umiacs.umd.edu