Experimental Results for Broadcasting

The performance graphs for broadcasting using the prefetching matrix transposition on a 32-processor CM-5, SP-2, and CS-2, and an 8-processor Paragon are given in Figures 6, 7, 8, and 9, respectively, in Appendix A.1. As expected, these graphs show that the SPLIT-C broadcasting algorithm takes roughly twice the time of the SPLIT-C matrix transpose algorithm. The figures also show the attained data bandwidth per processor for this broadcast algorithm, which is approximately the same as that of the transpose algorithm on each machine.
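One reading consistent with the factor of two is that the broadcast is built from two transposition-style phases: the first scatters one block of the source data to each processor, and the second lets every processor collect all blocks. The following Python sketch simulates that data movement on ordinary lists. It is an illustration under that assumption, not the SPLIT-C code that was measured; the names transpose and broadcast, the root parameter, and the requirement that the processor count divides the data length are all hypothetical choices made for the example.

```python
def transpose(blocks):
    """Block transpose: processor i sends its j-th block to processor j.

    blocks[i][j] is the j-th block held by processor i. In the result,
    processor j holds that block in slot i.
    """
    p = len(blocks)
    return [[blocks[i][j] for i in range(p)] for j in range(p)]


def broadcast(data, root, p):
    """Simulate broadcasting `data` from processor `root` to p processors.

    Assumption (not taken from the source): the broadcast is two
    transposition-style phases, which would account for a cost of
    roughly twice that of a single transpose.
    """
    q = len(data)
    assert q % p == 0, "this sketch assumes p divides the data length"
    size = q // p

    # Initially only the root holds data, split into p equal blocks.
    blocks = [[[] for _ in range(p)] for _ in range(p)]
    blocks[root] = [data[k * size:(k + 1) * size] for k in range(p)]

    # Phase 1: the transpose delivers block j of the root to processor j.
    blocks = transpose(blocks)
    piece = [blocks[j][root] for j in range(p)]

    # Each processor offers its piece to every destination.
    blocks = [[piece[j] for _ in range(p)] for j in range(p)]

    # Phase 2: the transpose gives every processor all p pieces in order.
    blocks = transpose(blocks)

    # Flatten each processor's blocks into its local copy of the data.
    return [[x for blk in row for x in blk] for row in blocks]
```

Calling broadcast(list(range(8)), root=2, p=4) leaves each of the four simulated processors with a full copy of the eight elements. Each phase moves about the same volume per processor as one transpose, which is why, under this assumption, a broadcast would cost about two transposes while sustaining a similar per-processor bandwidth.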

David A. Bader
dbader@umiacs.umd.edu