next up previous
Next: Communication Primitive: PREFIX Up: Communication Library Primitives Previous: Communication Primitive: TRANSPOSE

Communication Primitive: BCAST

Another useful data movement primitive is BCAST broadcasting primitive. An efficient BDM algorithm is given ([3], [26]) which takes q elements on a single processor and broadcasts them to the other p-1 processors using just two TRANSPOSE Communication Primitives.

An efficient algorithm for broadcasting no greater than p elements from one processor () to the remaining p-1 processors is to perform the CONCAT communication primitive, such that processors only prefetch data when it is from processor r. This algorithm is identical in complexity to Eq. (2). On the other hand, this problem can be solved using k-ary balanced tree algorithm [26], in which case the communication would be .

For larger q, a more efficient algorithm to broadcast the q elements from a single processor to p processors is based on the TRANSPOSE primitive. Processor r holds the q elements to be broadcast in the first column of matrix A. We perform the TRANSPOSE primitive, thus, giving every processor elements. Each processor then locally rearranges the data so that an additional TRANSPOSE data movement will result in each processor holding a copy of all the q elements in its column of A [26].

The analysis of this BCAST algorithm is simple. Since this algorithm just performs two TRANSPOSE Communication Primitives, the complexities of the BCAST Primitive are

 

See [3] and [4] for algorithmic details, performance analysis, and empirical results for these communication primitives.



next up previous
Next: Communication Primitive: PREFIX Up: Communication Library Primitives Previous: Communication Primitive: TRANSPOSE



David A. Bader
dbader@umiacs.umd.edu