Given a matrix of q = sp elements distributed across a p-processor partition, the GATHER Primitive converts the data layout so that all sp elements are held in an array local to a single processor. A simple algorithm logically replicates the input data so that there are p copies in contiguous memory, and then calls the TRANSPOSE Communication Primitive. The inverse operation is SCATTER, in which a single column of q elements on one processor is divided into p equal-sized chunks and transposed to fill a distributed layout. The analysis for these two primitives is given in Eq. (3).
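The replicate-then-transpose idea can be sketched as a sequential simulation. This is an illustration only, not the authors' implementation: the function names `transpose`, `gather`, and `scatter`, the `root` parameter, and the use of nested Python lists to stand in for per-processor memory are all assumptions. The sketch also assumes that each of the p processors initially holds s elements, and that TRANSPOSE sends the j-th chunk of s elements on every processor to processor j.

```python
def transpose(local):
    """Simulated TRANSPOSE. local[i] is processor i's list of q = s*p elements."""
    p = len(local)
    q = len(local[0])
    assert all(len(row) == q for row in local) and q % p == 0
    s = q // p
    # Processor j receives chunk j (s elements) from every processor i, ordered by i.
    return [
        [x for i in range(p) for x in local[i][j * s:(j + 1) * s]]
        for j in range(p)
    ]


def gather(local, root=0):
    """GATHER sketch. local[i] holds processor i's s elements (assumed layout)."""
    p = len(local)
    # Logically replicate each processor's data into p contiguous copies.
    replicated = [row * p for row in local]
    # After the transpose, every simulated processor holds all s*p elements;
    # only the copy on the root is returned.
    return transpose(replicated)[root]


def scatter(column, p, root=0):
    """SCATTER sketch. The root holds q elements; each processor ends with q/p."""
    q = len(column)
    assert q % p == 0
    s = q // p
    filler = [None] * q  # placeholder data on the non-root processors
    local = [column if i == root else filler for i in range(p)]
    received = transpose(local)
    # Processor j keeps only the chunk that came from the root.
    return [received[j][root * s:(root + 1) * s] for j in range(p)]
```

With p = 3 and s = 2, calling `gather([[0, 1], [2, 3], [4, 5]])` collects the six elements in processor order, and `scatter` applied to that result with `p=3` recovers the original per-processor chunks. The simulation moves p copies of the data through the transpose, which is the cost of reusing TRANSPOSE rather than writing a dedicated gather.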