Given an associative operator (e.g. +, , , , etc.) and a shared input array on a p processor partition, distributed with one element per processor, the PREFIX Communication Library Primitive coalesces the data such that each processor k contains a single element . Parallel computers can handle this efficiently when the element is assumed to reside on processor k [10], and SPLIT-C implements this as a primitive library function. An analysis for this operation on the BDM model is given in [4]. Since these rounds can be realized with an CONCAT primitive operation followed by O local computation of the prefix-sums, the resulting complexity is
Note that our algorithm can perform a stronger operation for the same complexity; namely, all p prefix-sums values can be made available as local elements on all processors. Thus, each processor k contains , for all . This is equivalent to calling CONCATPREFIX.