The client processors are any processor not selected in the current iteration to run the group manager tasks. These processors calculate the logical processor grid address of the manager in charge of their border to be merged and wait for the first barrier. After this barrier, the clients prefetch the size ( chSize) of the list of change pairs from the manager in , where and are the number of vertical and horizontal merges, respectively, performed inclusively during the merge phase.
Next, the clients prefetch a block of chSize change pairs from the manager. This is done in for horizontal merges, and for vertical merges, since there are at most (or ) changes, and exactly processors requesting these change pairs from each group manager. The client processors use the same procedure described in the previous section for relabeling their border pixels at the end of each merge iteration, and the interior pixels after the final merge. After each pixel label update, the clients hit another barrier, and wait for the end of this iteration.
Over the iterations, the clients' routine has the following complexities:
Clearly, for large p, this is not an optimal procedure for distributing the list of change pairs from a group manager to the respective clients. If a manager has clients at the end of iteration i, , instead of sending the entire list of change pairs to processors, a distribution algorithm based on the matrix transpose can be used. Using this algorithm, the manager will send blocks of size to each of processors during the first phase. All of the processors repeat this operation by concurrently sending their block to the other processors, in a circular fashion. The complexities for this are
The clients' complexities are thus improved to: