The client processors are any processor not selected in the current
iteration to run the group manager tasks. These processors calculate
the logical processor grid address of the manager in charge of their
border to be merged and wait for the first barrier. After this
barrier, the clients prefetch the size ( chSize) of the list of
change pairs from the manager in , where
and
are the number of vertical and
horizontal merges, respectively, performed inclusively during the
merge phase.
Next, the clients prefetch a block of chSize
change pairs from the manager. This is done in
for horizontal merges, and
for
vertical merges, since there are at most
(or
) changes, and exactly
processors requesting these change pairs from each group manager. The
client processors use the same procedure described in the previous
section for relabeling their border pixels at the end of each merge
iteration, and the interior pixels after the final merge. After each
pixel label update, the clients hit another barrier, and wait for the
end of this iteration.
Over the iterations, the clients' routine has the following
complexities:
Clearly, for large p, this is not an optimal procedure for
distributing the list of change pairs from a group manager to the
respective clients. If a manager has clients at the end of
iteration i,
, instead of sending the entire list
of
change pairs to
processors, a distribution
algorithm based on the matrix transpose can be used. Using this
algorithm, the manager will send blocks of size
to
each of
processors during the first phase. All of the
processors repeat this operation by concurrently sending their block
to the other processors, in a circular fashion. The complexities for
this are
The clients' complexities are thus improved to: