We perform merge iterations, alternating between horizontal
and vertical merge phases. For each odd merge iteration t,
, we will perform the
horizontal merge phase, and similarly, for each even merge iteration
t,
, we will perform the
vertical merge phase.
Let t represent the current merge phase iteration, with . Thus, there will be
and
horizontal and vertical
merge phases, respectively.
During each merge, a subset of the processors will act as group
managers. These designated processors will prefetch the necessary
border information along the column (or row) that they are located
upon in the logical processor grid, setting up an equivalent graph
problem, running a sequential connected components algorithm on the
graph, noting any changes in the labels, and storing these changes
( pairs) in a shared structure. The
clients decide who their current group manager is and wait until the
list of label changes is ready. They retrieve the list, and all
processors make the necessary updates to a proper subset of their
labels.
During odd merge iterations t, the horizontal merge phases, processors are group managers if they reside in the logical grid with both
Similarly, during the even merge iterations t, the vertical merge phases, processors are group managers if they reside in the logical grid with both
Figure 4: Data Layout of a 512 x 512 Image on 32 Processors - Vertical Merge (t=2)
An example data layout and merge is given in
Figure 4. This image is , distributed
onto a
logical processor grid, with each tile being
pixels in size. This example shows the second merge step, a
vertical merge, for t = 2. Group managers are, thus, any processor
sitting in the logical processor grid with both last bits of the row
and column numbers' binary representation equal to `0'. These group
managers, along with their respective borders to be merged, are
circled in this figure. Suppose now that
, and we are at
the
merge phase, which will be a horizontal merge.
A processor in this case is a group manager if it is in a logical grid
position whose row number's binary representation ends with 0111,
and whose column number's binary representation ends with 0000.
For a horizontal merge, the group manager will prefetch the pixel
colors and labels from the vertical borders to be merged, which spans
across rows of processors. There are 2 q (
) pixels per processor row in the border to be
merged, meaning that
pixels and an equal
number of labels need to be prefetched from the clients, while q
pixels and q labels are locally available. Thus, each prefetch in
the horizontal merge can be done in
and
.
Similarly for a vertical merge, the group manager will prefetch the
pixel colors and labels from the horizontal borders to be merged,
which spans across columns of processors. There are
2 r pixels per processor column in the border to be merged, meaning
that
pixels and an equal number of labels
need to be prefetched from the clients for each iteration, while r
pixels and r labels are locally available. Thus, each prefetch in
the vertical merge can be done in
and
.
Note that the running time of this prefetching is improved by using a second processor, called a shadow manager, which is designated as the processor adjacent to the group manager, directly across the border being merged. Using this implementation, both the group and shadow manager prefetch only their side of the border, respectively, and sort each border by label. The reasons for this sorting will be described below. The group manager then prefetches the sorted results from the shadow manager and continues on with the algorithm. From this point on, the shadow manager reverts back to being a client of this group manager.
The total complexities for prefetching summed up over the
horizontal merges and the
vertical merges are
The merging problem is converted into finding the connected components
of a graph represented by the border pixels. We use an adjacency list
representation for the graph, and add vertices to the graph
representing colored pixels. Two types of edges are added to the
graph. First, pixels are scanned down the left (or upper) border, and
edges are strung linearly down the list between pixels containing the
same connected component label. The same is done for pixels on the
right (or lower) border. The second step adds edges between pixels of
the left (upper) and right (lower) border which are both like-colored
pixels and adjacent to each other. We scan down the left column
(upper row) elements, and if we are at a colored pixel, we check the
pixels in the right column (lower row) adjacent to it. In order to add
the first type of edges, the pixels are sorted according to their label
for both the left (upper) and right (lower) border by using radix
sort. Note
the discussion above regarding the use of a shadow manager. A
secondary processor is used to prefetch and sort the border elements
on the opposite side of the border from the group manager, and the
results are then sent to the group manager. This sort takes
steps for a border of |V| nodes
. The maximum number of edges attached to each
vertex in this graph is at most five; two edges in its own column to
pixels above and below of the same label plus the three adjacent
pixels in the right column. Thus, inserting an edge into the
adjacency list takes at most five steps, and we add at most 5 |V|
edges. For each horizontal merge step, the number of vertices
, and for each vertical merge step,
. Thus, the construction of this graph
summed over all the iterations of the connected components algorithm
takes
A sequential breadth-first search based connected components algorithm
computes the connected components of this graph. It runs in O steps, with
for
horizontal merges and O
for vertical merges.
The pixels in this graph are then scanned again, and any changes in
the labeling (
changing to label
) are eventually
stored in a sorted array of all unique changes
. The following algorithm describes the procedure for
creating the sorted array of label changes from the original arrays.
There are at most 2 |V| changes, so Steps 1, 2, and 3 take
O time. Thus, the creation of the sorted array of label
changes takes O
time. Summing over the
steps, this
is equivalent to
.
The array structure is actually two contiguous arrays, one holding the
obsolete labels ('s) and the other holding the corresponding
new labels (
's). The size of these arrays of
's and
's is also placed into a shared memory location.
Now all the processors hit a barrier and wait until everyone has
completed their tasks. After the barrier, the group manager will
update its pixels' labels in O by the following procedure.
After the initial tile labelings, but before the merging iterations, each processor creates a sorted array of hooks to each local component containing a border pixel of the tile. There will be exactly one hook for each of these components, including the initial label of that component and the offset address in the tile of any pixel in that component. This is done as follows:
This initialization takes computational complexity of
O for each of Steps 1, 2, and 3,
yielding a total of
.
Figure 5: An example of Tile Hooks
At the conclusion of each of the merging steps, only the
labels of pixels on the border of each tile are updated. There is no
need to relabel interior pixels since they are not used in the merging
stage. Only boundary pixels need their labels updated. The procedure
is simple; for each colored pixel on the boundary, we will binary
search the list of label changes in
per step. The total computational complexity over the
merging iterations is then
.
At the end of the last merging step, each processor must update its
interior pixel labels. Each hook described above is compared to the
current label at the hook's offset position index. If the hook's label
is different from the current label at position
i, the processor will run a breadth-first search relabeling
technique beginning at pixel i, relabeling all the connected pixels'
labels to the new label. Since there is only one hook per tile
component on the border, the breadth-first search relabeling procedure
takes O
time.
The total complexity associated with updating the labels of each tile
is
assuming
for large enough n. For
,
is sufficient.
After each merging step label update, the manager hits another barrier, waiting for the end of this iteration.
In summary, the group managers' routine has the following complexities: