Merging Algorithm - Group Managers' Task

Next: Merging algorithm - Up: Connected Components of Previous: Merging Algorithm -

Merging Algorithm - Group Managers' Task

We perform merge iterations, alternating between horizontal and vertical merge phases. For each odd merge iteration t, , we will perform the horizontal merge phase, and similarly, for each even merge iteration t, , we will perform the vertical merge phase.

Let t represent the current merge phase iteration, with . Thus, there will be and horizontal and vertical merge phases, respectively.

During each merge, a subset of the processors will act as group managers. These designated processors will prefetch the necessary border information along the column (or row) that they are located upon in the logical processor grid, setting up an equivalent graph problem, running a sequential connected components algorithm on the graph, noting any changes in the labels, and storing these changes ( pairs) in a shared structure. The clients decide who their current group manager is and wait until the list of label changes is ready. They retrieve the list, and all processors make the necessary updates to a proper subset of their labels.

During odd merge iterations t, the horizontal merge phases, processors are group managers if they reside in the logical grid with both

row numbers whose binary representation end with a 0 followed by () 1's (or just ending in a 0 when t = 1), and
column numbers whose binary representation end in 0's.

Similarly, during the even merge iterations t, the vertical merge phases, processors are group managers if they reside in the logical grid with both

row numbers whose binary representation end in 0's, and
column numbers whose binary representation end with a 0 followed by () 1's (or just ending in a 0 when t = 2).

Figure 4: Data Layout of a 512 x 512 Image on 32 Processors - Vertical Merge (t=2)

An example data layout and merge is given in Figure 4. This image is , distributed onto a logical processor grid, with each tile being pixels in size. This example shows the second merge step, a vertical merge, for t = 2. Group managers are, thus, any processor sitting in the logical processor grid with both last bits of the row and column numbers' binary representation equal to `0'. These group managers, along with their respective borders to be merged, are circled in this figure. Suppose now that , and we are at the merge phase, which will be a horizontal merge. A processor in this case is a group manager if it is in a logical grid position whose row number's binary representation ends with 0111, and whose column number's binary representation ends with 0000.

For a horizontal merge, the group manager will prefetch the pixel colors and labels from the vertical borders to be merged, which spans across rows of processors. There are 2 q () pixels per processor row in the border to be merged, meaning that pixels and an equal number of labels need to be prefetched from the clients, while q pixels and q labels are locally available. Thus, each prefetch in the horizontal merge can be done in and .

Similarly for a vertical merge, the group manager will prefetch the pixel colors and labels from the horizontal borders to be merged, which spans across columns of processors. There are 2 r pixels per processor column in the border to be merged, meaning that pixels and an equal number of labels need to be prefetched from the clients for each iteration, while r pixels and r labels are locally available. Thus, each prefetch in the vertical merge can be done in and .

Note that the running time of this prefetching is improved by using a second processor, called a shadow manager, which is designated as the processor adjacent to the group manager, directly across the border being merged. Using this implementation, both the group and shadow manager prefetch only their side of the border, respectively, and sort each border by label. The reasons for this sorting will be described below. The group manager then prefetches the sorted results from the shadow manager and continues on with the algorithm. From this point on, the shadow manager reverts back to being a client of this group manager.

The total complexities for prefetching summed up over the horizontal merges and the vertical merges are

The merging problem is converted into finding the connected components of a graph represented by the border pixels. We use an adjacency list representation for the graph, and add vertices to the graph representing colored pixels. Two types of edges are added to the graph. First, pixels are scanned down the left (or upper) border, and edges are strung linearly down the list between pixels containing the same connected component label. The same is done for pixels on the right (or lower) border. The second step adds edges between pixels of the left (upper) and right (lower) border which are both like-colored pixels and adjacent to each other. We scan down the left column (upper row) elements, and if we are at a colored pixel, we check the pixels in the right column (lower row) adjacent to it. In order to add the first type of edges, the pixels are sorted according to their label for both the left (upper) and right (lower) border by using radix sort. Note the discussion above regarding the use of a shadow manager. A secondary processor is used to prefetch and sort the border elements on the opposite side of the border from the group manager, and the results are then sent to the group manager. This sort takes steps for a border of |V| nodes. The maximum number of edges attached to each vertex in this graph is at most five; two edges in its own column to pixels above and below of the same label plus the three adjacent pixels in the right column. Thus, inserting an edge into the adjacency list takes at most five steps, and we add at most 5 |V| edges. For each horizontal merge step, the number of vertices , and for each vertical merge step, . Thus, the construction of this graph summed over all the iterations of the connected components algorithm takes

A sequential breadth-first search based connected components algorithm computes the connected components of this graph. It runs in O steps, with for horizontal merges and O for vertical merges. The pixels in this graph are then scanned again, and any changes in the labeling ( changing to label ) are eventually stored in a sorted array of all unique changes . The following algorithm describes the procedure for creating the sorted array of label changes from the original arrays.

There are at most 2 |V| changes, so Steps 1, 2, and 3 take O time. Thus, the creation of the sorted array of label changes takes O time. Summing over the steps, this is equivalent to .

The array structure is actually two contiguous arrays, one holding the obsolete labels ('s) and the other holding the corresponding new labels ('s). The size of these arrays of 's and 's is also placed into a shared memory location.

Now all the processors hit a barrier and wait until everyone has completed their tasks. After the barrier, the group manager will update its pixels' labels in O by the following procedure.

After the initial tile labelings, but before the merging iterations, each processor creates a sorted array of hooks to each local component containing a border pixel of the tile. There will be exactly one hook for each of these components, including the initial label of that component and the offset address in the tile of any pixel in that component. This is done as follows:

This initialization takes computational complexity of O for each of Steps 1, 2, and 3, yielding a total of .

Figure 5: An example of Tile Hooks

At the conclusion of each of the merging steps, only the labels of pixels on the border of each tile are updated. There is no need to relabel interior pixels since they are not used in the merging stage. Only boundary pixels need their labels updated. The procedure is simple; for each colored pixel on the boundary, we will binary search the list of label changes in per step. The total computational complexity over the merging iterations is then .

At the end of the last merging step, each processor must update its interior pixel labels. Each hook described above is compared to the current label at the hook's offset position index. If the hook's label is different from the current label at position i, the processor will run a breadth-first search relabeling technique beginning at pixel i, relabeling all the connected pixels' labels to the new label. Since there is only one hook per tile component on the border, the breadth-first search relabeling procedure takes O time.

The total complexity associated with updating the labels of each tile is
assuming for large enough n. For , is sufficient.

After each merging step label update, the manager hits another barrier, waiting for the end of this iteration.

In summary, the group managers' routine has the following complexities:

Next: Merging algorithm - Up: Connected Components of Previous: Merging Algorithm -

David A. Bader
dbader@umiacs.umd.edu