Next: About this document Up: The Block Distributed Memory Previous: Matrix Multiplication

References

1
A. Aggarwal, A. Chandra, and M. Snir, On Communication Latency in PRAM Computations, Proc. 1st ACM Symp. on Parallel Algorithms and Architectures, pp. 11-21, June 1989.
2
A. Aggarwal, A.K. Chandra, and M. Snir, Hierarchical Memory with Block Transfer, Proc. 28th Annual Symp. on Foundations of Computer Science, pp. 204-216, Oct. 1987.
3
Anant Agarwal, B.-H. Lim, D. Kranz, and J. Kubiatowicz, APRIL: A Processor Architecture for Multiprocessing, Proc. of the 17th Annual International Symp. on Computer Architecture, pp. 104-114, May 1990.
4
A. Bar-Noy and S. Kipnis, Designing Broadcasting Algorithms in the Postal Model for Message-Passing Systems, Proc. 4th Symp. on Parallel Algorithms and Architectures, pp. 13-22, July 1992.
5
A. Bar-Noy and S. Kipnis, Multiple Message Broadcasting in the Postal Model, Proc. 7th International Parallel Processing Symp., pp. 463-470, April 1993.
6
G.E. Blelloch et al., A Comparison of Sorting Algorithms for the Connection Machine CM-2, Proc. 3rd Symp. on Parallel Algorithms and Architectures, pp. 3-16, July 1991.
7
H. Burkhardt III, S. Frank, B. Knobe, and J. Rothnie, Overview of the KSR1 Computer System, TR KSR_TR_9202001, Kendall Square Research, Boston, Feb. 1992.
8
D. Culler et al., LogP: Toward a Realistic Model of Parallel Computation, Proc. 4th ACM PPOPP, pp. 1-12, May 1993.
9
P.B. Gibbons, Asynchronous PRAM Algorithms, a chapter in Synthesis of Parallel Algorithms, J.H. Reif, editor, Morgan Kaufmann, 1990.
10
A. Gupta and V. Kumar, Scalability of Parallel Algorithms for the Matrix Multiplication, 1993 International Conference on Parallel Processing, Vol. III, pp. 115-123.
11
E. Hagersten, S. Haridi, and D. Warren, The Cache-Coherence Protocol of the Data Diffusion Machine, M. Dubois and S. Thakkar, editors, Cache and Interconnect Architectures in Multiprocessors, Kluwer Academic Publishers, 1990.
12
J. JáJá and K.W. Ryu, Load Balancing and Routing on the Hypercube and Related Networks, Journal of Parallel and Distributed Computing 14, pp. 431-435, 1992.
13
R.M. Karp, A. Sahay, E.E. Santos, K.E. Schauser, Optimal Broadcast and Summation in the LogP Model, Proc. 5th Symp. on Parallel Algorithms and Architectures, pp. 142-153, July 1993.
14
C.P. Kruskal, L. Rudolph, and M. Snir, A Complexity Theory of Efficient Parallel Algorithms, Theoretical Computer Science 71, pp. 95-132, 1990.
15
T. Leighton, Tight Bounds on the Complexity of Parallel Sorting, IEEE Trans. Comp. C-34(4), pp. 344-354, April 1985.
16
D. Lenoski et al., The Stanford Dash Multiprocessor, IEEE Computer 25(3), pp. 63-79, March 1992.
17
C. Van Loan, Computational Frameworks for the Fast Fourier Transform, SIAM, 1992.
18
J.M. Marberg and E. Gafni, Sorting in Constant Number of Row and Column Phases on a Mesh, Algorithmica 3, pp. 561-572, 1988.
19
K. Mehrotra, S. Ranka, and J.-C. Wang, A Probabilistic Analysis of a Locality Maintaining Load Balancing Algorithm, Proc. 7th International Parallel Processing Symp., pp. 369-373, April 1993.
20
S. Rajasekaran and T. Tsantilas, Optimal Routing Algorithms for Mesh-connected Processor Arrays, Algorithmica 8, pp. 21-38, 1992.
21
K.W. Ryu and J. JáJá, Efficient Algorithms for List Ranking and for Solving Graph Problems on the Hypercube, IEEE Trans. Parallel and Distributed Systems 1(1), pp. 83-90, Jan. 1990.
22
H. Shi and J. Schaeffer, Parallel Sorting by Regular Sampling, J. of Parallel and Distributed Computing 14, pp. 361-372, 1992.
23
H.J. Siegel et al., Report of the Purdue Workshop on Grand Challenges in Computer Architecture for the Support of High Performance Computing, J. of Parallel and Distributed Computing 16(3), pp. 198-211, 1992.
24
L.G. Valiant, A Bridging Model for Parallel Computation, CACM 33(8), pp. 103-111, Aug. 1990.





joseph@umiacs.umd.edu