References


1: A. Aggarwal, A.K. Chandra, and M. Snir, On Communication Latency in PRAM Computations, Proc. 1st ACM Symp. on Parallel Algorithms and Architectures, pp. 11-21, June 1989.
2: A. Aggarwal, A.K. Chandra, and M. Snir, Hierarchical Memory with Block Transfer, Proc. 28th Annual Symp. on Foundations of Computer Science, pp. 204-216, Oct. 1987.
3: Anant Agarwal, B.-H. Lim, D. Kranz, and J. Kubiatowicz, APRIL: A Processor Architecture for Multiprocessing, Proc. of the 17th Annual International Symp. on Computer Architecture, pp. 104-114, May 1990.
4: A. Bar-Noy and S. Kipnis, Designing Broadcasting Algorithms in the Postal Model for Message-Passing Systems, Proc. 4th Symp. on Parallel Algorithms and Architectures, pp. 13-22, July 1992.
5: A. Bar-Noy and S. Kipnis, Multiple Message Broadcasting in the Postal Model, Proc. 7th International Parallel Processing Symp., pp. 463-470, April 1993.
6: G.E. Blelloch et al., A Comparison of Sorting Algorithms for the Connection Machine CM-2, Proc. 3rd Symp. on Parallel Algorithms and Architectures, pp. 3-16, July 1991.
7: H. Burkhardt III, S. Frank, B. Knobe, and J. Rothnie, Overview of the KSR1 Computer System, TR KSR_TR_9202001, Kendall Square Research, Boston, Feb. 1992.
8: D. Culler et al., LogP: Towards a Realistic Model of Parallel Computation, Proc. 4th ACM PPOPP, pp. 1-12, May 1993.
9: P.B. Gibbons, Asynchronous PRAM Algorithms, a chapter in Synthesis of Parallel Algorithms, J.H. Reif, editor, Morgan Kaufmann, 1990.
10: A. Gupta and V. Kumar, Scalability of Parallel Algorithms for the Matrix Multiplication, 1993 International Conference on Parallel Processing, Vol. III, pp. 115-123.
11: E. Hagersten, S. Haridi, and D. Warren, The Cache-Coherence Protocol of the Data Diffusion Machine, M. Dubois and S. Thakkar, editors, Cache and Interconnect Architectures in Multiprocessors, Kluwer Academic Publishers, 1990.
12: J. JáJá and K.W. Ryu, Load Balancing and Routing on the Hypercube and Related Networks, Journal of Parallel and Distributed Computing 14, pp. 431-435, 1992.
13: R.M. Karp, A. Sahay, E.E. Santos, and K.E. Schauser, Optimal Broadcast and Summation in the LogP Model, Proc. 5th Symp. on Parallel Algorithms and Architectures, pp. 142-153, July 1993.
14: C.P. Kruskal, L. Rudolph, and M. Snir, A Complexity Theory of Efficient Parallel Algorithms, Theoretical Computer Science 71, pp. 95-132, 1990.
15: T. Leighton, Tight Bounds on the Complexity of Parallel Sorting, IEEE Trans. Comp. C-34(4), pp. 344-354, April 1985.
16: D. Lenoski et al., The Stanford Dash Multiprocessor, IEEE Computer 25(3), pp. 63-79, March 1992.
17: C. Van Loan, Computational Frameworks for the Fast Fourier Transform, SIAM, 1992.
18: J.M. Marberg and E. Gafni, Sorting in Constant Number of Row and Column Phases on a Mesh, Algorithmica 3, pp. 561-572, 1988.
19: K. Mehrotra, S. Ranka, and J.-C. Wang, A Probabilistic Analysis of a Locality Maintaining Load Balancing Algorithm, Proc. 7th International Parallel Processing Symp., pp. 369-373, April 1993.
20: S. Rajasekaran and T. Tsantilas, Optimal Routing Algorithms for Mesh-connected Processor Arrays, Algorithmica 8, pp. 21-38, 1992.
21: K.W. Ryu and J. JáJá, Efficient Algorithms for List Ranking and for Solving Graph Problems on the Hypercube, IEEE Trans. Parallel and Distributed Systems 1(1), pp. 83-90, Jan. 1990.
22: H. Shi and J. Schaeffer, Parallel Sorting by Regular Sampling, J. of Parallel and Distributed Computing 14, pp. 361-372, 1992.
23: H.J. Siegel et al., Report of the Purdue Workshop on Grand Challenges in Computer Architecture for the Support of High Performance Computing, J. of Parallel and Distributed Computing 16(3), pp. 198-211, 1992.
24: L.G. Valiant, A Bridging Model for Parallel Computation, CACM 33(8), pp. 103-111, Aug. 1990.


joseph@umiacs.umd.edu