Start Here
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. (2003) The Google File System. Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP-03), pages 29-43.
Jeffrey Dean and Sanjay Ghemawat. (2004) MapReduce: Simplified Data Processing on Large Clusters. Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI 2004), pages 137-150.
Jeffrey Dean and Sanjay Ghemawat. (2008) MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM, 51(1):107-113.
Infrastructure
Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. (2003) Xen and the Art of Virtualization. Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP-03), pages 164-177.
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, and Robert Gruber. (2006) Bigtable: A Distributed Storage System for Structured Data. Proceedings of the 7th Symposium on Operating System Design and Implementation (OSDI 2006), pages 205-218.
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall, and Werner Vogels. (2007) Dynamo: Amazon's Highly Available Key-Value Store. Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP 2007), pages 205-220.
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. (2007) Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. Proceedings of the ACM SIGOPS/EuroSys European Conference on Computer Systems 2007 (EuroSys 2007), pages 59-72.
Rob Pike, Sean Dorward, Robert Griesemer, and Sean Quinlan. (2005) Interpreting the Data: Parallel Analysis with Sawzall. Scientific Programming Journal, 13(4):277-298.
Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, and Christos Kozyrakis. (2007) Evaluating MapReduce for Multi-core and Multiprocessor Systems. Proceedings of the 13th International Symposium on High-Performance Computer Architecture (HPCA 2007), pages 13-24.
Hung-chih Yang, Ali Dasdan, Ruey-Lung Hsiao, and D. Stott Parker. (2007) Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters. Proceedings of ACM SIGMOD International Conference on Management of Data, pages 1029-1040.
Applications
Cheng-Tao Chu, Sang Kyun Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Andrew Ng, and Kunle Olukotun. (2006) Map-Reduce for Machine Learning on Multicore. Advances in Neural Information Processing Systems 19 (NIPS 2006), pages 281-288.
Chris Dyer, Aaron Cordova, Alex Mont, and Jimmy Lin. (2008) Fast, Easy, and Cheap: Construction of Statistical Machine Translation Models with MapReduce. Proceedings of the Third Workshop on Statistical Machine Translation at ACL 2008, pages 199-207.
Tamer Elsayed, Jimmy Lin, and Douglas Oard. (2008) Pairwise Document Similarity in Large Collections with MapReduce. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL 2008), Companion Volume, pages 265-268.
Jimmy Lin. (2008) Exploring Large-Data Issues in the Curriculum: A Case Study with MapReduce. Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics (TeachCL-08) at ACL 2008, pages 54-61.
Jimmy Lin. (2008) Scalable Language Processing Algorithms for the Masses: A Case Study in Computing Word Co-occurrence Matrices with MapReduce. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP 2008).