Cloud9 was designed to serve both as a teaching tool and as a platform for research in text processing. It was used in a MapReduce course at the University of Maryland in Spring 2008, and is being used again in Fall 2008 for a similar course. The library is available via anonymous Subversion checkout. Like Hadoop, Cloud9 is distributed under the Apache License.
Starting Points
- Hadoop homepage
- Hadoop API (0.17.0)
- Cloud9 API javadoc
- Downloading Cloud9 and getting started with EC2
- Getting started with S3
- Accessing the Google/IBM CLuE cluster
- MapReduce and related bibliography
- Sample text collection, consisting of the Bible and the complete works of Shakespeare (~9 MB)
Next Steps
- Staging records and working with SequenceFiles
- Working with complex data types
- Primer on MapReduce algorithm design
- Working with counters
- Primer on HBase Shell, the command-line interface to HBase
- Using HBase with MapReduce
Exercises
Subversion Access
- umd-hadoop-core: https://subversion.umiacs.umd.edu/umd-hadoop/core
- umd-hadoop-dist: https://subversion.umiacs.umd.edu/umd-hadoop/dist
- Adding Cloud9 to your project
- Layout of project directory tree
This work is supported by the following sources: the Intramural Research Program of the NIH, National Library of Medicine; NSF under awards IIS-0705832 and IIS-0836560; DARPA/IPTO Contract No. HR0011-06-2-0001 under the GALE program; IBM and Google under the Academic Cloud Computing Initiative (ACCI); and Amazon. Any opinions, findings, conclusions, or recommendations expressed here do not necessarily reflect those of the sponsors.