Description
The Cloud9 library is split across two Subversion repositories: umd-hadoop-core and umd-hadoop-dist. Here's a description of what's in each:
umd-hadoop-core
- build/: automatically built class files.
- etc/: miscellaneous scripts and files.
- src/: root of the source tree.
umd-hadoop-dist
- cloud9-docs/: directory that holds all the documentation (this file, for example).
- cloud9-docs/api/: Javadoc API documentation for Cloud9, which is automatically generated by ant. To generate it, go into umd-hadoop-core and type "ant javadoc".
- hadoop/: tarballs of different Hadoop distributions. Take the latest one and unpack it.
- jars/: holds jars of other dependencies needed by the library.
- regression-results/: results of the JUnit regression tests. This directory is automatically populated by the test framework; to run the JUnit tests, go into umd-hadoop-core and type "ant test".
- sample-input/: data files needed for the demos.
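The commands mentioned above can be collected into a few small shell helpers. This is only a sketch under stated assumptions: the function names are hypothetical, the tarballs in hadoop/ are assumed to end in .tar.gz, and "latest" is approximated by a plain lexical sort of file names, so check the result by eye before relying on it.

```shell
# Hypothetical helpers; none of these names come from Cloud9 itself.
# Assumption: tarballs in umd-hadoop-dist/hadoop/ end in .tar.gz.

# Print the tarball in directory $1 whose name sorts last.
# Lexical order is only a rough stand-in for "latest version".
latest_tarball() {
    ls "$1"/*.tar.gz 2>/dev/null | sort | tail -n 1
}

# Unpack the latest tarball from $1/hadoop into destination directory $2.
# The destination is the caller's choice; this document does not prescribe one.
unpack_latest_hadoop() {
    tarball=$(latest_tarball "$1/hadoop")
    if [ -z "$tarball" ]; then
        echo "no .tar.gz file found in $1/hadoop" >&2
        return 1
    fi
    mkdir -p "$2" && tar -xzf "$tarball" -C "$2"
}

# Run an ant target (for example javadoc or test) from inside directory $1.
# The subshell keeps the caller's working directory unchanged.
run_ant_target() {
    (cd "$1" && ant "$2")
}
```

With both repositories checked out side by side, typical calls might be `unpack_latest_hadoop umd-hadoop-dist "$HOME/hadoop"`, `run_ant_target umd-hadoop-core javadoc`, and `run_ant_target umd-hadoop-core test`.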
Design Rationale
The project directory tree layout for Cloud9 is peculiar for two reasons:
- We are distributing Hadoop along with Cloud9 to eliminate hidden dependencies on Hadoop versions. In the early stages of a software development project, there may be significant changes from version to version, and worse still, many of these changes might not be properly documented. Putting Hadoop itself in Subversion gives us a clean way to roll out new versions and ensures compatibility with the Cloud9 libraries. Thus, projects built on top of Cloud9 won't have to worry about upgrading Hadoop versions and will be better insulated from version-specific quirks.
- We want to keep umd-hadoop-core as small as possible, since it is a possible launch site for Hadoop when using the Eclipse plug-in (this means that you can right-click on a class in Eclipse and run the code on a Hadoop cluster). The Hadoop Eclipse plug-in jars up everything in the current project and ships the jar over to the cluster, so you don't want to waste effort packing and shipping extra bits that aren't going to be used.