Cloud9: Using HBase with MapReduce

by Jimmy Lin

(Page first created: 22 Mar 2008; last updated: )

Note: This page is out of date. It was written with respect to the distribution of HBase in hadoop-0.16.0.

It'd be a good idea to first read the primer on HBase Shell, the command-line interface to HBase.

There are three demo programs in Cloud9 that provide examples of how HBase might integrate with MapReduce jobs:

  • edu.umd.cloud9.demo.DemoHBaseClient provides examples of accessing the client API. It shows examples of how to insert data, query for data, and iterate over rows.
  • edu.umd.cloud9.demo.DemoHBaseSink illustrates using HBase as a data sink for MapReduce jobs, i.e., inserting the reducer's output into HBase. The demo inserts the sample collection into the "default:text" column of the "test" table.
  • edu.umd.cloud9.demo.DemoHBaseSource illustrates using HBase as a data source for MapReduce jobs, i.e., mapping over HBase rows. The demo performs word counting on the sample collection stored in HBase (the output of the previous demo). The collection is read from the "default:text" column of the "test" table.
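To make the data flow of the source demo concrete, here is a minimal sketch of word counting over the "default:text" column. It deliberately uses no HBase classes: the table is modeled as an in-memory map from row key to "family:qualifier" cells, and the class name, method names, row keys, and sample text are all hypothetical stand-ins rather than anything taken from Cloud9 or the HBase API.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for an HBase table (assumption: no real HBase API is used).
class HBaseWordCountSketch {

    // Models a table as rowKey -> ("family:qualifier" -> cell value).
    static Map<String, Map<String, String>> sampleTable() {
        Map<String, Map<String, String>> table = new TreeMap<>();
        // Hypothetical rows; the real demo stores the sample collection here.
        putCell(table, "doc1", "default:text", "hbase stores rows");
        putCell(table, "doc2", "default:text", "mapreduce maps over hbase rows");
        return table;
    }

    // Writes one cell, creating the row if it does not exist yet.
    static void putCell(Map<String, Map<String, String>> table,
                        String row, String column, String value) {
        table.computeIfAbsent(row, k -> new TreeMap<>()).put(column, value);
    }

    // Iterates over every row, reads the given column, and tallies tokens.
    // This plays the role of the map and reduce steps in a single loop.
    static Map<String, Integer> countWords(Map<String, Map<String, String>> table,
                                           String column) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> cells : table.values()) {
            String text = cells.get(column);
            if (text == null) {
                continue; // row has no value in this column
            }
            for (String token : text.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(sampleTable(), "default:text"));
    }
}
```

In the actual demo the iteration over rows is done by the MapReduce framework and the counts are produced by reducers; the sketch only shows which cells are read and how they are tokenized.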

Commentary

  • Currently, the demos have minimal error checking, and will croak if tables haven't been properly set up. DemoHBaseClient assumes the existence of a "test" table with two column families, "family1" and "family2". The source and sink demos assume the existence of a "test" table with a "default" column family.
  • The HBase API is rapidly changing, but these demos have been verified to work with the HBase version packaged with Hadoop 0.16.0. Nevertheless, I know for a fact that some of the code already uses APIs that are deprecated in the main trunk of the HBase svn repository.


Creative Commons: Attribution-Noncommercial-Share Alike 3.0 United States