Built-In Counters
Counters are lightweight objects in Hadoop that allow you to keep track of system progress in both the map and reduce stages of processing. By default, Hadoop defines a number of standard counters in "groups"; these show up in the jobtracker Webapp, giving you information such as "Map input records", "Map output records", etc.
Here's some sample code that programmatically iterates through all available counters:
RunningJob job = JobClient.runJob(conf); // blocks until job completes
Counters counters = job.getCounters();
for (Counters.Group group : counters) {
  System.out.println("- Counter Group: " + group.getDisplayName() + " (" + group.getName() + ")");
  System.out.println(" number of counters in this group: " + group.size());
  for (Counters.Counter counter : group) {
    System.out.println(" - Counter: " + counter.getDisplayName());
  }
}
The above fragment will generate the following output:
- Counter Group: File Systems (org.apache.hadoop.mapred.Task$FileSystemCounter)
 number of counters in this group: 4
 - Counter: Local bytes read
 - Counter: Local bytes written
 - Counter: HDFS bytes read
 - Counter: HDFS bytes written
- Counter Group: Job Counters (org.apache.hadoop.mapred.JobInProgress$Counter)
 number of counters in this group: 4
 - Counter: Launched map tasks
 - Counter: Launched reduce tasks
 - Counter: Data-local map tasks
 - Counter: Rack-local map tasks
- Counter Group: Map-Reduce Framework (org.apache.hadoop.mapred.Task$Counter)
 number of counters in this group: 9
 - Counter: Map input records
 - Counter: Map output records
 - Counter: Map input bytes
 - Counter: Map output bytes
 - Counter: Combine input records
 - Counter: Combine output records
 - Counter: Reduce input groups
 - Counter: Reduce input records
 - Counter: Reduce output records
As of Hadoop 0.18.1, accessing counter values directly is a bit awkward. The easiest way to look up a counter is by its enum, but unfortunately the enums that correspond to the built-in counters are not publicly accessible. See JIRA 4043 for more information about this. The current workaround is to look up each counter by its group name, numeric id, and display name, which is more verbose. Here is code that fetches the values of all the built-in counters:
Counters.Counter counter = counters.findCounter("org.apache.hadoop.mapred.Task$FileSystemCounter", 0, "Local bytes read");
System.out.println("Local bytes read: " + counter.getCounter());
counter = counters.findCounter("org.apache.hadoop.mapred.Task$FileSystemCounter", 1, "Local bytes written");
System.out.println("Local bytes written: " + counter.getCounter());
counter = counters.findCounter("org.apache.hadoop.mapred.Task$FileSystemCounter", 2, "HDFS bytes read");
System.out.println("HDFS bytes read: " + counter.getCounter());
counter = counters.findCounter("org.apache.hadoop.mapred.Task$FileSystemCounter", 3, "HDFS bytes written");
System.out.println("HDFS bytes written: " + counter.getCounter());
counter = counters.findCounter("org.apache.hadoop.mapred.JobInProgress$Counter", 2, "Launched map tasks");
System.out.println("Launched map tasks: " + counter.getCounter());
counter = counters.findCounter("org.apache.hadoop.mapred.JobInProgress$Counter", 3, "Launched reduce tasks");
System.out.println("Launched reduce tasks: " + counter.getCounter());
counter = counters.findCounter("org.apache.hadoop.mapred.JobInProgress$Counter", 4, "Data-local map tasks");
System.out.println("Data-local map tasks: " + counter.getCounter());
counter = counters.findCounter("org.apache.hadoop.mapred.JobInProgress$Counter", 5, "Rack-local map tasks");
System.out.println("Rack-local map tasks: " + counter.getCounter());
counter = counters.findCounter("org.apache.hadoop.mapred.Task$Counter", 0, "Map input records");
System.out.println("Map input records: " + counter.getCounter());
counter = counters.findCounter("org.apache.hadoop.mapred.Task$Counter", 1, "Map output records");
System.out.println("Map output records: " + counter.getCounter());
counter = counters.findCounter("org.apache.hadoop.mapred.Task$Counter", 2, "Map input bytes");
System.out.println("Map input bytes: " + counter.getCounter());
counter = counters.findCounter("org.apache.hadoop.mapred.Task$Counter", 3, "Map output bytes");
System.out.println("Map output bytes: " + counter.getCounter());
counter = counters.findCounter("org.apache.hadoop.mapred.Task$Counter", 4, "Combine input records");
System.out.println("Combine input records: " + counter.getCounter());
counter = counters.findCounter("org.apache.hadoop.mapred.Task$Counter", 5, "Combine output records");
System.out.println("Combine output records: " + counter.getCounter());
counter = counters.findCounter("org.apache.hadoop.mapred.Task$Counter", 6, "Reduce input groups");
System.out.println("Reduce input groups: " + counter.getCounter());
counter = counters.findCounter("org.apache.hadoop.mapred.Task$Counter", 7, "Reduce input records");
System.out.println("Reduce input records: " + counter.getCounter());
counter = counters.findCounter("org.apache.hadoop.mapred.Task$Counter", 8, "Reduce output records");
System.out.println("Reduce output records: " + counter.getCounter());
Custom Counters
In addition to the built-in counters associated with every Hadoop job, you can create your own counters. First, define an enum that contains a list of the counters you'd like to have:
protected static enum MyCounter {
  INPUT_WORDS
}
Then, in the mapper or reducer, all you have to do is update the counter via the reporter object:
reporter.incrCounter(MyCounter.INPUT_WORDS, 1);
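To try the pattern without a cluster, here is a minimal plain-Java sketch. It does not use Hadoop at all: `FakeReporter` is a hypothetical stand-in that only mimics the `incrCounter` call, and `map` is an illustrative method rather than Hadoop's mapper interface. It bumps `INPUT_WORDS` once per whitespace-separated token, which is roughly what a word-counting mapper might do.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.StringTokenizer;

public class CounterSketch {
  // Same idea as the enum above: one constant per counter we want to track.
  enum MyCounter { INPUT_WORDS }

  // Hypothetical stand-in for the reporter object; NOT a Hadoop class.
  static class FakeReporter {
    private final Map<MyCounter, Long> values = new EnumMap<>(MyCounter.class);

    // Adds 'amount' to the counter, starting from zero if it is unset.
    void incrCounter(MyCounter key, long amount) {
      values.merge(key, amount, Long::sum);
    }

    // Returns the current value, or 0 if the counter was never incremented.
    long getCounter(MyCounter key) {
      return values.getOrDefault(key, 0L);
    }
  }

  // Illustrative map step: increments INPUT_WORDS once per token in the line.
  static void map(String line, FakeReporter reporter) {
    StringTokenizer tokens = new StringTokenizer(line);
    while (tokens.hasMoreTokens()) {
      tokens.nextToken();
      reporter.incrCounter(MyCounter.INPUT_WORDS, 1);
    }
  }

  public static void main(String[] args) {
    FakeReporter reporter = new FakeReporter();
    map("the quick brown fox", reporter);
    map("jumps over", reporter);
    // Prints "INPUT_WORDS = 6"
    System.out.println("INPUT_WORDS = " + reporter.getCounter(MyCounter.INPUT_WORDS));
  }
}
```

In a real job the only line that carries over unchanged is the `incrCounter` call; the framework supplies the reporter and stores the values.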
You can programmatically access the counters in the following manner:
RunningJob job = JobClient.runJob(conf); // blocks until job completes
Counters c = job.getCounters();
long cnt = c.getCounter(MyCounter.INPUT_WORDS);
Custom counters are very useful in aggregating small bits of information. Pretty simple, huh?
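Part of what makes counters convenient here is that the framework sums each counter's per-task values into one job-wide total. The sketch below imitates that summation in plain Java, assuming each task's counts are available as a map. `aggregate` and the second counter `EMPTY_LINES` are hypothetical names introduced only for illustration; none of this is Hadoop's actual implementation.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class CounterAggregationSketch {
  // EMPTY_LINES is a made-up second counter, added to show several counters at once.
  enum MyCounter { INPUT_WORDS, EMPTY_LINES }

  // Hypothetical helper: sums per-task counter maps into a single map of totals.
  static Map<MyCounter, Long> aggregate(List<Map<MyCounter, Long>> perTask) {
    Map<MyCounter, Long> totals = new EnumMap<>(MyCounter.class);
    for (Map<MyCounter, Long> task : perTask) {
      for (Map.Entry<MyCounter, Long> entry : task.entrySet()) {
        totals.merge(entry.getKey(), entry.getValue(), Long::sum);
      }
    }
    return totals;
  }

  public static void main(String[] args) {
    // Illustrative values only; not measurements from a real job.
    Map<MyCounter, Long> task0 = new EnumMap<>(MyCounter.class);
    task0.put(MyCounter.INPUT_WORDS, 120L);
    task0.put(MyCounter.EMPTY_LINES, 3L);

    Map<MyCounter, Long> task1 = new EnumMap<>(MyCounter.class);
    task1.put(MyCounter.INPUT_WORDS, 80L);

    Map<MyCounter, Long> totals = aggregate(Arrays.asList(task0, task1));
    System.out.println("INPUT_WORDS total: " + totals.get(MyCounter.INPUT_WORDS));
    System.out.println("EMPTY_LINES total: " + totals.get(MyCounter.EMPTY_LINES));
  }
}
```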