----------------------------------------------------------------
Some helpful notes from Sam Lamphier in Spring 2008
----------------------------------------------------------------

I installed the Windows version, which provides a GUI. I got the latest dev version, 3.5.7, because their later versions support Java 1.5, and I figured that if we work with that, Java 1.5 has a lot that 1.4 doesn't. I didn't notice any reason not to use the latest version rather than the book version, so please let me know if you come across something.

The easiest way to figure out how to use the UI is through their tutorial (ppt). Look for the following on the Documentation page, under "Tutorials":

"A presentation demonstrating all graphical user interfaces (GUI) in Weka. (Warning: this is a large Powerpoint file.) [WekaDoc]"

To run J48 for the project, it's as simple as loading the arff file, then choosing the J48 classifier. I think the results might look slightly different, in that a little more model info is printed at the top than when run from the command line. By default, I think it does 10-fold cross-validation.

Since weka.jar is also at the root of the install dir, in theory you can run java from a command prompt with the same args as described for Linux by Dr. Resnik, but I didn't try it. I believe I did try their UI SimpleCLI, which lets you enter the args and produced the same output as using the UI (maybe with some cosmetic differences, I can't remember).

To train for the homework, I had to edit the RunWeka.ini file to set the -Xmx field (max memory for the JVM) to -Xmx512m (the default was something like -Xmx128m). Otherwise the JVM ran out of memory. The file that runs weka is RunWeka.bat, and the ini file is in the same directory as the bat file (c:\Program Files\Weka 3.5, depending on where you installed it).
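One way to confirm that a larger -Xmx value actually reaches the JVM is to ask the JVM for its maximum heap size. The following is only a minimal sketch, not part of Weka: the class name HeapCheck and the helper toMegabytes are made up for illustration, and it uses nothing beyond the standard Runtime.maxMemory() call.

```java
// HeapCheck.java: report the maximum heap size this JVM will try to use.
// HeapCheck and toMegabytes are hypothetical names used only for this sketch.
class HeapCheck {

    // Convert a byte count to whole megabytes (rounding down).
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reflects the -Xmx setting (or the JVM default if none is given).
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap for this JVM: " + toMegabytes(maxBytes) + " MB");
    }
}
```

After compiling, running it as "java -Xmx512m HeapCheck" should report a value at or a little below 512 MB (some JVMs subtract a small amount of internal space), while running it with no -Xmx argument shows the default on your machine.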
Training classifiers is just a matter of choosing the "Classify" tab in their Explorer app (under Applications) and choosing the classifier of interest (J48 is under the "trees" group). Then the Start button kicks it off. The tutorial (referenced above) is a nice place to see the steps for this sort of thing. When you choose a classifier, its command line args are displayed in the GUI, which is helpful if you want to compare the command line to the GUI.

The source code for weka is in weka-src.jar, also at the root of the install. All in all, they've done some very nice work on this product, as far as I can tell so far.

I did notice that the arff format allows strings as a data type, but I'm not sure whether you can use a string in place of the enumerated attributes that we used for the project. If you can use strings, then we might not need to convert U.S. to UXSX. However, some delimiter would have to be specified, so I'm not sure what limitations there would be. Possibly using their xml format (xrff) would solve any delimiter issue.

http://weka.sourceforge.net/wekadoc/index.php/en:XRFF_%283.5.6%29

[Additional note from me: I tried using String rather than enumerated values, since that would certainly be simpler to implement, but Weka complained. I then tried one or two of the Filters that look as if they do the conversion for you, and couldn't get those to work.]
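Since the enumerated-attribute route is the one that worked, here is a rough sketch of what the U.S. to UXSX conversion and the resulting arff attribute line could look like. It is an illustration only: the class and method names are invented, and the rule "replace every character that is not a letter or digit with X" is an assumption that merely matches the U.S. example, not necessarily the conversion the project actually used.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// ArffNominalSketch: hypothetical helper, not part of Weka.
class ArffNominalSketch {

    // ASSUMED rule: every character that is not an ASCII letter or digit becomes 'X'.
    static String sanitize(String token) {
        return token.replaceAll("[^A-Za-z0-9]", "X");
    }

    // Build an "@attribute <name> {v1,v2,...}" declaration from raw tokens,
    // dropping duplicates while keeping first-seen order.
    static String nominalAttribute(String name, List<String> rawValues) {
        Set<String> values = new LinkedHashSet<>();
        for (String raw : rawValues) {
            values.add(sanitize(raw));
        }
        return "@attribute " + name + " {" + String.join(",", values) + "}";
    }

    public static void main(String[] args) {
        // Prints: @attribute word {UXSX,bank}
        System.out.println(nominalAttribute("word", Arrays.asList("U.S.", "bank", "U.S.")));
    }
}
```

One caveat with a rule like this is that distinct tokens can collide (for example, "U.S." and "U-S-" would both become UXSX), so it trades some information for a value that needs no quoting or delimiter handling in the arff file.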