Archive for July 2007
JFreq – a little command line tool
Just a quick pointer to a bit of code that you might find useful, if you’re a command line kind of person.
JFreq is a simple word counter. It takes your text files, filters them in various ways, and spits out a table of counts organized word by document. The handy bit is probably the filtering. JFreq can currently stem in 12 languages (courtesy of the Lucene project). It can also remove currency references, number references, and stop words from a list you provide. Requires Java 5 or higher. Output is in UTF-8.
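For anyone wondering what "a table of counts organized word by document" looks like in practice, here is a minimal Java sketch of the same idea. It is not JFreq's code. The class `WordCounter` and its methods are hypothetical. The rules for spotting number references (a token starting with a digit) and currency references (a currency symbol followed by digits) are assumptions made for illustration. Stemming is left out entirely, since JFreq gets that from Lucene.

```java
import java.util.*;

/**
 * Hypothetical sketch of a word-by-document counter, NOT JFreq's actual code.
 * Filtering rules for numbers and currency are illustrative assumptions;
 * stemming is omitted.
 */
final class WordCounter {

    /** Counts the words in one document after filtering. */
    static Map<String, Integer> countWords(String text, Set<String> stopWords) {
        Map<String, Integer> counts = new TreeMap<>();
        // Split on anything that is not a letter, digit, currency symbol, '.' or ','.
        for (String raw : text.toLowerCase().split("[^\\p{L}\\p{N}\\p{Sc}.,]+")) {
            // Strip sentence punctuation left at either end of the token.
            String tok = raw.replaceAll("^[.,]+|[.,]+$", "");
            if (tok.isEmpty()) continue;
            if (tok.matches("\\p{Sc}\\p{N}[\\p{N}.,]*")) continue; // assumed currency reference, e.g. $20
            if (tok.matches("\\p{N}[\\p{N}.,]*")) continue;        // assumed number reference, e.g. 5 or 3.5
            if (!tok.matches(".*\\p{L}.*")) continue;              // no letters at all: not a word
            if (stopWords.contains(tok)) continue;                 // user-supplied stop word
            counts.merge(tok, 1, Integer::sum);
        }
        return counts;
    }

    /** Builds a tab-separated table: one row per word, one column per document. */
    static String toTable(List<String> docNames, List<Map<String, Integer>> docCounts) {
        // The vocabulary is the sorted union of words seen in any document.
        Set<String> vocab = new TreeSet<>();
        for (Map<String, Integer> c : docCounts) vocab.addAll(c.keySet());

        StringBuilder sb = new StringBuilder("word");
        for (String name : docNames) sb.append('\t').append(name);
        sb.append('\n');
        for (String word : vocab) {
            sb.append(word);
            // A word absent from a document gets a zero count in that column.
            for (Map<String, Integer> c : docCounts) sb.append('\t').append(c.getOrDefault(word, 0));
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "a", "and"));
        List<String> names = Arrays.asList("doc1", "doc2");
        List<Map<String, Integer>> counts = Arrays.asList(
                countWords("The tax rose by 5 percent, costing $20.", stop),
                countWords("A tax cut and a tax credit.", stop));
        System.out.print(toTable(names, counts));
    }
}
```

Running `main` prints a tab-separated table with one row per word and one column per document: `tax` gets 1 in doc1 and 2 in doc2, while `the`, `5`, and `$20` have been filtered out before counting.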
JFreq is already in use as part of the backend of the Stata implementation of Wordscores, and has been used by the Wordfish folk for research on the EU. One of these days it will get a nice graphical interface, but given the speed of Yoshikoder development lately, that’s unlikely to happen soon. Update: JFreq now comes with a nice graphical interface if you don’t fancy the command line version.
Minor converter update
There’s a tiny weeny little update to the converter, available from the usual spot. I’ve made the help a bit better, and it should feel a bit more native on Windows.