Archive for the ‘Development’ Category
A little bit of infrastructure
OK. About the transparency thing. There’s now a proper place to send bug reports and feature requests for the Yoshikoder. I had considered using the many options available from SourceForge, but plumped for something that was much simpler, arguably more elegant, and most importantly: blue.
And another one
Another preview (RC2) is available here. Most of the debugging happened on the Mac side, but it ought to go slightly better everywhere.
Sneaky preview
Over a remarkably productive Christmas break I finally got to work on the much-neglected, at this point almost mythical, ‘next version’ of the Yoshikoder. It’s not quite there yet, but I thought it might be nice to share a sneaky new year preview with you.
JFreq – a little command line tool
Just a quick pointer to a bit of code that you might find useful, if you’re a command line kind of person.
JFreq is a simple word counter. It takes your text files, filters them in various ways, and spits out a table of counts organized word by document. The handy bit is probably the filtering. JFreq can currently stem in 12 languages (courtesy of the Lucene project). It can also remove currency references, number references, and stop words from a list you provide. Requires Java 5 or higher. Output is in UTF-8.
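To give a feel for the kind of table described above, here is a minimal sketch of a word-by-document counter with stop word and number filtering. It is not JFreq's code: the class name `WordCountSketch`, the `count` method, and the tokenizing rule (split on anything that is not a letter or digit) are all assumptions made for illustration, and it does no stemming or currency handling.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class WordCountSketch {

    /**
     * Builds a word -> (document -> count) table.
     * Hypothetical helper: tokens are lower-cased, split on anything that is
     * not a letter or digit, and dropped if purely numeric or in stopWords.
     */
    static Map<String, Map<String, Integer>> count(Map<String, String> docs,
                                                   Set<String> stopWords) {
        Map<String, Map<String, Integer>> table = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            String text = doc.getValue().toLowerCase(Locale.ROOT);
            for (String token : text.split("[^\\p{L}\\p{N}]+")) {
                if (token.isEmpty()               // leading delimiter leaves an empty token
                        || token.matches("\\p{N}+")   // number reference
                        || stopWords.contains(token)) {
                    continue;
                }
                table.computeIfAbsent(token, k -> new TreeMap<>())
                     .merge(doc.getKey(), 1, Integer::sum);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // Made-up documents and stop list, purely for demonstration
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.txt", "The cat sat, the cat ran 42 times");
        docs.put("b.txt", "A dog sat");
        Set<String> stop = Set.of("the", "a");

        // One row per word, listing the count in each document where it occurs
        count(docs, stop).forEach((word, perDoc) ->
                System.out.println(word + "\t" + perDoc));
    }
}
```

Nesting one map per word keeps the table sparse: a word only gets an entry for documents in which it actually appears.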
JFreq is already in use as part of the backend of the Stata implementation of Wordscores, and has been used by the Wordfish folk for research on the EU. I originally wrote here that a graphical interface was unlikely to happen soon, given the speed of Yoshikoder development lately. Update: it now comes with a nice graphical interface if you don’t fancy the command line version.
Minor converter update
There’s a tiny weeny little update to the converter, available from the usual spot. I’ve made the help a bit better, and it should feel slightly more native on Windows.
Batch dictionary reports
Folk have been asking about being able to run dictionary reports over all their documents rather than one at a time. Since the code for the next version of the Yoshikoder is in pieces around my bedroom, with several bits having rolled under the carpet or been borrowed by the cat, I’ve made a little program to do dictionary reports in a batch. This application currently lives here, and is wrapped up for Windows. Give it a project file from the Yoshikoder and it will run a dictionary report on every document in the project, and drop the results into a file. At least, that’s the idea.
Concordance reports in the preview
Yes, I bust the concordance reports in the latest Preview. They are now unbusted. Version 0.6.3-Preview.1 is a bug-fix release, available from the usual place.
Tying up loose ends on the preview
Although it may not look like it, there’s a fresh preview of 0.6.3 for download. This should fix two issues:
In the first, folk were running out of memory with large projects. The new preview allows, but does not require, 256 MB of RAM to be used by the program. This is four times as much as before, but comfortable in the context of most machines’ specifications, I hope.
The second issue is the notorious ‘Already running’ problem that usually appeared after a sudden program shutdown. That shouldn’t happen now.
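For the curious, one common way to avoid a stale ‘Already running’ state is to hold an operating system file lock rather than rely on a marker file that a crash can leave behind: the OS releases the lock when the process dies, however suddenly. The sketch below illustrates that general technique only. It is an assumption, not a description of how the Yoshikoder actually fixed the problem, and the class name `SingleInstanceSketch` and the lock file name are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

public class SingleInstanceSketch {

    // Held in static fields so the channel and lock live as long as the program
    private static FileChannel channel;
    private static FileLock lock;

    /** Returns true if this process now holds the lock, false if someone else does. */
    static boolean acquire(File lockFile) throws IOException {
        FileChannel ch = new RandomAccessFile(lockFile, "rw").getChannel();
        FileLock l;
        try {
            l = ch.tryLock();            // null if another process holds the lock
        } catch (OverlappingFileLockException e) {
            l = null;                    // already locked from within this JVM
        }
        if (l == null) {
            ch.close();
            return false;
        }
        channel = ch;
        lock = l;
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical lock file location in the system temp directory
        File lockFile = new File(System.getProperty("java.io.tmpdir"), "app-sketch.lock");
        if (acquire(lockFile)) {
            System.out.println("Lock acquired, starting up");
        } else {
            System.out.println("Already running");
        }
    }
}
```

The lock file itself may linger on disk after a crash, but that does no harm: what counts is whether a live process holds the lock, not whether the file exists.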
Yoshikoder 0.6.3 – a preview
A preview release of version 0.6.3 of the Yoshikoder is available from the home page. I described the changes in a previous post a while back.
Remember that this is only a preview: online help is not up to date, and I’m sure there will be some other things to iron out. Nevertheless, statistical document comparisons are ready. And you saw them here first…
Pesky drive names (Version 0.2.1)
There's a new release of the YKConverter that fixes a rather nasty bug for Windows users trying to convert webpages. It seems the old C:\ shuffle tripped me up. The moral of this bug is: do not construct local URLs by hand; use file.toURI().toURL() instead.
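A small sketch of that moral, for illustration. The class name `LocalUrlSketch` and the example path are made up; the only calls that matter are the standard `File.toURI()` and `URI.toURL()`. The hand-built string is shown purely for contrast, and what it looks like depends on the platform's path separator.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

public class LocalUrlSketch {

    /** Lets the JDK build the file: URL, including separators and escaping. */
    static URL toLocalUrl(File file) throws MalformedURLException {
        return file.toURI().toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        // Hypothetical file with a space in its name
        File page = new File("My Documents", "saved page.html");

        // By hand: backslashes, drive letters, and spaces pass through untouched
        String byHand = "file://" + page.getPath();
        System.out.println("by hand: " + byHand);

        // Via toURI(): the path is made absolute, separators become '/',
        // and characters such as spaces are percent-encoded
        System.out.println("via URI: " + toLocalUrl(page));
    }
}
```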
This release also attempts to remove <!-- thing --> sections that the HTML parser tends to leave in. These are usually useful but deeply uninteresting chunks of JavaScript. The removal code might not always work, but it might save you some post-editing.
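As a rough idea of what such a removal can look like, here is a minimal regular expression sketch. It is not the converter's actual code: the class name `CommentStripperSketch` and the method `stripComments` are hypothetical. It also shows why this sort of thing might not always work, since an unterminated comment is left alone and a `-->` inside a script string would end the match early.

```java
import java.util.regex.Pattern;

public class CommentStripperSketch {

    // DOTALL lets '.' match line breaks, so multi-line comments are caught;
    // the reluctant '*?' stops at the first closing delimiter.
    private static final Pattern HTML_COMMENT =
            Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    /** Removes every complete HTML comment section from the text. */
    static String stripComments(String html) {
        return HTML_COMMENT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Made-up input with a script hidden inside a comment
        String html = "<p>Before</p><!--\nvar x = 1;\n--><p>After</p>";
        System.out.println(stripComments(html));
    }
}
```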
The new release is available from the converter's homepage.