Archive for the ‘Content Analysis’ Category
Horrid html
First off, there’s an update to the YK Converter available from here. It’s a ‘related file’ for the Yoshikoder, for some reason. These updates are primarily because I’ve just finished working on a project that required scraping a lot of truly horrid web pages, and the current machinery wasn’t quite up to dealing with them. In fact, they bust everything except TagSoup.
RID dictionary corrections
JFreq – a little command line tool
Just a quick pointer to a bit of code that you might find useful, if you’re a command line kind of person.
JFreq is a simple word counter. It takes your text files, filters them in various ways, and spits out a table of counts organized word by document. The handy bit is probably the filtering. JFreq can currently stem in 12 languages (courtesy of the Lucene project). It can also remove currency references, number references, and stop words from a list you provide. It requires Java 5 or higher, and output is in UTF-8.
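To make the idea concrete, here is a rough sketch in Python of the kind of filtering and counting described above. It is not JFreq's code: the function names, the regular expressions for currency and numbers, and the table layout (one row per word, one column per document) are all assumptions made for illustration, and stemming is left out entirely.

```python
import re
from collections import Counter

# Assumed patterns for illustration only; JFreq's own rules may differ.
CURRENCY = re.compile(r"[$£€¥]\s?\d[\d,]*(?:\.\d+)?")
NUMBER = re.compile(r"\b\d[\d,]*(?:\.\d+)?\b")
# Runs of letters, optionally with one internal apostrophe (straight or curly).
WORD = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)?")


def count_words(text, stop_words=frozenset()):
    """Count words in one text after removing currency, numbers and stop words."""
    text = CURRENCY.sub(" ", text.lower())
    text = NUMBER.sub(" ", text)
    return Counter(w for w in WORD.findall(text) if w not in stop_words)


def word_by_document(docs, stop_words=frozenset()):
    """Build a word-by-document table.

    docs maps a document name to its text. Returns the document names
    and a list of rows, each row being [word, count_in_doc1, count_in_doc2, ...].
    """
    names = list(docs)
    counts = [count_words(docs[name], stop_words) for name in names]
    vocab = sorted(set().union(*counts))
    rows = [[word] + [c[word] for c in counts] for word in vocab]
    return names, rows


if __name__ == "__main__":
    docs = {
        "a.txt": "The tax cost $5.00 in 2008. Tax again.",
        "b.txt": "The budget and the tax",
    }
    names, rows = word_by_document(docs, stop_words={"the", "in", "and"})
    print("\t".join(["word"] + names))
    for row in rows:
        print("\t".join(str(cell) for cell in row))
```

Missing words get a count of zero because a Counter returns 0 for keys it has not seen, which keeps every row the same length.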
JFreq is already in use as part of the backend of the Stata implementation of Wordscores, and has been used by the Wordfish folk for research on the EU. I originally wrote here that a graphical interface was unlikely to happen soon, given the speed of Yoshikoder development lately. Update: JFreq now comes with a nice graphical interface, if you don’t fancy the command line version.
The app at APSA
This year the American Political Science Association meeting was in Philadelphia. The Yoshikoder went too, under cover of a Content Analysis working group organized by the irrepressible Stephen Purpura.
Attendees were asked for a short presentation, and a brief document covering either the content analysis methods they were developing, or the research problem they thought might benefit. I took the opportunity to write 4 pages on the Yoshikoder. You might find them useful. In particular, there’s motivation for, description of, and a guide to interpreting the relative risk ratios that the new statistical comparison report provides.
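For readers who have not met relative risk ratios before, here is a small sketch of the textbook calculation, applied to dictionary category counts in two documents. This is an assumption about the general technique, not a copy of what the Yoshikoder's report does: the function name is made up, and the confidence interval uses the standard log-scale approximation, which may differ from the report's own method.

```python
import math


def relative_risk(count_a, total_a, count_b, total_b, z=1.96):
    """Relative risk of a category in document A versus document B.

    count_* is the number of words matching the category and total_* is the
    number of words in the document. Returns (ratio, lower, upper), where
    lower and upper bound an approximate 95% interval when z is 1.96.
    """
    if min(count_a, count_b) <= 0 or count_a > total_a or count_b > total_b:
        raise ValueError("counts must be positive and no larger than totals")
    ratio = (count_a / total_a) / (count_b / total_b)
    # Standard error of log(ratio) under the usual large-sample approximation.
    se = math.sqrt(1 / count_a - 1 / total_a + 1 / count_b - 1 / total_b)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


if __name__ == "__main__":
    # Made-up counts: 30 category words in 1000 versus 10 in 1000.
    ratio, lower, upper = relative_risk(30, 1000, 10, 1000)
    print(f"ratio={ratio:.2f}, interval=({lower:.2f}, {upper:.2f})")
```

A ratio above 1 means the category is proportionally more frequent in the first document. If the interval excludes 1, the difference is unlikely to be sampling noise alone.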
There’s plenty to be said about the other methods we heard about at the workshop, but I’ll save those for another entry.
Laver and Garry dictionary update
Some of you may be using Laver and Garry’s policy positions dictionary, available from the resources page. This is a dictionary of English-language political terms designed to capture issue content in party manifestos. After some investigation it appears there were actually two different versions of this dictionary floating about the academic end of the web. And it seems that the one I translated for the Yoshikoder was not the one used in the final article.
Tush.
I’ve unlinked the old one and added LaverGarryAJPS.ykd to the page. Thanks to John Garry for sorting out which was which. Practically speaking, they’ve much the same content, but the real one is smaller.
VBPro goes open-source
Mark Miller has decided to make VBPro open-source, under the GNU General Public License. VBPro is a classic computer content analysis program and a de facto standard in the field, so having the algorithms available is great for scientific replication. But it’s also great to have a code base that can be updated (VBPro runs only in DOS), enhanced, and ported to other operating systems.
VBPro was always free, but I’m proud to have been part of the crowd presenting the arguments for making it open too. And I’ll be helping Mark sort out the licensing and hosting arrangements for VBPro’s new life. Frankly, I’m pretty excited about the whole thing.
Philip Stone
Philip Stone, pioneer of computer content analysis and author of the General Inquirer, is dead.
The Boston Globe has a short obituary.
There’s nothing more for me to say.
Regressive Imagery
Colin Martindale’s Regressive Imagery dictionary is now available from the Yoshikoder resources page in English, French, German, Swedish and Portuguese.
These are pretty much straight translations of the WordStat files that Provalis Research makes available, except that I’ve ignored the exclusion lists that come with the English, French and Portuguese versions.
Credit to Provalis for making it available electronically, and to the crowd of translators for making it available in multiple languages. This should happen a lot more often than it does.
If you go to all the work of making a dictionary, save it in an open format and make it available on the web. If you’re not sure how to do that, send it to me with your details, and I’ll put it on the resources page.