Yoshikoder

What’s new with the Yoshikoder?

Archive for the ‘Content Analysis’ Category

Horrid html

without comments

First off, there’s an update to the YK Converter available from here. It’s a ‘related file’ for the Yoshikoder, for some reason. The updates come mostly from a project I’ve just finished, which required scraping a lot of truly horrid web pages that the current machinery wasn’t quite up to handling. In fact, they bust everything except TagSoup.


Written by Will

January 8, 2009 at 10:26 am

RID dictionary corrections

without comments

It seems the Regressive Imagery dictionaries for French and Portuguese were not in good enough shape to import into the Yoshikoder.

Now they are.

Thanks to Sophie for pointing that out.

Written by Will

December 19, 2008 at 9:17 am

JFreq – a little command line tool

with one comment

Just a quick pointer to a bit of code that you might find useful, if you’re a command line kind of person.

JFreq is a simple word counter. It takes your text files, filters them in various ways, and spits out a table of counts organized word by document. The handy bit is probably the filtering. JFreq can currently stem in 12 languages (courtesy of the Lucene project). It can also remove currency references, number references, and stop words from a list you provide. It requires Java 5 or higher. Output is in UTF-8.
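To make the filtering pipeline concrete, here is a minimal Java sketch of a word-by-document counter in the same spirit. It is not JFreq’s code: stemming is left out entirely (JFreq gets that from Lucene), and the class name `WordCounter` and the patterns used to recognize numbers and currency references are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical sketch of a JFreq-style word-by-document counter (no stemming). */
class WordCounter {
    // Assumption: a number is a run of digits, optionally grouped by '.' or ','.
    private static final Pattern NUMBER = Pattern.compile("\\d+([.,]\\d+)*");
    // Assumption: a currency reference is any token starting with a dollar, euro, pound or yen sign.
    private static final Pattern CURRENCY = Pattern.compile("[$\\u20AC\\u00A3\\u00A5].*");

    /** Returns document name -> (word -> count), after filtering. */
    static SortedMap<String, SortedMap<String, Integer>> count(
            Map<String, String> docs, Set<String> stopWords) {
        SortedMap<String, SortedMap<String, Integer>> table = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            SortedMap<String, Integer> counts = new TreeMap<>();
            for (String raw : doc.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (CURRENCY.matcher(raw).matches()) continue; // drop currency references
                // Trim leading and trailing ASCII punctuation such as commas and full stops.
                String word = raw.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
                if (word.isEmpty() || NUMBER.matcher(word).matches()) continue; // drop numbers
                if (stopWords.contains(word)) continue; // drop stop words
                counts.merge(word, 1, Integer::sum);
            }
            table.put(doc.getKey(), counts);
        }
        return table;
    }

    /** Prints one row per word and one tab-separated column per document. */
    static void print(SortedMap<String, SortedMap<String, Integer>> table) {
        Set<String> vocabulary = new TreeSet<>();
        for (SortedMap<String, Integer> counts : table.values()) vocabulary.addAll(counts.keySet());
        System.out.println("word\t" + String.join("\t", table.keySet()));
        for (String w : vocabulary) {
            StringBuilder row = new StringBuilder(w);
            for (SortedMap<String, Integer> counts : table.values()) {
                row.append('\t').append(counts.getOrDefault(w, 0));
            }
            System.out.println(row);
        }
    }
}
```

Keeping the counts as a map per document and only filling in zeros when printing keeps memory proportional to the words each document actually uses, which matters once the vocabulary gets large.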

JFreq is already in use as part of the backend of the Stata implementation of Wordscores, and has been used by the Wordfish folk for research on the EU. One of these days it will get a nice graphical interface, but given the speed of Yoshikoder development lately, that’s unlikely to happen soon. Update: JFreq now comes with a nice graphical interface, if you don’t fancy the command line version.

Written by Will

July 6, 2007 at 11:19 am

The app at APSA

without comments

This year the American Political Science Association meeting was in Philadelphia. The Yoshikoder went too, under cover of a Content Analysis working group organized by the irrepressible Stephen Purpura.

Attendees were asked for a short presentation and a brief document covering either the content analysis methods they were developing or the research problem they thought might benefit. I took the opportunity to write four pages on the Yoshikoder. You might find them useful. In particular, there’s a motivation for, a description of, and a guide to interpreting the relative risk ratios that the new statistical comparison report provides.
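For anyone who wants the arithmetic behind a relative risk ratio, here is a minimal Java sketch assuming the usual definition: a word’s rate per word of text in document A divided by its rate in document B, with an approximate 95% interval from the standard error of the log ratio. The class and method names are hypothetical, and the Yoshikoder’s report may differ in details such as how it handles zero counts.

```java
/** Hypothetical sketch: relative risk ratio of a word's rate in document A versus document B. */
class RelativeRisk {
    /** (a / nA) / (b / nB): a, b are the word's counts; nA, nB are the documents' total word counts. */
    static double ratio(int a, int nA, int b, int nB) {
        // Zero counts would make the ratio (or its logarithm) undefined, so reject them here.
        if (a <= 0 || b <= 0 || nA < a || nB < b) {
            throw new IllegalArgumentException("need 0 < a <= nA and 0 < b <= nB");
        }
        return ((double) a / nA) / ((double) b / nB);
    }

    /** Approximate 95% interval using SE(log RR) = sqrt(1/a - 1/nA + 1/b - 1/nB). */
    static double[] interval95(int a, int nA, int b, int nB) {
        double logRr = Math.log(ratio(a, nA, b, nB));
        double se = Math.sqrt(1.0 / a - 1.0 / nA + 1.0 / b - 1.0 / nB);
        return new double[] {Math.exp(logRr - 1.96 * se), Math.exp(logRr + 1.96 * se)};
    }
}
```

With hypothetical counts, a word used 30 times in a 10,000-word manifesto and 10 times in a 5,000-word one gives a ratio of 1.5, meaning it is half again as frequent, per word, in the first. The interval runs from about 0.73 to 3.07, which straddles 1, so with counts this small the difference could easily be noise.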

There’s plenty to be said about the other methods we heard about at the workshop, but I’ll save those for another entry.

Written by Will

September 27, 2006 at 7:24 pm

Laver and Garry dictionary update

without comments

Some of you may be using Laver and Garry’s policy positions dictionary, available from the resources page. This is a dictionary of English-language political terms designed to capture issue content in party manifestos. After some investigation, it appears there were actually two different versions of this dictionary floating about the academic end of the web, and the one I translated for the Yoshikoder was not the one used in the final article.

Tush.

I’ve unlinked the old one and added LaverGarryAJPS.ykd to the page. Thanks to John Garry for sorting out which was which. Practically speaking, the two have much the same content, but the published one is smaller.

Written by Will

September 25, 2006 at 7:25 pm

VBPro goes open-source

with 7 comments

Mark Miller has decided to make VBPro open-source, under the GNU General Public License. VBPro is a classic computer content analysis program and a de facto standard in the field, so having its algorithms openly available is great for scientific replication. But it’s also great to have a code base that can be updated (VBPro runs only in DOS), enhanced, and ported to other operating systems.

VBPro was always free, but I’m proud to have been part of the crowd arguing for making it open too. And I’ll be helping Mark sort out the licensing and hosting arrangements for VBPro’s new life. Frankly, I’m pretty excited about the whole thing.

Written by Will

April 12, 2006 at 8:24 am

Philip Stone

without comments

Philip Stone, pioneer of computer content analysis and author of the General Inquirer, is dead.

The Boston Globe has a short obituary.

There’s nothing more for me to say.

Written by Will

March 24, 2006 at 5:05 pm

Posted in Content Analysis

Regressive Imagery

without comments

Colin Martindale’s Regressive Imagery dictionary is now available from the Yoshikoder resources page in English, French, German, Swedish and Portuguese.

These are pretty much straight translations of the WordStat files that Provalis Research makes available, except that I’ve ignored the exclusion lists that come with the English, French and Portuguese versions.
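For readers new to dictionary-based content analysis, here is a minimal Java sketch of how a category dictionary might be applied to a text, and what an exclusion list changes. It is not the Yoshikoder’s or WordStat’s matcher: the trailing-`*` wildcard convention, the class name `DictionaryCounter`, and the category and entries in the test are assumptions chosen for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of applying a category dictionary to a text. */
class DictionaryCounter {
    /**
     * Counts matching words per category. Assumed conventions: entries are lower case, and an
     * entry ending in '*' matches any word that starts with the part before the '*'.
     */
    static Map<String, Integer> count(Map<String, List<String>> dictionary,
                                      Set<String> exclusions, String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Split on anything that is not a letter or an apostrophe.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}']+")) {
            // Skip empty tokens, and drop excluded words before any matching happens.
            if (word.isEmpty() || exclusions.contains(word)) continue;
            for (Map.Entry<String, List<String>> category : dictionary.entrySet()) {
                if (matchesAny(word, category.getValue())) {
                    counts.merge(category.getKey(), 1, Integer::sum); // at most once per category
                }
            }
        }
        return counts;
    }

    private static boolean matchesAny(String word, List<String> entries) {
        for (String entry : entries) {
            boolean match = entry.endsWith("*")
                    ? word.startsWith(entry.substring(0, entry.length() - 1))
                    : word.equals(entry);
            if (match) return true;
        }
        return false;
    }
}
```

With a hypothetical TIME category of `clock*` and `hour`, the sentence "The clocks struck one; an hour passed, clockwork ticking." scores 3 with no exclusions and 2 if `clockwork` is on the exclusion list. That is the kind of difference that ignoring the exclusion lists can make.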

Credit to Provalis for making it available electronically, and to the crowd of translators for making it available in multiple languages. This should happen a lot more often than it does.

If you go to all the work of making a dictionary, save it in an open format and make it available on the web. If you’re not sure how to do that, send it to me with your details, and I’ll put it on the resources page.

Written by Will

March 24, 2006 at 4:42 pm

Posted in Content Analysis