Wednesday, March 2, 2011

A Simple Scheme for Book Classification Using Wikipedia, by Andromeda Yelton

Editor’s note: This article is the winner of the LITA/Ex Libris Student Writing Award, 2010.

Because the rate at which documents are being generated outstrips librarians’ ability to catalog them, an accurate, automated scheme of subject classification is desirable. However, simplistic word-counting schemes miss many important concepts; librarians must enrich algorithms with background knowledge to escape basic problems such as polysemy and synonymy. I have developed a script that uses Wikipedia as context for analyzing the subjects of nonfiction books. Though it is a simple method built quickly from freely available parts, the script is partially successful, suggesting that such an approach holds promise for future research.

4 comments:

Michael Doran said...

Based on the title & abstract, this article looked interesting. In six months, when the non-LITA-member embargo expires and I can actually read the article, maybe I'll remember to come back and do that.

Andy said...

Michael,

I don't believe you'll have to wait six months. UTA subscribes to several full-text databases that include ITAL. EBSCOhost already has the March 2011 issue online.

Michael Doran said...

Andy,

Per your suggestion I was able to get this article via our Library's EBSCOhost subscription. I appreciate your help on that.

However, I believe the points I raised on the list are still valid -- what purpose does it serve to put barriers in the way of the people whom LITA (presumably) wants to read the content?

Andromeda Yelton said...

@Michael: ITAL also allows authors to self-archive, so you can see a (post-editing, pre-formatting) version in the publications section of my web site: http://www.andromedayelton.com/wp/resume/ . Thank you for your interest.