Thursday, March 11, 2010

Dublin Core, DSpace, and a Brief Analysis of Three University Repositories, by Mary Kurtz

This paper provides an overview of Dublin Core (DC) and DSpace together with an examination of the institutional repositories of three public research universities, all of which use DC and DSpace to create and manage their repositories. I drew a sample of records from each repository and examined their metadata quality using the criteria of completeness, accuracy, and consistency. I also examined record quality in relation to how each institution educates its repository users. One repository used librarians to oversee the archiving process, while the other two relied on different self-archiving strategies. The librarian-overseen archive had the most complete and accurate records for DSpace entries.

2 comments:

Alice Platt, Southern New Hampshire University said...

I believe it is a valuable exercise to examine institutional repository records for quality metadata. However, I believe the author overlooks a few factors.

In the "How DSpace Works" section, Ms. Kurtz goes to great lengths to explain how the fields in the submission forms are labeled and which fields are required by DSpace. It should be pointed out that these defaults can be customized by the DSpace technical administrators. Labels can be changed, field requirements can be changed, and the metadata fields on the form can be added or removed. From this article, there is no way to know if the author confirmed with the institutions if they were using the default forms, or if they had been customized in some way.

I was also surprised by the amount of space Ms. Kurtz spends attacking the lack of contributor education. It is not practical to think that a one-time contributor is going to educate themselves about how to create metadata. According to Ms. Kurtz, the whole point of the DSpace submission process is that they don't have to. In the best-case scenario, a non-librarian's submission would be reviewed and improved by a librarian, using the standard approval steps built into the DSpace workflow. Related to this topic, toward the end of the "The repositories" section Ms. Kurtz notes that the OSU Knowledge Bank does not use a controlled vocabulary list for subject headings. I'm not surprised, since DSpace doesn't easily support controlled vocabulary use.
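On the controlled vocabulary point, the hook that does exist (in the DSpace versions I have seen) also lives in input-forms.xml: a field can be bound to a hand-maintained taxonomy file under config/controlled-vocabularies/. The element names below are from memory and may differ by version, which rather proves the point that this is not something a casual administrator, let alone a contributor, will stumble into:

    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>subject</dc-element>
      <dc-qualifier></dc-qualifier>
      <repeatable>true</repeatable>
      <label>Subject Keywords</label>
      <input-type>onebox</input-type>
      <hint>Select a term from the controlled list.</hint>
      <required></required>
      <!-- Points at config/controlled-vocabularies/srsc.xml, a sample taxonomy
           shipped with DSpace; contributors then pick terms from that tree
           instead of typing free text. -->
      <vocabulary>srsc</vocabulary>
    </field>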

Finally, I was very surprised to see the sample record from the University of New Mexico (Appendix B). It's possible, though we can't know for certain, that this record was imported from another database, which would explain the strange metadata. I don't dispute that something went terribly wrong in this process, but it's hard to make assumptions without knowing the full story behind these records. Since they contain no provenance data, it's difficult to say what happened without interviewing the DSpace administrator.

Overall, my biggest disagreement with the article is with its conclusion that "DSpace still relies heavily on contributor-generated data." This is completely false: it is each institution's choice whether to rely heavily on its contributors for data. At my institution, we do not rely on our contributors at all. We believe libraries should not push the work of cataloging onto their DSpace contributors, just as we would not expect book authors to enter their own metadata into our OPAC.

I believe Ms. Kurtz is trying to make the point that letting contributors make all of the decisions about metadata input can cause more problems than it's worth. However, it would have been prudent to establish how the examined records came into being before blaming the contribution process, as there are many different ways they might have entered DSpace.

Thomas Dodson said...

A lot of my work in Harvard's Office for Scholarly Communication focuses on DASH, our DSpace repository, so I'm finding Ms. Kurtz's article very helpful as I try to better understand the relationship between DC and the structure of the data recorded by DSpace.
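For others trying to get oriented, that mapping is easiest to see in the dublin_core.xml file DSpace uses in its Simple Archive Format for batch import and export: every metadata value is stored as an element/qualifier pair within a schema, which corresponds directly to the qualified DC fields Ms. Kurtz analyzes, such as dc.subject and dc.description.abstract. A made-up record might look roughly like this (the values are invented, and the exact attributes can differ between DSpace versions):

    <dublin_core>
      <dcvalue element="title">An Invented Article Title</dcvalue>
      <dcvalue element="contributor" qualifier="author">Doe, Jane</dcvalue>
      <dcvalue element="date" qualifier="issued">2009-06-15</dcvalue>
      <dcvalue element="subject">institutional repositories</dcvalue>
      <dcvalue element="description" qualifier="abstract">A brief abstract would go here.</dcvalue>
    </dublin_core>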

I did want to ask a question about one of the conclusions drawn, though. Examining sample records from the OSU Knowledgebank, the author infers that the "relatively low fill rate" of "subject" and "description.abstract" fields "suggests a lack of completeness in that repository's records" (43).

Without some additional qualification, that statement might be misleading.

In my experience with DASH records, at least, articles in the humanities very often do not have abstracts or author- or indexing service-supplied keywords (the only keywords we accept for the dc.subject field). Thus, for those records, the fields are left blank.

This leads me to wonder if there were more humanities articles in the OSU records than in samples from other collections. If so, disciplinary conventions around abstracting and providing keywords may account for the relative incompleteness of these fields (rather than telling us something about the diligence of OSU's DSpace contributors or those responsible for reviewing those submissions).

Just a thought; again, very helpful paper.