Monday, August 18, 2008

Repurposing "Born MARC" metadata

Just wanted to repeat the question that I posed in my original comment but under its own header:

What opportunities and challenges can we anticipate as we start working with (e.g. manipulating, searching, augmenting, cleaning up, transforming) MARC metadata from an ILS (or from another MARC-based system) outside of that system?

How can we ensure that reusing MARC data is a worthwhile undertaking, and convince those outside the library world of this, if necessary?


Karen Coyle said...

Jennifer, this is a pretty broad question, but I think we have a good shot at re-purposing the data because it is fairly well coded and somewhat consistent. I know that we tend to notice what's wrong with MARC records, but compared to less rigorous metadata we're actually in pretty good shape.

I also think that an algorithmic transform of MARC to something else is not a huge deal. Many people talk about the hundreds of millions of MARC records as a reason why we cannot change, but once we know what output format we want, we can convert millions of records in a short time, measured in hours. The problems are going to be the intellectual ones, not the technical ones: what do we want this new record to be? Where is it incompatible with MARC? How can we mitigate any deep differences between old MARC and new metadata?
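To illustrate why the mechanical side of such a transform is small, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the in-memory record layout, the `FIELD_MAP` table, the Dublin Core-like output names, and the made-up sample record. A real conversion would read actual MARC files and would need a far larger, carefully argued mapping, which is exactly the intellectual work described above.

```python
# Hypothetical in-memory form of one MARC record: a list of
# (tag, {subfield code: value}) pairs. The sample data is made up.
record = [
    ("100", {"a": "Austen, Jane,"}),
    ("245", {"a": "Pride and prejudice /", "c": "Jane Austen."}),
    ("650", {"a": "Sisters", "v": "Fiction."}),
    ("650", {"a": "Courtship", "v": "Fiction."}),
]

# Illustrative mapping from (tag, subfield) to a target element name.
# The target names are assumptions, not an agreed schema.
FIELD_MAP = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("650", "a"): "subject",
}


def transform(marc_record):
    """Convert one record to a dict of target element -> list of values."""
    out = {}
    for tag, subfields in marc_record:
        for code, value in subfields.items():
            name = FIELD_MAP.get((tag, code))
            if name is None:
                # Unmapped data is silently dropped here; deciding what to
                # do with it is the hard, non-technical question.
                continue
            # Strip trailing ISBD-style punctuation from the value.
            out.setdefault(name, []).append(value.rstrip(" /,.:;"))
    return out


converted = transform(record)
```

Note that the sketch drops the 245 $c and 650 $v subfields because nothing maps them, so even a toy example has to decide what to keep.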

Hugh Taylor said...

The first big issue I see is whether the "something else" is going to be able to retain the richness of the MARC data. If it can, then we might still want to ask ourselves whether we need all of that richness, but at least we'll have a choice. If it can't, and there's going to be some loss, we need to make sure we understand the implications before rushing headlong into change.

Of course, once you start applying indexing and searching layers on top of the data you may not be able to take advantage of some of that richness, but at least it's retained in your data and might be made to work harder at some later date. Lose it now, though, and it's gone.

One of the missing pieces in much of this sort of discussion is in knowing how much of our MARC data is actually being exploited - whether by humans or machines. There's some potentially interesting work in this area going on within OCLC's RLG Programs (I need to declare an interest here, being a member of a group working on this strand).

[Discovered that the URL would be too long to post to this blog, so here's the way in]

RLG Programs--Our work--Renovating Descriptive & Organizing Practices--Make Metadata Creation Processes More Effective Program

Jennifer Bowen said...

I agree with Karen that we probably have a good shot at being able to re-purpose our MARC data, although the more I delve into MARC's inconsistencies and complexities, the more I sometimes start to question this!

I think that one of the valuable things that the XC Project can contribute to this is that we're creating a platform where we can try different things - as Karen notes, we'll be able to transform all of our MARC data to another schema fairly rapidly, and if we find out that the new schema isn't working, we can make changes easily, re-export the MARC data, and try something else. And I also agree that once we get used to working with data that was "born MARC" after it has been transformed to a different schema, we're probably going to feel less tied to using MARC in the future.

I'm happy to know about the RLG work that Hugh cited - once we have a better idea what MARC data our users actually care about, we can focus our efforts more appropriately instead of trying to deal with EVERY tag in MARC!

Rick said...

The previous commenters have been pretty much on target, but I would like to see more discussion of how MARC relates to FRBR and to larger conceptual entities. Even the bib record of MARC is fundamentally related to inventory control, and does not scale well to the situations that large catalogs, like WorldCat, find themselves in, when there are many, many manifestations of the same work. I think users do not understand the importance that is put on the 'work in hand'.
So I would emphasize: a) MARC in an XML format, so we can stop talking about the 110 and the 245, which turns everyone on the web off big time.
b) Better use of information about such things as 'previous title' in serials, to help people locate items under older/newer names.
just my .02 yoctocents,
Rick Silterra

Owen said...

Just picking up on Hugh's comments. I think that we actually have a chance of making MARC data richer by converting to a new format.

At the moment much of the richness is hidden in obscure coding. Although it might be some effort to write appropriate rules to deal with this when converting the data, as Karen says, because we have a reasonable degree of consistency this effort would actually be small compared to the potential benefit.
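As one small example of such a rule, the sketch below pulls a few coded values out of the MARC 21 bibliographic 008 fixed field and spells them out. The character positions used (07-10 for Date 1, 15-17 for place of publication, 35-37 for language) follow the MARC 21 bibliographic format; the function name, the output keys, the tiny `LANGUAGES` table, and the sample 008 string are assumptions for illustration only.

```python
# Tiny, illustrative subset of the MARC language code list.
LANGUAGES = {"eng": "English", "fre": "French"}


def decode_008(field_008):
    """Spell out a few coded positions of a bibliographic 008 field."""
    if len(field_008) < 38:
        raise ValueError("008 field is too short to decode")
    lang_code = field_008[35:38]
    return {
        "date1": field_008[7:11],                 # positions 07-10
        "place_code": field_008[15:18].strip(),   # positions 15-17
        "language_code": lang_code,               # positions 35-37
        # Fall back to the raw code when it is not in our small table.
        "language": LANGUAGES.get(lang_code, lang_code),
    }


# Made-up sample 008, built piece by piece so the positions are explicit.
sample_008 = (
    "080818"      # 00-05 date entered on file
    + "s"         # 06    type of date (single known date)
    + "2008"      # 07-10 Date 1
    + "    "      # 11-14 Date 2 (blank)
    + "nyu"       # 15-17 place of publication
    + " " * 17    # 18-34 material-specific positions (left blank here)
    + "eng"       # 35-37 language
    + " "         # 38    modified record
    + "d"         # 39    cataloging source
)

decoded = decode_008(sample_008)
```

Once values like these are decoded into named, readable elements, they become available to indexing and display layers that would never have looked inside the fixed field.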

Tom.Pasley said...

Hmmm... definitely something to think about!
In NZ there is Index New Zealand, which is produced by our National Library. Index New Zealand articles are in MARC, and it's a great resource, although I think this is a case where we/they could make "MARC data richer by converting to a new format", as I don't think that MARC is a good fit for articles (although it does work).