Friday, June 17, 2011

Management and Support of Shared Integrated Library Systems, by Jason Vaughan and Kristen Costello

The University of Nevada, Las Vegas (UNLV) University Libraries has hosted and managed a shared integrated library system (ILS) since 1989. The system and the number of partner libraries sharing it have grown significantly over the past two decades. Spurred by the level of involvement and support contributed by the host institution, the authors administered a comprehensive survey to current Innovative Interfaces libraries. Research findings are combined with a description of UNLV’s local practices to provide substantial insights into shared funding, support, and management activities associated with shared systems.

Benign Neglect: Developing Life Rafts for Digital Content, by Jody L. DeRidder

In his keynote speech at the Archiving 2009 Conference in Arlington, Virginia, Clifford Lynch called for the development of a benign neglect model for digital preservation, one in which as much content as possible is stored in whatever manner is available, in the hope that enough resources will someday exist to preserve it properly. This is an acknowledgment of current resource limitations relative to the burgeoning quantities of digital content that need to be preserved.

Seeing the Wood for the Trees: Enhancing Metadata Subject Elements with Weights, by Hong Zhang, Linda C. Smith, Michael Twidale, and Fang Huang Gao

Subject indexing has traditionally been dichotomous: an information object either is or is not primarily about (or of) a given subject, corresponding to the presence or absence of that subject term in its record. With more subject terms brought into information systems via social tagging, manual cataloging, or automated indexing, many more partially relevant results can be retrieved. Using examples from digital image collections and online library catalog systems, we explore the problem and advocate adding a weighting mechanism to subject indexing and tagging to make web search and navigation more effective and efficient. We argue that weighting subject terms is more important than ever in today’s world of growing collections, more federated searching, and expanding social tagging. Such a weighting mechanism needs to be considered and applied not only by indexers, catalogers, and taggers but also incorporated into system functionality and metadata schemas.
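
For illustration only, here is a minimal Perl sketch of how weighted subject terms might be stored and used to rank results; the records, terms, and weights below are invented, not drawn from the article.

#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical records: each subject term carries a weight between 0 and 1
# indicating how central that subject is to the item.
my %records = (
    rec001 => { Dogs => 0.9, Photography => 0.3 },
    rec002 => { Photography => 0.9, Dogs => 0.2 },
    rec003 => { Cats => 0.8 },
);

# Rank the records carrying a given subject term by its weight, so items
# primarily about the topic sort ahead of items that merely touch on it.
sub rank_by_subject {
    my ($term) = @_;
    my @hits = grep { exists $records{$_}{$term} } keys %records;
    return sort { $records{$b}{$term} <=> $records{$a}{$term} } @hits;
}

print join(", ", rank_by_subject("Dogs")), "\n";   # rec001, then rec002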

Building an Open Source Institutional Repository at a Small Law School Library: Is it Realistic or Unattainable? by Fang Wang

Digital preservation activities among law libraries have largely been limited by a lack of funding, staffing, and expertise. Most law school libraries that have implemented an institutional repository (IR) chose proprietary platforms because they are easy to set up, customize, and maintain with the technical and development support the vendors provide. The Texas Tech University School of Law Digital Repository is one of the few law school repositories in the nation built on the DSpace open source platform. It is the law school’s first institutional repository. It was designed to collect, preserve, share, and promote the law school’s digital materials, including the research and scholarship of the law faculty and students, institutional history, and law-related resources. The repository also serves as a dark archive housing internal records.

Saturday, April 2, 2011

New preprint

A new preprint has been added:
  • "Click Analytics: Visualizing Web Site Use Data" by Tabatha A. Farney

Tuesday, March 29, 2011

New preprints

Two new preprints were recently added.
  • "Adoption of E-Book Readers among College Students: A Survey" by Nancy M. Foasberg
  • "Editorial and technological workflow tools to promote website quality" by Emily G. Morton-Owens
    (Originally presented at the 2010 LITA National Forum)

Wednesday, March 2, 2011

New preprints

Two new preprints have been added to the ITAL Web site today.
http://www.lita.org/ala/mgrps/divs/lita/ital/prepub/index.cfm

"Investigations into Library Web Scale Discovery Services" by Jason Vaughan

 "Graphs in Libraries: A Primer" by James E. Powell, Daniel Alcazar, Matthew Hopkins, Robert Olendorf, Tamara M. McMahon, Amber Wu, Linn Collins

A Simple Scheme for Book Classification Using Wikipedia, by Andromeda Yelton

Editor’s note: This article is the winner of the LITA/Ex Libris Student Writing Award, 2010.

Because the rate at which documents are being generated outstrips librarians’ ability to catalog them, an accurate, automated scheme of subject classification is desirable. However, simplistic word-counting schemes miss many important concepts; librarians must enrich algorithms with background knowledge to escape basic problems such as polysemy and synonymy. I have developed a script that uses Wikipedia as context for analyzing the subjects of nonfiction books. Though a simple method built quickly from freely available parts, it is partially successful, suggesting the promise of such an approach for future research.
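
As a rough sketch of the general idea only (this is not the author's script; the sample passage and the use of the public MediaWiki search API are my own assumptions), one could ask Wikipedia which articles best match a passage from a book and treat the titles of the top hits as candidate subjects.

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use JSON;
use URI::Escape;

# A short passage standing in for the text of a nonfiction book.
my $passage = "the breeding, training, and racing of homing pigeons";

# Ask the MediaWiki search API which articles best match the passage.
my $url = 'https://en.wikipedia.org/w/api.php?action=query&list=search'
        . '&format=json&srlimit=5&srsearch=' . uri_escape($passage);
my $ua = LWP::UserAgent->new(agent => 'subject-sketch/0.1');

my $response = $ua->get($url);
die 'Wikipedia request failed: ' . $response->status_line
    unless $response->is_success;

# Treat the titles of the best-matching articles as candidate subject terms.
my $data = decode_json($response->decoded_content);
print "$_->{title}\n" for @{ $data->{query}{search} };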

The Internet Public Library (IPL): An Exploratory Case Study on User Perceptions, by Monica Maceli, Susan Wiedenbeck, and Eileen Abels

The Internet Public Library (IPL), now known as ipl2, was created in 1995 with the mission of serving the public by providing librarian-recommended Internet resources and reference help. We present an exploratory case study on public perceptions of an “Internet public library,” based on qualitative analysis of interviews with ten college student participants: some current users and others unfamiliar with the IPL. The exploratory interviews revealed some confusion around the IPL’s name and the types of resources and services that would be offered. Participants made many positive comments about the IPL’s resource quality, credibility, and personal help.

Semantic Web for Reliable Citation Analysis in Scholarly Publishing, by Ruben Tous, Manel Guerrero, and Jaime Delgado

Analysis of the impact of scholarly artifacts is constrained by current unreliable practices in cross-referencing, citation discovery, and citation indexing and analysis, which have not kept pace with technological advances in areas such as knowledge management and security. Because citation analysis has become the primary component in calculating scholarly impact factors, and considering the relevance of this metric within both the scholarly publishing value chain and, especially important, the professional curriculum evaluation of scholars, we argue that current practices need to be revised. This paper describes a reference architecture that aims to bring openness and reliability to the citation-tracking lifecycle. The solution relies on the use of digitally signed semantic metadata in the different stages of the scholarly publishing workflow, so that authors, publishers, repositories, and citation-analysis systems have access to independent, reliable evidence that is resistant to forgery, impersonation, and repudiation. As far as we know, this is the first paper to combine Semantic Web technologies and public-key cryptography to achieve reliable citation analysis in scholarly publishing.
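
A minimal sketch of the signing step alone, assuming an RSA key pair and the Crypt::OpenSSL::RSA module; the citation string below is invented, and the paper's architecture covers far more of the workflow than this.

#!/usr/bin/perl
use strict;
use warnings;
use Crypt::OpenSSL::RSA;
use MIME::Base64;

# A citation assertion expressed as a simple metadata string (a stand-in
# for the signed semantic metadata described in the paper).
my $citation = 'citing=doi:10.1000/aaa; cited=doi:10.1000/bbb; date=2011-06-17';

# The publisher signs the assertion with its private key.
my $rsa = Crypt::OpenSSL::RSA->generate_key(2048);
$rsa->use_sha256_hash();
my $signature = $rsa->sign($citation);
print encode_base64($signature);

# A citation-analysis system verifies the signature using only the public key,
# so the assertion cannot be altered or repudiated without detection.
my $pub = Crypt::OpenSSL::RSA->new_public_key($rsa->get_public_key_string());
$pub->use_sha256_hash();
print $pub->verify($citation, $signature) ? "citation verified\n" : "citation rejected\n";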

Web Accessibility, Libraries, and the Law, by Camilla Fulton

With an abundance of library resources being served on the web, researchers are finding that disabled people oftentimes do not have the same level of access to materials as their nondisabled peers. This paper discusses web accessibility in the context of the United States federal laws most often referenced in web accessibility lawsuits. Additionally, it reveals which states have statutes that mirror federal web accessibility guidelines and to what extent. Interestingly, fewer than half of the states have adopted statutes addressing web accessibility, and fewer than half of those reference Section 508 of the Rehabilitation Act or the Web Content Accessibility Guidelines (WCAG) 1.0. Despite the sparse legislation surrounding web accessibility, librarians should consult the appropriate web accessibility resources to ensure that their specialized content reaches all users.

Usability of the VuFind Next-Generation Online Catalog, by Jennifer Emanuel

The VuFind open-source, next-generation catalog system was implemented by the Consortium of Academic and Research Libraries in Illinois as an alternative to the WebVoyage OPAC. The University of Illinois at Urbana-Champaign began offering VuFind alongside WebVoyage in 2009 as an experiment in next-generation catalogs. With its faceted discovery interface, VuFind offered numerous improvements to the UIUC catalog, focusing on narrowing results after a search rather than limiting the search up front. Library users have praised VuFind for its Web 2.0 feel and features; however, issues remain, particularly with catalog data.

Monday, January 31, 2011

Generating Collaborative Systems for Digital Libraries: a Model-Driven Approach, by Alessio Malizia, Paolo Bottoni, and S. Levialdi

The design and development of a digital library involves different stakeholders, such as information architects, librarians, and domain experts, who need to agree on a common language to describe, discuss, and negotiate the services the library has to offer. To this end, high-level, language-neutral models have to be devised. Metamodeling techniques favor the definition of domain-specific visual languages through which stakeholders can share their views and directly manipulate representations of the domain entities. This paper describes CRADLE (Cooperative-Relational Approach to Digital Library Environments), a metamodel-based framework and visual language for the definition of notions and services related to the development of digital libraries. A collection of tools allows the automatic generation of several services defined with the CRADLE visual language, together with the graphical user interfaces that give end users access to them. The effectiveness of the approach is illustrated by presenting digital libraries generated with CRADLE, and the CRADLE environment itself has been evaluated using the cognitive dimensions framework.

The Middle Mile: The Role of the Public Library in Ensuring Access to Broadband, by Marijke Visser and Mary Alice Ball

This paper discusses the role of the public library in ensuring access to the broadband communication that is so critical in today’s knowledge-based society. It examines the culture of information in 2010, and then asks what it means if individuals are online or not. The paper also explores current issues surrounding telecommunications and policy, and finally seeks to understand the role of the library in this highly technological, perpetually connected world.

An Evolutive Process to Convert Glossaries into Ontologies, by José R. Hilera, Carmen Pagés, J. Javier Martínez, J. Antonio Gutiérrez, and Luis de-Marcos

This paper describes a method to generate ontologies from glossaries of terms. The proposed method presupposes an evolutionary life cycle based on successive transformations of the original glossary that lead to products of intermediate knowledge representation (dictionary, taxonomy, and thesaurus). These products are characterized by an increase in semantic expressiveness in comparison to the product obtained in the previous transformation, with the ontology as the end product. Although this method has been applied to produce an ontology from the “IEEE Standard Glossary of Software Engineering Terminology,” it could be applied to any glossary of any knowledge domain to generate an ontology that may be used to index or search for information resources and documents stored in libraries or on the Semantic Web.
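
As a toy illustration (the glossary entry and RDF identifiers below are my own, not taken from the paper), the final transformation might turn a glossary term, its definition, and the broader term identified during the taxonomy stage into a minimal OWL class.

#!/usr/bin/perl
use strict;
use warnings;

# A hypothetical glossary entry plus the broader term found in the
# intermediate taxonomy stage.
my %entry = (
    term       => 'acceptance testing',
    definition => 'Testing conducted to determine whether a system satisfies its acceptance criteria.',
    broader    => 'testing',
);

# Emit a minimal OWL class: the label and comment come straight from the
# glossary, while rdfs:subClassOf records the taxonomic relation.
(my $id  = $entry{term})    =~ s/\s+/_/g;
(my $sup = $entry{broader}) =~ s/\s+/_/g;

print <<"RDF";
<owl:Class rdf:about="#$id">
  <rdfs:label>$entry{term}</rdfs:label>
  <rdfs:comment>$entry{definition}</rdfs:comment>
  <rdfs:subClassOf rdf:resource="#$sup"/>
</owl:Class>
RDF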

Bridging the Gap: Self-Directed Staff Technology Training, by Kayla L. Quinney, Sara D. Smith, and Quinn Galbraith

Undergraduates, as members of the Millennial Generation, are proficient in Web 2.0 technology and expect to apply these technologies to their coursework, including scholarly research. To remain relevant, academic libraries need to provide the technology that student patrons expect, and academic librarians need to learn and use these technologies themselves. Because leaders at the Harold B. Lee Library of Brigham Young University (HBLL) perceived a gap in technology use between students and their staff and faculty, they developed and implemented the Technology Challenge, a self-directed technology training program that rewarded employees for exploring technology daily. The purpose of this paper is to examine the Technology Challenge through an analysis of the results of surveys given to participants before and after the program was implemented. The program is also evaluated in terms of the adult learning theories of andragogy and self-directed learning. HBLL found that a self-directed approach fosters the technology skills that librarians need to best serve students; it also promotes the lifelong learning habits needed to keep abreast of emerging technologies. This paper offers insights and methods that could be applied in other libraries, the most valuable of which is the use of self-directed and andragogical training methods to help academic libraries better integrate modern technologies.

Next-Generation Library Catalogs and the Problem of Slow Response Time, by Margaret Brown-Sica, Jeffrey Beall, and Nina McHale

Response time, as defined for this study, is the time it takes for all of the files that constitute a single webpage to travel across the Internet from the web server to the end user’s browser. In this study, the authors tested response times on queries for identical items in five different library catalogs, one of them a next-generation (NextGen) catalog. The authors also discuss what constitutes acceptable response time and how it may affect the discovery process. They suggest that librarians and vendors develop standards for acceptable response time and apply them in product selection and development.
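
A minimal Perl sketch of this kind of timing, with a placeholder catalog URL; note that it measures only the initial HTML response rather than every file that makes up the page, so it understates response time as defined in the study.

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use Time::HiRes qw(gettimeofday tv_interval);

# Placeholder catalog query URL; substitute a real OPAC search.
my $url = 'http://catalog.example.edu/search?q=moby+dick';
my $ua  = LWP::UserAgent->new;

my $start    = [gettimeofday];
my $response = $ua->get($url);
my $elapsed  = tv_interval($start);

printf "%s -> %s in %.3f seconds\n", $url, $response->status_line, $elapsed;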

Sunday, December 5, 2010

ITAL reader survey results...

Hi all,

I promised in my December 2010 ITAL editor's column to post the full results of the 2009/2010 reader survey. Unfortunately, it seems I can't do this here, so please find the results on the ITAL website.

Thanks,

- Marc Truitt, Editor, Information Technology and Libraries

Wednesday, September 1, 2010

Batch Loading Collections into DSpace: Using Perl Scripts for Automation and Quality Control, by Maureen P. Walsh [Appendixes A-E]

Due to space considerations, Appendixes A-D were not included with the published article (http://www.ala.org/ala/mgrps/divs/lita/ital/292010/2903sep/walsh_pdf.cfm).
Appendixes A-D are included below along with Appendix E.



Appendix A. OJS Batch Loading Scripts

-- mkcol.sh --

#!/bin/sh
# Create a Collection given a name and a collection handle.
# Gets information from DSpace web pages and returns data via GET parameters to the DSpace
# Collection Wizard.

NAME="$1"
COLLECTION_HANDLE="$2"

URL="https://kb.osu.edu/dspace"
NAME_PAT=">$NAME</option>"

# Login to DSpace and create the cookie.txt file.
curl -k -L -s $URL/password-login -d "login_email=[name removed]@osu.edu" -d "login_password=XXXXX" -c cookie.txt > /dev/null

# Cut the community_id out of the web page.
COMMUNITY_ID=`curl -k -L -s -b cookie.txt \
   $URL/handle/1811/$COLLECTION_HANDLE \
   | grep -m1 name=\"community_id\" \
   | cut -d\" -f6`

# Cut the collection_id out of the web page.
COLLECTION_ID=`curl -k -L -s -b cookie.txt \
   $URL/tools/collection-wizard \
   -d "community_id=$COMMUNITY_ID" \
   | grep -m1 name=\"collection_id\" \
   | cut -d\" -f6`

# Begin building the collection.
curl -k -L -s -b cookie.txt \
   $URL/tools/collection-wizard \
   -d "public_read=true" \
   -d "workflow1=true" \
   -d "workflow2=" \
   -d "workflow3=" \
   -d "collection_id=$COLLECTION_ID" \
   -d "default-item=" \
   -d "stage=1" \
   -d "admins=" > /dev/null

# Finish making the collection.
curl -k -L -s -b cookie.txt \
   $URL/tools/collection-wizard \
   -F "name=$NAME" \
   -F "short_description=" \
   -F "introductory_text=" \
   -F "copyright_text=" \
   -F "side_bar_text=" \
   -F "provenance_description=" \
   -F "license=" \
   -F "file=" \
   -F "collection_id=$COLLECTION_ID" \
   -F "stage=2" \
   -F "permission=12"  > /dev/null

# Get and return the handle_id.
HANDLE_ID=`curl -k -L -s -b cookie.txt \
   $URL/handle/1811/$COLLECTION_HANDLE \
   | grep -m1 "$NAME_PAT" \
   | cut -d\" -f2`
echo $HANDLE_ID

-------------------------------------------------------------------------------------------------------------------------------

-- mkallcol.pl --

#!/usr/bin/perl

# Routine to clean up individual fields.
sub trim($)
{
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

# Read the file of issue names into an array.
open(fh,"issues-prod.remainder");
@lines=<fh>;
close(fh);

$linenum = 0;
%lt=();

$COMMUNITY = "686";

# For each issue get the parameters from the array and call the script to create the collection.
while ($linenum <= $#lines) {
        @fields = split(/\t/, $lines[$linenum]);
        $issue = $fields[1];
        chop($issue);
        system("echo -n $fields[0] ");
        print " ";
        system("./mkcol.sh $issue $COMMUNITY");
        $linenum++;
}

-- Sample of the file of issue names --

V074N2  "Ohio Journal of Science: Volume  74, Issue 2 (March, 1974)"
V074N3  "Ohio Journal of Science: Volume  74, Issue 3 (May, 1974)"
V074N4  "Ohio Journal of Science: Volume  74, Issue 4 (July, 1974)"
V074N5  "Ohio Journal of Science: Volume  74, Issue 5 (September, 1974)"



-------------------------------------------------------------------------------------------------------------------------------

-- metadata.pl --

#!/usr/bin/perl

use Encode;     # Routines for UTF encoding.

# Routine to clean up individual fields of metadata.
sub trim($)
{
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

# Read the metadata into an array.
open(fh,"<:encoding(UTF-16)", "OJSPhase2-1.txt");
@lines=<fh>;
close(fh);

# Process each line of metadata, consolidating lines for the same item.
$linenum = 0;
%lt=();
while ($linenum <= $#lines) {
        @fields = split(/\t/, $lines[$linenum]);
        if ($fields[0] =~ /^((v|V)[0-9]+(n|N)[0-9A-Za-z]+)/) {
                $lt{uc($1)} = [@{$lt{uc($1)}}, $linenum];
        }
        $linenum++;
}

# Build the load set for each item.
for $key (sort(keys(%lt))) {
        # Put each load set in its own subdirectory.
        print "mkdir ./src/$key\n";
        system "mkdir ./src/$key";
        # Process the lines for this load set.
        for $i (0 .. $#{$lt{$key}}) {
                $dir = sprintf("item_%03d", $i);
                print "mkdir ./src/$key/$dir\n";
                system "mkdir ./src/$key/$dir";
                # Create the XML for the metadata.
                open(fh,">:encoding(UTF-8)", "./src/$key/$dir/dublin_core.xml");
                print fh '<dublin_core>'."\n";
                @fields = split(/\t/, $lines[$lt{$key}[$i]]);
                $fields[1] =~ s/"//g;
                $fields[5] =~ s/"//g;
                if (length($fields[9])>0) {
                    print fh '<dcvalue element="identifier" qualifier="citation">'
                           . "$fields[1]. v$fields[3], n$fields[4] ($fields[5]), $fields[8]-$fields[9]</dcvalue>\n";
                } else {
                    print fh '<dcvalue element="identifier" qualifier="citation">'
                         ."$fields[1]. v$fields[3], n$fields[4] ($fields[5]), $fields[8]</dcvalue>\n";
                }
                if (length($fields[10]) > 0) {
                        $fields[10] =~ s/["]{1}([^"])/$1/g;
                        $fields[10] =~ s/("|"")$//g;
                        print fh '<dcvalue element="title" qualifier="">'.$fields[10]."</dcvalue>\n";
                }
                print fh '<dcvalue element="identifier" qualifier="issn">'.$fields[2]."</dcvalue>\n";
                print fh '<dcvalue element="date" qualifier="issued">'.$fields[6]."-".$fields[7]."</dcvalue>\n";
                # Process multiple authors.
                if (length($fields[11]) > 0) {
                        $fields[11] =~ s/"//g;
                        @authors = split(/;/,$fields[11]);
                        foreach $author (@authors) {
                                $author =~ s/^\s+//;
                                if (length($author) > 0) {
                                        print fh '<dcvalue element="creator" qualifier="">'.$author.'</dcvalue>'."\n";
                                }
                    }
                }
                if (length($fields[12]) > 0) {
                        $fields[12] =~ s/"//g;
                        print fh '<dcvalue element="description" qualifier="">Author Institution: '.$fields[12]."</dcvalue>\n";
                }
                if (length($fields[13]) > 0) {
                        $fields[13] =~ s/"//g;
                        print fh '<dcvalue element="description" qualifier="abstract">'.$fields[13]."</dcvalue>\n";
                }
                print fh "</dublin_core>\n";
                close(fh); # Finished creating the XML file.

                # Create the contents file.
                open(fh, ">./src/$key/$dir/contents");
                $fields[0] = trim($fields[0]);
                print fh "$fields[0].pdf\n";
                close(fh);

                # Move the data files into the load set.
                print "cp pdfs/$fields[0] ./src/$key/$dir\n";
                system "cp pdfs/$fields[0].pdf ./src/$key/$dir";
        }
}

-------------------------------------------------------------------------------------------------------------------------------

-- loaditems.pl --

#!/usr/bin/perl

#Load the list of issues into an array.
open(fh,"loaditems");
@lines=<fh>;
close(fh);

# Process each issue.
$linenum = 0;
while ($linenum <= $#lines) {
        @fields = split(/ /, $lines[$linenum]);
        chop($fields[1]);
        # Add the issue to DSpace.
        system("./import.sh $fields[1] $fields[0]");
        $linenum++;
}

-- Sample of the load items file --

V074N2 1811/22016
V074N3 1811/22017
V074N4 1811/22018
V074N5 1811/22019

-------------------------------------------------------------------------------------------------------------------------------

-- import.sh --

#!/bin/sh

# import.sh collection_id dir
# Import a collection from files generated on dspace
# Requires the directory of the destination collection and the collection id.

COLLECTION_ID=$1
EPERSON=[name removed]@osu.edu
SOURCE_DIR=./src/$2
MAP_DIR=./prod-map/
BASE_ID=`basename $COLLECTION_ID`
MAPFILE=./$MAP_DIR/map.$2
/dspace/bin/dsrun org.dspace.app.itemimport.ItemImport --add --eperson=$EPERSON \
    --collection=$COLLECTION_ID --source=$SOURCE_DIR --mapfile=$MAPFILE

-------------------------------------------------------------------------------------------------------------------------------

-- intro.pl --

#!/usr/bin/perl

# Routine to clean up individual fields.
sub trim($)
{
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

# Read the metadata into an array.
open(fh,"<:encoding(UTF-16)", "OJSPhase2-1.txt")
    or die "Can't open metadata file: $!";
@lines=<fh>;
close(fh);

# Process each line of metadata, consolidating lines for the same item.
$linenum = 0;
%lt=();
while ($linenum <= $#lines) {
        @fields = split(/\t/, $lines[$linenum]);
        if ($fields[0] =~ /^((v|V)[0-9]+(n|N)[0-9A-Za-z]+)/) {
                $lt{uc($1)} = [@{$lt{uc($1)}}, $linenum];
        }
        $linenum++;
}

# Assemble each intro.
for $key (sort(keys(%lt))) {
        open(fh,"./prod-map/map.$key") or next;
        @fids=<fh>;
        close(fh);
        @fids = sort(@fids);

        print "Generating intro for $key ...\n";
        open(fh,">:encoding(UTF-8)", "./src/$key/intro");

        # Create the HTML for each article.
        for ($i = 0; $i <= $#{$lt{$key}}; $i++) {
                @fields = split(/\t/, $lines[$lt{$key}[$i]]);
                if (length($fields[10]) > 0) {
                        $fields[10] =~ s/["]{1}([^"])/$1/g;
                        $fields[10] =~ s/("|"")$//g;
                        print fh "<strong>$fields[10]</strong><br>\n";
                }
                # Create the list of authors.
                $authcnt = 0;
                if (length($fields[11]) > 0) {
                        $fields[11] =~ s/"//g;
                        @authors = split(/;/,$fields[11]);
                        foreach $author (@authors) {
                                $author =~ s/^\s+//;
                                if ($authcnt > 0) {
                                        print fh "; $author";
                                } else {
                                        print fh $author;
                                }
                                $authcnt++;
                        }
                }
                # Add page numbers.
                if (length($fields[8]) > 0) {
                        print fh " pp. $fields[8]";
                }
                if (length($fields[9]) > 0) {
                        print fh "-$fields[9]";
                }
                print fh "<br>\n";
                # Create links for each article.
                @item_hid = split(/\s/,$fids[$i]);
                $itemno = $item_hid[0];
                $itemhid = $item_hid[1];
                $fields[0] = trim($fields[0]);
                $filename = "./src/$key/$itemno/".$fields[0].".pdf";
                @st = stat($filename) or die "No $filename: $!";
                $size = int($st[7]/1024);
                $url_1 = "/dspace/handle/$itemhid";
                $url_2 = "/dspace/bitstream/$itemhid/1/$fields[0]";
                print fh '<a href="'.$url_1.'">Article description</a> | <a href="'.$url_2.'">Article Full Text PDF ('.$size.'KB)</a><br><br>';
                print fh "\n";
        }
        close(fh);
}

-------------------------------------------------------------------------------------------------------------------------------

-- installintro.sh --

#!/bin/sh
# Install an intro given a dir and a community id.
DIR="$1"
HANDLE="$2"
URL="https://kb.osu.edu/dspace"
# Login to DSpace
curl -k -L -s $URL/password-login -d "login_email=[name removed]@osu.edu" -d "login_password=password" -c cookie.txt > /dev/null

# Cut the community_id out of the web page.
COMMUNITY_ID=`curl -k -L -s -b cookie.txt \
    $URL/handle/$HANDLE \
    | grep -m1 name=\"community_id\" \
    | cut -d\" -f6`

# Cut the collection_id out of the web page.
COLLECTION_ID=`curl -k -L -s -b cookie.txt \
    $URL/handle/$HANDLE \
    | grep -m1 name=\"collection_id\" \
    | cut -d\" -f6`

# Cut the title out of the web page.
TITLE=`curl -k -L -s -b cookie.txt \
    $URL/tools/edit-communities \
    -d "community_id=$COMMUNITY_ID" \
    -d "collection_id=$COLLECTION_ID" \
    -d "action=4" \
    | grep -m1 name=\"name\" \
    | cut -d\" -f6`

# Put the introductory text in DSpace.
curl -k -L -s -b cookie.txt \
    $URL/tools/edit-communities \
    -d "name=$TITLE" \
    -d "short_description=" \
    -d "introductory_text=`cat ./src/$DIR/intro`" \
    -d "copyright_text=" \
    -d "side_bar_text=" \
    -d "license=" \
    -d "provenance_description=" \
    -d "community_id=$COMMUNITY_ID" \
    -d "collection_id=$COLLECTION_ID" \
    -d "create=false" \
    -d "action=9" \
    -d "submit=Update" > /dev/null

-------------------------------------------------------------------------------------------------------------------------------

-- ldallintro.pl --

#!/usr/bin/perl

# Load file of issues into an array.
open(fh,"loaditems");
@lines=<fh>;
close(fh);

$linenum = 0;
%lt=();

# Process each intro.
while ($linenum <= $#lines) {
        @fields = split(/\t/, $lines[$linenum]);
        print("$lines[$linenum]");
        system("./installintro.sh $lines[$linenum] ");
        $linenum++;
}

Appendix B. MSS Phase Two Scripts

-- mkxml2.pl --

#!/usr/bin/perl

# Load routines for UTF-16 and UTF-8
use Encode;

# Routine to clean up metadata fields
sub trim($)
{
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    $string =~ s/^"//;
    $string =~ s/"$//;
    return $string;
}

# Load metadata into an array.
open(fh,"<:encoding(UTF-16)", "MSA-phase-2-v3.txt");
@lines=<fh>;
close(fh);

$linenum = 0;
%lt=();

# Split tab separated metadata fields
while ($linenum <= $#lines) {
            @fields = split(/\t/, $lines[$linenum]);
            if ($fields[4] =~ /^([0-9]{4}-[^0-9]+[0-9]+)/) {
                        $lt{$1} = [@{$lt{$1}}, $linenum];
            }
            $linenum++;
}

$cnt1 = 0; $cnt2 = 0; $cnt3 = 0; $cnt4 = 0; $cnt5 = 0; $cnt6 = 0;

# Process metadata line by line
for $key (sort(keys(%lt))) {
        $year =  substr($key, 0, 4);

            # Generate possible image file names.
            $keyzero =  substr($key,0,-1). "0" . substr($key, -1, 1);
            $keyuc =  uc($key);
            $keyuczero =  uc($keyzero);

            # Compensate for inconsistent naming of images in metadata.
        if (-e "../images/$year/$key.jpg") {
                $filename = $key;
        } elsif (-e "../images/$year/$keyzero.jpg") {
                $filename = $keyzero;
        } elsif (-e "../images/$year/$keyuc.jpg") {
                $filename = $keyuc;
        } elsif (-e "../images/$year/$keyuczero.jpg") {
                $filename = $keyuczero;
        } else {
                        $filename = "";
                print " NO FILE FOUND images/$year/$key.jpg\n";
            }

            # Divide output into separate load sets based on year.
        if (($year >= "1946") && ($year <= "1959")) {
            $dir = sprintf("1/item_%04d", $cnt1++);
        }
        if (($year >= "1960") && ($year <= "1969")) {
            $dir = sprintf("2/item_%04d", $cnt2++);
        }
        if (($year >= "1970") && ($year <= "1979")) {
            $dir = sprintf("3/item_%04d", $cnt3++);
        }
        if (($year >= "1980") && ($year <= "1989")) {
            $dir = sprintf("4/item_%04d", $cnt4++);
        }
        if (($year >= "1990") && ($year <= "1999")) {
            $dir = sprintf("5/item_%04d", $cnt5++);
        }
        if (($year >= "2000") && ($year <= "2100")) {
            $dir = sprintf("6/item_%04d", $cnt6++);
        }

            # Make a directory for the item.
            print "mkdir $dir\n";
            system "mkdir $dir";

            # Create XML file from metadata
            open(fh,">:encoding(UTF-8)", "$dir/dublin_core.xml");
            print fh '<dublin_core>'."\n";
            print fh '<dcvalue element="identifier" qualifier="none">'
                .$key.'</dcvalue>'."\n";
            print fh '<dcvalue element="type" qualifier="none">Article</dcvalue>'."\n";
            print fh '<dcvalue element="language" qualifier="iso">en</dcvalue>'."\n";
            $affiliation = '';
            $affiliation1 = '';
            $affiliation2 = '';

            # Metadata for items with multiple authors, each
            # with individual affiliations, span multiple lines.
            # Collect them and produce XML for them.
            for $i (0 .. $#{$lt{$key}}) {
                        @fields = split(/\t/, $lines[$lt{$key}[$i]]);
                        $title = trim($fields[9]);
                        if (length($title) > 0) {
                                    $title =~ s/["]{1}([^"])/$1/g;
                                    $title =~ s/("|"")$//g;
                                    print fh '<dcvalue element="title" qualifier="none">'
                                        .$title.'</dcvalue>'."\n";
                        }
                $year1 = trim($fields[1]);
                        if (length($year1) > 0) {
                                    print fh '<dcvalue element="date" qualifier="issued">'
                                        ."$year</dcvalue>\n";
                        }
                $author = trim($fields[5]);
                        if (length($author) > 0) {
                                    $author =~ s/(\$|\^|\{|\}|\*)//g;
                                    print fh '<dcvalue element="creator" qualifier="none">'
                                        .$author.'</dcvalue>'."\n";
                }
                        $abstract = trim($fields[10]);
                        if (length($abstract) > 0) {
                                    print fh '<dcvalue element="description" qualifier="abstract">'
                                        .$abstract.'</dcvalue>'."\n";
                }
                        if (length(trim($fields[6])) > 0) {
                                    $affiliation1 = trim($fields[6]);
                        }
                        if (length(trim($fields[7])) > 0) {
                                    $affiliation2 = trim($fields[7]);
                        }
                        if ((length(trim($fields[6])) > 0)
                            || (length(trim($fields[7])) > 0)) {
                                    if ((length(trim($fields[6])) == 0)
                                        && (length($affiliation1) == 0)) {
                                                $append = $affiliation2;
                                    } elsif ((length(trim($fields[7])) == 0)
                                        && (length($affiliation2) == 0)) {
                                                $append = $affiliation1;
                                    } else {
                                                $append = $affiliation1.", "
                                                    .$affiliation2;
                                    }
                                    if (length($affiliation) > 0) {
                                                $affiliation = $affiliation.
                                                    "; ".$append;
                                    } else {
                                                $affiliation = $append;
                                    }
                        }
                        $note = trim($fields[11]);
                        if (length($note) > 0) {
                                    print fh '<dcvalue element="description" qualifier="none">'
                                        .$note.'</dcvalue>'."\n";
                }
            } # Done processing multiple authors.

            # Finish producing the XML for this item.
            print fh '<dcvalue element="description" qualifier="none">Author Institution: '
                .$affiliation.'</dcvalue>'."\n";
            print fh '</dublin_core>'."\n";
            close(fh);

            # Create the 'contents' file.
            open(fh, ">$dir/contents");

        if ($filename != "") {
                        print fh "$filename.jpg";
            $cmd = "cp \"../images/$year/$filename.jpg\" $dir";
                        print $cmd."\n";
                        system $cmd;
            }
        close(fh);
} # Finished processing this item.

-------------------------------------------------------------------------------------------------------------------------------

-- import_collections.sh --

#!/bin/sh
#
# Import a collection from files generated on dspace
COLLECTION_ID=1811/6634
EPERSON="[name removed]@osu.edu"
SOURCE_DIR=./5
BASE_ID=`basename $COLLECTION_ID`
MAPFILE=./map.$BASE_ID

/dspace/bin/dsrun org.dspace.app.itemimport.ItemImport --add --eperson=$EPERSON --collection=$COLLECTION_ID --source=$SOURCE_DIR --mapfile=$MAPFILE

Appendix C. Example dublin_core.xml for MSS 2009

<dublin_core>
    <dcvalue element="identifier" qualifier="none">2009-MJ-10</dcvalue>
    <dcvalue element="title" qualifier="none">VIBRATIONAL OVERTONE SPECTRA OF $C_2H_6$ AND $C_2H_4$ IN CRYOGENIC LIQUIDS</dcvalue>
    <dcvalue element="date" qualifier="issued">2009</dcvalue>
    <dcvalue element="description" qualifier="abstract">Vibrational overtone spectra of $C_2H_6$ and $C_2H_4$ in cryogenic solutions were recorded between 5000 and 14000 cm$^{-1}$. Spectral regions for the first four overtones were measured using a Fourier transform spectrophotometer. The fifth overtone $(\Delta\nu=6)$ spectra between 15,000 and 16,000 cm$^{-1}$ were recorded with a double beam (pump-probe) thermal lens technique using concentrations as low as 10$^{-3}$ mole fraction. The peak frequency shift $(\Delta\omega)$ from gas phase to solution is explained by the change in harmonic frequency and anharmonicity in solution with respect to the gas phase values.  The bandwidth $(\Delta\omega_{1/2})$ of the $(\Delta\nu= 6)$ C-H absorption bands in solution can be explained in terms of collisions with the solvent molecules.</dcvalue>
    <dcvalue element="description" qualifier="none">Author Institution: Department of Chemistry and Biochemistry, Baylor University, Waco, Texas, 76798</dcvalue>
    <dcvalue element="type" qualifier="none">Article</dcvalue>
    <dcvalue element="language" qualifier="iso">en</dcvalue>
    <dcvalue element="creator" qualifier="none">Diez-y-Riega, Maria H.</dcvalue>
    <dcvalue element="creator" qualifier="none">Manzanares, Carlos E.</dcvalue>
</dublin_core>

Appendix D. Section of MSS Author Quality Control Script

-- flipper.pl --

#!/usr/bin/perl

#### Sections omitted ####

#### Begin author correction block ####

    $creatorxml = "";
    if (length($creators) > 0) {
             # Creator names are contaminated with comments.
             # Remove the comments.
         $creators =~ s/"//g;
             $creators =~ s/\\thanks\{.+\}//;
             $creators =~ s/\\thanks \{.+\}//;
             $creators =~ s/\\footnote\{.+\}//;
       # Multiple creators are separated by ';' or AND in the metadata.
       @creatorlist = split(/;| and | AND /,$creators);
             # Process each creator.
         foreach $creator (@creatorlist) {
                 # Remove per name comments and punctuation.
           $creator =~ s/^\s+//;
                 $creator =~ s/FULL NAME OF AUTHOR FROM OTHER LOCATION//;
                 $creator =~ s/\\underline \{(.+)\}/$1/;
                 $creator =~ s/\\address\{//;
                 $creator =~ s/\\//g;
                 $creator =~ s/\{//g;
                 $creator =~ s/\}//g;
                 $creator =~ s/\^//g;
                 $creator =~ s/\'//g;
                 $creator =~ s/\%//g;
                 $creator =~ s/^AND$|^and$//;
           if (length($creator) > 0) {
                         $creator =~ s/\.(\w)/. $1/g;
                         # Split the name apart on spaces.
                 @nameparts = split(/ /,$creator);
                         # Process each part of the name.
                         for($i = 0;$i <= $#nameparts; $i++) {
                             # Adjust case.
                             @nameparts[$i] = lc(@nameparts[$i]);
                             @nameparts[$i] = ucfirst(@nameparts[$i]);
                             $c = rindex(@nameparts[$i],"-");
                             # Uppercase hyphenated names.
                             if ($c != -1) {
                                     $r = uc(substr(@nameparts[$i],$c+1,1));
                                     substr(@nameparts[$i],$c+1,1,$r);
                             }
                         }
                         $lname = pop(@nameparts);
                         $nl = @nameparts[-1];
                         # Handle name prefixes.
                         if ($nl eq "Von"
                             || $nl eq "Vander"
                             || $nl eq "Le"
                             || $nl eq "De"
                             || $nl eq "de") {
                             $lname = pop(@nameparts)." ".$lname;
                         }
                         # Handle special case name parts
                         if ($nl eq "Der" ) {
                             $nl2 = @nameparts[-2];
                             $lname = pop(@nameparts)." ".$lname;
                             if ($nl2 eq "Van" ) {
                                     $lname = pop(@nameparts)." ".$lname;
                             }
                         }

                         # assemble the name and make the XML.
                         $name = $lname .", ".join(" ",@nameparts);
                 $creatorxml .= '<dcvalue element="creator" qualifier="">'
                     .$name.'</dcvalue>'."\n    ";
             }
         }
    } # Done processing creators of this item.

  
#### End author correction block ####
#### Sections omitted ####

Appendix E. MSS 2009 Batch Loading Scripts

-- mkxml2009.pl --

#!/usr/bin/perl

use Encode;                 # Routines for UTF encoding
use Text::xSV;              # Routines to process CSV files.
use File::Basename;

# Open and read the comma separated metadata file.
my $csv = new Text::xSV;
#$csv->set_sep("\t");       # Use for tab-separated files.
$csv->open_file("MSS2009.csv");
$csv->read_header();     # Process the CSV column headers.

# Constants for file and directory names.
$basedir = "/common/batch/input/mss/";
$indir = "$basedir/2009";
$xmldir= "./2009xml";
$imagesubdir= "processed_images";
$filename = "dublin_core.xml";

# Process each line of metadata, one line per item.
$linenum = 1;
while ($csv->get_row()) {
    # This divides the item's metadata into fields, each in its own variable.
    my (
            $identifier,
            $title,
            $creators,
            $description_abstract,
            $issuedate,
            $description,
            $description2,
            $abstract,
            $gif,
            $ppt,
    ) = $csv->extract(
            "Talk_id",
            "Title",
            "Creators",
            "Abstract",
            "IssueDate",
            "Description",
            "AuthorInstitution",
            "Image_file_name",
            "Talk_gifs_file",
            "Talk_ppt_file"
    );

    $creatorxml = "";
    # Multiple creators are separated by ';' in the metadata.
    if (length($creators) > 0) {
            # Create XML for each creator.
        @creatorlist = split(/;/,$creators);
        foreach $creator (@creatorlist) {
            if (length($creator) > 0) {
                $creatorxml .= '<dcvalue element="creator" qualifier="none">'
                .$creator.'</dcvalue>'."\n    ";
             }
         }
    } # Done processing creators for this item.

    # Create the XML string for the Abstract.
    $abstractxml = "";
    if (length($description_abstract) > 0) {
            # Convert special metadata characters for use in xml/html.
        $description_abstract =~ s/\&/&amp;/g;
        $description_abstract =~ s/\>/&gt;/g;
        $description_abstract =~ s/\</&lt;/g;
            # Build the Abstract in XML.
        $abstractxml = '<dcvalue element="description" qualifier="abstract">'
            .$description_abstract.'</dcvalue>';
    }

    # Create the XML string for the Description.
    $descriptionxml = "";
    if (length($description) > 0) {
            # Convert special metadata characters for use in xml/html.
        $description=~ s/\&/&amp;/g;
        $description=~ s/\>/&gt;/g;
        $description=~ s/\</&lt;/g;
            # Build the Description in XML.
        $descriptionxml = '<dcvalue element="description" qualifier="none">'
            .$description.'</dcvalue>';
    }

    # Create the XML string for the Author Institution.
    $description2xml = "";
    if (length($description2) > 0) {
            # Convert special metadata characters for use in xml/html.
        $description2=~ s/\&/&amp;/g;
        $description2=~ s/\>/&gt;/g;
        $description2=~ s/\</&lt;/g;
            # Build the Author Institution XML.
        $description2xml = '<dcvalue element="description" qualifier="none">'
            .'Author Institution: ' .$description2.'</dcvalue>';
    }

    # Convert special characters in title.
    $title=~ s/\&/&amp;/g;
    $title=~ s/\>/&gt;/g;
    $title=~ s/\</&lt;/g;

    # Create XML File
    $subdir = $xmldir."/".$linenum;
    system "mkdir $basedir/$subdir";
    open(fh,">:encoding(UTF-8)", "$basedir/$subdir/$filename");
    print fh <<"XML";
<dublin_core>
    <dcvalue element="identifier" qualifier="none">$identifier</dcvalue>
    <dcvalue element="title" qualifier="none">$title</dcvalue>
    <dcvalue element="date" qualifier="issued">$issuedate</dcvalue>
    $abstractxml
    $descriptionxml
    $description2xml
    <dcvalue element="type" qualifier="none">Article</dcvalue>
    <dcvalue element="language" qualifier="iso">en</dcvalue>
    $creatorxml
</dublin_core>
XML
    close(fh);

# Create contents file and move files to the load set.

    # Copy item files into the load set.
    if (defined($abstract) && length($abstract) > 0) {
        system "cp $indir/$abstract $basedir/$subdir";
    }

    $sourcedir = substr($abstract, 0, 5);
    if (defined($ppt) && length($ppt) > 0 ) {
         system "cp $indir/$sourcedir/$sourcedir/*.* $basedir/$subdir/";
    }
   
    if (defined($gif) && length($gif) > 0 ) {
         system "cp $indir/$sourcedir/$imagesubdir/*.* $basedir/$subdir/";
    }

    # Make the 'contents' file and fill it with the file names.
    system "touch $basedir/$subdir/contents";

    if (defined($gif) && length($gif) > 0
        && -d "$indir/$sourcedir/$imagesubdir" ) {
        # Sort items in reverse order so they show up right in DSpace.
        # This is a hack that depends on how the DB returns items
        # in unsorted (physical) order. There are better ways to do this.
        system "cd $indir/$sourcedir/$imagesubdir/;"
            . " ls *[0-9][0-9].* | sort -r >> $basedir/$subdir/contents";
        system "cd $indir/$sourcedir/$imagesubdir/;"
            . " ls *[a-zA-Z][0-9].* | sort -r  >> $basedir/$subdir/contents";
    }

    if (defined($ppt) && length($ppt) > 0
        && -d "$indir/$sourcedir/$sourcedir" ) {
        system "cd $indir/$sourcedir/$sourcedir/;"
            . " ls *.* >> $basedir/$subdir/contents";
    }
   
    # Put the Abstract in last, so it displays first.
    system "cd $basedir/$subdir; basename $abstract >>"
        . " $basedir/$subdir/contents";

    $linenum++;

} # Done processing an item.

-------------------------------------------------------------------------------------------------------------------------------

-- import.sh --

#!/bin/sh
#
# Import a collection from files generated on dspace
#
COLLECTION_ID=1811/6635
EPERSON=[name removed]@osu.edu
SOURCE_DIR=./2009xml
BASE_ID=`basename $COLLECTION_ID`
MAPFILE=./map-dspace03-mss2009.$BASE_ID

/dspace/bin/dsrun org.dspace.app.itemimport.ItemImport --add --eperson=$EPERSON --collection=$COLLECTION_ID --source=$SOURCE_DIR --mapfile=$MAPFILE