Building and Maintaining a Local Database for
Electronic Journal Access Points

Anne Prestamo, MLIS, Ed.D.
Associate Professor and Head, Digital Library Services
University Libraries
Oklahoma State University

Abstract

As the Oklahoma State University Libraries entered into more and more agreements with publishers and aggregators for online access to journals, magazines, and newspapers, it became increasingly challenging for librarians, let alone our patrons, to keep up with all of the available options.  In May 1999 the initial version of the “Full-Text Periodical Titles and Coverage List” went live on the Library’s Website.  Initially, the list contained approximately 5,500 entries.  Today the list has grown to nearly 18,000 access points. 

At the time this project began, the OSU Library was using NOTIS for its Online Catalog.  Since NOTIS was a text-based, telnet system, adding links to the MARC 856 field was not particularly helpful.  In July 2000, migration to Endeavor’s Voyager system was completed.  With the introduction of a Web-based catalog the 856 links have become much more utile, and efforts are underway to continue adding the URL information to records as quickly as possible.  Whether the “Full-Text Periodical Titles and Coverage List” will be maintained once the 856 project reaches a critical mass is currently a topic of discussion.

This paper will detail the processes used, including:

  
     1)   gathering and “normalizing” of data elements from vendors and publishers
        2)   creating the master file of all merged titles
  
     3)   converting to a Web-deliverable format
  
     4)   maintenance and update procedures
  
     5)   attempted migration to new format, including Access/SQL and ProCite/
               Reference Web Poster
   
  
     6)   856 links vs. “Full-Text Periodical Titles and Coverage List” ongoing discussion
  
     7)   lessons learned - what we’d do differently if we were starting over

 I.          Background

As the Oklahoma State University Libraries entered into more and more agreements with publishers and aggregators for online access to journals, magazines, and newspapers, it became increasingly challenging for librarians, let alone our patrons, to keep up with all of the available options.  When a patron was looking for a known periodical or article it was necessary to search numerous resources to determine if and where it was available.  At the time this project began, the OSU Library was using NOTIS for its Online Catalog.  Since NOTIS was a text-based, telnet system, adding links to the MARC 856 field was not particularly helpful.  Creating a local database of Full-Text Periodical access points was undertaken as a stopgap measure until the Libraries’ migration to Endeavor Voyager was completed, at which time 856 links would be added to periodical records and be truly functional.

In May 1999 the initial version of the Full-Text Periodical Titles and Coverage List went live on the Library’s Website.  Initially, the list contained approximately 5,500 entries.  In addition to the periodical titles, this list contained information on the formats available and the coverage dates for each title.  The majority of these entries represented the full-text access points to periodical titles from two aggregators:  Bell and Howell Information and Learning’s ProQuest Direct and Gale’s InfoTrac databases.  Initially, we were able to provide links to the “front door” of these resources.  Once there, it was necessary to do a search for the particular periodical title or article within that resource.

In the months that followed online full-text titles from Dow Jones Interactive, Project MUSE, JSTOR, Elsevier ScienceDirect, MCB Press Emerald Library, Lexis/Nexis Academic Universe, IEEE/IEE Electronic Library (IEL), Academic Press IDEAL, Oxford University Press, Cambridge University Press, Institute of Physics, American Institute of Physics, American Chemical Society, ACM, University of Chicago Press, SIAM, Wiley InterScience, Blackwell, and Springer LINK, as well as numerous society and single subscription titles have been added.  Some of the this content is distributed via Stanford University’s HighWire Press, while others are direct links to the respective publishers.  Currently, there are over 18,000 entries in the list.

As the list grew and procedures were refined, direct links to the periodical title within a given resource were incorporated through the use of durable or persistent URLs, streamlining the process of locating a particular periodical title.

II.         Data Sources

The procedures for obtaining the raw data for each resource have varied.  Whenever possible, we have requested that the vendor provide a delimited text file containing periodical title, ISSN, coverage dates, persistent or durable URL to the periodical title-level.  In a few cases the vendor has furnished this file to us, while in other cases we have been able to download the required information from the vendor’s Website as a delimited text file.  In yet other cases, it was necessary to retrieve the needed information on a title-by-title basis from the vendor’s Website. 

A separate Excel file with common field structure was created for each resource.  Particularly in the case of aggregators these files have proved very useful when we have considered new subscriptions, in that we have a readily usable file against which we can run checks for duplicate titles and/or coverage.
The following list contains links to sources of the necessary data from several of our vendors and notes on the ease of use for each.
 

Resource

Data Source

Ease of Use

ProQuest

http://www.umi.com/cgi-bin/TitleForm?cfg=LibTitles.cf

To create the title-level URL it was necessary to subscribe to ProQuest SiteBuilder at additional cost.  Building each durable link was a multiple-step process.

InfoTrac

http://www.galegroup.com/servlet/
HTMLFileServlet?region=9&imprint=
000&fileName=catalog/title.htm

In February 2001 durable links to the title-level (InfoMarks) and downloadable MARC records containing the 856 field were made available. 

ScienceDirect

Excel file from vendor, including durable URL for each title

Normalizing fields to our local format was all that was required.

AcademicUniverse

Title and coverage dates provided by vendor in delimited text file.

List included many titles that were not, in fact, full-text.  Best link is to the “Source List”, with no current method to provide durable links to the title-level.

Emerald

Subscribed title list provided by vendor in as text file.

Coverage dates and title-level URLs were retrieved on a title-by-title basis.

Oxford University Press

http://www3.oup.co.uk/jnls/online/

 

 

This is a new feature.  Save page as .html and open in Excel to edit.  It is still necessary to copy and paste title-level URLs into the Excel file.  At the time we subscribed it was necessary to build the list from scratch.

Cambridge University Press

http://www.journals.cambridge.org/bin/
bladerunner?REQUNIQ=984334071&
REQSESS=1843244&115020
REQEVENT=&REQAUTH=0

Note:  this URL requires authentication

Once subscribed, this URL displays a page showing your subscribed titles only. This feature was not available when our local list was originally created. Coverage dates and title-level URLs must be retrieved on a title-by-title basis.

IEL

http://ieeexplore.ieee.org/lpdocs/epic03/
Periodicals.HTM?findtitle=%;ilstrec=10;ifstrec=1

Note:  this URL requires authentication

Once subscribed titles were copied and pasted from the source list and coverage dates and title-level URLs were retrieved on a title-by-title basis.

Project MUSE

http://muse.jhu.edu/journals/

Save Webpage list as .txt file and open in Excel.  Coverage dates and title-level URLs were retrieved on a title-by-title basis.

JSTOR

http://www.jstor.org/cgi-
bin/jstor/listjournal?frame=noframe&config=jstor

Save Webpage list as .txt file and open in Excel.  Coverage dates and title-level URLs were retrieved on a title-by-title basis.

Figure 1    Sources of Raw Data


III.        Creating the Master File

Once the file for each individual resource was created, a copy of each file was merged into the master Excel file.  The file was sorted alphabetically by Periodical Title (see Figure 2).  This master file was then subdivided into separate worksheets, each containing approximately 100-150 titles.  Column headings containing the field names were inserted into the worksheets every 15-20 titles for clarity (see Figure 3).  The worksheets were named with the appropriate alphabetic subdivision for the titles it contained (i.e. A-Ag). 

Figure 2   Master File in Excel

In 1999 the first version of the list consisted of 50 worksheets.  Today, that number has grown to 105, with most worksheets containing 150-200 titles. 

IV.       Converting to Webpages

Once the worksheets were created each Excel worksheet was saved as an HTML file.   While the “Save as HTML” feature of Excel proved very useful in quickly converting the worksheets to Webpages, the resulting files were bloated with a great deal of extraneous HTML tagging and attributes.

To strip the extraneous HTML we used HotDog Pro, an HTML editor.  Using its MultiFile Find and Replace utility, a variety of schemes were created to find and replace all unnecessary code.  While creating these schemes was a challenge, this utility saved a tremendous amount of time, without having to open and edit each individual file.  Some schemes made more than 80,000 replacements across all files in a single operation.  Once the extraneous code was stripped the file sizes were reduced by as much as two-thirds.

The “TitleURL” and “VendorProductName” fields from the Excel file were blended into the “Access Via” field in the Webpages using schemes built with the MultiFile Find and Replace utility as well (see Figure 3).

 

                                Figure 3  Sample of Webpage version

An index page (see Figure 4) was created with a grid linking to each section of the alphabet.  Once again using HotDog’s MultiFile Find and Replace this grid was inserted at the top and bottom of each individual Webpage to facilitate browsing.

The work described above, as well as the maintenance and updating that will be discussed later was accomplished with one librarian, a half-time graduate assistant, and two student employees.  Overall, approximately 10% of the librarian’s time is devoted to this project, although the time-per-week varies tremendously.  This will be further addressed below.  The graduate assistant works approximately 15 hours per week on an ongoing basis on the project.  Student employees are involved primarily when new title and coverage lists are being constructed for a particular vendor.  This has required as much as 30 hours per week of student labor at various points in the project.

                    Figure 4   Index Page showing navigation grid that repeats on each page
                    http://www.library.okstate.edu/database/

V.        Maintenance and Update Procedures

Given the volatile nature of aggregators’ agreements with publishers, the digitizing of larger backfiles of articles, and changing URLs, there is a tremendous amount of maintenance required.  Whenever possible, we have subscribed to email lists for updates when that service is provided by vendors.  Some vendors provide Web-accessible lists of title and coverage changes.  In other cases it is necessary to periodically do manual checks of titles and coverage.

As discussed above, we have a graduate assistant who devotes approximately 15 hours per week to the retrieval and entering of updates.  On a week-to-week basis these updates are entered manually in both the Excel file and their respective HTML files.  Updated HTML files are uploaded to the server on an as-needed basis.

When large numbers of new titles are added a complete regeneration of the HTML files from the Excel file is done.  This has averaged approximately once per semester.  When this is done it requires 60-80 hours of the librarian’s time.   Because the complete regeneration is incredibly labor intensive and done rather infrequently, there are periods when the list does not include recently added titles for a period of up to 8 weeks.

VI.       Possible Migration Solutions

In Spring 2000 we began to explore the possibility of migrating the master file from Excel to Access and installing SQL Server to allow searching and browsing of the database via Web forms created with Cold Fusion.  Although a conversion of the file was done, and preliminary testing of SQL server was done, further work on the conversion was delayed due to the Libraries’ migration from NOTIS to Endeavor Voyager.  Not only were many of the same personnel involved, but there was the possibility that once the migration to Voyager was completed and 856 links were incorporated into OPAC records the Full-Text Periodical Titles and Coverage List would be abandoned. 

Obviously, the process of incorporating 856 links for more than 18,000 titles will be a lengthy process and a decision was made in August 2000 to continue updating the Full-Text Periodical Titles and Coverage List until such time as the 856 link project reaches “critical mass”.  However, further work on the Access/SQL solution was tabled in November 2000 due to the departure of the Systems librarian most knowledgeable in these areas.

An alternative solution was explored that involved migrating the Excel master file to ProCite, a bibliographic management utility.  A companion product, WebReference Poster enables searching of ProCite databases via Web forms.  While not a simple, straightforward conversion, after some trial and error the Excel file was successfully converted to ProCite.  WebReference Poster was installed in test mode on the Libraries’ Web server.  It functioned well, with the exception that the URL in each record appeared only as text – not as a clickable link.  Through further editing, the necessary HTML tags were inserted with each URL in an attempt to force the link.  This merely resulted in the HTML tags being displayed as text in the WebReference Poster record screens.  Technical support at RIS stated that they are aware clickable links are an issue, but that have no plans to incorporate that functionality in the near future.  As a result of this shortcoming we abandoned this solution.

Again, it appears that the Access/SQL solution is the most viable and we are in the process of trying to develop or acquire personnel with the expertise necessary to move in this direction.

VII.      856 links vs. Full-Text Periodical Titles and Coverage List

As discussed earlier, the creation of the Full-Text Periodical Titles and Coverage List was initially viewed as a stopgap measure until such time the 856 links could be provided in the Libraries’ OPAC records.  It seems, however, that we are the victims of our own success.  At the current time there is great support for maintaining the Full-Text Periodical Titles and Coverage List in tandem with having the 856 links in the OPAC records. 

The OSU Libraries’ Cataloging Department has made great strides in the highly labor-intensive work of incorporating 856 links.  They are following the CONSER Guidelines for a single-record approach. 
(See http://www.loc.gov/acq/conser/mod31pt1.html)   In a nutshell, this means that when the Libraries have no holdings in other formats a new bibliographic record is created and the 856 field is inserted in the bibliographic record.  When the Libraries have existing print or microform holdings a new holdings record is created with the 856 field and linked to the existing bibliographic record.  

In some cases, this massive undertaking has been facilitated by the availability of MARC records for bulk import.  Some vendors, like the Gale Group are now providing the MARC 856 records at no additional cost, while others like ProQuest have MARC records available for purchase.  Information on the Gale Group’s MARC records may be found at http://www.galegroup.com/catalog/marc.htm
Information on MARC records for ProQuest Direct is available at http://www.umi.com/hp/Features/Marc/MARCdata.html

Some libraries are still taking the a highly conservative approach with 856 links, holding firm to the idea that only items which are owned by a library, and over which the library has physical control, should be cataloged.  In our case, a decision has been made to catalog anything to which we have access, regardless of its format and physical location.  However, one issue that we are still wrestling with involves the question of whether of not 856 links for titles included in consortial subscriptions should be cataloged.  For example, we benefit from a statewide subscription to the Gale Group’s InfoTrac databases.   As an individual library we have no direct control over the decision to renew or cancel access to these databases.  The titles in the InfoTrac databases have been assigned the lowest priority in the Cataloging Department’s workflow, as the statewide contract is up for renewal at the present time.  Once a decision is made on renewal, we will make a decision on whether or not records for these titles will be added to the catalog.

The OSU Libraries will be migrating to Voyager2000 over the next few months.  Currently under study is the possibility that Voyager2000 will allow us to build “canned searches” utilizing the 856 field information.  These canned searches may allow on-the-fly-generation of browsable lists of full-text periodicals and could replace the locally maintained Full-Text Periodical Titles and Coverage List.  Until that option can be fully tested it is unlikely that migration of the Full-Text Periodical Titles and Coverage List to Access/SQL will be undertaken.

VIII.     Lessons Learned

When the initial work on the Full-Text Periodical Titles and Coverage List was undertaken it was envisioned as short-term solution.   While low-tech solutions may appear expedient in the beginning, their ability to “scale” as demands increase is a question that should have been asked from the beginning.  In its earliest stages, the methodology adopted was labor intensive, but not prohibitively so.  No one involved in the project imagined that the Libraries’ access to online full-text periodical would grow at the rapid rate we’ve experienced.  As the number of titles doubled and then more than tripled it became clear that the initial method of creating and delivering the list was becoming increasingly impractical. 

As the Cataloging Department has begun entering 856 links we have had to develop procedures for gathering and disseminating update information.  At the current time, we are updating records in three places:  the Excel master file, the Full-Text Periodical Titles and Coverage List Webpages, and the OPAC records.  An administrative decision will have to be made as to whether or not we can continue to commitment the human resources necessary to provide multiple means of access to full-text periodical links.

The reassuring thing in all of this is that, although the maintenance of the Full-Text Periodical Titles and Coverage List has become highly labor-intensive, it has become an important tool for librarians and Library patrons.  Analysis of the log files for 2000 indicates that the Full-Text Periodical Titles and Coverage List was the third most frequently accessed Webpage on the Libraries’ Web server.  It use was surpassed only by the Libraries’ homepage, which displays every time a browser is opened on any of the Libraries’ 400+ computers.  The second most frequently accessed Webpage is the Libraries’ list of Electronic Indexes and Databases.

Facilitating access to online materials will continue to pose challenges for libraries for the foreseeable future.  Seamless and simple access has yet to be achieved.  The availability of full-text periodicals online has certainly streamlined the ways in which we retrieve information, but has brought with it a whole new set of challenges in bibliographic control, organization, and access.

Contact the Author

Anne Prestamo, Coordinator
Digital Library Services
Edmon Low Library
Oklahoma State University
Phone (405) 744-9161
FAX (405) 744-7579
Email prestam@okstate.edu