The DOI is Coming
Information Outlook, Vol. 6, No. 9, September 2002


by Davida Scharf, MLS

Davida Scharf has been online since the 1970s. At NKR Associates, Inc. (www.NKRassociates.com), she assists organizations of all types in using information technology effectively, through consulting and project management.

Get Ready!

Librarians know that what the Web really needs is a library system, including catalog numbers for documents. Unique catalog numbers are the bedrock of an information management system and a long-standing tradition in publishing and library science. At present, a URL is the door to electronic content on the Web. A URL defines the location of the content. Unlike call numbers on books, which are rarely reclassified, URLs change often, and this is one of their biggest problems.

Imagine the burden of keeping up with daily changes to call numbers in your print collection. That is essentially the Herculean task faced by those constructing "virtual libraries." Maintaining links by hand leads straight to new electronic backlogs of Web pages waiting to be edited for changed URLs. Meanwhile, modern technology's rapidly improving ability to deliver information instantly is fueling the expectation that it will. Combine the difficulty of consistently locating Web-based content without a persistent URL with cumbersome methods for copyright compliance, and you have a big practical problem needing a solution. While some software tools can cope with changed URLs, the solution probably lies elsewhere. The Digital Object Identifier (DOI), along with the OpenURL standard and linking software, may hold the key to vast improvements in the accessibility of copyrighted electronic content.

The DOI is a unique control number for digital content: a permanent, persistent identifier for electronic intellectual property. Its power lies in the standard, centralized database of metadata and the delivery system behind it, combined with an ability to interact with disparate systems. As long as the publisher keeps its record current, the DOI will point to the content wherever the publisher has put it. It has been compared to the standardized UPC (Universal Product Code) that appears on almost every product. The UPC barcode is useful at many points along the production and supply chain to connect the item to its metadata: price, brand, manufacturer and so on. It is used to track inventory, distribution, revenue and many other things in a variety of systems that communicate with each other across companies.

Use of a persistent identifier would likely have important benefits for the information industry and its customers. A DOI is somewhat analogous to an ISBN or ISSN; there are many similarities, but some important differences. Like an ISBN, a DOI represents its piece of electronic content forever, but it is not limited to representing an entire work. DOIs may be assigned to individual articles, chapters or even smaller chunks of content, such as the individual charts and other images that make up a larger work. Publishers may assign DOIs to as many slices of content as they believe there are consumers willing to pay for. In addition, the DOI is embedded in a database and resolution system that provides unimpeded pathways from content to its consumers. The DOI, and the system that handles it automatically, can let publishers provide seamless access through links that are "always right" because they are maintained by the content owner, not by the person linking to the content. At the same time, it can provide an instant mechanism for copyright compliance for even the smallest bits of content.

URIs, URLs and URNs

The World Wide Web Consortium (W3C) was created in October 1994 to develop common protocols to promote the evolution of the Web and ensure its interoperability. W3C has approximately 500 member organizations from all over the world and has earned international recognition for its contributions to the growth of the Web. The W3C defines the URI (Uniform Resource Identifier) as the generic set of all names/addresses that are short strings that refer to resources. The URL (Uniform Resource Locator) is an informal term (no longer used in technical specifications) associated with popular URI schemes: http, ftp, mailto, etc. The URN (Uniform Resource Name) is intended to serve as a persistent, location-independent, resource identifier. The technology behind URIs is suitable for use as catalog numbers, and much more, such as rights management, payment assurance, privacy, digital signatures, etc. This does sound like a DOI, doesn't it? Indeed the DOI is essentially a URN.
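To make the locator/name distinction concrete, here is a minimal Python sketch that sorts URIs by their scheme using the standard library's `urllib.parse.urlsplit`. The grouping of schemes is an assumption made for illustration, and `doi:` is used only as a common informal notation, not as a claim about a registered URI scheme.

```python
from urllib.parse import urlsplit

# Illustrative grouping only (an assumption for this sketch): schemes that
# name a location versus schemes that name the resource itself.
LOCATOR_SCHEMES = {"http", "https", "ftp", "mailto"}
NAME_SCHEMES = {"urn", "doi"}


def classify_uri(uri: str) -> str:
    """Return 'locator', 'name' or 'unknown' according to the URI's scheme."""
    scheme = urlsplit(uri).scheme.lower()
    if scheme in LOCATOR_SCHEMES:
        return "locator"
    if scheme in NAME_SCHEMES:
        return "name"
    return "unknown"


if __name__ == "__main__":
    for uri in (
        "http://www.w3.org/Addressing/",
        "mailto:librarian@example.org",
        "urn:isbn:0451450523",
        "doi:10.1000/182",
    ):
        print(f"{uri:35} {classify_uri(uri)}")
```

A locator tells the client where to go; a name only says what the resource is, which is why a name needs a resolution service behind it.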

OCLC PURLs

At about the same time the W3C began addressing this problem, OCLC developed PURLs (Persistent URLs) as a shorter-term solution to the persistence problem that catalogers had also recognized. Instead of pointing directly to the location of an Internet resource, a PURL points to an intermediate resolution service. The resolution service associates the PURL with the actual URL and returns that URL to the client, which can then complete the transaction in the normal fashion. Although OCLC runs and maintains a PURL service, the PURL model lends itself to distribution across the net, with servers run by organizations committed to maintaining persistent naming schemes (libraries, government organizations, publishers and others). OCLC freely distributes its PURL source code to aid rapid, wide distribution of this enabling technology (www.oclc.org/purl/docs/download.html). Since the introduction of the PURL model and services, a number of institutions have expressed interest in formal participation and in running PURL servers of their own. Still, the system relies more on voluntary participation than on international standards, and it hasn't really taken off. PURLs also do nothing to facilitate copyright protection, so commercial publishers have not employed them.
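The PURL indirection can be sketched as a lookup table plus an HTTP redirect. The following Python model is hypothetical: the class and method names, the sample paths and the choice of a 302 status code are assumptions for illustration, not OCLC's distributed source code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Response:
    status: int                     # HTTP status code returned to the client
    location: Optional[str] = None  # value of the Location header, if any


class PurlResolver:
    """Toy PURL-style resolution service (not OCLC's actual code)."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}  # PURL path -> current URL

    def register(self, path: str, url: str) -> None:
        if path in self._targets:
            raise ValueError(f"{path} is already registered")
        self._targets[path] = url

    def update(self, path: str, new_url: str) -> None:
        # Only this one entry changes; the published PURL stays the same.
        if path not in self._targets:
            raise KeyError(path)
        self._targets[path] = new_url

    def resolve(self, path: str) -> Response:
        url = self._targets.get(path)
        if url is None:
            return Response(status=404)
        # Redirect: the client completes the transaction at the real URL.
        return Response(status=302, location=url)


resolver = PurlResolver()
resolver.register("/net/example/annual-report", "http://www.example.org/1999/report.html")
resolver.update("/net/example/annual-report", "http://archive.example.org/report.html")
print(resolver.resolve("/net/example/annual-report"))
```

The design point is that whoever runs the server must actually call `update` when content moves; nothing in the model enforces that, which matches the reliance on voluntary participation noted above.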

How Does the DOI Work?

The International DOI Foundation (IDF) has currently authorized five registration agencies to assign DOIs. Two of these, CrossRef and Content Directions, serve as registrars and consultants to U.S. publishers on DOI implementations, though Content Directions is looking to expand into other media. CrossRef is a nonprofit consortium of publishers, whereas Content Directions is trying a for-profit business model. According to the IDF, there are currently more than 200 registrant organizations, and more than 4 million DOIs have been allocated, the majority of them designating text-rich online content from traditional print publishers.

The DOI system is made up of two parts: the DOI number, and the database of information about the item of content, including its current location, called the Handle System. A publisher obtains a DOI prefix from one of the registration agencies and uses it to create a unique DOI for each article it publishes. The DOI number thus consists of two elements. The first is the prefix, which indicates the registration agency and the publisher. The second, the suffix, is assigned by the publisher; it must be unique under that prefix but may follow any numbering system the publisher chooses. The publisher submits article metadata to the central DOI directory maintained by the registration agency. Although that directory is central, keeping the information in it current is the publisher's job. The information on the publisher side may actually be distributed among many databases, but the publisher must be able to present metadata about the actual content to the DOI resolution system in order to communicate the location of the content at the publisher's site.
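The prefix/suffix structure lends itself to a short sketch. The Python below splits a DOI at its first slash and checks that the prefix begins with "10.", which is how DOI prefixes start. The `PublisherRegistry` class, the prefix `10.9999` and the suffix pattern are hypothetical, standing in for whatever records and numbering scheme a publisher actually uses.

```python
from typing import Dict, NamedTuple


class DOI(NamedTuple):
    prefix: str  # "10." plus a registrant code, e.g. "10.1000"
    suffix: str  # publisher-assigned; may itself contain "/"


def parse_doi(text: str) -> DOI:
    """Split a DOI at its first '/' into prefix and suffix."""
    prefix, sep, suffix = text.strip().partition("/")
    if not sep or not suffix:
        raise ValueError(f"missing suffix: {text!r}")
    if not prefix.startswith("10.") or prefix == "10.":
        raise ValueError(f"prefix should start with '10.': {text!r}")
    return DOI(prefix, suffix)


class PublisherRegistry:
    """Hypothetical publisher-side records: one URL per DOI under one prefix."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.records: Dict[str, str] = {}  # full DOI -> current URL

    def assign(self, suffix: str, url: str) -> str:
        doi = f"{self.prefix}/{suffix}"
        parse_doi(doi)  # reuse the syntax check
        if doi in self.records:
            raise ValueError(f"{doi} already assigned; suffixes must be unique")
        self.records[doi] = url
        return doi


registry = PublisherRegistry("10.9999")  # hypothetical prefix
article = registry.assign("io.2002.6.9.001", "http://www.example.org/io/v6n9/doi.html")
chart = registry.assign("io.2002.6.9.001.fig1", "http://www.example.org/io/v6n9/fig1.gif")
print(parse_doi(article))  # DOI(prefix='10.9999', suffix='io.2002.6.9.001')
```

Because the split happens at the first slash only, a suffix that itself contains slashes survives intact. The second `assign` call shows the granularity mentioned earlier: a chart can receive its own DOI alongside the article that contains it.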

Players and Acronyms at a Glance

W3C (World Wide Web Consortium) - Created in 1994 to develop common protocols that promote the evolution of the Web and ensure its interoperability. Developed standards for the URL and the URN.

URN (Uniform Resource Name) - A persistent identifier for digital content of any type.

DOI (Digital Object Identifier) - A form of URN being used by scientific and scholarly publishers to manage content on the Internet. The DOI system is run by the IDF.

IDF (International DOI Foundation) - Formally organized in 1998, it consists of member electronic publishers who developed and now run the DOI system. The DOI is an application of the URN (www.doi.org).

CrossRef - A collaborative organization of STM publishers (currently over 120 members) who developed the first practical application of the DOI system to enable reference linking (www.crossref.org).

The Handle System - The underlying technology for the DOI. It is a comprehensive system for assigning, managing and resolving persistent identifiers, known as "handles," for digital objects and other resources on the Internet. Handles can be used as Uniform Resource Names (URNs). It was developed and is administered by CNRI (www.handle.net).

CNRI (Corporation for National Research Initiatives) - A not-for-profit organization formed in 1986 to foster research and development for the National Information Infrastructure.

Finally, knowing the DOI of a journal article enables someone to locate it persistently. If a publisher changes the location of an article, it need only update the URL for the article in one place. The resolution of the DOI is actually done by the Handle System, operated by CNRI (Corporation for National Research Initiatives), not by the registration agency. When a user clicks on a link containing the DOI, the user is sent to the URL submitted by the publisher. To update the location database, the publisher must submit updates to the Handle System. This is a weak point, because broken URLs are caused not by any deficiency of the Domain Name System but by weaknesses in organizations. The DOI infrastructure has not yet figured out how to keep a link working when a publisher goes out of business or is incompetent.
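A sketch of why one update suffices: the link embedded in citations never changes; only the record behind it does. This Python assumes the public DOI proxy address of the time, `http://dx.doi.org/`, and uses a plain dictionary as a hypothetical stand-in for the Handle System's records; it does not contact the real Handle System.

```python
from typing import Dict
from urllib.parse import quote

# Public DOI proxy; the real proxy resolves DOIs through the Handle System.
DOI_PROXY = "http://dx.doi.org/"


def doi_link(doi: str) -> str:
    """Build a resolver link; characters unsafe in a URL are percent-encoded."""
    return DOI_PROXY + quote(doi, safe="/")


# Hypothetical stand-in for the Handle System's record: DOI -> current URL.
handle_records: Dict[str, str] = {
    "10.9999/io.2002.6.9.001": "http://www.example.org/io/v6n9/doi.html",
}

link = doi_link("10.9999/io.2002.6.9.001")  # embedded in citations everywhere

# The publisher moves the article: one update, and every copy of `link` still works.
handle_records["10.9999/io.2002.6.9.001"] = "http://journals.example.com/io/6/9/1"
print(link, "->", handle_records["10.9999/io.2002.6.9.001"])
```

The sketch also shows what it cannot fix: if the publisher stops updating the record, the entry quietly goes stale, which is exactly the organizational weakness the DOI infrastructure has not solved.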

OpenURL

Researchers want access to the full text of an article with one click on a reference citation in a journal. For this to happen, additional software must be used along with OpenURL, a NISO standard syntax, to create the conditions for open link technology, sometimes called context-sensitive linking. CrossRef was the first DOI registration agency to connect all the parts and create successful reference linking. This open linking technology is already being marketed directly to libraries, as well as to library system vendors for incorporation in their software. Endeavor is promoting stand-alone linking software called LinkfinderPlus to provide comprehensive linking of library resources, regardless of the information provider. Two other companies have similar products for linking services in the scholarly environment: SFX from ExLibris and LinkOpenly from Openly Informatics. These products can dynamically create links that integrate the library's sources regardless of their location or format. For example, a database's search results would show context-sensitive links based on the institution's entire e-journal collection. Openly Informatics has aptly named one of its linking products 1CATE, for "one click access to everything." Of course, a user gains one-click access only to what the institution has paid for, but these systems do reduce the number of clicks, and the friction, between a reference and a full-text article across the library's entire roster of licensed full-text sources. Libraries can become CrossRef affiliates for $500 per year, but it is Endeavor, ExLibris and Openly Informatics that provide the linking software in the middle that can make CrossRef and other URN schemes work for libraries.
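The division of labor between OpenURL and a link resolver can be sketched in Python: the database encodes citation metadata into a URL, and the library's resolver decides what "one click" leads to based on what the institution licenses. The key names (`sid`, `genre`, `issn`, `date`, `volume`, `issue`, `spage`, `atitle`, `id`) are modeled on the OpenURL 0.1 draft. The resolver and ILL addresses, the holdings table, the ISSN and the DOI are all hypothetical, and this is not a description of how SFX, LinkfinderPlus or LinkOpenly work internally.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Hypothetical addresses for a library's link resolver and ILL form.
RESOLVER_BASE = "http://resolver.example.edu/openurl"
ILL_FORM = "http://library.example.edu/ill"


def build_openurl(citation: Dict[str, str], source_id: str) -> str:
    """Encode citation metadata as an OpenURL-style query string."""
    params = {"sid": source_id, "genre": "article", **citation}
    return RESOLVER_BASE + "?" + urlencode(params)


def resolve_openurl(url: str, holdings: Dict[str, Tuple[int, int]]) -> Tuple[str, str]:
    """Return (action, target) based on the institution's licensed holdings.

    holdings maps an ISSN to the (first_year, last_year) the library pays for.
    """
    query = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    span = holdings.get(query.get("issn", ""))
    year = int(query.get("date", "0")[:4])
    ident = query.get("id", "")
    if span and span[0] <= year <= span[1] and ident.startswith("doi:"):
        # Licensed and identified by DOI: one click to the full text.
        return "full-text", "http://dx.doi.org/" + quote(ident[len("doi:"):], safe="/")
    # Otherwise pass the same metadata on to an interlibrary loan request.
    return "interlibrary-loan", ILL_FORM + "?" + urlsplit(url).query


citation = {
    "issn": "1234-5679", "date": "2002", "volume": "6", "issue": "9",
    "spage": "12", "atitle": "An example article",
    "id": "doi:10.9999/io.2002.6.9.001",
}
url = build_openurl(citation, "ExampleVendor:ExampleDB")
print(resolve_openurl(url, {"1234-5679": (1995, 2002)}))
```

The same citation produces a different result at a library whose subscription ends in 2000, which is the "context" in context-sensitive linking.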

DOI and Digital Rights Management

What does this mean for the transactional Digital Rights Management (DRM) services offered by the Copyright Clearance Center, iCopyright and others? Although these services seem to be proliferating at the moment, they remain cumbersome for two main reasons. The first is lack of acceptance and use by most content owners. The second is that the burden of seeking permission and paying the fees falls on the user of the content. Users often lack the means, time or money needed to comply, and instead simply reproduce content without permission or avoid reusing it. These DRM services may not be needed if the responsibility for tracking copies and collecting payment for use is shifted from users to owners and their computer systems. Since powerful systems are being developed to track every click for advertising purposes, it is not unreasonable to think that similar systems can be deployed in the service of copyright compliance. The DOI and the Handle System hold more promise than user-initiated clearance for seamless rights management, especially since they were devised by publishers with that goal in mind.
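The owner-side tracking imagined above can be sketched in a few lines of Python: log each use of a DOI, price it at whatever granularity the owner chose, and total the fees by prefix, which here stands for the publisher. Everything in the sketch is hypothetical (the DOIs, the prices in cents and the function name), and it describes neither the Handle System nor any service offered by the Copyright Clearance Center or iCopyright.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Hypothetical per-use prices in cents, set by the owner at whatever
# granularity it has assigned DOIs (a whole article, a single chart).
PRICE_CENTS: Dict[str, int] = {
    "10.9999/io.2002.6.9.001": 300,      # article
    "10.9999/io.2002.6.9.001.fig1": 50,  # chart within the article
    "10.8888/report.17": 1000,
}


def royalties_by_owner(access_log: Iterable[str], prices: Dict[str, int]) -> Dict[str, int]:
    """Sum fees per DOI prefix for every logged use.

    A DOI with no price raises KeyError so unpriced content is noticed.
    """
    totals: Dict[str, int] = defaultdict(int)
    for doi in access_log:
        prefix = doi.split("/", 1)[0]
        totals[prefix] += prices[doi]
    return dict(totals)


log = [
    "10.9999/io.2002.6.9.001",
    "10.9999/io.2002.6.9.001.fig1",
    "10.9999/io.2002.6.9.001",
    "10.8888/report.17",
]
print(royalties_by_owner(log, PRICE_CENTS))
```

Integer cents avoid floating-point rounding in the totals; the user never files a permission request, because the owner's system does the counting.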

Guess Who Cooked This Up?

Although the DOI has been amazingly absent from public discussions of digital rights management, STM publishers have been quietly working on standards for this scheme and on its implementation. The DOI was conceived by a group of publishers working through their trade association, the AAP (Association of American Publishers). In 1994, they began seeking a way to standardize a technology that would enable copyright protection of their electronic publications without impeding access to content. They hoped to facilitate copyright compliance without inconveniencing customers or creating an illegal monopoly technology. For a full account of the genesis of the DOI, read Bill Rosenblatt's article "The Digital Object Identifier: Solving the Dilemma of Copyright Protection Online," in the Journal of Electronic Publishing (University of Michigan Press), volume 3, issue 2 (December 1997), www.press.umich.edu/jep/03-02/doi.html.

How Will It Affect Libraries and Researchers?

Anything that affects information resources and their means of distribution ultimately affects libraries. Widespread deployment of any persistent, standardized identifier for electronic content will affect the cost of obtaining and redistributing information resources, as well as fundamentally change the way in which that happens. As content is "cataloged" for the Web by its producer at the moment of its creation, the change may mean expeditious linking to wanted material for some libraries and the death knell for others. It is likely to have as great an impact on libraries as the creation of the MARC format did, so many decades ago.

REFERENCES

CrossRef. Publishers International Linking Association. (www.crossref.org)

The Digital Object Identifier (DOI) Help. Wiley InterScience. (www3.interscience.wiley.com/doiinfo.html)

The Digital Object Identifier: Keystone for Digital Rights Management. Software Information Industry Association. May 30, 2001. (www.siia.net/divisions/content/doi.pdf)

The DOI Handbook. Version 2.1.0, April 2002. The International DOI Foundation. (www.doi.org)

Web Naming and Addressing: URI's, URL's . . . World Wide Web Consortium. (www.w3.org/Addressing/#1997)

©2009 Special Libraries Association. All rights reserved.
331 South Patrick Street Alexandria, VA 22314-3501 USA