Network Working Group                                       Juha Hakala
Internet-Draft                              Helsinki University Library
Category: Informational                                   February 2000
draft-hakala-nbn-00.txt
Expires: August 25, 2000





                Using National Bibliography Numbers as
                         Uniform Resource Names

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.


This Internet-Draft will expire on August 25, 2000.

Abstract

This document discusses how national bibliography numbers (persistent
and unique identifiers assigned by the national libraries) can be
supported within the URN framework and the syntax for URNs defined in
RFC 2141 [Moats]. Much of the discussion below is based on the ideas
expressed in RFC 2288 [Lynch]. Chapter 5 contains a URN namespace
registration request modelled according to the template in RFC 2611
[Daigle et al.].


1. Introduction

As part of the validation process for the development of URNs the IETF
working group agreed that it is important to demonstrate that the
current URN syntax proposal can accommodate existing identifiers from
well established namespaces.  One such infrastructure for assigning and
managing names comes from the bibliographic community.  Bibliographic
identifiers function as names for objects that exist both in print and,
increasingly, in electronic formats.  RFC 2288 [Lynch]
investigated the feasibility of using three identifiers (ISBN, ISSN and
SICI) as URNs.

This document will analyse the usage of national bibliography numbers
(NBNs) as URNs. The need to extend the analysis to new identifier
systems was briefly discussed in RFC 2288 as well, with the following
summary: "The issues involved in supporting those additional identifiers
are anticipated to be broadly similar to those involved in supporting
ISBNs, ISSNs, and SICIs".

Note that this document does not purport to define the "official"
standard way of using national bibliography numbers as URNs; it merely
demonstrates feasibility. A registration request for acquiring the
Namespace Identifier (NID) "NBN" for national bibliography numbers has
been written by the National Library of Finland at the request of the
Conference of Directors of National Libraries (CDNL) and the Conference
of European National Librarians (CENL). The request is included in
chapter 5 of this text.

The document at hand is part of a global co-operation of the national
libraries to foster identification of electronic documents in general
and utilisation of URNs in particular. It should be noted that some
national libraries, including the national libraries of Finland, Norway
and Sweden, are already assigning NBN-based URNs for electronic
documents.

Following the registration request, we have used the URN Namespace
Identifier "NBN" for the national bibliography numbers in the examples
below.


2. Identification vs. Resolution

As a rule the national bibliography numbers identify finite, manageably-
sized objects, but these objects may still be large enough that
resolution to a hierarchical system is appropriate.

The materials identified by a national bibliography number may exist
only in printed or other physical form, not electronically. The best
that a resolver will be able to offer in this case is bibliographic data
from a national bibliography database, including information about where
the physical resource is stored in the national library's holdings.

The URN Framework provides resolution services that may be used to
describe any differences between the resource identified by a URN and
the resource that would be returned as a result of resolving that URN.
However, NBNs will be used, for instance, to identify resources in
digital Web archives created by harvester robot applications. In this
case, the NBN will identify exactly the resource the user expects to
see.


3. National bibliography numbers

3.1 Overview

National Bibliography Number (NBN) is a generic name referring to a
group of identifier systems utilised by the national libraries, and only
by them, for identification of deposited publications which lack an
identifier, or to descriptive metadata (cataloguing) that describes the
resources. Each national library uses its own NBN strings independently
of other national libraries; there is no global authority which controls
them. For this reason NBNs are unique only on the national level. When
used as URNs, NBN strings must be augmented with a controlled prefix
such as a country code. These prefixes guarantee uniqueness of the
NBN-based URNs on the global scale.

NBNs have traditionally been given to documents that do not have a
publisher-assigned identifier, but are catalogued to the national
bibliography. NBNs can be seen as a fall-back mechanism: if no other,
better established identifier such as ISBN can be given, an NBN is
assigned. In principle, NBN usage enables identification of any Internet
document. Local policies may limit the NBN usage to a much smaller
subset of documents.

Some national libraries (e.g. Finland, Norway, Sweden) have established
Web-based URN generators, which enable authors and publishers to fetch
NBN-based URNs for their network documents. At least the national
libraries of Sweden and Finland are harvesting and archiving domestic
Web documents (and a number of other libraries plan to start this
activity), and long-term preservation of these materials requires
persistent and unique identification. NBNs can be, and in fact already
are, used as internal identifiers in these Web archives.

Both syntax and scope of NBNs can be decided by each national library
independently. Typically, an NBN consists of one or more letters and/or
a number. This simple syntax makes NBNs easily extensible and very
suitable for, e.g., naming Web documents. For instance, the application
used by the National Library of Finland for Web harvesting creates NBNs
which are based on the MD5 checksum of the archived resource.


3.2 LCCN

Two examples of NBN systems are LCCNs (Library of Congress Control
Numbers) used by the Library of Congress, and F-codes assigned by the
National Library of Finland.

The Library of Congress Card Number was the number used to identify and
control catalog cards. With the development of the MARC format and the
first distribution of machine-readable records for book materials in the
late 1960s, the name of the LCCN was changed to Library of Congress
Control Number. LCCNs are currently structured as follows:

Element               Length        Positions
Alphabetic Prefix     3             00-02
Year                  2             03-04
Serial Number         6             05-10
Supplement Number     1             11


The uniqueness of the LCCN is determined by the first 11 positions
(positions 00-10). The Supplement Number has never been used by the
Library of Congress and this position is always blank. The Supplement
Number may be followed by two kinds of variable length data known as
Suffix/Alphabetic Identifier and Revision Date. Each Suffix/Alphabetic
Identifier is preceded by a slash, as is the Revision Date. If there is
no Suffix/Alphabetic Identifier, the Revision Date is preceded by two
slashes.
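
The fixed-position layout above can be sketched in Python as follows.
This is only an illustration under stated assumptions, not Library of
Congress code: the names Lccn, parse_lccn and unique_part and the sample
value are hypothetical, and the variable length data after position 11
is kept as an unparsed string.

```python
from typing import NamedTuple


class Lccn(NamedTuple):
    prefix: str      # positions 00-02
    year: str        # positions 03-04
    serial: str      # positions 05-10
    supplement: str  # position 11 (always blank, per the text)
    trailing: str    # variable length suffix / revision date, unparsed


def parse_lccn(raw: str) -> Lccn:
    """Split an LCCN into its fixed-position elements."""
    if len(raw) < 12:
        raise ValueError("an LCCN has 12 fixed positions (00-11)")
    return Lccn(raw[0:3], raw[3:5], raw[5:11], raw[11:12], raw[12:])


def unique_part(lccn: Lccn) -> str:
    """Positions 00-10 determine the uniqueness of the LCCN."""
    return lccn.prefix + lccn.year + lccn.serial


# Hypothetical sample value, made up to match the table above.
sample = parse_lccn("abc85000002 /AC/r86")
```

Here sample.year is "85", sample.trailing is "/AC/r86", and
unique_part(sample) is "abc85000002".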

According to RFC 2141, "RFC 1630 [2] reserves the characters "/",
"?", and "#" for particular purposes. The URN-WG has not yet debated the
applicability and precise semantics of those purposes as applied to
URNs. Therefore, these characters are RESERVED for future developments.
Namespace developers SHOULD NOT use these characters in unencoded form,
but rather use the appropriate %-encoding for each character".

Thus the slash character ("/") has to be encoded according to the
requirements of RFC 2141. There are no other characters in LCCN that
need encoding.
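
A minimal sketch of the %-encoding step, using Python's standard
urllib.parse module. The function name encode_nbn and the sample value
are hypothetical. Note that quote() with safe="" is more conservative
than RFC 2141 requires: it encodes every character other than letters,
digits and "_", ".", "-", "~", which also covers the blank Supplement
Number position.

```python
from urllib.parse import quote, unquote


def encode_nbn(nbn: str) -> str:
    """%-encode an NBN string for use in a URN, including any "/"."""
    # safe="" is needed because quote() leaves "/" unencoded by default.
    return quote(nbn, safe="")


# Hypothetical LCCN with a suffix and a revision date.
encoded = encode_nbn("abc85000002 /AC/r86")
decoded = unquote(encoded)
```

With this input, encoded is "abc85000002%20%2FAC%2Fr86", and decoding
restores the original string.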

For more information about the LCCN, see
http://lcweb.loc.gov/cds/mdslccn.html.


3.3 F-code

F-codes have been used since the early 20th century to identify and
control catalogue cards and later MARC records in the national
bibliography. In 1998 the National Library of Finland decided to enable
Finnish authors to fetch F-codes for their Internet documents, if these
documents do not qualify for other identifiers such as ISBN. Authors and
publishers can retrieve F-codes, embedded into URNs, from the URN
generator (http://www.lib.helsinki.fi/cgi-bin/urn.pl) developed in
co-operation between the National Library of Finland and the NETLAB unit
of the Lund University library. There is a user guide, which tells the
users how to embed the NBN-based URNs into the identified documents.

F-codes are also used within the Web harvesting and archiving software,
which has been built in the Networked European Deposit Library (NEDLIB)
project (see http://www.konbib.nl/nedlib). This application calculates
an MD5 checksum for each archived resource, and then builds an NBN-based
URN from the checksum. The URN then serves as a unique identifier for
the archived resource. Traditional identifiers cannot be used for this
purpose, since there may for instance be several variants of a book
which (quite rightly so) all have the same ISBN. Moreover, identifiers
embedded into a document do not necessarily belong to the document
itself; the Web archiver cannot trust the identifier information it
finds.

The F-code built by the URN generator consists of:

Prefix (for example fe)
Year (YYYY; for example 1999)
Number (for example 1055)

The generator also adds the namespace identifier "NBN" and the ISO 3166
country code. Thus a URN based on an F-code would in this case be, for
instance, urn:nbn:fi-fe19991055.

URNs created by the Web archiver have a similar overall structure,
except that the prefix (which may be defined by the operator) is fea and
the year is not used. An example of a URN built by the Web archiver:

urn:nbn:fi-fea-5c5875e6e49ae649cad63e5ee4f6c346


F-codes never need any special encoding when used as URNs, since they
consist of alphanumeric characters only (0-9, a-z). This is often the
case for other NBN systems as well.
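
The two URN forms described in this section can be sketched as follows.
This is an illustration, not the code of the actual URN generator or
Web archiver: the function names are hypothetical, and it is assumed
that the number is appended without zero padding, as in the examples
above.

```python
import hashlib


def fcode_urn(prefix: str, year: int, number: int,
              country: str = "fi") -> str:
    """Generator-style URN: country code, hyphen, prefix, year, number."""
    return f"urn:nbn:{country}-{prefix}{year:04d}{number}"


def archive_urn(content: bytes, prefix: str = "fea",
                country: str = "fi") -> str:
    """Archiver-style URN built from the MD5 checksum of the resource."""
    digest = hashlib.md5(content).hexdigest()  # 32 lowercase hex digits
    return f"urn:nbn:{country}-{prefix}-{digest}"


generated = fcode_urn("fe", 1999, 1055)
archived = archive_urn(b"")
```

Here generated is "urn:nbn:fi-fe19991055", matching the example above,
and archived is "urn:nbn:fi-fea-d41d8cd98f00b204e9800998ecf8427e" (the
MD5 checksum of empty content).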

3.4 Encoding Considerations and Lexical Equivalence

Embedding NBNs within the URN framework usually presents no particular
encoding problems, since all of the characters that can appear in
commonly used NBN systems can be expressed either directly or in
%-encoded form, as described in RFC 2141 [Moats].

When an NBN is used as a URN, the namespace specific string will consist
of three parts: a prefix, consisting of either a two-letter ISO 3166
country code or another string; a delimiting character (hyphen, colon or
hash sign); and the NBN string assigned by the national library.

Non-ISO 3166 prefixes must be registered. The Library of Congress will
maintain the central register of reserved codes, and make it available
to the national libraries. All two-letter codes are reserved for
existing and possible future ISO country codes and may not be used as
non-ISO prefixes. If several national libraries in one country use the
same prefix (for instance, a country code), they need to agree on how to
split the sub-namespace between them.

Models:
URN:NBN:<ISO 3166 country code>-<assigned NBN string>
URN:NBN:<non-ISO 3166 prefix>-<assigned NBN string>

Examples:
URN:NBN:fi-fe19981001 (A "real" URN assigned by the National Library of
Finland).
URN:NBN:LCCN:2001000168 (A hypothetical LCCN-based URN assigned by the
Library of Congress).
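
As a sketch of how the three-part structure could be taken apart, the
following Python splits an NBN-based URN at the first delimiter. The
names are hypothetical, and the pattern assumes that prefixes are purely
alphanumeric, which this document does not state. Note also that RFC
2141, as quoted in section 3.2, reserves "#", so an unencoded hash sign
delimiter may be problematic in practice.

```python
import re
from typing import NamedTuple

# Prefix, one delimiter (hyphen, colon or hash sign), then the NBN string.
_NBN_URN = re.compile(r"^urn:nbn:([a-z0-9]+)([-:#])(.+)$", re.IGNORECASE)


class NbnUrn(NamedTuple):
    prefix: str
    delimiter: str
    nbn: str

    @property
    def has_country_prefix(self) -> bool:
        # All two-letter codes are reserved for ISO 3166 country codes.
        # This does not check that the code is actually assigned.
        return len(self.prefix) == 2 and self.prefix.isalpha()


def parse_nbn_urn(urn: str) -> NbnUrn:
    match = _NBN_URN.match(urn)
    if match is None:
        raise ValueError(f"not an NBN-based URN: {urn!r}")
    return NbnUrn(*match.groups())


finnish = parse_nbn_urn("URN:NBN:fi-fe19981001")
lccn = parse_nbn_urn("URN:NBN:LCCN:2001000168")
```

The first example yields the prefix "fi" with a hyphen delimiter, the
second the prefix "LCCN" with a colon delimiter.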

3.5 Resolution of NBN-based URNs

As a dumb code, an NBN would be difficult to resolve globally as such.
The prefix part of the URN namespace specific string (usually based on a
country code) will provide a guide to where to find a resolution service
and the NBN register will identify the assigning agency. Once NBN-based
URN resolution is in global usage, the number of prefixes will slowly
become equal to or even slightly larger than the number of national
libraries.

If NBN assignment is limited to the national bibliography database, then
all NBN-based URNs for that country will be resolved there. In one model
these databases contain detailed resource descriptions including URLs,
which will point both to the copy of the document in the Internet and to
the copy in the national library's (legal) deposit collection. Due to
the limitations in the usage of legal deposit documents it is possible
that the deposited electronic materials can not be delivered outside the
premises of the national library.

If it is possible for authors and publishers to retrieve NBNs for Web
documents and there is no obligation to deposit the documents thus
identified with the national library, a URN resolution service is not
possible without a national Web index and archive, maintained by the
national library or by one or more other organisations. The Web
index/archive will also resolve URNs machine-generated for the archived
Web documents.

3.6 Additional considerations

Guidelines adopted by each national library define when different
versions of a work should be assigned the same or differing NBNs. These
rules apply only if identifier assignment is done manually. If
identifiers are allocated programmatically, the only criterion that can
be used is that two documents which are identical on the bit level (have
the same MD5 checksum) are deemed identical and should receive the same
NBN. Two dissimilar documents are very unlikely to share a checksum:
RFC 1321 conjectures that finding two messages with the same MD5 digest
takes on the order of 2^64 operations.
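
To put the chance of an accidental clash into numbers, the sketch below
computes a simple upper bound (a union bound over all pairs of
documents) on the probability that any two of a given number of distinct
documents receive the same 128-bit checksum. It assumes that checksums
behave as if uniformly distributed, and it says nothing about
deliberately constructed collisions. The function name is hypothetical.

```python
from fractions import Fraction


def accidental_collision_bound(documents: int,
                               digest_bits: int = 128) -> Fraction:
    """Upper bound on the chance that any two distinct documents clash."""
    # Each pair clashes with probability 2^-digest_bits; sum over pairs.
    pairs = documents * (documents - 1) // 2
    return Fraction(pairs, 2 ** digest_bits)


# An archive of one billion distinct documents.
bound = float(accidental_collision_bound(10 ** 9))
```

For an archive of one billion documents the bound is below 1e-20, so
accidental assignment of the same NBN to dissimilar documents can be
ignored for practical purposes.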

The rules governing the usage of NBNs are less strict than those
specifying the usage of ISBN or other, better established identifiers.
Since NBNs have up to now been given only by the personnel
(cataloguers) working in the national libraries, the identifier
assignment has in practice been well co-ordinated.

It is obvious that an NBN URN will resolve to a single instance of the
work if identifier assignment has been automatic. Given the nature of
NBNs it is also likely that different versions of the same work will
receive different NBNs even if the identifier is given manually.


4. Security Considerations

This document proposes means of encoding several existing bibliographic
identifiers within the URN framework. This document does not discuss
resolution except at a very generic level; thus questions of secure or
authenticated resolution mechanisms are out of scope.  It does not
address means of validating the integrity or authenticating the source
or provenance of URNs that contain bibliographic identifiers.  Issues
regarding intellectual property rights associated with objects
identified by the various bibliographic identifiers are also beyond the
scope of this document, as are questions about rights to the databases
that might be used to construct resolvers.


5. Namespace registration


URN Namespace ID Registration for the National Bibliography Number (NBN)

Namespace ID:

NBN

This Namespace ID has been in production use in demonstrator systems
since summer 1998; at least hundreds of URNs from this namespace have
been delivered already in Finland and Sweden.

Registration Information:

Version: 2
Date: 2000-02-25
The first registration of the NID "NBN" was done via the URN WG in
November 1998.

Declared registrant of the namespace:

Name: Juha Hakala
E-mail: juha.hakala@helsinki.fi
Affiliation: Helsinki University Library - The National Library of
Finland, Conference of European National Librarians (CENL) and
Conference of Directors of National Libraries (CDNL)
Address: P.O.Box 26, 00014 Helsinki University, Finland

Both CENL and CDNL made decisions to foster the usage of URNs during
1998. Both organisations have set up a working group for this purpose.
One item in the common work plan is utilisation of national bibliography
numbers (NBNs; see below) as URNs for identification of grey literature
published on the Internet. The NBN namespace will enable the national
libraries to do this. The namespace will be available to all national
libraries in the world.

Declaration of syntactic structure:

The namespace specific string will consist of three parts: a prefix,
consisting of either a two-letter ISO 3166 country code or another
string; a delimiting character (hyphen, colon or hash sign); and the NBN
string assigned by the national library. A namespace specific string
must be unique when normalised to omit the delimiter between the prefix
and the string.

Non-ISO prefixes must be registered. A global registry, maintained by
the Library of Congress, will be created and made available via the Web.
Contact information: nbn.register@loc.gov.us. All two-letter codes are
reserved for existing and possible future ISO country codes and may not
be used as non-ISO prefixes.

If several national libraries in one country want to use the same prefix
(for instance, a country code), they need to agree on how to split the
namespace between them into smaller sub-domains. These smaller domains
must be registered if they are resolved on different sites. Similarly, a
single national library may utilise various sub-domains; for instance,
the National Library of Finland already has two domains, fi-fe for
author-assigned URNs and fi-fea for URNs built by the Web harvesters.

Models:

URN:NBN:<ISO 3166 country code>-<assigned NBN string>
URN:NBN:<non-ISO 3166 prefix>-<assigned NBN string>

Examples:

A country-code-based URN: URN:NBN:fi-fe19981001 (A URN assigned by the
National Library of Finland).
A non-country-code-based URN: URN:NBN:LCCN:2001000168 (A hypothetical
URN assigned by the Library of Congress).
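
The uniqueness rule in this declaration (a namespace specific string
must be unique when the delimiter between the prefix and the string is
omitted) can be sketched as follows. The function names are
hypothetical, and the comparison is case-sensitive because no global
lexical equivalence rules are defined.

```python
import re

_DELIMITER = re.compile(r"[-:#]")


def normalise_nss(nss: str) -> str:
    """Drop the first delimiter, i.e. the one after the prefix."""
    return _DELIMITER.sub("", nss, count=1)


def clashes(nss_a: str, nss_b: str) -> bool:
    """True if two different strings normalise to the same value."""
    return nss_a != nss_b and normalise_nss(nss_a) == normalise_nss(nss_b)


normalised = normalise_nss("fi-fe19981001")
conflict = clashes("fi-fe19981001", "fi:fe19981001")
```

Here normalised is "fife19981001" and conflict is True, since the two
strings differ only in the delimiter. A registry could use such a check
to reject a new prefix or string that would collide with an existing
one.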

Relevant ancillary documentation:

National Bibliography Number (NBN) is a generic name referring to a
group of identifier systems used by the national libraries for
identification of deposited publications which lack an identifier, or to
descriptive metadata (cataloguing) that describes the resources. Each
national library uses its own NBN strings independently of other
libraries; there is no global authority which controls them. For this
reason NBNs are unique only on the national level, and the controlled
prefix guarantees uniqueness on the global scale.

NBNs have traditionally been given to documents that do not have a
publisher-assigned identifier, but are catalogued to the national
bibliography. When assigned as URNs, these NBNs will fit into the global
URN resolution services. Some national libraries (Finland, Norway,
Sweden) have established Web-based URN generators, which enable authors
and publishers to fetch NBN-based URNs for their network documents.

Both syntax and scope of NBNs can be decided by each national library
independently. Typically, an NBN consists of one or more letters and a
number.

Identifier uniqueness considerations:

NBN strings assigned by two national libraries may be identical. For
this reason usage of a prefix in the namespace specific string is
obligatory for guaranteeing global uniqueness of NBN-based URNs.

At the national level, libraries utilise different policies for
guaranteeing uniqueness. A national library may automate the delivery of
NBN-based URNs. In this case, the NBNs are assigned sequentially by a
program (URN generator).
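
A minimal sketch of sequential assignment is shown below. The function
name and the URN layout (borrowed from the F-code examples in section
3.3) are assumptions; a production generator would also have to store
its counter persistently so that numbers are never reused after a
restart.

```python
import itertools
from typing import Iterator


def sequential_urns(prefix: str, year: int, start: int,
                    country: str = "fi") -> Iterator[str]:
    """Yield URNs with strictly increasing numbers, so none repeats."""
    for number in itertools.count(start):
        yield f"urn:nbn:{country}-{prefix}{year}{number}"


generator = sequential_urns("fe", 1998, 1001)
first = next(generator)
second = next(generator)
```

Here first is "urn:nbn:fi-fe19981001" and second is
"urn:nbn:fi-fe19981002".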

Identifier persistence considerations:

Persistence of the NBNs as identifiers is guaranteed by the persistence
of national libraries and information systems, such as national
bibliographies, maintained by them. NBNs have been used for several
centuries for printed materials. NBN-based identification of electronic
documents is a recent practice, but it is likely to continue for a very
long time.

Process of identifier assignment:

Assignment of NBN-based URNs is always controlled at the national level
by the national library or libraries. In Europe, the Conference of
European National Librarians (CENL) will co-ordinate the URN practices
in member libraries via a working group established in 1998. At the
global level, the Conference of Directors of National Libraries (CDNL)
established a task force with similar aims in 1999.

National libraries may choose different strategies in assigning
NBN-based URNs. One option is assignment by the library personnel only.
This is typically done when the document is catalogued into the national
bibliography. A national library may also set up one or more URN
generators, and allow publishers and authors to retrieve NBN-based URNs
from there. In this case there is no guarantee that the document will be
catalogued into the national bibliography. Besides the URN generator,
national libraries may develop other applications, such as Web
harvesters/archivers, which utilise URNs for identification purposes.

Process for identifier resolution:

URNs based on NBNs will be primarily resolved via the national
bibliography databases. In one model these databases contain detailed
resource descriptions including URLs, which will point both to the copy
of the document in the Internet and to the copy in the national
library's (legal) deposit collection. Due to the limitations in the
usage of legal deposit documents it is possible that the deposited
materials can not be delivered outside the premises of the national
library.

For those documents not catalogued into the national bibliography
database, URN resolution may take place via national or international
Web indexes and/or archives. Nordic national libraries have established
a joint initiative called Nordic Web Index / Nordic Web Archive
(NWI/NWA), which aims at creating national Web archives and indexes in
all Nordic countries.

As a dumb code, an NBN would be difficult to resolve globally as such.
The prefix part of the URN namespace specific string will provide a
guide to where to find a resolution service and the NBN register will
identify the assigning agency. It will be necessary to establish a DNS
NAPTR resource record for each prefix; the total number of these records
may in the end be about 200. Initially, only a handful of records will
be needed.

Within each record, there will be one or more resolution services
specified, depending on the assignment policy of the national library.
If NBN assignment is limited to the national bibliography database, then
all NBN-based URNs for that country will be resolved there. If it is
possible to retrieve NBNs for Web documents, a full-scale URN resolution
service is not possible without a national Web index and archive.
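
The prefix-based selection of resolution services can be sketched with
an in-memory table. This is not a DNS NAPTR implementation: the table
merely stands in for the per-prefix records, the service URLs are
placeholders under the reserved ".example" domain, and choosing the
longest matching registered prefix is an assumption made here to
separate sub-domains such as fi-fe and fi-fea.

```python
from typing import Dict, List

# Hypothetical stand-in for the per-prefix records: each registered
# prefix lists its resolution services in the order to be tried.
RESOLVERS: Dict[str, List[str]] = {
    "fi-fe": ["https://bibliography.example/resolve"],
    "fi-fea": ["https://webarchive.example/resolve"],
}


def services_for(nss: str) -> List[str]:
    """Return the services of the longest registered prefix of `nss`."""
    candidates = [prefix for prefix in RESOLVERS if nss.startswith(prefix)]
    if not candidates:
        raise LookupError(f"no resolution service registered for {nss!r}")
    return RESOLVERS[max(candidates, key=len)]


catalogue = services_for("fi-fe19981001")
archive = services_for("fi-fea-5c5875e6e49ae649cad63e5ee4f6c346")
```

In this sketch an author-assigned URN is routed to the placeholder
bibliography service and a harvester-built URN to the placeholder Web
archive service.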

Rules for Lexical Equivalence:

None at the global level. Any national library may provide its own
rules, on the basis of its NBN syntax.

Conformance with URN Syntax:

All NBNs we know of are ASCII strings consisting of letters (a-z) and
numbers (0-9). If an NBN contains characters that are reserved in the
URN syntax, this data must be presented in hex encoded form as defined
in RFC 2141. A national library may limit the full scope of its NBN
strings in URN usage in such a way that there are no reserved characters
in the URN namespace specific strings.

Validation mechanism:

None specified on the global level. A national library may use NBNs
that contain a checksum and can therefore be validated, but this is for
the time being not a common practice.

Scope:

Global.


6. References

[Daigle et al.] Daigle, L., van Gulik, D., Iannella, R. and Faltstrom,
P., "URN Namespace Definition Mechanisms", RFC 2611, June 1999.
[Lynch] Lynch, C., Preston, C. and Daniel, R., "Using Existing
Bibliographic Identifiers as Uniform Resource Names", RFC 2288,
February 1998.
[Moats] Moats, R., "URN Syntax", RFC 2141, May 1997.


7. Author's Address

   Juha Hakala
   Helsinki University Library - The National Library of Finland
   P.O. Box 26
   FIN-00014 Helsinki University
   FINLAND

   EMail: juha.hakala@helsinki.fi


8.  Full Copyright Statement

   Copyright (C) The Internet Society (2000).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.