
A Persistent Web IDentifier (PWID) URN Namespace
draft-pwid-urn-specification-03

The information below is for an old version of the document.
Document Type
This is an older version of an Internet-Draft whose latest revision state is "Expired".
Author Eld Maj-Britt Olmuetz Zierau
Last updated 2018-07-16
Internet Engineering Task Force                           E. Zierau, Ed.
Internet-Draft                                      Royal Danish Library
Intended status: Informational                             July 16, 2018
Expires: January 17, 2019

            A Persistent Web IDentifier (PWID) URN Namespace
                    draft-pwid-urn-specification-03

Abstract

   This document specifies a Uniform Resource Name (URN) namespace for
   Persistent Web IDentifiers to web material in web archives, using
   the 'pwid' namespace identifier.  The purpose of the standard is to
   support a general, exact referencing method that covers references
   to archives with restricted access, exact references to existing
   web material, and exact specification of elements in a web corpus
   (possibly spanning several web archives).  The PWID URN therefore
   offers a scheme for making references that are not currently
   supported.

   The PWID is designed for researchers.  It therefore provides
   general, global, sustainable, humanly readable, technology agnostic,
   persistent and precise references to web materials in web archives,
   in a form that makes them potentially resolvable.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 17, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Zierau                  Expires January 17, 2019                [Page 1]
Internet-Draft  A Persistent Web IDentifier (PWID) URN Namespace  July 2018

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   4
   2.  Namespace Registration Template . . . . . . . . . . . . . . .   4
   3.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  15
   4.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  15
     4.1.  Normative References  . . . . . . . . . . . . . . . . . .  15
     4.2.  Informative References  . . . . . . . . . . . . . . . . .  16
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  18

1.  Introduction

   The purpose of the PWID URN is to represent general, global,
   sustainable, humanly readable, technology agnostic, persistent and
   precise web archive resource references in a way that:

   o  can be used for technical solutions, e.g. to make them resolvable

   o  can cover references to all sorts of materials in web archives

   o  can cover references to materials from all sorts of web archives

   The motivation for defining a PWID namespace is the growing challenge
   of references to archived web resources, which the PWID as a URN can
   assist in overcoming.  The standard is needed so that references to
   web materials can reach a level of precision and persistency on par
   with traditional references to analogue material.  Furthermore, it
   is needed in order to address web archive resources that are not
   freely available online.  The PWID URN covers both referencing of web
   resources from research papers and definition of web collections/
   corpora.  In detail the challenges are:

   o  Citation guidelines generally do not cover general and persistent
      referencing techniques for web resources that are not registered
      by Persistent Identifier systems (like DOI [DOI]).  However, an
      increasing number of references point to resources that only exist
      on the web, e.g. blogs that turned out to have a historical
      impact.  In order to obtain persistency for a reference, the
      target needs to be stable.  As the live web is 'alive' and in
      constant change, persistency can only be obtained by referring to
      archived snapshots of the web.  The PWID URN is therefore focused
      on referencing archived web material in a technology agnostic way
      (research documented in [IPRES] and [ResawRef]).

   o  There are many new initiatives for web archive referencing.  Most
      of them are centralised solutions that offer harvesting and
      referencing, but these cannot be used for existing materials in
      web archives.  Other initiatives only cover open web archives;
      they do not cover material in archives with restricted access,
      and they carry a risk of imprecision if resolving a reference
      yields a resource from an alternative archive.  The PWID URN is
      needed in order to fill these gaps where other techniques are
      not sufficient.

   o  There are many different requirements for the construction of
      collection definitions for web material besides precision and
      persistency.  Recent research has found that various legal and
      sustainability issues lead to a need for a collection to be
      defined by references to the web parts in the collection.  The
      PWID URN is needed in such definitions in order to fulfil these
      requirements and to enable a collection to cover web materials
      from several archives (research documented in [ResawColl]).

   The PWID is especially useful for web material where precision is in
   focus and/or where there are references to materials from web
   archives requiring special grants in order to gain access.  The
   precision concerns both precise reference, where there can be no
   doubt that you have the correct web material, and precision about
   what the reference actually refers to (e.g. whether it is the page
   or the whole website).

   Furthermore, the PWID is very useful for specifying the contents of a
   web collection (also known as a web corpus).  Definitions of web
   collections are often needed for extraction of data used in the
   production of research results, e.g. for evaluations in the future.
   Current practices are not persistent, as they often use some CDX
   version, which varies between implementations.

   For the sake of usability and sustainability, the definition of the
   PWID URN is focused on only having the minimum required information
   to make a precise identification of a resource in an arbitrary web
   archive.  Recent research has found that this is obtained by the
   following information [ResawRef]:

   o  Identification of web archive

   o  Identification of source:

      *  Archived URI or identifier

      *  Archival timestamp

   o  Intended coverage (page, part, subsite etc.)

   The PWID URN represents this information in an unambiguous way,
   thus enabling technical solutions to be built on this URN.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Namespace Registration Template

   Namespace Identifier:

      PWID

   Version:

      3

   Date:

      2018-07-13

   Registrant:

      Eld Maj-Britt Olmuetz Zierau
      Royal Danish Library
      Soeren Kierkegaards Plads 1
      1219 Copenhagen
      Denmark
      ph: +45 9132 4690
      email: elzi@kb.dk

   Purpose:

      The purpose of the PWID URN is to represent general, global,
      sustainable, humanly readable, technology agnostic, persistent and
      precise web archive resource references in a way that:

      *  can be used for technical solutions e.g. to make them
         resolvable

      *  can cover references to all sorts of materials in web archives

      *  can cover references to materials from all sorts of web
         archives

      The motivation for defining a PWID namespace is the growing
      challenge of references to archived web resources, which the PWID
      as a URN can assist in overcoming.  The standard is needed so
      that references to web materials can reach a level of precision
      and persistency on par with traditional references to analogue
      material.  Furthermore, it is needed in order to address web
      archive resources that are not freely available online.  This
      concerns both referencing of web resources from research papers
      and definition of web collections/corpora.  In detail the
      challenges are:

      *  Citation guidelines generally do not cover general and
         persistent referencing techniques for web resources that are
         not registered by Persistent Identifier systems (like DOI
         [DOI]).  However, an increasing number of references point to
         resources that only exist on the web, e.g. blogs that turned
         out to have a historical impact.  In order to obtain
         persistency for a reference, the target needs to be stable.
         As the live web is 'alive' and in constant change, persistency
         can only be obtained by referring to archived snapshots of the
         web.  The PWID URN is therefore focused on referencing
         archived web material in a technology agnostic way (research
         documented in [IPRES] and [ResawRef]).

      *  There are many new initiatives for web archive referencing.
         Most of them are centralised solutions that offer harvesting
         and referencing, but these cannot be used for existing
         materials in web archives.  Other initiatives only cover open
         web archives; they do not cover material in archives with
         restricted access, and they carry a risk of imprecision if
         resolving a reference yields a resource from an alternative
         archive.  The PWID URN is needed in order to fill these gaps
         where other techniques are not sufficient.

      *  There are many different requirements for the construction of
         collection definitions for web material besides precision and
         persistency.  Recent research has found that various legal and
         sustainability issues lead to a need for a collection to be
         defined by references to the web parts in the collection.  The
         PWID URN is needed in such definitions in order to fulfil these
         requirements and to enable a collection to cover web materials
         from several archives (research documented in [ResawColl]).

      The PWID is especially useful for web material where precision is
      in focus and/or where there are references to materials from web
      archives requiring special grants in order to gain access.  The
      precision concerns both precise reference, where there can be no
      doubt that you have the correct web material, and precision
      about what the reference actually refers to (e.g. whether it is
      the page or the whole website).

      Furthermore, the PWID is very useful for specifying the contents
      of a web collection (also known as a web corpus).  Definitions of
      web collections are often needed for extraction of data used in
      the production of research results, e.g. for evaluations in the
      future.  Current practices are not persistent, as they often use
      some CDX version, which varies between implementations.

      Strict, unambiguous syntax is needed for the PWID reference in
      order to ensure that it can be used for computational purposes.
      This is relevant for web collection definitions, which will need a
      strict syntax in order to be a basis for automatic extraction.
      Furthermore, readers of research papers today expect to be able
      to access a referenced resource by clicking an actionable URI.
      A similar facility will therefore be expected for references to
      available archived web material, which strict syntax can make
      possible.  Examples of technical solutions that are enabled by
      strict syntax are:

      *  Resolving of references and automatic extraction of a web
         collection defined by PWID URNs [ResawRef] [ResawColl]

      *  Resolving of a PWID reference by resolving services.  As a
         start, there is work on a prototype that can work for the
         Danish web archive data and for open web archives with
         standard patterns for the current technologies.  Different
         implementations for resolving may appear, which may rely on
         different protocols and applications.

      The purpose of the PWID is also to express a web archive reference
      as simply as possible while at the same time meeting requirements
      for sustainability, usability and scope.  Therefore, the PWID URN
      is focused on only having the minimum required information to make
      a precise identification of a resource in an arbitrary web
      archive.  Recent research has found that this is obtained by the
      following information [ResawRef]:

      *  Identification of web archive

      *  Identification of source:

         +  Archived URI or identifier

         +  Archival timestamp

      *  Intended coverage (page, part, subsite etc.)

      The PWID URN represents this information in an unambiguous way,
      thus enabling technical solutions to be built on this URN.

   Syntax:

      The syntax of the PWID URN is specified below in Augmented Backus-
      Naur Form (ABNF) [RFC5234] and it conforms to URN syntax defined
      in RFC 8141 [RFC8141].  The syntax definition of the PWID URN is:

           pwid-urn = "urn" ":" pwid-NID ":" pwid-NSS

           pwid-NID = "pwid"
           pwid-NSS = archive-id ":" archival-time ":" coverage-spec
                               ":" archived-item

           archive-id = 1*( unreserved )

           archival-time = full-date datetime-delim full-pwid-time
           datetime-delim = "T"
           full-pwid-time = time-hour [":"] time-minute
                                     [":"] time-second "Z"

           coverage-spec = "part" / "page" / "subsite" / "site"
                    / "collection" / "recording" / "snapshot"
                    / "other"

           archived-item = URI / archived-item-id
           archived-item-id = 1*( unreserved )

      where

      *  'unreserved' is defined as in RFC 3986 [RFC3986]

      *  'coverage-spec' values are not case sensitive (e.g. "PAGE" /
         "PART" / "PaGe" / ... are valid values as well.)

      *  'archival-time' is a UTC timestamp conforming to the W3C
         profile of ISO 8601 [ISO8601] (also defined in RFC 3339
         [RFC3339]), with a few exceptions.  It has to be a UTC
         timestamp in order to conform with web archiving practices,
         which always use UTC in order to avoid confusion.  The
         'full-date' is defined as in RFC 3339 [RFC3339].  The
         'archival-time' must represent the time specified in the
         archive, and can therefore be specified at any of the levels
         of granularity described in [W3CDTF], in accordance with the
         WARC standard ISO 28500 [ISO28500].

         In line with RFC 3339 [RFC3339] the "T" may alternatively be
         lower case "t".

         'time-hour', 'time-minute' and 'time-second' are defined as in
         RFC 3339 [RFC3339].

         In line with RFC 3339 [RFC3339] the "Z" may alternatively be
         lower case "z".

      *  'URI' is defined as in RFC 3986 [RFC3986]
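
      As an illustration only, the syntax rules above can be
      approximated by a regular expression.  The following Python
      sketch is not part of this specification: the names 'Pwid' and
      'parse_pwid' are hypothetical, only the full timestamp form of
      the ABNF is accepted, and the 'archived-item' is not validated
      against RFC 3986.

```python
import re
from typing import NamedTuple

# The eight coverage-spec values from the syntax definition.
COVERAGE_SPECS = frozenset(
    {"part", "page", "subsite", "site",
     "collection", "recording", "snapshot", "other"}
)

# RFC 3986 'unreserved': ALPHA / DIGIT / "-" / "." / "_" / "~"
_UNRESERVED = r"[A-Za-z0-9._~-]+"
# full-date "T" hour [":"] minute [":"] second "Z"; "t" and "z" allowed.
_ARCHIVAL_TIME = r"\d{4}-\d{2}-\d{2}[Tt]\d{2}:?\d{2}:?\d{2}[Zz]"

_PWID_RE = re.compile(
    rf"(?i:urn:pwid):(?P<archive_id>{_UNRESERVED})"
    rf":(?P<archival_time>{_ARCHIVAL_TIME})"
    r":(?P<coverage_spec>[A-Za-z]+)"
    r":(?P<archived_item>.+)"  # assumption: URI is accepted unchecked
)


class Pwid(NamedTuple):
    """Hypothetical container for the four parts of a PWID URN."""
    archive_id: str
    archival_time: str
    coverage_spec: str
    archived_item: str


def parse_pwid(urn: str) -> Pwid:
    """Split a PWID URN into its parts, or raise ValueError."""
    match = _PWID_RE.fullmatch(urn)
    if match is None:
        raise ValueError(f"not a PWID URN: {urn!r}")
    # coverage-spec values are not case sensitive; normalise to lower case.
    coverage = match["coverage_spec"].lower()
    if coverage not in COVERAGE_SPECS:
        raise ValueError(f"unknown coverage-spec: {coverage!r}")
    # "t" and "z" may be lower case; normalise them to upper case.
    return Pwid(match["archive_id"], match["archival_time"].upper(),
                coverage, match["archived_item"])
```

      Because the timestamp itself contains ":" characters, the sketch
      matches the parts by structure rather than splitting the URN on
      every colon.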

      The 'coverage-spec' defines the type of archived item, making
      precise what is referred to:

      *  part
         the single archived element, e.g. a pdf, a html text, an image

      *  page
         the full context as a page, e.g. a html page with referred
         images

      *  subsite
         the full context as a subsite within its domain, e.g. a
         document represented in a web structure

      *  site
         the full context as a site within its domain

      *  collection
         a collection/corpus definition, e.g. defined as described in
         [ResawColl]

      *  snapshot
         a snapshot (image) representation of web material, e.g. a web
         page

      *  recording
         a recording of a web browsing

      *  other
         if something else

   Assignment:

      PWID URNs do not have to be assigned by an authority, as they are
      based on information created at the time of archiving:

      *  Identification of web archive

      *  Identification of source:

         +  Archived URI or identifier

         +  Archival timestamp

      The rest of the PWID URN

      *  Intended coverage (page, part, subsite etc.)

      specifies what the user of the PWID URN wants to focus on, and
      may later be used to decide how a resource is displayed.
      However, it is not part of the actual location of the resource.

      In other words: PWID URNs are created independently, but
      following an algorithm that itself guarantees uniqueness.
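
      A minimal sketch of such independent construction is given
      below, assuming the archive identifier, archival time, coverage
      and archived URI are already known from the archive.  The
      function name 'make_pwid' and its parameters are hypothetical and
      not defined by this document.

```python
from datetime import datetime, timezone

COVERAGE_SPECS = frozenset(
    {"part", "page", "subsite", "site",
     "collection", "recording", "snapshot", "other"}
)


def make_pwid(archive_id: str, archived_at: datetime,
              coverage_spec: str, archived_item: str) -> str:
    """Assemble a PWID URN from information recorded by the archive."""
    if archived_at.tzinfo is None:
        # A naive datetime cannot be converted to UTC reliably.
        raise ValueError("archived_at must be timezone-aware")
    coverage = coverage_spec.lower()
    if coverage not in COVERAGE_SPECS:
        raise ValueError(f"unknown coverage-spec: {coverage_spec!r}")
    # archival-time is always expressed in UTC with a trailing "Z".
    timestamp = archived_at.astimezone(timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%SZ")
    return f"urn:pwid:{archive_id}:{timestamp}:{coverage}:{archived_item}"


pwid = make_pwid(
    "archive.org",
    datetime(2016, 1, 22, 11, 20, 29, tzinfo=timezone.utc),
    "page",
    "http://www.dr.dk",
)
```

      No registry lookup or authority is involved: two parties that
      hold the same archive information produce the same URN.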

      In this version of the standard, it is recommended to use the web
      domain as the identifier for the web archive.  This is
      recommended since it currently implicitly provides information
      about the web archive.  Furthermore, it is more precise than e.g.
      the name of the archive, since there may be more than one
      installation of web archives in the same organisation, e.g.
      archive.org and archive-it.org are both covered by the Internet
      Archive.

      Currently, there is also a prototype of a SOLR-Wayback tool
      (source at https://github.com/netarchivesuite/solrwayback)
      [PWIDprovider], which can assist in finding the most precise
      reference to an archived web page by providing all PWIDs
      belonging to it.  For example, the web page in the archive
      netarkivet.dk with the archived URI

         http://www.susanlegetoej.dk/shop/handskedyr-siameser-killing-8681p.html

      and the archiving time 2008-11-29 01:19:16 UTC has the parts:

         urn:pwid:netarkivet.dk:2008-11-29T00:41:42Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_Master_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:39:47Z:part:http://www.susanlegetoej.dk/shop/css/print.css

         urn:pwid:netarkivet.dk:2008-11-29T00:40:06Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_Basket_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:40:00Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_TopMenu_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:40:00Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_SearchPage_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:40:35Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_Productmenu_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:40:22Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_SpaceTop_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:40:24Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_SpaceLeft_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:40:23Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_SpaceBottom_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:40:25Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_SpaceRight_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:37:23Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_ProductInfo_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:37:24Z:part:http://www.susanlegetoej.dk/Shop/js/Variants.js

         urn:pwid:netarkivet.dk:2009-03-03T11:53:00Z:part:http://www.susanlegetoej.dk/Shop/js/Media.js

         urn:pwid:netarkivet.dk:2009-03-03T11:53:02Z:part:http://www.susanlegetoej.dk/images/design/print.gif

         urn:pwid:netarkivet.dk:2009-03-03T11:54:19Z:part:http://www.susanlegetoej.dk/Shop/js/Scroll.js

         urn:pwid:netarkivet.dk:2009-03-03T11:54:09Z:part:http://www.susanlegetoej.dk/Shop/js/Shop5Common.js

         urn:pwid:netarkivet.dk:2006-11-20T20:16:03Z:part:http://www.susanlegetoej.dk/images/602551.jpg

      In the long term, a registry should be created that keeps track
      of identifiers of archives over time, since archives are likely
      to change names, merge etc. when talking about a 100 year period.

   Security and Privacy:

      Security and privacy considerations are restricted to accessible
      web resources in web archives.  If resolvers for PWID URNs are
      created, an analysis should be made of whether they can be
      restricted to the aforementioned registry of web archives.
      Security and privacy will then be a matter of the security and
      privacy considerations related to the web archive resources.

   Interoperability:

      This is covered by comments in the Syntax description:

      *  the PWID URN conforms to the URI standard defined in RFC 3986
         [RFC3986] and the URN standard RFC 8141 [RFC8141]

      *  the 'archival-time' of the PWID URN conforms to the W3C
         profile of ISO 8601 [ISO8601] (also defined in RFC 3339
         [RFC3339]) and to the WARC standard ISO 28500 [ISO28500],
         using UTC dates only

      *  the 'archived-item' is a URI which conforms to the URI standard
         defined in RFC 3986 [RFC3986]

   Resolution:

      The information in a PWID URN can be used for locating a web
      archive resource, for any kind of web archive.  It includes the
      minimum information for web archive materials, which enables
      resolvability, manually or by a resolver.  Resolution of a PWID
      URN is the primary motivation for making a formal URN definition,
      instead of just a textual representation of the needed parts of
      a PWID.

      A resolving service is currently available in the form of code
      for a prototype that runs at the Royal Danish Library
      [PWIDresolver]; it is planned to be made more broadly available,
      and it can be installed locally.  This service currently covers
      both the Danish web archives (with the proper rights) and open
      web archives with access services based on a pattern including
      archive, archival time and archived URI.  In other words, for
      open web archives it covers conversion of PWIDs for: archive.org,
      archive-it.org, arquivo.pt, bibalex.org, nationalarchives.gov.uk,
      stanford.edu and vefsafn.is.  The source code for this prototype
      is available from
      https://github.com/netarchivesuite/NAS-research/releases/tag/0.0.6.

      Resolution (manually or automatically) is done based on the PWID
      parts:

      *  Web archive identification
         to find the archive holding the material

      *  Archived URI or identifier of item
         as part of identifying the material

      *  Date and time associated with the archived URI/item
         as part of precise identification of the material

      *  Coverage of what is referred
         as part of clarification of what the referred material covers
         (page, part etc.)

      In the following, the different resolution techniques are
      explained (manual as well as via a service).  As an example, the
      PWID URN

         urn:pwid:archive.org:2016-01-22T11:20:29Z:page:http://www.dr.dk

      has the information:

      *  archive.org
         currently known identifier, in the form of the Internet
         Archive domain name for their open access web archive

      *  2016-01-22T11:20:29Z
         UTC date and time associated with the archived URI

      *  page
         clarification that the reference covers the full web page with
         all its inherited parts selected by the web archive

      *  http://www.dr.dk
         archived URI of item

      With knowledge of the current (2017) Internet Archive open access
      web interface having the form:

         https://web.archive.org/web/<time>/<uri>

      we can manually (or technically) deduce an actual (current 2017)
      https access address:

         https://web.archive.org/web/20160122112029/http://www.dr.dk

      and interpret the referred web part in the way that the coverage
      specifies, i.e. for a web page the value 'part' would mean the
      html of the web page, 'page' would mean the result of the web
      archive rendering the referred html as a web page, etc.

      The same recipe can be used for other Wayback platforms, and
      possibly also for other web archive access platforms, as the
      crucial information is the date and URI, which are requested to
      be looked up in a specified archive.
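
      The manual recipe above can be sketched in Python as follows.
      This is an illustration under stated assumptions, not the
      prototype resolver: the function name 'pwid_to_wayback_url' and
      the 'wayback_base' parameter are hypothetical, and the archive
      identifier in the URN is not used to select the base address.

```python
import re

# Captures the date and time digits separately so that they can be
# concatenated into the 14-digit Wayback timestamp.
_PWID_RE = re.compile(
    r"(?i:urn:pwid):([^:]+)"
    r":(\d{4})-(\d{2})-(\d{2})[Tt](\d{2}):?(\d{2}):?(\d{2})[Zz]"
    r":([A-Za-z]+):(.+)"
)


def pwid_to_wayback_url(
        urn: str,
        wayback_base: str = "https://web.archive.org/web") -> str:
    """Build a Wayback-style address <base>/<time>/<uri> from a PWID."""
    match = _PWID_RE.fullmatch(urn)
    if match is None:
        raise ValueError(f"not a PWID URN: {urn!r}")
    (_archive_id, year, month, day,
     hour, minute, second, _coverage, item) = match.groups()
    timestamp14 = year + month + day + hour + minute + second
    return f"{wayback_base}/{timestamp14}/{item}"


url = pwid_to_wayback_url(
    "urn:pwid:archive.org:2016-01-22T11:20:29Z:page:http://www.dr.dk")
```

      For another Wayback installation, a caller would pass that
      installation's base address as 'wayback_base'.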

      Note that this also includes access to archives that are only
      accessible via a local proxy to a restricted environment (which
      the current prototype does for references to the Danish
      Netarkivet).  Here the difference is that the archive information
      is used to identify the local environment used (possibly on-site)
      and then to construct a local http/https address based on
      knowledge from the local access installation.

      Automatic access to a referenced web resource may work on the open
      net for open web archives or in restricted environments for web
      archives with restricted access.  There may be a need for varied
      operation depending on the available technology and applications,
      e.g.:

      *  Via locally installed browser plug-ins or applications forming
         http/https URIs:

         +  http/https URIs for standard web archive interfaces
            At this stage there are initiatives on streamlined and
            standardized APIs to web archive interfaces.  If such APIs
            are implemented generally, they may be used for resolving
            PWID URNs.  This could be of the form (denoting PWID parts
            in <> using syntax names):

               https://<archive-id>/pwid?time=<archival-time>&coverage=<coverage-spec>&item=<archived-item>

            The example from the previous section would then resolve to

               https://archive.org/pwid?time=2016-01-22T11:20:29Z&coverage=page&item=http://www.dr.dk

         +  http/https URIs for archive material for individual web
            archives
            Using the current open access http/https address pattern for
            the individual web archives, which for the example is

               https://web.archive.org/web/20160122112029/http://www.dr.dk

            This would require a registry of the different patterns for
            the individual web archives.

      *  Via web research infrastructures
         This is a future solution scenario, as a web archive research
         infrastructure does not yet exist.  However, it is a likely
         future scenario, as it is currently being proposed in the RESAW
         community [RESAW].  PWID URN resolving could in such cases be
         a question of starting a special application, as for the
         'mailto' scheme RFC 6068 [RFC6068].
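
      The two http/https options above can be sketched as follows.
      Both parts are assumptions made for illustration: no registry of
      access patterns exists yet, so 'ACCESS_PATTERNS' is a
      hypothetical in-memory stand-in holding only the archive.org
      pattern quoted in this document, and the '/pwid' interface is the
      possible future form described above, not a deployed API.

```python
import re

_PWID_RE = re.compile(
    r"(?i:urn:pwid):(?P<archive_id>[^:]+)"
    r":(?P<time>\d{4}-\d{2}-\d{2}[Tt]\d{2}:?\d{2}:?\d{2}[Zz])"
    r":(?P<coverage>[A-Za-z]+):(?P<item>.+)"
)

# Hypothetical registry: archive-id -> access address pattern.
ACCESS_PATTERNS = {
    "archive.org": "https://web.archive.org/web/{timestamp14}/{item}",
}


def _match(urn: str) -> re.Match:
    match = _PWID_RE.fullmatch(urn)
    if match is None:
        raise ValueError(f"not a PWID URN: {urn!r}")
    return match


def resolve_with_registry(urn: str, registry=ACCESS_PATTERNS) -> str:
    """Resolve via the per-archive pattern registered for archive-id."""
    match = _match(urn)
    pattern = registry.get(match["archive_id"])
    if pattern is None:
        raise LookupError(f"no pattern for {match['archive_id']!r}")
    # Keep only the digits of the timestamp: YYYYMMDDhhmmss.
    timestamp14 = re.sub(r"\D", "", match["time"])
    return pattern.format(timestamp14=timestamp14, item=match["item"])


def standard_api_url(urn: str) -> str:
    """Form the address of the possible standardized '/pwid' interface.

    The item is inserted unencoded to mirror the example in the text;
    a real interface would probably require percent-encoding.
    """
    match = _match(urn)
    return (f"https://{match['archive_id']}/pwid"
            f"?time={match['time'].upper()}"
            f"&coverage={match['coverage'].lower()}"
            f"&item={match['item']}")
```

      A resolver could try the registry first and fall back to the
      standardized interface for archives that offer it, but that
      policy is a design choice outside the scope of this sketch.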

   Documentation:

      None relevant

   Additional Information:

      The PWID was originally suggested as a URI based on research
      between a computer science researcher with knowledge of web
      archiving and researchers from humanities subjects (History and
      Literature).  This resulted in the paper "Persistent Web
      References - Best Practices and New Suggestions" [IPRES] from the
      iPres 2016 conference.  In this paper the PWID is referred to as
      WPID.  However, one piece of feedback has been a concern that
      WPID was interpreted as a PID related to a PID system, e.g. the
      DOI.  Although PID does not have a precise definition that makes
      it wrong to call it a "WPID", there is a danger that it is
      confused with PID systems, which is not the intention.
      Consequently, this suggestion uses the name PWID instead.

      The comments on the drafted PWID URI ([DraftPwidUri]) have been
      that it seems to be a URN rather than a URI.  This is the reason
      why it is now suggested as a URN, although there is a danger that
      users of the reference style can be confused by the additional
      "urn:" prefix.

      At the RESAW 2017 conference there were two related papers: one on
      referencing practices [ResawRef] and one on research data
      management practices [ResawColl].  This practice is also planned
      to be used for Danish web collections.

      Interest in the PWID has already been shown.  There was
      considerable response at iPRES, and especially at the RESAW 2017
      conference, where web researchers from digital humanities
      expressed strong interest in the PWID, since it can fill a gap and
      make it possible for them to make all the references they need to
      make.  Therefore, the ambition is to make the PWID URN namespace
      definition a constituent part of a standard being developed in the
      IETF or some other recognized standards body.  A textual form of
      the PWID is also suggested in a draft of the revision of the
      ISO 690 reference standard.

   Revision Information:

      This is the third version of the PWID as a URN, in which
      prototypes for resolving a PWID and for getting PWIDs for a web
      page have been added and explained.  Furthermore, it has been made
      clearer where the PWID URN makes a difference, and "closed
      archives" has been changed to "archives with restricted access".

3.  Acknowledgements

   Special thanks to Caroline Nyvang and Thomas Kromann, who have
   contributed to the research identifying the minimum information
   required in a persistent web reference, and to Bolette Jurik, who
   contributed supplementary research concerning requirements for
   web collection/corpora definitions.  Thanks also to all who have
   contributed to this work through research and by reviewing this
   RFC.

4.  References

4.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
              Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002,
              <https://www.rfc-editor.org/info/rfc3339>.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, DOI 10.17487/RFC3986, January 2005,
              <https://www.rfc-editor.org/info/rfc3986>.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008,
              <https://www.rfc-editor.org/info/rfc5234>.

   [RFC8141]  Saint-Andre, P. and J. Klensin, "Uniform Resource Names
              (URNs)", RFC 8141, DOI 10.17487/RFC8141, April 2017,
              <https://www.rfc-editor.org/info/rfc8141>.

4.2.  Informative References

   [DOI]      International DOI Foundation, "The DOI System", 2016,
              <https://web.archive.org/web/20161020222635/
              https://www.doi.org/>.

              urn:pwid:archive.org:2016-10-20T22:26:35:site:https://www.
              doi.org/

   [DraftPwidUri]
              Zierau, E., "DRAFT: Scheme Specification for the pwid URI,
              version 4", June 2018, <https://datatracker.ietf.org/doc/
              draft-pwid-uri-specification/>.

   [IPRES]    Zierau, E., Nyvang, C., and T. Kromann, "Persistent Web
              References - Best Practices and New Suggestions", October
              2016, <http://www.ipres2016.ch/frontend/organizers/media/
              iPRES2016/_PDF/
              IPR16.Proceedings_4_Web_Broschuere_Link.pdf>.

              In: proceedings of the 13th International Conference on
              Preservation of Digital Objects (iPres) 2016, pp. 237-246

   [ISO28500]
              International Organization for Standardization,
              "Information and documentation -- WARC file format", 2017,
              <https://www.iso.org/standard/68004.html>.

   [ISO8601]  International Organization for Standardization, "Data
              elements and interchange formats -- Information
              interchange -- Representation of dates and times", 2004,
              <https://www.iso.org/standard/40874.html>.

   [PWIDprovider]
              Royal Danish Library (Netarkivet), "SolrWayback 3.1",
              2018, <https://github.com/netarchivesuite/solrwayback>.

              urn:pwid:archive.org:2018-06-
              11T02:00:05Z:page:https://github.com/netarchivesuite/
              solrwayback

   [PWIDresolver]
              Royal Danish Library (Netarkivet), "Date and Time Formats:
              note submitted to the W3C. 15 September 1997", 2018,
              <https://github.com/netarchivesuite/NAS-research/releases/
              tag/0.0.6>.

              urn:pwid:archive.org:2018-07-
              16T06:53:51Z:page:https://github.com/netarchivesuite/NAS-
              research/releases/tag/0.0.6

   [RESAW]    The Resaw Community, "A Research infrastructure for the
              Study of Archived Web materials", 2017,
              <https://web.archive.org/web/20170529113150/
              http://resaw.eu/>.

              urn:pwid:archive.org:2017-05-29T11:31:50Z:site:http://
              resaw.eu/

   [ResawColl]
              Jurik, B. and E. Zierau, "Data Management of Web archive
              Research Data", 2017,
              <https://archivedweb.blogs.sas.ac.uk/files/2017/06/
              RESAW2017-JurikZierau-
              Data_management_of_web_archive_research_data.pdf>.

              In: proceedings of the RESAW 2017 Conference, DOI:
              10.14296/resaw.0002

   [ResawRef]
              Nyvang, C., Kromann, T., and E. Zierau, "Capturing the Web
              at Large - a Critique of Current Web Referencing
              Practices", 2017,
              <https://archivedweb.blogs.sas.ac.uk/files/2017/06/
              RESAW2017-NyvangKromannZierau-
              Capturing_the_web_at_large.pdf>.

              In: proceedings of the RESAW 2017 Conference, DOI:
              10.14296/resaw.0004

   [RFC6068]  Duerst, M., Masinter, L., and J. Zawinski, "The 'mailto'
              URI Scheme", RFC 6068, DOI 10.17487/RFC6068, October 2010,
              <https://www.rfc-editor.org/info/rfc6068>.

   [W3CDTF]   W3C, "Date and Time Formats: note submitted to the W3C. 15
              September 1997", 1997,
              <http://www.w3.org/TR/NOTE-datetime>.

              W3C profile of ISO 8601

              urn:pwid:archive.org:2017-04-03T03:37:42Z:page:http://
              www.w3.org/TR/NOTE-datetime

Author's Address

   Eld Maj-Britt Olmuetz Zierau (editor)
   Royal Danish Library
   Soeren Kierkegaards Plads 1
   Copenhagen  1219
   Denmark

   Phone: +45 9132 4690
   Email: elzi@kb.dk
