Digest Values for DOM (DOMHASH)
RFC 2803
Network Working Group H. Maruyama
Request for Comments: 2803 K. Tamura
Category: Informational N. Uramoto
IBM
April 2000
Status of this Memo
This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2000). All Rights Reserved.
Abstract
This memo gives a clear and unambiguous definition of digest (hash)
values of XML objects, regardless of the surface string variation
of XML. This definition can be used for XML digital signatures as well
as for efficient replication of XML objects.
Table of Contents
1. Introduction
2. Digest Calculation
   2.1. Overview
   2.2. Namespace Considerations
   2.3. Definition with Code Fragments
      2.3.1. Text Nodes
      2.3.2. Processing Instruction Nodes
      2.3.3. Attr Nodes
      2.3.4. Element Nodes
      2.3.5. Document Nodes
3. Discussion
4. Security Considerations
References
Authors' Addresses
Full Copyright Statement
Maruyama, et al. Informational [Page 1]
RFC 2803 Digest Values for DOM (DOMHASH) April 2000
1. Introduction
The purpose of this document is to give a clear and unambiguous
definition of digest (hash) values of XML objects [XML]. Two
subtrees are considered identical if their hash values are the same,
and different if their hash values are different.
There are at least two usage scenarios for DOMHASH. One is as a basis
for digital signatures for XML. Digital signature algorithms normally
require hashing the content to be signed before signing it. DOMHASH
provides a concrete definition of the hash value calculation.
The other is to use DOMHASH when synchronizing two DOM structures
[DOM]. Suppose that a server program generates a DOM structure which
is to be rendered by clients. If the server makes frequent small
changes on a large DOM tree, it is desirable that only the modified
parts are sent over to the client. A client can initiate a request by
sending the root hash value of the structure held in its cache. If
it matches the root hash value of the current server structure,
nothing needs to be sent. If not, then the server compares the client
hash with the older versions in the server's cache. If it finds one
that matches the client's version of the structure, then it locates
differences with the current version by recursively comparing the
hash values of each node. This way, the client can receive only an
updated portion of a large structure without requesting the whole
thing.
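The recursive comparison described above can be sketched as follows.
This is a minimal illustration under stated assumptions, not the
digest calculation defined in Section 2: the Node class, the digest()
method, and the changed_paths() function are hypothetical names, and
the hash is a simple Merkle-style stand-in (SHA-1 over a label and
the child digests).

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical tree node; a stand-in for a DOM node."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def digest(self) -> bytes:
        # Illustrative Merkle-style hash, NOT the DOMHASH definition:
        # hash the label together with the digests of all children.
        h = hashlib.sha1()
        h.update(self.label.encode("utf-8"))
        h.update(b"\x00")
        for child in self.children:
            h.update(child.digest())
        return h.digest()


def changed_paths(old: Node, new: Node,
                  path: Tuple[int, ...] = ()) -> List[Tuple[int, ...]]:
    """Return index paths of the smallest subtrees of `new` to resend."""
    if old.digest() == new.digest():
        return []          # identical subtree: nothing to send
    if old.label != new.label or len(old.children) != len(new.children):
        return [path]      # cannot descend further: resend this subtree
    result: List[Tuple[int, ...]] = []
    for i, (o, n) in enumerate(zip(old.children, new.children)):
        result.extend(changed_paths(o, n, path + (i,)))
    return result


# The client's cached version and the server's current version.
cached = Node("doc", [Node("a", [Node("x")]), Node("b")])
current = Node("doc", [Node("a", [Node("y")]), Node("b")])
print(changed_paths(cached, current))  # prints [(0, 0)]
```

Only the subtree at path (0, 0) differs, so only that portion would
need to be sent. The sketch recomputes digests on every call for
brevity; a real implementation would presumably cache the digest of
each node.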
One way of defining digest values is to take a surface string as the
input for a digest algorithm. However, this approach has several
drawbacks. The same internal DOM structure may be represented in many
different ways as surface strings even if they strictly conform to
the XML specification. The treatment of white space, the selection of
character encodings, the use of entity references (i.e., use of
ampersands), and so on all affect the generation of a surface string.
If the implementations of surface string generation are different,
the hash values would be different, resulting in digital signatures
that cannot be validated and in failure to detect identical DOM
structures.
Therefore, it is desirable that the digest of a DOM be defined in DOM
terms, that is, as an unambiguous algorithm operating on a DOM
tree. This is the approach we take in this specification.
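The drawback of hashing surface strings can be illustrated with a
short sketch. The two strings below are made-up data that differ only
in attribute order, quoting, white space inside the start tag, and
the way the ampersand is escaped. The helper names string_digest()
and dom_summary() are assumptions for illustration, and dom_summary()
is only a rough DOM-level comparison, not the DOMHASH calculation.

```python
import hashlib
from xml.dom import minidom

# Two surface strings for the same logical content (illustrative data).
SURFACE_A = '<doc a="1" b="2">A &amp; B</doc>'
SURFACE_B = "<doc  b='2'  a='1' >A &#38; B</doc>"


def string_digest(xml_text: str) -> str:
    """Digest of the surface string itself (the fragile approach)."""
    return hashlib.sha1(xml_text.encode("utf-8")).hexdigest()


def dom_summary(xml_text: str):
    """Parse the string and describe the root element in DOM terms."""
    root = minidom.parseString(xml_text).documentElement
    attributes = sorted(root.attributes.items())
    text = "".join(
        child.data for child in root.childNodes
        if child.nodeType == child.TEXT_NODE
    )
    return (root.tagName, attributes, text)


# The surface strings hash differently ...
print(string_digest(SURFACE_A) == string_digest(SURFACE_B))  # prints False
# ... although the parsed structures carry the same content.
print(dom_summary(SURFACE_A) == dom_summary(SURFACE_B))      # prints True
```

A digest defined over the parsed structure is not affected by such
variations, which is the motivation for working in DOM terms.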
The introduction of namespaces is another source of variation in
surface strings, because different namespace prefixes can be used to
represent the same namespace URI [URI]. In the following example,
the namespace prefix "edi" is bound to the URI
"http://ecommerce.org/schema", but this prefix can be arbitrarily
chosen without changing the logical contents, as shown in the second
example.