Canonical XML Version 1.0
RFC 3076
Document type: RFC - Informational (March 2001; No errata)
Author: John Boyer
Last updated: 2013-03-02
Stream: IETF
WG state: (None)
Document shepherd: No shepherd assigned
IESG state: RFC 3076 (Informational)
Consensus Boilerplate: Unknown
Responsible AD: (None)
Send notices to: (None)
Network Working Group J. Boyer
Request for Comments: 3076 PureEdge Solutions Inc.
Category: Informational March 2001
Canonical XML Version 1.0
Status of this Memo
This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2001). All Rights Reserved.
Abstract
Any XML (Extensible Markup Language) document is part of a set of XML
documents that are logically equivalent within an application
context, but which vary in physical representation based on syntactic
changes permitted by XML 1.0 and Namespaces in XML. This
specification describes a method for generating a physical
representation, the canonical form, of an XML document that accounts
for the permissible changes. Except for limitations regarding a few
unusual cases, if two documents have the same canonical form, then
the two documents are logically equivalent within the given
application context. Note that two documents may have differing
canonical forms yet still be equivalent in a given context based on
application-specific equivalence rules for which no generalized XML
specification could account.
Boyer Informational [Page 1]
RFC 3076 Canonical XML March 2001
Table of Contents
1. Introduction............................................... 2
1.1 Terminology............................................... 3
1.2 Applications.............................................. 4
1.3 Limitations............................................... 4
2. XML Canonicalization....................................... 6
2.1 Data Model................................................ 6
2.2 Document Order............................................ 10
2.3 Processing Model.......................................... 10
2.4 Document Subsets.......................................... 13
3. Examples of XML Canonicalization........................... 14
3.1 PIs, Comments, and Outside of Document Element............ 14
3.2 Whitespace in Document Content............................ 15
3.3 Start and End Tags........................................ 16
3.4 Character Modifications and Character References.......... 17
3.5 Entity References......................................... 19
3.6 UTF-8 Encoding............................................ 19
3.7 Document Subsets.......................................... 20
4. Resolutions................................................ 21
4.1 No XML Declaration........................................ 21
4.2 No Character Model Normalization.......................... 21
4.3 Handling of Whitespace Outside Document Element........... 22
4.4 No Namespace Prefix Rewriting............................. 22
4.5 Order of Namespace Declarations and Attributes............ 23
4.6 Superfluous Namespace Declarations........................ 23
4.7 Propagation of Default Namespace Declaration in Document
Subsets................................................... 24
4.8 Sorting Attributes by Namespace URI....................... 24
Security Considerations....................................... 24
References.................................................... 25
Author's Address.............................................. 26
Acknowledgements.............................................. 27
Full Copyright Statement...................................... 28
1. Introduction
The XML 1.0 Recommendation [XML] specifies the syntax of a class of
resources called XML documents. The Namespaces in XML Recommendation
[Names] specifies additional syntax and semantics for XML documents.
It is possible for XML documents which are equivalent for the
purposes of many applications to differ in physical representation.
For example, they may differ in their entity structure, attribute
ordering, and character encoding. It is the goal of this
specification to establish a method for determining whether two
documents are identical, or whether an application has not changed a
document, except for transformations permitted by XML 1.0 and
Namespaces.
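As an illustration of this goal, the following minimal sketch compares documents by their canonical forms. It uses the Python standard library function xml.etree.ElementTree.canonicalize, which implements Canonical XML 2.0, a later revision, and not the version 1.0 method described in this memo; it is used here on the assumption that both versions produce the same result for the simple variations shown (attribute order, attribute quoting, whitespace inside start tags, and empty-element tags). The sample documents and variable names are hypothetical.

```python
from xml.etree.ElementTree import canonicalize

# Two physical representations of the same logical document:
# attribute order, quote style, spacing inside the start tag,
# and the empty-element form all differ.
doc_a = '<doc b="2" a="1"><e/></doc>'
doc_b = "<doc  a='1'   b='2'><e></e></doc>"

# A document that differs in an attribute value, not just in syntax.
doc_c = '<doc a="1" b="3"><e/></doc>'

# canonicalize() returns the canonical form as a string when no
# output file is given. NOTE: this is C14N 2.0, not C14N 1.0.
canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)
canon_c = canonicalize(doc_c)

print(canon_a)
print("a and b share a canonical form:", canon_a == canon_b)
print("a and c share a canonical form:", canon_a == canon_c)
```

Comparing canonical forms as plain strings (or hashing them) is what makes the method useful for checking whether an application has changed a document beyond the permitted syntactic variations.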
1.1 Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [Keywords].
See [Names] for the definition of QName.
A document subset is a portion of an XML document indicated by a