Representing Label Generation Rulesets using XML
draft-davies-idntables-07
This is an older version of an Internet-Draft whose latest revision state is "Replaced".

Authors:                Kim Davies, Asmus Freytag
Last updated:           2014-03-26
Replaced by:            draft-ietf-lager-specification, RFC 7940
RFC stream:             (None)
Stream state:           (No stream defined)
Consensus boilerplate:  Unknown
RFC Editor Note:        (None)
IESG state:             I-D Exists
Telechat date:          (None)
Responsible AD:         (None)
Send notices to:        (None)
   >
       <date>2010-01-01</date>
       <language>sv</language>
       <domain>example</domain>
       <validity-start>2010-01-01</validity-start>
       <validity-end>2013-12-31</validity-end>
       <description type="text/html">
           <![CDATA[
           This language table was developed with the
           <a href="http://swedish.example/">Swedish examples institute</a>.
           ]]>
       </description>
       <unicode-version>6.3.0</unicode-version>
       <references>

Davies & Freytag        Expires September 27, 2014             [Page 46]

Internet-Draft      Label Generation Rulesets in XML          March 2014

           <reference id="0" comment="the most recent">The Unicode Standard 6.2</reference>
           <reference id="1">RFC 5892</reference>
           <reference id="2">Big-5: Computer Chinese Glyph and Character Code Mapping Table, Technical Report C-26, 1984</reference>
       </references>
   </meta>

   <!-- the data section describing the repertoire -->
   <data>
       <!-- single code point "char" element -->
       <char cp="002D" ref="1" comment="HYPHEN" />

       <!-- range elements for contiguous code points, with tags -->
       <range first-cp="0030" last-cp="0039" ref="1" tag="digit" />
       <range first-cp="0061" last-cp="007A" ref="1" tag="letter" />

       <!-- code point sequence -->
       <char cp="006C 00B7 006C" comment="catalan middle dot" />

       <!-- alternatively use a when rule -->
       <char cp="00B7" when="catalan-middle-dot" />

       <!-- code point with context rule -->
       <char cp="200D" when="joiner" ref="2" />

       <!-- code points with variants -->
       <char cp="4E16" tag="preferred" ref="0">
           <var cp="4E17" disp="block" ref="2" />
           <var cp="534B" disp="allocate" ref="2" />
       </char>
       <char cp="4E17" ref="0">
           <var cp="4E16" disp="allocate" ref="2" />
           <var cp="534B" disp="allocate" ref="2" />
       </char>
       <char cp="534B" ref="0">
           <var cp="4E16" disp="allocate" ref="2" />
           <var cp="4E17" disp="block" ref="2" />
       </char>
   </data>

   <!-- Context and whole label rules -->
   <rules>
       <!-- Require the given code point to be between two 006C -->
       <rule name="catalan-middle-dot" ref="0">
           <look-behind>
               <char cp="006C" />
           </look-behind>
           <anchor />
           <look-ahead>
               <char cp="006C" />
           </look-ahead>
       </rule>

       <!-- example of a context rule based on property -->
       <class name="virama" property="ccc:9" />
       <rule name="joiner" ref="1">
           <look-behind>
               <class by-ref="virama" />
           </look-behind>
           <anchor />
       </rule>

       <!-- example of using set operators -->
       <!-- Subtract vowels from letters to get consonants, demonstrating
            the different set notations and the difference operator -->
       <difference name="consonants">
           <!-- use standard notation -->
           <class comment="all letters">
               <char cp="0061" />
               <range first-cp="0062" last-cp="007A" />
           </class>
           <!-- use shorthand notation -->
           <class comment="all vowels">
               0061 0065 0069 006F 0075-0075
           </class>
       </difference>

       <!-- by using start and end, the rule matches the whole label -->
       <rule name="three-or-more-consonants">
           <start />
           <!-- reference the class defined by the difference and
                require three or more matches -->
           <class by-ref="consonants" count="3+" />
           <end />
       </rule>

       <!-- rule for negative matching -->
       <rule name="non-preferred"
             comment="matches any non-preferred code point">
           <complement comment="non-preferred">
               <class from-tag="preferred" />
           </complement>
       </rule>

       <!-- actions triggered by matching rules and/or
            variant dispositions -->
       <action disp="invalid" match="three-or-more-consonants" />
       <action disp="block" any-variant="block" />
       <action disp="activate" all-variants="allocate"
               not-match="non-preferred" />
   </rules>
   </lgr>

Appendix B. How to Translate RFC 3743 based Tables into the XML Format

   As background, the [RFC3743] rules work as follows:

   1.  The Original (requested) label is checked to make sure that all of its code points are in the repertoire.

   2.
       If it passes the check, the Original label is allocatable.

   3.  Generate the all-simplified and all-traditional variant labels (that is, all the labels generated by replacing each code point with its simplified variant(s), and likewise with its traditional variant(s)) for allocation.

   To illustrate by example, here is one of the more complicated sets of variants:

      U+4E7E U+4E81 U+5E72 U+5E79 U+69A6 U+6F27

   The following shows the relevant section of the Chinese language table published by the .ASIA registry [ASIA-TABLE].  Its entries read:

      <codepoint>;<simpl-variant(s)>;<trad-variant(s)>;<other-variant(s)>

   These are the lines corresponding to the set of variants listed above:

      U+4E7E;U+4E7E,U+5E72;U+4E7E;U+4E81,U+5E72,U+6F27,U+5E79,U+69A6
      U+4E81;U+5E72;U+4E7E;U+5E72,U+6F27,U+5E79,U+69A6
      U+5E72;U+5E72;U+5E72,U+4E7E,U+5E79;U+4E7E,U+4E81,U+69A6,U+6F27
      U+5E79;U+5E72;U+5E79;U+69A6,U+4E7E,U+4E81,U+6F27
      U+69A6;U+5E72;U+69A6;U+5E79,U+4E7E,U+4E81,U+6F27
      U+6F27;U+4E7E;U+6F27;U+4E81,U+5E72,U+5E79,U+69A6

   The corresponding data section in the XML format would look like this:

   <data>
       <char cp="4E7E">
           <var cp="4E7E" disp="both" comment="identity" />
           <var cp="4E81" disp="block" />
           <var cp="5E72" disp="simp" />
           <var cp="5E79" disp="block" />
           <var cp="69A6" disp="block" />
           <var cp="6F27" disp="block" />
       </char>
       <char cp="4E81">
           <var cp="4E7E" disp="trad" />
           <var cp="5E72" disp="simp" />
           <var cp="5E79" disp="block" />
           <var cp="69A6" disp="block" />
           <var cp="6F27" disp="block" />
       </char>
       <char cp="5E72">
           <var cp="4E7E" disp="trad"/>
           <var cp="4E81" disp="block"/>
           <var cp="5E72" disp="both" comment="identity"/>
           <var cp="5E79" disp="trad"/>
           <var cp="69A6" disp="block"/>
           <var cp="6F27" disp="block"/>
       </char>
       <char cp="5E79">
           <var cp="4E7E" disp="block"/>
           <var cp="4E81" disp="block"/>
           <var cp="5E72" disp="simp"/>
           <var cp="5E79" disp="trad" comment="identity"/>
           <var cp="69A6" disp="block"/>
           <var cp="6F27" disp="block"/>
       </char>
       <char cp="69A6">
           <var cp="4E7E" disp="block"/>
           <var cp="4E81" disp="block"/>
           <var cp="5E72" disp="simp"/>
           <var cp="5E79" disp="block"/>
           <var cp="69A6" disp="trad" comment="identity"/>
           <var cp="6F27" disp="block"/>
       </char>
       <char cp="6F27">
           <var cp="4E7E" disp="simp"/>
           <var cp="4E81" disp="block"/>
           <var cp="5E72" disp="block"/>
           <var cp="5E79" disp="block"/>
           <var cp="69A6" disp="block"/>
           <var cp="6F27" disp="trad" comment="identity"/>
       </char>
   </data>

   Here the simplified variants have been given a disposition of "simp", the traditional variants one of "trad", and all other ones "block".

   Note that some variant mappings map to themselves (identity); that is, the mapping is reflexive (see Section 4.2.5).  In creating the permutation of all variant labels, these mappings have no effect, other than adding a value to the variant disposition list for the variant label containing them.

   Because some variant mappings appear in more than one column, while the XML format allows only a single disposition value, they have been given the disposition "both".  In the example so far, all of these are also mappings where source and target are identical, that is, reflexive mappings as defined in Section 4.2.5.

   Given a label "U+4E7E U+4E81", the following labels would be ruled allocatable under [RFC3743], based on how that standard is commonly implemented in domain registries:

      Original label:     U+4E7E U+4E81
      Simplified label 1: U+4E7E U+5E72
      Simplified label 2: U+5E72 U+5E72
      Traditional label:  U+4E7E U+4E7E

   However, if we generated allocatable labels without regard to the simplified-to-traditional variants, we would end up with an extra allocatable label: "U+5E72 U+4E7E".  That label combines a simplified Chinese (SC) character with a traditional Chinese (TC) character and should not be allocatable, but it would be the result of a straight permutation of all variants with a disposition other than disp="block".
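   As a minimal sketch of the problem just described, the following Python fragment permutes only the non-"block" variant mappings of U+4E7E and U+4E81, transcribed from the data section above, and compares the result with the RFC 3743 variant labels listed earlier.  The names NON_BLOCKED, RFC3743_VARIANTS and straight_permutation are hypothetical and chosen for illustration; the original code point U+4E81, which has no reflexive mapping, is deliberately left out of the permutation, since the original label is allocatable on its own.

```python
from itertools import product

# Non-"block" variant mappings (target -> disp) of the two code points in
# the example label, transcribed from the <data> section above.
NON_BLOCKED = {
    "4E7E": {"4E7E": "both", "5E72": "simp"},
    "4E81": {"4E7E": "trad", "5E72": "simp"},
}

# Variant labels that RFC 3743, as commonly implemented, allocates in
# addition to the original label (from the list in the text).
RFC3743_VARIANTS = {"4E7E 5E72", "5E72 5E72", "4E7E 4E7E"}


def straight_permutation(label: str) -> set[str]:
    """Every combination of non-blocked variant mappings, one per position."""
    positions = [NON_BLOCKED[cp] for cp in label.split()]
    # Iterating a dict yields its keys, i.e. the variant targets.
    return {" ".join(combo) for combo in product(*positions)}


extra = straight_permutation("4E7E 4E81") - RFC3743_VARIANTS
print(extra)  # {'5E72 4E7E'}
```

   The set difference leaves exactly the mixed SC/TC label, which is why dispositions alone are not enough and explicit actions are needed.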
   Fully resolving the dispositions requires several actions, defined as described in Section 6.2.2.  After blocking all labels that contain a variant with disposition "block", these actions first allocate all labels that consist entirely of variants (including variants with reflexive mappings) that are "simp" or "both", and then do likewise for labels that are entirely "trad" or "both".  All surviving labels containing either of the dispositions "simp" or "trad" are now known to be undesirable mixed simplified/traditional labels and are blocked.  Finally, the remaining labels must consist of code points without variants or with reflexive variants of type "both"; in other words, they are the original label.

   <rules>
       <!--Action elements - order defines precedence-->
       <action disp="block" any-variant="block"
           comment="filter out by blocked code point" />
       <action disp="allocate" only-variants="simp both"
           comment="only allocate if simplified variant, including reflexive (identity) mapping" />
       <action disp="allocate" only-variants="trad both"
           comment="only allocate if traditional variant, including reflexive (identity) mapping" />
       <action disp="block" any-variant="simp trad"
           comment="filter out any remaining variant code point" />
       <action disp="activate"
           comment="surviving labels must be original labels" />
   </rules>

   In the example above, variants with the disposition "both" occur only as part of identity mappings (as pointed out in the comments).  The scheme described so far relies on the assumption that this is always the case.
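   The following Python sketch shows one way the five actions above could be applied to every variant label of "U+4E7E U+4E81".  It rests on assumptions that are not spelled out here and are labeled in the code: the first action whose condition holds determines the disposition; "any-variant" triggers if any variant mapping used has a listed disposition; "only-variants" triggers only if every code point is replaced through a variant mapping (reflexive mappings included) whose disposition is listed, so an original code point without a mapping prevents a match; and a code point without a reflexive mapping also contributes itself, with no disposition, to the permutation.  All identifiers are hypothetical.

```python
from itertools import product

# Variant mappings (target -> disp) from the <data> section above, limited
# to the two code points of the example label.
VARIANTS = {
    "4E7E": {"4E7E": "both", "4E81": "block", "5E72": "simp",
             "5E79": "block", "69A6": "block", "6F27": "block"},
    "4E81": {"4E7E": "trad", "5E72": "simp", "5E79": "block",
             "69A6": "block", "6F27": "block"},
}

# The <action> elements above in document order: (disp, condition, values).
ACTIONS = [
    ("block", "any-variant", {"block"}),
    ("allocate", "only-variants", {"simp", "both"}),
    ("allocate", "only-variants", {"trad", "both"}),
    ("block", "any-variant", {"simp", "trad"}),
    ("activate", None, set()),
]


def choices(cp):
    """Replacement options for one position as (code point, disp) pairs."""
    mapping = VARIANTS.get(cp, {})
    options = list(mapping.items())
    if cp not in mapping:
        # Assumed: without a reflexive mapping, the original code point is
        # also an option and carries no disposition.
        options.append((cp, None))
    return options


def triggers(condition, values, disps):
    """Assumed trigger semantics; disps holds None for original code points."""
    if condition is None:
        return True
    if condition == "any-variant":
        return any(d in values for d in disps if d is not None)
    if condition == "only-variants":
        return all(d is not None and d in values for d in disps)
    raise ValueError(f"unsupported condition: {condition}")


def evaluate(label):
    """Map every variant label of `label` to the first triggered disposition."""
    result = {}
    for combo in product(*(choices(cp) for cp in label.split())):
        variant = " ".join(cp for cp, _ in combo)
        disps = [d for _, d in combo]
        for disp, condition, values in ACTIONS:
            if triggers(condition, values, disps):
                result[variant] = disp
                break
    return result


for variant, disp in sorted(evaluate("4E7E 4E81").items()):
    if disp != "block":
        print(variant, disp)
```

   Under these assumptions the loop prints the original label "4E7E 4E81" with "activate" and the three RFC 3743 variant labels with "allocate"; the other 32 of the 36 combinations, including the mixed "5E72 4E7E", are blocked.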
   However, consider the following set of variants:

      U+62E0;U+636E;U+636E;U+64DA
      U+636E;U+636E;U+64DA;U+62E0
      U+64DA;U+636E;U+64DA;U+62E0

   for which the corresponding XML would be:

   <char cp="62E0">
       <var cp="636E" disp="both" comment="BOTH, but NOT identity" />
       <var cp="64DA" disp="block" />
   </char>
   <char cp="636E">
       <var cp="636E" disp="simp" comment="identity, but not BOTH" />
       <var cp="64DA" disp="trad" />
       <var cp="62E0" disp="block" />
   </char>
   <char cp="64DA">
       <var cp="636E" disp="simp" />
       <var cp="64DA" disp="trad" comment="identity" />
       <var cp="62E0" disp="block" />
   </char>

   To make such variant sets work, there must be a way to capture whether a disposition is associated with an identity (reflexive) mapping or with an ordinary variant mapping.

   This can be done by adding the prefix "i-" to the disposition whenever the mapping is an identity mapping; for example, the last "trad" in the preceding figure would become "i-trad".  With all the dispositions prepared in this way, only a slight modification to the actions is needed to yield the correct set of allocatable labels:

   <action disp="block" any-variant="block" />
   <action disp="allocate" only-variants="simp i-simp both i-both" />
   <action disp="allocate" only-variants="trad i-trad both i-both" />
   <action disp="block" all-variants="simp trad both" />
   <action disp="allocate" />

   The first three actions get triggered by the same labels as before.  The fourth action blocks any label that combines an original code point with any of the variant mappings, yet lets through all labels that are a combination of only original code points (everything having either no variant mapping or one of the identity mappings).  These are the original labels, and they are allocated in the last action.
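   To illustrate the "i-" scheme, the following Python sketch evaluates the modified actions for the label "U+62E0 U+636E", using the variant data above with "i-" prepended to the identity mappings.  It is a sketch under stated assumptions rather than a normative algorithm: actions are tried in order and the first one triggered wins; "only-variants" requires every code point to be replaced through a listed variant mapping; and "all-variants" is assumed to ignore original code points without a mapping and to trigger only when at least one variant mapping is used and all used mappings are listed.  The identifiers are hypothetical.

```python
from itertools import product

# Variant mappings for the U+62E0/U+636E/U+64DA set from the XML above,
# with "i-" prepended to the dispositions of identity mappings.
VARIANTS = {
    "62E0": {"636E": "both", "64DA": "block"},
    "636E": {"636E": "i-simp", "64DA": "trad", "62E0": "block"},
    "64DA": {"636E": "simp", "64DA": "i-trad", "62E0": "block"},
}

# The modified <action> elements above, in document order.
ACTIONS = [
    ("block", "any-variant", {"block"}),
    ("allocate", "only-variants", {"simp", "i-simp", "both", "i-both"}),
    ("allocate", "only-variants", {"trad", "i-trad", "both", "i-both"}),
    ("block", "all-variants", {"simp", "trad", "both"}),
    ("allocate", None, set()),
]


def choices(cp):
    """(code point, disp) options for one position; None = no mapping."""
    mapping = VARIANTS[cp]
    options = list(mapping.items())
    if cp not in mapping:
        options.append((cp, None))  # original code point without a mapping
    return options


def triggers(condition, values, disps):
    used = [d for d in disps if d is not None]
    if condition is None:
        return True
    if condition == "any-variant":
        return any(d in values for d in used)
    if condition == "all-variants":
        # Assumed semantics: at least one mapping used, all of them listed.
        return bool(used) and all(d in values for d in used)
    if condition == "only-variants":
        return all(d is not None and d in values for d in disps)
    raise ValueError(f"unsupported condition: {condition}")


def evaluate(label):
    """Disposition of each variant label; the last action always triggers."""
    result = {}
    for combo in product(*(choices(cp) for cp in label.split())):
        variant = " ".join(cp for cp, _ in combo)
        disps = [d for _, d in combo]
        result[variant] = next(disp for disp, cond, values in ACTIONS
                               if triggers(cond, values, disps))
    return result


allocated = sorted(v for v, d in evaluate("62E0 636E").items()
                   if d == "allocate")
print(allocated)  # ['62E0 636E', '636E 636E', '636E 64DA']
```

   The three allocated labels are the original label, the all-simplified label and the all-traditional label; the mixed label "62E0 64DA" is caught by the fourth action.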
   With this modification, all RFC 3743-style tables can be converted to XML.  Using the above set of actions, all variant labels consisting entirely of variants preferred for simplified or for traditional, respectively, will be allocated, as will the original label.  All other variant labels will be blocked.

Appendix C. Indic Syllable Structure Example

   In LGRs for Indic scripts it may be desirable to restrict valid labels to sequences of valid Indic syllables, or aksharas.  This appendix gives a sample set of rules designed to enforce this restriction.

   We start with the following BNF form for an akshara, which has been published in "Devanagari Script Behavior for Hindi" [TDIL-HINDI].  If it is not directly valid for other languages and scripts used in India, it is at least similar to the equivalent definitions used for them.

      V[m]|{C[N]H}C[N](H|[v][m])

   Where:

      V (upper case) is any independent vowel
      m is any vowel modifier (Devanagari Anusvara, Visarga, and Candrabindu)
      C is any consonant (with inherent vowel)
      N is Nukta
      H is a Halant (or Virama)
      v (lower case) is any dependent vowel sign (matra)
      {} encloses items which may be repeated one or more times
      [ ] encloses items which may or may not be present
      | separates items, out of which only one can be present

   By using the Unicode property "InSC" or "Indic_Syllabic_Category", which corresponds rather directly to the classification of characters in the BNF above, we can directly translate the BNF into a set of WLE rules matching the definition of an akshara.
   <rules>
       <!--Character Class Definitions go here-->
       <class name="halant" property="InSC:Virama" />
       <union name="vowel-modifier">
           <class property="InSC:Visarga" />
           <class property="InSC:Bindu" comment="includes anusvara" />
       </union>

       <!--Whole label evaluation and Context rules go here-->
       <rule name="consonant-with-optional-nukta">
           <class property="InSC:Consonant" />
           <class property="InSC:Nukta" count="0:1"/>
       </rule>
       <rule name="independent-vowel-with-optional-modifier">
           <class property="InSC:Vowel_Independent" />
           <class by-ref="vowel-modifier" count="0:1" />
       </rule>
       <rule name="optional-dependent-vowel-with-opt-modifier">
           <class property="InSC:Vowel_Dependent" count="0:1" />
           <class by-ref="vowel-modifier" count="0:1" />
       </rule>
       <rule name="consonant-cluster">
           <rule count="0+">
               <rule by-ref="consonant-with-optional-nukta" />
               <class by-ref="halant" />
           </rule>
           <rule by-ref="consonant-with-optional-nukta" />
           <choice>
               <class by-ref="halant" />
               <rule by-ref="optional-dependent-vowel-with-opt-modifier" />
           </choice>
       </rule>
       <rule name="akshara">
           <choice>
               <rule by-ref="independent-vowel-with-optional-modifier" />
               <rule by-ref="consonant-cluster" />
           </choice>
       </rule>
       <rule name="WLE-akshara-or-other"
           comment="series of one or more aksharas, possibly alternating with other types of code points such as digits">
           <start />
           <choice count="1+">
               <class property="InSC:other" />
               <rule by-ref="akshara" />
           </choice>
           <end />
       </rule>

       <!--Action elements go here - order defines precedence-->
       <action disp="invalid" not-match="WLE-akshara-or-other" />
   </rules>

   With the rules and classes as defined above, the final action assigns a disposition of "invalid" to all labels that are not composed of a sequence of well-formed aksharas, optionally interspersed with other characters, for example digits.
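   The following Python sketch mirrors the "WLE-akshara-or-other" rule and the final action with a regular expression over the one-letter category codes of the BNF.  Python's standard unicodedata module does not expose Indic_Syllabic_Category, so CATEGORY is a hypothetical, hand-made table covering only a handful of Devanagari code points, and every other code point is treated as "other", which is a simplification.  As in the "consonant-cluster" rule (count="0+"), the halant-terminated group may repeat zero or more times.  The function names are hypothetical.

```python
import re

# Hypothetical subset of Indic_Syllabic_Category (InSC) values, encoded with
# the single letters used in the BNF; not a complete or authoritative table.
CATEGORY = {
    0x0905: "V",  # DEVANAGARI LETTER A (Vowel_Independent)
    0x0915: "C",  # DEVANAGARI LETTER KA (Consonant)
    0x0937: "C",  # DEVANAGARI LETTER SSA (Consonant)
    0x093C: "N",  # DEVANAGARI SIGN NUKTA (Nukta)
    0x094D: "H",  # DEVANAGARI SIGN VIRAMA (Virama)
    0x093F: "v",  # DEVANAGARI VOWEL SIGN I (Vowel_Dependent)
    0x0902: "m",  # DEVANAGARI SIGN ANUSVARA (Bindu)
    0x0903: "m",  # DEVANAGARI SIGN VISARGA (Visarga)
}

# akshara: V[m] | (C[N]H)* C[N] (H | [v][m])
AKSHARA = r"(?:Vm?|(?:CN?H)*CN?(?:H|v?m?))"
# WLE-akshara-or-other: the whole label is one or more aksharas or "O"s.
WLE = re.compile(rf"(?:O|{AKSHARA})+")


def categorize(label: str) -> str:
    """Map each code point to its BNF letter; unlisted ones become "O"."""
    return "".join(CATEGORY.get(ord(ch), "O") for ch in label)


def action_disposition(label: str):
    """Return "invalid" if the whole-label rule does not match, else None."""
    return None if WLE.fullmatch(categorize(label)) else "invalid"


print(action_disposition("\u0915\u094D\u0937"))  # KA VIRAMA SSA -> None
print(action_disposition("\u093F\u0915"))        # starts with vowel sign -> invalid
```

   Returning None for a matching label reflects that the single action in the example only assigns "invalid"; any other disposition would come from actions not shown here.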
   The relevant Unicode property is, as of this writing, still considered provisional; however, it could be replicated by tagging repertoire values directly in the LGR, which would remove the dependency on the Unicode Standard altogether.

Appendix D. RelaxNG Compact Schema

   default namespace = "http://www.iana.org/lgr/0.1"

   #
   # SIMPLE TYPES
   #

   # RFC 5646 language tag (e.g. "de", "und-Latn", etc.)
   language-tag = xsd:token

   ## domain to which this LGR applies
   domain-name = text

   ## a single code point
   code-point = xsd:token {
       pattern = "[0-9A-F]{4,6}"
   }

   ## a space-separated sequence of code points
   code-point-sequence = xsd:token {
       pattern = "[0-9A-F]{4,6}( [0-9A-F]{4,6})+"
   }

   ## single code point, or a sequence of code points
   code-point-literal = code-point | code-point-sequence

   code-point-set-shorthand = xsd:token {
       pattern = "([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6})"
               ~ "( ([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6}))*"
   }

   ## dates are used in information fields in the meta section.
   date = xsd:token {
       pattern = "\d{4}-\d\d-\d\d"
   }

   ## reference to a rule name (used in "when" and "not-when" attributes,
   ## as well as the "by-ref" attribute of the "rule" element.)
   rule-ref = xsd:IDREF

   ## a space-separated list of tags. Tags should generally follow
   ## identifier syntax, although the use of punctuation symbols such as
   ## a colon is allowed.
   tags = text

   ## Although "from-tag" attributes are closer to xsd:IDREF lexically and
   ## semantically, tags do not appear as a single unique instance in the
   ## document. As such, we are unable to take advantage of facilities
   ## provided by the validator.
   tag-ref = text

   ## an identifier type (used by "name" attributes).
   identifier = xsd:ID

   ## used in the class "by-ref" attribute to reference another class of
   ## the same "name" attribute value.
   class-ref = xsd:IDREF

   ## count attribute pattern ("n", "n+" or "n:m")
   count-pattern = xsd:token {
       pattern = "\d+(\+|:\d+)?"
   }

   #
   # STRUCTURES
   #

   ## Representation of a single code point, or a sequence of code points
   char = element char {
       attribute cp { code-point-literal },
       attribute comment { text }?,
       attribute when { rule-ref }?,
       attribute not-when { rule-ref }?,
       attribute tag { tags }?,
       attribute ref { text }?,
       variant*
   }

   ## Representation of a range of code points
   range = element range {
       attribute first-cp { code-point },
       attribute last-cp { code-point },
       attribute comment { text }?,
       attribute tag { tags }?,
       attribute ref { text }?
   }

   ## Representation of a single code point (no sequences allowed, and no
   ## tag attribute allowed). This is used when defining the set of
   ## characters that constitute a class.
   char-simple = element char {
       attribute cp { code-point }
   }

   ## Representation of a range of code points, for use in defining the
   ## set of characters that constitute a class.
   range-simple = element range {
       attribute first-cp { code-point },
       attribute last-cp { code-point }
   }

   ## Representation of a variant code point or sequence
   variant = element var {
       attribute cp { code-point-literal },
       attribute type { text }?,
       attribute when { rule-ref }?,
       attribute not-when { rule-ref }?,
       attribute comment { text }?,
       attribute disp { text }?,
       attribute ref { text }?
   }

   #
   # Classes
   #

   ## a "class" element that references the name of another "class"
   ## (or set-operator like "union") defined elsewhere.
   ## If used as a matcher (appearing under a "rule" element),
   ## the "count" attribute may be present.
   class-invocation = element class {
       (attribute by-ref { class-ref } | attribute from-tag { tag-ref }),
       attribute count { count-pattern }?,
       attribute comment { text }?
   }

   ## defines a new class (set of code points) using Unicode property or
   ## code point literals
   class-declaration = element class {
       # "name" attribute MUST be present if this is a "top-level" class
       # declaration, i.e. appearing directly under the "rules" element.
       # Otherwise, it MUST be absent.
       attribute name { identifier }?,
       # If used as a matcher (appearing in a "rule" element), the "count"
       # attribute may be present. Otherwise, it MUST be absent.
       attribute count { count-pattern }?,
       attribute comment { text }?,
       attribute ref { text }?,
       (
           # define the class by property (e.g. property="sc:Latn"), OR
           attribute property { text }
           # define the class by tagged code points, OR
           | attribute from-tag { tag-ref }
           # list of single code points and ranges, OR
           | (char-simple | range-simple)+
           # text node to allow for shorthand notation e.g. "0061 0062-0063"
           | code-point-set-shorthand
       )
   }

   class-or-set-operator-nested =
       class-invocation | class-declaration | set-operator

   class-or-set-operator-declaration =
       # a "class" element or set operator (effectively defining a class)
       # directly in the "rules" element.
       class-declaration | set-operator

   #
   # Set operators
   #

   complement-operator = element complement {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { text }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e. nested in a <rule> element)
       attribute count { count-pattern }?,
       class-or-set-operator-nested
   }

   union-operator = element union {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { text }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e.
       # nested in a <rule> element)
       attribute count { count-pattern }?,
       class-or-set-operator-nested,
       # needs two or more child elements
       class-or-set-operator-nested+
   }

   intersection-operator = element intersection {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { text }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e. nested in a <rule> element)
       attribute count { count-pattern }?,
       class-or-set-operator-nested,
       class-or-set-operator-nested
   }

   difference-operator = element difference {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { text }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e. nested in a <rule> element)
       attribute count { count-pattern }?,
       class-or-set-operator-nested,
       class-or-set-operator-nested
   }

   symmetric-difference-operator = element symmetric-difference {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { text }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e. nested in a <rule> element)
       attribute count { count-pattern }?,
       class-or-set-operator-nested,
       class-or-set-operator-nested
   }

   ## operators that transform class(es) into a new class.
   set-operator = complement-operator
       | union-operator
       | intersection-operator
       | difference-operator
       | symmetric-difference-operator

   #
   # Match operators (matchers)
   #

   any-matcher = element any {
       attribute count { count-pattern }?,
       attribute comment { text }?
   }

   choice-matcher = element choice {
       attribute count { count-pattern }?,
       attribute comment { text }?,
       # two or more match operators
       match-operator-choice,
       match-operator-choice+
   }

   char-matcher =
       # for use as a matcher - like "char" but without a "tag" attribute
       element char {
           attribute cp { code-point-literal },
           # If used as a matcher (appearing in a "rule" element), the
           # "count" attribute may be present. Otherwise, it MUST be absent.
           attribute count { count-pattern }?,
           attribute comment { text }?,
           attribute ref { text }?
       }

   start-matcher = element start {
       attribute comment { text }?
   }

   end-matcher = element end {
       attribute comment { text }?
   }

   anchor-matcher = element anchor {
       attribute comment { text }?
   }

   look-ahead-matcher = element look-ahead {
       attribute comment { text }?,
       match-operators-non-pos
   }

   look-behind-matcher = element look-behind {
       attribute comment { text }?,
       match-operators-non-pos
   }

   ## non-positional match operator that can be used as a direct child
   ## element of the choice matcher.
   match-operator-choice = (
       any-matcher | choice-matcher | start-matcher | end-matcher
       | char-matcher | class-or-set-operator-nested | rule-matcher
   )

   ## non-positional match operators do not contain any anchor,
   ## look-behind or look-ahead elements.
   match-operators-non-pos = (
       start-matcher?,
       (any-matcher | choice-matcher | char-matcher
        | class-or-set-operator-nested | rule-matcher)*,
       end-matcher?
   )

   ## positional match operators have an anchor element, which may be
   ## preceded by a look-behind element, or followed by a look-ahead
   ## element, or both.
   match-operators-pos =
       look-behind-matcher?, anchor-matcher, look-ahead-matcher?
   match-operators = match-operators-non-pos | match-operators-pos

   #
   # Rules
   #

   # top-level rule must have "name" attribute
   rule-declaration-top = element rule {
       attribute name { identifier },
       attribute comment { text }?,
       attribute ref { text }?,
       match-operators
   }

   ## rule element used as a matcher (either by-ref or contains other
   ## match operators itself)
   rule-matcher = element rule {
       attribute count { count-pattern }?,
       attribute comment { text }?,
       attribute ref { text }?,
       (attribute by-ref { rule-ref } | match-operators)
   }

   #
   # Actions
   #

   action-declaration = element action {
       attribute comment { text }?,
       attribute ref { text }?,
       attribute disp { text },
       ( attribute match { text } | attribute not-match { text } )?,
       ( attribute any-variant { text }
         | attribute all-variants { text }
         | attribute only-variants { text } )?
   }

   # DOCUMENT STRUCTURE

   start = lgr
   lgr = element lgr {
       attribute id { text }?,
       meta-section?,
       data-section,
       rules-section?
   }

   ## Meta section - information recorded with a label generation
   ## ruleset that generally does not affect machine processing
   ## (except for unicode-version).
   ## However, if any "class-declaration" uses the "property" attribute,
   ## a unicode-version element MUST be present.
   meta-section = element meta {
       element version {
           attribute comment { text }?,
           text
       }?
       & element date { xsd:token { pattern = "\d{4}-\d{2}-\d{2}" } }?
       & element language { language-tag }*
       & element domain { domain-name }*
       & element validity-start { text }?
       & element validity-end { text }?
       & element unicode-version {
           xsd:token { pattern = "\d+\.\d+\.\d+" }
       }?
       & element description {
           attribute type { text }?,
           text
       }?
       & element references {
           element reference {
               attribute id { text },
               attribute comment { text }?,
               text
} data-section = element data { (char | range)+ } ## Note that action declarations are strictly order dependent. ## class-or-set-operator-declaration and rule-declaration-top ## are weakly order dependent, they must precede first use of the ## identifier via by-ref. rules-section = element rules { ( class-or-set-operator-declaration | rule-declaration-top | action-declaration)* } Davies & Freytag Expires September 27, 2014 [Page 66] Internet-Draft Label Generation Rulesets in XML March 2014 Appendix E. Acknowledgements This format builds upon the work on documenting IDN tables by many different registry operators. Notably, a comprehensive language table for Chinese, Japanese and Korean was developed by the "Joint Engineering Team" [RFC3743] that is the basis of many registry policies; and a set of guidelines for Arabic script registrations [RFC5564] was published by the Arabic-language community. Contributions that have shaped this document have been provided by Francisco Arias, Mark Davis, Paul Hoffman, Nicholas Ostler, Thomas Roessler, Steve Sheng, Michel Suignard, Andrew Sullivan, Wil Tan and John Yunker. Davies & Freytag Expires September 27, 2014 [Page 67] Internet-Draft Label Generation Rulesets in XML March 2014 Appendix F. Editorial Notes This appendix to be removed prior to final publication. F.1. Known Issues and Future Work o A method of specifying the origin URI for a table, and an expiration or refresh policy, as meta-data may be a useful way to declare how the table will be updated. o The "domain" element should be specified as absolute, so that the Root can be identified as needed for the Root Zone LGR. o The recommended names for disposition ("block" and "allocate") deviate from the name in the Root Zone LGR Procedure ("blocked" and "allocatable"). The latter were chosen to highlight that the machine processing of the LGR table is just the first step, actual allocation requires additional actions, hence "allocatable". This should be resolved. F.2. 
   Change History

   -00  Initial draft.

   -01  Add an XML Namespace, and fix other XML nits.  Add support for sequences of code points.  Improve on consistently using Unicode nomenclature.

   -02  Add support for validity periods.

   -03  Incorporate requirements from the Label Generation Ruleset Procedure for the DNS Root Zone.  These requirements include a detailed grammar for specifying whole-label variants, and the ability to explicitly declare the actions associated with a specific variant.  The document also consistently applies the term "Label Generation Ruleset", rather than "IDN table", to reflect the policy term now being used to describe these.

   -04  Support reference information per [RFC3743].  Update description in response to feedback.  Extend the context rules to "char" elements and allow for inverse matching ("not-when").  Extend the description of label processing and implied actions, and allow for actions that reference disposition attributes on any or all variant mappings used in the generation of a variant label.

   -05  Change the name of the "disposition" attribute to "disp".  Add comment attribute on version and reference elements.  Allow empty "cp" attributes in char elements to support expressing symmetric mapping of null variants.  Describe use of variants that map identically.  Clarify how actions are triggered, in particular based on variant dispositions, as well as description of default actions.  Revise description of processing a label and its variants.  Move example table to the head of the appendices.  Add "only-variants" attribute.  Change "name" attribute to "by-ref" attribute for referencing named classes and rules.  Change "not" to "complement".  Remove "match" attribute on rules as redundant if "start" and "end" are supported.
        Rename "match" element to "anchor" as better fitting its function and removing confusion with both the "match" attribute on actions as well as the generic term Match Operator.  Augmented the examples relevant to [RFC3743].

   -06  Extend the discussion of reflexive variants and their use; includes update of the appendix on converting tables in the style of [RFC3743].  Improve description of tagging and clarify that it doesn't apply to sequences.  Specify that root zone uses ".".  Add an appendix with an Indic Syllable Structure example.  Extend count attribute to allow maximal counts.

   -07  Change "byref" to "by-ref".  Add list of recommended properties.  Change "location" to "positional" for collective name of start/end match operators.  Use from-tag instead of by-ref for tag-based classes.  Made optional or mutually exclusive nature of some attributes more explicit.  Allow "comment" attributes on all child elements of "rules" except "char" and "range" elements used as child elements of "class".  Recast the design goals and requirements at the start of the document.  Reword aspects of the document to make it clear the format's application is not limited only to domain names.

Authors' Addresses

   Kim Davies
   Internet Corporation for Assigned Names and Numbers
   12025 Waterfront Drive
   Los Angeles, CA 90094
   US

   Phone: +1 310 301 5800
   Email: kim.davies@icann.org
   URI:   http://www.icann.org/

   Asmus Freytag
   ASMUS Inc.

   Email: asmus@unicode.org