Representing Label Generation Rulesets using XML
draft-davies-idntables-07

The information below is for an old version of the document.
Document type: This is an older version of an Internet-Draft whose latest revision state is "Replaced".
Authors: Kim Davies, Asmus Freytag
Last updated: 2014-03-26
Replaced by: draft-ietf-lager-specification, RFC 7940
RFC stream: (None)
Stream state: (No stream defined)
Consensus boilerplate: Unknown
RFC Editor Note: (None)
IESG state: I-D Exists
Telechat date: (None)
Responsible AD: (None)
Send notices to: (None)
>
       <date>2010-01-01</date>
       <language>sv</language>
       <domain>example</domain>
       <validity-start>2010-01-01</validity-start>
       <validity-end>2013-12-31</validity-end>
       <description type="text/html">
           <![CDATA[
           This language table was developed with the
           <a href="http://swedish.example/">Swedish
           examples institute</a>.
           ]]>
       </description>
       <unicode-version>6.3.0</unicode-version>
       <references>

Davies & Freytag       Expires September 27, 2014              [Page 46]
Internet-Draft      Label Generation Rulesets in XML          March 2014

         <reference id="0" comment="the most recent" >The
               Unicode Standard 6.2</reference>
         <reference id="1" >RFC 5892</reference>
         <reference id="2" >Big-5: Computer Chinese Glyph
            and Character Code Mapping Table, Technical Report
            C-26, 1984</reference>
       </references>
    </meta>
    <!-- the data section describing the repertoire -->
     <data>
       <!-- single code point "char" element -->
       <char cp="002D" ref="1" comment="HYPHEN" />

       <!-- range elements for contiguous code points,  with tags -->
       <range first-cp="0030" last-cp="0039" ref="1" tag="digit" />
       <range first-cp="0061" last-cp="007A" ref="1" tag="letter" />

       <!-- code point sequence -->
       <char cp="006C 00B7 006C" comment="catalan middle dot" />

       <!-- alternatively use a when rule -->
       <char cp="00B7" when="catalan-middle-dot" />

       <!-- code point with context rule -->
       <char cp="200D" when="joiner" ref="2" />

       <!-- code points with variants -->
       <char cp="4E16" tag="preferred" ref="0">
         <var cp="4E17" disp="block" ref="2" />
         <var cp="534B" disp="allocate" ref="2" />
       </char>
       <char cp="4E17" ref="0">
         <var cp="4E16" disp="allocate" ref="2" />
         <var cp="534B" disp="allocate" ref="2" />
       </char>
       <char cp="534B" ref="0">
         <var cp="4E16" disp="allocate" ref="2" />
         <var cp="4E17" disp="block" ref="2" />
       </char>
     </data>

     <!-- Context and whole label rules -->
     <rules>
       <!-- Require the given code point to be between two 006C -->
       <rule name="catalan-middle-dot" ref="0">
           <look-behind>
               <char cp="006C" />
           </look-behind>
           <anchor />
           <look-ahead>
               <char cp="006C" />
           </look-ahead>
       </rule>

       <!-- example of a context rule based on property -->
       <class name="virama" property="ccc:9" />
       <rule name="joiner"  ref="1" >
           <look-behind>
               <class by-ref="virama" />
           </look-behind>
           <anchor />
       </rule>

       <!-- example of using set operators -->

       <!-- Subtract vowels from letters to get
            consonants, demonstrating the different
            set notations and the difference operator -->
       <difference name="consonants">
            <!-- use standard notation -->
            <class comment="all letters">
              <char cp="0061" />
              <range first-cp="0062" last-cp="007A" />
            </class>
            <!-- use shorthand notation -->
            <class comment="all vowels">
                    0061 0065 0069 006F 0075-0075
            </class>
       </difference>

       <!-- by using start and end, the rule matches the whole label -->
       <rule name="three-or-more-consonants">
           <start />
           <!-- reference the class defined by the difference
                and require three or more matches -->
           <class by-ref="consonants" count="3+" />
           <end />
       </rule>

       <!-- rule for negative matching -->
       <rule name="non-preferred"
             comment="matches any non-preferred code point">
           <complement comment="non-preferred" >
               <class from-tag="preferred" />
           </complement>
       </rule>

      <!-- actions triggered by matching rules and/or
           variant dispositions -->
       <action disp="invalid"
               match="three-or-more-consonants" />
       <action disp="block" any-variant="block" />
       <action disp="allocate" all-variants="allocate"
               not-match="non-preferred" />
     </rules>
   </lgr>
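   The repertoire and one of the set-operator rules in the example above
   can be approximated with ordinary data structures.  The following
   Python sketch is illustrative only: it uses the standard library's
   xml.etree.ElementTree to expand a trimmed copy of the data section
   into a set of code points and sequences, and models the "consonants"
   difference and the "three-or-more-consonants" whole-label rule with
   plain Python sets.  The function names and the sample document are
   hypothetical, the namespace URI is taken from the schema in
   Appendix D, and "all letters" is assumed to mean U+0061 through
   U+007A, as the class comment says.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.iana.org/lgr/0.1}"  # namespace URI from Appendix D

# Hypothetical trimmed copy of the data section, wrapped in <lgr>.
SAMPLE = """<lgr xmlns="http://www.iana.org/lgr/0.1"><data>
  <char cp="002D" />
  <range first-cp="0030" last-cp="0039" tag="digit" />
  <range first-cp="0061" last-cp="007A" tag="letter" />
  <char cp="006C 00B7 006C" />
</data></lgr>"""

def repertoire(xml_text):
    """Collect the repertoire as a set of code point tuples.

    A "char" contributes one tuple (longer than 1 for a sequence);
    a "range" contributes one 1-tuple per code point it covers.
    """
    data = ET.fromstring(xml_text).find(NS + "data")
    result = set()
    for elem in data:
        if elem.tag == NS + "char":
            result.add(tuple(int(cp, 16) for cp in elem.get("cp").split()))
        elif elem.tag == NS + "range":
            first = int(elem.get("first-cp"), 16)
            last = int(elem.get("last-cp"), 16)
            result.update((cp,) for cp in range(first, last + 1))
    return result

# The "consonants" difference: all letters minus the vowels.
LETTERS = set(range(0x61, 0x7A + 1))
VOWELS = {0x61, 0x65, 0x69, 0x6F, 0x75}
CONSONANTS = LETTERS - VOWELS

def three_or_more_consonants(label):
    """<start/>, consonants with count="3+", <end/>: the whole label
    must consist of three or more consonants."""
    return len(label) >= 3 and all(ord(ch) in CONSONANTS for ch in label)

print(len(repertoire(SAMPLE)))  # 1 + 10 + 26 + 1 = 38
```

   Iterating over the "data" element visits only child elements, so the
   XML comments in a real data section would not disturb the expansion.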

Appendix B.  How to Translate RFC 3743 based Tables into the XML Format

   As background, the [RFC3743] rules work as follows:

   1.  The original (requested) label is checked to make sure that all
       of its code points are in the repertoire.

   2.  If it passes the check, the original label is allocatable.

   3.  The all-simplified and all-traditional variant labels are
       generated for allocation.  (The all-simplified labels are all
       labels that can be formed by replacing each code point with one
       of its simplified variants; the all-traditional labels are
       formed likewise from the traditional variants.)
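   A minimal Python sketch of these three steps follows.  It is not
   taken from [RFC3743]; it assumes a hypothetical in-memory table that
   maps each code point to its sets of simplified and traditional
   variants, populated here with only the two rows for U+4E7E and
   U+4E81 from the .ASIA table quoted later in this appendix.

```python
from itertools import product

# Hypothetical in-memory table:
# code point -> (simplified variants, traditional variants)
TABLE = {
    0x4E7E: ({0x4E7E, 0x5E72}, {0x4E7E}),
    0x4E81: ({0x5E72}, {0x4E7E}),
}

def rfc3743_labels(label):
    """Return the set of allocatable labels (as tuples) for `label`."""
    # Step 1: every code point must be in the repertoire (TABLE keys).
    if not all(cp in TABLE for cp in label):
        return set()
    # Step 2: the original label is allocatable.
    allocatable = {tuple(label)}
    # Step 3: add all-simplified (column 0) and all-traditional
    # (column 1) labels, i.e. every combination within one column.
    for column in (0, 1):
        choices = [sorted(TABLE[cp][column]) for cp in label]
        allocatable.update(product(*choices))
    return allocatable
```

   For the label "U+4E7E U+4E81" this returns the original label, two
   simplified labels and one traditional label, the same four labels
   that are listed for this example later in this appendix.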

   To illustrate by example, here is one of the more complicated sets of
   variants:

       U+4E7E
       U+4E81
       U+5E72
       U+5E79
       U+69A6
       U+6F27

   The following shows the relevant section of the Chinese language
   table published by the .ASIA registry [ASIA-TABLE].  Its entries
   read:

    <codepoint>;<simpl-variant(s)>;<trad-variant(s)>;<other-variant(s)>

   These are the lines corresponding to the set of variants listed above:

   U+4E7E;U+4E7E,U+5E72;U+4E7E;U+4E81,U+5E72,U+6F27,U+5E79,U+69A6
   U+4E81;U+5E72;U+4E7E;U+5E72,U+6F27,U+5E79,U+69A6
   U+5E72;U+5E72;U+5E72,U+4E7E,U+5E79;U+4E7E,U+4E81,U+69A6,U+6F27
   U+5E79;U+5E72;U+5E79;U+69A6,U+4E7E,U+4E81,U+6F27
   U+69A6;U+5E72;U+69A6;U+5E79,U+4E7E,U+4E81,U+6F27
   U+6F27;U+4E7E;U+6F27;U+4E81,U+5E72,U+5E79,U+69A6

   The corresponding data section XML format would look like this:

       <data>
       <char cp="4E7E">
       <var cp="4E7E" disp="both" comment="identity" />
       <var cp="4E81" disp="block" />
       <var cp="5E72" disp="simp" />
       <var cp="5E79" disp="block" />
       <var cp="69A6" disp="block" />
       <var cp="6F27" disp="block" />
       </char>
       <char cp="4E81">
       <var cp="4E7E" disp="trad" />
       <var cp="5E72" disp="simp" />
       <var cp="5E79" disp="block" />
       <var cp="69A6" disp="block" />
       <var cp="6F27" disp="block" />
       </char>
       <char cp="5E72">
       <var cp="4E7E" disp="trad"/>
       <var cp="4E81" disp="block"/>
       <var cp="5E72" disp="both" comment="identity"/>
       <var cp="5E79" disp="trad"/>
       <var cp="69A6" disp="block"/>
       <var cp="6F27" disp="block"/>
       </char>
       <char cp="5E79">
       <var cp="4E7E" disp="block"/>
       <var cp="4E81" disp="block"/>
       <var cp="5E72" disp="simp"/>
       <var cp="5E79" disp="trad" comment="identity"/>
       <var cp="69A6" disp="block"/>
       <var cp="6F27" disp="block"/>
       </char>
       <char cp="69A6">
       <var cp="4E7E" disp="block"/>
       <var cp="4E81" disp="block"/>
       <var cp="5E72" disp="simp"/>
       <var cp="5E79" disp="block"/>
       <var cp="69A6" disp="trad" comment="identity"/>
       <var cp="6F27" disp="block"/>
       </char>
       <char cp="6F27">
       <var cp="4E7E" disp="simp"/>
       <var cp="4E81" disp="block"/>
       <var cp="5E72" disp="block"/>
       <var cp="5E79" disp="block"/>
       <var cp="69A6" disp="block"/>
       <var cp="6F27" disp="trad" comment="identity"/>
       </char>
     </data>

   Here the simplified variants have been given the disposition "simp",
   the traditional variants "trad", and all others "block".

   Note that some variant mappings map a code point to itself
   (identity); that is, the mapping is reflexive (see Section 4.2.5).
   In creating the permutation of all variant labels, these mappings
   have no effect other than adding a value to the variant disposition
   list for the variant label containing them.

   Because some variant mappings appear in both the simplified and the
   traditional column, while the XML format allows only a single
   disposition value, these mappings have been given the disposition
   "both".

   In the example so far, all of these are also mappings where source
   and target are identical, that is, reflexive mappings as defined in
   Section 4.2.5.
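   The conversion described above is mechanical enough to script.  The
   following Python sketch (hypothetical helper names; no particular
   registry tool is implied) parses one line of the .ASIA table format
   and assigns "both" to variants listed in both the simplified and the
   traditional column, "simp" or "trad" to those listed in only one of
   them, and "block" to everything else, then renders the result as a
   "char" element.

```python
def parse_row(line):
    """Split "<codepoint>;<simp>;<trad>;<other>" into ints and int sets."""
    fields = line.strip().split(";")

    def cps(field):
        # "U+4E7E,U+5E72" -> {0x4E7E, 0x5E72}
        return {int(tok.strip()[2:], 16) for tok in field.split(",") if tok.strip()}

    source = int(fields[0].strip()[2:], 16)
    simp, trad, other = (cps(f) for f in fields[1:4])
    return source, simp, trad, other

def dispositions(line):
    """Return (code point, {variant: "simp"|"trad"|"both"|"block"})."""
    source, simp, trad, other = parse_row(line)
    result = {}
    for cp in simp | trad | other:
        if cp in simp and cp in trad:
            result[cp] = "both"
        elif cp in simp:
            result[cp] = "simp"
        elif cp in trad:
            result[cp] = "trad"
        else:
            result[cp] = "block"
    return source, result

def to_xml(line):
    """Render one table line as a <char> element with <var> children."""
    source, disp = dispositions(line)
    out = ['<char cp="%04X">' % source]
    for cp in sorted(disp):
        note = ' comment="identity"' if cp == source else ""
        out.append('  <var cp="%04X" disp="%s"%s />' % (cp, disp[cp], note))
    out.append("</char>")
    return "\n".join(out)
```

   Applied to the line for U+4E7E, this reproduces the dispositions of
   the first "char" element in the data section above.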

   Given a label "U+4E7E U+4E81", the following labels would be ruled
   allocatable under [RFC3743] based on how that standard is commonly
   implemented in domain registries:

       Original label:     U+4E7E U+4E81
       Simplified label 1: U+4E7E U+5E72
       Simplified label 2: U+5E72 U+5E72
       Traditional label:  U+4E7E U+4E7E

   However, if we generated allocatable labels without regard to whether
   the variants are simplified or traditional, we would end up with an
   extra allocatable label: "U+5E72 U+4E7E".  That label consists of a
   simplified Chinese (SC) character and a traditional Chinese (TC)
   character and should not be allocatable, but it would be the result
   of a straight permutation of all variants with a disposition other
   than "block".

   Fully resolving the dispositions requires several actions, defined as
   described in Section 6.2.2.  After blocking all labels that contain a
   variant with disposition "block", these actions first allocate all
   labels that consist entirely of variants (including variants with
   reflexive mappings) that are "simp" or "both", and then do likewise
   for labels that consist entirely of "trad" or "both" variants.  Any
   surviving label containing one of the dispositions "simp" or "trad"
   is now known to be an undesirable mixed simplified/traditional label
   and is blocked.  Finally, the remaining labels can only consist of
   code points without variants or with reflexive variants of type
   "both"; in other words, they are the original label.

     <rules>
       <!--Action elements - order defines precedence-->
       <action disp="block" any-variant="block"
           comment="filter out by blocked code point" />
       <action disp="allocate"
           only-variants="simp both"
           comment="only allocate if simplified variant
           including reflexive (identity) mapping" />
       <action disp="allocate"
           only-variants="trad both"
           comment="only allocate if traditional variant,
           including reflexive (identity) mapping" />
       <action disp="block"
           any-variant="simp trad"
           comment="filter out any remaining variant code point" />
       <action disp="allocate" comment="surviving labels must be
           original labels" />
     </rules>
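   To see how these actions resolve the example label "U+4E7E U+4E81",
   the following Python sketch enumerates every permutation of variants
   and applies the actions in order, the first triggered action
   determining the disposition.  It is an illustration under stated
   assumptions, not a reference implementation: "only-variants" is
   taken to require that every code point in the variant label was
   reached through a variant mapping (an unmapped code point has no
   disposition), "any-variant" triggers if any mapping used has a listed
   disposition, and the final action allocates the surviving original
   label, as the text above describes.  All names are hypothetical.

```python
from itertools import product

# Variant mappings copied from the data section earlier in this
# appendix: code point -> {variant code point: disposition}.
VARIANTS = {
    0x4E7E: {0x4E7E: "both", 0x4E81: "block", 0x5E72: "simp",
             0x5E79: "block", 0x69A6: "block", 0x6F27: "block"},
    0x4E81: {0x4E7E: "trad", 0x5E72: "simp", 0x5E79: "block",
             0x69A6: "block", 0x6F27: "block"},
}

def candidates(cp):
    """(code point, disposition) choices for one position of a label.

    A code point without a reflexive mapping may also stay unchanged;
    that choice carries no disposition (None).
    """
    mappings = VARIANTS.get(cp, {})
    choices = list(mappings.items())
    if cp not in mappings:
        choices.append((cp, None))
    return choices

def any_variant(disps, types):
    return any(d in types for d in disps)

def only_variants(disps, types):
    # every position must have been reached through a variant mapping
    return all(d is not None and d in types for d in disps)

# The five actions of the rules section above, in order of precedence.
ACTIONS = [
    ("block", lambda d: any_variant(d, {"block"})),
    ("allocate", lambda d: only_variants(d, {"simp", "both"})),
    ("allocate", lambda d: only_variants(d, {"trad", "both"})),
    ("block", lambda d: any_variant(d, {"simp", "trad"})),
    ("allocate", lambda d: True),  # what survives is the original label
]

def evaluate(label):
    """Map every variant label of `label` to its disposition.

    If two mapping paths produced the same label, this sketch would
    simply keep the last one; that does not happen in this example.
    """
    result = {}
    for combo in product(*(candidates(cp) for cp in label)):
        disps = [d for _, d in combo]
        for disp, triggered in ACTIONS:
            if triggered(disps):  # first triggered action wins
                result[tuple(cp for cp, _ in combo)] = disp
                break
    return result

allocated = {lbl for lbl, disp in evaluate((0x4E7E, 0x4E81)).items()
             if disp == "allocate"}
```

   The allocated set is exactly the four labels listed earlier for
   [RFC3743], and the mixed label "U+5E72 U+4E7E" is blocked by the
   fourth action.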

   In the example above, variants with the disposition "both" occur only
   as part of identity mappings (as pointed out in the comments).  The
   scheme described so far relies on the assumption that this is always
   the case.  However, consider the following set of variants:

       U+62E0;U+636E;U+636E;U+64DA
       U+636E;U+636E;U+64DA;U+62E0
       U+64DA;U+636E;U+64DA;U+62E0

   for which the corresponding XML would be:

       <char cp="62E0">
       <var cp="636E" disp="both" comment=" BOTH, but NOT identity" />
       <var cp="64DA" disp="block" />
       </char>
       <char cp="636E">
       <var cp="636E" disp="simp" comment="identity, but not BOTH" />
       <var cp="64DA" disp="trad" />
       <var cp="62E0" disp="block" />
       </char>
       <char cp="64DA">
       <var cp="636E" disp="simp" />
       <var cp="64DA" disp="trad" comment="identity" />
       <var cp="62E0" disp="block" />
       </char>

   What is needed to make such variant sets work is a way to capture
   when a disposition is associated with an identity or reflexive
   mapping, and when it is associated with an ordinary variant mapping.

   This can be done by adding a prefix "i-" in front of the disposition
   whenever the mapping is an identity mapping; for example, the last
   "trad" in the preceding figure would become "i-trad".

   With all the dispositions prepared in this way, only a slight
   modification to the actions is needed to yield the correct set of
   allocatable labels:

     <action disp="block" any-variant="block" />
     <action disp="allocate" only-variants="simp i-simp both i-both" />
     <action disp="allocate" only-variants="trad i-trad both i-both" />
     <action disp="block" any-variant="simp trad both" />
     <action disp="allocate" />

   The first three actions get triggered by the same labels as before.

   The fourth action blocks any label that combines an original code
   point with any of the variant mappings, yet lets through all labels
   that are a combination of only original code points (everything
   having either no variant mapping or one of the identity mappings).
   These are the original labels and they are allocated in the last
   action.

   With this modification, all RFC 3743-style tables can be converted to
   XML, and by using the above set of actions, all variant labels
   consisting entirely of variants preferred for simplified or for
   traditional, respectively, will be allocated, as will the original
   label.  All other variant labels will be blocked.
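   The same kind of simulation can be applied to the second variant
   set.  The Python sketch below uses the data from the figure for
   U+62E0, U+636E and U+64DA with "i-" prefixes added to the identity
   mappings, and applies the five modified actions in order.  As before,
   the helper names are hypothetical, and the fourth action is read as
   blocking any label that uses at least one non-identity "simp",
   "trad" or "both" mapping.

```python
from itertools import product

# Variant mappings from the figure, with "i-" added to the
# dispositions of identity (reflexive) mappings.
VARIANTS = {
    0x62E0: {0x636E: "both", 0x64DA: "block"},
    0x636E: {0x636E: "i-simp", 0x64DA: "trad", 0x62E0: "block"},
    0x64DA: {0x636E: "simp", 0x64DA: "i-trad", 0x62E0: "block"},
}

SIMP = {"simp", "i-simp", "both", "i-both"}
TRAD = {"trad", "i-trad", "both", "i-both"}

def candidates(cp):
    mappings = VARIANTS.get(cp, {})
    choices = list(mappings.items())
    if cp not in mappings:
        choices.append((cp, None))  # unmapped original code point
    return choices

def disposition(disps):
    """Apply the five modified actions in order; the first one wins."""
    if any(d == "block" for d in disps):
        return "block"
    if all(d in SIMP for d in disps):  # None is never in SIMP or TRAD
        return "allocate"
    if all(d in TRAD for d in disps):
        return "allocate"
    if any(d in {"simp", "trad", "both"} for d in disps):
        return "block"  # read as any-variant (assumption)
    return "allocate"

def allocatable(label):
    """Return the set of variant labels of `label` that are allocated."""
    found = set()
    for combo in product(*(candidates(cp) for cp in label)):
        if disposition([d for _, d in combo]) == "allocate":
            found.add(tuple(cp for cp, _ in combo))
    return found
```

   For "U+62E0 U+636E" this allocates the original label,
   "U+636E U+636E" and "U+636E U+64DA"; for "U+636E U+636E" it allocates
   only "U+636E U+636E" and "U+64DA U+64DA".  In both cases this agrees
   with what the [RFC3743] procedure yields for these inputs.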

Appendix C.  Indic Syllable Structure Example

   In LGRs for Indic scripts it may be desirable to restrict valid
   labels to sequences of valid Indic syllables, or aksharas.  This
   appendix gives a sample set of rules designed to enforce this
   restriction.

   We start with the following BNF form for an akshara, which has been
   published in "Devanagari Script Behavior for Hindi" [TDIL-HINDI] and
   which, if not directly valid for other languages and scripts used in
   India, is at least similar to the equivalent definitions used for
   them.

       V[m]|{C[N]H}C[N](H|[v][m])

   Where:

   V    (upper case) is any independent vowel

   m    is any vowel modifier (Devanagari Anusvara, Visarga, and
        Candrabindu)

   C    is any consonant (with inherent vowel)

   N    is Nukta

   H    is a Halant (or Virama)

   v    (lower case) is any dependent vowel sign (matra)

   {}   encloses items which may be repeated one or more times

   [ ]  encloses items which may or may not be present

   |    separates items, out of which only one can be present

   By using the Unicode property "InSC" ("Indic_Syllabic_Category"),
   which corresponds rather directly to the classification of characters
   in the BNF above, we can translate the BNF directly into a set of
   whole label evaluation (WLE) rules matching the definition of an
   akshara.

    <rules>
       <!--Character Class Definitions go here-->
       <class name="halant" property="InSC:Virama" />
       <union name="vowel-modifier">
         <class property="InSC:Visarga" />
         <class property="InSC:Bindu" comment="includes anusvara" />
       </union>
       <!--Whole label evaluation and Context rules go here-->
       <rule name="consonant-with-optional-nukta">
           <class property="InSC:Consonant" />
           <class property="InSC:Nukta"  count="0:1"/>
       </rule>
       <rule name="independent-vowel-with-optional-modifier">
           <class property="InSC:Vowel_Independent" />
           <class by-ref="vowel-modifier"  count="0:1" />
       </rule>
       <rule name="optional-dependent-vowel-with-opt-modifier" >
         <class property="InSC:Vowel_Dependent" count="0:1" />
         <class by-ref="vowel-modifier" count="0:1"  />
       </rule>
       <rule name="consonant-cluster">
         <rule count="0+">
           <rule by-ref="consonant-with-optional-nukta" />
           <class by-ref="halant" />
         </rule>
         <rule by-ref="consonant-with-optional-nukta" />
         <choice>
           <class by-ref="halant" />
           <rule by-ref="optional-dependent-vowel-with-opt-modifier" />
         </choice>
       </rule>
       <rule name="akshara">
         <choice>
           <rule by-ref="independent-vowel-with-optional-modifier" />
           <rule by-ref="consonant-cluster" />
         </choice>
       </rule>
       <rule name="WLE-akshara-or-other" comment="series of one or
           more aksharas, possibly alternating with other types of
           code points such as digits">
         <start />
         <choice count="1+">
           <class property="InSC:other"  />
           <rule by-ref="akshara"  />
         </choice>
         <end />
       </rule>
       <!--Action elements go here - order defines precedence-->
       <action disp="invalid" not-match="WLE-akshara-or-other" />
     </rules>

   With the rules and classes as defined above, the final action assigns
   a disposition of "invalid" to all labels that are not composed of a
   sequence of well-formed aksharas, optionally interspersed with other
   characters such as digits.
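   Because the rules above only classify code points and combine the
   classes with sequence, choice and count operators, the same akshara
   structure can be expressed as a regular expression over one symbol
   per code point.  The following Python sketch illustrates this; it is
   not part of the format.  The category table is a hypothetical,
   hand-picked excerpt covering a few Devanagari code points (a real
   implementation would load the Unicode IndicSyllabicCategory.txt data
   file), and every code point not in it is treated as "other".  As in
   the "consonant-cluster" rule, the halant-joined group may occur zero
   or more times.

```python
import re

# Hypothetical excerpt of Indic_Syllabic_Category values, one symbol each.
CATEGORY = {
    0x0905: "V", 0x0906: "V",               # Vowel_Independent
    0x0915: "C", 0x0924: "C", 0x0937: "C",  # Consonant
    0x093C: "N",                            # Nukta
    0x094D: "H",                            # Virama (halant)
    0x093E: "v", 0x093F: "v",               # Vowel_Dependent
    0x0901: "m", 0x0902: "m",               # Bindu (candrabindu, anusvara)
    0x0903: "m",                            # Visarga
}

# akshara = V[m] | {C[N]H} C[N] (H | [v][m])
AKSHARA = r"Vm?|(?:CN?H)*CN?(?:H|v?m?)"
# whole label: one or more aksharas or "other" code points (O)
LABEL = re.compile(r"(?:%s|O)+" % AKSHARA)

def is_valid_label(label):
    """True if the label is a sequence of aksharas and other code points."""
    symbols = "".join(CATEGORY.get(ord(ch), "O") for ch in label)
    return LABEL.fullmatch(symbols) is not None
```

   For example, KA + VIRAMA + SSA + VOWEL SIGN I forms one akshara and
   is accepted, while a label starting with a dependent vowel sign is
   rejected.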

   The relevant Unicode property is, as of this writing, still
   considered provisional; however, its values could be replicated by
   tagging the repertoire directly in the LGR, which would remove the
   dependency on the Unicode Standard altogether.

Appendix D.  RelaxNG Compact Schema

 default namespace = "http://www.iana.org/lgr/0.1"

 #
 # SIMPLE TYPES
 #

 # RFC 5646 language tag (e.g., "de", "und-Latn", etc.)
 language-tag = xsd:token

 ## domain to which this LGR applies
 domain-name = text

 ## a single code point
 code-point = xsd:token {
     pattern = "[0-9A-F]{4,6}"
 }

 ## a space-separated sequence of code points
 code-point-sequence = xsd:token {
     pattern = "[0-9A-F]{4,6}( [0-9A-F]{4,6})+"
 }

 ## single code point, or a sequence of code points
 code-point-literal = code-point | code-point-sequence

 code-point-set-shorthand = xsd:token {
     pattern = "([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6})"
               ~ "( ([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6}))*"
 }

 ## dates are used in information fields in the meta section.
 date = xsd:token {
     pattern = "\d{4}-\d\d-\d\d"
 }

 ## reference to a rule name (used in "when" and "not-when" attributes,
 ## as well as the "by-ref" attribute of the "rule" element.)
 rule-ref = xsd:IDREF

 ## a space-separated list of tags. Tags should generally follow
 ## identifier syntax, although the use of punctuation symbols such as
 ## a colon is allowed.
 tags = text

 ## Although "from-tag" attributes are closer to xsd:IDREF lexically and
 ## semantically, tags do not appear as a single unique instance in the
 ## document. As such, we are unable to take advantage of facilities
 ## provided by the validator.
 tag-ref = text

 ## an identifier type (used by "name" attributes).
 identifier = xsd:ID

 ## used in the class "by-ref" attribute to reference another class of
 ## the same "name" attribute value.
 class-ref = xsd:IDREF

 ## count attribute pattern ("n", "n+" or "n:m")
 count-pattern = xsd:token {
     pattern = "\d+(\+|:\d+)?"
 }

 #
 # STRUCTURES
 #

 ## Representation of a single code point, or a sequence of code points
 char = element char {
     attribute cp { code-point-literal },
     attribute comment { text }?,
     attribute when { rule-ref }?,
     attribute not-when { rule-ref }?,
     attribute tag { tags }?,
     attribute ref { text }?,
     variant*
 }

 ## Representation of a range of code points
 range = element range {
     attribute first-cp { code-point },
     attribute last-cp { code-point },
     attribute comment { text }?,
     attribute tag { tags }?,
     attribute ref { text }?
 }

 ## Representation of a single code point (no sequences allowed, and no
 ## tag attribute allowed). This is used when defining the set of
 ## characters that constitute a class.
 char-simple = element char {
     attribute cp { code-point }
 }

 ## Representation of a range of code points, for use in defining the
 ## set of characters that constitute a class.
 range-simple = element range {
     attribute first-cp { code-point },
     attribute last-cp { code-point }
 }

 ## Representation of a variant code point or sequence
 variant = element var {
     attribute cp { code-point-literal },
     attribute type { text }?,
     attribute when { rule-ref }?,
     attribute not-when { rule-ref }?,
     attribute comment { text }?,
     attribute disp { text }?,
     attribute ref { text }?
 }

 #
 # Classes
 #

 ## a "class" element that references the name of another "class"
 ## (or set-operator like "union") defined elsewhere.
 ## If used as a matcher (appearing under a "rule" element),
 ## the "count" attribute may be present.
 class-invocation = element class {
     (attribute by-ref { class-ref } | attribute from-tag { tag-ref }),
     attribute count { count-pattern }?,
     attribute comment { text }?
 }

 ## defines a new class (set of code points) using Unicode property or
 ## code point literals
 class-declaration = element class {
     # "name" attribute MUST be present if this is a "top-level" class
     # declaration, i.e. appearing directly under the "rules" element.
     # Otherwise, it MUST be absent.
     attribute name { identifier }?,
     # If used as a matcher (appearing in a "rule" element), the "count"
     # attribute may be present. Otherwise, it MUST be absent.
     attribute count { count-pattern }?,
     attribute comment { text }?,
     attribute ref { text }?,
     (
       # define the class by property (e.g. property="sc:Latn"), OR
       attribute property { text }
       # define the class by tagged code points, OR
       | attribute from-tag { tag-ref }
       # list of single code points and ranges, OR
       | (char-simple | range-simple)+
       # text node to allow for shorthand notation e.g. "0061 0062-0063"
       | code-point-set-shorthand
     )
   }

 class-or-set-operator-nested =
   class-invocation | class-declaration | set-operator

 class-or-set-operator-declaration =
   # a "class" element or set operator (effectively defining a class)
   # directly in the "rules" element.
   class-declaration | set-operator

 #
 # Set operators
 #

 complement-operator = element complement {
     attribute name { identifier }?,
     attribute comment { text }?,
     attribute ref { text }?,
     # "count" attribute MUST only be used when this set-operator is
     # used as a matcher (i.e. nested in a <rule> element)
     attribute count { count-pattern }?,
     class-or-set-operator-nested
 }

 union-operator = element union {
     attribute name { identifier }?,
     attribute comment { text }?,
     attribute ref { text }?,
     # "count" attribute MUST only be used when this set-operator is
     # used as a matcher (i.e. nested in a <rule> element)
     attribute count { count-pattern }?,
     class-or-set-operator-nested,
     # needs two or more child elements
     class-or-set-operator-nested+
 }

 intersection-operator = element intersection {
     attribute name { identifier }?,
     attribute comment { text }?,
     attribute ref { text }?,
     # "count" attribute MUST only be used when this set-operator is
     # used as a matcher (i.e. nested in a <rule> element)
     attribute count { count-pattern }?,
     class-or-set-operator-nested,
     class-or-set-operator-nested
 }

 difference-operator = element difference {
     attribute name { identifier }?,
     attribute comment { text }?,
     attribute ref { text }?,
     # "count" attribute MUST only be used when this set-operator is
     # used as a matcher (i.e. nested in a <rule> element)
     attribute count { count-pattern }?,
     class-or-set-operator-nested,
     class-or-set-operator-nested
 }

 symmetric-difference-operator = element symmetric-difference {
     attribute name { identifier }?,
     attribute comment { text }?,
     attribute ref { text }?,
     # "count" attribute MUST only be used when this set-operator is
     # used as a matcher (i.e. nested in a <rule> element)
     attribute count { count-pattern }?,
     class-or-set-operator-nested,
     class-or-set-operator-nested
 }

 ## operators that transform class(es) into a new class.
 set-operator = complement-operator
                | union-operator
                | intersection-operator
                | difference-operator
                | symmetric-difference-operator

 #
 # Match operators (matchers)
 #

 any-matcher = element any {
     attribute count { count-pattern }?,
     attribute comment { text }?
 }

 choice-matcher = element choice {
     attribute count { count-pattern }?,
     attribute comment { text }?,
     # two or more match operators
     match-operator-choice,
     match-operator-choice+
 }

 char-matcher =
   # for use as a matcher - like "char" but without a "tag" attribute
   element char {
     attribute cp { code-point-literal },
     # If used as a matcher (appearing in a "rule" element), the "count"
     # attribute may be present. Otherwise, it MUST be absent.
     attribute count { count-pattern }?,
     attribute comment { text }?,
     attribute ref { text }?
 }

 start-matcher = element start {
     attribute comment { text }?
 }

 end-matcher = element end {
     attribute comment { text }?
 }

 anchor-matcher = element anchor {
     attribute comment { text }?
 }

 look-ahead-matcher = element look-ahead {
     attribute comment { text }?,
     match-operators-non-pos
 }
 look-behind-matcher = element look-behind {
     attribute comment { text }?,
     match-operators-non-pos
 }

 ## non-positional match operator that can be used as a direct child
 ## element of the choice matcher.
 match-operator-choice = (
   any-matcher | choice-matcher | start-matcher | end-matcher
   | char-matcher | class-or-set-operator-nested | rule-matcher
 )

 ## non-positional match operators do not contain any anchor,
 ## look-behind or look-ahead elements.
 match-operators-non-pos = (
   start-matcher?,
   (any-matcher | choice-matcher | char-matcher
    | class-or-set-operator-nested | rule-matcher)*,
   end-matcher?
 )

 ## positional match operators have an anchor element, which may be
 ## preceded by a look-behind element, or followed by a look-ahead
 ## element, or both.
 match-operators-pos =
   look-behind-matcher?, anchor-matcher, look-ahead-matcher?

 match-operators = match-operators-non-pos | match-operators-pos

 #
 # Rules
 #

 # top-level rule must have "name" attribute
 rule-declaration-top = element rule {
     attribute name { identifier },
     attribute comment { text }?,
     attribute ref { text }?,
     match-operators
 }

 ## rule element used as a matcher (either by-ref or contains other
 ## match operators itself)
 rule-matcher =
   element rule {
     attribute count { count-pattern }?,
     attribute comment { text }?,
     attribute ref { text }?,
     (attribute by-ref { rule-ref } | match-operators)
   }

 #
 # Actions
 #

 action-declaration = element action {
     attribute comment { text }?,
     attribute ref { text }?,
     attribute disp { text },
     ( attribute match { text } | attribute not-match { text } )?,
     ( attribute any-variant { text }
       | attribute all-variants { text }
       | attribute only-variants { text } )?
 }

 #
 # DOCUMENT STRUCTURE
 #

 start = lgr
 lgr = element lgr {
     attribute id { text }?,
     meta-section?,
     data-section,
     rules-section?
 }

 ## Meta section - information recorded with a label
 ## generation ruleset that generally does not affect machine processing
 ## (except for unicode-version).
 ## However, if any "class-declaration" uses the "property" attribute,
 ## the unicode-version element MUST be present.
 meta-section = element meta {
     element version {
         attribute comment { text }?,
         text
     }?
     & element date {
         xsd:token {
             pattern = "\d{4}-\d{2}-\d{2}"
         }
     }?
     & element language { language-tag }*
     & element domain { domain-name }*
     & element validity-start { text }?
     & element validity-end { text }?
     & element unicode-version {
         xsd:token {
             pattern = "\d+\.\d+\.\d+"
         }
     }?
     & element description {
         attribute type { text }?,
         text
     }?
     & element references {
         element reference {
             attribute id { text },
             attribute comment { text }?,
             text
         }*
     }?
 }

 data-section = element data { (char | range)+ }

 ## Note that action declarations are strictly order dependent.
 ## class-or-set-operator-declaration and rule-declaration-top
 ## are weakly order dependent: they must precede the first use of
 ## their identifier via by-ref.
 rules-section = element rules {
   ( class-or-set-operator-declaration
     | rule-declaration-top
     | action-declaration)*
 }
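   Two of the simple types in the schema above carry small grammars of
   their own.  The following Python sketch, which is illustrative and
   uses hypothetical function names, validates and expands a
   code-point-set-shorthand value (collapsing whitespace first, as
   xsd:token does, so multi-line class content such as the "all vowels"
   class in the example table is accepted) and converts a count-pattern
   into minimum and maximum repetition counts, reading "n" as exactly n,
   "n+" as n or more, and "n:m" as n through m.

```python
import re

CP = r"[0-9A-F]{4,6}"
ITEM = r"(?:%s|%s-%s)" % (CP, CP, CP)
SHORTHAND = re.compile(r"%s(?: %s)*" % (ITEM, ITEM))  # code-point-set-shorthand
COUNT = re.compile(r"(\d+)(?:(\+)|:(\d+))?")           # count-pattern

def parse_shorthand(text):
    """Expand e.g. "0061 0062-0063" into {0x61, 0x62, 0x63}."""
    token = " ".join(text.split())  # xsd:token collapses whitespace
    if not SHORTHAND.fullmatch(token):
        raise ValueError("not a code point set shorthand: %r" % text)
    result = set()
    for item in token.split(" "):
        first, _, last = item.partition("-")
        low = int(first, 16)
        high = int(last, 16) if last else low
        if high < low:
            raise ValueError("empty range: %r" % item)
        result.update(range(low, high + 1))
    return result

def parse_count(value):
    """Return (minimum, maximum); a maximum of None means unbounded."""
    m = COUNT.fullmatch(value)
    if m is None:
        raise ValueError("not a count pattern: %r" % value)
    low = int(m.group(1))
    if m.group(2):                 # "n+": n or more
        return low, None
    if m.group(3) is not None:     # "n:m": n through m
        return low, int(m.group(3))
    return low, low                # "n": exactly n
```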

Appendix E.  Acknowledgements

   This format builds upon the work on documenting IDN tables by many
   different registry operators.  Notably, a comprehensive language
   table for Chinese, Japanese and Korean, developed by the "Joint
   Engineering Team" [RFC3743], is the basis of many registry
   policies; and a set of guidelines for Arabic script registrations
   [RFC5564] was published by the Arabic-language community.

   Contributions that have shaped this document have been provided by
   Francisco Arias, Mark Davis, Paul Hoffman, Nicholas Ostler, Thomas
   Roessler, Steve Sheng, Michel Suignard, Andrew Sullivan, Wil Tan and
   John Yunker.

Appendix F.  Editorial Notes

   This appendix is to be removed prior to final publication.

F.1.  Known Issues and Future Work

   o  A method of specifying the origin URI for a table, and an
      expiration or refresh policy, as meta-data may be a useful way to
      declare how the table will be updated.

   o  The "domain" element should be specified as absolute, so that the
      Root can be identified as needed for the Root Zone LGR.

   o  The recommended names for dispositions ("block" and "allocate")
      deviate from the names in the Root Zone LGR Procedure ("blocked"
      and "allocatable").  The latter were chosen to highlight that
      machine processing of the LGR table is just the first step;
      actual allocation requires additional actions, hence
      "allocatable".  This should be resolved.

F.2.  Change History

   -00  Initial draft.

   -01  Add an XML Namespace, and fix other XML nits.  Add support for
        sequences of code points.  Improve on consistently using Unicode
        nomenclature.

   -02  Add support for validity periods.

   -03  Incorporate requirements from the Label Generation Ruleset
        Procedure for the DNS Root Zone.  These requirements include a
        detailed grammar for specifying whole-label variants, and the
        ability to explicitly declare the actions associated with a
        specific variant.  The document also consistently applies the
        term "Label Generation Ruleset", rather than "IDN table", to
        reflect the policy term now being used to describe these.

   -04  Support reference information per [RFC3743].  Update description
        in response to feedback.  Extend the context rules to "char"
        elements and allow for inverse matching ("not-when").  Extend
        the description of label processing and implied actions, and
        allow for actions that reference disposition attributes on any
        or all variant mappings used in the generation of a variant
        label.

   -05  Change the name of the "disposition" attribute to "disp".  Add
        comment attribute on version and reference elements.  Allow
        empty "cp" attributes in char elements to support expressing
        symmetric mapping of null variants.  Describe use of variants
        that map identically.  Clarify how actions are triggered, in
        particular based on variant dispositions, as well as description
        of default actions.  Revise description of processing a label
        and its variants.  Move the example table to the head of the
        appendices.  Add "only-variants" attribute.  Change "name"
        attribute to "by-ref" attribute for referencing named classes and
        rules.  Change "not" to "complement".  Remove "match" attribute
        on rules as redundant if "start" and "end" are supported.  Rename
        "match" element to "anchor" as better fitting its function and
        removing confusion with both the "match" attribute on actions and
        the generic term Match Operator.  Augment the examples relevant
        to [RFC3743].

   -06  Extend the discussion of reflexive variants and their use;
        includes update of the appendix on converting tables in the
        style of [RFC3743].  Improve description of tagging and clarify
        that it doesn't apply to sequences.  Specify that root zone uses
        ".".  Add an appendix with an Indic Syllable Structure example.
        Extend count attribute to allow maximal counts.

   -07  Change "byref" to "by-ref".  Add list of recommended properties.
        Change "location" to "positional" for the collective name of
        start/end match operators.  Use from-tag instead of by-ref for
        tag-based classes.  Make the optional or mutually exclusive
        nature of some attributes more explicit.  Allow "comment"
        attributes on all child elements of "rules" except "char" and
        "range" elements used as child elements of "class".  Recast the
        design goals and requirements at the start of the document.
        Reword aspects of the document to make it clear the format's
        application is not limited to domain names.

Authors' Addresses

   Kim Davies
   Internet Corporation for Assigned Names and Numbers
   12025 Waterfront Drive
   Los Angeles, CA  90094
   US

   Phone: +1 310 301 5800
   Email: kim.davies@icann.org
   URI:   http://www.icann.org/

   Asmus Freytag
   ASMUS Inc.

   Email: asmus@unicode.org
