INTERNET-DRAFT                                        James SENG
draft-jseng-idn-admin-00.txt             Kazunori KONISHI, JPNIC
6th May 2002                                  Kenny HUANG, TWNIC
Expires 6th Nov 2002                          QIAN Hualin, CNNIC
                                            KO YangWoo, PeaceNet

       Internationalized Domain Name Administration Guideline

Status of this Memo

    This document is an Internet-Draft and is in full conformance
    with all provisions of Section 10 of RFC2026.

    Internet-Drafts are working documents of the Internet
    Engineering Task Force (IETF), its areas, and its working
    groups. Note that other groups may also distribute working
    documents as Internet-Drafts.

    Internet-Drafts are draft documents valid for a maximum of
    six months and may be updated, replaced, or obsoleted by other
    documents at any time. It is inappropriate to use Internet-
    Drafts as reference material or to cite them other than as
    "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html.

Abstract

There are many complex issues revolving around the internationalized
access to domain names (IDN) such as the IDN protocol, IDN deployment,
IDN transition and IDN administration.

While the IDN working group focuses on the standards-track
specification for access to IDN, an administration guideline is also
necessary to ensure a smooth deployment and transition.

This document provides a guideline for all zone administrators,
including but not limited to registry and registrar operators, and for
all domain name holders, on the administration of these domain names.

Comments on this document can be sent to the authors at
idn-admin@jdna.jp.

Definitions

Unless otherwise stated, the definition of the terms used in this
document is consistent with "Terminology Used in Internationalization
in the IETF" [I18NTERMS].

A locale is defined as a language, qualified by a region where
applicable. RFC3066 [RFC3066] defines how a locale should be
represented.

Characters mentioned in this document are identified by their position
or code point in ISO/IEC 10646/Unicode. The notation U+12AB, for
example, indicates the character at position 12AB (hexadecimal) in
ISO/IEC 10646/Unicode.

1. Introduction

Internationalized Domain Names (IDN) is one of the most controversial
tasks the IETF has taken on in recent years. The domain name is the
fundamental naming architecture of the Internet; many Internet
protocols and applications rely on the stability and continuity of the
DNS.

The introduction of IDN amplifies the difficulty of putting names into
identifiers and the confusion between scripts and languages. It affects
many Internet protocols and applications and adds complexity to
technical administration and services.

While the IDN working group [IDN-WG] focuses on the technical problems
of IDN, an administration guideline is also important in order to
avoid unnecessary domain name disputes between domain name holders.
Avoiding such disputes is the main purpose of this guideline.

The IDN working group has completed working group last call for the
following internet-drafts:

1. Internationalizing Host Names In Applications [IDNA]
2. Punycode version 0.3.3 [PUNYCODE]
3. Preparation of Internationalized Strings [STRINGPREP]
4. A Stringprep Profile for Internationalized Domain Names [NAMEPREP]

This set of drafts proposes that the Domain Name System infrastructure
remain unchanged. Instead, it introduces internationalization (I18N)
only on the client side (IDNA), using an ASCII Compatible Encoding
(ACE) known as Punycode.

Domain names have also traditionally been case-insensitive. With the
introduction of characters beyond [US-ASCII], and the possibility of
representing a single character in multiple ways in ISO10646/Unicode
[UNICODE], a normalization process for these identifiers, known as
Nameprep, has been proposed. Nameprep is also done on the client side,
as described in IDNA.

While Nameprep normalizes domain names so that users have the highest
chance of getting the right domain name, in the interest of I18N,
Nameprep does not handle any localization (L10N).

This becomes significant when a domain name holder attempts to use, as
a domain name, a string of I18N characters forming a "name", "word" or
"phrase" that has a certain meaning in a certain language. Such a
string of I18N characters may have different variants in the context of
the language, culture or locale.

Generally, these localized variants can be classified into four
categories [C2C] (please see "Disclaimer" below):

a. Character (or Code) variants

Character (or Code) variants refer to variants that are generated by
character-by-character (or code-by-code) substitution.

An example in English would be A/a (U+0041/U+0061).
An example in Chinese would be 飛/飞 (U+98DB/U+98DE) or 機/机
(U+6A5F/U+673A).

Note that this does not mean U+6A5F/U+673A is bicameral like A/a; the
equivalence holds only for Chinese, not for Japanese.

A character variant may also correspond to null. For example, the
points and vowel marks in Hebrew (U+05B0 to U+05C4) and Arabic (U+064B
to U+0652) are optional.

Code variants may also occur when different code points are assigned
to the "same" character, possibly due to compatibility issues,
typeface differences or script range. For example, LATIN CAPITAL
LETTER A (U+0041) looks similar to GREEK CAPITAL LETTER ALPHA (U+0391).
CJK has font variants for compatibility (U+4E0D/U+F967) and "zVariant"
pairs such as U+5154/U+514E (*).

The difficulty lies in defining which characters are the "same" and
which are not.
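
The following Python sketch is an illustration only and not part of the
guideline. It uses the standard unicodedata module to compare the pairs
mentioned above under NFKC, the normalization form on which Nameprep is
based. The helper name nfkc and the variable names are chosen for this
example.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply Unicode NFKC normalization, the form Nameprep builds on."""
    return unicodedata.normalize("NFKC", text)


# Case pair A/a: unified by simple lowercasing.
case_pair_same = "\u0041".lower() == "\u0061"

# Compatibility ideograph U+F967 normalizes to U+4E0D.
compat_pair_same = nfkc("\uF967") == "\u4E0D"

# Latin A and Greek Alpha look alike but remain distinct code points.
lookalike_same = nfkc("\u0041") == nfkc("\u0391")

# Traditional/Simplified pair U+98DB/U+98DE is not touched by normalization.
trad_simp_same = nfkc("\u98DB") == nfkc("\u98DE")

print(case_pair_same, compat_pair_same, lookalike_same, trad_simp_same)
```

Only the first two comparisons hold. Look-alike and Traditional/
Simplified pairs are not unified by normalization, so any equivalence
between them has to come from a locale-specific table.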

b. Orthographic variants

Orthographic variants refer to variants that are generated by
word-by-word substitution.

An example in English would be color/colour.

Some of these orthographic variants can be generated from character
variants. For example, airplane in Chinese is 飛機/飞机 (U+98DB
U+6A5F/U+98DE U+673A).

Other orthographic variants cannot be generated from character
variants. For example, in Chinese, "發" (U+767C) and "髮" (U+9AEE) are
both related to "发" (U+53D1), depending on the word. For hair, "头发"
(U+5934 U+53D1), the variant should be "頭髮" (U+982D U+9AEE) but not
"頭發" (U+982D U+767C).

c. Lexemic variants

Lexemic variants refer to variants that can be generated by
word-by-word substitution with locale consideration.

An example in English would be cab/taxi, or check/cheque.
An example in Chinese would be 資訊/信息 (U+8CC7 U+8A0A/U+4FE1 U+606F).

Note that there is no relationship between U+8CC7 and U+4FE1, or
between U+8A0A and U+606F.

d. Contextual variants

Contextual variants refer to variants that are generated by
word-by-word substitution with contextual consideration.

In English, the word "plane" has different meanings and could be
substituted with a different equivalent word, such as "airplane" or
"plane" (as in a flat surface), depending on context.

Similarly, the word "文件" (U+6587 U+4EF6) could be either document
"文件" (U+6587 U+4EF6) or data file "檔案" (U+6A94 U+6848), depending
on context.

Although the domain name was designed to be an identifier without any
language context, this has not stopped users from putting "words" or
"names" into domain names. It is foreseeable that users will do
likewise with IDN. Therefore, precautions will be required when
deploying IDN.

The intention of this guideline is to provide a mechanism to deploy IDN
with language context only at the level of character variants, to
increase the likelihood of successful resolution and reduce confusion.

Note:
* The variants in CJK are very complex and require many different
layers of solution. This guideline is one component of the solution,
but is not sufficient to solve the whole problem alone.

2. Administration Framework

Zone administrators are responsible for the administration of the
domain names under their control. A zone administrator could be
responsible for a large zone such as a Top Level Domain (TLD), generic
or country code, or a smaller one such as a second-level or third-level
domain. A large zone is more complex than a smaller one, but
administration tasks such as addition, deletion, delegation and
transfer of zones between domain name holders are similar for all zone
administrators.

Different zones also have different policies and processes. For
example, a pay-per-domain policy and registry/registrar model for .COM
may not be applicable to other zones such as .SG or .IBM.COM. The
latter is likely to have restrictive policies on who can have a zone
under IBM.COM, and its procedure is very different.

Understanding these differences, this document provides only a
guideline on how I18N characters with locale consideration should be
handled within a zone and how these IDN should be administered
(registration, deletion and transfer).

Policies for IDN, such as new TLDs or cost, are out of scope for this
document. Such discussions should be conducted in other forums outside
the IETF.

Technical implementations are also out of scope. Zone administrators
have to decide where (registry or registrar side) and how to implement
this guideline.

2.1. Guideline Principles

The principles provided are for a single zone on a per-label basis. The
word "IDN" should be read as "domain name label" and not "Fully
Qualified Domain Name".

The document also assumes that "First-Come-First-Serve" (FCFS) is used
to determine the rights of domain name holders, although it is not one
of the principles. If FCFS is not used, then replace all FCFS with an
appropriate policy for the zone.

(a) Each IDN should be associated with a set of locales.

Although some IDN may be pure identifiers made up of a random selection
of characters, IDN are likely to be names or phrases that have a
certain meaning in some locale.

Zone administrators should associate a locale with each IDN
administratively, either pre-determined by the zone administrator or
chosen by the domain name holder.

An IDN could also have multiple locale associations or no locale
association, but these are not recommended.

With a locale association, the zone administrator could also verify the
validity of the IDN requested.

(b) The domain name holder of an IDN should also hold all character
variants of the IDN requested, as determined by the associated
locale(s).

Depending on the associated locale(s), there are different character
variants for the IDN. To minimize domain name disputes between
holders over similar IDNs, these character variants should be reserved.

Reserved IDNs are not inside the DNS zone file. In other words, these
reserved names do not resolve. The domain name holder could request
that these reserved IDNs be placed inside the zone file, i.e. that the
reserved names be made active.

Where reserved names overlap, the overlap should be resolved with the
same registration policy, usually FCFS.

(c) Some IDN may have a preferred character variant that should be
recommended to the domain name holder.

Some locale rules may prefer a certain character variant over others.
To increase the end user's chance of resolving the IDN, the preferred
variant should be active.

(d) The IDN and its reserved character variants with the locale(s)
association should be atomic.

The IDN and its reserved character variants, together with the
locale(s) association, should be contained within a single package
("IDN Package"). The IDN Package is created upon registration.

The IDN Package is atomic: transfer and deletion of IDN are done with
the IDN Package as a whole. IDNs, either active or reserved, within the
IDN Package must not be transferred or deleted individually.

2.2. Registration of IDN

To conform to the principles described in 2.1, the registration of IDN
requires at least two components: character variant tables for the
locale and the registration algorithm.

2.2.1. Locale character variant table

Every locale group should provide a character variant table.

The table should be generated based on established language standards,
documenting its references. For example,

Reference 1: CP936 or commonly known as GBK
Reference 2: zVariant, zTradVariant, zSimpVariant in Unihan.txt
Reference 3: List of Simplified character Table (Simplified column)
Reference 4: zSimpVariant in Unihan.txt
Reference 5: variant that exists in GB2312, common simplified hanzi

The table has three fields, separated by semicolons: "valid code
point", "recommended code point" and "character variant(s)".

Only code points listed in the "valid code point" field are allowed to
be registered in the language.

There can be at most one "recommended code point". If the "recommended
code point" field is empty, then the recommended code point is "null".

By default, "character variant(s)" always includes the "valid code
point".

If a variant is composed of a series of code points, then the code
points should be listed in the appropriate order, separated by spaces,
in the "character variant(s)" field.

If there are multiple variants, the variants must be separated by
commas in the "character variant(s)" field.

It is possible that a code point in the "character variant(s)" field
may not be allowed to be registered in the locale.

Every code point in the table should have a corresponding reference
number (associated with the references) specified for justification.
The reference number is placed in round brackets after each code point.
If there is more than one reference, the numbers are placed in the
round brackets separated by spaces.

Any content after a hash "#" is treated as a comment.

This document does not define any locale variant tables. Each locale
group will have to supply its own documents, including the rules from
which its tables are derived.
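
The table format described above could be read with a small parser
such as the Python sketch below. It is an illustration under stated
assumptions, not a reference implementation: the names Row,
parse_sequence and parse_row are invented for this example, reference
numbers are matched but then discarded, and the implicit inclusion of
the valid code point among its own variants is left to the caller. The
sample line is taken from the illustrative zh-cn table in section 3.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# One table entry: hexadecimal code point followed by "(reference numbers)".
ENTRY = re.compile(r"([0-9A-Fa-f]{4,6})\(([^)]*)\)")


class Row(NamedTuple):
    valid: int                       # "valid code point"
    recommended: Optional[int]       # None stands for "null"
    variants: List[Tuple[int, ...]]  # each variant is a code point sequence


def parse_sequence(text: str) -> Tuple[int, ...]:
    """Return the code points of one entry, dropping reference numbers."""
    return tuple(int(cp, 16) for cp, _refs in ENTRY.findall(text))


def parse_row(line: str) -> Optional[Row]:
    """Parse one table line; return None for blank or comment-only lines."""
    body = line.split("#", 1)[0].strip()
    if not body:
        return None
    fields = body.split(";")
    if len(fields) != 3:
        raise ValueError("expected three semicolon-separated fields: " + line)
    valid = parse_sequence(fields[0])
    if len(valid) != 1:
        raise ValueError("expected exactly one valid code point: " + line)
    recommended = parse_sequence(fields[1])
    if len(recommended) > 1:
        raise ValueError("at most one recommended code point: " + line)
    variants = [parse_sequence(v) for v in fields[2].split(",") if v.strip()]
    return Row(valid[0], recommended[0] if recommended else None, variants)


row = parse_row("5718(1);56E2(4);56E2(2),56E3(2)  # sphere, ball, circle")
print(row)
```

Representing each variant as a tuple of code points keeps room for
variants that consist of a series of code points, even though every
variant in the sample line is a single code point.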

2.2.2. Registration Algorithm

1.   IN = IDN and {L} = Set of IN associated locale(s)
2.   NP(IN) = Nameprep processed IN and
       check for availability of NP(IN)
3.   For each AL in {L}
3.1.   Check validity of NP(IN) in AL. If failed, stop processing.
3.2.   PV(IN,AL) = Preferred character variant of IN in AL
3.3.   RV(IN,AL) = Set of character variants of IN in AL
3.4. End of Loop
4.   {ZV} = Set of all PV(IN,AL) + NP(IN)
5.   {RV} = Set of all RV(IN,AL) (all character variants) minus {ZV}
6.   Create IDN Package for IN using IN, {L}, {ZV} and {RV}
7.   Put {ZV} into zone file

Step 1 takes the IDN to be registered and the associated locale(s) as
input to the process. Following that, the IDN goes through Nameprep in
Step 2. If the Nameprep'ed IDN is already registered or reserved, then
the IDN cannot be registered, based on FCFS.

Step 3 goes through every locale associated with the IDN, checks the
validity of the IDN in each, and generates the preferred variant and
the reserved variants.

In Step 3.1, the IDN is validated by checking that every code point in
the Nameprep'ed IDN is allowed by the "valid code point" column of the
"character variant table" for the language.

Step 3.2 generates the preferred variant of the IDN by replacing every
code point in the IDN with the corresponding entry in the "recommended
code point" column, followed by Nameprep. If the preferred variant of
the IDN is already registered or reserved, then there is no preferred
variant for that language, based on FCFS. However, this does not
prevent the IDN from being registered.

Step 3.3 generates the list of reserved variants by taking every
combination of the possible variants listed in the "character
variant(s)" column for each code point in the Nameprep'ed IDN.
Generated variants should also be Nameprep'ed. If any of the variants
is already registered or reserved, then that variant must be removed
from the list, based on FCFS. Similarly, this does not prevent the IDN
from being registered.

An "IDN Package" for the IDN is then created in Step 6 with the
original IDN, the associated locale(s), the list of activated IDNs
(Step 4) and the list of reserved variants (Step 5).

Lastly, the activated IDNs are put into the zone file and delegated.
They may be delegated to different domain name servers so long as they
are owned by the same domain name holder.
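
Steps 1 to 5 of the registration algorithm can be sketched in Python as
follows. This is a minimal sketch under stated assumptions rather than
a definitive implementation: nameprep here is only a stand-in (case
folding followed by NFKC) for real Nameprep, the tables are subsets of
the illustrative tables in section 3, and the names row, TABLES,
register and taken are invented for this example. Creation of the IDN
Package and the zone file update (Steps 6 and 7) are left out.

```python
import unicodedata
from itertools import product


def nameprep(label):
    """Stand-in for Nameprep (assumption): case folding followed by NFKC."""
    return unicodedata.normalize("NFKC", label.casefold())


def row(recommended, *variants):
    """Build a table row: (recommended character, other variant characters)."""
    return (chr(recommended), tuple(chr(v) for v in variants))


# Subsets of the illustrative tables in section 3, keyed by valid code point.
ZH_CN = {
    chr(0x806F): row(0x8054, 0x8054, 0x8068),
    chr(0x60F3): row(0x60F3),
    chr(0x96C6): row(0x96C6),
    chr(0x5718): row(0x56E2, 0x56E2, 0x56E3),
}
ZH_TW = {
    chr(0x806F): row(0x806F, 0x8054, 0x8068),
    chr(0x60F3): row(0x60F3),
    chr(0x96C6): row(0x96C6),
    chr(0x5718): row(0x5718, 0x56E2, 0x56E3),
}
TABLES = {"zh-cn": ZH_CN, "zh-sg": ZH_CN, "zh-tw": ZH_TW}


def register(idn, locales, tables, taken):
    """Return (ZV, RV); 'taken' holds labels already active or reserved."""
    np_in = nameprep(idn)                                     # step 2
    if np_in in taken:
        raise ValueError("already registered or reserved")
    zv, rv = {np_in}, set()
    for al in locales:                                        # step 3
        table = tables[al]
        if any(ch not in table for ch in np_in):              # step 3.1
            raise ValueError("invalid code point for locale " + al)
        pv = nameprep("".join(table[ch][0] for ch in np_in))  # step 3.2
        if pv not in taken:
            zv.add(pv)
        # Step 3.3: variants always include the valid code point itself.
        choices = [(ch,) + table[ch][1] for ch in np_in]
        for combo in product(*choices):
            variant = nameprep("".join(combo))
            if variant not in taken:
                rv.add(variant)
    return zv, rv - zv                                        # steps 4 and 5


zv, rv = register("\u806F\u60F3\u96C6\u5718", ["zh-cn", "zh-sg", "zh-tw"],
                  TABLES, taken=set())
print(len(zv), len(rv))
```

With the table subsets above, the call yields two active labels
(U+8054 U+60F3 U+96C6 U+56E2 and U+806F U+60F3 U+96C6 U+5718) and seven
reserved variants. A registry would typically keep "taken" as the union
of the active and reserved labels of all existing IDN Packages.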

2.3. Deletion and Transfer of IDN

In normal domain administration, every domain name is atomic.
Registration, deletion and transfer of domain names are done on a
per-domain-name basis.

However, with IDN, each domain name is tied to a list of variant
domain names, depending on the locale association, bound together in an
IDN Package.

Because all variants of the IDN should belong to a single domain name
holder, the IDN Package should be atomic. An IDN, either active or
reserved, within the IDN Package must not be deleted or transferred on
its own.

If an IDN is to be deleted or transferred, it must be done as an IDN
Package.
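
One way to enforce this atomicity is to let every label, active or
reserved, point at the package that owns it, so that any operation on a
label is applied to the whole package. The Python sketch below is an
assumption about how a zone administrator might model this; the class
names IDNPackage and Zone, their fields and the holder names are
invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class IDNPackage:
    idn: str
    holder: str
    locales: frozenset
    active: set = field(default_factory=set)     # {ZV}, in the zone file
    reserved: set = field(default_factory=set)   # {RV}, does not resolve

    def labels(self):
        """All labels, active or reserved, that belong to this package."""
        return self.active | self.reserved


class Zone:
    def __init__(self):
        self.package_of = {}   # label -> IDNPackage that owns it

    def add(self, package):
        clash = self.package_of.keys() & package.labels()
        if clash:
            raise ValueError("labels already registered or reserved")
        for label in package.labels():
            self.package_of[label] = package

    def transfer(self, label, new_holder):
        """Transfer the whole package that contains label."""
        self.package_of[label].holder = new_holder

    def delete(self, label):
        """Delete the whole package that contains label."""
        package = self.package_of[label]
        for member in package.labels():
            del self.package_of[member]
        return package


zone = Zone()
zone.add(IDNPackage("\u6E05\u771F\u6559", "holder-a", frozenset({"ja"}),
                    active={"\u6E05\u771F\u6559"},
                    reserved={"\u6DF8\u771F\u6559", "\u6E05\u771E\u6559"}))
# Naming a reserved variant still moves every label in the package.
zone.transfer("\u6DF8\u771F\u6559", "holder-b")
```

Because there is no operation that acts on a single label, a variant
cannot end up with a different holder from the rest of its package.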

2.4. Activation and De-activation of IDN variants

With the introduction of the IDN Package with active and inactive IDN,
a new process is required to activate or deactivate IDN variants in the
IDN Package.

The activation algorithm is described below:

1.  IN = IDN & PA = IDN Package
2.  NP(IN) = Nameprep processed IN
3.  If NP(IN) not in {RV} then stop
4.  {RV} = {RV} - NP(IN) and {ZV} = {ZV} + NP(IN)
5.  Put {ZV} into the zone file

Similarly, the deactivation algorithm is described below:

1.  IN = IDN & PA = IDN Package
2.  NP(IN) = Nameprep processed IN
3.  If NP(IN) not in {ZV} then stop
4.  {RV} = {RV} + NP(IN) and {ZV} = {ZV} - NP(IN)
5.  Put {ZV} into the zone file
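
The two algorithms are symmetric moves between the sets {ZV} and {RV}
of one IDN Package, which the Python sketch below tries to make
explicit. As before, nameprep is only a stand-in for real Nameprep, and
the function names activate and deactivate are invented for this
example. Writing {ZV} to the zone file (Step 5) is left to the caller.

```python
import unicodedata


def nameprep(label):
    """Stand-in for Nameprep (assumption): case folding followed by NFKC."""
    return unicodedata.normalize("NFKC", label.casefold())


def activate(idn, zv, rv):
    """Move NP(idn) from {RV} to {ZV}; return False if it is not reserved."""
    np_in = nameprep(idn)          # step 2
    if np_in not in rv:            # step 3
        return False
    rv.discard(np_in)              # step 4
    zv.add(np_in)
    return True


def deactivate(idn, zv, rv):
    """Move NP(idn) from {ZV} to {RV}; return False if it is not active."""
    np_in = nameprep(idn)          # step 2
    if np_in not in zv:            # step 3
        return False
    zv.discard(np_in)              # step 4
    rv.add(np_in)
    return True


# Sets taken from part of Example 1 in section 3.
zv = {"\u6E05\u771F\u6559"}
rv = {"\u6E05\u771E\u6559", "\u6DF8\u771F\u6559"}
activated = activate("\u6DF8\u771F\u6559", zv, rv)
rejected = activate("\u8054\u60F3\u96C6\u56E2", zv, rv)  # not in this package
```

Since a label only ever moves between the two sets, their union stays
the same, which is consistent with the IDN Package being atomic.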

2.5. Adding/Deleting locale(s) association

The list of variants is generated from the IDN and its locale(s)
association. If there is a change in the locale(s) association, then
the list of variants has to be updated. On the other hand, the IDN
Package is atomic and the list of variants should not be changed after
creation.

Therefore, to add or delete a locale association from an IDN Package,
this document recommends deleting the IDN Package and then registering
again with the new set of locales.

3. Example of Guideline Adoption

To provide a meaningful example, some locale character variant tables
have to be defined. Assume that the following four locale character
variant tables are defined:

a) locale character variants tables for zh-cn and zh-sg

Reference 1: CP936 or commonly known as GBK
Reference 2: zVariant, zTradVariant, zSimpVariant in Unihan.txt [UNIHAN]
Reference 3: List of Simplified character Table (Simplified column)
Reference 4: zSimpVariant in Unihan.txt
Reference 5: variant that exists in GB2312, common simplified hanzi

56E2(1);56E2(5);5718(2)          # sphere, ball, circle; mass, lump
5718(1);56E2(4);56E2(2),56E3(2)  # sphere, ball, circle; mass, lump
60F3(1);60F3(5);                 # think, speculate, plan, consider
654E(1);6559(5);6559(2)          # teach
6559(1);6559(5);654E(2)          # teach, class
6DF8(1);6E05(5);6E05(2)          # clear
6E05(1);6E05(5);6DF8(2)          # clear, pure, clean; peaceful
771E(1);771F(5);771F(2)          # real, actual, true, genuine
771F(1);771F(5);771E(2)          # real, actual, true, genuine
8054(1);8054(3);806F(2)          # connect, join; associate, ally
806F(1);8054(3);8054(2),8068(2)  # connect, join; associate, ally
96C6(1);96C6(5);                 # assemble, collect together

b) locale variants table for zh-tw

Reference 1: CP950 or commonly known as BIG5
Reference 2: zVariant, zTradVariant, zSimpVariant in Unihan.txt
Reference 3: List of Simplified Character Table (Traditional column)
Reference 4: zTradVariant in Unihan.txt
Reference 5: reference itself

5718(1);5718(4);56E2(2),56E3(2)  # sphere, ball, circle; mass, lump
60F3(1);60F3(5);                 # think, speculate, plan, consider
6559(1);6559(5);654E(2)          # teach, class
6E05(1);6E05(5);6DF8(2)          # clear, pure, clean; peaceful
771F(1);771F(5);771E(2)          # real, actual, true, genuine
806F(1);806F(3);8054(2),8068(2)  # connect, join; associate, ally
96C6(1);96C6(5);                 # assemble, collect together

c) locale variants table for ja

Reference 1: CP932 or commonly known as Shift-JIS
Reference 2: zVariant in Unihan.txt
Reference 3: variant that exists in JIS X0208, commonly used Kanji
Reference 4: reference itself

5718(1);5718(3);56E3(2)          # sphere, ball, circle; mass, lump
60F3(1);60F3(3);                 # think, speculate, plan, consider
654E(1);6559(3);6559(2)          # teach
6559(1);6559(3);654E(2)          # teach, class
6DF8(1);6E05(3);6E05(2)          # clear
6E05(1);6E05(3);6DF8(2)          # clear, pure, clean; peaceful
771E(1);771E(4);771F(2)          # real, actual, true, genuine
771F(1);771F(4);771E(2)          # real, actual, true, genuine
806F(1);806F(4);8068(2)          # connect, join; associate, ally
96C6(1);96C6(3);                 # assemble, collect together

d) locale variants table for ko

Reference 1: CP949 or commonly known as EUC-KR
Reference 2: zVariant in Unihan.txt
Reference 3: reference itself

5718(1);56E2(3);56E3(2)          # sphere, ball, circle; mass, lump
60F3(1);60F3(3);                 # think, speculate, plan, consider
654E(1);6559(3);6559(2)          # teach
6DF8(1);6E05(3);6E05(2)          # clear
771E(1);771F(3);771F(2)          # real, actual, true, genuine
806F(1);8054(3);8068(2)          # connect, join; associate, ally
96C6(1);96C6(3);                 # assemble, collect together

(Note that these tables, and the rules that define them, are not
official, nor are they a sample snippet of the real tables. The tables
serve only as an illustration.)

Example 1: IDN = 清真教 (U+6E05 U+771F U+6559)
           {L} = {zh-cn, zh-sg, zh-tw}

NP(IN) = 清真教 (U+6E05 U+771F U+6559)
PV(IN,zh-cn) = 清真教 (U+6E05 U+771F U+6559)
PV(IN,zh-sg) = 清真教 (U+6E05 U+771F U+6559)
PV(IN,zh-tw) = 清真教 (U+6E05 U+771F U+6559)
{ZV} = {清真教 (U+6E05 U+771F U+6559)}
{RV} = {清眞教 (U+6E05 U+771E U+6559),
        清眞敎 (U+6E05 U+771E U+654E),
        清真敎 (U+6E05 U+771F U+654E),
        淸眞教 (U+6DF8 U+771E U+6559),
        淸眞敎 (U+6DF8 U+771E U+654E),
        淸真教 (U+6DF8 U+771F U+6559),
        淸真敎 (U+6DF8 U+771F U+654E)}

Example 2: IDN = 清真教 (U+6E05 U+771F U+6559)
           {L} = {ja}

NP(IN) = 清真教 (U+6E05 U+771F U+6559)
PV(IN,ja) = 清真教 (U+6E05 U+771F U+6559)
{ZV} = {清真教 (U+6E05 U+771F U+6559)}
{RV} = {清眞教 (U+6E05 U+771E U+6559),
        清眞敎 (U+6E05 U+771E U+654E),
        清真敎 (U+6E05 U+771F U+654E),
        淸眞教 (U+6DF8 U+771E U+6559),
        淸眞敎 (U+6DF8 U+771E U+654E),
        淸真教 (U+6DF8 U+771F U+6559),
        淸真敎 (U+6DF8 U+771F U+654E)}

Example 3: IDN = 清真教 (U+6E05 U+771F U+6559)
           {L} = {zh-cn, zh-sg, zh-tw, ja, ko}

NP(IN) = 清真教 (U+6E05 U+771F U+6559)
Invalid registration because U+6E05 is invalid in L = ko

Example 4: IDN = 聯想集團 (U+806F U+60F3 U+96C6 U+5718)
           {L} = {zh-cn, zh-sg, zh-tw}

NP(IN) = 聯想集團 (U+806F U+60F3 U+96C6 U+5718)
PV(IN,zh-cn) = 联想集团 (U+8054 U+60F3 U+96C6 U+56E2)
PV(IN,zh-sg) = 联想集团 (U+8054 U+60F3 U+96C6 U+56E2)
PV(IN,zh-tw) = 聯想集團 (U+806F U+60F3 U+96C6 U+5718)
{ZV} = {联想集团 (U+8054 U+60F3 U+96C6 U+56E2),
        聯想集團 (U+806F U+60F3 U+96C6 U+5718)}
{RV} = {联想集団 (U+8054 U+60F3 U+96C6 U+56E3),
        联想集團 (U+8054 U+60F3 U+96C6 U+5718),
        聯想集团 (U+806F U+60F3 U+96C6 U+56E2),
        聯想集団 (U+806F U+60F3 U+96C6 U+56E3),
        聨想集团 (U+8068 U+60F3 U+96C6 U+56E2),
        聨想集団 (U+8068 U+60F3 U+96C6 U+56E3),
        聨想集團 (U+8068 U+60F3 U+96C6 U+5718)}

Example 5: IDN = 联想集团 (U+8054 U+60F3 U+96C6 U+56E2)
           {L} = {zh-cn, zh-sg}

NP(IN) = 联想集团 (U+8054 U+60F3 U+96C6 U+56E2)
PV(IN,zh-cn) = 联想集团 (U+8054 U+60F3 U+96C6 U+56E2)
PV(IN,zh-sg) = 联想集团 (U+8054 U+60F3 U+96C6 U+56E2)
{ZV} = {联想集团 (U+8054 U+60F3 U+96C6 U+56E2)}
{RV} = {联想集団 (U+8054 U+60F3 U+96C6 U+56E3),
        联想集團 (U+8054 U+60F3 U+96C6 U+5718),
        聯想集团 (U+806F U+60F3 U+96C6 U+56E2),
        聯想集団 (U+806F U+60F3 U+96C6 U+56E3),
        聯想集團 (U+806F U+60F3 U+96C6 U+5718),
        聨想集团 (U+8068 U+60F3 U+96C6 U+56E2),
        聨想集団 (U+8068 U+60F3 U+96C6 U+56E3),
        聨想集團 (U+8068 U+60F3 U+96C6 U+5718)}

Example 6: IDN = 联想集团 (U+8054 U+60F3 U+96C6 U+56E2)
           {L} = {zh-cn, zh-sg, zh-tw}

NP(IN) = 联想集团 (U+8054 U+60F3 U+96C6 U+56E2)
Invalid registration because U+8054 is invalid in L = zh-tw

Example 7: IDN = 聯想集團 (U+806F U+60F3 U+96C6 U+5718)
           {L} = {ja, ko}

NP(IN) = 聯想集團 (U+806F U+60F3 U+96C6 U+5718)
PV(IN,ja) = 聯想集團 (U+806F U+60F3 U+96C6 U+5718)
PV(IN,ko) = 聯想集團 (U+806F U+60F3 U+96C6 U+5718)
{ZV} = {聯想集團 (U+806F U+60F3 U+96C6 U+5718)}
{RV} = {聯想集団 (U+806F U+60F3 U+96C6 U+56E3),
        聨想集團 (U+8068 U+60F3 U+96C6 U+5718),
        聨想集団 (U+8068 U+60F3 U+96C6 U+56E3)}

While the guideline uses examples from zh-cn, zh-tw, zh-sg, ja, and ko,
this can be applied to other locales.

4. Other Issues

It is possible that many of the variants generated have no meaning in
the language or locale. The intention is not to generate meaningful
"words" but to generate similar variants to be reserved.

The Locale Character Variants tables are critical to the success of the
guideline. A badly designed table may either generate too many
meaningless variants or may not generate enough meaningful variants.
However, the tables or the rules used to generate the tables are not
within the scope of this document.

This document does not recommend allowing registration of IDN in a
locale that has not defined its locale character variant tables.

Disclaimer

Every human language is unique and therefore every linguistic and
localization issue is also unique. It is difficult to draw comparisons
across multiple languages or to classify them into categories.

For example, classifying Traditional Chinese/Simplified Chinese as
upper/lower case makes as much sense as classifying TC/SC as a
"spelling variant" like "color" and "colour". Both are close
comparisons, but neither is 100% correct.

This document disclaims any claim that the classifications or analogies
it draws across different languages are linguistically accurate. It
only attempts to provide a generic framework for a linguistically
challenging problem.

Unresolved Issues

1. How do we deal with updates of tables? Different versions?

2. Should we have multiple recommended variants per locale?

Acknowledgement

The authors gratefully acknowledge the contributions of:

V.Chen, N.Hsu, H.Hotta, S.Tashiro, Y.Yoneya and other Joint Engineering
Team members in the JET Bangkok meeting.

Yves Arrouye, an observer during JET Bangkok, for his contribution on
the IDN Package.

Soobok Lee
L.M Tseng
Patrik Faltstrom
Paul Hoffman
Erin Chen

Author(s)

James SENG
Title
Address
Email: jseng@pobox.org.sg

Kazunori KONISHI
JPNIC
Kokusai-Kougyou-Kanda Bldg 6F
2-3-4 Uchi-Kanda, Chiyoda-ku
Tokyo 101-0047
JAPAN
Phone: +81 49-278-7313
Email: konishi@jp.apan.net

Kenny HUANG
Title
Address
Email: huangk@alum.sinica.edu

QIAN Hualin
Title
Address
Email:

KO YangWoo
PeaceNet
Yangchun P.O. Box 81 Seoul 158-600
Email: newcat@peacenet.or.kr

References

[I18NTERMS] Terminology Used in Internationalization in the IETF
            draft-hoffman-i18n-terms, Jan 2002, Paul Hoffman

[RFC3066]   Tags for the Identification of Languages, RFC3066,
            Jan 2001, H. Alvestrand

[IDN-WG]    IETF Internationalized Domain Names Working Group,
            idn@ops.ietf.org, James Seng, Marc Blanchet

[IDNA]      Internationalizing Domain Names in Applications,
            draft-ietf-idn-idna, Feb 2002, Patrik Faltstrom,
            Paul Hoffman, Adam M. Costello

[PUNYCODE]  Punycode: An encoding of Unicode for use with IDNA,
            draft-ietf-idn-punycode, Feb 2002, Adam M. Costello


[STRINGPREP]Preparation of Internationalized Strings,
            draft-hoffman-stringprep, Feb 2002, Paul Hoffman,
            Marc Blanchet

[NAMEPREP]  Nameprep: A Stringprep Profile for Internationalized
            Domain Names, draft-ietf-idn-nameprep, Feb 2002,
            Paul Hoffman, Marc Blanchet


[C2C]       Pitfalls and Complexities of Chinese to Chinese Conversion,
            http://www.cjk.org/cjk/c2c/c2c.pdf, Jack Halpern, Jouni
            Kerman

[UNIHAN]    Unicode Han Database, Unicode Consortium
            ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt