draft-iab-idn-encoding-00

Network Working Group                                          D. Thaler
Internet-Draft                                                 Microsoft
Intended status: Informational                              July 6, 2009
Expires: January 7, 2010


      IAB Thoughts on Encodings for Internationalized Domain Names
                     draft-iab-idn-encoding-00.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 7, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document explores issues with Internationalized Domain Names
   (IDNs) that result from the use of various encoding schemes such as
   Punycode and UTF-8.



Thaler                   Expires January 7, 2010                [Page 1]


Internet-Draft                IDN Encodings                    July 2009


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1.  APIs . . . . . . . . . . . . . . . . . . . . . . . . . . .  6
   2.  Use of Non-DNS Protocols . . . . . . . . . . . . . . . . . . .  7
   3.  Use of Non-ASCII in DNS  . . . . . . . . . . . . . . . . . . .  8
   4.  Recommendations  . . . . . . . . . . . . . . . . . . . . . . . 12
   5.  Security Considerations  . . . . . . . . . . . . . . . . . . . 14
   6.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 14
   7.  IAB Members at the time of this writing  . . . . . . . . . . . 14
   8.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 15
     8.1.  Normative References . . . . . . . . . . . . . . . . . . . 15
     8.2.  Informative References . . . . . . . . . . . . . . . . . . 15
   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 16





































Thaler                   Expires January 7, 2010                [Page 2]


Internet-Draft                IDN Encodings                    July 2009


1.  Introduction

   The goal of this document is to explore what can be learned from the
   current difficulties in implementing Internationalized Domain Names
   (IDNs).  Although some elements of this exploration may immediately
   feed back into current IETF work it is explicitly not the intention
   for this document to influence any current working group charter.

   An Internationalized Domain Name (IDN) is a name that contains one or
   more non-ASCII characters.  An IDN can be encoded in various ways.

   Punycode [RFC3492] is a mechanism for encoding a Unicode string in
   ASCII characters using only letters, digits, and hypens.  When an IDN
   is encoded with Punycode, it is prefixed with "xn--", which assumes
   that ASCII names do not start with this prefix.  While this
   assumption is not necessarily true, taking this limitation is seen to
   be acceptable.

   The term "ToASCII" refers to the combination of a non-reversible
   character mapping operation (e.g., converting upper case characters
   to lower case characters), plus a reversible Unicode-to-Punycode
   conversion.  Similarly, the term "ToUnicode" refers to the
   combination of a non-reversible character mapping operation, plus a
   reversible Punycode-to-Unicode conversion.

   ISO-2022-JP [RFC1468] is a mechanism for encoding a string of ASCII
   and Japanese characters, where an ASCII character is preserved as-is.

   UTF-8 [RFC3629] is a mechanism for encoding a Unicode character in a
   variable number of 8-bit octets, where an ASCII character is
   preserved as-is.  A UTF-8 string is thus a string of UTF-8
   characters.

   UTF-16 [RFC2781] is a mechanism for encoding a Unicode character in
   one or two 16-bit integers.  A UTF-16 string is thus a string of
   UTF-16 characters.

   UTF-32 (formerly UCS-4) ([UNICODE] section 3.10) is a mechanism for
   encoding a Unicode character in a single 32-bit integer.  A UTF-32
   string is thus a string of UTF-32 characters.

   Different applications, APIs, and protocols use different encoding
   schemes today.  Historically, many of them were originally defined to
   use only ASCII.  Internationalizing  Domain Names in Applications
   (IDNA) [RFC3490] defined a mechanism that required changes to
   applications, but not APIs or servers, and specifies that Punycode is
   to be used.




Thaler                   Expires January 7, 2010                [Page 3]


Internet-Draft                IDN Encodings                    July 2009


   [RFC3490] section 1.3 states:

      The IDNA protocol is contained completely within applications.  It
      is not a client-server or peer-to-peer protocol: everything is
      done inside the application itself.  When used with a DNS resolver
      library, IDNA is inserted as a "shim" between the application and
      the resolver library.  When used for writing names into a DNS
      zone, IDNA is used just before the name is committed to the zone.

   Figure 1 depicts a simplistic architecture that a naive reader might
   assume from the paragraph quoted above.  (A variant of this same
   picture appears in [RFC3490] section 6, further strengthening this
   assumption.)

    +-----------------------------------------+
    |Host                                     |
    |             +-------------+             |
    |             | Application |             |
    |             +------+------+             |
    |                    |                    |
    |               +----+----+               |
    |               |   DNS   |               |
    |               | Resolver|               |
    |               | Library |               |
    |               +----+----+               |
    |                    |                    |
    +-----------------------------------------+
                         |
                _________|_________
               /                   \
              /                     \
             /                       \
            |         Internet        |
             \                       /
              \                     /
               \___________________/

                          Simplistic Architecture

                                 Figure 1

   There are, however, two problems with this simplistic architecture
   that cause it to differ from reality.

   First, resolver APIs on OS's today (MacOS, Windows, Linux, etc.) are
   not DNS-specific.  They typically provide a layer of indirection so
   that the application can work independent of the name resolution
   mechanism, which could be DNS, mDNS



Thaler                   Expires January 7, 2010                [Page 4]


Internet-Draft                IDN Encodings                    July 2009


   [I-D.cheshire-dnsext-multicastdns], LLMNR [RFC4795], NetBIOS-over-TCP
   [RFC1001][RFC1002], etc/hosts file [RFC0952], NIS [NIS], or anything
   else.  For example, RFC 3493 [RFC3493] specifies the getaddrinfo()
   API and contains many phrases like "For example, when using the DNS"
   and "any type of name resolution service (for example, the DNS)".
   Importantly, DNS is mentioned only as an example, and the application
   has no knowledge as to whether DNS or some other protocol will be
   used.

   Second, even with the DNS protocol, private name spaces (sometimes
   referred to as "split DNS"), do not necessarily use the same
   character set encoding scheme as the public Internet name space.

   We will discuss each of the above issues in subsequent sections.  For
   reference, Figure 2 depicts a more realistic architecture on typical
   hosts today.  More generally, the host may be multi-homed to one or
   more local networks, each of which may or may not be connected to the
   public Internet and may or may not have a private name space.

































Thaler                   Expires January 7, 2010                [Page 5]


Internet-Draft                IDN Encodings                    July 2009


    +-----------------------------------------+
    |Host                                     |
    |             +-------------+             |
    |             | Application |             |
    |             +------+------+             |
    |                    |                    |
    |             +------+------+             |
    |             |   Sockets   |             |
    |             |   Library   |             |
    |             +------+------+             |
    |                    |                    |
    |   +-----+------+---+--+-------+-----+   |
    |   |     |      |      |       |     |   |
    | +-+-++--+--++--+-++---+---++--+--++-+-+ |
    | |DNS||LLMNR||mDNS||NetBIOS||hosts||...| |
    | +---++-----++----++-------++-----++---+ |
    |                                         |
    +-----------------------------------------+
                         |
                   ______|______
                  /             \
                 /               \
                /      local      \
                \     network     /
                 \               /
                  \_____________/
                         |
                _________|_________
               /                   \
              /                     \
             /                       \
            |         Internet        |
             \                       /
              \                     /
               \___________________/

                          Realistic Architecture

                                 Figure 2

1.1.  APIs

   [RFC3490] section 6.2 states:

      It is expected that new versions of the resolver libraries in the
      future will be able to accept domain names in other charsets than
      ASCII, and application developers might one day pass not only
      domain names in Unicode, but also in local script to a new API for



Thaler                   Expires January 7, 2010                [Page 6]


Internet-Draft                IDN Encodings                    July 2009


      the resolver libraries in the operating system.  Thus the ToASCII
      and ToUnicode operations might be performed inside these new
      versions of the resolver libraries.

   Resolver APIs such as getaddrinfo() and its predecessor
   gethostbyname() were defined to accept "char *" arguments, meaning
   they accept a string of bytes, terminated with a NULL (0) byte.  This
   is sufficient for ASCII strings, Punycode strings, and even
   ISO-2022-JP and UTF-8 strings (unless an implementation artificially
   precludes them), but not UTF-16 or UTF-32 strings.  Several operating
   systems historically used in Japan will accept (and expect)
   ISO-2022-JP strings in such APIs.  Some platforms used worldwide also
   have new versions of the APIs (e.g., GetAddrInfoW() on Windows) that
   accept other encoding schemes such as UTF-16.

   It is worth noting that an API using "char *" arguments can
   distinguish between ASCII, Punycode, ISO-2022-JP, and UTF-8 strings
   as follows:
   o  if the string contains an ESC (0x1B) byte the string is
      ISO-2022-JP; otherwise,
   o  if any byte in the string has the high bit set, the string is
      UTF-8; otherwise,
   o  if the string starts with "xn--" then it is Punycode; otherwise,
   o  the string is ASCII.
   Again this assumes that ASCII names never start with "xn--", and also
   that UTF-8 strings never contain an ESC character.


2.  Use of Non-DNS Protocols

   As noted earlier, typical name resolution libraries are not DNS-
   specific.  Furthermore, some protocols are defined to use encoding
   schemes other than Punycode.  For example, mDNS specifies that UTF-8
   be used.  Indeed, the IETF policy on character sets and languages
   [RFC2277] states:

      Protocols MUST be able to use the UTF-8 charset, which consists of
      the ISO 10646 coded character set combined with the UTF-8
      character encoding scheme, as defined in [10646] Annex R
      (published in Amendment 2), for all text.  Protocols MAY specify,
      in addition, how to use other charsets or other character encoding
      schemes for ISO 10646, such as UTF-16, but lack of an ability to
      use UTF-8 is a violation of this policy; such a violation would
      need a variance procedure ([BCP9] section 9) with clear and solid
      justification in the protocol specification document before being
      entered into or advanced upon the standards track.  For existing
      protocols or protocols that move data from existing datastores,
      support of other charsets, or even using a default other than



Thaler                   Expires January 7, 2010                [Page 7]


Internet-Draft                IDN Encodings                    July 2009


      UTF-8, may be a requirement.  This is acceptable, but UTF-8
      support MUST be possible.

   Applications that convert an IDN to Punycode before calling
   getaddrinfo() will result in name resolution failures if the Punycode
   name is directly used in such protocols.  Having libraries or
   protocols to convert from Punycode to the encoding scheme defined by
   the protocol (e.g., UTF-8) would require changes to APIs and/or
   servers, which IDNA was intended to avoid.

   As a result, applications that assume that non-ASCII names are
   resolved using the public DNS and blindly convert them to Punycode
   without knowledge of what protocol will be selected by the name
   resolution library have problems.  Furthermore, name resolution
   libraries often try multiple protocols, until one succeeds, because
   they are defined to use a common name space.  For example, the hosts
   file ([RFC0952] and [RFC1123] section 2.1), DNS ([RFC1034] section
   2.1), and NetBIOS-over-TCP ([RFC1001] section 11.1.1) are all defined
   by RFC to be able to share a common name space.  This means that when
   an application passes a name to be resolved, resolution may in fact
   be attempted using multiple protocols, each with a potentially
   different encoding scheme.  For this to work successfully, the name
   must be converted to the appropriate encoding scheme only after the
   choice is made to use that protocol.  In general, this cannot be done
   by the application since the choice of protocol is not made by the
   application.


3.  Use of Non-ASCII in DNS

   A common misconception is that DNS only supports names that can be
   expressed using letters, digits, and hyphens.

   This misconception originally stemed from the definition of an
   "Internet host name" in [RFC0952], published in 1985, which defines
   the use of the hosts file.  An Internet host name was defined therein
   as including only letters, digits, and hyphens, where upper and lower
   case letters were to be treated as identical.  For DNS, [RFC1034]
   section 3.5 entitled "Preferred name syntax" then repeated this
   definition in 1987, saying that this "syntax will result in fewer
   problems with many applications that use domain names (e.g., mail,
   TELNET)".

   The confusion was thus left as to whether the "preferred" name syntax
   was a mandatory restriction, or merely "preferred".

   In 1989, [RFC1123] section 2.1 updated the definition of an Internet
   host name as defined in [RFC0952], to allow starting with a digit (to



Thaler                   Expires January 7, 2010                [Page 8]


Internet-Draft                IDN Encodings                    July 2009


   support IPv4 addresses in dotted-decimal form).  Section 6.1 of that
   RFC discusses the use of DNS (and the hosts file) for resolving host
   names to IP addresses and vice versa.  This led to confusion as to
   whether all names in DNS are "host names", or whether a "host name"
   is merely a special case of a DNS name.

   By 1997, things had progressed to a state where it was necessary to
   clarify these areas of confusion.  "Clarifications to the DNS
   Specification" [RFC2181] section 11 clarifies:

      The DNS itself places only one restriction on the particular
      labels that can be used to identify resource records.  That one
      restriction relates to the length of the label and the full name.
      The length of any one label is limited to between 1 and 63 octets.
      A full domain name is limited to 255 octets (including the
      separators).  The zero length full name is defined as representing
      the root of the DNS tree, and is typically written and displayed
      as ".".  Those restrictions aside, any binary string whatever can
      be used as the label of any resource record.  Similarly, any
      binary string can serve as the value of any record that includes a
      domain name as some or all of its value (SOA, NS, MX, PTR, CNAME,
      and any others that may be added).  Implementations of the DNS
      protocols must not place any restrictions on the labels that can
      be used.

   Hence, it clarified that the restriction to letters, digits, and
   hyphens does not apply to DNS names in general, nor to records that
   include "domain names".  Hence the "preferred" name syntax specified
   in [RFC1034] is indeed merely "preferred", not mandatory.

   Since there is no restriction even to ASCII, let alone letter-digit-
   hyphen use, DNS is in conformance with the requirement in [RFC2277]
   to allow UTF-8.

   However, this requirement is complicated by the fact that in an 8-bit
   clean protocol, one has to have some way of knowing whether a binary
   string is encoded in UTF-8, UTF-16, UTF-32, or some other encoding.

   While implementations of the DNS protocol must not place any
   restrictons on the labels that can be used, applications that use the
   DNS are free to impose whatever restrictions they like, and many
   have.  The above rules permit a domain name label that contains
   unusual characters, such as embedded spaces which many applications
   would consider a bad idea.  For example, the SMTP protocol [RFC5321]
   originally constrained the character set usable in email addresses
   and now has an effort underway to extend SMTP to support email
   address internationalization.




Thaler                   Expires January 7, 2010                [Page 9]


Internet-Draft                IDN Encodings                    July 2009


   Shortly after [RFC2181] and [RFC2277] were written, the need for
   internationalized names within private name spaces (i.e., within
   enterprises) arose.  The current (and past, predating Punycode)
   practice within enterprises that support other languages is to put
   UTF-8 names in their internal DNS servers in a private name space.
   For example, [I-D.skwan-utf8-dns-00] was first written in 1997, and
   was then widely deployed in Windows.  The use of UTF-8 names in DNS
   was similarly implemented and deployed in MacOS.  Within a private
   name space, and especially in light of [RFC2277], it was reasonable
   to assume within a private name space that binary strings were
   encoded in UTF-8.

   [EDITOR'S NOTE: There are also normalization/mapping issues which the
   next version of this document may explore.  Currently we only explore
   encoding issues.]

   Five years after UTF-8 was already in use in private name spaces in
   DNS, Punycode began to be developed ([I-D.ietf-idn-punycode-00] began
   in 2002, culminating in the publication of [RFC3492] in 2003) for use
   in the public DNS name space.  This publication thus resulted in
   having to use different encodings for different name spaces (where
   UTF-8 for private name spaces was already deployed).  Hence,
   referring back to Figure 2, a different encoding scheme may be in use
   on the Internet vs. a local network.

   In general a host may be connected to zero or more networks using
   private name spaces, plus potentially the public name space.
   Applications that convert an IDN to Punycode before calling
   getaddrinfo() will result in name resolution failures if the name is
   actually registered in a private name space in some other encoding
   (e.g., UTF-8).  Having libraries or protocols convert from Punycode
   to the encoding used by a private name space (e.g., UTF-8) would
   require changes to APIs and/or servers, which IDNA was intended to
   avoid.

   Some examples of cases that can happen in existing implementations
   today (where {non-ASCII} below represents some user-entered non-ASCII
   string) are:
   1.  User types in {non-ASCII}.{non-ASCII}.com, and the application
       passes it, in the form of a UTF-8 string, to getaddrinfo or
       gethostbyname or equivalent.
       *  The DNS resolver passes the (UTF-8) string unmodified to a DNS
          server.
   2.  User types in {non-ASCII}.{non-ASCII}.com, and the application
       passes it to a name resolution API that accepts strings in some
       other encoding such as UTF-16, e.g., GetAddrInfoW on Windows.





Thaler                   Expires January 7, 2010               [Page 10]


Internet-Draft                IDN Encodings                    July 2009


       *  The name resolution API decides to pass the string to DNS (and
          possibly other protocols).
       *  The DNS resolver converts the name from UTF-16 to UTF-8 and
          passes the query to a DNS server.
   3.  User types in {non-ASCII}.{non-ASCII}.com, but the application
       first converts it to Punycode such that the name that is passed
       to name resolution APIs is (say) xn--e1afmkfd.xn--
       80akhbyknj4f.com.
       *  The name resolution API decides to pass the string to DNS (and
          possibly other protocols).
       *  The DNS resolver passes the string unmodified to a DNS server.
       *  If the name is not found in DNS, the name resolution API
          decides to try another protocol, say mDNS.
       *  The query goes out in mDNS, but since mDNS specified that
          names are to be registered in UTF-8, the name isn't found
          since it was Punycode encoded in the query.
   4.  User types in {non-ASCII}, and the application passes it, in the
       form of a UTF-8 string, to getaddrinfo or equivalent.
       *  The name resolution API decides to pass the string to DNS (and
          possibly other protocols).
       *  The DNS resolver will append suffixes in the suffix search
          list, which may contain UTF-8 characters if the local network
          uses a private name space.
       *  Each FQDN in turn will then be sent in a query to a DNS
          server, until one succeeds.
   5.  User types in {non-ASCII}, but the application first converts it
       to Punycode, such that the name that is passed to getaddrinfo or
       equivalent is (say) xn--e1afmkfd.
       *  The name resolution API decides to pass the string to DNS (and
          possibly other protocols).
       *  The DNS stub resolver will append suffixes in the suffix
          search list, which may contain UTF-8 characters if the local
          network uses a private name space, resulting in (say) xn--
          e1afmkfd.{non-ASCII}.com
       *  Each FQDN in turn will then be sent in a query to a DNS
          server, until one succeeds.
       *  Since the private name space in this case uses UTF-8, the
          above queries fail, since the Punycode version of the name was
          not registered in that name space.
   6.  User types in {non-ASCII1}.{non-ASCII2}.{non-ASCII3}.com, where
       {non-ASCII3}.com is a public name space using Punycode, but {non-
       ASCII2}.{non-ASCII3}.com is a private name space using UTF-8,
       which is accessible to the user.  The application passes the
       name, in the form of a UTF-8 string, to getaddrinfo or
       equivalent.
       *  The name resolution API decides to pass the string to DNS (and
          possibly other protocols).




Thaler                   Expires January 7, 2010               [Page 11]


Internet-Draft                IDN Encodings                    July 2009


       *  The DNS resolver tries to locate the authoritative server, but
          fails the lookup because it cannot find a server for the UTF-8
          encoding of {non-ASCII3}.com, even though it would have access
          to the private name space.  (To make this work, the private
          name space would need to include the UTF-8 encoding of {non-
          ASCII3}.com.)

   When users use multiple applications, some of which do Punycode
   conversion prior to passing a name to name resolution APIs, and some
   of which do not, odd behavior can result which at best violates the
   principle of least surprise, and at worst can result in security
   vulnerabilities.

   First consider two competing applications, such as web browsers, that
   are designed to achieve the same task.  If the user types the same
   name into each browser, one may successfully resolve the name (and
   hence access the desired content) because the encoding scheme was
   correct, while the other may fail name resolution because the
   encoding scheme was incorrect.  Hence the issue can incent users to
   switch to another application (which in some cases means switching to
   an IDNA application, and in other cases means switching away from an
   IDNA application).

   Next consider two separate applications where one is designed to be
   launched from the other, for example a web browser launching a media
   player application when the link to a media file is clicked.  If both
   types of content (web pages and media files in this example) are
   hosted at the same IDN in a private name space, but one application
   converts to Punycode before calling name resolution APIs and the
   other does not, the user may be able to access a web page, click on
   the media file causing the media player to launch and attempt to
   retrieve the media file, which will then fail because the IDN
   encoding scheme was incorrect.  Or even worse, if an attacker was
   able to register the same name in the other encoding scheme, may get
   the content from the attacker's machine.  This is similar to a normal
   phishing attack, except that the two names represent exactly the same
   Unicode characters.


4.  Recommendations

   Taking into account the issues above, it would seem inappropriate for
   an application to convert a name to Punycode when it does not know
   whether DNS will be used by the name resolution library, or whether
   the name exists in a private name space that uses UTF-8, or in the
   global DNS that uses Punycode.

   Instead, conversion to Punycode, UTF-8, or whatever other encoding,



Thaler                   Expires January 7, 2010               [Page 12]


Internet-Draft                IDN Encodings                    July 2009


   should be done only by an entity that knows which protocol will be
   used (e.g., the DNS resolver, or getaddrinfo upon deciding to pass
   the name to DNS), rather than by general applications that call
   protocol-independent name resolution APIs.  Similarly, even when DNS
   is used, the conversion to Punycode should be done only by an entity
   that knows which name space will be used.

   That is, a more intelligent DNS resolver would be more liberal in
   what it would accept from an application and be able to query for
   both a Punycode name (e.g., over the Internet) and a UTF-8 name
   (e.g., over a corporate network with a private name space) in case
   the server only recognized one.  However, we might also take into
   account that the various resolution behaviors discussed earlier could
   also occur with record updates (e.g., with Dynamic Update [RFC2136]),
   resulting in some names being registered in a local network's private
   name space by applications doing Punycode conversion, and other names
   being registered using UTF-8.  Hence a name might have to be queried
   with both encodings to be sure to succeed without changes to DNS
   servers.

   Similarly, a more intelligent stub resolver would also be more
   liberal in what it would accept from a response as the value of a
   record (e.g., PTR) in that it would accept either UTF-8 or Punycode
   and convert them to whatever encoding is used by the application APIs
   to return strings to applications.

   Indeed the choice of conversion within the resolver libraries is
   consistent with the quote from [RFC3490] section 6.2 stating that
   Punycode conversion "might be performed inside these new versions of
   the resolver libraries".

   That said, some application-layer protocols may be defined to use
   Punycode rather than UTF-8 as recommended by [RFC2277].  In this
   case, an application may receive a Punycode name and want to pass it
   to name resolution APIs.  Again the recommendation is that a resolver
   library be more liberal in what it would accept from an application
   would mean that such a name would be accepted and re-encoded as
   needed, rather than requiring the application to do so.

   Finally, the question remains about what a DNS server should do to
   handle cases where some existing applications or hosts do Punycode
   queries within the local network using a private name space, and
   other existing applications or hosts send UTF-8 queries.  It is
   undesirable to store different records for different encodings of the
   same name, since this introduces the possibility for inconsistency
   between them.  Instead, a new DNS server could treat encoding-
   conversion in the same way as case-insensitive comparison which a DNS
   server is already required to do.  Two encodings are, in this sense,



Thaler                   Expires January 7, 2010               [Page 13]


Internet-Draft                IDN Encodings                    July 2009


   two representations of the same name, just as two case-different
   strings are.  However, whereas case comparison of non-ASCII
   characters is complicated by ambiguities (see [RFC4690]), encoding
   conversion between Punycode and UTF-8 is unambiguous.

   [EDITOR'S NOTE: There are also normalization/mapping issues which the
   next version of this document may explore.  Currently we only explore
   encoding issues.]


5.  Security Considerations

   Having applications convert names to Punycode before calling name
   resolution can result in security vulnerabilities.  If the name is
   resolved by protocols or in zones for which records are registered
   using other encoding schemes, an attacker can claim the Punycode
   version of the same name and hence trick the victim into accessing a
   different destination.  This can be done for any non-ASCII name, even
   when there is no possible confusion due to case, language, or other
   issues.  Other types of confusion beyond those resulting simply from
   the choice of encoding scheme are discussed in [RFC4690].


6.  IANA Considerations

   [RFC Editor: please remove this section prior to publication.]

   This document has no IANA Actions.


7.  IAB Members at the time of this writing

   Marcelo Bagnulo
   Gonzalo Camarillo
   Stuart Cheshire
   Vijay Gill
   Russ Housley
   John Klensin
   Olaf Kolkman
   Gregory Lebovitz
   Andrew Malis
   Danny McPherson
   David Oran
   Jon Peterson
   Dave Thaler


8.  References



Thaler                   Expires January 7, 2010               [Page 14]


Internet-Draft                IDN Encodings                    July 2009


8.1.  Normative References

8.2.  Informative References

   [I-D.cheshire-dnsext-multicastdns]
              Cheshire, S. and M. Krochmal, "Multicast DNS",
              draft-cheshire-dnsext-multicastdns-07 (work in progress),
              September 2008.

   [I-D.ietf-idn-punycode-00]
              Costello, A., "Punycode version 0.3.3",
              draft-ietf-idn-punycode-00 (work in progress), July 2002.

   [I-D.skwan-utf8-dns-00]
              Kwan, S. and J. Gilroy, "Using the UTF-8 Character Set in
              the Domain Name System", draft-skwan-utf8-dns-00 (work in
              progress), November 1997.

   [NIS]      Sun Microsystems, "System and Network Administration",
              March 1990.

   [RFC0952]  Harrenstien, K., Stahl, M., and E. Feinler, "DoD Internet
              host table specification", RFC 952, October 1985.

   [RFC1001]  NetBIOS Working Group, "Protocol standard for a NetBIOS
              service on a TCP/UDP transport: Concepts and methods",
              STD 19, RFC 1001, March 1987.

   [RFC1002]  NetBIOS Working Group, "Protocol standard for a NetBIOS
              service on a TCP/UDP transport: Detailed specifications",
              STD 19, RFC 1002, March 1987.

   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
              STD 13, RFC 1034, November 1987.

   [RFC1123]  Braden, R., "Requirements for Internet Hosts - Application
              and Support", STD 3, RFC 1123, October 1989.

   [RFC1468]  Murai, J., Crispin, M., and E. van der Poel, "Japanese
              Character Encoding for Internet Messages", RFC 1468,
              June 1993.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2136]  Vixie, P., Thomson, S., Rekhter, Y., and J. Bound,
              "Dynamic Updates in the Domain Name System (DNS UPDATE)",
              RFC 2136, April 1997.



Thaler                   Expires January 7, 2010               [Page 15]


Internet-Draft                IDN Encodings                    July 2009


   [RFC2181]  Elz, R. and R. Bush, "Clarifications to the DNS
              Specification", RFC 2181, July 1997.

   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
              Languages", BCP 18, RFC 2277, January 1998.

   [RFC2492]  Armitage, G., Schulter, P., and M. Jork, "IPv6 over ATM
              Networks", RFC 2492, January 1999.

   [RFC2781]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO
              10646", RFC 2781, February 2000.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
              for Internationalized Domain Names in Applications
              (IDNA)", RFC 3492, March 2003.

   [RFC3493]  Gilligan, R., Thomson, S., Bound, J., McCann, J., and W.
              Stevens, "Basic Socket Interface Extensions for IPv6",
              RFC 3493, February 2003.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, November 2003.

   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
              Recommendations for Internationalized Domain Names
              (IDNs)", RFC 4690, September 2006.

   [RFC4795]  Aboba, B., Thaler, D., and L. Esibov, "Link-local
              Multicast Name Resolution (LLMNR)", RFC 4795,
              January 2007.

   [RFC5321]  Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
              October 2008.

   [UNICODE]  The Unicode Consortium, "The Unicode Standard, Version
              4.0.0, defined by:  The Unicode Standard, Version 4.0",
              (Boston, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1) .










Thaler                   Expires January 7, 2010               [Page 16]


Internet-Draft                IDN Encodings                    July 2009


Author's Address

   Dave Thaler
   Microsoft Corporation
   One Microsoft Way
   Redmond, WA  98052
   USA

   Phone: +1 425 703 8835
   Email: dthaler@microsoft.com









































Thaler                   Expires January 7, 2010               [Page 17]

Document	Document type	This is an older version of an Internet-Draft that was ultimately published as RFC 6055. Expired & archived
	Select version	00 01 02 03 04 RFC 6055
	Compare versions
	Author
	RFC stream
	Other formats	txt pdf bibtex bibxml
	Additional resources