Network Working Group                                         J. Klensin
Internet-Draft                                           October 3, 2003
Expires: April 2, 2004


                Internationalization of Email Addresses
                  draft-klensin-emailaddr-i18n-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 2, 2004.

Copyright Notice

   Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

   Internationalization of electronic mail addresses is, if anything,
   more important than the already-completed effort for domain names.
   In most of the contexts in which they are used, domain names can be
   hidden within or as part of various types of references. Email
   addresses, by contrast, are crucial: use of names of people or
   organizations as, or as part of, the email local part is, for obvious
   reasons, a well-established tradition on the network. Preventing
   people from spelling their names correctly is, in the long term,
   inexcusable.  At the same time, email addresses pose a number of
   special problems -- they are more difficult than simple domain names
   in some respects, but actually easier in others. This document
   discusses the issues with internationalization of email addresses,
   explains why some obvious approaches are incompatible with the



Klensin                  Expires April 2, 2004                  [Page 1]


Internet-Draft    Internationalization of Email Addresses   October 2003


   definitions and use of Internet mail, and proposes a solution that is
   likely to serve users and the network well for the long term.

Table of Contents

   1.    Introduction . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.    History, Context, and Design Constraints . . . . . . . . . .  4
   2.1   MUAs, MTAs, addresses, and learning from MIME and ESMTP  . .  4
   2.2   An MUA-based Solution is Not Necessary . . . . . . . . . . .  6
   2.2.1 Obtaining an Internationalized Email Address . . . . . . . .  7
   2.2.2 Relay environment  . . . . . . . . . . . . . . . . . . . . .  7
   2.2.3 Internationalizing the Sender  . . . . . . . . . . . . . . .  7
   2.3   An MUA-based Solution is Unworkable  . . . . . . . . . . . .  8
   2.3.1 MX diversion . . . . . . . . . . . . . . . . . . . . . . . .  8
   2.3.2 Embedded commands  . . . . . . . . . . . . . . . . . . . . .  8
   2.4   Encoding the Whole Address String  . . . . . . . . . . . . .  9
   2.5   Looking back and looking forward . . . . . . . . . . . . . . 10
   2.6   Summary of Design Issues . . . . . . . . . . . . . . . . . . 10
   3.    A Mail Transport-level Protocol  . . . . . . . . . . . . . . 10
   3.1   General Principles and Objectives  . . . . . . . . . . . . . 10
   3.2   Framework for the Internationalization Extension . . . . . . 11
   3.3   The Address Internationalization Service Extension . . . . . 11
   3.4   Extended Mailbox Address Syntax  . . . . . . . . . . . . . . 12
   3.5   Additional ESMTP Changes and Clarifications  . . . . . . . . 13
   3.5.1 The Initial SMTP Exchange  . . . . . . . . . . . . . . . . . 13
   3.5.2 Trace Fields . . . . . . . . . . . . . . . . . . . . . . . . 13
   3.6   Protocol Loose Ends  . . . . . . . . . . . . . . . . . . . . 13
   3.6.1 Punycode in Domain Names?  . . . . . . . . . . . . . . . . . 14
   3.6.2 Local Character Codes in Local Parts?  . . . . . . . . . . . 14
   3.6.3 Restrictions on Characters in Local Part?  . . . . . . . . . 14
   3.6.4 Requirement for 8BITMIME?  . . . . . . . . . . . . . . . . . 14
   3.6.5 Message Header and Body Issues with MTA Approach?  . . . . . 15
   3.6.6 Variant Addresses (Aliases) in a Command Verb  . . . . . . . 15
   3.6.7 The Received field 'for' clause  . . . . . . . . . . . . . . 15
   4.    Advice to Designers and Operators of Mail-receiving
         Systems  . . . . . . . . . . . . . . . . . . . . . . . . . . 15
   5.    Security considerations  . . . . . . . . . . . . . . . . . . 16
   6.    Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 16
         Normative References . . . . . . . . . . . . . . . . . . . . 16
         Informative References . . . . . . . . . . . . . . . . . . . 17
         Author's Address . . . . . . . . . . . . . . . . . . . . . . 18
         Intellectual Property and Copyright Statements . . . . . . . 19

1. Introduction

   Internationalization of electronic mail addresses is, if anything,
   more important than the already-completed effort for domain names.
   In most of the contexts in which they are used, domain names can be
   hidden within, or as part of, various types of references or the
   references themselves may be hidden.  It also remains controversial
   whether internationalization of domain names is actually necessary,
   no matter how attractive and important it may appear at first glance.
   Email addresses, by contrast, are crucial: use of names of people or
   organizations as, or as part of, the email local part is, for obvious
   reasons, a well-established tradition on the network.  Preventing
   people from spelling their names correctly is, in the long term,
   inexcusable.  However, while it is tempting to ignore them, email
   addresses pose a number of special problems.  Unlike domain names
   --and, consequently, the domain part of an email address (after the
   last "@")-- the local part (or mailbox name) is essentially
   unconstrained with regard to syntax or the characters used.  There
   are no special delimiters comparable to the period used to separate
   domain name labels, there is no standardized structure comparable to
   the domain name system's hierarchy, and it has always been a firm
   protocol requirement that no host other than the one to which final
   delivery is made is permitted to parse or interpret the address (see
   section 2.3.10 of [RFC2821]). In some respects, this makes things
   much more difficult: it is far more difficult to know what behavior
   will cause existing systems to cease working properly.  In others, it
   actually makes them easier, since the originating system is not
   required to, and indeed must not, understand how the receiving one
   will interpret an address.

   The balance of this document explores these issues in more detail.

   While much of the description here depends on the abstractions of
   "Mail Transfer Agent" ("MTA") and "Mail User Agent" ("MUA"), it is
   important to understand that those terms and the underlying concepts
   postdate the design of the Internet's email architecture and the
   "protocols on the wire" principle. That architecture and that
   principle have prevented any strong and standardized distinctions
   about how MTAs and MUAs interact on a given origin or destination
   host (or even whether they are separate).

   This document assumes a reasonable understanding of the protocols and
   terminology of the most recent core email standards documented in RFC
   2821 [RFC2821] and RFC 2822 [RFC2822].


   In its present internet-draft form, the document contains a great
   deal of explanatory material and rationale for the approach chosen.
   The actual protocol material appears almost entirely in Section 3,
   especially Section 3.2 through Section 3.4.  If this document appears
   to be a candidate for standards-track publication, the explanatory
   material, rationale, and most of the other background material should
   be moved to a separate document.  Readers who wish to bypass the
   reasoning and the comparison with alternatives and examine only the
   protocol proposal should go directly to those sections.

2. History, Context, and Design Constraints

   Several key issues in how email works and is handled impose
   significant constraints on the solution space.  Email is often used
   as a transport mechanism for information that will be acted on by
   computers, not merely read by people.  While the approach is not
   common, some of the systems that use it that way encode routing,
   processing, or validation information into the envelope address
   fields.  More commonly, recipient systems use special address formats
   to encode local routing or priority information.  In recent years,
   some of these addressing techniques have become important anti-spam
   tools for some users and communities.  These techniques have a long
   history.  Most or all of them conform to email standards and
   practices that, in turn, go back to the first uses of email on the
   ARPANet. Backward-compatibility --not damaging the interoperability
   of standards-conforming programs that are now deployed and working
   correctly-- makes it inappropriate to make decisions by conducting
   user surveys and concluding that "not too many" people will be hurt.
   Any new system must preserve existing practices and flexibilities
   unless there are overwhelming reasons -- e.g., an absence of
   plausible alternatives -- to not do so.

2.1 MUAs, MTAs, addresses, and learning from MIME and ESMTP

   The development and deployment of MIME [RFC2045] provided a number of
   important lessons for the community about how to design extensions
   and enhanced features without harm to the installed and conforming
   email system.  Perhaps the most important of these was that it is
   easier, and often more expedient, to make changes that have impact
   only on mail user agents. If it is possible to make changes that way
   --generally changes that involve only message headers and the message
   body or body parts-- users who need particular features can switch to
   user agents that support them or press for those features in the user
   agents they have already selected.  Even in the worst case in which
   support for features the user considers critical is not readily
   available, it is possible, with proper user agent design, to save the
   entire message to a file and then use stand-alone software to
   interpret the information and perform the desired functions.

   Providing these functions in the message headers and body permits
   them to be moved opaquely through the mail transport system, thus
   avoiding any requirement to modify originating or delivery MTAs or
   intermediate relays.  In practice, the user may have little control
   over those systems.  Since changes to them typically affect large
   numbers of users, those who are responsible for them are often
   reluctant to make changes in response to the needs of a few users.

   It is hence reasonable to conclude that, if it is feasible to support
   address internationalization strictly at the MUA level, keeping the
   internationalized addresses opaque to the transport system, that is a
   more desirable approach than requiring MTA changes. The MUA approach
   has been carefully examined by others [I-D.hoffman-imaa].  This
   document argues that

   1.  addressing is a fundamental MTA-level function,

   2.  some of the complexities encountered when trying to encode
       addresses so as to avoid MTA interactions are symptoms that
       attempting to "hide" the MTA function so that it can be handled
       by MUAs is not an architecturally desirable approach,

   3.  the restrictions on email uses and syntax required to provide
       internationalization at MUA level are unnecessarily risky, and
       almost certainly damaging, to deployed email infrastructure, and

   4.  MTA-level solutions are feasible, architecturally more elegant,
       and perhaps not as difficult to deploy in relevant communities as
       the strongest advocates of the MUA approach appear to imagine.

   The decision as to what to do in message bodies and formats (e.g.,
   [RFC2822]  and MIME [RFC2045]) and what to handle in message
   transport (i.e., [E]SMTP) is critical because, as discussed below,
   the level at which something is handled is both determined by, and
   determines, how information is appropriately encoded.   This decision
   ultimately depends on the application of two principles:

   1.  If body content is opaque, anything still visible to transport
       requires transport negotiation.

   2.  Anything an MTA -- origin, relay, MX, gateway, delivery -- needs
       to understand or process must be handled as part of mail
       transport.  The discussion below might be titled "why the MTA
       must get involved".

   Whether mail addresses meet these criteria, and hence must be
   comprehensible in transport, depends on how much the sending MUA
   needs to know to construct, and the delivery MTA needs to know to
   deliver, a message.  Traditionally, we have kept the former knowledge
   level at zero: if a sender produces "!a!b!c@example.com" in response
   to information that it is a valid address, it still does not know
   whether this is a "bang path" or a slightly-perverse name for a
   single mailbox.  Is "xyz%def@example.com" a specification for routing
   to mailbox "xyz" on host "def" or a mailbox on the example.com host
   named "xyz%def"?  Are "foo+bar@..." or "foo-baz@..." subaddresses
   "bar" and "baz" for the mailbox "foo", or are they simple addresses?
   Is "jjoneschem@labs.example.com" a local mailbox on that host or an
   instruction to route mail to "jjones" in the chemistry department?

   Under the rules established in [RFC0821] and [RFC1123], as summarized
   and updated in [RFC2821], all of those decisions are up to
   "example.com", its MX alternatives, or hosts in that domain, and they
   may make very local decisions about them.  For example, "xyz%def"
   might be a mailbox while "xyz%ghi" might be a route; "foo-baz" might
   be a subaddress while "foo-blog" might be a mailbox.

   The sender cannot, in the general case, know.
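
   As a non-normative illustration of this point, the Python sketch
   below lists the readings that the syntax of a local part would
   permit a sender to guess at.  The function name and the labels are
   hypothetical and appear in no specification; none of the readings is
   authoritative, since only the destination host can say which one, if
   any, applies.

```python
# Hypothetical illustration only: enumerates readings a sender might
# *guess* for a local part.  No reading is authoritative; RFC 2821
# leaves the interpretation to the destination host.

def plausible_readings(local_part):
    """Return every reading of local_part that its syntax would permit."""
    readings = [("mailbox", local_part)]          # always possible
    if "%" in local_part:
        user, host = local_part.rsplit("%", 1)
        readings.append(("percent-route", (user, host)))
    for delim in "+-":
        if delim in local_part:
            base, sub = local_part.split(delim, 1)
            readings.append(("subaddress", (base, sub)))
    if local_part.startswith("!"):
        readings.append(("bang-path", local_part.strip("!").split("!")))
    return readings


for lp in ("xyz%def", "foo+bar", "foo-baz", "!a!b!c"):
    print(lp, "->", plausible_readings(lp))
```

   Every input yields at least the "mailbox" reading, and most yield a
   second one, which is exactly why the sender cannot choose among them.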

   Worse, while non-alphanumeric characters like "+", "-", and "%" have
   been used in these examples, delimiters for subaddresses, implicit
   routing, embedded commands, and so on are, again, up to the
   destination MTA and its interpretations.  "X" might be as good a
   delimiter as "+".  It might even be a better one in some
   applications. And, since local-parts are defined as case-sensitive,
   "x" might be a normal address character in the same address in which
   "X" was an important delimiter. Of course, in a completely non-ASCII
   environment, it would make sense to substitute characters from the
   local script for  "+", "-", "%", and so on.

   It is not even necessary to use a delimiter to support some forms of
   subaddressing or local routing.  Suppose an organization adopted the
   convention that externally-visible email address local parts were
   structured as, e.g., a three-letter department code, followed by a
   five-letter code representing the individual, optionally followed by
   a code representing a project.  Many organizations use just such
   systems and there is no way (and no need) for an email sender to
   understand the system or whether it is actually used for mail routing
   internally.
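
   A minimal sketch of how a destination host might interpret such a
   delimiter-free convention follows.  The field widths are taken from
   the example above; the function name, the dictionary keys, and the
   sample local part are invented for illustration.  Nothing in the
   string itself reveals this structure to a sender.

```python
# Hypothetical destination-side parser for the convention described
# above: 3-letter department code, 5-letter individual code, optional
# project code.  A sender has no way to know this structure exists.

def split_structured_local_part(local_part):
    """Split a local part by fixed widths; return None if too short."""
    if len(local_part) < 8:
        return None
    return {
        "department": local_part[:3],
        "individual": local_part[3:8],
        "project": local_part[8:] or None,
    }


print(split_structured_local_part("chmjonesx42"))
```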

   Consequently, the idea of a sender breaking an address up into its
   component parts and encoding those parts separately is an
   impossibility without major, incompatible, and retroactive changes in
   how mail addressing is defined.

2.2 An MUA-based Solution is Not Necessary

2.2.1 Obtaining an Internationalized Email Address

   One of the classic arguments for an MUA-based approach (to
   international addresses or anything else) is that users will be able
   to install and use solutions on their own, even if the administrators
   of their systems are unenthused about the particular function or
   extension and delay, or decline, to install it.  That argument was
   certainly true for MIME, especially in the presence of the capability
   to store messages as files and apply post-MUA tools.  But it does not
   seem to apply for email addresses.  In general, users cannot create
   email accounts, or aliases controlling delivery of messages from
   external systems.  Those accounts and aliases must be created by
   system administrators responsible for the mail servers.  If they are
   not sympathetic to internationalized mailbox names, such names will
   not exist on the receiving system. Having apparatus to send those
   names through the protocols will be essentially useless: a message
   that bounces because the relevant account or mailbox does not exist
   will bounce equally well whether the target address is in ASCII or in
   some other script and whether or not the receiving MTA is required to
   explicitly agree to accept internationalized addresses. Conversely,
   if the administrators of the mail system host are sympathetic to
   internationalization, it is reasonable to expect that appropriate
   software can and will be installed at the MTA level.

2.2.2 Relay environment

   As in many other areas with email, the difficulties with an MTA-based
   model for internationalization of addresses arise, not when the
   originating MTA communicates directly with the delivery MTA, but when
   relay MTAs are involved.  If both the sending and receiving
   systems support internationalized addresses, it is still possible
   that an intermediate relay will not do so, forcing mail to bounce
   that could be delivered if there were a direct connection between
   sender and receiver.  But, as with the installation of email
   addresses on a system, relays do not get inserted in the mail path by
   accident.  If internationalized addresses are important to the
   destination host, its administrators will choose lower-preference MX
   hosts or other relays that can support internationalized addresses.

2.2.3 Internationalizing the Sender

   If we assume a destination host that can accept, and properly handle,
   an internationalized address, and we assume that any MX-designated
   intermediaries for that host will be chosen to be similarly capable,
   one situation is left in which it would be advantageous to have an
   MUA-based solution.   If an originating/sending system is not capable
   of generating or sending an internationalized address, but the
   prospective receiving system is, it would be good to enable the
   originating user to generate a message and somehow send it to the
   relevant address.

   This is a real issue, and deserves some serious consideration. But it
   seems better to find a good temporary, transitional, mechanism for it
   than to permanently burden the email system with an uncomfortable
   mechanism just to accommodate this case.  One example of a
   transitional mechanism might be to use ESMTP tunneling over MIME
   [RFC2442] to route the address and message to a friendly gateway host
   that would unpack the message and transmit it using this
   specification.   Other examples, less attractive at first glance but
   still plausible, would include defining and using small variations on
   the message encapsulation mechanisms that are integral to MIME
   [RFC2046], or the more complex encapsulation designed for HTML
   [RFC2557], to accomplish the same purpose.

   So, a user with an MUA that has the capability to handle an
   internationalized address, but who does not have access to an
   originating MTA with the capabilities defined here, may be given
   access to a reasonable transition strategy until the needed
   capabilities are available.  Note that this does not require an open
   relay, since all of the user authentication capabilities of ESMTP
   [RFC2554] and SUBMIT [RFC2476] would be available.  One can even
   imagine a service with a per-message charging system, which would
   presumably encourage rapid upgrading.

2.3 An MUA-based Solution is Unworkable

   The examples given above are, perhaps obviously, not the only ones.
   Other issues arise with intermediate MX relay and gateway hosts,
   commands embedded in local parts, and special formats used in
   gateways to other environments, among other cases.

2.3.1 MX diversion

   If the domain part of an email address is associated with several MX
   records and the mail is delivered to one of them that is not the best
   preference host, the receiving host is not required to use SMTP.  If,
   instead, it performs some gateway function, it may need to inspect or
   alter the local part to determine how to route and deliver the
   message.   If the local part were encoded in some fashion that
   prevented that inspection process, and the MTA was not aware that it
   needed to apply special techniques, mail delivery might well fail.

2.3.2 Embedded commands

   In addition to the address forms with special syntax or semantics
   described elsewhere, systems have been developed that embed commands
   in address local parts.  These might, of course, use entirely
   different syntax parts and formats than are typical in conventional
   addresses and, in an internationalized environment, might reasonably
   use character coding conventions that are neither ASCII nor
   Unicode-based.

   A number of specialized applications of email do require, or
   recommend, specific syntax in the local part.  These are identified,
   not to indicate that they are the only cases (they are not) but to
   reinforce the point that one must be quite cautious in doing anything
   that makes global assumptions about local part syntax and significant
   characters.  These applications include local part explicit routing
   with the "percent hack" [RFC1123], gateways to and from X.400
   environments [RFC2156], and gateways to fax systems [RFC3192].

2.4 Encoding the Whole Address String

   Much of the above demonstrates why selective encoding of parts of the
   local-part string is not practical.  Why, then, not encode the entire
   string and insist that the delivery MTA recognize the presence of an
   encoded form and do whatever decoding is needed before it does other
   processing?  There are three major reasons not to approach the
   problem this way:

   1.  Any change in address syntax interpretation is likely to be a
       major, incompatible, change, since we do not now impose any
       restrictions on how an MTA is organized or even on how, or
       whether, the MTA and MUA functions are actually divided up on a
       given host.  Converting user agents to handle international forms
       of addresses in a way that does not produce user astonishment is
       likely to be a major undertaking, regardless of what is done to
       the protocols and at what level.

   2.  Imposing a requirement that MTAs "understand" local-parts so that
       they can be partially decoded as part of mail routing would seem
       to defeat the main goal of encoding internationalized strings
       into a compact ASCII-compatible form, i.e., to keep MTAs from
       needing to understand the extended naming system.

   3.  We potentially have three different encodings of an
       internationalized string: the one used by the MTA, the one used
       by the MUA, and the one seen by the user through applications
       software or the operating system's display interface.  Having all
       three of these identical or closely compatible is desirable from
       the standpoint of user understanding and debugging.  Having them
       different can cause many "interesting" problems, e.g., having to
       return an error message that uses different coding, and hence
       might represent an entirely different string, than the string the
       user put into the process.

   Instead, it would seem sensible to move from a straightforward
   encoding of mail addresses in ASCII to a straightforward encoding in
   Unicode via UTF-8 [RFC2277], imposing only those restrictions on the
   characters in the local part that are implied by Unicode itself.
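
   The following Python sketch illustrates one property that makes this
   move straightforward: an all-ASCII address has the same octets in
   UTF-8 as in ASCII, while a non-ASCII local part becomes a sequence
   containing octets above 0x7F.  The function name and both sample
   addresses are hypothetical.

```python
# Illustration: UTF-8 is a superset of ASCII, so existing ASCII
# addresses are byte-for-byte unchanged, while non-ASCII local parts
# become multi-octet sequences with the high bit set.

def to_wire(address):
    """Encode an address string as UTF-8 octets."""
    return address.encode("utf-8")


ascii_addr = "jjones@example.com"
intl_addr = "jos\u00e9@example.com"      # hypothetical non-ASCII local part

# An all-ASCII address is unchanged by the move to UTF-8.
assert to_wire(ascii_addr) == ascii_addr.encode("ascii")

print(to_wire(intl_addr))
print(any(octet > 0x7F for octet in to_wire(intl_addr)))
```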

2.5 Looking back and looking forward

   Another principle is implied by some of the discussion above.
   Internationalization measures for the Internet will be with us for as
   long as there are multiple languages and scripts in the world, i.e.,
   probably forever.  If a satisfactory long-term solution can be found,
   and a reasonable transition strategy can be defined for it, it is
   much better to optimize for the long term.  The alternative of making
   things more difficult or less functional forever in order to save
   some small effort in transition, or even to make the transition a few
   months faster, represents a very poor tradeoff.

2.6 Summary of Design Issues

   Each of the above subsections describes a strong case for continuing
   to treat addressing as an MTA function, opaque except at the end
   systems. The main alternative is to rely on the sending system being
   able to understand the addressing system of the target host, and any
   relays accessed through MX relays, potentially needing to be able to
   remove IDN encoding ("punycode" or otherwise) in order to determine
   how to process or route the message.  That alternative violates a
   long-standing and important design principle of Internet email,
   complicates a number of other cases, and does not offer sufficient
   transition advantages to be worth any of those difficulties.

3. A Mail Transport-level Protocol

3.1 General Principles and Objectives

   1.  Whatever encoding is used should apply to the whole address and be
       directly compatible with software used at the user interface.

   2.  An SMTP relay must either recognize the format explicitly,
       agreeing to do so via an ESMTP option, or bounce the message so
       that the sender can make another plan.

   3.  If any charset other than UTF-8 or punycode is permitted and used
       for the local part, its interpretation at the "what does this
       mean" level is the responsibility of the receiving MTA.

3.2 Framework for the Internationalization Extension

   The following service extension is defined:

   1.  the name of the SMTP service extension is "Internationalized
       Addresses";

   2.  the EHLO keyword value associated with this extension is "I18N";

   3.  No parameter values are defined for this EHLO keyword value. In
       order to permit future (although unanticipated) extensions, the
       EHLO response MUST NOT contain any parameters.  If a parameter
       appears, the SMTP client that is conformant to this version of
       this specification MUST treat the ESMTP response as if the I18N
       keyword did not appear.

   4.  no parameters are added to any SMTP command.

       [[Note in draft: A variation on this is probably excess
       complexity, rather than a good tradeoff, but should be considered
       in terms of whether it would be a good transitional aid. It would
       be possible to permit an optional parameter on the MAIL and RCPT
       commands that would specify an all-ASCII address to be used if an
       MTA (SMTP Sender) encounters an SMTP Receiver that does not
       support this extension.  Such a parameter might be called
       "AddressVariant" or even just "alias".  It would be especially
       useful in error handling if used on the MAIL command. ]]

   5.  no additional SMTP verbs are defined by this extension.

   The remainder of this memo specifies how support for the extension
   affects the behavior of an SMTP client and server.
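
   As a non-normative sketch, an SMTP client might test for the
   extension as shown below.  The function name and the input
   convention (reply lines with the "250-" or "250 " prefix already
   removed) are assumptions made for this example.  The keyword match
   is case-insensitive, as RFC 2821 requires for EHLO keywords, and a
   keyword that carries parameters is treated as absent, following
   item 3 above.

```python
# Sketch of client-side detection of the extension keyword.  The
# function name and the input convention are hypothetical.

def server_offers_i18n(ehlo_lines):
    """Return True only if an "I18N" keyword without parameters appears.

    ehlo_lines: EHLO reply lines with the "250-" / "250 " prefix removed.
    The first line is the greeting (domain), not a keyword, and is skipped.
    """
    for line in ehlo_lines[1:]:
        parts = line.split()
        if parts and parts[0].upper() == "I18N":
            # Per item 3 above, a keyword with parameters is treated
            # as if the keyword had not appeared at all.
            return len(parts) == 1
    return False


print(server_offers_i18n(["mx.example.com", "8BITMIME", "I18N"]))
```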

3.3 The Address Internationalization Service Extension

   In the absence of this extension, SMTP clients and servers are
   constrained to using only those addresses permitted by RFC 2821.  The
   local parts of those addresses may be made up of any ASCII
   characters, although certain of them must be quoted as specified
   there.  It is notable in an internationalization context that there
   is a long history on some systems of using overstruck ASCII
   characters (a character, a backspace, and another character) within a
   quoted string to approximate non-ASCII characters.  This form of
   internationalization should probably be phased out as this extension
   becomes widely deployed but backward-compatibility considerations
   require that it continue to be supported.

   An SMTP Server that announces this extension MUST be prepared to
   accept a UTF-8 string [RFC2279] in any position in which RFC 2821
   specifies that a "mailbox" may appear.  That string must be parsed
   only as specified in RFC 2821, i.e., by separating the mailbox into
   source route, local part and domain part, using only the characters
   colon (U+003A), comma (U+002C), and at-sign (U+0040) as specified
   there.  Once isolated by this parsing process, the local part MUST be
   treated as opaque unless the SMTP Server is the final delivery MTA.
   Any domain names that are to be looked up in the DNS MUST be
   processed into punycode form as specified in IDNA [RFC3490] unless
   they are already in that form. Any domain names that are to be
   compared to local strings SHOULD be checked for validity and then
   MUST be compared as specified in IDNA.
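
   The parsing and domain handling described above might be sketched as
   follows.  This is an illustration under stated assumptions, not a
   reference implementation: the function names and the sample path are
   hypothetical, the input is assumed to be an already-decoded UTF-8
   path without angle brackets, and the conversion relies on Python's
   built-in "idna" codec, which implements the RFC 3490 ToASCII
   operation.  The local part is returned untouched, as an opaque
   string.

```python
# Sketch of the server-side parsing described above.  Function names
# are hypothetical.

def split_mailbox(path):
    """Split a path into (source_route, local_part, domain).

    Only ":" (ending a source route), "," (separating route hosts) and
    the last "@" are treated as significant.
    """
    route = []
    if path.startswith("@"):
        route_text, path = path.split(":", 1)
        route = [hop.lstrip("@") for hop in route_text.split(",")]
    local_part, domain = path.rsplit("@", 1)
    return route, local_part, domain


def domain_for_dns(domain):
    """Convert a domain to its ASCII (punycode) form for DNS lookup."""
    return domain.encode("idna").decode("ascii")


route, local, dom = split_mailbox("@relay.example:jos\u00e9@b\u00fccher.example")
print(route, local, domain_for_dns(dom))
```

   A domain that is already in punycode form passes through
   domain_for_dns() unchanged, which matches the "unless they are
   already in that form" provision.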

   An SMTP Client that receives the I18N extension keyword MAY transmit
   a mailbox name as an internationalized string in UTF-8 form. It MAY
   transmit the domain part of that string in either punycode (derived
   from the IDNA process) or UTF-8 form but, if it sends the domain in
   UTF-8, it SHOULD first verify that the string is valid for a domain
   name according to IDNA rules.  As required by RFC 2821, it MUST NOT
   attempt to parse, evaluate, or transform the local part in any way.
   If the I18N SMTP extension is not offered by the Server, the SMTP
   Client MUST NOT transmit an internationalized address.  Instead, it
   MUST either return the message to the user as undeliverable or
   replace it, using some process outside the scope of this
   specification such as a directory lookup, with a local-part that
   conforms to the syntax rules of RFC 2821.
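
   The client-side decision described above might be sketched in Python
   as follows.  This is an assumption-laden illustration, not a
   definitive implementation: choose_recipient, UndeliverableError, and
   the ascii_fallback argument (standing in for the out-of-scope
   directory lookup) are all hypothetical names, and any address
   containing a non-ASCII character is treated as internationalized.

```python
# Illustrative sketch only; every name below is hypothetical.

class UndeliverableError(Exception):
    """The message must be returned to the user as undeliverable."""


def choose_recipient(ehlo_keywords, address, ascii_fallback=None):
    """Pick the address a client may send to this particular server.

    ehlo_keywords  -- extension keywords from the server's EHLO response
    address        -- the intended mailbox, possibly containing non-ASCII
    ascii_fallback -- an RFC 2821-conformant replacement obtained by some
                      outside process (e.g. a directory lookup), or None
    """
    offered = {keyword.upper() for keyword in ehlo_keywords}
    if address.isascii():
        # Ordinary addresses never depend on the extension.
        return address
    if "I18N" in offered:
        # The server announced the extension, so UTF-8 MAY be sent.
        return address
    if ascii_fallback is not None and ascii_fallback.isascii():
        return ascii_fallback
    raise UndeliverableError(
        "server does not offer I18N and no ASCII address is known")
```

   Note that the function never inspects or rewrites the local part
   itself; it only decides which complete address may be transmitted.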

3.4 Extended Mailbox Address Syntax

   RFC 2821, section 4.1.2, defines the syntax of a mailbox as


         Mailbox = Local-part "@" Domain

         Local-part = Dot-string / Quoted-string
               ; MAY be case-sensitive

         Dot-string = Atom *("." Atom)

         Atom = 1*atext

         Quoted-string = DQUOTE *qcontent DQUOTE

         Domain = (sub-domain 1*("." sub-domain)) / address-literal
         sub-domain = Let-dig [Ldh-str]


   (see that document for productions and definitions not provided here
   -- their details are not important to understanding this
   specification). The key changes made by this specification are,
   informally, to

   o  Change the definition of "sub-domain" to permit either the
      definition above or a UTF-8 (or other, see Section 3.6.1) string
      representing a label that is conformant with IDNA [RFC3490].  That
      sub-domain string MUST NOT contain the characters "@" or ".".

   o  Change the definition of "Atom" to permit either the definition
      above or a UTF-8 (or other, see Section 3.6.2) string.  That
      string MUST NOT contain any of the ASCII characters (either
      graphics or controls) that are not permitted in "atext"; it is
      otherwise unrestricted.
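
   The informal rule for the extended "Atom" can be sketched in Python
   as shown below.  The "atext" character set is taken from RFC 2822;
   the function names are hypothetical, and the check covers only the
   Dot-string form of a local part, not Quoted-string.

```python
import string

# The ASCII characters permitted in "atext" by RFC 2822.
ATEXT = frozenset(string.ascii_letters + string.digits
                  + "!#$%&'*+-/=?^_`{|}~")


def is_extended_atom(text: str) -> bool:
    """True if text is a non-empty extended Atom.

    Every ASCII character must be in atext; characters beyond U+007F
    are otherwise unrestricted, as the informal rule states.
    """
    return bool(text) and all(ord(ch) > 0x7F or ch in ATEXT
                              for ch in text)


def is_extended_dot_string(text: str) -> bool:
    """True if text is one or more extended Atoms separated by '.'."""
    return all(is_extended_atom(atom) for atom in text.split("."))
```

   Under this sketch "jos\u00e9.garc\u00eda" is accepted, while strings
   containing a space, an at-sign, or an empty Atom (as in "a..b") are
   rejected.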


3.5 Additional ESMTP Changes and Clarifications

   The mail transport process involves addresses ("mailboxes") and
   domain names in contexts in addition to the MAIL and RCPT commands
   and extended alternatives to them.  In general, the rule is that,
   when RFC 2821 specifies a mailbox, UTF-8 is used for the entire
   string; when it specifies a domain name, the name should be in
   punycode form if its raw form is non-ASCII.

   The following subsections list and discuss all of the relevant cases.
   [[Note in draft: I hope]]

3.5.1 The Initial SMTP Exchange

   When an SMTP or ESMTP connection is opened, the server sends a
   "banner" response consisting of the 220 reply code and some
   information.  The client then sends the EHLO command.  Since the
   client cannot know whether the server supports internationalized
   addresses until after it receives the response from EHLO, any domain
   names that appear in this dialogue, or in responses to EHLO, must be
   in hostname form, i.e., internationalized ones must be in punycode
   form.

3.5.2 Trace Fields

   Internationalized domain names in Received fields should be
   transmitted in Unicode form.  Addresses in "for" clauses need
   further examination and might be treated differently depending on
   whether 8BITMIME is a requirement for internationalized addresses.

3.6 Protocol Loose Ends

   These issues should be resolved, and this section eliminated, before
   the document is considered complete.

3.6.1 Punycode in Domain Names?

   It is not clear whether the flexibility of being able to pass domain
   names in punycode, as well as UTF-8, form is needed.  If it is not,
   it should be eliminated as excess complexity.

3.6.2 Local Character Codes in Local Parts?

   There are some reasons for permitting local-parts to be written in
   locally-used character codes, i.e., in other than the UTF-8 encoding
   of Unicode.  Doing so clearly increases flexibility, and the mailbox
   part can be defined as a simple octet string (as it essentially is in
   the sections above).  We can reasonably expect that some systems,
   operating in local environments, will use local character codes no
   matter what we specify.  On the other hand, having an application
   presented with an octet (or bit) string and not knowing what charset
   is involved would wreak havoc on any attempt to intelligently display
   local parts: if one cannot know the character coding being used, then
   it is not possible to accurately decode the characters and display
   appropriate character glyphs.
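
   The display problem can be seen in a small Python illustration.  The
   choice of ISO 8859-1 ("latin-1") as the competing local charset is
   only an assumption made for the example; the same two octets decode
   to different characters depending on which charset the receiver
   guesses.

```python
# One character, encoded as UTF-8, occupies two octets.
octets = "\u00e9".encode("utf-8")      # LATIN SMALL LETTER E WITH ACUTE

# A receiver that knows the charset recovers the original character.
as_utf8 = octets.decode("utf-8")

# A receiver that wrongly assumes ISO 8859-1 sees two unrelated
# characters (U+00C3 and U+00A9) instead of one.
as_latin1 = octets.decode("latin-1")

mismatch = as_utf8 != as_latin1
```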

   Use of local coding also implies an encoding for the local part
   different from that for the domain part: any MTA in the path must be
   able to convert the domain part into something that can be looked up
   in the DNS, and that, in turn, requires a globally-known encoding.

3.6.3 Restrictions on Characters in Local Part?

   The specification is extremely liberal about what can be included in
   a UTF-8 string that represents a local-part.  In return, it
   effectively prohibits the use of quoted strings, or quoted
   characters, in non-ASCII local parts.  Those have, in general, been
   nothing but trouble and there appears to be no reason to carry that
   trouble forward into an internationalized world (and the much greater
   complexity that quoting in that environment might imply).  There may
   be a strong case for applying restrictions, e.g., by use of a
   stringprep [RFC3454] profile that would eliminate particularly
   problematic characters while not forcing, e.g., even an approximation
   to case-mapping (remember that ASCII local-parts are inherently case
   sensitive, even though local systems are encouraged to not take
   advantage of that feature).

3.6.4 Requirement for 8BITMIME?

   This extension is carefully defined to be independent of "8BITMIME".
   However, given the length of time 8BITMIME has been around, the
   amount of deployment of it that exists, and the rather low likelihood
   that any MTA implementer in his or her right mind will go to the
   trouble of implementing this without also implementing 8BITMIME, it
   may be sensible to permit this extension only if 8BITMIME also
   appears.

3.6.5 Message Header and Body Issues with MTA Approach?

   By viewing i18n addresses as an MTA problem, this document does not
   address a number of interesting RFC 2822/MIME issues.  In particular,
   if both this extension and 8BITMIME are in use, is it sensible to
   drop the requirement for RFC 2047/2231 encoding of personal name
   fields?

3.6.6 Variant Addresses (Aliases) in a Command Verb

   A determination should be made as to whether a parameter to the MAIL
   and RCPT commands that would specify an alternate, ASCII-only,
   address is desirable and the text in Section 3.2, item 4, corrected
   accordingly.

3.6.7 The Received field 'for' clause

   Decide what to do about the value of the "for" clause in Received
   fields.  See Section 3.5.2.

4. Advice to Designers and Operators of Mail-receiving Systems

   As discussed above, in the historical Internet email context, the
   interpretation and permitted syntax for an email local-part is
   entirely the responsibility of the receiving system.  Systems can get
   themselves into trouble and, more particularly, can seriously
   restrict the number and type of users who can send mail to their
   users, by poor choices of format and syntax.  For example, general
   advice to system designers has long included "treat addresses in a
   case-independent fashion" and "do not use addresses that require
   quoting" in order to increase the odds that remote users will be able
   to properly compose and transmit intended addresses.  In a way, that
   advice is an extreme generalization of the "receiver" side of the
   robustness principle: being generous in what one accepts implies
   accepting as many plausible variations of an address local-part
   string as possible and designing the strict forms of those strings to
   facilitate differentiation when it is appropriate.

   As one moves toward internationalization of local parts, an expanded
   version of these principles is useful and may be even more
   appropriate, even though it is neither necessary nor desirable to
   turn those principles into protocol requirements.  For example, a
   receiving host should normally consider any string that would match
   under nameprep rules -- or perhaps any string that would match under
   an expanded stringprep profile -- as matching for local-part
   purposes.  An even more "liberal" receiving host might use some sort
   of variant tables for its script(s) of interest to further expand the
   matching rules.
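
   A final delivery host that chooses to follow this advice might
   compare local parts along the lines of the Python sketch below.  It
   relies on the nameprep function in Python's standard encodings.idna
   module; the function name local_parts_match is hypothetical, and
   treating strings that nameprep rejects as non-matching is an
   assumption of this sketch rather than a rule of this document.

```python
from encodings.idna import nameprep


def local_parts_match(stored: str, received: str) -> bool:
    """Compare two local parts after applying nameprep to each.

    Only the final delivery host should do this; relays must continue
    to treat the local part as opaque.
    """
    try:
        return nameprep(stored) == nameprep(received)
    except UnicodeError:
        # nameprep rejects strings with prohibited characters; this
        # sketch simply treats such strings as not matching.
        return False
```

   With this comparison, "Jos\u00e9" matches "JOSE\u0301" (upper case
   with a combining acute accent), because nameprep applies case folding
   and NFKC normalization, while "jose" and "jos\u00e9" remain distinct.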

   But, whatever extended matching rules the local host adopts, those
   rules are a property of that host.  Senders should continue to be
   conservative about what they send, and relays should continue to
   avoid presumptions about their understanding of the content of
   local-parts. Receiving systems that have reason to adopt more
   restricted syntax rules, or interpretations of matching, should
   continue to be able to do so.

5. Security Considerations

   Any expansion of permitted characters and encoding forms in email
   addresses raises the risk, however slight, of misdirected or
   undeliverable mail.  The problem is worsened if address information
   is carried in local character sets and must be converted to some
   standard form.  Any conversion of character sets may also be
   problematic for digitally-signed information.  Modulo those concerns,
   the ideas proposed here do not introduce new security issues.

6. Acknowledgements

   The author acknowledges the contributions and comments of Dave
   Crocker in a personal conversation, and the efforts of a private
   discussion group, led by Paul Hoffman and Adam Costello, to develop
   an MUA-only solution to this problem.  The author had hoped that
   effort would succeed, since the idea of requiring transport changes
   to support internationalization (or any other new function) is
   unattractive and should be avoided when possible.  Difficulties that
   group has encountered in properly defining a number of boundary
   conditions, including appropriate delimiters for permitting internal
   parsing of the local part and problems with right-to-left characters
   and substrings, have led to the conclusion that it is time to get a
   specific, transport-based, approach on the table.  While their ideas
   have inspired several of the properties of this proposal they are, of
   course, not responsible for the result and will probably disagree
   with it.

Normative References

   [RFC0821]  Postel, J., "Simple Mail Transfer Protocol", STD 10, RFC
              821, August 1982.

   [RFC1123]  Braden, R., "Requirements for Internet Hosts - Application
              and Support", STD 3, RFC 1123, October 1989.

   [RFC2279]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", RFC 2279, January 1998.

   [RFC2821]  Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
              April 2001.

   [RFC3490]  Faltstrom, P., Hoffman, P. and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
              Profile for Internationalized Domain Names (IDN)", RFC
              3491, March 2003.

   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
              for Internationalized Domain Names in Applications
              (IDNA)", RFC 3492, March 2003.

Informative References

   [I-D.hoffman-imaa]
              Hoffman, P. and A. Costello, "Internationalizing Mail
              Addresses in Applications (IMAA)", draft-hoffman-imaa-02
              (work in progress), August 2003.

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part One: Format of Internet Message
              Bodies", RFC 2045, November 1996.

   [RFC2046]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part Two: Media Types", RFC 2046,
              November 1996.

   [RFC2056]  Denenberg, R., Kunze, J. and D. Lynch, "Uniform Resource
              Locators for Z39.50", RFC 2056, November 1996.

   [RFC2156]  Kille, S., "MIXER (Mime Internet X.400 Enhanced Relay):
              Mapping between X.400 and RFC 822/MIME", RFC 2156, January
              1998.

   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
              Languages", BCP 18, RFC 2277, January 1998.

   [RFC2442]  Freed, N., Newman, D. and M. Hoy, "The Batch SMTP Media
              Type", RFC 2442, November 1998.

   [RFC2476]  Gellens, R. and J. Klensin, "Message Submission", RFC
              2476, December 1998.

   [RFC2554]  Myers, J., "SMTP Service Extension for Authentication",
              RFC 2554, March 1999.

   [RFC2556]  Bradner, S., "OSI connectionless transport services on top
              of UDP Applicability Statement for Historic Status", RFC
              2556, March 1999.

   [RFC2557]  Palme, J., Hopmann, A., Shelness, N. and E. Stefferud,
              "MIME Encapsulation of Aggregate Documents, such as HTML
              (MHTML)", RFC 2557, March 1999.

   [RFC2822]  Resnick, P., "Internet Message Format", RFC 2822, April
              2001.

   [RFC3192]  Allocchio, C., "Minimal FAX address format in Internet
              Mail", RFC 3192, October 2001.

   [RFC3454]  Hoffman, P. and M. Blanchet, "Preparation of
              Internationalized Strings ("stringprep")", RFC 3454,
              December 2002.


Author's Address

   John C Klensin
   1770 Massachusetts Ave, #322
   Cambridge, MA  02140
   USA

   Phone: +1 617 491 5735
   EMail: john-ietf@jck.com


Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances of
   licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification can
   be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.


Full Copyright Statement

   Copyright (C) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.