Internet Engineering Task Force                                 M. Davis
Internet-Draft                                                    Google
Intended status: BCP                                         A. Phillips
Expires: July 17, 2010                                            Lab126
                                                               Y. Umaoka
                                                                     IBM
                                                        January 13, 2010


                           BCP 47 Extension U
                      draft-davis-u-langtag-ext-00

Abstract

   This document specifies an Extension to BCP 47 which provides subtags
   that specify language and/or locale-based behavior or refinements to
   language tags, according to work done by the Unicode Consortium.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 17, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents



Davis, et al.             Expires July 17, 2010                 [Page 1]


Internet-Draft       BCP 47 Unicode Locale Extension        January 2010


   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . 3
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . . . 3
   2.  BCP47 Required Information  . . . . . . . . . . . . . . . . . . 3
     2.1.  Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 4
       2.1.1.  Canonicalization  . . . . . . . . . . . . . . . . . . . 5
     2.2.  Registration Form . . . . . . . . . . . . . . . . . . . . . 5
   3.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . . . 5
   4.  IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 6
   5.  Security Considerations . . . . . . . . . . . . . . . . . . . . 6
   6.  References  . . . . . . . . . . . . . . . . . . . . . . . . . . 6
     6.1.  Normative References  . . . . . . . . . . . . . . . . . . . 6
     6.2.  Informative References  . . . . . . . . . . . . . . . . . . 6
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . . . 6



























Davis, et al.             Expires July 17, 2010                 [Page 2]


Internet-Draft       BCP 47 Unicode Locale Extension        January 2010


1.  Introduction

   [BCP47] permits the definition and registration of language tag
   extensions "that contain a language component and are compatible with
   applications that understand language tags".  This document defines
   an extension for identifying Unicode locale-based variations using
   language tags.  The "singleton" identifier for this extension is 'u'.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.


2.  BCP47 Required Information

   Language tags, as defined by [BCP47], are useful for identifying the
   language of content.  They are also used as locale identifiers (or
   can be mapped to locales) in many operating environments and APIs.
   However, most such locale identifiers also provide additional
   "tailorings" or options for specific values within a language,
   culture, region, or other variation.  This extension provides a
   mechanism for using these additional tailorings within language tags
   for general interchange.

   The maintaining authority for this extension's registry is the
   Unicode Consortium.  Unicode defines common locale data and
   identifiers for this data:

   +---------------+---------------------------------------------------+
   | Item          | Value                                             |
   +---------------+---------------------------------------------------+
   | Name          | Unicode Consortium                                |
   | Contact Email | cldr@unicode.org                                  |
   | Discussion    | cldr-users@unicode.org                            |
   | List Email    |                                                   |
   | URL Location  | cldr.unicode.org                                  |
   | Specification | Unicode Technical Standard #35 Unicode Locale     |
   |               | Data Markup Language (LDML),                      |
   |               | http://unicode.org/reports/tr35/                  |
   | Section       | Section 3.2 BCP 47 Tag Conversion                 |
   +---------------+---------------------------------------------------+

   The specification of extension subtags is provided by Section 3 of
   Unicode Technical Standard #35 Unicode Locale Data Markup Language
   [LDML].  As required by BCP 47, subtags follow the language tag ABNF
   and other rules for the formation of language tags and subtags, are



Davis, et al.             Expires July 17, 2010                 [Page 3]


Internet-Draft       BCP 47 Unicode Locale Extension        January 2010


   restricted to the ASCII letters and digits, are not case sensitive,
   and do not exceed eight characters in length.

   [LDML] specifies a canonical representation.  LDML is available over
   the Internet and at no cost, and is available via a royalty-free
   license at http://unicode.org/copyright.html.  LDML is versioned, and
   each version of LDML is numbered, dated, and stable.  Extension
   subtags, once defined by LDML, are never retracted or change in
   meaning in a substantial way.

2.1.  Summary

   The subtags available for use in the 'u' extension consist of a set
   of attributes, keys, and types.  Attributes, keys, types, and their
   respective meanings are defined in Section 3 (Unicode Language and
   Locale Identifiers) of [LDML].  The following is a summary of that
   definition (for details see Section 3):

   o  An 'attribute' is a subtag with a length of three or more
      characters following the singleton and preceding any 'keyword'
      sequences.  No attributes were defined at the time of this
      document's publication.

   o  A 'keyword' is a sequence of subtags consisting of a 'key' subtag,
      followed by zero or more 'type' subtags.  Each 'key' MUST be
      unique within the extension.  The order of the 'type' subtags
      within a 'keyword' is sometimes significant to their
      interpretation.  Note that 'keys' can appear without a subsequent
      'type' subtag.

      A.  A 'key' is a subtag with a length of exactly two characters.
          Each 'key' is followed by zero or more 'type' subtags.

      B.  A 'type' is a subtag with a length of three or more characters
          following a key.  'Type' subtags are specific to a particular
          'key' and the order of the 'type' subtags MAY be significant
          to the interpretation of the 'keyword'.

   For example, the language tag "de-DE-u-attr-co-phonebk" consists of:

   o  The base language tag "de-DE" (German as used in Germany), exactly
      as defined by [BCP47] using subtags from the IANA Language Subtag
      Registry.

   o  The singleton 'u', identifying this extension.

   o  The attribute 'attr', which is an example for illustration (no
      attributes were defined at the time this document was published).



Davis, et al.             Expires July 17, 2010                 [Page 4]


Internet-Draft       BCP 47 Unicode Locale Extension        January 2010


   o  The keyword 'co-phonebk', consisting to the key 'co' (Collation)
      and the type 'phonebk' (Phonebook collation order).

   With successive versions of [LDML], additional attributes, keys, and
   types MAY be defined.  Once defined, attributes, keys, and types will
   never be removed.  Machine-readable files listing the valid
   attributes, keys, and types are available in the CLDR repository for
   each version.  For example, for version 1.7.2, the files are located
   at http://unicode.org/repos/cldr/tags/release-1-7-2/common/bcp47/.
   These also can contain aliases which were used in previous versions
   of [LDML].

2.1.1.  Canonicalization

   As required by [BCP47], case is not significant.  The canonical form
   for all subtags in the extension is lowercase.  The canonical order
   of attributes is in [US-ASCII] order (that is, numbers before
   letters, with letters sorted as lowercase US-ASCII code points).  The
   canonical order of keywords is in [US-ASCII] order by key.  The order
   of subtags within a keyword is significant; the meaning of this
   extension is altered if those subtags are rearranged.  Thus, the
   canonical form of the extension never reorders the subtags within a
   keyword.

2.2.  Registration Form

   Per [RFC5646], Section 3.7:
   %%
   Identifier: u
   Description: Unicode Locale
   Comments: Subtags for the identification of language and cultural
       variations. Used to set behavior in locale APIs.
   Added: 2009-mm-dd
   RFC: [TBD]
   Authority: Unicode Consortium
   Contact_Email: cldr@unicode.org
   Mailing_List: cldr-users@unicode.org
   URL: http://cldr.unicode.org
   %%


3.  Acknowledgements

   Thanks to John Emmons and the rest of the Unicode CLDR Technical
   Committee for their work in developing the BCP 47 subtags for LDML.






Davis, et al.             Expires July 17, 2010                 [Page 5]


Internet-Draft       BCP 47 Unicode Locale Extension        January 2010


4.  IANA Considerations

   This document will require IANA to insert the record in Section 2.2
   into the Language Extensions Registry, according to Section 3.7.
   Extensions and the Extensions Registry of "Tags for Identifying
   Languages" in [BCP47].  There might be occasional maintenance of this
   record.  This document does not require IANA to create or maintain a
   new registry or otherwise impact IANA.


5.  Security Considerations

   The security considerations for this extension are the same as those
   for [RFC5646] (or its successors).  See Section 6.  Security
   Considerations of [RFC5646].


6.  References

6.1.  Normative References

   [BCP47]    Davis, M., Ed., "Tags for the Identification of Language
              (BCP47)", September 2009.

   [LDML]     Davis, M., "Unicode Technical Standard #35: Locale Data
              Markup Language (LDML)", December 2007,
              <http://www.unicode.org/reports/tr35/>.

   [RFC5646]  Phillips, A. and M. Davis, "Tags for Identifying
              Languages", BCP 47, RFC 5646, September 2009.

   [US-ASCII]
              International Organization for Standardization, "ISO/IEC
              646:1991, Information technology -- ISO 7-bit coded
              character set for information interchange.", 1991.

6.2.  Informative References

   [ldml-registry]
              "Registry for Common Locale Data Repository tag elements",
              September 2009.










Davis, et al.             Expires July 17, 2010                 [Page 6]


Internet-Draft       BCP 47 Unicode Locale Extension        January 2010


Authors' Addresses

   Mark Davis
   Google

   Email: mark@macchiato.com


   Addison Phillips
   Lab126

   Email: addison@inter-locale.com


   Yoshito Umaoka
   IBM

   Email: yoshito_umaoka@us.ibm.com

































Davis, et al.             Expires July 17, 2010                 [Page 7]