Network Working Group                                    O. Mazahir
Internet Draft                                            D. Thaler
Intended status: Standards Track                             M. Cox
Expires: August 2014                                  G. Montenegro
                                              Microsoft Corporation
                                                   14 February 2014



                       Deterministic URI Encoding
                draft-montenegro-httpbis-uri-encoding-00


Abstract

   The "http" and "https" URI schemes do not have a fixed character
   encoding. This document defines HTTP headers to enable an
   explicit indication of the character encoding.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance
   with the provisions of BCP 78 and BCP 79. This document may
   contain material from IETF Documents or IETF Contributions
   published or made publicly available before November 10, 2008.
   The person(s) controlling the copyright in some of this material
   may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards
   Process.  Without obtaining an adequate license from the
   person(s) controlling the copyright in such materials, this
   document may not be modified outside the IETF Standards Process,
   and derivative works of it may not be created outside the IETF
   Standards Process, except to format it for publication as an RFC
   or to translate it into languages other than English.

   Internet-Drafts are working documents of the Internet
   Engineering Task Force (IETF), its areas, and its working
   groups.  Note that other groups may also distribute working
   documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-
   Drafts as reference material or to cite them other than as "work
   in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.


Mazahir, et. al.                                           [Page 1]


Internet-Draft        Deterministic URI Encoding      February 2014


   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August, 2014.

Copyright

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described
   in Section 4.e of the Trust Legal Provisions and are provided
   without warranty as described in the Simplified BSD License.

Table of Contents


   1. Introduction...................................................2
      1.1. Requirements Language.....................................3
   2. URI Path and Query Encoding Headers............................3
   3. IANA Considerations............................................4
      3.1. URI-Path-Encoding.........................................4
      3.2. URI-Query-Encoding........................................4
   4. Security Considerations........................................5
   5. Acknowledgments................................................5
   6. References.....................................................5
      6.1. Normative References......................................5
      6.2. Informative References....................................5
   7. Author's Addresses.............................................6

1. Introduction

   The "http" and "https" URI schemes don't have a fixed character
   encoding. The URI RFC [RFC3986] talks about the generic syntax
   for URI components:

   . Legacy URI components (before 2005) tend to use UTF-8 "or
     some other superset of the US-ASCII character encoding"
   . New schemes (after 2005) use UTF-8 with percent encoding for
     reserved characters.

   The first bullet explains why the character encoding for "http"
   and "https" URIs is not deterministic. This is particularly
Mazahir, et. al.                                           [Page 2]


Internet-Draft        Deterministic URI Encoding      February 2014


   problematic when parsing URIs at the server side or at
   intermediate proxies (e.g., when looking for a cache hit).

   URI's have different components with different character
   encoding issues.

   Per the IDNA rules in [RFC5890], the host component is encoded
   using A-labels.

   There is more non-determinism with respect to the path and query
   components. Furthermore, these two components are not
   necessarily encoded the same way [Handbook].

   This document defines HTTP headers that explicitly state the
   character encoding for the path and query components.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described
   in RFC 2119 [RFC2119].



2. URI Path and Query Encoding Headers

   The URI Path encoding is conveyed in the following header:

      URI-Path-Encoding   = "URI-Path-Encoding" ":" 1charset

   The URI Query encoding is conveyed in the following header:

      URI-Query-Encoding   = "URI-Query-Encoding" ":" 1charset

   charset is defined in section 3.4 of [RFC2616]. The expected value
   indicates the character encoding for the path or query component in
   the URI prior to percent encoding. (A value of UTF-8 does not mean
   that the URI carries raw UTF-8.)

   If the user agent is certain that the path component was formed from
   percent-encoded UTF-8, it sets the header as follows:

      URI-Path-Encoding: UTF-8

   Similarly, for the query component:

      URI-Query-Encoding: UTF-8

Mazahir, et. al.                                           [Page 3]


Internet-Draft        Deterministic URI Encoding      February 2014


   This signals that the query component in the URI is in UTF-8 with
   percent encoding.

   Absence of the URI-Path-Encoding or URI-Query-Encoding header is
   equivalent to the legacy situation of non-determinism with respect
   to the path or query component, respectively, as mentioned above in
   section 1.

   Likewise, if the URI-Path-Encoding or URI-Query-Encoding header is
   set to an invalid value or unrecognized charset, this is equivalent
   to the legacy situation of non-determinism with respect to the path
   or query component, respectively, mentioned above in section 1.



3. IANA Considerations

   IANA is requested to add these headers to the "Permanent Message
   Header Field Names" registry. Per [RFC3864], the template for
   these headers is specified below.



3.1. URI-Path-Encoding

      Applicable protocol: http

      Status: standard

      Author/change controller:

         IETF (iesg@ietf.org)

      Specification document(s):

         This document.



3.2. URI-Query-Encoding

      Applicable protocol: http

      Status: standard

      Author/change controller:

         IETF (iesg@ietf.org)

Mazahir, et. al.                                           [Page 4]


Internet-Draft        Deterministic URI Encoding      February 2014


      Specification document(s):

         This document.



4. Security Considerations

   Due to the non-deterministic character encoding of URI's, URI
   parsing at servers or proxies currently may involve trying
   different possible character encodings searching for a match.
   This represents a potential attack vector [RFC6943]. The headers
   proposed in this document could be used to reduce the attack
   surface by enabling a more explicit interpretation of the data
   within a URI, thus preventing unintended consequences.



5. Acknowledgments

   Thanks to Ivan Pashov and Wade Hilmo for useful discussions in
   this space.

   This document was prepared using 2-Word-v2.0.template.doc.



6. References

6.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter,
              "Uniform Resource Identifier (URI): Generic Syntax",
              STD 66, RFC 3986, January 2005.


6.2. Informative References

   [Handbook] Zalewski, M., "Browser Security Handbook, part 1",
              http://code.google.com/p/browsersec/wiki/Part1
Mazahir, et. al.                                           [Page 5]


Internet-Draft        Deterministic URI Encoding      February 2014


              March 2011.


   [RFC3864]  Klyne, G., Nottingham, M., and J. Mogul, "Registration
              Procedures for Message Header Fields", BCP 90, RFC 3864,
              September 2004.

   [RFC5890]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Definitions and Document Framework",
              RFC 5890, August 2010.

   [RFC6943]  Thaler, D., Ed., "Issues in Identifier Comparison for
              Security Purposes", RFC 6943, May 2013.


7. Author's Addresses

   Osama Mazahir
   Microsoft Corporation

   Email: OsamaM@microsoft.com


   Dave Thaler
   Microsoft Corporation

   Email: DThaler@microsoft.com


   Matthew Cox
   Microsoft Corporation

   Email: MaCox@microsoft.com


   Gabriel Montenegro
   Microsoft Corporation

   Phone:
   Email: gabriel.montenegro@microsoft.com







Mazahir, et. al.                                           [Page 6]