Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing

Note: This ballot was opened for revision 24 and is now closed.

(Spencer Dawkins) Yes

(Stephen Farrell) (was Discuss) Yes

Comment (2014-02-07)
Thank you for the additional extensive security considerations.

Barry Leiba Yes

(Pete Resnick) (was Discuss) Yes

Comment (2013-12-18 for -25)
Throughout the document (and the other documents in the series): I now understand that you intend a two stage parse for header fields and have that represented in the ABNF as a separate overall message syntax and a header field value syntax. That's fine, but I would ask that you make this clearer somewhere in section 3 of the p1 document. You talk about the parsing, but I think it is well worth describing that there are two levels of ABNF, and that the ABNF rule name corresponds to the header field name. It is fine to do it this way, but it's not the way that ABNF has been used in the past, so best to make it crystal clear.

Specific comments:


   This HTTP/1.1 specification obsoletes and moves to historic status
   RFC 2616, its predecessor RFC 2068, and RFC 2145 (on HTTP
Please, no, it doesn't (and shouldn't) move any of these documents to Historic (even if it were capitalized correctly ;-) ). It obsoletes them. Please strike "and moves to historic status". (I'm happy to give you the long explanation of why moving to Historic is not the right thing if you like.)

Also, an editorial nit: I find the "we" affectation distracting. Sounds like an academic paper. 11 occurrences in this document. "This document" or "this specification" (or simply switching to the passive voice) are much more IETF-like.

2.3 (Editorial nit) "...upstream to downstream. Likewise, we use...". It's not "Likewise" here. "Upstream" and "downstream" are only about the direction of the message and don't have anything to do with who sent/received it. "Inbound" and "outbound" refer to direction based on who it's coming from (UA or OS). Strike "Likewise".

2.5, para 8 (and ff): I'm not a fan of the "MUST..., unless..." construct. People get into stupid conformance arguments over such things. I prefer "MUST either..., or ..." or "SHOULD..., the primary exception being...".

2.7.1, para 6: Why "MAY"? What else could it do? Is this a protocol option of some sort?
para 7: The concept of "establishing authority" is not well explained here. What's the import of it?
para 8: Why "ought to"? That seems like a fine candidate for a "SHOULD": You're giving implementation advice to avoid damage.


   A proxy MUST remove any such whitespace from a
   response message before forwarding the message downstream.

Really? Wouldn't that cause the aforementioned "security vulnerability"?

   A field value is preceded by optional whitespace (OWS)...

"...and/or followed...", right?

3.2.6: This is your only use of the term "escape". A bit imprecise. I suggest reusing the quoted-pair text for quoted-cpair.

3.3.1: "encoding parameters MAY be provided by other header fields". I think MAY is wrong there. "Can"?


   A sender MUST NOT send a Content-Length header field in any message
   that contains a Transfer-Encoding header field.

Why not? Can there not ever be a Transfer-Encoding that has no implicit length? I read 3.3.3 sub 3 and I still don't get it.

4.1: I presume chunk-size can't be "0" even though the ABNF allows it?
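A minimal sketch of how a recipient might treat a zero chunk-size, assuming the common reading that a size of zero marks the last chunk rather than an ordinary empty chunk. The function name `decode_chunked` is hypothetical, chunk extensions are skipped, and any trailer part is ignored.

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode a complete chunked body held in memory (illustrative only)."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # Drop any chunk extension after ';' and parse the hexadecimal size.
        size_field = data[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field, 16)
        pos = eol + 2
        if size == 0:
            # Assumption: a zero size is the last chunk; trailers are ignored.
            return bytes(body)
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
```

With this reading, `decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")` stops at the `0` line and never yields an empty data chunk.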

4.1.1: quoted-string doesn't allow folding, does it? Why do you need a new quoted-str-nf?

5.5, first paragraph: Why do you have "MUST reconstruct" instead of "reconstructs", or simply reversing the sense of the whole paragraph and say, "An 'effective request URI' is a reconstruction of the user agent's original target URI"? I haven't found anything in the documents that says that effective request URIs are going to be passed as protocol parameters, but rather they are for local processing and comparison. Given that, the "MUST reconstruct" seems inappropriate.

5.7.1: s/The received-by field/The received-by token OR The received-by portion of the Via header field


   A non-transforming proxy MUST NOT modify the message payload (Section
   3.3 of [Part2]).  A transforming proxy MUST NOT modify the payload of
   a message that contains the no-transform cache-control directive.

I get the second sentence. But isn't the first a definition of a non-transforming proxy? If so, I think you should change "MUST NOT" to "will not" or "does not".


   A server MAY assume that an HTTP/1.1 client intends to maintain a
   persistent connection until a close connection option is received in
   a request.
   Clients and servers SHOULD NOT assume that a persistent connection is
   maintained for HTTP versions less than 1.1 unless it is explicitly

I'm not sure how to implement the option/requirement of "assume". :-) What is it that you want/expect/permit the implementation to do/not do?

6.4: A "SHOULD" should not be used to "encourage" something. This seems like an utterly empty piece of normative text. "Be nice" without other guidance doesn't seem to lead to any useful interoperability.


   Thus, a sender MUST expand the list construct as follows:
   a recipient MUST expand the list construct as follows:
The two MUSTs here strike me as goofy. Implementations of senders and recipients do not "expand" ABNF rules; they produce and parse text. Saying things like the following would make sense to me:

   In any production that uses the list construct, a sender MUST NOT
   produce empty list elements. In other words, senders MUST produce
   lists that satisfy the following syntax: [...]
   In other words, a recipient MUST accept lists that satisfy the
   following syntax: [...]
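The produce/parse framing suggested above can be sketched as follows. This is an illustration under stated assumptions, not text from the draft: the names `generate_list` and `parse_list` are hypothetical, and list elements are assumed not to contain quoted strings with embedded commas.

```python
from typing import List


def generate_list(items: List[str]) -> str:
    """Sender side: never emit empty list elements."""
    return ", ".join(item for item in items if item)


def parse_list(value: str) -> List[str]:
    """Recipient side: accept empty elements and ignore them."""
    elements = (part.strip(" \t") for part in value.split(","))
    return [element for element in elements if element]
```

Here the sender is strict and the recipient is tolerant, which is the asymmetry the two suggested sentences describe.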

(Jari Arkko) No Objection

Comment (2013-12-19 for -25)
Discussion of Meral's Gen-ART review comment seems to continue on some sub-items. It may be worthwhile to complete it before sending the document off to the RFC Editor.

(Richard Barnes) No Objection

Comment (2013-12-18 for -25)
In general, this is a very nice introduction to the HTTP architecture.  Thanks!

The above categories ... are indistinguishable from a man-in-the-middle attack.
It seems worth noting that "captive portal" is not equivalent to the other two terms; it's a special case.  I would also expand the last sentence to explain a little more why the two are equivalent, and to clarify that the distinction is technical (since one could make moral distinctions between a MitM and a porn-filtering proxy):

OLD: "They are indistinguishable from a man-in-the-middle attack."
NEW: "Because these entities intercept and modify packets without the consent of either endpoint, these entities are indistinguishable at a protocol level from a man-in-the-middle attack."

A sender MUST NOT generate protocol elements that convey a
   meaning that is known by that sender to be false.
This seems optimistic.  

In Section 3.2.2, are the scare quotes around "good practice" necessary?

In Section 3.2.3, "to white-out invalid or unwanted protocol elements" -- what does it mean to "white out" protocol elements?  To replace them with whitespace?  Why not just remove them?

COMMENT 5 (almost a DISCUSS):
In Section 4.1.2: Suppose you have an intermediary that decodes the chunked encoding of an inbound message and generates a new message with known length (Content-Length present).  It seems like you need to specify what happens to trailer fields in this case.  The answer seems to be that they're just appended to the header, but AFAICT, that's not specified in the text.

(Stewart Bryant) No Objection

(Benoît Claise) No Objection

Comment (2013-12-17 for -25)
Thanks Tom Nadeau for your OPS-DIR review. I know how much you spent on this one!

I see the HEAD request, which I didn't know about:

   Transfer-Encoding MAY be sent in a response to a HEAD request or in a
   304 (Not Modified) response (Section 4.1 of [Part4]) to a GET

I was wondering: where is the list of valid HTTP operations defined?

GET: The GET method is used to retrieve information from the given server using a given URI. Requests using GET should only retrieve data and should have no other effect on the data.
HEAD: Same as GET, but only transfer the status line and header section.
POST: A POST request is used to send data to the server, for example customer information, file upload etc. using HTML forms.
PUT: Replace all current representations of the target resource with the uploaded content.
DELETE: Remove all current representations of the target resource given by URI.
CONNECT: Establish a tunnel to the server identified by a given URI.
OPTIONS: Describe the communication options for the target resource.
TRACE: Perform a message loop-back test along the path to the target resource.

I finally found it (my mistake was that I was searching for "operation" in the document while the correct term is "method"):

   The request methods defined by this specification can be found in
   Section 4 of [Part2], along with information regarding the HTTP
   method registry and considerations for defining new methods.

This points to:


       4.3.  Method Definitions . . . . . . . . . . . . . . . . . . . 24
       4.3.1.  GET  . . . . . . . . . . . . . . . . . . . . . . . . . 24
       4.3.2.  HEAD . . . . . . . . . . . . . . . . . . . . . . . . . 25
       4.3.3.  POST . . . . . . . . . . . . . . . . . . . . . . . . . 25
       4.3.4.  PUT  . . . . . . . . . . . . . . . . . . . . . . . . . 26
       4.3.5.  DELETE . . . . . . . . . . . . . . . . . . . . . . . . 29
       4.3.6.  CONNECT  . . . . . . . . . . . . . . . . . . . . . . . 30
       4.3.7.  OPTIONS  . . . . . . . . . . . . . . . . . . . . . . . 31
       4.3.8.  TRACE  . . . . . . . . . . . . . . . . . . . . . . . . 32

Anyway, an extra sentence, such as the following one, would have helped me:
    "Existing methods are GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, TRACE"

Also change "HEAD request" to "HEAD request method". Similar remark for "GET request".

(Adrian Farrel) No Objection

(Brian Haberman) No Objection

(Joel Jaeggli) No Objection

Comment (2013-12-16 for -25)
to ops-dir review noted (consistent with respect to usage in the document) but otherwise inconsistent employment of the term coding vs encoding in HTTP 1.0 vs 1.1 vs here. I guess it's too much to ask that the name of the header field and the term of art employed to describe it be consistent.

(Ted Lemon) (was Discuss) No Objection

Comment (2014-02-01 for -25)
In 2.7.3:

   Characters other
   than those in the "reserved" set are equivalent to their percent-
   encoded octets (see [RFC3986], Section 2.1): the normal form is to
   not encode them.

There's no explicit reference for the definition of a reserved set; this could be easily fixed thusly:

   Characters other
   than those in the "reserved" set  (see [RFC3986], Section 2.2) 
   are equivalent to their percent-
   encoded octets (see [RFC3986], Section 2.1): the normal form is to
   not encode them.

Given that they follow each other, the reader will probably find the information either way, but it might be better to include both references.
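As a rough illustration of the normalization being discussed, the sketch below decodes percent-encoded octets only when they correspond to the "unreserved" characters of [RFC3986], Section 2.3. That choice is a conservative assumption rather than the draft's exact wording, and the function name `normalize_percent_encoding` is hypothetical.

```python
import re

# Unreserved characters per RFC 3986, Section 2.3.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def normalize_percent_encoding(uri: str) -> str:
    """Decode %XX triplets for unreserved characters; uppercase the rest."""
    def replace(match: "re.Match") -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    return re.sub(r"%([0-9A-Fa-f]{2})", replace, uri)
```

Under these assumptions `%7E` becomes `~`, while `%2f` stays encoded (as `%2F`) because "/" is in the reserved set.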

Section 3, Page 20, second paragraph:

   A recipient MUST parse an HTTP message as a sequence of octets in an
   encoding that is a superset of US-ASCII [USASCII].  Parsing an HTTP
   message as a stream of Unicode characters, without regard for the
   specific encoding, creates security vulnerabilities due to the
   varying ways that string processing libraries handle invalid
   multibyte character sequences that contain the octet LF (%x0A).

I don't understand what this means.   I think I can guess what it means, but that's probably dangerous. What I think it means is that my reader should process the stream as UTF-8, storing it in a normalized Unicode format, either failing to process the request or doing something "sensible" when bad UTF-8 sequences are encountered, and the normalized Unicode should then be passed to the parser that parses the header lines. Is that roughly what's meant? If so, I think it could be more clearly stated.
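One possible reading of the quoted requirement, which the ballot text does not confirm, is that line splitting happens on raw octets and character decoding happens only afterwards. The sketch below assumes ISO-8859-1 as the US-ASCII superset, since it maps every octet to exactly one character; the function name `split_header_lines` is hypothetical.

```python
from typing import List


def split_header_lines(raw: bytes) -> List[str]:
    """Split the start line and header fields at the octet level, then decode."""
    # Locate the end of the header section and the line breaks on bytes,
    # so no multibyte sequence can hide or introduce a line terminator.
    head, _, _ = raw.partition(b"\r\n\r\n")
    # Assumption: ISO-8859-1, a US-ASCII superset where decoding cannot fail.
    return [line.decode("latin-1") for line in head.split(b"\r\n")]
```

Because the split is done before any decoding, an invalid multibyte sequence cannot change where a line ends.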

The list rule exception mentioned in section 1.2 confused the hell out of me until I got to section 7.   Why is section 7 not a subsection of section 2?   I assume the answer is "because it's long, and would suck the wind out of the document if it were at the beginning," which is fine, but if so, it would be nice if the text in 1.2 did a *bit* more foreshadowing.   E.g.:

   This specification uses the Augmented Backus-Naur Form (ABNF)
   notation of [RFC5234] with an extension defined in
   Section 7 that adds compact support for comma-separated lists
   with the addition of a # token to the usual ABNF token set, similar
   to the * token.  Appendix B shows the collected ABNF with the list
   rule expanded.

BTW, none of the hassling I have done here in these DISCUSSes and comments should be construed as a lack of enthusiasm for this document.   It's really obvious that a lot of care went into this document—I'm seeing all kinds of really good advice based on practice in terms of how not to do an implementation that will be vulnerable to a variety of issues.   I am very enthusiastic about this document.   Thank you very much for doing it.

Former DISCUSSes, which have been addressed:

Point 1:

In 2.7.1, end of last paragraph:
   Before making use of an "http" URI
   reference received from an untrusted source, a recipient ought to
   parse for userinfo and treat its presence as an error; it is likely
   being used to obscure the authority for the sake of phishing attacks.

Why no normative language here?   I'm assuming this was deliberate, but it seems like the wrong call.   Why not propose that the recipient reject this out of hand, unless there's some strong reason not to?    I expect that you will explain why you made this decision and it will make sense to me, in which case that will resolve this DISCUSS point; otherwise, changing "ought to" to "SHOULD" would also satisfy.   The referenced section of RFC 3986 has some good text on why this is important, but this document doesn't repeat much of it, so I'm concerned that a new reader wouldn't really get the significance of this advice.

Point 2:

In 5.5, suppose I connect to foo.example.org on port 80, and send the following:
  GET / HTTP/1.1
  Host: foo.example.org:8080

This produces an effective URI of http://foo.example.org:8080/.   What is the server supposed to do at this point?
The obvious way to resolve this DISCUSS point is to update the text to address this problem.   I think this example has the same property that leads you to require a 301 or 400 status in section 3.1.1.
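The mismatch in this example can be made concrete with a small sketch. It assumes an origin-form request target and the "http" scheme; the names `effective_request_uri` and `host_port` are hypothetical, and what a server ought to do on a mismatch is left open, as in the question above.

```python
def effective_request_uri(request_target: str, host: str, scheme: str = "http") -> str:
    """Reconstruct the effective request URI for an origin-form target."""
    if request_target.startswith(("http://", "https://")):
        return request_target  # absolute-form is used as received
    return f"{scheme}://{host}{request_target}"


def host_port(host: str, default_port: int = 80) -> int:
    """Return the port named in a Host field value, or the default."""
    _, separator, port = host.rpartition(":")
    if separator and port.isdigit():
        return int(port)
    return default_port


uri = effective_request_uri("/", "foo.example.org:8080")
# The connection arrived on port 80, but the Host field names port 8080.
mismatch = host_port("foo.example.org:8080") != 80
```

A server could use a check like `mismatch` to detect the situation, but the choice of response is exactly what the text needs to specify.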

Point 3:

In 3.2.4, paragraph 1:
   server MUST reject any received request message that contains
   whitespace between a header field-name and colon with a response code
   of 400 (Bad Request).  A proxy MUST remove any such whitespace from a
   response message before forwarding the message downstream.

Why the different handling in the two cases?   Is it really less bad (and hence salvageable) in the response?   What if a user agent receives such whitespace?   I expect you'll address this point by explaining why this is an issue in requests and not in responses, or else by at least adding text about how user agents should deal with this situation.   I am asking this question based on the inconsistency I see here, not any special insight I have into the problem, so I'm assuming there's a straightforward explanation.
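The two behaviours in the quoted text can be sketched side by side. This is only an illustration of the asymmetry being questioned: the exception `BadRequest` and both function names are hypothetical, and handling by user agents is deliberately not shown because the quoted text does not cover it.

```python
import re

# A field-name (token characters) followed by whitespace before the colon.
_WS_BEFORE_COLON = re.compile(rb"^([!#$%&'*+\-.^_`|~0-9A-Za-z]+)[ \t]+:")


class BadRequest(Exception):
    """Hypothetical signal that the server should answer 400 (Bad Request)."""


def server_check_request_field(line: bytes) -> bytes:
    """Server side: reject a request field with whitespace before the colon."""
    if _WS_BEFORE_COLON.match(line):
        raise BadRequest("whitespace between field-name and colon")
    return line


def proxy_fix_response_field(line: bytes) -> bytes:
    """Proxy side: remove such whitespace from a response field."""
    return _WS_BEFORE_COLON.sub(rb"\1:", line, count=1)
```

The same malformed line is therefore fatal in a request but silently repaired in a response.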

(Martin Stiemerling) No Objection

(Sean Turner) (was Discuss) No Objection

Comment (2013-12-19 for -25)
Caveat: I know this is a bis draft but since you hacked it up for clarity, I figured I'd give you both barrels when reading it (i.e., it goes to "11" on the nits scale).  With that said, I would not like to see any of my comments hold progression of this draft up for a microsecond.  Feel free to consider these *if* you're making other changes before progressing to Approved or during AUTH48.

*) Support Stephen's discuss.

0) abstract: The WWW global initiative is a reference to this: http://www.w3.org/Summary.html , which hasn't been updated since ~1991/2?  Maybe we can drop the reference to that and just say:

  HTTP has been very widely used since 1990.
  HTTP is the foundation of [this thing you might have heard of called]
  the World Wide Web architecture.

2) Abstract & s1: to match s2.1:
    r/an application-level request/response protocol
     /an application-level stateless request/response protocol

3) (no action required) Thanks for the collected ABNF in Appendix B.

4) s2.1: What's the difference between a native application and a mobile app?  Isn't a mobile app on a mobile phone a native application for that mobile phone?

5) s2.3: Maybe worth explaining what a public network access point might be by adding: (e.g., accessing the Internet from a hotel).

6) s2.3: Mentions that proxies are selected through local configuration rules: Should we note these might be set by an administrator and that users should be aware of these settings?

7) s2.3: Would it be better to say proprietary:
    r/Some non-standard HTTP extensions (e.g., [RFC4559])
     /Some proprietary HTTP extensions (e.g., [RFC4559])

8) s2.6: Shouldn't we be future proofing this protocol to address the two digit version bug :)  Never mind I got to A.2 and find out folks can't handle two digit versions consistently.

9) s2.7: Does there need to be a statement that all entities MUST support URIs as defined in RFC 3986?  There's some language earlier about relying upon URIs etc. but there isn't a specific MUST support.

A) s2.7: Maybe add the following before the list:

   The following provide references for the URI syntax used in this document:

B) s2.7.1, 2nd para: r/optional/OPTIONAL in reference to the query.  It would make the text match the ABNF syntax ;)

C) s2.7.1, 2nd para last sentence: Mentions the path and query component but omits the fragment component - but the fragment is in the http-URI exhibit above.  Maybe worth including in the sentence for completeness.

D) s2.7.1: Please expand WWW on first use ;)
  [Actually, I'd ask the RFC editor to include it in their list of abbreviations:
  http://www.rfc-editor.org/rfc-style-guide/abbrev.expansion.txt so that no one
  will ever see this comment ever again.]

E) s2.7.1: Is Internet Name = registered or domain names? and is Internet address = IP number?  Those terms are used later so maybe:

    its registered name or IP address

   "http" URI scheme makes use of the delegated nature of Internet domain names
   and IP addresses to establish a naming authority (whatever entity has
   the ability to place an HTTP server at that Internet domain name or IP address)
   and allows that authority to determine which domain names are valid and how
   they might be used.

Then again this might all have been carefully crafted to avoid some long running debate that I am unaware of, in which case this should be ignored.

F) 2.7.2: Personally, I'd drop the [RFC0793] reference for the TCP port, it's already in the http schema and you reference that scheme from this scheme.

10) s3, 1st para: r/optional/OPTIONAL - would make the text match the ABNF.

11) s3.1.2: If clients are going to be ignoring the reason-phrase, should p1 or p2 say something about not emitting it?  I mean what with all the need for speed from web search engines/browsers shouldn't we be trying to not send stuff that's going to promptly be ignored?

12) s3.2: r/optional/OPTIONAL x2 - would make the text match the ABNF

13) s3.2.3: should optional and required be replaced by their 2119 keywords?

14) s3.3.1: (see #2) What does a client do if the server chunks more than once, if a server sends a Transfer-Encoding header when it shouldn't have?

15) s4.1.2: "The above requirement" is a little vague; is that the MUST immediately preceding the last paragraph?  Maybe:

  The requirement to generate an empty trailer prevents .....

16) s4.1.3: Pseudo-code needs error conditions for handling the MUST NOTs ;)

17) s4.2.2: r/incorrect/non-standard or non-conformant

18) s5.3: Is the origin-form before the 2nd paragraph supposed to be there?  Oh wait those are supposed to be subsections?  Can't you just call it 5.3.1 origin-form, 5.3.2 absolute-form, etc.?

19) s5.7.2: might be worth putting a reference in to Part6 after the first use of non-transform cache-control ... granted I did figure out where it was defined after remembering the 

1A) s5.7.2 and s2.3: s2.3 mentions privacy proxies and s5.7.2 says the following about proxies without qualifying the type of proxy:

  A proxy MUST NOT modify header fields that provide information about
  the end points of the communication chain, the resource state, or the
  selected representation.

So does that essentially mean privacy-filtering proxies are non-conformant?

1B) s6.7: OPTIONAL?:

   its acceptance
   and use by the server is optional

1C) Seems like you should just provide the form.  I'm wondering whether the POC includes an actual method of contact or not?  Having seen this done in the past, it's probably worth being pedantic and saying that they can change the registration but they need to tell IANA they're doing so.