Last Call Review of draft-ietf-json-text-sequence-09
review-ietf-json-text-sequence-09-genart-lc-black-2014-12-05-00

Request Review of draft-ietf-json-text-sequence
Requested rev. no specific revision (document currently at 13)
Type Last Call Review
Team General Area Review Team (Gen-ART) (genart)
Deadline 2014-12-05
Requested 2014-11-28
Authors Nicolás Williams
Draft last updated 2014-12-05
Completed reviews Genart Last Call review of -09 by David Black (diff)
Genart Last Call review of -11 by David Black (diff)
Secdir Last Call review of -09 by Carl Wallace (diff)
Opsdir Last Call review of -09 by David Black (diff)
Opsdir Telechat review of -11 by David Black (diff)
Assignment Reviewer David Black
State Completed
Review review-ietf-json-text-sequence-09-genart-lc-black-2014-12-05
Reviewed rev. 09 (document currently at 13)
Review result On the Right Track
Review completed: 2014-12-05

Review
review-ietf-json-text-sequence-09-genart-lc-black-2014-12-05

This is a combined Gen-ART and OPS-Dir review.  Boilerplate for both follows ...

I am the assigned Gen-ART reviewer for this draft. For background on
Gen-ART, please see the FAQ at:

<

http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Please resolve these comments along with any other Last Call comments
you may receive.

I have reviewed this document as part of the Operational directorate's ongoing
effort to review all IETF documents being processed by the IESG.  These comments
were written primarily for the benefit of the operational area directors.
Document editors and WG chairs should treat these comments just like any other
last call comments.

Document: draft-ietf-json-text-sequence-09
Reviewer: David Black
Review Date: Dec 5, 2014
IETF LC End Date: Dec 5, 2014
IESG Telechat date: Dec 18, 2014

Summary: This draft is on the right track, but has open issues
 		described in the review.

This draft specifies a format that packs multiple JSON texts into a
single string.  The ASCII RS (0x1E) character is used to separate texts,
and a linefeed is appended to each text to ensure that a complete text
always ends with a whitespace character.

All of the open issues are minor - the most important ones center on
treatment of incomplete JSON texts - that appears to be an afterthought
in this draft and needs more attention.  I also found a couple of
minor issues in the Security and IANA Considerations sections, both of
which are almost nits.

Major issues: None.

Minor issues:

[A] Section 2.1:

   If parsing of such an octet string as a JSON text fails, the parser
   SHOULD nonetheless continue parsing the remainder of the sequence.

That's not quite right - there are two levels of parsing, JSON
sequence parsing and JSON text parsing of each text in the sequence,
both of which might be implemented in a single-pass parser.  For such an
implementation, the above sentence could be (mis-)read to imply that the
JSON text parse should resume from the point at which it failed, which
would be silly (although I've seen heroic PL/1 parsers do exactly that).
Instead, the parse needs to skip ahead to the next RS, ignoring the rest
of the JSON text that failed to parse.  I suggest:

   If parsing of such an octet string as a JSON text fails, and the
   octet string is followed by an RS octet, the parser
   SHOULD nonetheless skip ahead to that RS octet and continue parsing
   the remainder of the sequence from there.

That also covers the case where there is nothing more to parse after the
JSON text that caused the parse failure.

[B] Section 2.3:

Is incremental parsing of a JSON text within a sequence allowed, or
is the parser required to not produce any results until the parse of
the entire text is successful?  I'd expect that incremental parsing
is ok (so results may be produced from a text that ultimately fails
to parse), and I think that's worth stating.

[C] Section 2.4:

   Parsers MUST check that any JSON texts that are a top-level number
   include JSON whitespace ("ws" ABNF rule from [RFC7159]) after the
   number, otherwise the JSON-text may have been truncated.

That reference to the "ws" rule doesn't get the job done because that
rule allows a match to no characters - it's of the form ws = *( ... )
where ... is the list of whitespace characters.  What's needed here is
a rule of the form vws = 1*( ...) to force there to be at least one
whitespace character, but see the next issue for a better way to deal
with this topic by pulling the appended LF into the sequence parse
instead of the text parse.

[D] I wonder whether the possibility of incomplete texts ought to be
encoded into the parsing rules to directly catch JSON texts that must
be incomplete because the last character is not LF, e.g.:

     JSON-sequence = *(1*RS (possible-JSON / truncated-JSON / empty-JSON))
     RS = %x1E; "record separator" (RS), see RFC20
     possible-JSON = 1*(not-RS) LF ; attempt to parse as UTF-8-encoded
                               ; JSON text (see RFC7159)
     truncated-JSON = *(not-RS) not-LFRS); truncated, don't attempt
					; to parse as JSON text
     empty-JSON = LF ; only the LF appended by the encoder, nothing to parse

     not-RS = %x00-1D / %x1F-FF; any octet other than RS
     not-LFRS = %x00-09/ %x1B-1D / %x1F-FF; any octet other than RS or LF

Note that this won't detect all incomplete JSON texts, because LF is allowed
within a JSON text (and this should be stated).

[E] Section 3 - Security Considerations

Incomplete and malformed JSON texts can be used to attack JSON parsers -
that should be pointed out, as I don't see that in RFC 7159's security
considerations and incomplete texts are a relevant consideration for
this draft.

[F] Section 4 - IANA Considerations

   Security considerations: See <this document, once published>,
   Section 3.

   Interoperability considerations: Described herein.

   Published specification: <this document, once published>.

   Applications that use this media type: <by publication time
   <

https://stedolan.github.io/jq

> is likely to support this format>.

Replace all three instances of the angle bracketed text.  The first two
instances should be RFC references (e.g., RFC XXXX) w/a note to the RFC
Editor to insert the number of the RFC when published.  The third instance
should be resolved now, or could have an RFC Editor note added indicating
that the author will resolve that during Authors 48 hours.

Nits/editorial comments:

idnits didn't like the reference to RFC 20 for ASCII:

  ** Downref: Normative reference to an Unknown state RFC: RFC   20

RFC 5234 (ABNF) uses this, which looks like a better reference:

   [US-ASCII]  American National Standards Institute, "Coded Character
               Set -- 7-bit American Standard Code for Information
               Interchange", ANSI X3.4, 1986.

--- Selected RFC 5706 Appendix A Q&A for OPS-Dir review ---

Most of these questions are n/a because this draft describes a format
that will be used in other protocols to which RFC 5706's concerns would apply.

A.1.4   Have the Requirements on other protocols and functional
       components been discussed?

The specification of the interaction of the JSON sequence parser with the
JSON text parser is not as clear as it should be for incomplete or malformed
JSON texts.  See Minor Issues [A]-[E] above.

A.1.8   Are there fault or threshold conditions that should be reported?

Yes, incomplete JSON texts - this is covered in sections 2.3 and 2.4. 

Thanks,
--David
----------------------------------------------------
David L. Black, Distinguished Engineer
EMC Corporation, 176 South St., Hopkinton, MA  01748
+1 (508) 293-7953             FAX: +1 (508) 293-7786
david.black at emc.com        Mobile: +1 (978) 394-7754
----------------------------------------------------