JavaScript Object Notation (JSON) Text Sequences
draft-ietf-json-text-sequence-01

The information below is for an old version of the document.
Document Type
This is an older version of an Internet-Draft that was ultimately published as RFC 7464.
Author Nicolás Williams
Last updated 2014-05-08
RFC stream Internet Engineering Task Force (IETF)
Stream WG state WG Document
Document shepherd Paul E. Hoffman
IESG IESG state Became RFC 7464 (Proposed Standard)
json                                                         N. Williams
Internet-Draft                                              Cryptonector
Intended status: Standards Track                             May 8, 2014
Expires: November 9, 2014

            JavaScript Object Notation (JSON) Text Sequences
                    draft-ietf-json-text-sequence-01

Abstract

   This document describes the JSON text sequence format and associated
   media type.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 9, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Williams                Expires November 9, 2014                [Page 1]
Internet-Draft             JSON Text Sequences                  May 2014

Table of Contents

   1.    Introduction and Motivation
   1.1.  Conventions used in this document
   2.    JSON Text Sequence Format
   3.    Use for Logfiles, or How to Resynchronize Following
         Truncated entries
   4.    Security Considerations
   5.    IANA Considerations
   6.    Acknowledgements
   7.    Normative References
         Author's Address

1.  Introduction and Motivation

   The JavaScript Object Notation (JSON) [RFC7159] is a very handy
   serialization format.  However, when serializing a large sequence of
   values as an array, or a possibly indeterminate-length or never-
   ending sequence of values, JSON becomes difficult to work with.

   Consider a sequence of one million values, each possibly 1 kilobyte
   when encoded, which would be roughly one gigabyte.  If processing
   such a dataset requires first parsing it entirely, then the result is
   very inefficient and the processing will be limited by virtual
   memory.  "Online" (a.k.a. "streaming") parsers help, but they are
   neither widely available nor widely used, nor are they easy to use.

   Ideally such datasets could be parsed and processed one element at a
   time.  Even if each element must be parsed in a not-online manner due
   to local choice of parser, the result will usually be sufficiently
   online: limited by the size of the biggest element in the sequence
   rather than by the size of the sequence.

   This document describes the concept and format of "JSON text
   sequences", which are specifically not JSON texts themselves but are
   composed of JSON texts.

1.1.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  JSON Text Sequence Format

   The ABNF [RFC5234] for the JSON text sequence format is as follows:

     JSON-sequence = *whitespace *(JSON-text 1*whitespace)
     whitespace = %x20 / %x09 / %x0A / %x0D
     JSON-text = <given by RFC7159>

                     Figure 1: JSON text sequence ABNF

   A JSON text sequence is a sequence of zero or more JSON texts, each
   followed by one or more JSON whitespace separator characters.

   Requirements:

   o  JSON text sequence encoders MUST emit one or more JSON whitespace
      separator characters immediately after any JSON text.

   o  JSON text sequence parsers MUST NOT interpret any sequence of two
      or more contiguous whitespace characters as a sequence of empty
      JSON texts.  Two contiguous separators do not denote an empty JSON
      text between them, as there is no such thing as an empty JSON
      text.
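
   As a non-normative illustration of the encoder requirement above,
   the following Python sketch writes each value as a JSON text
   followed by a single newline.  The function name
   write_json_sequence and the choice of newline (rather than another
   whitespace character) as the separator are assumptions made for
   this example; the format itself only requires one or more
   whitespace characters.

```python
import json
from typing import Any, Iterable, TextIO


def write_json_sequence(values: Iterable[Any], out: TextIO) -> None:
    """Write each value as a JSON text followed by a newline separator.

    Hypothetical helper for illustration; any of the four JSON
    whitespace characters would satisfy the separator requirement.
    """
    for value in values:
        out.write(json.dumps(value))
        # The separator is emitted after every text, including the last.
        out.write("\n")
```

   Emitting the separator unconditionally, rather than only between
   texts, keeps the encoder stateless and means a later append to the
   same output cannot run two texts together.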

   An input of 'truefalse' is not a valid sequence of two JSON values,
   true and false.  Neither is 'true0' a valid sequence of true and
   zero.  Some existing JSON parsers that might be used to construct
   sequence parsers might in fact accept such inputs, and sequences of
   two or more numbers would then be parsed erroneously.  For example,
   a sequence of two numbers, 4 and 2, encoded without the required
   whitespace between them would parse incorrectly as the single number
   42.  This ambiguity is resolved by requiring that encoders never omit
   the separator.
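
   The following Python sketch shows one way a sequence parser could
   enforce the separator, under the assumption that the whole sequence
   is already in memory as a string.  The name parse_json_sequence is
   hypothetical.  The sketch relies on the standard library's
   json.JSONDecoder.raw_decode, which by default also accepts a few
   non-standard literals such as NaN, so it is slightly more permissive
   than the JSON-text rule.

```python
import json
from typing import Any, Iterator

# The four characters allowed by the 'whitespace' rule in Figure 1.
WHITESPACE = " \t\n\r"


def parse_json_sequence(text: str) -> Iterator[Any]:
    """Yield each JSON text in 'text', requiring trailing whitespace."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    # Leading whitespace is allowed before the first JSON text.
    while pos < end and text[pos] in WHITESPACE:
        pos += 1
    while pos < end:
        # raw_decode raises json.JSONDecodeError (a ValueError) on
        # input that does not start with a JSON text.
        value, pos = decoder.raw_decode(text, pos)
        # Every JSON text must be followed by at least one separator.
        if pos >= end or text[pos] not in WHITESPACE:
            raise ValueError(f"missing separator at offset {pos}")
        # Runs of whitespace are one separator, not empty texts.
        while pos < end and text[pos] in WHITESPACE:
            pos += 1
        yield value
```

   With this sketch, the input '4 2' followed by a newline yields the
   two numbers 4 and 2, while 'truefalse', 'true0', and an unterminated
   '42' are all rejected with ValueError.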

3.  Use for Logfiles, or How to Resynchronize Following Truncated
    entries

   The JSON text sequence format is well suited for use with logfiles,
   as those are generally (and atomically) appended to on an ongoing
   basis.  That is, logfiles are of indeterminate length, at least
   right up until they are closed.

   A problem comes up with this use case: it is difficult to guarantee
   that append writes will complete.  Therefore it is possible (if
   unlikely) to end up with truncated log entries, which may fail to
   parse as JSON texts, followed by other entries.  The mechanics of
   such failures are not explained here (consider power failures,
   though).

   Fortunately, as long as all texts in the logfile sequence are
   followed by a newline, it is possible to detect a subsequent entry
   written after an entry that fails to parse.  Figure 2 shows an ABNF
   rule for detecting the boundary between a non-truncated (or, in some
   cases, truncated) JSON text and the next JSON text in a sequence.

    boundary = endchar *whitespace NL *whitespace startchar
    endchar = ( "}" / "]" / %x22 / "e" / "l" / DIGIT )
    startchar =  ( "{" / "[" / %x22 / "t" / "f" / "n" / "-" / DIGIT )

                   Figure 2: ABNF for resynchronization

   To resynchronize after failing to parse a JSON text, simply search
   for a boundary as described in Figure 2.  A boundary found this way
   might be the boundary between the truncated entry and the subsequent
   entry, or it might be a subsequent boundary.

   Applications SHOULD scan backwards (up to the start of the incomplete
   text) from such a boundary looking for a newline followed by a valid
   JSON text; otherwise, valid entries following truncated entries can
   be missed by this rule.
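
   The resynchronization procedure can be sketched in Python as
   follows.  This is an illustration under stated assumptions, not a
   reference implementation: NL is taken to be %x0A, the logfile is
   assumed to be in memory as a string, the backward scan only
   considers JSON texts that begin immediately after a newline, and
   the names BOUNDARY, try_parse, resync, and read_entries are
   hypothetical.

```python
import json
import re
from typing import Any, Iterator, Optional, Tuple

WS = " \t\n\r"

# Figure 2 as a regular expression (NL assumed to be %x0A).  The
# lookahead leaves match.end() at the offset of 'startchar'.
BOUNDARY = re.compile(
    r'[}\]"el0-9][ \t\n\r]*\n[ \t\n\r]*(?=[{\["tfn\-0-9])'
)

_decoder = json.JSONDecoder()


def try_parse(text: str, pos: int) -> Optional[Tuple[Any, int]]:
    """Return (value, end) if a JSON text followed by whitespace
    starts exactly at pos; otherwise return None."""
    try:
        value, end = _decoder.raw_decode(text, pos)
    except ValueError:
        return None
    if end < len(text) and text[end] in WS:
        return value, end
    return None


def resync(text: str, failed_at: int) -> Optional[int]:
    """Return the offset at which to resume, or None if no boundary."""
    match = BOUNDARY.search(text, failed_at)
    if match is None:
        return None
    resume = match.end()
    # Scan backwards from the boundary to the start of the failed
    # text, keeping the earliest newline followed by a valid text.
    nl = text.rfind("\n", failed_at, resume)
    while nl != -1:
        if try_parse(text, nl + 1) is not None:
            resume = nl + 1
        nl = text.rfind("\n", failed_at, nl)
    return resume


def read_entries(text: str) -> Iterator[Any]:
    """Yield every entry that parses, skipping unparseable spans."""
    pos = 0
    while True:
        while pos < len(text) and text[pos] in WS:
            pos += 1
        if pos >= len(text):
            return
        parsed = try_parse(text, pos)
        if parsed is not None:
            value, pos = parsed
            yield value
            continue
        next_pos = resync(text, pos)
        if next_pos is None:
            return
        pos = next_pos
```

   For a log in which the second of four entries is truncated to
   '{"b": tru', the first boundary matched by the rule lies between
   the third and fourth entries, because 'u' is not an endchar.  The
   backward scan is what recovers the third entry in this sketch.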

   Note that in order to enable resynchronization all JSON texts
   appended to a logfile MUST be followed by a newline.

4.  Security Considerations

   All the security considerations of JSON [RFC7159] apply.

   There is no end-of-sequence indicator.  This means that "end of
   file", "end of transmission", and so on, can be indistinguishable
   from a logical end of sequence.  Applications where this matters
   should denote end of sequence by convention (e.g., Content-Length in
   HTTP).

   JSON text sequence parsers based on non-incremental, non-online JSON
   text parsers will not be able to efficiently parse JSON texts in
   which newlines appear; attempting to parse such sequences with non-
   incremental, non-online JSON text parsers creates a compute resource
   exhaustion vulnerability.

   The first requirement given in Section 2 (otherwise-ambiguous JSON
   texts must be separated by whitespace) is critical and must be
   adhered to.  It is best to always emit a whitespace separator after
   every JSON text emitted.

   Purposefully appending a truncated (or invalid) JSON text to a JSON
   text sequence logfile can cause the subsequent entry to be ignored by
   tooling that does not scan backwards from resynchronization
   boundaries looking for otherwise missed complete JSON texts.

5.  IANA Considerations

   The MIME media type for JSON text sequences is application/json-seq.

   Type name: application

   Subtype name: json-seq

   Required parameters: n/a

   Optional parameters: n/a

   Encoding considerations: binary

   Security considerations: See <this document, once published>,
   Section 4.

   Interoperability considerations: Described herein.

   Published specification: <this document, once published>.

   Applications that use this media type: JSON text sequences have been
   used in applications written with the jq programming language.

6.  Acknowledgements

   Phillip Hallam-Baker proposed the use of JSON text sequences for
   logfiles and pointed out the need for resynchronization.  James
   Manger contributed the ABNF for resynchronization.

7.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC7159]  Bray, T., "The JavaScript Object Notation (JSON) Data
              Interchange Format", RFC 7159, March 2014.

Author's Address

   Nicolas Williams
   Cryptonector, LLC

   Email: nico@cryptonector.com
