Concise Binary Object Representation (CBOR)
draft-ietf-cbor-7049bis-16

Note: This ballot was opened for revision 14 and is now closed.

Benjamin Kaduk (was Discuss) Yes

Comment (2020-09-27 for -15)
Just one nit left in the -15:

s/also MUST have matching streaming security mechanism/also MUST have a matching streaming security mechanism/

Erik Kline Yes

Comment (2020-09-08 for -14)
[[ questions ]]

[ section 3.3 ]

* Is it worth comparing and contrasting this encoding format with RFC 4506
  section 4.6?  Are they identical?


[[ comments ]]

[ section 1 ]

* I suppose XDR (4506) isn't well-known anymore.  :-(
  (no edits necessary, just a comment)


[[ nits ]]

[ section 1.2 ]

* "does not include following extraneous data"
  Is "following" important, or is it just "does not include other
  extraneous data"?

[ section 3.4.1 ]

* Perhaps "another type or that" -> "another type or a text string that"

[ section 5.6 ]

* Perhaps "Not accept maps duplicate keys"
  -> "Not accept maps with duplicate keys"?

Martin Duke No Objection

Murray Kucherawy No Objection

Comment (2020-09-10 for -14)
I regret that due to time constraints this week, I only reviewed the diff against RFC7049.  But what I saw there looked good to me.

Robert Wilton No Objection

Comment (2020-09-09 for -14)
Hi,

Thank you for your work on this document, and for bringing it to full standard.  Since I'm a big fan of CBOR and try to evangelize it whenever possible, I'm pleased to see this happening.

However, I have one minor annoyance with CBOR, which is the range of negative numbers that are encoded in major type 1.  My gripe is that the encoding allows for negative integers that are not easily representable in a simple form in most programming languages without using something equivalent to BigInteger.

E.g., all values below -2^63 won't fit into an int64 type, and the magnitude of -2^64 (i.e., 2^64) won't even fit into a uint64 used to represent a negative number (unless, of course, the uint64 follows the CBOR encoding semantics of being offset by 1).

For a generic decoder I presume that this isn't an issue, since it can fall back to something like BigInteger.  But decoders that only handle normal-sized integer datatypes would presumably regard anything smaller than -2^63 as not well-formed for their specific problem domain.
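As an illustration of the decoder-side check described above, here is a minimal C sketch, assuming the decoder has already extracted the 64-bit argument of a major type 1 item. The function name cbor_negint_to_int64 is hypothetical and not from the draft.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert the argument of a CBOR major type 1 item (value = -1 - arg)
 * into an int64_t. Returns false when the value is below INT64_MIN,
 * i.e. when arg > INT64_MAX (values -2^64 .. -2^63-1). */
bool cbor_negint_to_int64(uint64_t arg, int64_t *out) {
    if (arg > (uint64_t)INT64_MAX)
        return false;            /* needs a bignum or a wider type */
    *out = -1 - (int64_t)arg;    /* cannot overflow: arg <= INT64_MAX */
    return true;
}
```

With this check, argument 499 yields -500 and argument 2^63-1 yields INT64_MIN, while anything larger is rejected rather than silently wrapping.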

I'm not suggesting that this should be changed (hence a comment, not a discuss), but there are a couple of places in the document where it might be helpful to warn implementors about this; I have noted them below.

Other minor comments:

    3.  Specification of the CBOR Encoding

       Major type 0:  an integer in the range 0..2**64-1 inclusive.  The
          value of the encoded item is the argument itself.  For example,
          the integer 10 is denoted as the one byte 0b000_01010 (major type
          0, additional information 10).  The integer 500 would be
          0b000_11001 (major type 0, additional information 25) followed by
          the two bytes 0x01f4, which is 500 in decimal.

       Major type 1:  a negative integer in the range -2**64..-1 inclusive.
          The value of the item is -1 minus the argument.  For example, the
          integer -500 would be 0b001_11001 (major type 1, additional
          information 25) followed by the two bytes 0x01f3, which is 499 in
          decimal.
      
Would writing "0 to 2**64-1" be clearer than 0..2**64-1?  Alternatively, perhaps mention in the terminology section that "x..y" denotes the inclusive range of all values from x to y, including x and y.  Also note that here, where ".." is used, the text explicitly states that the range is inclusive, but that doesn't appear to be the case everywhere.

I suggest changing "Major type 0:  an integer ..." back to "Major type 0:  an unsigned integer", as in RFC7049, because the type is referred to as "Unsigned integer".  It also makes it more consistent with the definition of Major type 1.
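To make the quoted examples concrete, here is a minimal C sketch of writing the head of an item in its shortest form: the major type in the top 3 bits, then either the argument itself in the additional information (below 24) or additional information 24..27 followed by a 1-, 2-, 4- or 8-byte big-endian argument. The helper name cbor_write_head is hypothetical. For -500 the caller passes major type 1 and argument 499 (-1 minus the value).

```c
#include <stddef.h>
#include <stdint.h>

/* Write a CBOR head for major type mt (0..7) and argument arg in the
 * shortest form. Returns the number of bytes written (1..9). */
size_t cbor_write_head(uint8_t mt, uint64_t arg, uint8_t out[9]) {
    uint8_t ib = (uint8_t)(mt << 5);       /* major type in the top 3 bits */
    size_t n;                              /* number of argument bytes */
    if (arg < 24) {
        out[0] = (uint8_t)(ib | arg);      /* argument fits in additional info */
        return 1;
    } else if (arg <= 0xff) {
        out[0] = (uint8_t)(ib | 24); n = 1;
    } else if (arg <= 0xffff) {
        out[0] = (uint8_t)(ib | 25); n = 2;
    } else if (arg <= 0xffffffff) {
        out[0] = (uint8_t)(ib | 26); n = 4;
    } else {
        out[0] = (uint8_t)(ib | 27); n = 8;
    }
    for (size_t i = 0; i < n; i++)         /* big-endian argument bytes */
        out[1 + i] = (uint8_t)(arg >> (8 * (n - 1 - i)));
    return 1 + n;
}
```

This reproduces the quoted encodings: 10 becomes 0x0a, 500 becomes 0x19 0x01 0xf4, and -500 becomes 0x39 0x01 0xf3.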


    3.2.1.  The "break" Stop Code

       The "break" stop code is encoded with major type 7 and additional
       information value 31 (0b111_11111).  It is not itself a data item: it
       is just a syntactic feature to close an indefinite-length item.

       If the "break" stop code appears anywhere where a data item is
       expected, other than directly inside an indefinite-length string,
       array, or map -- for example directly inside a definite-length array
       or map -- the enclosing item is not well-formed.
       
I was wondering whether it would be helpful to clarify that "indefinite-length string" means a text or byte string?  Although this becomes clear in section 3.2.3 anyway ...  My thinking is that section 3.2 lists 4 types that can have indefinite length, and then this section treats both string types together.

    3.2.3.  Indefinite-Length Byte Strings and Text Strings

Would it be helpful to clarify that the chunks must be of the same type?  E.g., you cannot have a byte string that contains text string chunks, and vice versa.
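A minimal C sketch of the check being discussed, assuming the rule that every item between the indefinite-length string indicator (0x5f or 0x7f) and the "break" stop code (0xff) must be a definite-length string of the same major type. The function name is hypothetical, and the sketch does not validate UTF-8 in text chunks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Check that buf[0..len) starts with a well-formed indefinite-length
 * byte or text string. On success, *used is the number of bytes consumed. */
bool cbor_check_indef_string(const uint8_t *buf, size_t len, size_t *used) {
    if (len == 0 || (buf[0] != 0x5f && buf[0] != 0x7f))
        return false;
    uint8_t mt = buf[0] >> 5;                 /* 2 = byte string, 3 = text */
    size_t pos = 1;
    while (pos < len) {
        uint8_t ib = buf[pos++];
        if (ib == 0xff) {                     /* "break" closes the string */
            *used = pos;
            return true;
        }
        uint8_t ai = ib & 0x1f;
        /* Reject other major types, reserved values 28..30, and nested
           indefinite-length chunks (31). */
        if ((ib >> 5) != mt || ai >= 28)
            return false;
        uint64_t clen = ai;
        if (ai >= 24) {
            size_t n = (size_t)1 << (ai - 24);    /* 1, 2, 4 or 8 length bytes */
            if (len - pos < n)
                return false;
            clen = 0;
            for (size_t i = 0; i < n; i++)
                clen = (clen << 8) | buf[pos++];
        }
        if (len - pos < clen)
            return false;                     /* chunk runs past the input */
        pos += (size_t)clen;
    }
    return false;                             /* no "break" before end of input */
}
```

So 0x5f 0x42 0x01 0x02 0x43 0x03 0x04 0x05 0xff is accepted, while a byte string containing a text chunk (0x5f 0x61 0x61 0xff) is rejected.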

    3.4.5.2.  Expected Later Encoding for CBOR-to-JSON Converters

"Tags number 21 to 23 ..." => "Tag numbers 21 to 23 ..."
       
       
    4.2.1.  Core Deterministic Encoding Requirements

          Floating-point values also MUST use the shortest form that
          preserves the value, e.g. 1.5 is encoded as 0xf93e00 and 1000000.5
          as 0xfa49742408.  (One implementation of this is to have all
          floats start as a 64-bit float, then do a test conversion to a
          32-bit float; if the result is the same numeric value, use the

I find this paragraph slightly opaque, and I would suggest spelling out that 1.5 has been encoded as a 16-bit IEEE float, whereas 1000000.5 has been encoded as a 32-bit IEEE float.  The same comment applies to 4.2.2 as well.
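To spell out the test-conversion approach in the quoted text, here is a C sketch that picks the shortest of half (0xf9), single (0xfa) or double (0xfb) precision that preserves the value. It assumes IEEE 754 binary64 doubles and binary32 floats, performs the half-precision check with bit manipulation because standard C has no portable 16-bit float type, and maps every NaN to 0xf97e00 (an assumption of this sketch that drops NaN payloads). The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write the low n bytes of v to out in big-endian (network) byte order. */
static void put_be(uint8_t *out, uint64_t v, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(v >> (8 * (n - 1 - i)));
}

/* Returns the number of bytes written to out (3, 5 or 9). */
size_t cbor_encode_float_shortest(double d, uint8_t out[9]) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);            /* raw binary64 bits */
    uint64_t sign = bits >> 63;
    int bexp = (int)((bits >> 52) & 0x7ff);    /* biased exponent */
    uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);
    int e = bexp - 1023;                       /* unbiased exponent */
    uint16_t half = 0;
    int half_ok = 0;

    if (bexp == 0x7ff) {
        /* Infinity keeps its sign; any NaN becomes the quiet NaN 0x7e00. */
        half = frac == 0 ? (uint16_t)((sign << 15) | 0x7c00) : 0x7e00;
        half_ok = 1;
    } else if (bexp == 0 && frac == 0) {
        half = (uint16_t)(sign << 15);         /* +0.0 or -0.0 */
        half_ok = 1;
    } else if (e >= -14 && e <= 15 &&
               (frac & ((UINT64_C(1) << 42) - 1)) == 0) {
        /* Normal binary16: only the top 10 fraction bits may be set. */
        half = (uint16_t)((sign << 15) | ((uint64_t)(e + 15) << 10) | (frac >> 42));
        half_ok = 1;
    } else if (e >= -24 && e <= -15) {
        /* Subnormal binary16: value must be an integer multiple of 2^-24. */
        uint64_t sig = (UINT64_C(1) << 52) | frac;   /* implicit leading 1 */
        int shift = 28 - e;                          /* 43..52 */
        if ((sig & ((UINT64_C(1) << shift) - 1)) == 0) {
            half = (uint16_t)((sign << 15) | (sig >> shift));
            half_ok = 1;
        }
    }
    if (half_ok) {
        out[0] = 0xf9;
        put_be(out + 1, half, 2);
        return 3;
    }
    if (e <= 127) {                            /* e > 127 is beyond binary32 range */
        float f = (float)d;                    /* test conversion to binary32 */
        if ((double)f == d) {
            uint32_t fbits;
            memcpy(&fbits, &f, sizeof fbits);
            out[0] = 0xfa;
            put_be(out + 1, fbits, 4);
            return 5;
        }
    }
    out[0] = 0xfb;
    put_be(out + 1, bits, 8);
    return 9;
}
```

This yields 0xf93e00 for 1.5 (half precision) and 0xfa49742408 for 1000000.5 (single precision), while 1.1 needs the full 0xfb3ff199999999999a.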
 
I also noticed that in most places the document refers to "floating-point" but in a few places "floating point" is used instead.


    5.5.  Numbers

As per my top comment, I think that it would be useful to raise in this section that CBOR can encode negative values that cannot normally be represented in a compact form.
  

    6.1.  Converting from CBOR to JSON

       Most of the types in CBOR have direct analogs in JSON.  However, some
       do not, and someone implementing a CBOR-to-JSON converter has to
       consider what to do in those cases.  The following non-normative
       advice deals with these by converting them to a single substitute
       value, such as a JSON null.

       *  An integer (major type 0 or 1) becomes a JSON number.

Is it worth referencing back to section 5.5 on JavaScript numbers and explicitly warning that not all CBOR integers can be precisely represented as JSON numbers, so there may be a loss of precision?
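A converter could flag integers at risk of precision loss with a check along these lines. This is a sketch assuming the JSON consumer parses numbers as IEEE 754 doubles (as JavaScript does) and using JavaScript's Number.MAX_SAFE_INTEGER (2^53-1) as the limit; the helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2^53 - 1: every integer in [-(2^53 - 1), 2^53 - 1] has an exact,
 * unambiguous binary64 representation. */
#define JSON_MAX_SAFE_INTEGER ((UINT64_C(1) << 53) - 1)

/* Return true if the CBOR integer with major type mt (0 or 1) and
 * argument arg can become a JSON number without loss of precision. */
bool cbor_int_is_json_safe(uint8_t mt, uint64_t arg) {
    if (mt == 0)
        return arg <= JSON_MAX_SAFE_INTEGER;      /* value is arg */
    return arg <= JSON_MAX_SAFE_INTEGER - 1;      /* value is -1 - arg */
}
```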
   

    Appendix C.  Pseudocode

       Major types 0 and 1 are designed in such a way that they can be
       encoded in C from a signed integer without actually doing an if-then-
       else for positive/negative (Figure 2).  This uses the fact that
       (-1-n), the transformation for major type 1, is the same as ~n
       (bitwise complement) in C unsigned arithmetic; ~n can then be
       expressed as (-1)^n for the negative case, while 0^n leaves n
       unchanged for non-negative.  The sign of a number can be converted to
       -1 for negative and 0 for non-negative (0 or positive) by arithmetic-
       shifting the number by one bit less than the bit length of the number
       (for example, by 63 for 64-bit numbers).
       
This was another place where I thought that it might be useful to warn the reader about decoding negative integers and the risk of overflow when converting a major type 1 value into an int64 native type.
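For reference, the branchless mapping described in the quoted Appendix C text can be sketched in C as follows. The function name is hypothetical; it returns the major type as 0 or 1 rather than pre-shifted into the initial byte, and it relies on >> of a negative int64_t being an arithmetic shift, which C leaves implementation-defined but mainstream compilers such as GCC and Clang provide. Note the asymmetry raised above: INT64_MIN maps to argument 2^63-1, but a decoder can receive major type 1 arguments up to 2^64-1, which have no int64_t counterpart.

```c
#include <stdint.h>

/* Map a signed 64-bit integer to CBOR major type (0 or 1) and argument
 * without an if-then-else on the sign. */
void cbor_sint_to_head(int64_t n, uint8_t *major, uint64_t *arg) {
    uint64_t ui = (uint64_t)(n >> 63);   /* all ones if n < 0, else 0 */
    *major = (uint8_t)(ui & 1);          /* 1 for negative, 0 otherwise */
    *arg = ui ^ (uint64_t)n;             /* ~n (== -1 - n) if negative, else n */
}
```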

Regards,
Rob

Roman Danyliw No Objection

Comment (2020-09-08 for -14)
I support Ben Kaduk’s DISCUSS position.

** Section 1.0. Is it possible to enumerate the fixed errata?

** Section 3.4.5.3.  For Tag 35, how does one know if the syntax is a PCRE or ECMA regular expression?

** Section 3.4.5.3.  Of all the references for the tags defined in this section, PCRE is the only informative one (even ECMA is normative).  Please make it normative.

** Section 4.1.  As an implementer of an application, what is the takeaway from this section?  I’m not following the definition of “preferred”.

** Section 10.  Per “The input check itself may consume resources.  This is usually linear in the size of the input, which means that an attacker has to spend resources that are commensurate to the resources spent by the defender on input validation.”  I’m not sure this is true for all types of resources.  For example, with compute resources, as an attacker I can craft an input that takes longer for the target to process than for me to produce.

Warren Kumari No Objection

Comment (2020-09-09 for -14)
Thanks!

Éric Vyncke No Objection

Comment (2020-09-09 for -14)
Thank you for the work put into this document. While it is rather long, it is exhaustive and usually quite clear (with some exceptions; see below).

Thanks to Eve Schooler for her very detailed IoT directorate review at https://datatracker.ietf.org/doc/review-ietf-cbor-7049bis-14-iotdir-telechat-schooler-2020-09-08/
I strongly suggest that the authors follow Eve's recommendations to clarify the text and make it easier to read.

Please find below a couple of non-blocking COMMENT points.

I hope that this helps to improve the document,

Regards,

-éric

== COMMENTS ==

-- Section 3.4  --
Is there a reason why "specifically, tag number 25 and tag number 29" have no reference to an RFC? The reader would benefit from a short description. This oddity was also mentioned by Eve in her review, so I strongly suggest addressing the issue.

-- Section 3.4.5.2 --
As noted by another AD, I am puzzled by the added value of checking whether a string is PCRE or ECMA-262.

-- Section 3.4.6 --
I like this idea of a 'magic number' but, as I am not a Unicode expert, I wonder whether "In particular, 0xd9d9f7 is not a valid start of a Unicode text in any Unicode encoding if it is followed by a valid CBOR data item." will always hold true.
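For readers unfamiliar with the prefix, here is a small C sketch showing how the three bytes 0xd9d9f7 decode as a CBOR head: major type 6 (tag), additional information 25 (two-byte argument), tag number 0xd9f7 = 55799. The function name is hypothetical, and the sketch only recognizes the prefix; it does not settle the Unicode question.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if buf starts with the head of tag number 55799. */
bool cbor_has_self_describe_prefix(const uint8_t *buf, size_t len) {
    if (len < 3)
        return false;
    uint8_t mt = buf[0] >> 5;                           /* 6 = tag */
    uint8_t ai = buf[0] & 0x1f;                         /* 25 = 2-byte argument */
    uint16_t tag = (uint16_t)((buf[1] << 8) | buf[2]);  /* big-endian tag number */
    return mt == 6 && ai == 25 && tag == 55799;
}
```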

-- Section 4.2.1 --
Hmm, this section says "MUST be as short as possible" while the introduction says "optimize for CPU not for bytes". The same applies to sorted keys... How can these be reconciled? Suggestion: add some text about this apparent conflict of goals.
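To make the CPU cost of sorted keys concrete, here is a C sketch of the extra work a deterministic encoder performs per map, assuming each key has already been encoded deterministically and that keys are ordered bytewise lexicographically by their encodings. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One map key, already encoded as deterministic CBOR. */
struct encoded_key {
    const uint8_t *bytes;
    size_t len;
};

/* qsort comparator: bytewise lexicographic order of the encodings,
 * where a key that is a prefix of another sorts first. */
int cmp_encoded_keys(const void *a, const void *b) {
    const struct encoded_key *x = a, *y = b;
    size_t n = x->len < y->len ? x->len : y->len;
    int c = memcmp(x->bytes, y->bytes, n);
    if (c != 0)
        return c;
    return (x->len > y->len) - (x->len < y->len);
}

/* Sort the keys of one map in place: O(n log n) comparisons per map. */
void sort_map_keys(struct encoded_key *keys, size_t n) {
    qsort(keys, n, sizeof *keys, cmp_encoded_keys);
}
```

For example, keys 10 (0x0a), 100 (0x1864), -1 (0x20), "z" (0x617a), "aa" (0x626161) and false (0xf4) end up in exactly that order.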

(Barry Leiba; former steering group member) Yes

Yes ( for -14)

(Alissa Cooper; former steering group member) No Objection

No Objection ( for -14)

(Deborah Brungard; former steering group member) No Objection

No Objection ( for -14)

(Magnus Westerlund; former steering group member) No Objection

No Objection ( for -14)