Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures
RFC 8610

Note: This ballot was opened for revision 05 and is now closed.

Alexey Melnikov Yes

Deborah Brungard No Objection

(Ben Campbell) No Objection

Comment (2018-11-20 for -06)
Most of my comments have already been captured by others, save one:

Is there a specific reason the normative appendices are not part of the main body? I think a lot of RFC readers assume that appendices are optional to read. We should not surprise them without reason.

Alissa Cooper No Objection

Comment (2018-11-19 for -06)
Thank you for a clear document and for addressing the Gen-ART review comments.

(Spencer Dawkins) No Objection

Benjamin Kaduk No Objection

Comment (2018-11-18 for -05)
Thanks for updating the editor's copy pursuant to the secdir review!

As I was reading, I wondered about potential confusion between a numerical
value and the corresponding text string when used as a keytype, especially
for barewords.  The bareword ABNF requires a leading EALPHA, which should
force the right parsing, while the memberkey ABNF still allows literal
values to be used as keys.  I do wonder, though, if the 'id' ABNF's
limitations on textual names (i.e., strings that could be interpreted as
numbers are disallowed) should be mentioned in the main text as how
disambiguation is enforced in general.

It's a little weird to use PersonalData as an example, given the privacy
considerations inherent in storing personal data, but I guess this is not
really a flaw in the spec.

Section 1

Nit: bullet (G3) lacks grammatical parallelism with its sibling bullets;
something like "Be able to" would restore parity.

Section 2

   1.  Instead of defining all four types of composition in CDDL
       separately, or even defining one kind for arrays (vectors and
       records) and one kind for maps (tables and structs), there is
       only one kind of composition in CDDL: the _group_ (Section 2.1).

This perhaps reads a bit strongly, as we do go on to define syntactic sugar
for arrays and maps, even though they build on the shared group
abstraction.

Section 2.1

   Note that the (curly) braces signify the creation of a map; the
   groups themselves are neutral as to whether they will be used in a
   map or an array.
[...]
   Note that the lists inside the braces in the above definitions
   constitute (anonymous) groups, while "identity" is a named group.

I might add another sentence in one of these places foreshadowing the
behavior that groups are "macro-like" in the sense that when used in the
description of another group, their contents are siblings of the elements
that are new in the other group, as opposed to being part of a nested
structure.
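
As a rough analogy only (Python dicts are not CDDL groups, and the member names below are hypothetical), the "macro-like" splicing described above can be sketched like this:

```python
# Hypothetical stand-ins for CDDL groups: each group is modelled as a plain
# dict mapping member names to type names.
identity = {"name": "tstr", "age": "uint"}

# Using the group inside another group splices its members in as siblings
# of the new member...
person_spliced = {**identity, "employer": "tstr"}

# ...rather than nesting them under a key of their own.
person_nested = {"identity": identity, "employer": "tstr"}

print(sorted(person_spliced))
print(sorted(person_nested))
```

The spliced form has three top-level members; the nested form has two, one of which is itself a map.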

Section 3.1

   o  CDDL uses UTF-8 [RFC3629] for its encoding.

It's pretty rare for it to be sufficient to just say "UTF-8" in a technical
spec; what kind of internationalization review has been done?  Do we need
to specify anything about normalization or canonicalization?

Section 3.5.1

   The "struct" usage of maps is similar to the way JSON objects are
   used in many JSON applications.

   A map is defined in the same way as defining an array (see
   Section 3.4), except for using curly braces "{}" instead of square
   brackets "[]".

Taken together, these paragraphs read as if (1) a struct is a type of map,
and (2) a map uses curly brackets.  But the following example shows a struct
as enclosed within square brackets.  Where am I going wrong?

         GpsCoordinates = {
           longitude      : uint,            ; multiplied by 10^7
           latitude       : uint,            ; multiplied by 10^7
         }

It is perhaps irresponsible to include an example that does not specify the
units of the measurement (e.g., degrees or radians).

Section 3.8.6

   value from being sent over the wire.  This control is only meaningful
   when the control type is used in an optional context; otherwise there
   would be no way to express the default value.

Maybe s/express/utilize/?  That is, the ".default" control still expresses
what the default value would be, but that information would never be used.

Section 5

   o  Where the CDDL includes extension points, the impact of extensions
      on the security of the system needs to be carefully considered.

Would it make sense to also add guidance for judicious use of .within to
constrain extension points?

   Writers of CDDL specifications are strongly encouraged to value
   simplicity and transparency of the specification over its elegance.
   Keep it as simple as possible while still expressing the needed data
   model.

Perhaps "simplicity of [type] constructions", since some readers may equate
simplicity [of design] and elegance.

Section 6.1

I don't really understand why there's a need for distinctions based on the
presence of an internal dot, especially given that this document does not
define any such operators.  What would such a control operator look like?

Section 7.2

It seems that RFC 4648 might need to be a normative reference given that it
specifies how some byte string literals are interpreted in EDN.

Appendix B

On first glance I wonder if some of the S should be 1*WS to avoid parsing
ambiguities, but I did not think about it very hard.

   Note that this ABNF does not attempt to reflect the detailed rules of
   what can be in a prefixed byte string.

Before I made it this far, I was going to note that the "bytes" definition
seems to allow me to use a "h" or "b64" prefix with "arbitrary" contents; it
seems that an alternate construction could embody the semantic restrictions
for such strings into the ABNF.  How bad would it be if a future update to
this document attempted to actually reflect the "detailed rules of what can
be in a prefixed byte string"?

Appendix D

I can't decide if most of the "#" entries need double-quotes around them to
parse properly as ABNF.  Is it best to think about this CBOR major/minor
notation as an extension to standard ABNF?

Suresh Krishnan No Objection

Comment (2018-11-21 for -06)
* Section 3.8.1

Looks like there is an off-by-one error here. Shouldn't

BYTES_N == 256**N

be

BYTES_N == 256**N-1 

instead?

(Terry Manderson) No Objection

(Eric Rescorla) (was Discuss) No Objection

Comment (2019-03-23 for -07)
Thank you for addressing my DISCUSS

Alvaro Retana No Objection

Adam Roach (was Discuss) No Objection

Comment (2018-11-19 for -06)
I also have a handful of non-critical comments of varying importance.

Please expand "CBOR":
 (1) In the title
 (2) Upon first use in the document body

See https://www.rfc-editor.org/materials/abbrev.expansion.txt for details.

---------------------------------------------------------------------------

§1.2:

>  New terms are introduced in _cursive_.  CDDL text in the running text
>  is in "typewriter".

This is perplexing, as I know of no tool that will render the canonical form
of current RFCs in the way being described. Is the intention to hold this
document until the new RFC format is available?

---------------------------------------------------------------------------

§2:

>  The rest of this section introduces a number of basic concepts of
>  CDDL, and section Section 3 defines additional syntax.  Appendix C

Nit: "...and Section 3..."

---------------------------------------------------------------------------

§2.2.2:

>  delimited by a "//" (double slash).  Note that the "//" operators
>  binds much more weakly than the other CDDL operators, so each line

Nit: "...operator binds..." or "...operators bind..."

---------------------------------------------------------------------------

§3.1:

>  o  A name can consist of any of the characters from the set {'A',
>     ..., 'Z', 'a', ..., 'z', '0', ..., '9', '_', '-', '@', '.', '$'},

This looks like a formal syntax of some kind, but I don't know where it's
defined. Notably, since this document has just defined ".." to be an inclusive
range operator and "..." to be an exclusive range operator, defining the set of
allowed characters in this way seems to run the risk of interpreting, e.g., "Z"
to be disallowed.

I suggest either defining the set of allowed characters using a formally defined
and cited grammar (e.g., ABNF), or using prose.

---------------------------------------------------------------------------

§3.1:

>  o  outside strings, whitespace (spaces, newlines, and comments) is
>     used to separate syntactic elements for readability (and to
>     separate identifiers or numbers that follow each other); it is
>     otherwise completely optional.

This seems nominally at odds with the following text in §2.2.2.1, which points
to at least one other case where whitespace is mandatory:

>  When using a name as
>  the left hand side of a range operator, use spacing as in
>
>     min .. max
>
>  to separate off the range operator.

---------------------------------------------------------------------------

§3.1:

>     If prefixed as "h" or "b64", the string is
>     interpreted as a sequence of pairs of hex digits (base16) or a
>     base64(url) string, respectively

Please normatively cite RFC 4648, sections 8 and 5 respectively.
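
For illustration, a minimal Python sketch of the two interpretations using the standard library's base64 module. The helper name decode_prefixed is hypothetical, and the sketch ignores any whitespace or comment handling the draft may allow inside such strings:

```python
import base64


def decode_prefixed(prefix: str, body: str) -> bytes:
    """Decode the body of a prefixed byte string literal (hypothetical helper)."""
    if prefix == "h":
        # base16: pairs of hex digits, accepted in either case.
        return base64.b16decode(body, casefold=True)
    if prefix == "b64":
        # base64url: restore any "=" padding that was left off before decoding.
        padded = body + "=" * (-len(body) % 4)
        return base64.urlsafe_b64decode(padded)
    raise ValueError(f"unknown prefix: {prefix!r}")


print(decode_prefixed("h", "68656c6c6f"))
print(decode_prefixed("b64", "aGVsbG8"))
```

Both calls decode to the same five bytes, the ASCII text "hello".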

---------------------------------------------------------------------------

§3.8.1:

>  When applied to an unsigned integer, the ".size" control restricts
>  the range of that integer by giving a maximum number of bytes that
>  should be needed in a computer representation of that unsigned
>  integer.  In other words, "uint .size N" is equivalent to
>  "0...BYTES_N", where BYTES_N == 256**N.
>
>     audio_sample = uint .size 3 ; 24-bit, equivalent to 0..16777215
>
>               Figure 9: Control for integer size in bytes

While they're semantically the same, the example is oddly mismatched with the
preceding text. Consider instead:

      audio_sample = uint .size 3 ; 24-bit, equivalent to 0...16777216
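
The arithmetic behind the two spellings can be checked with a short Python sketch. The function name size_bounds is hypothetical; it assumes, as the quoted text does, that "..." excludes its upper bound while ".." includes it:

```python
from typing import Tuple


def size_bounds(n_bytes: int) -> Tuple[int, int]:
    """Return the (lowest, highest) unsigned value that fits in n_bytes bytes."""
    exclusive_upper = 256 ** n_bytes   # BYTES_N, the bound written after "..."
    inclusive_upper = exclusive_upper - 1  # the bound written after ".."
    return 0, inclusive_upper


print(size_bounds(3))  # (0, 16777215)
```

So "0...16777216" and "0..16777215" describe the same set of values, and BYTES_N == 256**N is correct as long as the three-dot operator is the one used.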

---------------------------------------------------------------------------

Appendix B:

>           / "#" "6" ["." uint] "(" S type S ")" ; note no space!

No space where? I see two space productions in that rule (so it clearly
applies to some specific location), and there are several places where spaces
cannot appear.

>     type1 = type2 [S (rangeop / ctlop) S type2]

This rule doesn't seem to properly capture the ambiguity of "a...b". There is a
terribly complex way to address this by defining parallel "type2" and "type3"
rules that differ only in whether a dot is allowed to appear in their value, and
defining type1 as requiring a space after the type that can contain dots -- but
that is probably overkill. It's probably sufficient to reiterate the warning
about requiring a space under such circumstances as a comment on this rule.
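
A small Python sketch of why the space matters. The regular expression is an assumed transcription of the draft's "id" rule (a leading letter, "@", "_", or "$", then runs of "-" or "." only when followed by another name character), so treat it as illustrative rather than authoritative:

```python
import re

# Assumed transcription of the "id" rule: a leading EALPHA, then any run of
# "-" or "." provided it is followed by an EALPHA or a DIGIT.
EALPHA = r"[A-Za-z@_$]"
ID = re.compile(rf"{EALPHA}(?:[-.]*(?:{EALPHA}|[0-9]))*")


def leading_name(text: str) -> str:
    """Return the longest name found at the start of text ('' if none)."""
    match = ID.match(text)
    return match.group(0) if match else ""


print(leading_name("a...b"))    # the whole string is swallowed as one name
print(leading_name("a ... b"))  # only "a" is taken; the operator survives
```

Under that assumption a greedy tokenizer reads "a...b" as a single name, which is the ambiguity the warning about spacing is meant to avoid.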

>     HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"

It is a common implementor mistake to forget that ABNF is, by default,
case-insensitive. It is probably worth adding a comment here as a reminder.
(The same applies to "0x", "0b", "e", and "p" above, but these seem less likely
to appear in arbitrary case.)
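
As an illustration of the pitfall, a hand-written matcher for this rule needs an explicit case-insensitivity flag to agree with ABNF's default. A minimal Python sketch (the function name is_hexdig is hypothetical):

```python
import re

# ABNF quoted strings such as "A" match both "A" and "a" by default, so a
# faithful regular expression for HEXDIG has to ignore case.
HEXDIG = re.compile(r"[0-9A-F]", re.IGNORECASE)


def is_hexdig(ch: str) -> bool:
    """True if ch is a single character matched by the HEXDIG rule."""
    return len(ch) == 1 and HEXDIG.fullmatch(ch) is not None


print(is_hexdig("F"), is_hexdig("f"), is_hexdig("g"))
```

Dropping re.IGNORECASE would reject lowercase digits that the ABNF as written accepts.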

---------------------------------------------------------------------------

Appendix B:

>     SCHAR = %x20-21 / %x23-5B / %x5D-10FFFD / SESC
>     SESC = "\" %x20-10FFFD
...
>     PCHAR = %x20-10FFFD

These almost certainly should be:

      SCHAR = %x20-21 / %x23-5B / %x5D-7E / %x80-10FFFD / SESC
      SESC = "\" (%x20-7E / %x80-10FFFD)
...
      PCHAR = %x20-7E / %x80-10FFFD

(i.e., exclude the control character %x7F)
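
The effect of the suggested ranges can be sketched in Python. The function name is_pchar is hypothetical, and the ranges are copied from the suggested PCHAR rule above rather than from any published grammar:

```python
def is_pchar(ch: str) -> bool:
    """True if ch falls in the suggested PCHAR ranges %x20-7E / %x80-10FFFD."""
    cp = ord(ch)
    return 0x20 <= cp <= 0x7E or 0x80 <= cp <= 0x10FFFD


# DEL (%x7F) is rejected, while its neighbours on either side are accepted.
print(is_pchar("~"), is_pchar("\x7f"), is_pchar("\x80"))
```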

---------------------------------------------------------------------------

Appendix C:

>  (It is not an error to extend a rule name
>  that has not yet been defined; this makes the right hand side the
>  first entry in the choice being created.)

Is it an error to redefine a rule name that has already been defined?

Martin Vigoureux No Objection