Extensible Binary Meta Language
draft-ietf-cellar-ebml-17

Note: This ballot was opened for revision 13 and is now closed.

(Alexey Melnikov) Yes

(Deborah Brungard) No Objection

(Alissa Cooper) No Objection

Comment (2019-12-17 for -15)
I support Adam's DISCUSS point about the abstract.

I'm not clear on why the IANA registry names are prefixed with "CELLAR." Are there more general EBML Element ID and DocType registries envisioned? If not, I would suggest dropping the "CELLAR."

For future documents I would expect the authors to respond to the Gen-ART reviewer's comments via email after fixes have been applied to address the issues raised in the review.

Roman Danyliw No Objection

Comment (2019-12-03 for -14)
Section 1.  Please add a reference to Matroska instead of an inline URL

Section 1.  Please add a reference for WebM

Section 2 and Table 4 of Section 5.  These sections define EBML Class.  However, it isn’t used elsewhere in the document.  What is it supposed to be used for?

Section 4.4.  Recommend replacing the “This table” text with the name of the specific table in question (i.e., Table 1, Table 2)

Section 6.2.  Per “Unknown-Sized Element MUST NOT be used or defined unnecessarily; however if the Element Data Size is not known before the Element Data is written, such as in some cases of data streaming, then Unknown-Sized Elements MAY be used.”, should this text be read as “the Unknown-Sized Elements MUST only be used if the Element Data Size is not known before the Element Data is written”?  I’m having trouble understanding how to handle normative language for a qualitative statement of “unnecessarily”.

Section 11.1.5.1.  Double checking on the grammar of the name attribute – is it permitted to start with a “-” or a “.”?

Section 11.1.5.3.  Are there any uniqueness properties for an id attribute?  Drawing a parallel from XML, I would have thought that each EBML element would have a unique ID per doctype (say like an xml:id)

Section 11.1.7.2.  documentation@purpose has a number of possible enumerated values, however, none are defined in the text (they are only listed)

Benjamin Kaduk (was Discuss) No Objection

Comment (2019-12-22 for -16)
The -16 addresses my DISCUSS point; thanks!
I'm given to understand that email discussion of the comments (preserved
below) is forthcoming, but do have one note on the new text in the -16:
In Section 7.3 we imply that a float is 32-bit and 64-bit at the same time;
I think s/and/or/ makes more sense (and, of course, the EBML element
length indicates which one is present).
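
To make the s/and/or/ point concrete, here is a minimal Python sketch of a reader choosing the float width from the element data length. The helper name decode_ebml_float is hypothetical, and big-endian IEEE 754 storage is an assumption of this sketch rather than something stated in the text quoted here:

```python
import struct


def decode_ebml_float(data: bytes) -> float:
    """Decode element data as a 32-bit or a 64-bit float, chosen by length.

    Assumption: big-endian IEEE 754 storage. Any other length is rejected
    in this sketch.
    """
    if len(data) == 4:
        return struct.unpack(">f", data)[0]  # 32-bit float
    if len(data) == 8:
        return struct.unpack(">d", data)[0]  # 64-bit float
    raise ValueError(f"unsupported float length: {len(data)} octets")
```

A given element is therefore one width or the other, never both; the length alone disambiguates.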

I support Adam's Discuss regarding the Abstract.

Section 2

   "Parent Element": A relative term to describe the "Master Element"
   which contains a specified element.  For any specified "EBML Element"
   that is not at "Root Level", the "Parent Element" refers to the
   "Master Element" in which that "EBML Element" is contained.

It sounds like this is intended to be "directly" or "immediately"
contained (in order to be unique), right?  If not, then it should be
''refers to a "Master Element" in which [...]''

Section 4.1

   Each Variable Size Integer begins with a VINT_WIDTH which consists of
   zero or many zero-value bits.  The count of consecutive zero-values
   of the VINT_WIDTH plus one equals the length in octets of the
   Variable Size Integer.  [...]

Does the following attempted rewording change the meaning?

%  Each Variable Size Integer begins with a VINT_WIDTH which consists of
%  zero or more bits set to zero.  The length in octets of the entire
%  Variable Size Integer is determined as one plus the number of
%  consecutive bits set to zero.

(I find the current formulation rather hard to parse.)

Section 6.2

    | "\root\level1\level2\<global>"     | Global Element cannot be   |
    |                                    | assumed to have this path, |
    |                                    | while parsing "elt" it can |
    |                                    | only be a child of "elt"   |

Cannot be assumed by who/what?  My brain is trying to parse this as just
"cannot assume this path".

Section 7.5

Should we say anything about termination of a UTF-8 string needing to
still result in valid UTF-8 (i.e., not insert NULs in the middle of a
codepoint)?
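
As an illustration of the concern, a reader could cut the data at the first NUL octet and then insist that what remains is valid UTF-8. This is only a sketch: the helper name read_utf8_element is hypothetical, and termination by NUL octets is assumed from the question above.

```python
def read_utf8_element(data: bytes) -> str:
    """Truncate at the first NUL octet, then require valid UTF-8.

    Assumption: the string is terminated (padded) with NUL octets.
    Raises UnicodeDecodeError if the truncated data is not valid UTF-8,
    for example when a NUL was inserted in the middle of a codepoint.
    """
    text = data.split(b"\x00", 1)[0]
    return text.decode("utf-8")
```

With this approach, a NUL written between the two octets of a two-octet codepoint leaves a dangling lead octet, and decoding fails instead of silently yielding a damaged string.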

Section 7.7

   stored within Master Elements SHOULD only consist of EBML Elements
   and SHOULD NOT contain any data that is not part of an EBML Element.

When might this SHOULD (NOT) be violated?

Section 8.2

   part of an EBML Element.  This document defines precisely which EBML
   Elements are to be used within the EBML Header, but does not name or

(for EBMLVersion 1 only, right?)

Section 11.1

   Element; for example matroska or webm (see Section 11.2.6).  The
   DocType value for an EBML Document Type MUST be unique and
   persistent.

It might be appropriate to refer to Section 17.2 and/or the IANA
registry for DocType values, here.

   EBMLVersion to only support a value of "1".  If an EBML Schema adopts
   the EBML Header Element as-is, then it is not required to document
   that Element within the EBML Schema.  If an EBML Schema constrains

Does "as-is" imply some level of future-compatibility/extensibility for
when EBMLVersions other than "1" are defined?

Section 11.1.1

It's a little amusing that we bother to provide "default" attributes
when the "range" attribute uniquely determines the allowed value.

Section 11.1.4

   Each "<element>" defines one EBML Element through the use of several
   attributes that are defined in Section 11.1.3.  EBML Schemas MAY

I think this makes more sense as "Section 11.1.5".

Section 11.1.5.2

This ABNF seems to only allow "direct" recursion where element <x>
appears directly inside element <x>, without any intermediate elements.
I assume that's the intent, though it would be surprising in a
general-purpose markup language.

   In some cases the EBMLLastParent part of the path is an
   EBMLGlobalParent.  A path with a EBMLGlobalParent defines a
   Section 11.3.  Any path that starts with the EBMLFixedParent of the

That second sentence doesn't parse.

   As an example, a "path" of "1*(\Segment\Info)" means the element Info
   is found inside the Segment elements at least once and with no
   maximum iteration.  An element SeekHead with path
   "0*2(\Segment\SeekHead)" may not be found at all in its Segment
   parent, once or twice but no more than that.

The way this text is written makes me want to interpret the path
occurrence counts more like the (regular) minOccurs/maxOccurs element
attributes, as opposed to applying to the path components to get to the
specific element in question.
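
For reference, the outer occurrence counts in the two quoted examples can be pulled apart as follows. This is a hypothetical sketch, not the draft's ABNF: the helper name parse_path is invented here, only the outermost "min*max(...)" wrapper is handled, and an omitted minimum is assumed to mean 0.

```python
import re
from typing import Optional, Tuple

# Outer wrapper only: optional minimum, "*", optional maximum, "(path)".
_PATH_RE = re.compile(r"^(\d*)\*(\d*)\((.*)\)$")


def parse_path(path: str) -> Tuple[int, Optional[int], str]:
    """Split a path attribute into (minimum, maximum, inner path).

    Assumptions: a missing minimum means 0; a missing maximum means
    unbounded, returned as None.
    """
    match = _PATH_RE.match(path)
    if match is None:
        raise ValueError(f"not in min*max(path) form: {path!r}")
    low, high, inner = match.groups()
    return (int(low) if low else 0, int(high) if high else None, inner)
```

Under these assumptions, "1*(\Segment\Info)" yields a minimum of 1 with no maximum, and "0*2(\Segment\SeekHead)" yields a minimum of 0 and a maximum of 2, which is what invites the minOccurs/maxOccurs reading.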

Section 11.1.9.2

   <element name="Item" path="1*1(\Items)" id="0x4025" type="master"
     minOccurs="1" maxOccurs="1">
     <documentation lang="en" purpose="definition">
       A set of items.

Is this "name" supposed to be "Item" or "Items"?

Section 11.1.10-11.1.12

I'm not sure I have a full understanding of how <restriction>/<enum> are
used; perhaps a reference to the corresponding XML behavior is in order?

Section 11.1.13-11.1.14

The <extension type="..."> usage seems underspecified.

Section 11.1.15

       <xs:attribute name="path" use="required">
         <!-- <xs:simpleType>
           <xs:restriction base="xs:integer">
             <xs:pattern value="[0-9]*\*[0-9]*()"/>
           </xs:restriction>
         </xs:simpleType> -->
       </xs:attribute>

Why do we include this commented-out snippet?

       <xs:attribute name="unknownsizeallowed" type="xs:boolean"/>
       <xs:attribute name="recurring" type="xs:boolean"/>

Don't we effectively set default values for these two in the prose
description?

Section 11.1.16

   Identically Recurring Elements SHOULD include a CRC-32 Element as a
   Child Element; this is especially recommended when EBML is used for
   long-term storage or transmission.  If a Parent Element contains more

I'm not sure if the "long-term" is intended to also bind as "long-term
transmission" (though I'm not sure what it would mean in that case).
It's also not entirely clear what kinds of transmission would benefit
from this, as reliable media presumably don't need redundancy for
reliability, but unreliable media can't really be used to carry EBML
without some framing requirements to know when elements start.

Section 11.1.18

   If a Mandatory EBML Element has no default value declared by an EBML
   Schema and its Parent Element is present then the EBML Element MUST
   be present as well.  If a Mandatory EBML Element has a default value
   declared by an EBML Schema and its Parent Element is present and the
   value of the EBML Element is NOT equal to the declared default value
   then the EBML Element MUST be present.

This seems almost tautological, in that how would an EBML Element have a
value if it was not present?  (The following paragraph that talks about
when to write such elements, does make more sense.)

Section 11.3.1

   path: "*1((1*\)\CRC-32)"

Using backslash as both an escape character and a path separator makes
my head hurt, and I did not have enough caffeine yet this morning to
figure it out.

   8.1.1.6.2 of [ITU.V42.1994], with initial value of 0xFFFFFFFF.  The
   CRC value MUST be computed on a little endian bitstream and MUST use
   little endian storage.

bitstream or bytestream?
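
At the octet level, "little endian storage" would look like the sketch below. It assumes, without confirmation from the text quoted here, that the intended algorithm is the common CRC-32 computed by Python's zlib.crc32 (initial value 0xFFFFFFFF); the helper name crc32_le_bytes is hypothetical.

```python
import struct
import zlib


def crc32_le_bytes(payload: bytes) -> bytes:
    """Return the CRC-32 of payload as 4 octets, least significant first.

    Assumption: zlib.crc32 matches the CRC the draft intends.
    """
    crc = zlib.crc32(payload) & 0xFFFFFFFF  # force an unsigned 32-bit value
    return struct.pack("<I", crc)  # "<I" = little-endian unsigned 32-bit
```

Under that assumption, the computation runs over octets and only the stored 4-octet result has a byte order.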

Section 12

   If a Master Element contains a CRC-32 Element that doesn't validate,
   then the EBML Reader MAY ignore all contained data except for
   Descendant Elements that contain their own valid CRC-32 Element.

Ignoring only part of the known questionable content could have
significant security considerations, if (e.g.) security-relevant
restrictions are in the garbled part of the document but the sensitive
content has a (valid) redundant CRC.

[review terminated early]

(Suresh Krishnan) No Objection

Warren Kumari No Objection

(Mirja Kühlewind) No Objection

Comment (2019-12-16 for -15)
I only did a very brief review but I don't think there are any transport issues :-)

(Barry Leiba) No Objection

Comment (2019-12-18 for -15)
This was a very difficult read: I found a lot of the document to be convoluted and hard to follow, and I considered balloting Abstain, as I’m not sure that DISCUSS is appropriate for my complaints, but I can’t really say that I have “no objection”.  In the end I decided to go with a “No Objection” ballot, to call out the worst of the issues, and to hope that you will consider the changes I suggest and that others will follow the text more easily than I could.

— Section 4.1 —

   Each Variable Size Integer begins with a VINT_WIDTH which consists of
   zero or many zero-value bits.  The count of consecutive zero-values
   of the VINT_WIDTH plus one equals the length in octets of the
   Variable Size Integer.  For example, a Variable Size Integer that
   starts with a VINT_WIDTH which contains zero consecutive zero-value
   bits is one octet in length and a Variable Size Integer that starts
   with one consecutive zero-value bit is two octets in length.  The
   VINT_WIDTH MUST only contain zero-value bits or be empty.

I found this very hard to follow, and had to read it several times before I understood what you’re getting at.  I found things such as “zero or many zero-value bits” to be confusing.  May I suggest alternative text, which describes the concept and then gets to the details?:

NEW
Each Variable Size Integer begins with a VINT_WIDTH followed by a VINT_MARKER.  VINT_WIDTH is a sequence of zero or more bits of value 0, and is terminated by the VINT_MARKER, which is a single bit of value 1.  The total number of bits (VINT_WIDTH and VINT_MARKER combined) is the number of octets of the Variable Size Integer.

Thus, the single bit “1” describes a Variable Size Integer with a length of one octet.  The sequence of bits “01” describes a Variable Size Integer with a length of two octets.  “001” describes a Variable Size Integer with a length of three octets, and so on, with each additional 0-bit adding one octet to the length of the Variable Size Integer.
END

I, at least, find that easier to follow.  Does it work for you?
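
The rule in the suggested text can be sketched in a few lines of Python. The helper name vint_length is hypothetical, and the sketch only covers lengths of up to 8 octets, where the VINT_MARKER falls inside the first octet:

```python
def vint_length(first_octet: int) -> int:
    """Return the total length in octets of a Variable Size Integer.

    The length is the number of leading zero bits (VINT_WIDTH) plus one
    for the VINT_MARKER bit. Assumption: the marker lies in the first
    octet, so lengths above 8 octets are not handled.
    """
    if not 0 <= first_octet <= 0xFF:
        raise ValueError("first_octet must fit in one octet")
    if first_octet == 0:
        raise ValueError("no VINT_MARKER in the first octet")
    leading_zeros = 8 - first_octet.bit_length()
    return leading_zeros + 1
```

A first octet starting with the bit "1" gives a length of one octet, "01" gives two, "001" gives three, matching the examples above.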

For the next paragraph, which limits the length under various circumstances, I suggest putting it in terms of the number of octets in the integer, rather than the number of bits in the VINT_WIDTH, which might be better put into Section 4.3, rather than 4.1.  Text such as, “A Variable Size Integer in an EBML Header can be at most 4 octets long, except [...] , where it can be up to 8 octets long,” is easier to understand than the text explaining limits on the number of bits in VINT_WIDTH.

— Section 4.4 —
Table 2 and the text that introduces it would be better if they talked about the integer that’s represented (2), rather than the binary value (0b10 in the text and 10 in the table), considering that it is a Variable Size INTEGER, yes?
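
To show what "the integer that's represented" means in practice, the sketch below encodes the same integer at several widths and recovers it again. The helper names encode_vint and decode_vint are hypothetical; the sketch assumes 7 bits of VINT_DATA per octet of length, handles 1 to 8 octets only, and does not treat any VINT_DATA value as reserved.

```python
def encode_vint(value: int, length: int) -> bytes:
    """Encode value as a Variable Size Integer that is length octets long."""
    if not 1 <= length <= 8:
        raise ValueError("length must be 1..8 octets in this sketch")
    data_bits = 7 * length  # bits left for VINT_DATA after width and marker
    if not 0 <= value < (1 << data_bits):
        raise ValueError("value does not fit in the requested length")
    marker = 1 << data_bits  # VINT_MARKER sits just above VINT_DATA
    return (marker | value).to_bytes(length, "big")


def decode_vint(raw: bytes) -> int:
    """Recover the integer in VINT_DATA by masking off the marker.

    Assumption: len(raw) equals the length signalled by VINT_WIDTH.
    """
    data_bits = 7 * len(raw)
    return int.from_bytes(raw, "big") & ((1 << data_bits) - 1)
```

Encoding the integer 2 with a length of one octet gives 0x82, and with two octets gives 0x40 0x02; both decode back to 2, which is the value a reader of the table cares about.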

— Section 6.2 —

   An EBML Element with an unknown Element Data Size
   is referred to as an Unknown-Sized Element.  A Master Element MAY be
   an Unknown-Sized Element; however an EBML Element that is not a
   Master Element MUST NOT be an Unknown-Sized Element.  Master Elements
   MUST NOT use an unknown size unless the unknownsizeallowed attribute
   of their EBML Schema is set to true (see Section 11.1.5.10).

This also seems confusing and perhaps contradictory because of how it uses the BCP 14 key words.  May I suggest this, which neither uses nor needs the key words?:

NEW
   An EBML Element with an unknown Element Data Size
   is referred to as an Unknown-Sized Element.  Only a Master Element
   is allowed to be of unknown size, and it can only be so if the
   unknownsizeallowed attribute of its EBML Schema is set to true
   (see Section 11.1.5.10).
END
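
Stated as a check, the suggested wording reduces to a single condition. A minimal sketch, with a hypothetical function name and the element type given as the schema's type string (for example "master"):

```python
def unknown_size_permitted(element_type: str, unknownsizeallowed: bool) -> bool:
    """Return True only for a Master Element whose schema allows unknown size."""
    return element_type == "master" and unknownsizeallowed
```

Neither condition is sufficient on its own, which is why the single sentence reads more clearly than the chain of MAY and MUST NOT.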

— Section 7.7 —

   The Master Element MAY also use an unknown length.

The “MAY” isn’t really correct, is it?  There are restrictions that make it not entirely optional, as noted in the next sentence.  I suggest not using BCP 14 here, and just saying, “The Master Element may be of unknown length.”

   The Master Element contains zero, one, or many other elements.

Does this mean anything more than the simpler, “The Master Element contains zero or more other elements.”?  As written, one tends to ask what “many” means here.

— Section 8.2 —

   The EBML Body MUST NOT contain any data that is not
   part of an EBML Element.

Why is this repetition needed?  Doesn’t the similar sentence in Section 8 cover this?

— Section 10 —

   An EBML Document handles 2 different versions: the version of the
   EBML Header and the version of the EBML Body.  Both versions are
   meant to be backward compatible.

I don’t see how that’s practical, as, taken strictly, it means you’ll never be able to make a significant change that is not backward compatible, so you’ll be stuck with errors or limitations forever.  Are you sure you won’t need to allow for incompatible versions at some point?

— Section 17.1 —

   Values from 1 to 126 are to
   be allocated according to the "RFC Required" policy [RFC8126].

Why did you choose that policy?  Are you aware that this allows registrations from non-IETF-stream RFCs?  In particular, anyone can get an RFC published in the Independent stream with a very light level of review.  Did you consider IETF Review, which requires an RFC in the IETF stream (including Informational and Experimental RFCs)?  Or even Standards Action, which requires standards-track RFCs?

The same comment applies to "matroska" and "webm" in Section 17.2.

Alvaro Retana No Objection

Comment (2019-12-18 for -15)
For completeness, the datatracker should point at this document replacing draft-lhomme-cellar-ebml.

(Adam Roach) (was Discuss) No Objection

Comment (2020-01-18 for -16)
Thanks for addressing my discuss and comments!

Martin Vigoureux No Objection

Éric Vyncke No Objection

Comment (2019-12-19 for -15)
I am trusting my ART Area Directors colleagues for this document. Did a quick browse through and it looks good to me.

Still wondering why the three authors have no affiliation at all...

(Magnus Westerlund) (was Discuss) No Objection