RTP Payload Format for VC-2 High Quality (HQ) Profile
RFC 8450

Note: This ballot was opened for revision 06 and is now closed.

(Ben Campbell) Yes

(Ignas Bagdonas) No Objection

Deborah Brungard No Objection

Alissa Cooper No Objection

(Spencer Dawkins) No Objection

Benjamin Kaduk No Objection

Comment (2018-06-19 for -06)
No email
send info
Thanks for the clear Security Considerations section.
I am a little uncertain about the privacy properties of the padding
data, though, largely due to my uncertainty about the details of how
the padding works.  (This is, perhaps, in a similar vein to Eric's
general concerns on interoperability.)

In particular, Section 4.2 says that the Data Length for a Padding
Data Unit "may have any value" and "indicates the size of the
recommended padding".  There is also an "Optional Payload Data" in
Figure 6, and I failed to find a description of what its contents
are for padding data units.  Section 4.5.1's coverage of the Padding
Data Parse Info Header suggests that the "native VC-2" and RTP
padding elements are essentially distinct, with the RTP one being
essentially a recommendation to add a VC-2 one, but giving no
mandatory guidance on how much padding to apply.  In this scenario I
don't know what the purpose of the "optional payload data" in Figure
6 be, though.  Padding is of course ignored by the actual VC-2
decoder, so the concern would mostly be if the (RTP) bits on the
wire include a side channel or "stegangraphic channel" (not exactly
the normal meaning of that one) where identifying information could
be inserted, unbeknownst to the recipient.  This could come into
play if media encryption is not used, or when a middlebox/mixer is
used, or probably in other scenarios as well.  The specification of
all-zeros padding along with the Padding Data Parse Info Header of
course removes any such channel at that point, but I didn't see
a real confirmation that there was no channel in the RTP bits on the
wire.

Some additional section-by-section comments follow.

Section 4.1

   Timestamp: 32 bits  If the packet contains transform parameters or
         coded slice data for a coded picture then the timestamp
         corresponds to the sampling instant of the coded picture.  A
         90kHz clock SHOULD be used.  A single RTP packet MUST NOT
         contain coded data for more than one coded picture, so there is
         no ambiguity here.

Is this a new requirement or quoting a preexisting one?  If a new
requirement, I suggest replacing "so there is no ambiguity here"
with "in order to eliminate any chance for ambiguity".

Section 4.2

         If the receiver does not receive a transform parameters packet
         for a picture then it MAY assume that the parameters are
         unchanged since the last picture, or MAY discard the picture.

Those seem like two very different options!  How would I choose
between them?


We only get the starting x- and y-coordinates of a slice for the
first slice in a packet; it sounds like the main VC-2 spec specifies
the order in which the following slices are laid out?
(Do we need to say anything about what "x coordinate" and "y
coordinate" mean?  I seem to recall that over history there have
existed pixel identifying schemes that put the origin at both the
top left and bottom left of the display.)

Section 4.5.1

There is some text here and elsewhere that seems to imply reusing a
Parse Info Header for various data received in different RTP
packets, potentially even from different coded pictures.  The Parse
Info Header contains "next" and "prev parse offset"s, though --
when would those offsets need to be updated?

   o  A receiver MAY combine all fragment data units (with parse code
      0xEC) and the same picture number into a single picture data unit
      with parse code 0xE8.  If the stream is required to comply with
      major versions 1 or 2 of the VC-2 Spec then this MUST be done.

The "and" in "and the same picture number" seems to be an editing
error; maybe "with" is better?

   o  Once a data unit has been assembled, whether a Sequence Header,
      Coded Picture Fragment, Coded Picture, or Auxiliary Data Unit, the
      next parse offset and previous parse offset values in its Parse
      Info Header should be filled with the offset between the start of
      the header and the start of the next or previous.

This text could probably be tightened up with respect to which
next/previous fields are updated when, and what values go in them.
E.g., do we have enough information to fill in the "previous" field
when we start assembling a data unit, and the "next" when we finish
assembling that data unit?

Section 6.1

Perhaps "RFC XXXX" makes more sense as the "published specification"
than ST 2042-1?  That is, this is where the (mandatory) RTP framing
is required, so it may be a better starting point for an
implementor.

(Suresh Krishnan) No Objection

Warren Kumari No Objection

(Mirja K├╝hlewind) No Objection

(Terry Manderson) No Objection

(Alexey Melnikov) No Objection

(Eric Rescorla) No Objection

Comment (2018-06-19 for -06)
No email
send info
Rich version of this review at:
https://mozphab-ietf.devsvcdev.mozaws.net/D4183


I am not sure that this specification can be interoperably
implemented. I have noted a number of points below. I believe these
are largely minor and so have not made this a DISCUSS, but it is
important they be resolved.

This document would also benefit from significant editorial work.
Based on S  4.5.1, I take the structure to be that you start with the
VC-2 stream and then use RTP headers to contain the information in
Parse Info blocks. If that is correct, it would be much clearer if
stated upfront.



IMPORTANT
S 4.2.
>   
>      The fields of the extended headers are defined as follows:
>   
>      Extended Sequence Number: 16 bits  MUST Contain the high-order
>            16-bits of the 32-bit packet sequence number, a number which
>            increments with each packet.  This is needed since the high

Increments by one?


S 4.2.
>            data rates of VC2 Sequences mean that it is highly likely that
>            the 16-bit sequence number will roll-over too frequently to be
>            of use for stream synchronisation.
>   
>      B: 1 bit  MUST be set to 1 if the packet contains the first byte of
>            an Auxiliary Data or Padded Data Unit.

And otherwise must be 0?


S 4.2.
>   
>      B: 1 bit  MUST be set to 1 if the packet contains the first byte of
>            an Auxiliary Data or Padded Data Unit.
>   
>      E: 1 bit  MUST be set to 1 if the packet contains the final byte of
>            an Auxiliary Data or Padded Data Unit.

And otherwise must be 0?


S 4.2.
>            from a new picture until all the coded data from the current
>            picture has been sent.
>   
>            If the receiver does not receive a transform parameters packet
>            for a picture then it MAY assume that the parameters are
>            unchanged since the last picture, or MAY discard the picture.

How does this interact with packet loss?

COMMENTS
S 3.
>      the decoder.
>   
>      Each Sequence consists of a series of 13-octet Parse Info headers and
>      variable length Data Units.  The Sequence begins and ends with a
>      Parse Info header and each Data Unit is preceded by a Parse Info
>      Header.  Data Units come in a variety of types, the most important

This text isn't very clear to me. Is the following valid: PI | Data |
PI? How about PI | PI | Data | PI.  PI | PI | PI | Data | PI?




S 3.
>      should not be assumed.
>   
>      The High Quality (HQ) profile for VC-2 restricts the types of Parse
>      Info Headers which may appear in the Sequence to only:
>   
>      o  Sequence Headers,

The text above says that Sequence Headers are a type of Data Unit. So
I'm confused by this text.


S 4.
>   
>   4.  Payload format
>   
>      This specification only covers the transport of Sequence Headers,
>      High Quality Fragments, Auxiliary Data, and (optionally) End of
>      Sequence Headers and Padding Data.

So it doesn't include Parse Info?


S 4.
>         Picture (Figure 2),
>   
>      o  A Picture Fragment containing VC-2 Coded Slices (Figure 3) for a
>         picture,
>   
>      o  The end of a VC-2 Sequence (Figure 4)

It would be helpful if you cited specific sections of VC-2 for these.


S 4.
>      .                                                               .
>      +---------------------------------------------------------------+
>   
>                  Figure 6: RTP Payload Format For Padding Data
>   
>      All fields in the headers longer than a single bit are interprted as

Nit: interpreted.


S 4.2.
>   
>      Data Length: 32 bits  For an auxiliary data unit this contains the
>            number of bytes of data contained in the uncoded payload
>            section of this packet.  For a Padding Data Unit this field may
>            have any value and simply indicates the size of the recommended
>            padding.

This seems like a very large field given that RTP datagrams are almost
never this large, so I am suspecting the "uncoded payload" means pre-
compressed? Can you be clearer..

Alvaro Retana No Objection

(Adam Roach) No Objection

Comment (2018-06-19 for -06)
No email
send info
Thanks for the work on this document.

Given that the underlying format doesn't appear to be resilient to loss, I'm a
little surprised to see no discussion of FEC; and, in particular, no treatment
of the allocation of unequal error protection to the various packet types. For
example, it sounds like the transform parameters packet is significantly more
important than, e.g., a picture fragment that contains slices. I suspect there
is a general prioritization among the various types that would useful to call
out for implementors.

Martin Vigoureux No Objection