RTP Payload Format for ISO/IEC 21122 (JPEG XS)
draft-ietf-payload-rtp-jpegxs-18

Yes

Murray Kucherawy

No Objection

Erik Kline
Éric Vyncke
(Alvaro Retana)
(Robert Wilton)

Note: This ballot was opened for revision 15 and is now closed.

Murray Kucherawy
Yes
Erik Kline
No Objection
Francesca Palombini
No Objection
Comment (2021-06-16 for -16) Sent
Thank you for the work on this document. I have some non-blocking comments and observations.

Francesca

1. -----
 
   A JPEG XS codestream header, starting with an SOC marker, followed by
   one or more slices, and terminated by an EOC marker form a JPEG XS
   codestream.

FP: I understand from the terminology what this is meant to specify, but the phrasing is slightly confusing: it is not clear that the subject of "followed" is "A JPEG XS codestream header" and not "an SOC marker".

2. -----

FP: I agree with John that without access to ISO21122-{1,2,3}, it's not possible to do a complete review; in particular, the media type registration contains parameters inherited from the ISO standards, with normative text that I cannot review. Like John, I trust the responsible AD that the document has had sufficient review in the WG from people with access to the ISO specifications.

3. -----

FP: I couldn't find any record that the media type registration was posted to the media-type mailing list. Was that done? This was also highlighted in the shepherd write-up, which I found helpful, so thank you Bernard.
John Scudder
No Objection
Comment (2021-06-16 for -16) Sent
Thanks, I found this spec very readable -- modulo the fact that I have no expertise in the subject area! Below are some questions and comments I hope may be useful. 

I'm concerned that since the underlying ISO21122-{1,2,3} normative references are not readily available, it's not possible to do a complete review. I take it on faith that the document has received review within the WG by subject matter experts who are conversant with, and have access to, the relevant ISO specifications.

1. Section 4.1

       In the case of an interlaced frame, the
       JPEG XS header segment of the second field SHALL be in its own
       packetization unit.

I’m confused why the second field even needs its own header segment, considering you earlier told us (§3.4) that

   Both picture segments SHALL contain identical
   boxes (i.e. concatenation of the video support box and the colour
   specification box is byte exact the same for both picture segments of
   the frame).

Surely this means the VS and CS boxes could have been elided from the second field? (Probably they’re left in for uniformity, but I thought it worth asking.)


2. Section 4.1

   Due to the constant bit-rate of JPEG XS, the codestream packetization
   mode guarantees that a JPEG XS RTP stream will produce a constant
   number of bytes per frame, and a constant number of RTP packets per
   frame.  To reach the same guarantee with the slice packetization
   mode, an additional mechanism is required.  This can involve a
   constraint at the rate allocation stage in the JPEG XS encoder to
   impose a constant bit-rate at the slice level, the usage of padding
   data, or the insertion of empty RTP packets (i.e. a RTP packet whose
   payload data is empty).

The “… additional mechanism is required” text is ambiguous. Does it mean that an implementation MUST use an (implementation-specific!) method that makes its output CBR? That’s implied by the use of the word “required”. Or does it mean that if an implementation wishes to render a CBR stream instead of a VBR one, it will need to adopt one of these strategies? Assuming your intent is the latter, I think the text should be clarified, for example:

OLD
   To reach the same guarantee with the slice packetization
   mode, an additional mechanism is required. 

NEW
   If an implementation wishes to provide the same guarantee
   with the slice packetization mode, it will need to use an 
   additional mechanism.
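
To make the "insertion of empty RTP packets" option quoted above concrete, here is a minimal Python sketch that pads a frame's list of packet payloads to a fixed packet count. The function name and the target count are hypothetical and do not come from the draft. The sketch equalizes only the number of packets per frame, not the number of bytes, which would still need padding data or encoder-side rate allocation.

```python
def pad_frame_with_empty_packets(payloads: list[bytes], target_packets: int) -> list[bytes]:
    """Append empty payloads so every frame yields the same number of packets.

    `target_packets` is an assumed, application-chosen constant; the draft
    quoted above does not define such a parameter.
    """
    if len(payloads) > target_packets:
        raise ValueError("frame already exceeds the target packet count")
    # An "empty RTP packet" is modelled here as a zero-length payload.
    missing = target_packets - len(payloads)
    return payloads + [b""] * missing
```

For example, a frame that produced two packets and a target of four would gain two zero-length payloads at the end.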


3. Section 4.3

      In the case that the Transmission mode
      (T) is set to 0, the slice packetization mode SHALL be used and K
      SHALL be set to 1.

Presumably the reason for this is evident to someone conversant with JPEG XS?
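
Whatever the rationale, the quoted constraint itself is simple to state as a check. The Python sketch below assumes, as the draft text quoted elsewhere on this page indicates, that K=0 denotes codestream packetization mode and K=1 denotes slice packetization mode. The function name is hypothetical.

```python
def tk_bits_are_consistent(t: int, k: int) -> bool:
    """Return True if the T and K bits satisfy the quoted Section 4.3 rule.

    Rule as quoted: when T is 0, slice packetization mode SHALL be used,
    i.e. K SHALL be 1. No constraint on K is assumed here when T is 1.
    """
    if t not in (0, 1) or k not in (0, 1):
        raise ValueError("T and K are single-bit fields")
    return not (t == 0 and k == 0)
```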


4. Section 7.1

         level:  The JPEG XS level [ISO21122-2] in use.  Any white space in
         the level name SHALL be omitted.  Examples of valid levels
         names are '2k-1' or '4k-2'.

Nit: s/levels/level/ (alternately, delete “names”).

      width:  Determines the number of pixels per line.  This is an
         integer between 1 and 32767.

      height:  Determines the number of lines per frame.  This is an
         integer between 1 and 32767.

It would be less ambiguous to say “between 1 and 32767 inclusive”.
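
As an illustration of the inclusive reading, a receiver-side check of these parameters might look like the Python sketch below. The helper names are hypothetical; the bounds and the whitespace rule are taken from the quoted Section 7.1 text, with the range interpreted as inclusive at both ends.

```python
MAX_DIMENSION = 32767  # upper bound quoted for both width and height


def normalize_level(level: str) -> str:
    """Remove all white space from a level name, e.g. '2k - 1' -> '2k-1'."""
    return "".join(level.split())


def is_valid_dimension(value: object) -> bool:
    """Accept integers in the inclusive range 1..32767 (bool is rejected)."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return 1 <= value <= MAX_DIMENSION
```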
Roman Danyliw
No Objection
Comment (2021-06-10 for -16) Sent for earlier
Thank you for addressing my COMMENTs.
Zaheduzzaman Sarker
(was Discuss) No Objection
Comment (2021-07-20 for -17) Sent
Thanks to the authors and Stephan Wenger for prompt action to make the ISO specification available to us.

I have removed the discuss as the main reason for the discuss was resolved.

I do, however, have one major issue that I think needs to be addressed.

* Section 4.1: the assertion here is that JPEG XS produces a constant bitrate. However, I now know that this codec can operate in both constant and variable bitrate modes. This section should clarify whether the RTP payload format still holds when VBR mode is used. It might also be helpful to discuss the two modes of operation somewhere in the introduction and, if the focus is only on constant bitrate mode, to say so with reasoning. This will clarify the scope of the payload definition and also the impact on Section 6.


And more comments:

* I agree with Martin Duke's comment that the polymorphic use of "end-to-end latency" needs to be explained a bit.

* Section 3: a statement that this section describes the terminology and naming used in this specification, as Section 4 has, would help the reader understand the context a bit more.

* Section 3.3: I would suggest adding a reference for Ppih and Plev at their first use.

* Section 4.3: says --

    "If codestream packetization mode is
      used, L bit and M bit are equivalent."
   
   Does this mean it is enough to set only the M bit in codestream packetization mode?

* Section 4.3: says --
    "In the case of codestream packetization mode (K=0), this
         counter resets whenever the Packet counter resets (see
         hereunder)"

   "hereunder"? Can we give a more specific reference instead?

* Section 6: Usually, when RTP is used, congestion control and the corresponding rate control are done by the RTP application. The RTP AVPF profile is the recommended profile for real-time communication when efficient rate control (no, not the video encoder rate control :-)) is needed. Hence, I think we should recommend use of the AVPF profile here and also refer to RFC8888. The inclusion of the circuit breaker makes a lot of sense here.
I also learned that JPEG XS is designed to be used in a controlled network environment. Hence, there should be a warning about its use over the best-effort Internet before the requirement on packet loss observation. If an acceptable packet loss parameter is defined somewhere, it should also be referenced here.
Éric Vyncke
No Objection
Alvaro Retana Former IESG member
No Objection
No Objection (for -16) Not sent

                            
Benjamin Kaduk Former IESG member
No Objection
No Objection (2021-06-17 for -16) Sent
I'll echo the sentiment of other reviewers that the scope of review
possible is limited without access to the underlying ISO specification.
I further note that in the recent case of
https://datatracker.ietf.org/doc/draft-ietf-payload-vp9/ (for which the
underlying specification is freely available), there was an error in
replicating the chroma subsampling details from the underlying reference
to the internet-draft.  Any such errors are undetectable for this draft.

Section 4.3

Does the value of the T and K bits need to be identical for all packets
of a given RTP stream?

Section 4.4

It's perhaps needlessly confusing to have the human-readable slice
labels in Figures 8 and 9 start at 1 but the SEP counter start at 0.

nit: if SLH is an acronym it should be expanded somewhere (it only
appears in the figures, at present).

In the slice packetization modes, do we have reasonable guarantees that
the JPEG XS header (including all markers and marker segments) will fit
into a single RTP packet?

Section 7.1

   Applications that use this media type:
      For example: SMPTE ST 2110, Video over IP, Video conferencing,
      Broadcast applications.

I think bland declarative statements like "applications that transmit
video over RTP" tend to be more common than longer "for example"
listings, in this type of registration.

Section 8

nit: s/SPD/SDP in the section heading.
Lars Eggert Former IESG member
No Objection
No Objection (2021-06-11 for -16) Sent
All comments below are about very minor potential issues that you may choose to
address in some way - or ignore - as you see fit. Some were flagged by
automated tools (via https://github.com/larseggert/ietf-reviewtool), so there
will likely be some false positives. There is no need to let me know what you
did with these suggestions.

Section 2. , paragraph 8, nit:
> nit is the first (resp. last) byte of a RTP packet payload (excluding its pay
>                                       ^
Use "an" instead of "a" if the following word starts with a vowel sound, e.g.
"an article", "an hour". (Also elsewhere in the document.)

Section 2. , paragraph 17, nit:
> ferent slices can be decoded independently from each other. Note, however, t
>                              ^^^^^^^^^^^^^^^^^^
The usual collocation for "independently" is "of", not "from". Did you mean
"independently of"?
Martin Duke Former IESG member
No Objection
No Objection (2021-06-16 for -16) Sent
In the abstract and intro, it promises "end-to-end latency confined to a fraction of a frame".

I am not sure what to make of this guarantee. Latency is a measure of time and a frame is measured in ... bytes?

Moreover, end-to-end latency is mostly a property of the path, and not something an encoding format can promise.
Robert Wilton Former IESG member
No Objection
No Objection (for -16) Not sent