RTP Payload Format for ISO/IEC 21122 (JPEG XS)
draft-ietf-payload-rtp-jpegxs-18

Yes

Murray Kucherawy

No Objection

Erik Kline

Francesca Palombini

John Scudder

Roman Danyliw

Zaheduzzaman Sarker

Éric Vyncke

(Alvaro Retana)

(Benjamin Kaduk)

(Lars Eggert)

(Martin Duke)

(Robert Wilton)

Note: This ballot was opened for revision 15 and is now closed.

Murray Kucherawy

Yes

Erik Kline

No Objection

Francesca Palombini

No Objection

Comment (2021-06-16 for -16) Sent

Thank you for the work on this document. I have some non-blocking comments and observations.

Francesca

1. -----
 
   A JPEG XS codestream header, starting with an SOC marker, followed by
   one or more slices, and terminated by an EOC marker form a JPEG XS
   codestream.

FP: I understand from the terminology what this is meant to specify; however, the phrasing is slightly confusing: it is not clear that the subject of "followed" is "A JPEG XS codestream header" and not "an SOC marker".

2. -----

FP: I agree with John that without access to ISO21122-{1,2,3}, it's not possible to do a complete review; in particular, the media type registration contains parameters that are inherited from the ISO standards, with normative text that I cannot review. Like John, I trust the responsible AD that the doc has had sufficient review in the WG from people with access to the ISO specifications.

3. -----

FP: I couldn't find any record of the media type registration being posted to the media-type mailing list; was that done? This was also highlighted in the shepherd write-up, which I found helpful, so thank you Bernard.

John Scudder

No Objection

Comment (2021-06-16 for -16) Sent

Thanks, I found this spec very readable -- modulo the fact that I have no expertise in the subject area! Below are some questions and comments I hope may be useful.

I'm concerned that since the underlying ISO21122-{1,2,3} normative references are not readily available, it's not possible to do a complete review. I take it on faith that the document has received review within the WG by subject matter experts who are conversant with, and have access to, the relevant ISO specifications.

1. Section 4.1

In the case of an interlaced frame, the
JPEG XS header segment of the second field SHALL be in its own
packetization unit.

I’m confused why the second field even needs its own header segment, considering you earlier told us (§3.4) that

Both picture segments SHALL contain identical
boxes (i.e. concatenation of the video support box and the colour
specification box is byte exact the same for both picture segments of
the frame).

Surely this means the VS and CS boxes could have been elided from the second field? (Probably they’re left in for uniformity, but I thought it worth asking.)

2. Section 4.1

Due to the constant bit-rate of JPEG XS, the codestream packetization
mode guarantees that a JPEG XS RTP stream will produce a constant
number of bytes per frame, and a constant number of RTP packets per
frame. To reach the same guarantee with the slice packetization
mode, an additional mechanism is required. This can involve a
constraint at the rate allocation stage in the JPEG XS encoder to
impose a constant bit-rate at the slice level, the usage of padding
data, or the insertion of empty RTP packets (i.e. a RTP packet whose
payload data is empty).

The “… additional mechanism is required” text is ambiguous. Does it mean that an implementation MUST use an (implementation-specific!) method that makes its output CBR? That’s what the word “required” implies. Or does it mean that if an implementation wishes to produce a CBR stream instead of a VBR one, it will need to adopt one of these strategies? Assuming your intent is the latter, I think the text should be clarified, for example:

OLD
To reach the same guarantee with the slice packetization
mode, an additional mechanism is required.

NEW
If an implementation wishes to provide the same guarantee
with the slice packetization mode, it will need to use an
additional mechanism.
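
To illustrate why the slice packetization mode does not automatically inherit the constant packet count, here is a minimal Python sketch. It assumes that, in slice packetization mode, a packet never carries data from more than one slice, and it ignores the header segment and the payload header. The function names and the `payload_size` parameter are hypothetical and are not taken from the draft.

```python
def packets_codestream_mode(frame_bytes: int, payload_size: int) -> int:
    """Packets needed when the whole codestream is split back to back."""
    return -(-frame_bytes // payload_size)  # ceiling division


def packets_slice_mode(slice_bytes: list[int], payload_size: int) -> int:
    """Packets needed when every slice starts a new packet (assumption)."""
    return sum(-(-size // payload_size) for size in slice_bytes)


# Two frames with the same total size (3000 bytes) but different slice sizes.
even_slices = [1500, 1500]
uneven_slices = [1000, 2000]

print(packets_codestream_mode(3000, 1000))      # codestream mode
print(packets_slice_mode(even_slices, 1000))    # slice mode, even slices
print(packets_slice_mode(uneven_slices, 1000))  # slice mode, uneven slices
```

Under these assumptions, both frames need 3 packets in codestream mode, but in slice mode they need 4 and 3 packets respectively, even though the frame size is constant. That is the gap that a slice-level rate constraint, padding, or empty packets would have to close.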

3. Section 4.3

In the case that the Transmission mode
(T) is set to 0, the slice packetization mode SHALL be used and K
SHALL be set to 1.

Presumably the reason for this is evident to someone conversant with JPEG XS?

4. Section 7.1

level: The JPEG XS level [ISO21122-2] in use. Any white space in
the level name SHALL be omitted. Examples of valid levels
names are '2k-1' or '4k-2'.

Nit: s/levels/level/ (alternately, delete “names”).

width: Determines the number of pixels per line. This is an
integer between 1 and 32767.

height: Determines the number of lines per frame. This is an
integer between 1 and 32767.

It would be less ambiguous to say “between 1 and 32767 inclusive”.
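
As a sketch of how a receiver might read the quoted parameter text if the range is taken to be inclusive, the following Python checks accept both 1 and 32767. The helper names are hypothetical, the inclusive reading is the assumption suggested above rather than something the draft states, and the input '2k - 1' is an invented example of a name containing white space.

```python
def normalize_level(name: str) -> str:
    """Drop all white space from a level name, as the parameter text requires."""
    return "".join(name.split())


def check_dimension(value: int, lo: int = 1, hi: int = 32767) -> int:
    """Return value if lo <= value <= hi (both ends inclusive), else raise."""
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside the range {lo}..{hi} inclusive")
    return value


print(normalize_level("2k - 1"))
print(check_dimension(32767))
```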

Roman Danyliw

No Objection

Comment (2021-06-10 for -16) Sent for earlier

Thank you for addressing my COMMENTs.

Zaheduzzaman Sarker

(was Discuss) No Objection

Comment (2021-07-20 for -17) Sent

Thanks to the authors and Stephan Wenger for prompt action to make the ISO specification available to us.

I have removed the discuss as the main reason for the discuss was resolved.

However, I have one major issue that I think needs to be addressed.

* Section 4.1: the assertion here is that JPEG XS produces a constant bitrate. However, I now know that this codec can operate in both constant and variable bitrate modes. This section should clarify whether the RTP payload format still holds when VBR mode is used. It might also be helpful to discuss the two modes of operation somewhere in the introduction and state, with reasoning, whether the focus is only on constant bitrate mode. This will make clear the scope of the payload definition and also the impact on Section 6.

And more comments:

* I agree with Martin Duke's comment that the polymorphic use of "end-to-end latency" needs to be explained a bit.

* Section 3: a statement that this section describes terminology and naming used in this specification, as Section 4 has, would help the reader understand the context a bit more.

* Section 3.3: I would suggest adding a reference for Ppih and Plev at their first use.

* Section 4.3: says --

"If codestream packetization mode is
used, L bit and M bit are equivalent."

Does this mean it is enough to set only the M bit in the codestream packetization mode?

* Section 4.3: says --
"In the case of codestream packetization mode (K=0), this
counter resets whenever the Packet counter resets (see
hereunder)"

Can we give a more specific reference instead of "hereunder"?

* Section 6: Usually when RTP is used, congestion control and the corresponding rate control are done by the RTP application. The RTP AVPF profile is the recommended profile for real-time communication when efficient rate control (no, not the video encoder rate control :-)) is needed. Hence, I think we should recommend the use of the AVPF profile here and also refer to RFC 8888. The inclusion of the circuit breaker makes a lot of sense here.
I also learned that JPEG XS is designed to be used in a controlled network environment. Hence, there should be a warning about its use on the best-effort Internet, placed before the requirement on packet loss observation. If an acceptable packet loss parameter is defined somewhere, it should also be referenced here.

Éric Vyncke

No Objection

Alvaro Retana Former IESG member

No Objection

No Objection (for -16) Not sent

Benjamin Kaduk Former IESG member

No Objection

No Objection (2021-06-17 for -16) Sent

I'll echo the sentiment of other reviewers that the scope of review
possible is limited without access to the underlying ISO specification.
I further note that in the recent case of
https://datatracker.ietf.org/doc/draft-ietf-payload-vp9/ (for which the
underlying specification is freely available), there was an error in
replicating the chroma subsampling details from the underlying reference
to the internet-draft.  Any such errors are undetectable for this draft.

Section 4.3

Does the value of the T and K bits need to be identical for all packets
of a given RTP stream?

Section 4.4

It's perhaps needlessly confusing to have the human-readable slice
labels in Figures 8 and 9 start at 1 but the SEP counter start at 0.

nit: if SLH is an acronym it should be expanded somewhere (it only
appears in the figures, at present).

In the slice packetization modes, do we have reasonable guarantees that
the JPEG XS header (including all markers and marker segments) will fit
into a single RTP packet?

Section 7.1

   Applications that use this media type:
      For example: SMPTE ST 2110, Video over IP, Video conferencing,
      Broadcast applications.

I think bland declarative statements like "applications that transmit
video over RTP" tend to be more common than longer "for example"
listings, in this type of registration.

Section 8

nit: s/SPD/SDP in the section heading.

Lars Eggert Former IESG member

No Objection

No Objection (2021-06-11 for -16) Sent

All comments below are about very minor potential issues that you may choose to
address in some way - or ignore - as you see fit. Some were flagged by
automated tools (via https://github.com/larseggert/ietf-reviewtool), so there
will likely be some false positives. There is no need to let me know what you
did with these suggestions.

Section 2. , paragraph 8, nit:
> nit is the first (resp. last) byte of a RTP packet payload (excluding its pay
>                                       ^
Use "an" instead of "a" if the following word starts with a vowel sound, e.g.
"an article", "an hour". (Also elsewhere in the document.)

Section 2. , paragraph 17, nit:
> ferent slices can be decoded independently from each other. Note, however, t
>                              ^^^^^^^^^^^^^^^^^^
The usual collocation for "independently" is "of", not "from". Did you mean
"independently of"?

Martin Duke Former IESG member

No Objection

No Objection (2021-06-16 for -16) Sent

In the abstract and intro, it promises "end-to-end latency confined to a fraction of a frame".

I am not sure what to make of this guarantee. Latency is a measure of time and a frame is measured in ... bytes?

Moreover, end-to-end latency is mostly a property of the path, and not something an encoding format can promise.

Robert Wilton Former IESG member

No Objection

No Objection (for -16) Not sent
