Video Codec Testing and Quality Measurement
draft-ietf-netvc-testing-09

Discuss

Roman Danyliw
(Benjamin Kaduk)

Yes

(Adam Roach)

No Objection

(Alissa Cooper)
(Deborah Brungard)
(Ignas Bagdonas)

Abstain

(Mirja Kühlewind)

No Record

Deb Cooley
Erik Kline
Francesca Palombini
Gunter Van de Velde
Jim Guichard
John Scudder
Mahesh Jethanandani
Murray Kucherawy
Orie Steele
Paul Wouters
Warren Kumari
Zaheduzzaman Sarker
Éric Vyncke

Summary: Needs a YES. Has a DISCUSS.

Roman Danyliw
Discuss
Discuss (2019-06-13 for -08) Sent
(1) There appear to be deep and implicit dependencies in the document on the references [DAALA-GIT] and [TESTSEQUENCES].  I applaud the intent to use them to provide tangible advice on testing and evaluation to the community.  I have a few questions about their use.

(1.a) Why aren’t [DAALA-GIT] and [TESTSEQUENCES] normative references, given that they are needed to fully understand the testing approach and that they provide the test data?

(1.b) What should readers of the RFC do if these external references are no longer available?  How is change control of these references handled?

(1.c) In the case of [DAALA-GIT], which version of the code in the repo should be used?  Formally, which version of C is used in that repo?

(1.d) Given the implicit assumptions the document makes about familiarity with [DAALA-GIT] and [TESTSEQUENCES], here are a few places where additional clarity is required:

-- Section 4.3.  Per “For individual feature changes in libaom or libvpx, the overlap BD-Rate method with quantizers 20, 32, 43, and 55 must be used”, what are libaom and libvpx, and what is their role?

-- Section 5.3.  Multiple subsections of Section 5.3 list what look like settings for tools (e.g., “av1: -codec=av1 -ivf -frame-parallel=0 …”).  What exactly are these settings?  How should they be read and used?

(2) The details of some of the testing regimes need to be more fully specified (or cited normatively):
-- Section 3.1. The variable MAX is not explained in either equation.

-- Section 3.1.  This section doesn’t explain or provide a reference for calculating PSNR; I’m not sure how one would calculate or implement it.  (A sketch follows this list.)

-- Section 4.2.  A reference is needed for the Bjontegaard rate difference (BD-Rate) to explain its computation.

-- The references [SSIM], [MSSIM], [CIEDE2000], and [VMAF] are needed to fully explain a given testing metric, so they need to be normative.
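
To illustrate the gap, here is a minimal sketch of the PSNR computation as I understand it (my reading, not text from the draft; it assumes 8-bit samples, so the MAX variable from Section 3.1 would be 255):

    import numpy as np

    def psnr(reference, distorted, max_value=255.0):
        """Peak signal-to-noise ratio in dB.  `max_value` is the MAX
        variable from Section 3.1: the largest possible sample value
        (255 for 8-bit video)."""
        ref = np.asarray(reference, dtype=np.float64)
        dis = np.asarray(distorted, dtype=np.float64)
        mse = np.mean((ref - dis) ** 2)
        if mse == 0:
            return float("inf")  # identical frames
        return 10.0 * np.log10(max_value ** 2 / mse)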

(3) An IANA Considerations section isn’t present in the document.

(4) A Security Considerations section isn’t present in the document.
Comment (2019-06-13 for -08) Sent
A few comments:

(5) Consider qualifying the title to capture the substance of this draft more accurately: “Video Codec Testing and Quality Measurement {using the Daala Tool Suite or Xiph Tools and Data}”.

(6) Sections 3.1 and 3.3.  Cite a reference for the source code file names in question, dump_psnr.c and dump_pnsrhvs.c (which are somewhere in the [DAALA-GIT] repo?).

(7) Editorial Nits
-- Section 2.1.  Expand PMF (Probability Mass Function) on first use.

-- Section 2.1. Explain floor.

-- Section 2.2.  Typo.  s/vidoes/videos/

-- Section 2.2. Typo. s/rewatched/re-watched/

-- Section 2.3.  Typo.  s/comparisions/comparisons/

-- Section 3.1.  Expand PSNR (peak signal-to-noise ratio) on first use.

-- Section 3.1.  Typo.  s/drived/derived/
Benjamin Kaduk Former IESG member
Discuss
Discuss [Treat as non-blocking comment] (2019-06-12 for -08) Sent
I suspect I will end up balloting Abstain on this document, given how
far it is from something I could support publishing (e.g., a
freestanding clear description of test procedures), but I do think
there are some key issues that need to be resolved before publication.
Perhaps some of them stem from a misunderstanding of the intended goal
of the document -- I am reading this document as attempting to lay out
procedures that are of general utility in evaluating a codec or codecs,
but it is possible that (e.g.) it is intended as an informal summary of
some choices made in a specific operating environment to make a
specific decision.  Additional text to set the scope of the discussion
could go a long way.

Section 2

There are a lot of assertions here without any supporting evidence or
reasoning.  Why is subjective better than objective?  What if objective
gets a lot better in the future?  What if a test should be important but
the interested people don't have the qualifications and the qualified
people are too busy doing other things?

Section 2.1

Why is p<0.5 an appropriate criterion?  Even where p-values are still
used in the scientific literature (a practice that is declining in
popularity), the threshold is more often 0.05, or even 0.00001 (e.g.,
in high-energy physics).
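
To make this concrete: treating each A/B trial as a fair coin flip
under the null hypothesis, the one-sided p-value is a binomial tail
sum.  The sketch below reflects my understanding of such a test; it is
not code from the draft:

    from math import comb

    def binomial_p_value(successes, trials):
        """Probability of at least `successes` preferences for one
        codec out of `trials` comparisons, under the null hypothesis
        that each outcome is a fair coin flip (p = 0.5)."""
        tail = sum(comb(trials, k) for k in range(successes, trials + 1))
        return tail / 2 ** trials

    # Under the draft's p < 0.5 criterion, 6 preferences out of 10
    # trials would count as significant (p ~= 0.38); under the
    # conventional 0.05 threshold it would not.
    print(binomial_p_value(6, 10))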

Section 3

Normative C code contained outside of the RFC being published is hardly
an archival way to describe an algorithm.  There isn't even a git commit
hash listed to ensure that the referenced material doesn't change!

Sections 3.5, 3.6, and 3.7

I don't see how MSSSIM, CIEDE2000, VMAF, etc. are not normative
references.  If you want to use the indicated metric, you have to follow
the reference.
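
To illustrate how load-bearing these references are, even a whole-frame
simplification of SSIM needs constants and structure taken directly
from [SSIM].  The sketch below is my own, assumes the constants
K1 = 0.01 and K2 = 0.03 from that paper, and omits the local windowing
that a real implementation requires:

    import numpy as np

    def global_ssim(x, y, max_value=255.0):
        """Single-window SSIM over whole frames (illustrative only;
        [SSIM] computes the index over local windows and averages)."""
        x = np.asarray(x, dtype=np.float64)
        y = np.asarray(y, dtype=np.float64)
        c1 = (0.01 * max_value) ** 2  # K1 = 0.01, per [SSIM]
        c2 = (0.03 * max_value) ** 2  # K2 = 0.03, per [SSIM]
        mu_x, mu_y = x.mean(), y.mean()
        cov_xy = ((x - mu_x) * (y - mu_y)).mean()
        return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
                / ((mu_x ** 2 + mu_y ** 2 + c1)
                   * (x.var() + y.var() + c2)))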

Section 4.2

There is a dearth of references here.  This document alone is far from
sufficient to perform these calculations.
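
For comparison, the commonly used Bjontegaard recipe fits log-bitrate
as a cubic polynomial in quality and integrates the difference between
the two fits over the overlapping quality range.  The sketch below
follows that folklore recipe; it is an assumption on my part, since the
document never specifies the computation:

    import numpy as np

    def bd_rate(rates_a, psnr_a, rates_b, psnr_b):
        """Average bitrate change (percent) of codec B relative to
        codec A at equal quality: fit log-rate as a cubic in PSNR,
        integrate both fits over the shared PSNR range, and compare."""
        poly_a = np.polyfit(psnr_a, np.log(rates_a), 3)
        poly_b = np.polyfit(psnr_b, np.log(rates_b), 3)
        lo = max(min(psnr_a), min(psnr_b))
        hi = min(max(psnr_a), max(psnr_b))
        int_a, int_b = np.polyint(poly_a), np.polyint(poly_b)
        avg_log_diff = ((np.polyval(int_b, hi) - np.polyval(int_b, lo))
                        - (np.polyval(int_a, hi) - np.polyval(int_a, lo))
                        ) / (hi - lo)
        return (np.exp(avg_log_diff) - 1) * 100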

Section 4.3

There is a dearth of references here as well.  What are libaom and
libvpx?  What is the overlap "BD-Rate method" and where is it specified?

Section 5.2

This mention of "[a]ll current test sets" seems to imply that this
document is part of a broader set of work.  The Introduction should make
clear what broader context this document is to be interpreted within.
(I only note this once in the Discuss portion, but noted some other
examples in the Comment section.)
Adam Roach Former IESG member
Yes
Yes (for -08) Unknown
Alissa Cooper Former IESG member
(was Discuss) No Objection
No Objection (2020-02-07) Sent
Thank you for addressing my DISCUSS.

Please respond to the Gen-ART review.
Deborah Brungard Former IESG member
No Objection
No Objection (for -08) Not sent
Ignas Bagdonas Former IESG member
No Objection
No Objection (for -08) Not sent
Mirja Kühlewind Former IESG member
Abstain
Abstain (2019-06-05 for -08) Sent for earlier
Update: This document has no Security Considerations section, although such a section is required.

This document reads more like a user manual for the Daala tools repository (together with the test sequences). I wonder why it is not simply archived within the repo; what’s the benefit of having this in an RFC? In particular, I’m worried that this document becomes basically useless if the repo and test sequences disappear, and are therefore no longer available in the future, or if they change significantly. I understand that this is referenced by AOM and that publication is therefore desired; however, I don't think that makes my concern about the standalone usefulness of this document invalid. If you really want to publish in the RFC series, I would recommend reducing the dependencies on these repos and trying to make this document more useful as a standalone test description (which would probably mean removing most of Section 4 and adding some additional information to other parts).

Also, the shepherd write-up seems to indicate that this document has an IPR disclosure that was filed after WG last call. Is the WG aware of this? Has it been discussed in the WG?

Other more concrete comments:
1) Quick question on Section 2.1: Is the tester supposed to view one image after the other, or both at the same time? And if one after the other, could the order impact the results (and should it therefore perhaps be chosen randomly)?

2) Section 2.3: Would it make sense to provide a (normative) reference for MOS? Or is it supposed to be so well known that that is not necessary?

3) Section 3.1: Maybe spell out PSNR on first occurrence. And would it make sense to provide a reference for PSNR?

4) Section 3.2: “The weights used by the dump_pnsrhvs.c tool in
   the Daala repository have been found to be the best match to real MOS
   scores.”
Maybe document these weights in this document as well…?

5) Section 5.3: Maybe spell out CQP at first occurrence.