Javascript disabled? Like other modern websites, the IETF Datatracker relies on Javascript. Please enable Javascript for full functionality.

Video Codec Testing and Quality Measurement
draft-ietf-netvc-testing-09

Discuss

Roman Danyliw

(Benjamin Kaduk)

Yes

(Adam Roach)

No Objection

(Alissa Cooper)

(Deborah Brungard)

(Ignas Bagdonas)

Abstain

(Mirja Kühlewind)

No Record

Deb Cooley

Erik Kline

Francesca Palombini

Gunter Van de Velde

Jim Guichard

John Scudder

Mahesh Jethanandani

Murray Kucherawy

Orie Steele

Paul Wouters

Warren Kumari

Zaheduzzaman Sarker

Éric Vyncke

Summary: Needs a YES. Has a DISCUSS.

Roman Danyliw

Discuss

Discuss (2019-06-13 for -08) Sent

(1) There appear to be deep and implicit dependencies in the document to the references [DAALA-GIT] and [TESTSEQUENCES].  I applaud the intent to provide tangible advice on testing and evaluation to the community with them.  I have a few questions around their use.

(1.a) Why aren’t [DAALA-GIT] and [TESTSEQUENCES] normative references as they are needed to fully understand the testing approach and provide the test data?

(1.b) What should readers of the RFC do should these external references no longer be available?  How is the change control of these references handled?  

(1.c) In the case of [DAALA-GIT] which version of the code in the repo should be used?  Formally, what version of C is in that repo?   

(1.d) Per the observation that there are implicit assumptions made by the document about familiarity with [DAALA-GIT] and [TESTSEQUENCES], here are a few places where additional clarity is required:

-- Section 4.3, Per “For individual feature changes in libaom or libvpx , the overlap BD-Rate method with quantizers 20, 32, 43, and 55 must be used”, what are libaom and libvpx and what is their role?

-- Section 5.3.  Multiple subsection in 5.3.* list what look like settings for tools (e.g., “av1: -codec=av1 -ivf -frame-parallel=0 …”). What exactly are those?  How to read them/use them?

(2) The full details of some of the testing regimes need to be more fully specified (or cited as normative):
-- Section 3.1. The variable MAX is not explained in either equation.

-- Section 3.1.  This section doesn’t explain or provide a reference to calculate PSNR.  I’m not sure how to calculate or implement it.

-- Section 4.2.  Reference needed for Bjontegaard rate difference to explain its computation

-- The references [SSIM], [MSSIM], [CIEDE2000] and [VMAF] are needed to fully explain a given testing metric so they need to be normative

(3) An IANA Considerations section isn’t present in the document.

(4) A Security Considerations sections isn’t present in the document.

Comment (2019-06-13 for -08) Sent

A few comments:

(5) Consider qualifying the title to more accurately capture the substance of this draft “Video Codec Testing and Quality Measurement {using the Daala Tool Suite or Xiph Tools and Data}.”

(6) Section 3.1 and 3.3.  Cite a reference for the source code files names in question – dump_psnr.c and dump_pnsrhvs.c (which are somewhere in the [DAALA-GIT] repo?)

(7) Editorial Nits
-- Section 2.1.  Expand PMF (Probability Mass Function) on first use.

-- Section 2.1. Explain floor.

-- Section 2.2.  Typo.  s/vidoes/videos/

-- Section 2.2. Typo. s/rewatched/re-watched/

-- Section 2.3.  Typo.  s/comparisions/comparisons/

-- Section 3.1.  Expand PSNR (Peak signal to noise ratio) on first use.

-- Section 3.1.  Typo.  s/drived/derived/

Deb Cooley

No Record

Erik Kline

No Record

Francesca Palombini

No Record

Gunter Van de Velde

No Record

Jim Guichard

No Record

John Scudder

No Record

Mahesh Jethanandani

No Record

Murray Kucherawy

No Record

Orie Steele

No Record

Paul Wouters

No Record

Warren Kumari

No Record

Zaheduzzaman Sarker

No Record

Éric Vyncke

No Record

Benjamin Kaduk Former IESG member

Discuss

Discuss [Treat as non-blocking comment] (2019-06-12 for -08) Sent

I suspect I will end up balloting Abstain on this document, given how
far it is from something I could support publishing (e.g., a
freestanding clear description of test procedures), but I do think
there are some key issues that need to be resolved before publication.
Perhaps some of them stem from a misunderstanding of the intended goal
of the document -- I am reading this document as attempting to lay out
procedures that are of general utility in evaluating a codec or codecs,
but it is possible that (e.g.) it is intended as an informal summary of
some choices made in a specific operating environment to make a
specific decision.  Additional text to set the scope of the discussion
could go a long way.

Section 2

There's a lot of assertions here without any supporting evidence or
reasoning.  Why is subjective better than objective?  What if objective
gets a lot better in the future?  What if a test should be important but
the interested people don't have the qualifications and the qualified
people are too busy doing other things?

Section 2.1

Why is p<0.5 an appropriate criterion?  Even where p-values are still
used in the scientific literature (which is decreasing in popularity),
the threshold is more often 0.05, or even 0.00001 (e.g., for high-energy
physics).

Section 3

Normative C code contained outside of the RFC being published is hardly
an archival way to describe an algorithm.  There isn't even a git commit
hash listed to ensure that the referenced material doesn't change!

Section 3.5, 3.6, 3.7

I don't see how MSSSIM, CIEDE2000, VMAF, etc. are not normative
references.  If you want to use the indicated metric, you have to follow
the reference.

Section 4.2

There is a dearth of references here.  This document alone is far from
sufficient to perform these calculations.

Section 4.3

There is a dearth of references here as well.  What are libaom and
libvpx?  What is the overlap "BD-Rate method" and where is it specified?

Section 5.2

This mention of "[a]ll current test sets" seems to imply that this
document is part of a broader set of work.  The Introduction should make
clear what broader context this document is to be interpreted within.
(I only note this once in the Discuss portion, but noted some other
examples in the Comment section.)

Adam Roach Former IESG member

Yes

Yes (for -08) Unknown

Alissa Cooper Former IESG member

(was Discuss) No Objection

No Objection (2020-02-07) Sent

Thank you for addressing my DISCUSS.

Please respond to the Gen-ART review.

Deborah Brungard Former IESG member

No Objection

No Objection (for -08) Not sent

Ignas Bagdonas Former IESG member

No Objection

No Objection (for -08) Not sent

Mirja Kühlewind Former IESG member

Abstain

Abstain (2019-06-05 for -08) Sent for earlier

Update: This document has no security considerations section, while having this section is required.

This document reads more like a user manual of the Daala tools repository (together with the test sequences). I wonder why this is not simply archived within the repo? What’s the benefit of having this in an RFC? Especially I’m worried that this document is basically useless in case the repo and test sequences disappear, and are therefore not available anymore in future, or change significantly. I understand that this is referenced by OAM and therefore publication is desired, however, I don't think that makes my concern about the standalone usefulness of this document invalid. If you really want to publish in the RFC series, I would recommend to reduce the dependencies to these repos and try to make this document more useful as a standalone test description (which would probably mean removing most of section 4 and adding some additional information to other parts).

Also, the shepherd write-up seems to indicate that this document has an IPR disclosure that was filed after WG last call. Is the wg aware of this? Has this been discussed in the wg?

Other more concrete comments:
1) Quick question on 2.1: Is the tester supposed to view one image after the other or both at the same time? And if one ofter the other, could the order impact the results (and should maybe be randomly chosen therfore)?

2) Sec 2.3: Would it make sense to provide a (normative) reference to MOS? Or is that supposed to be so well know that that is not even necessary?

3) Sec 3.1: maybe spell out PSNR on first occurrence. And would it make sense to provide a reference for PSNR?

4) Sec 3.2: “ The weights used by the dump_pnsrhvs.c tool in
the Daala repository have been found to be the best match to real MOS
scores.”
Maybe document these weights in this document as well…?

5) Sec 5.3: Maybe spell out CQP at first occurrence

Video Codec Testing and Quality Measurement draft-ietf-netvc-testing-09

Video Codec Testing and Quality Measurement
draft-ietf-netvc-testing-09