FFV1 Video Coding Format Version 0, 1, and 3
draft-ietf-cellar-ffv1-18

Summary: Has 2 DISCUSSes. Has enough positions to pass once DISCUSS positions are resolved.

Roman Danyliw (was No Objection) Discuss

Discuss (2020-10-07)
A simple clarification.

Section 6.
   Implementations of the FFV1 codec need to take appropriate security
   considerations into account, as outlined in [RFC4732]

RFC4732 only covers DoS.  A buffer overflow (as described in the subsequent text of this paragraph) in a codec implementation could have dramatically more significant consequences for the endpoint (or the services it provides) than a DoS.  It could potentially lead to arbitrary remote code execution on the system (barring defensive mitigations provided by sandboxing in the app, OS execution protections; and/or end-point protection software) which pretty much enables an attacker to do anything of their choosing on the system.  Please note that.
Comment (2020-10-21)
No email
send info
(Sorry for any confusion with the updated ballot.  I didn't cut-and-paste all of the text.  COMMENT is the same.  DISCUSS is added.)

Thanks for responding to the SECDIR feedback (and thank you to Liang Xia for providing the review).

I support Barry Lieba’s Discuss position.

Thanks for addressing some of the comments in -18. 

On the security considerations:

** Section 6.  Per the reference to [RFC4732], which selection is relevant here? Is it Section 2.1.1?

** Section 6.  The assertions about the security properties of [REFIMPL] don’t make sense to me in this document.  While it is extremely helpful that there is a high-quality reference implementation, it’s relevance to this spec isn’t clear.  This code isn’t normative.  Recommend removal all text after the paragraph “None of the content carried in FFV1 is intended to be executable”.

Barry Leiba Discuss

Discuss (2020-09-18 for -17)
There are a number of things I’d like to discuss, because I find them not understandable.  Perhaps it’s simply because I’m not a codec expert, but, while I understand that this is written for readers who will actually be implementing he FFV1 codec, some of them will also not be “experts”.  That said, I’m sure some of this is just a case of “give Barry some clue and it’s fine.”

— Section 2.1 —
You use the term “symbol” here and later, without defining it, and I don’t know what it is.  A byte?  A character?  A string of bits?  What length?

— Section 3.8.1.1 —
I’m not sure how to interpret the stuff in this section.  First, I don’t know why there are seven “figures”, with no captions nor other explanation.  Second, I’m having trouble making sense out of things like this:

   S_(i + 1, C_(i)) =  zero_state_(S_(i, C_(i)))  AND
              l_(i) =  L_(i)                      AND
              t_(i) =  R_(i) - r_(i)              <==
              b_(i) =  0                          <==>
              L_(i) <  R_(i) - r_(i)

Can you explain how this is meant to be read?  Maybe it’s just me, and maybe after you explain it I’ll whack myself on the head and say, “Doh!”

Third, you say, “S_(0, i) is the i-th initial state,” but you haven’t previously introduced the term “state”, and I don’t know what it means.

— Section 3.8.1.2 —

   "get_rac" returns a boolean, computed from the bytestream as
   described in Section 3.8.1.1.

I see nothing in Section 3.8.1.1 that describes get_rac.

— Section 3.8.1.3 —

   At keyframes all Range coder state variables are set to their initial
   state.

What does “at keyframes” mean?

— Section 3.8.1.5 —
This is just a list of numbers with no explanation.  It needs text explaining what it means.

— Section 3.8.2.2 —

   The level is identical to the predicted one.
   The run and the first different level are coded.

What does “level” mean?  It’s not defined anywhere.

— Section 4.3.1 —

   "reserved_for_future_use" has semantics that are reserved for future
   use.

Yes, that seems rather obvious, though it’s oddly worded.  But then you say what to do with “this value”, and nowhere do you say what the value is.  How does one know?
Comment (2020-09-18 for -17)
Some editorial comments that I ask you to please consider:

General: It’s jarring to see the defined terms, such as “Sample” and “Pixel” in quotes every time they’re used.  More common practice is to quote them when they’re first defined, and then to just use the capitalization to show that they’re defined terms.  I find that the quotes affect the readability and make it somewhat more of a challenge to read.

— Section 2.1 —

   "Plane": A discrete component of a static image comprised of

Total, total nit, and pet peeve: “composed of” or “comprising”, but not “comprised of”.

— Section 2.2.2 —
You’re missing “a % b”, which is referenced in Section 2.2.6.

— Section 2.2.4 —

   "a > b" means a is greater than b.
   "a >= b" means a is greater than or equal to b.
   "a < b" means a is less than b.
   "a <= b" means a is less than or equal b.
   "a == b" means a is equal to b.
   "a != b" means a is not equal to b.
   
I don’t think anyone will misunderstand this, so it’s a very small thing, but… might this be better said as, ‘ "a > b" is true when a is greater than b. ‘ (and similarly for the rest)?

— Section 2.2.5 —

   "min(a,b)" means the smallest of two values a and b.
   "max(a,b)" means the largest of two values a and b.

Nit: “smaller” and “larger”, for only two.

— Section 2.2.7 —

   "a...b" means any value starting from a to b, inclusive.

“starting”?  I would leave that word out.

— Section 2.2.9.1 —

   "remaining_bits_in_bitstream( )" means the count of remaining bits

Section 2.2.9.3 uses remaining_bits_in_bitstream( NumBytes ), as do Sections 4.4 and 4.5.  This should too, yes?

— Section 3.2 —

       Figure 3: A depiction of how relatively positions Samples are
                      references within this document.

I think you mean “positioned” and “referenced”, yes?

— Section 3.3 —

   Background: a two's complement signed 16-bit signed integer was used

I’d remove the first “signed”.

— Section 3.8.2 —

   The end of the bitstream of the "Frame" is filled with 0-bits until
   that the bitstream contains a multiple of 8 bits.

Is “that” just an extra word that needs to be struck, or is something else wrong here?  And is this meant to be what we usually call “padding”?

— Section 4 —

   Within the following sub-sections, pseudo-code is used to explain the
   structure of each FFV1 bitstream component, as described in
   Section 2.2.1.

I read this as saying that Section 2.2.1 describes the structure of each FFV1 bitstream component, until I went back there and looked.  I suggest reordering the sentence to help the reader a bit:

NEW
   Within the following sub-sections, pseudo-code is used, as described
   in Section 2.2.1, to explain the structure of each FFV1 bitstream
   component.
END

— Section 4.5 —

   Encoders SHOULD NOT fill these bits.
   Decoders SHOULD ignore these bits.

Before this you describe two kinds of bits, and it’s not clear whether these are talking about both kinds, or just the latter.  I think changing “these” to “reserved” is better.

— Section 6 —

   A frequent security problem in image
   and video codecs is also to not check for integer overflows, for
   example to allocate "frame_pixel_width * frame_pixel_height" in
   "Pixel" count computations without considering that the
   multiplication result may have overflowed the arithmetic types range.

This is somewhat complex and awkward (arguably a comma splice), and I suggest splitting the sentence thus:

NEW
   A frequent security problem in image
   and video codecs is failure to check for integer overflows.  An
   example is allocating "frame_pixel_width * frame_pixel_height" in
   "Pixel" count computations without considering that the
   multiplication result may have overflowed the arithmetic types range.
END

I might also make the “must” in the last sentence of that paragraph be “MUST”.

Murray Kucherawy Yes

Deborah Brungard No Objection

Alissa Cooper No Objection

Comment (2020-09-23 for -17)
Glad to see this work finishing up.

I think it would be better if each figure were explained in the text (e.g., "..., as shown in Figure 9"), or if the figure numbers were dropped.

Martin Duke No Objection

Comment (2020-10-02 for -17)
No email
send info
I found this quite difficult to follow. An introductory section that explains the overall framework and how the various components fit in it would be very helpful.

Erik Kline No Objection

Comment (2020-09-21 for -17)
[[ comments ]]

[ section 3.3 ]

* Suggest either defining colorspace_type value and coder_type values
  before using them, or provide a reference in this section (i.e. just
  include a reference to sections 4.2.3 and 4.2.5).


[[ nits ]]

[ section 3.2 ]

* In the description of Figure 3:
  "relatively positions" -> "relatively positioned", I think.

[ section 3.3 ]

* "signed 16-bit signed integer": choose any one instance of 'signed', I'd say

[ section 3.7.2 ]

* "and then if used transparency" -> "and then, if used, transparency"

Warren Kumari No Objection

Alvaro Retana No Objection

Comment (2020-10-05 for -17)
The datatracker should indicate that this draft replaces draft-niedermayer-cellar-ffv1.  The indication appears on the tools page, but not the datatracker.

Éric Vyncke No Objection

Comment (2020-10-05 for -17)
Thank you for the work put into this document. I found it fascinating while it describes the codec algorithms.

Please find below a couple of non-blocking COMMENT points.

I also support Alissa's point about using figure numbers/legends.

I hope that this helps to improve the document,

Regards,

-éric

== COMMENTS ==

-- Abstract --
May I suggest to expand/explain FFV1 on first use ? Still wondering what FFV1 means BTW...
Also, the lossless encoding is probably the key property of this codec and should probably be mentioned in the abstract.

-- Section 2 --
In an informational document, the use of BCP 14 is not required. Suggest to remove this part.

-- Section 7 --
Should it be included in section 8 rather than referenced to?

Robert Wilton No Objection

Comment (2020-10-05 for -17)
Hi,

Thank you for taking the time to document FFV1 version 0, 1 and 3.

I support Barry's discuss in that I found this document hard to read and interpret.  I think that I would struggle to implement a FFV1 encoder/decoded from scratch based on this document.  However, this is a long way outside my area of expertise and there is perhaps a corpus of basic video codec knowledge that is assumed in this specification.

Is the intention of this document that it gets obsoleted when FFV1 version 4 is documented?

I haven't reviewed the entirety of this document, but I do have some comments of particular areas of the document that I found hard to follow that additional text or explanation may be helpful.

Overall, having some more introduction text explaining the overall structure of the encoding , i.e. how the different parts fit together would likely help readability.

3.1.  Border

      Figure 2: A depiction of FFV1's assumed border for a set example
                                  Samples.

I wasn't sure whether an extra row at the bottom of this table would have been helpful, but perhaps it is not required because it is not referenced.


3.2.  Samples

   The labels for these relative "Samples" are made of the first letters
   of the words Top, Left and Right.
   
Don't feel obliged to change this, but I wonder whether keep lowercase for all of the relative positions might have been clearer.  E.g., perhaps using "tt" instead of "T" and "ll" instead of "L".

3.3.  Median Predictor

   Exception for the median predictor ...

Possibly putting the exception text into a 3.3.1 sub-section would aid readability.

3.4.  Quantization Table Sets

It wasn't clear to me what a "Quantized Sample Differences" is.


3.7.2.  RGB

   Cb = b - g
   Cr = r - g
   Y = g + (Cb + Cr) >> 2
   g = Y - (Cb + Cr) >> 2
   r = Cr + g
   b = Cb + g
   
Perhaps split into two sets of 3 equations to define the relationship in either direction.

   Exception for the JPEG2000-RCT conversion ...
   
Again, putting this into a sub-section (3.7.2.1) might aid readability, i.e. the split between what is desired vs what is being described due to bugs in real implementations.


3.8.  Coding of the Sample Difference

   coder_input = [(sample_difference + 2 ^ (bits - 1)) &
                 (2 ^ bits - 1)] - 2 ^ (bits - 1)
                 
It wasn't clear to me what [] brackets meant here.

3.8.1.1.  Range Binary Values

I found this hard to follow, as in I couldn't figure out what it means.

3.8.1.4.  State Transition Table

It wasn't really clear to me what these were used for.

3.8.2.1.  Signed Golomb Rice Codes

Unclear what is meant by "ESC case"

Regards,
Rob

Benjamin Kaduk Abstain

Comment (2020-10-07)
I support Roman's Discuss.

I am balloting Abstain in that I am not confident that this
document has received sufficient review, but have not identified
specific flaws or areas of uncertainty that sould justify blocking
publication, and accordingly do not wish to block its publication.  In
particular, I am not confident that I myself understand how all the
pieces fit together -- even though some of my comments below might
indicate internal inconsistencies or other issues that would justify a
Discuss position, I am not fully confident in that, nor am I confident
that I could judge that they have been adequately addressed by any
proposed changes to the document's text.  (This is in large part related
to Barry's Discuss.)

That said, my review comments on the document appear below.

It might make sense to have a bit of discussion on what types of things
would (or would not) change in a new micro version, vs new "macro"
version, and what could be said to be "invariants" of the format overall.

Should we give more treatment (or even an IANA registry) for how
extensibility will be achieved for (e.g.) new "coder_type" and
"colorspace_type" values?

Section 2.2.9.4

   "get_bits( i )" is the action to read the next "i" bits in the
   bitstream, from most significant bit to least significant bit, and to
   return the corresponding value.  [...]

[I guess this "value" is always in the form of an unsigned integer.]

Section 3.3

   It is expected (to be confirmed) to remove this exception for the
   median predictor in the next version of the FFV1 bitstream.

When can/will the confirmation process happen?  When version 4 is
finalized?

Section 3.6

The discussion seems to imply that the chroma planes are not used in
FFV4 (and so the check for "version <= 3" in the logic here is just for
future compatibility); is that correct?

Section 3.7

   color space.  In YCbCr for each "Plane", each "Line" is coded from
   top to bottom and for each "Line", each "Sample" is coded from left
   to right.  In JPEG2000-RCT for each "Line" from top to bottom, each

Is there a reason that we use "JPEG2000-RCT" in the prose here, but
"RGB" for the section 3.7.2 heading?

Section 3.8.1.1

The defined terms do not cover all terms used (so I conclude that there
are unnamed intermediate values) and the formulae themself do not do a
great job of conveying where the overall inputs/outputs are, thus the
flow of the encoding logic.  Captions on the figures might help, which
could include indications of which intermediate values from from formula
to formula.  It also seems like some of the intermediate values could be
usefully named, e.g., R_(i) seems to serve the role of the remaining
range at a given step.

Section 3.8.1.2

Does "non binary" mean "non-integer", i.e., floating point?
We don't seem to use the "non binary" term elsewhere in the document.

I'm also not sure I'm reading the formula properly -- state is a uint8_t
but the indexing into it seems to make more sense to me if treated as
bit indexing, not byte indexing.  (E.g., 'a' is only doubled when
iterating through 22..31, indicating bitwise representation.)

       int e = 0;
       while (get_rac(c, state + 1 + min(e, 9)) { //1..10
           e++;
       }

It looks like this enters an infinite loop if get_rac(c, state + 10)
returns true; is there something to enforce that this does not happen?

   "get_rac" returns a boolean, computed from the bytestream as
   described in Section 3.8.1.1.

I suggest saying a little bit more about how this works, e.g., that the
boolean is a single-bit integer value, and how this functional notation
maps to the interfaces of the expressions in Section 3.8.1.1.

Section 3.8.2.1

What is 'bits' (the argument to the last get_bits() call) in Figure 20?
(Is it the same "bits_per_raw_sample" from §3.8?  If so, that
description should be qualified to apply more broadly than just "the
equation below".)

Section 3.8.2.2.1

What are 'x' and 'w' (and how are they set)?

What are the semantics of "run_mode" values 1 vs 2?

Section 3.8.2.3

I guess it's implicit that "output_number" is of the appropriate (width)
type for the desired output bit width.

Section 3.8.2.4

I guess "i += i" vs "i *= 2" is just a matter of style.

Where does the 'bits' input to sign_extend() come from?
(Is it the same "bits_per_raw_sample" from §3.8?  If so, that
description should be qualified to apply more broadly than just "the
equation below".)

Section 3.8.2.4.1

I suggest rewording or using a section reference for "normal difference
coding" as the phrase "difference coding" does not suffice (via text
search) to find the referenced procedure, since this is the only
instance of the phrase in the document.

Section 4.1

I'm not sure how to parse "number of equal entries -1".  In combination
with the example I can guess what is meant, but I still can't parse the
text.

The "Table" listed in the example does not seem to conform to "second
half [...] is identical to the first with flipped sign".

Also, the QuantizationTable() pseudocode suggests that the Table should
have 256 entries, but this example only has 16.  (That may be fine for
example purposes, but the difference should be called out.)

Section 4.2

C uses 0-indexed arrays, but the pseudocode seems to use 1-indexed
arrays (e.g., state_transition_delta[]).  Perhaps this is noteworthy.

Section 4.5

If 'reserved' is only present when version <=1, it seems unlikely that a
revision of this specification would assign semantics to it.

Section 4.6.x

nit: "if not present" could be taken to suggest that this field is
optional at field-level granularity (vs. at header-/version-level
granularity); "if header not present" might be slightly more clear.

Section 6

If we are attempting to differentiate between "format" and "codec", are
the uses of "codec" in this section all correct?

It may (or may not) be worth calling out the importance of internal
consistency within an implementation as to whether the Parameters()
appear in the ConfigurationRecord() or in each Frame().

   In all of the conditions above, the decoder and encoder was run
   inside the [VALGRIND] memory debugger as well as clangs address
   sanitizer [Address-Sanitizer], which track reads and writes to

nit: "clang's" with apostrophe.

Section 10

I don't see a particular reason that RFC 6716 needs to be a normative
reference.

(I also agree with Roman that there doesn't seem to be need to reference
both C90 and C18.)

Appendix A, B

It is surprising to see BCP 14 keywords in these nominally informative
sections.

Martin Vigoureux No Record

Magnus Westerlund No Record