FFV1 Video Coding Format Version 0, 1, and 3

Note: This ballot was opened for revision 17 and is now closed.

Murray Kucherawy Yes

(Deborah Brungard) No Objection

(Alissa Cooper) No Objection

Comment (2020-09-23 for -17)
Glad to see this work finishing up.

I think it would be better if each figure were explained in the text (e.g., "..., as shown in Figure 9"), or if the figure numbers were dropped.

Roman Danyliw (was Discuss, No Objection) No Objection

Comment (2020-12-01 for -19)
No email
send info
(Sorry for any confusion with the updated ballot.  I didn't cut-and-paste all of the text.  COMMENT is the same.  DISCUSS is added.)

Thanks for responding to the SECDIR feedback (and thank you to Liang Xia for providing the review).

I support Barry Lieba’s Discuss position.

Thanks for addressing my DISCUSS. 

On the security considerations:

** Section 6.  Per the reference to [RFC4732], which selection is relevant here? Is it Section 2.1.1?

** Section 6.  The assertions about the security properties of [REFIMPL] don’t make sense to me in this document.  While it is extremely helpful that there is a high-quality reference implementation, it’s relevance to this spec isn’t clear.  This code isn’t normative.  Recommend removal all text after the paragraph “None of the content carried in FFV1 is intended to be executable”.

Martin Duke No Objection

Comment (2020-10-02 for -17)
No email
send info
I found this quite difficult to follow. An introductory section that explains the overall framework and how the various components fit in it would be very helpful.

Erik Kline No Objection

Comment (2020-09-21 for -17)
[[ comments ]]

[ section 3.3 ]

* Suggest either defining colorspace_type value and coder_type values
  before using them, or provide a reference in this section (i.e. just
  include a reference to sections 4.2.3 and 4.2.5).

[[ nits ]]

[ section 3.2 ]

* In the description of Figure 3:
  "relatively positions" -> "relatively positioned", I think.

[ section 3.3 ]

* "signed 16-bit signed integer": choose any one instance of 'signed', I'd say

[ section 3.7.2 ]

* "and then if used transparency" -> "and then, if used, transparency"

Warren Kumari No Objection

(Barry Leiba) (was Discuss) No Objection

Comment (2021-02-23)
Thanks very much for having addressed all my comments, and special thanks for the work in clearing up the meanings of things in Section  I think I do understand it now, and I think the work you've done in making that happen has made this a better document.

Alvaro Retana No Objection

Comment (2020-10-05 for -17)
The datatracker should indicate that this draft replaces draft-niedermayer-cellar-ffv1.  The indication appears on the tools page, but not the datatracker.

Éric Vyncke No Objection

Comment (2020-10-05 for -17)
Thank you for the work put into this document. I found it fascinating while it describes the codec algorithms.

Please find below a couple of non-blocking COMMENT points.

I also support Alissa's point about using figure numbers/legends.

I hope that this helps to improve the document,




-- Abstract --
May I suggest to expand/explain FFV1 on first use ? Still wondering what FFV1 means BTW...
Also, the lossless encoding is probably the key property of this codec and should probably be mentioned in the abstract.

-- Section 2 --
In an informational document, the use of BCP 14 is not required. Suggest to remove this part.

-- Section 7 --
Should it be included in section 8 rather than referenced to?

Robert Wilton No Objection

Comment (2020-10-05 for -17)

Thank you for taking the time to document FFV1 version 0, 1 and 3.

I support Barry's discuss in that I found this document hard to read and interpret.  I think that I would struggle to implement a FFV1 encoder/decoded from scratch based on this document.  However, this is a long way outside my area of expertise and there is perhaps a corpus of basic video codec knowledge that is assumed in this specification.

Is the intention of this document that it gets obsoleted when FFV1 version 4 is documented?

I haven't reviewed the entirety of this document, but I do have some comments of particular areas of the document that I found hard to follow that additional text or explanation may be helpful.

Overall, having some more introduction text explaining the overall structure of the encoding , i.e. how the different parts fit together would likely help readability.

3.1.  Border

      Figure 2: A depiction of FFV1's assumed border for a set example

I wasn't sure whether an extra row at the bottom of this table would have been helpful, but perhaps it is not required because it is not referenced.

3.2.  Samples

   The labels for these relative "Samples" are made of the first letters
   of the words Top, Left and Right.
Don't feel obliged to change this, but I wonder whether keep lowercase for all of the relative positions might have been clearer.  E.g., perhaps using "tt" instead of "T" and "ll" instead of "L".

3.3.  Median Predictor

   Exception for the median predictor ...

Possibly putting the exception text into a 3.3.1 sub-section would aid readability.

3.4.  Quantization Table Sets

It wasn't clear to me what a "Quantized Sample Differences" is.

3.7.2.  RGB

   Cb = b - g
   Cr = r - g
   Y = g + (Cb + Cr) >> 2
   g = Y - (Cb + Cr) >> 2
   r = Cr + g
   b = Cb + g
Perhaps split into two sets of 3 equations to define the relationship in either direction.

   Exception for the JPEG2000-RCT conversion ...
Again, putting this into a sub-section ( might aid readability, i.e. the split between what is desired vs what is being described due to bugs in real implementations.

3.8.  Coding of the Sample Difference

   coder_input = [(sample_difference + 2 ^ (bits - 1)) &
                 (2 ^ bits - 1)] - 2 ^ (bits - 1)
It wasn't clear to me what [] brackets meant here.  Range Binary Values

I found this hard to follow, as in I couldn't figure out what it means.  State Transition Table

It wasn't really clear to me what these were used for.  Signed Golomb Rice Codes

Unclear what is meant by "ESC case"


Benjamin Kaduk Abstain

Comment (2020-10-07 for -18)
I support Roman's Discuss.

I am balloting Abstain in that I am not confident that this
document has received sufficient review, but have not identified
specific flaws or areas of uncertainty that sould justify blocking
publication, and accordingly do not wish to block its publication.  In
particular, I am not confident that I myself understand how all the
pieces fit together -- even though some of my comments below might
indicate internal inconsistencies or other issues that would justify a
Discuss position, I am not fully confident in that, nor am I confident
that I could judge that they have been adequately addressed by any
proposed changes to the document's text.  (This is in large part related
to Barry's Discuss.)

That said, my review comments on the document appear below.

It might make sense to have a bit of discussion on what types of things
would (or would not) change in a new micro version, vs new "macro"
version, and what could be said to be "invariants" of the format overall.

Should we give more treatment (or even an IANA registry) for how
extensibility will be achieved for (e.g.) new "coder_type" and
"colorspace_type" values?


   "get_bits( i )" is the action to read the next "i" bits in the
   bitstream, from most significant bit to least significant bit, and to
   return the corresponding value.  [...]

[I guess this "value" is always in the form of an unsigned integer.]

Section 3.3

   It is expected (to be confirmed) to remove this exception for the
   median predictor in the next version of the FFV1 bitstream.

When can/will the confirmation process happen?  When version 4 is

Section 3.6

The discussion seems to imply that the chroma planes are not used in
FFV4 (and so the check for "version <= 3" in the logic here is just for
future compatibility); is that correct?

Section 3.7

   color space.  In YCbCr for each "Plane", each "Line" is coded from
   top to bottom and for each "Line", each "Sample" is coded from left
   to right.  In JPEG2000-RCT for each "Line" from top to bottom, each

Is there a reason that we use "JPEG2000-RCT" in the prose here, but
"RGB" for the section 3.7.2 heading?


The defined terms do not cover all terms used (so I conclude that there
are unnamed intermediate values) and the formulae themself do not do a
great job of conveying where the overall inputs/outputs are, thus the
flow of the encoding logic.  Captions on the figures might help, which
could include indications of which intermediate values from from formula
to formula.  It also seems like some of the intermediate values could be
usefully named, e.g., R_(i) seems to serve the role of the remaining
range at a given step.


Does "non binary" mean "non-integer", i.e., floating point?
We don't seem to use the "non binary" term elsewhere in the document.

I'm also not sure I'm reading the formula properly -- state is a uint8_t
but the indexing into it seems to make more sense to me if treated as
bit indexing, not byte indexing.  (E.g., 'a' is only doubled when
iterating through 22..31, indicating bitwise representation.)

       int e = 0;
       while (get_rac(c, state + 1 + min(e, 9)) { //1..10

It looks like this enters an infinite loop if get_rac(c, state + 10)
returns true; is there something to enforce that this does not happen?

   "get_rac" returns a boolean, computed from the bytestream as
   described in Section

I suggest saying a little bit more about how this works, e.g., that the
boolean is a single-bit integer value, and how this functional notation
maps to the interfaces of the expressions in Section


What is 'bits' (the argument to the last get_bits() call) in Figure 20?
(Is it the same "bits_per_raw_sample" from §3.8?  If so, that
description should be qualified to apply more broadly than just "the
equation below".)


What are 'x' and 'w' (and how are they set)?

What are the semantics of "run_mode" values 1 vs 2?


I guess it's implicit that "output_number" is of the appropriate (width)
type for the desired output bit width.


I guess "i += i" vs "i *= 2" is just a matter of style.

Where does the 'bits' input to sign_extend() come from?
(Is it the same "bits_per_raw_sample" from §3.8?  If so, that
description should be qualified to apply more broadly than just "the
equation below".)


I suggest rewording or using a section reference for "normal difference
coding" as the phrase "difference coding" does not suffice (via text
search) to find the referenced procedure, since this is the only
instance of the phrase in the document.

Section 4.1

I'm not sure how to parse "number of equal entries -1".  In combination
with the example I can guess what is meant, but I still can't parse the

The "Table" listed in the example does not seem to conform to "second
half [...] is identical to the first with flipped sign".

Also, the QuantizationTable() pseudocode suggests that the Table should
have 256 entries, but this example only has 16.  (That may be fine for
example purposes, but the difference should be called out.)

Section 4.2

C uses 0-indexed arrays, but the pseudocode seems to use 1-indexed
arrays (e.g., state_transition_delta[]).  Perhaps this is noteworthy.

Section 4.5

If 'reserved' is only present when version <=1, it seems unlikely that a
revision of this specification would assign semantics to it.

Section 4.6.x

nit: "if not present" could be taken to suggest that this field is
optional at field-level granularity (vs. at header-/version-level
granularity); "if header not present" might be slightly more clear.

Section 6

If we are attempting to differentiate between "format" and "codec", are
the uses of "codec" in this section all correct?

It may (or may not) be worth calling out the importance of internal
consistency within an implementation as to whether the Parameters()
appear in the ConfigurationRecord() or in each Frame().

   In all of the conditions above, the decoder and encoder was run
   inside the [VALGRIND] memory debugger as well as clangs address
   sanitizer [Address-Sanitizer], which track reads and writes to

nit: "clang's" with apostrophe.

Section 10

I don't see a particular reason that RFC 6716 needs to be a normative

(I also agree with Roman that there doesn't seem to be need to reference
both C90 and C18.)

Appendix A, B

It is surprising to see BCP 14 keywords in these nominally informative