RTP Payload for Timed Text Markup Language (TTML)
RFC 8759

Note: This ballot was opened for revision 03 and is now closed.

Barry Leiba Yes

(Adam Roach) (was Discuss) Yes

Comment (2019-11-07 for -05)
Thanks for addressing my DISCUSS and COMMENT points. I have preserved
them below for posterity.


Thanks for the work everyone put into this document. I think it's not quite
ready to publish, due to one ambiguity, one critical missing feature,
and the lack of guidance around fragmentation. I also have two comments that I
consider very important, although they don't quite rise to the level of blocking.

As always, it's possible that my DISCUSS points are off-base, and I'd be
happy to be corrected if I've misunderstood anything here.



>     When the document spans more
>     than one RTP packet, the entire document is obtained by
>     concatenating User Data Words from each contributing packet in
>     ascending order of Sequence Number.

This is underspecified, in that it doesn't make it clear whether it would be
valid to split a single UTF-8 or UTF-16 character between RTP packets, and it
is nearly certain that different implementations will make different
assumptions on this point, leading to interop failures. For example, the UTF-8
encoding of '¢' is 0xC2 0xA2. Would it be valid to place the "0xC2" in one
packet and the "0xA2" in a subsequent packet?

Without specifying this, it is quite likely that some implementations will
use, e.g., UTF-8 strings to accumulate the contents of RTP packets; and most
such libraries will emit errors or exhibit unexpected behavior if units of
less than a character are added at any time.  (The same point holds for
splitting a UTF-16 code unit across packets.)
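
The failure mode can be sketched in Python. This is an illustration only: the
payload contents and the two-packet split are assumptions made for the
example, not values taken from the draft, and the function names are
hypothetical.

```python
import codecs

# Hypothetical payloads: the UTF-8 encoding of '¢' (0xC2 0xA2) is split
# so that 0xC2 ends one packet and 0xA2 starts the next.
packet_payloads = [b"price: 5\xc2", b"\xa2"]

def accumulate_as_text(payloads):
    """Decode every payload on arrival; breaks if a character is split."""
    text = ""
    for payload in payloads:
        text += payload.decode("utf-8")  # raises UnicodeDecodeError on a partial character
    return text

def accumulate_as_bytes(payloads):
    """Concatenate the raw bytes and decode once the document is complete."""
    return b"".join(payloads).decode("utf-8")

def accumulate_incrementally(payloads):
    """Decode on arrival with a decoder that buffers partial characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = "".join(decoder.decode(payload) for payload in payloads)
    return text + decoder.decode(b"", final=True)

try:
    accumulate_as_text(packet_payloads)
    per_packet_ok = True
except UnicodeDecodeError:
    per_packet_ok = False
```

A receiver written like `accumulate_as_text` only works if senders never split
characters, whereas the other two tolerate either choice; this is why the
specification needs to say which behavior senders may exhibit.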

I don't think it much matters which choice you make (explicitly allowing
or explicitly forbidding splitting characters between packets), but it
does need to be explicit. I have a slight personal preference for requiring
that characters cannot be split (both for ease of implementation on the
receiving end and to more smoothly handle missing data due to extended packet
loss), but leave it to the authors and working group to decide.


Unlike other definitions to convey non-loss-resilient data on RTP streams, this
document has no defined mechanism to deal with packet loss. This makes it
unusable on the public Internet, where packet loss is an inevitable feature
of the network. The existing text-in-RTP specifications define procedures to
deal with such loss (see, e.g., RFC 4103 section 4 and RFC 4396 section 5).


This format is rather unique in that it, alone among all other RTP text
formats, is designed to send monolithic documents that may stretch into the
multiple kilobyte range.  While fragmentation is mentioned as a possibility,
the document provides no implementation guidance about when to fragment
documents, and what sizes each fragment should assume. RFC 4396 section 4.4 is
an example of the kind of information I would expect to see in a document like
this, with emphasis on the fact that TTML documents are going to frequently
exceed the PMTU for a typical network connection.
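
As a rough sketch of what such guidance might lead to on the sending side, the
following Python splits a UTF-8 document into fragments that fit a payload
budget without splitting any character. The function name `fragment_utf8` and
the `max_payload` parameter are hypothetical, and the budget an implementation
should use is exactly the kind of value the document would need to advise on.

```python
from typing import List

def fragment_utf8(document: str, max_payload: int) -> List[bytes]:
    """Split a document into UTF-8 fragments of at most max_payload bytes,
    cutting only on character boundaries."""
    if max_payload < 4:
        # A UTF-8 character can be up to 4 bytes long.
        raise ValueError("max_payload must be at least 4 bytes")
    data = document.encode("utf-8")
    fragments = []
    start = 0
    while start < len(data):
        end = min(start + max_payload, len(data))
        # If the cut lands on a continuation byte (0b10xxxxxx), back up
        # until the next fragment would start on a lead byte.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        fragments.append(data[start:end])
        start = end
    return fragments
```

Each fragment decodes on its own, so a receiver that accumulates text per
packet, or that has lost a neighboring packet, never sees a partial character.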



>  TTML (Timed Text Markup Language)[TTML2] is a media type for
>  describing timed text such as closed captions (also known as
>  subtitles) in television workflows or broadcasts as XML.

Although superficially similar, there are important distinctions between
subtitles (intended to help a hearing audience exclusively with spoken dialog,
typically because the audio is in a different language or otherwise difficult to
understand) and closed captions (intended to aid deaf or hard-of-hearing
viewers by providing a direct, word-for-word transcription of dialog as well
as descriptions of all other audio present). Calling one "also known as" the
other is incorrect.

I suggest rephrasing as:

   TTML (Timed Text Markup Language)[TTML2] is a media type for
   describing timed text such as closed captions and subtitles
   in television workflows or broadcasts as XML.



>  The TTML document instance MUST use the "media" value of the
>  "ttp:timeBase" parameter attribute on the root element.

This statement makes an assumption that the
"http://www.w3.org/ns/ttml#parameter" namespace MUST be mapped to the "ttp"
prefix, which is both bad form and probably not what is intended. I suggest
rephrasing as:

   The TTML document instance MUST include a "timeBase" element from
   the "http://www.w3.org/ns/ttml#parameter" namespace containing
   the value "media".
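
To illustrate why the prefix should not be part of the requirement, here is a
minimal Python sketch that looks the value up by namespace URI. Following the
quoted draft text, it treats "timeBase" as an attribute on the root element;
the two sample documents and the helper name `time_base` are made up for the
example.

```python
import xml.etree.ElementTree as ET

TTP_NS = "http://www.w3.org/ns/ttml#parameter"

# Two hypothetical documents that differ only in the prefix bound to TTP_NS.
doc_with_ttp_prefix = (
    '<tt xmlns="http://www.w3.org/ns/ttml" '
    'xmlns:ttp="http://www.w3.org/ns/ttml#parameter" ttp:timeBase="media"/>'
)
doc_with_other_prefix = (
    '<tt xmlns="http://www.w3.org/ns/ttml" '
    'xmlns:p="http://www.w3.org/ns/ttml#parameter" p:timeBase="media"/>'
)

def time_base(xml_text):
    """Return the timeBase value from the root element, or None if absent."""
    root = ET.fromstring(xml_text)
    # ElementTree stores namespaced attributes as "{namespace-uri}localname",
    # so the lookup does not depend on the prefix the author chose.
    return root.get("{%s}timeBase" % TTP_NS)
```

Both documents yield "media" from `time_base`, whereas a check that searched
the serialized text for the literal string "ttp:timeBase" would reject the
second one.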

(Ignas Bagdonas) No Objection

Deborah Brungard No Objection

Alissa Cooper No Objection

Comment (2019-10-14 for -03)
I would recommend starting some new top-level sections within what is currently Section 4.2, rather than going down to six levels of subsections, which can get confusing when other people are citing parts of this document.

Please respond to the Gen-ART review.

Roman Danyliw No Objection

Benjamin Kaduk No Objection

Comment (2019-10-16 for -03)
Thanks for this clear and well-written document!

Section 2

   The term "word" refers to byte aligned or 32-bit aligned words of
   data in a computing sense and not to refer to linguistic words that
   might appear in the transported text.

Either of byte-aligned and 4-byte-aligned, as opposed to aligned to one
of those and in multiples of the other in length?

Section 4

I find myself feeling like I would benefit from a brief discussion of
the relationship between documents and the RTP stream before getting
into the details of the payload format (e.g., "one document per
subtitle", "many documents per stream but each document contains some
minutes of data", or "totally up to the profile in use").  Even having
finished the I-D I'm still wondering: it's clear that we only have
a single TTML stream in a given RTP stream, and a given RTP packet has
(part of) a TTML document in the epoch of the timestamp of the RTP
packet, and I can only have one document active at a time.  On the
flip side, different documents must belong to different epochs.  So it
seems that I could either make large documents stuck on a single
timestamp, or small documents with (relatively) rapidly advancing
timestamps, regardless of how I need to actually split the TTML content
into packets in order to meet MTU requirements (and possibly packet
pacing ones).  Given that this is RTP and we're used to ignoring things
with old timestamps, I mostly expect the latter to be more common, but
would appreciate some guidance in the document.  This seems to
roughly be Adam's third Discuss point.


   If the TTML document payload is assessed to be invalid then it MUST
   be discarded.  When processing a valid document, the following
   requirements apply.

Does this imply that I have to wait for the entire document to arrive
before I start processing it?

   Each TTML document becomes active at the epoch E.  E MUST be set to

nit: I suggest s/the/its/, since there is not a global distinguished epoch.

Most of the security considerations I can think of apply more to the
TTML format itself rather than the RTP payload.  I might include a short
note that the text contents are meant to be interpreted by a human, and
content from untrusted sources should be viewed with appropriate levels
of skepticism.

(Suresh Krishnan) No Objection

Warren Kumari No Objection

Comment (2019-10-15 for -03)
Thank you for writing this -- I found it interesting and useful.

(Mirja Kühlewind) No Objection

Comment (2019-10-11 for -03)
Small comment on Sec 4.1. - Maybe:
OLD "These bits are reserved for future use and MUST be set to 0x0."
NEW "These bits are reserved for future use and MUST be set to 0x0 and ignored at receive."

(Alexey Melnikov) No Objection

Comment (2019-10-16 for -03)
I agree with Adam’s DISCUSS.

Alvaro Retana No Objection

Martin Vigoureux No Objection

Éric Vyncke No Objection

Comment (2019-10-11 for -03)
Thank you for the work done in this document.

The unusual wording of 'RTP carriage' in section 4.2.1 is interesting.


Magnus Westerlund (was Discuss) No Objection