Guidelines for using the Multiplexing Features of RTP to Support Multiple Media Streams
draft-ietf-avtcore-multiplex-guidelines-11

Summary: Has a DISCUSS. Has enough positions to pass once DISCUSS positions are resolved.

Benjamin Kaduk Discuss

Discuss (2020-03-04)
As Mirja may have already noted, Section 3.4.3 says:

   Section 8.3 of the RTP Specification [RFC3550] recommends using a
   single SSRC space across all RTP sessions for layered coding.  Based
   on the experience so far however, we recommend to use a solution with
   explicit binding between the RTP streams that is agnostic to the used
   SSRC values.  That way, solutions using multiple RTP streams in a

This sounds an awful lot like we're trying to update the recommendations
from RFC 3550, and looks like different text than was discussed in
Mirja's ballot thread.  Let's discuss whether the formal Updates:
mechanism is appropriate here or we should consider rewording.

Comment (2020-03-04)
Abstract, Introduction

Do we consider SRTP to be included in the discussion of "RTP"?

Section 1

   from a particular usage of the RTP multiplexing points.  The document
   will provide some guidelines and recommend against some usages as
   being unsuitable, in general or for particular purposes.

If something is unsuitable in general, should the protocol feature be
deprecated/removed?

Section 2.1

   RTP Session Group:  One or more RTP sessions that are used together
      to perform some function.  Examples are multiple RTP sessions used
      to carry different layers of a layered encoding.  In an RTP
      Session Group, CNAMEs are assumed to be valid across all RTP
      sessions, and designate synchronisation contexts that can cross
      RTP sessions; i.e. SSRCs that map to a common CNAME can be assumed
      to have RTCP SR timing information derived from a common clock
      such that they can be synchronised for playout.

I suggest expanding "RTCP SR timing" or providing a definition.

Section 3.1

It seems a little surprising to use simulcast as an example in the
"needed to represent one media source" bullet and then have separate
bullets for simulcast permutations.

   sessions to group the RTP streams.  The choice suitable for one
   reason, might not be the choice suitable for another reason.  The

nit: is it the "reason" or the "situation"/"scenario" that is relevant
here?

Section 3.2

   RTP streams.  Figure 1 outlines the process of demultiplexing
   incoming RTP streams starting already at the socket representing
   reception of one or transport flows, e.g. an UDP destination port.

nit: "one or more"?

I'd consider putting more arrowheads in the downward direction, though
it's unclear if the resultant vertical expansion of the figure is worth
it.

Section 3.2.1

   For RTP session separation within a single endpoint, RTP relies on
   the underlying transport layer, and on the signalling to identify RTP
   sessions in a manner that is meaningful to the application.  A single
   endpoint can have one or more transport flows for the same RTP
   session, and a single RTP session can therefore span multiple
   transport layer flows even if all endpoints use a single transport
   layer flow per endpoint for that RTP session.  The signalling layer

nit: "therefore" seems misplaced; the relevant linkage in the logic
seems to be that there could be one transport flow per endpoint pair
(as we don't require multicast usage).

   Independently if an endpoint has one or more IP addresses, a single

nit: I'm not sure if "independently" is the right conjunctive adverb,
but whatever is used it should have a comma after it.

Section 3.2.2

   Endpoints that are both RTP sender and RTP receiver use the same SSRC
   in both roles.

If I have multiple SSRCs as a sender, do I have freedom to vary amongst
them when acting as an RTP receiver (or RTCP sender)?

   SSRC values are unique across RTP sessions.  For the RTP
   retransmission [RFC4588] case it is recommended to use explicit
   binding of the source RTP stream and the redundancy stream, e.g.
   using the RepairedRtpStreamId RTCP SDES item [I-D.ietf-avtext-rid].

Some indication of whether this recommendation is new in this document
or "long-standing" might be worthwhile.

   Note that RTP sequence number and RTP timestamp are scoped by the
   SSRC and thus specific per RTP stream.

And now I wonder about the behavior of these two in the retransmission
case from the previous paragraph.  But that's likely off-topic for this
document :)

   An endpoint that generates more than one media type, e.g.  a
   conference participant sending both audio and video, need not (and,
   indeed, should not) use the same SSRC value across RTP sessions.

I'm not sure I understand why the guidance on cross-session behavior is
specific to the multi-media-type case.

   RTCP compound packets containing the CNAME SDES item is the
   designated method to bind an SSRC to a CNAME, effectively cross-
   correlating SSRCs within and between RTP Sessions as coming from the
   same endpoint.  The main property attributed to SSRCs associated with
   the same CNAME is that they are from a particular synchronisation
   context and can be synchronised at playback.

I am curious (but not necessarily needing to see in this document) where
the security considerations regarding CNAME spoofing (where an attacker
claims the CNAME of an existing source to attempt to be treated as part
of the victim's output) are discussed.

Section 3.2.4

   The RTP payload type is scoped by the sending endpoint within an RTP
   session.  PT has the same meaning across all RTP streams in an RTP
   session.  All SSRCs sent from a single endpoint share the same

I'd suggest "same meaning across all RTP streams from that sender",
though given the previous (and next!) sentence it is probably not
strictly necessary.

Section 3.3

   o  Does my communication peer support RTP as defined with multiple
      SSRCs per RTP session?

There's potentially some ambiguity about grouping/binding in this text.

   gateway, for example a need to monitor the RTP streams.  Beware that
   changing the stream transport characteristics in the translator, can
   require thorough understanding of the application logic, specifically
   any congestion control or media adaptation to ensure appropriate
   media handling.

While congestion control and media adaptation are important, they're
hardly the only things that a middlebox might need to know about (but
fail to implement properly, which is the point of this warning).  I'd
suggest rephrasing to be a range/selection rather than drilling into
specific points (e.g., "from congestion control to media adaptation or
particular application-layer semantics").

   Within the uses enabled by the RTP standard the point to point
   topology can contain one to many RTP sessions with one to many media
   sources per session, each having one or more RTP streams per media
   source.

micro-nit: "one to many", "one to many", "one or more" ruins the parallelism
:)

Section 3.4.3

   o  Signalling based (SDP)

"e.g., SDP", no?

   An RTP/RTCP-based grouping solution is to use the RTCP SDES CNAME to
   bind related RTP streams to an endpoint or to a synchronization
   context.  For applications with a single RTP stream per type (media,
   source or redundancy stream), CNAME is sufficient for that purpose
   independent if one or more RTP sessions are used.  However, some

nit: "independent if" doesn't parse properly; maybe "independently of
whether"?

   independent if one or more RTP sessions are used.  However, some
   applications choose not to use CNAME because of perceived complexity
   or a desire not to implement RTCP and instead use the same SSRC value
   to bind related RTP streams across multiple RTP sessions.  RTP

[It's interesting to see this noted, given that we talk about how if you
don't implement RTCP you're not actually using RTP, just the RTP packet
formats; and how we discuss that reusing the same SSRC value across
multiple RTP sessions can be risky.  That said, this should not
discourage us from documenting what implementations actually do...]

Section 3.4.4

   There exist a number of Forward Error Correction (FEC) based schemes
   for how to reduce the packet loss of the original streams.  Most of

nit: I think this is either "mitigate packet loss" or "reduce lost data
from a media stream", but "reduce packet loss" it is not.

   Using multiple RTP sessions supports the case where some set of
   receivers might not be able to utilise the FEC information.  By
   placing it in a separate RTP session and if separating RTP sessions
   on transport level, FEC can easily be ignored already on transport
   level, without considering any RTP layer information.

nit: "the transport level"

Section 4.1.2

   BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation], it is possible for
   the RTP translator to map the RTP streams between both sides using
   some method, e.g. if the number and order of SDP "m=" lines between
   both sides are the same.  There are also challenges with SSRC

(There's nothing in SDP that requires that to be the case, though; this
would merely be a "convenient property shared by the two applications'
behavior"?)

Section 4.1.3

   For applications that use any security mechanism, e.g., in the form
   of SRTP, the gateway needs to be able to decrypt and verify source
   integrity of the incoming packets, and re-encrypt, integrity protect,
   and sign the packets as peer in the other application's security
   context.  This is necessary even if all that's needed is a simple

Can you clarify what is meant by "sign the packets as peer" here?  Is it
implying that the terminating gateway needs to have credentials so as to
impersonate both "real" participants to the other?
(Also, nit: "sign packets as the peer" might be a more grammatical
wording, as "peer" needs an article.)

   If one uses security functions, like SRTP, and as can be seen from
   above, they incur both additional risk due to the requirement to have
   the gateway in the security association between the endpoints (unless
   the gateway is on the transport level), and additional complexities
   in form of the decrypt-encrypt cycles needed for each forwarded
   packet.  SRTP, due to its keying structure, also requires that each

This sentence is pretty complicated.  Even in the first part, I'm not
sure what "they" in "they incur both" refers to...it seems that the risk
is to the participant(s) ("one") rather than the "security functions"
themselves...

   RTP session needs different master keys, as use of the same key in
   two RTP sessions can for some ciphers result in two-time pads that
   completely breaks the confidentiality of the packets.

I'd suggest discussing this as "reuse of a one-time pad" rather than a
"two-time pad".

Section 4.1.4

   Endpoints that aren't updated to handle multiple streams following
   these recommendations can have issues with participating in RTP
   sessions containing multiple SSRCs within a single session, such as:

Talking about endpoints being "updated [...] following these
recommendations" also makes me wonder whether an Updates relationship to
3550 or other document(s) would be appropriate.

Section 4.2.2

      the, in most cases 2-3, additional flows.  However, packet loss
      causes extra delays, at least 100 ms, which is the minimal
      retransmission timer for ICE.

Doesn't RFC 8445 say 500 ms, not 100?

   Deep Packet Inspection and Multiple Streams:  Firewalls differ in how
      deeply they inspect packets.  There exist some risk that deeply
      inspecting firewalls will have similar legacy issues with multiple
      SSRCs as some RTP stack implementations.

Re "some risk", can we say that this has definitely been seen in the
wild at least once?

Section 4.3.1

   only premium users are allowed to access.  The mechanism preventing a
   receiver from getting the high quality stream can be based on the
   stream being encrypted with a key that user can't access without
   paying premium, using the key-management to limit access to the key.

nit: there seems to be a missing word here ("paying a premium"?)

   SRTP [RFC3711] has no special functions for dealing with different
   sets of master keys for different SSRCs.  The key-management
   functions have different capabilities to establish different sets of
   keys, normally on a per-endpoint basis.  For example, DTLS-SRTP
   [RFC5764] and Security Descriptions [RFC4568] establish different
   keys for outgoing and incoming traffic from an endpoint.  This key
   usage has to be written into the cryptographic context, possibly
   associated with different SSRCs.

I don't really understand what this paragraph is trying to say.

Section 4.3.2

   Transport translator-based sessions and multicast sessions, can

This doesn't seem to match the terminology we used in § 4.1.2.
(This terminology appears a couple other times, later.)

Section 5.1

   h.  If the applications need finer control over which session
       participants that are included in different sets of security
       associations, most key-management will have difficulties
       establishing such a session.

nit: the grammar is off, here (remove "that" and use "key-management
techniques"?)

Section 5.3

   2.  The application can indicate its usage of the RTP streams on RTP
       session level, in case multiple different usages exist.

nit: is this "in case" (precautionary) or "in the case when"
(descriptive)?

Section 6

   Transport Support Extensions:  When defining new RTP/RTCP extensions

nit: should we swap the order of "Support" and "Extensions"?

Section 11.1

RFC 3830 does not feel like it needs to be normative.

Appendix A

   4.   Sending multiple streams in the same sequence number space makes
        it impossible to determine which payload type, which stream a
        packet loss relates to, and thus to which stream to potentially
        apply packet loss concealment or other stream-specific loss
        mitigation mechanisms.

I don't think this parses properly (around "which payload type,")

Appendix B.1

   One aspect of the existing signalling is that it is focused on RTP
   sessions, or at least in the case of SDP the media description.

nit: I think there's an extra or missing word here (around "the media
description").

   o  Bitrate/Bandwidth exist today only at aggregate or as a common
      "any RTP stream" limit, unless either codec-specific bandwidth
      limiting or RTCP signalling using TMMBR is used.

Should we have a reference for TMMBR?

Appendix B.3

   RTP streams being transported in RTP has some particular usage in an
   RTP application.  This usage of the RTP stream is in many

nit: singular/plural mismatch "has"/"streams"

Barry Leiba Yes

Deborah Brungard No Objection

Alissa Cooper No Objection

Roman Danyliw No Objection

(Suresh Krishnan) No Objection

Warren Kumari No Objection

Comment (2020-03-03)
Thank you for an interesting and readable document.

(Mirja Kühlewind) No Objection

Comment (2020-03-02)
One processing question: Should this document update RFC 3550, given the last paragraph of each of Sections 3.4.1 and 3.4.3?

And one comment on section 4.2.1:
"Different Differentiated
   Services Code Points (DSCP) can be assigned to different packets
   within a flow as well as within an RTP stream. "
Not sure what you mean by "flow" here, but at least RFC 7657 says
"Should use a single DSCP for all packets within a reliable
      transport protocol session"
Maybe you can say a bit more here to ensure the guidance provided in RFC7657 is reflected accurately.

Even though I didn't see any discussion of the TSV-ART review (thanks, Bernard!), I believe all comments have been addressed. Thanks for that!

Fully editorial minor comments:
1) In the intro maybe:
OLD
 The authors hope that clarification on the usefulness
   of some functionalities in RTP will result in more complete
   implementations in the future.
NEW
This document aims to clarify the usefulness
   of some functionalities in RTP which will hopefully result in more complete
   implementations in the future.

2) sec 3.2
s/one or transport flows/one or more transport flows/
And maybe also
s/transport flows, e.g. an UDP destination port./transport flows, e.g. based on the UDP destination port./?

3) sec 3.2.1:
"   RTP does not contain a session identifier, yet different RTP sessions
   must be possible to identify both across different endpoints and
   within a single endpoint."
Not sure I can parse this sentence correctly...

4) sec 4.1.3:
s/Signalling, choosing and policing/Signalling, choosing, and policing/ -> missing comma

5) sec 6 maybe:
s/specification writers/specification designers/

(Alexey Melnikov) No Objection

Alvaro Retana No Objection

(Adam Roach) No Objection

Éric Vyncke No Objection

Magnus Westerlund (was Abstain) Recuse

Comment (2020-02-17 for -10)
I am a co-author

Martin Duke No Record

Erik Kline No Record

Murray Kucherawy No Record

Martin Vigoureux No Record

Robert Wilton No Record