Guidelines for using the Multiplexing Features of RTP to Support Multiple Media Streams
Summary: Has a DISCUSS. Has enough positions to pass once DISCUSS positions are resolved.
Benjamin Kaduk Discuss
As Mirja maybe already noted, Section 3.4.3 says:

   Section 8.3 of the RTP Specification [RFC3550] recommends using a single SSRC space across all RTP sessions for layered coding. Based on the experience so far however, we recommend to use a solution with explicit binding between the RTP streams that is agnostic to the used SSRC values. That way, solutions using multiple RTP streams in a

This sounds an awful lot like we're trying to update the recommendations from RFC 3550, and looks like different text than was discussed in Mirja's ballot thread. Let's discuss whether the formal Updates: mechanism is appropriate here or we should consider rewording.
Abstract, Introduction

Do we consider SRTP to be included in discussion of "RTP"?

Section 1

   from a particular usage of the RTP multiplexing points. The document will provide some guidelines and recommend against some usages as being unsuitable, in general or for particular purposes.

If something is unsuitable in general, should the protocol feature be deprecated/removed?

Section 2.1

   RTP Session Group: One or more RTP sessions that are used together to perform some function. Examples are multiple RTP sessions used to carry different layers of a layered encoding. In an RTP Session Group, CNAMEs are assumed to be valid across all RTP sessions, and designate synchronisation contexts that can cross RTP sessions; i.e. SSRCs that map to a common CNAME can be assumed to have RTCP SR timing information derived from a common clock such that they can be synchronised for playout.

I suggest expanding "RTCP SR timing" or providing a definition.

Section 3.1

It seems a little surprising to use simulcast as an example in the "needed to represent one media source" bullet and then have separate bullets for simulcast permutations.

   sessions to group the RTP streams. The choice suitable for one reason, might not be the choice suitable for another reason. The

nit: is it the "reason" or the "situation"/"scenario" that is relevant here?

Section 3.2

   RTP streams. Figure 1 outlines the process of demultiplexing incoming RTP streams starting already at the socket representing reception of one or transport flows, e.g. an UDP destination port.

nit: "one or more"?

I'd consider putting more arrowheads in the downward direction, though it's unclear if the resultant vertical expansion of the figure is worth it.

Section 3.2.1

   For RTP session separation within a single endpoint, RTP relies on the underlying transport layer, and on the signalling to identify RTP sessions in a manner that is meaningful to the application.
   A single endpoint can have one or more transport flows for the same RTP session, and a single RTP session can therefore span multiple transport layer flows even if all endpoints use a single transport layer flow per endpoint for that RTP session. The signalling layer

nit: "therefore" seems misplaced; the relevant linkage in the logic seems to be that there could be one transport flow per endpoint pair (as we don't require multicast usage).

   Independently if an endpoint has one or more IP addresses, a single

nit: I'm not sure if "independently" is the right conjunctive adverb, but whatever is used it should have a comma after it.

Section 3.2.2

   Endpoints that are both RTP sender and RTP receiver use the same SSRC in both roles.

If I have multiple SSRCs as a sender, do I have freedom to vary amongst them when acting as an RTP receiver (or RTCP sender)?

   SSRC values are unique across RTP sessions. For the RTP retransmission [RFC4588] case it is recommended to use explicit binding of the source RTP stream and the redundancy stream, e.g. using the RepairedRtpStreamId RTCP SDES item [I-D.ietf-avtext-rid].

Some indication of whether this recommendation is new in this document or "long-standing" might be worthwhile.

   Note that RTP sequence number and RTP timestamp are scoped by the SSRC and thus specific per RTP stream.

And now I wonder about the behavior of these two in the retransmission case from the previous paragraph. But that's likely off-topic for this document :)

   An endpoint that generates more than one media type, e.g. a conference participant sending both audio and video, need not (and, indeed, should not) use the same SSRC value across RTP sessions.

I'm not sure I understand why the guidance on cross-session behavior is specific to the multi-media-type case.

   RTCP compound packets containing the CNAME SDES item is the designated method to bind an SSRC to a CNAME, effectively cross-correlating SSRCs within and between RTP Sessions as coming from the same endpoint.
   The main property attributed to SSRCs associated with the same CNAME is that they are from a particular synchronisation context and can be synchronised at playback.

I am curious (but not necessarily needing to see in this document) where the security considerations regarding CNAME spoofing (where an attacker claims the CNAME of an existing source to attempt to be treated as part of the victim's output) are discussed.

Section 3.2.4

   The RTP payload type is scoped by the sending endpoint within an RTP session. PT has the same meaning across all RTP streams in an RTP session. All SSRCs sent from a single endpoint share the same

I'd suggest "same meaning across all RTP streams from that sender", though given the previous (and next!) sentence it is probably not strictly necessary.

Section 3.3

   o Does my communication peer support RTP as defined with multiple SSRCs per RTP session?

There's potentially some ambiguity about grouping/binding in this text.

   gateway, for example a need to monitor the RTP streams. Beware that changing the stream transport characteristics in the translator, can require thorough understanding of the application logic, specifically any congestion control or media adaptation to ensure appropriate media handling.

While congestion control and media adaptation are important, they're hardly the only things that a middlebox might need to know about (but fail to implement properly, which is the point of this warning). I'd suggest rephrasing to be a range/selection rather than drilling into specific points (e.g., "from congestion control to media adaptation or particular application-layer semantics").

   Within the uses enabled by the RTP standard the point to point topology can contain one to many RTP sessions with one to many media sources per session, each having one or more RTP streams per media source.

micro-nit: "one to many", "one to many", "one or more" ruins the parallelism :)

Section 3.4.3

   o Signalling based (SDP)

"e.g., SDP", no?
   An RTP/RTCP-based grouping solution is to use the RTCP SDES CNAME to bind related RTP streams to an endpoint or to a synchronization context. For applications with a single RTP stream per type (media, source or redundancy stream), CNAME is sufficient for that purpose independent if one or more RTP sessions are used. However, some

nit: "independent if" doesn't parse properly; maybe "independently of whether"?

   independent if one or more RTP sessions are used. However, some applications choose not to use CNAME because of perceived complexity or a desire not to implement RTCP and instead use the same SSRC value to bind related RTP streams across multiple RTP sessions. RTP

[It's interesting to see this noted, given that we talk about how if you don't implement RTCP you're not actually using RTP, just the RTP packet formats; and how we discuss that reusing the same SSRC value across multiple RTP sessions can be risky. That said, this should not discourage us from documenting what implementations actually do...]

Section 3.4.4

   There exist a number of Forward Error Correction (FEC) based schemes for how to reduce the packet loss of the original streams. Most of

nit: I think this is either "mitigate packet loss" or "reduce lost data from a media stream", but "reduce packet loss" it is not.

   Using multiple RTP sessions supports the case where some set of receivers might not be able to utilise the FEC information. By placing it in a separate RTP session and if separating RTP sessions on transport level, FEC can easily be ignored already on transport level, without considering any RTP layer information.

nit: "the transport level"

Section 4.1.2

   BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation], it is possible for the RTP translator to map the RTP streams between both sides using some method, e.g. if the number and order of SDP "m=" lines between both sides are the same.
   There are also challenges with SSRC

(There's nothing in SDP that requires that to be the case, though, this would merely be a "convenient property shared by the two applications' behavior"?)

Section 4.1.3

   For applications that use any security mechanism, e.g., in the form of SRTP, the gateway needs to be able to decrypt and verify source integrity of the incoming packets, and re-encrypt, integrity protect, and sign the packets as peer in the other application's security context. This is necessary even if all that's needed is a simple

Can you clarify what is meant by "sign the packets as peer" here? Is it implying that the terminating gateway needs to have credentials so as to impersonate both "real" participants to the other? (Also, nit: "sign packets as the peer" might be a more grammatical wording, as "peer" needs an article.)

   If one uses security functions, like SRTP, and as can be seen from above, they incur both additional risk due to the requirement to have the gateway in the security association between the endpoints (unless the gateway is on the transport level), and additional complexities in form of the decrypt-encrypt cycles needed for each forwarded packet. SRTP, due to its keying structure, also requires that each

This sentence is pretty complicated. Even in the first part, I'm not sure what "they" in "they incur both" refers to...it seems that the risk is to the participant(s) ("one") rather than the "security functions" themselves...

   RTP session needs different master keys, as use of the same key in two RTP sessions can for some ciphers result in two-time pads that completely breaks the confidentiality of the packets.

I'd suggest discussing this as "reuse of a one-time pad" rather than a "two-time pad".
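
For readers less familiar with why keystream reuse breaks confidentiality, the following is a minimal sketch of the failure mode, not SRTP itself. It assumes a stream-cipher-style construction in which the ciphertext is the payload XORed with a keystream; the random bytes are a hypothetical stand-in for the keystream that the same master key would yield in two RTP sessions when the other keystream inputs also coincide, and the payloads and names are invented for illustration.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Assumption: random bytes stand in for the keystream produced when the
# same key material is (incorrectly) reused in two RTP sessions.
keystream = os.urandom(16)

# Hypothetical 16-byte payloads carried in the two sessions.
payload_session_a = b"audio frame 0001"
payload_session_b = b"video frame 0001"

cipher_a = xor_bytes(payload_session_a, keystream)
cipher_b = xor_bytes(payload_session_b, keystream)

# XORing the two ciphertexts cancels the shared keystream, leaving the
# XOR of the two plaintexts, with no key material needed.
leak = xor_bytes(cipher_a, cipher_b)

# An attacker who knows or guesses one payload recovers the other.
recovered_b = xor_bytes(leak, payload_session_a)
```

The keystream never appears in `leak`, which is why the pad reuse compromises both sessions at once, regardless of how strong the underlying cipher is.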
Section 4.1.4

   Endpoints that aren't updated to handle multiple streams following these recommendations can have issues with participating in RTP sessions containing multiple SSRCs within a single session, such as:

Talking about endpoints being "updated [...] following these recommendations" also makes me wonder whether an Updates relationship to 3550 or other document(s) would be appropriate.

Section 4.2.2

   the, in most cases 2-3, additional flows. However, packet loss causes extra delays, at least 100 ms, which is the minimal retransmission timer for ICE.

Doesn't RFC 8445 say 500 ms, not 100?

   Deep Packet Inspection and Multiple Streams: Firewalls differ in how deeply they inspect packets. There exist some risk that deeply inspecting firewalls will have similar legacy issues with multiple SSRCs as some RTP stack implementations.

Re "some risk", can we say that this has definitely been seen in the wild at least once?

Section 4.3.1

   only premium users are allowed to access. The mechanism preventing a receiver from getting the high quality stream can be based on the stream being encrypted with a key that user can't access without paying premium, using the key-management to limit access to the key.

nit: there seems to be a missing word here ("paying a premium"?)

   SRTP [RFC3711] has no special functions for dealing with different sets of master keys for different SSRCs. The key-management functions have different capabilities to establish different sets of keys, normally on a per-endpoint basis. For example, DTLS-SRTP [RFC5764] and Security Descriptions [RFC4568] establish different keys for outgoing and incoming traffic from an endpoint. This key usage has to be written into the cryptographic context, possibly associated with different SSRCs.

I don't really understand what this paragraph is trying to say.

Section 4.3.2

   Transport translator-based sessions and multicast sessions, can

This doesn't seem to match the terminology we used in § 4.1.2.
(This terminology appears a couple other times, later.)

Section 5.1

   h. If the applications need finer control over which session participants that are included in different sets of security associations, most key-management will have difficulties establishing such a session.

nit: the grammar is off, here (remove "that" and use "key-management techniques"?)

Section 5.3

   2. The application can indicate its usage of the RTP streams on RTP session level, in case multiple different usages exist.

nit: is this "in case" (precautionary) or "in the case when" (descriptive)?

Section 6

   Transport Support Extensions: When defining new RTP/RTCP extensions

nit: should we swap the order of "Support" and "Extensions"?

Section 11.1

RFC 3830 does not feel like it needs to be normative.

Appendix A

   4. Sending multiple streams in the same sequence number space makes it impossible to determine which payload type, which stream a packet loss relates to, and thus to which stream to potentially apply packet loss concealment or other stream-specific loss mitigation mechanisms.

I don't think this parses properly (around "which payload type,")

Appendix B.1

   One aspect of the existing signalling is that it is focused on RTP sessions, or at least in the case of SDP the media description.

nit: I think there's an extra or missing word here (around "the media description").

   o Bitrate/Bandwidth exist today only at aggregate or as a common "any RTP stream" limit, unless either codec-specific bandwidth limiting or RTCP signalling using TMMBR is used.

Should we have a reference for TMMBR?

Appendix B.3

   RTP streams being transported in RTP has some particular usage in an RTP application. This usage of the RTP stream is in many

nit: singular/plural mismatch "has"/"streams"
Barry Leiba Yes
Deborah Brungard No Objection
Alissa Cooper No Objection
Roman Danyliw No Objection
(Suresh Krishnan) No Objection
Warren Kumari No Objection
Thank you for an interesting and readable document.
(Mirja Kühlewind) No Objection
One processing question: Should this document update RFC3550 given the last paragraph each in section 3.4.1 and 3.4.3?

And one comment on section 4.2.1:

   "Different Differentiated Services Code Points (DSCP) can be assigned to different packets within a flow as well as within an RTP stream."

not sure what you mean by flow here but at least RFC7657 says "Should use a single DSCP for all packets within a reliable transport protocol session"

Maybe you can say a bit more here to ensure the guidance provided in RFC7657 is reflected accurately.

Even though I didn't see any discussion of the TSV-ART review (Thanks Bernard!) I believe all comments have been addressed. Thanks for that!

Fully editorial minor comments:

1) In the intro maybe:

OLD
   The authors hope that clarification on the usefulness of some functionalities in RTP will result in more complete implementations in the future.
NEW
   This document aims to clarify the usefulness of some functionalities in RTP which will hopefully result in more complete implementations in the future.

2) sec 3.2 s/one or transport flows/one or more transport flows/ And maybe also s/transport flows, e.g. an UDP destination port./transport flows, e.g. based on the UDP destination port./?

3) sec 3.2.1:

   "RTP does not contain a session identifier, yet different RTP sessions must be possible to identify both across different endpoints and within a single endpoint."

Not sure I can parse this sentence correctly...

4) sec 4.1.3: s/Signalling, choosing and policing/Signalling, choosing, and policing/ -> missing comma

5) sec 6 maybe: s/specification writers/specification designers/
(Alexey Melnikov) No Objection
Alvaro Retana No Objection
(Adam Roach) No Objection
Éric Vyncke No Objection
Magnus Westerlund (was Abstain) Recuse
Comment (2020-02-17 for -10)
I am a co-author