Network Working Group M. Westerlund
Internet-Draft B. Burman
Intended status: Standards Track M. Lindqvist
Expires: August 29, 2013 F. Jansson
Ericsson
February 25, 2013
Using Simulcast in RTP Sessions
draft-westerlund-avtcore-rtp-simulcast-02
Abstract
In some applications it may be necessary to send multiple media
encodings derived from the same media source in independent RTP media
streams. This is called Simulcast. This document discusses the best
way of accomplishing this in RTP and how to signal it in SDP. It is
concluded that a solution where the different simulcast versions are
based on separate SDP media descriptions provides the best support for
simulcast. A solution is defined by making two extensions to SDP.
The first extension consists of two new attributes in SDP that
express capability to send or receive simulcast streams,
respectively. The second extension describes how to group media
descriptions belonging to the same simulcast source by using the
grouping framework.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 29, 2013.
Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved.
Westerlund, et al. Expires August 29, 2013 [Page 1]
Internet-Draft RTP Simulcast February 2013
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Definitions
   2.1. Terminology
   2.2. Requirements Language
3. Simulcast Scenarios
   3.1. Simulcasting to RTP Mixer
      3.1.1. Simulcast Combined with Scalable Encoding
   3.2. Multicast Transported Simulcasted Media
      3.2.1. Diversity in Receiver Population
      3.2.2. Bit-rate Adaptation
   3.3. Same Encoding to Multiple Destinations
   3.4. Different Encoding to Independent Destinations
4. Network Aspects
5. Simulcast Alternatives
   5.1. Using the Payload Type
   5.2. Using Single RTP session
   5.3. Using Multiple RTP sessions
   5.4. Conclusions
6. Simulcast Signaling Proposal
   6.1. Simulcast Capability
   6.2. Grouping Simulcast Media Descriptions
      6.2.1. Declarative Use
      6.2.2. Offer/Answer Use
   6.3. Two-Phase Negotiation
   6.4. Media Stream Requirements
   6.5. Relating Alternative Encodings
   6.6. Multiple Stream handling
7. Simulcast Signaling Examples
   7.1. Alice: Desktop Client
8. IANA Considerations
9. Security Considerations
10. Acknowledgements
11. References
   11.1. Normative References
   11.2. Informative References
Authors' Addresses
1. Introduction
Simulcast is the act of simultaneously sending multiple different
versions of the same media content, e.g. the same video source
encoded with different video encoders or target resolutions. This
can be done in several ways and for different purposes. This
document focuses on the case where one wants to provide multiple
streams with different encodings over RTP [RFC3550] towards an
intermediary so that the intermediary can select which encoding to
forward to other participants in the session, and more specifically
how the grouping of the streams is defined. From an RTP perspective,
simulcast is a specific application of the aspects discussed in RTP
Multiplexing Architecture
[I-D.westerlund-avtcore-multiplex-architecture].
The different encodings of a media content that are considered in
this document can differ in:
Bit-rate: The number of bits spent to encode the media differs, thus
giving different quality.
Codec: Different media codecs are used to ensure that different
receivers that do not have a common set of decoders can decode at
least one of the versions. This can include codec configuration
options that are not compatible, like video encoder profiles, or
the capability of receiving the transport packetization.
Sampling: Different sampling of media, in spatial as well as in
temporal domain, may be used to suit different rendering
capabilities or needs at the receiving endpoints, as well as a
method to achieve different bit-rates. For video streams, spatial
sampling affects image resolution and temporal sampling affects
video frame rate. For audio, spatial sampling relates to the
number of audio channels and temporal sampling affects audio
bandwidth. A difference in sampling may also result in a
difference in bit-rate.
There are different reasons for an application to provide multiple
different encodings of a single media source. As soon as an
application has the need to send multiple encodings, there is a
potential need for simulcast. This need can arise even when using
media codecs that have scalability features built in. The purpose of
this document is to describe a few scenarios where the use of
simulcast is motivated, elaborate on possible alternatives and available
mechanisms, and find a suitable solution for signaling and performing
RTP simulcast. The discussion results in a signaling proposal to
support simulcast.
2. Definitions
2.1. Terminology
The following terms and abbreviations are used in this document:
Encoding: A particular encoding is the choice of the media encoder
(codec) that has been used to compress the media and the fidelity
of that encoding through the choice of sampling, bit-rate and
other codec configuration parameters.
Different encodings: An encoding is different when some parameter
that characterizes the encoding of a particular media source is
changed. Such changes can affect one or more of the following
parameters: codec, codec configuration, bit-rate, sampling.
Simulcast versions: Media streams used for simulcast that use
different encodings and thus constitute different versions of the
same media source.
2.2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
3. Simulcast Scenarios
This section discusses different usage scenarios for the term
simulcast and clarifies which of those this document focuses on. It
also reviews why simulcast and scalable codecs can be a useful
combination.
3.1. Simulcasting to RTP Mixer
This scenario relates to a multi-party session where one or more
central nodes are used to facilitate the media transport between the
session participants. Thus, this targets the RTP Mixer Topology
defined in [RFC5117] (Section 3.4: Topo-Mixer). This scenario is
targeted for further discussion in this document.
Simulcasting different media encodings of video that differ both in
resolution and in bit-rate is highly applicable to video conferencing
scenarios. For example, an RTP mixer selects the video of the most
active speaker and sends that participant's video stream as a high
resolution stream to the other participants, and in addition also
sends a number of low resolution video streams of the other
participants, enabling the receiving user to both display the current
speaker in high quality and monitor the other participants in lower
quality/resolution/size. As the participants should not receive the
stream showing themselves, the set of streams will be unique to each
participant.
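The active-speaker example above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
function name select_streams, the participant labels, and the "high" and
"low" version labels are hypothetical and not defined by this document,
and a real mixer would also weigh receiver capabilities and bandwidth.

```python
from typing import Dict, List, Tuple

HIGH, LOW = "high", "low"  # hypothetical labels for two simulcast versions


def select_streams(participants: List[str],
                   active_speaker: str) -> Dict[str, List[Tuple[str, str]]]:
    """Return, per receiver, the (sender, version) pairs a mixer forwards."""
    plan: Dict[str, List[Tuple[str, str]]] = {}
    for receiver in participants:
        streams = []
        for sender in participants:
            if sender == receiver:
                continue  # a participant does not receive its own stream
            # The active speaker is forwarded in high resolution,
            # everyone else in low resolution.
            version = HIGH if sender == active_speaker else LOW
            streams.append((sender, version))
        plan[receiver] = streams
    return plan


plan = select_streams(["A", "B", "C"], active_speaker="A")
for receiver, streams in plan.items():
    print(receiver, streams)
```

Because each receiver's own stream is excluded, every receiver ends up
with a different set of streams, which is why the mixer needs both
versions from each sender to be available for selection.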
A number of alternatives exist to provide both high and low
resolutions from an RTP Mixer:
Simulcast: The clients send one stream for the low resolution and
another for the high resolution to the RTP Mixer.
Scalable Video Coding: The clients send one stream to the RTP Mixer,
using a video encoder that in this stream can provide both the
high resolution and also enables the mixer to extract a low
resolution representation from that single stream.
Transcoding in the Mixer: The clients send a high resolution stream
to the RTP Mixer which performs a transcoding to a lower
resolution stream.
The Transcoding alternative requires that the RTP mixer has a
sufficient amount of transcoding resources to produce the number of
low resolution streams required. In the worst case, all participants'
streams may need to be transcoded. If the resources are not
available, a different solution is needed. There will also normally
be a quality loss and an increase in latency associated with the
transcoding operation.
Scalable video encoding requires a more complex encoder compared to
non-scalable encoding. Also, if the resolution difference between
the streams is large, a scalable codec may in fact be only marginally
more bandwidth efficient than the simulcast case where the different
resolutions are sent as separate streams from the clients to the
mixer. At the same time, with scalable video encoding using the
currently available scalable video codecs, the transmission of all
but the lowest resolution will consume more bandwidth from the mixer
to the other participants compared to a non-scalable encoding.
Simulcasting has the benefit that it is conceptually simple. It
enables the use of any media codec that the participants agree on,
allowing the RTP mixer to be codec-agnostic.
           +------------+      +---+
+---+      |            |----->| B |
|   |=====>|            |      +---+
| A |      |   Mixer    |
|   |----->|            |      +---+
+---+      |            |=====>| C |
           +------------+      +---+
Figure 1: RTP Mixer selecting from simulcast versions
The sender A provides the mixer with both a high resolution version
"===>" and a low resolution version "--->". The mixer selects which
members of its receiver population should get a particular version.
3.1.1. Simulcast Combined with Scalable Encoding
As explained in the previous section, a scalable codec is not always
more bandwidth efficient than simulcast, especially in the path from
the mixer to the receiver.
There are however cases where a combination of simulcast and scalable
encoding can be beneficial. By using simulcast in cases where the
scalable codec is less efficient, it is possible to optimize the
efficiency of the complete system. A good example of this usage
would be where the video is encoded using SVC transported in RTP
[RFC6190], where each simulcast stream has a different resolution,
and each SVC media stream uses temporal scalability and signal to
noise ratio (SNR) scalability within that single media stream. If
only resolution and temporal variations are needed, this can be
implemented using the non-scalable part of H.264, as each simulcast
version provides the different resolution, and each media stream
within a simulcast encoding has temporal scalability through the use
of non-reference frames.
3.2. Multicast Transported Simulcasted Media
When using multicast, particularly Source-Specific Multicast (SSM)
[RFC3569], to distribute RTP/RTCP packets to a large receiver
population, one faces some issues. There are at least two different
issues where simulcast can potentially be useful.
3.2.1. Diversity in Receiver Population
If there is any diversity in the receivers regarding e.g. capability,
codec support or code base, there are potentially restrictions in
what streams can be delivered to the receivers. If using the lowest
common denominator over a diverse receiver population isn't
acceptable, simulcast can be one possible solution. By offering
different stream alternatives, it is possible to let the receivers
choose the simulcast version that matches their capabilities. By
using explicit signalling for simulcast, it is not necessary for the
stream distributor to handle multiple receiver configurations
individually for a multi-media session, nor to ensure that each
receiver gets an encoding that matches their capabilities.
The simulcast version granularity the receivers can select will be on
multicast group level. Thus, this use case puts a strict requirement
on supporting separation through different RTP sessions. The reason
is that having a single RTP session straddle several multicast
groups makes any reporting on the received sources very difficult to
interpret. Using one RTP session per simulcast version instead
provides consistency.
3.2.2. Bit-rate Adaptation
If the network paths from the media sender to the receivers can
support different bit-rates, there is a need to support media streams
encoded to different bit-rates. If these path differences are of a
more static nature, for example depending primarily on the underlying
link layers, using simulcast has an advantage over scalable encoding.
The reason is that the efficiency of scalable coding will never be
better than encoding to a single target rate. When the receiver can
determine the current network interface connectivity, it can choose a
simulcast version with certainty. That choice will also be correct
until the event of another network interface becoming the active one.
This assumes that the multicast transmission uses dedicated resources
and will thus not be congested due to other network traffic. To
support this behavior, the signalling must support indication of
which media streams are alternatives to each other, and it is
also necessary to be able to determine aggregate bit-rate for the
selected multicast group(s) compared to available network properties.
Simulcast is possible to use also in more dynamic situations where
each receiver continuously gathers reception statistics to detect
path congestion and based on that may change which version to
receive. The main issue with such usage is how to achieve a switch
from one version to another with minimal playback interruption and
without putting extra load on the network during the actual
switch. Here, scalable encoding in general has better
characteristics since scalability layers are typically synchronized.
When comparing simulcast and scalable encoding, the trade-offs are
different and the down-sides occur at different places. Simulcast
will have a higher bit-rate load at a media sender and that will also
be the case for any network path shared between receivers of multiple
simulcast versions. However, for parts of the network path where
there is only a single simulcast version, the achievable quality at a
given bit-rate will be slightly higher for simulcast. It will also
be more difficult to seamlessly switch between simulcast versions
than between different scalable encodings, as simulcast actually
switches from one media stream version to another instead of adding
or removing some enhancement layers.
3.3. Same Encoding to Multiple Destinations
One interpretation of simulcast is when one encoding is sent to
multiple receivers. This is well supported in RTP by simply copying
all outgoing RTP and RTCP traffic to several transport destinations,
if the intention is to create a common RTP session. As long as all
participants do the same, a full mesh is constructed and everyone in
the multi party session has a similar view of the joint RTP session.
This is analogous to an Any Source Multicast (ASM) session but without
the traffic optimization, as multiple copies of the same content are
likely to have to pass over the same link.
+---+      +---+
| A |<---->| B |
+---+      +---+
  ^         ^
   \       /
    \     /
     v   v
     +---+
     | C |
     +---+
Figure 2: Full Mesh / Multi-unicast
As this type of simulcast is analogous to ASM usage and RTP has good
support for ASM sessions, no further consideration is made in this
document for this scenario.
3.4. Different Encoding to Independent Destinations
Another alternative interpretation of simulcast includes multiple
destinations, where each destination gets a specifically tailored
version, but where the destinations are independent. A typical
example for this would be a streaming server distributing the same
live session to a number of receivers, adapting the quality and
resolution of the multi-media session to each receiver's capability
and available bit-rate. This case can be solved in RTP by having
independent RTP sessions between the sender and the receivers. Thus
this case is not considered further.
4. Network Aspects
The network aspects that are relevant for simulcast are:
Quality of Service: When using simulcast it might be of interest to
prioritize a particular simulcast version, rather than applying
equal treatment to all versions. For example, lower bit-rate
versions may be prioritized over higher bit-rate versions to
minimize congestion or packet losses in the low bit-rate versions.
Thus, there is a benefit in using a simulcast solution that supports
QoS as well as possible. By separating simulcast versions into
different RTP sessions and sending those RTP sessions over different
transport flows, a simulcast version can be prioritized by
existing flow based QoS mechanisms. When using unicast, QoS
mechanisms based on individual packet marking are also feasible,
which do not require separation of simulcast versions into
different RTP sessions to apply different QoS.
NAT/FW Traversal: Using multiple RTP sessions will incur more cost
for NAT/FW traversal unless they can re-use the same transport
flow, which can be achieved either by multiplexing multiple
RTP sessions on a single lower layer transport
[I-D.westerlund-avtcore-transport-multiplexing] or by Multiplexing
Negotiation Using SDP Port Numbers
[I-D.ietf-mmusic-sdp-bundle-negotiation]. If flow based QoS with
any differentiation is desirable, the cost for additional
transport flows is likely necessary.
Multicast: Multiple RTP sessions will be required to enable
combining simulcast with multicast. Different simulcast versions
have to be separated to different multicast groups to allow a
multicast receiver to pick the version it wants, rather than
receive all of them. In this case, the only reasonable
implementation is to use different RTP sessions for each multicast
group so that reporting and other RTCP functions operate as
intended.
5. Simulcast Alternatives
Simulcast is in this document defined as the act of sending multiple
alternative encodings of the same underlying media source. When
transmitting multiple independent streams that originate from the
same source, it could potentially be done in several different ways
using RTP. A general discussion of the considerations for using the
different RTP multiplexing alternatives can be found in Guidelines
for using the Multiplexing Features of RTP
[I-D.westerlund-avtcore-multiplex-architecture]. Discussion and
clarification on how to handle multiple streams in an RTP session can
be found in [I-D.lennox-avtcore-rtp-multi-stream].
The below sub-sections briefly describe potential ways of achieving
RTP media stream multiplexing and identification of which streams are
alternative simulcast encodings of the same source. The following
descriptions also cover how each alternative interacts with multiple
sources (SSRCs) present in the same RTP session for reasons other than
simulcast. Multiple SSRCs may occur for various reasons such as
multiple participants in multipoint topologies like multicast,
transport relays or full mesh transport simulcasting, multiple source
devices such as multiple cameras or microphones at one end-point, or
other RTP mechanisms such as RTP Retransmission [RFC4588].
5.1. Using the Payload Type
An alternative could be to use only the RTP payload type to identify
the different simulcast streams. This could be tempting, since
simulcast streams may differ in codec, codec configuration, or
sampling, all of which are typically specified in SDP by a format
number on the media line that is in turn connected to an RTP Payload
Type. Thus all simulcast streams would be sent in the same RTP
session using only a single SSRC per actual media source. However,
as discussed in Guidelines for using the Multiplexing Features of RTP
[I-D.westerlund-avtcore-multiplex-architecture], using Payload Type
Multiplexing does not generally work and is hereby dismissed as a
potential solution.
5.2. Using Single RTP session
This idea is based on using a unique SSRC for each alternative
encoding of an actual media source within a single RTP session. The
identification of streams and how they are specified to be related
alternatives needs an additional mechanism, for example using SSRC
grouping [RFC5576], and potentially also a new SDES item such as
SRCNAME proposed in [I-D.westerlund-avtext-rtcp-sdes-srcname] with
semantics that indicate them as alternatives of a particular media
source. When there are multiple actual media sources in a session,
each media source will have to use a number of SSRCs to represent the
different simulcast alternatives it produces. For example, if the
number of media sources is n and they all produce the same number of
simulcast versions, m, there will be n*m SSRCs in use in
the RTP session. Each SSRC can use any of the configured payload
types for this RTP session. All session level attributes and
parameters that are not source specific will apply and must function
with all the alternative encodings in use.
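The n*m relationship can be illustrated with a small Python sketch
that assigns one unique SSRC per (media source, simulcast version) pair
within a single RTP session. The function name allocate_ssrcs and the
integer indices for sources and versions are assumptions made for this
illustration only; the retry loop covers only collisions within this
table and does not implement RTP's SSRC collision handling.

```python
import random
from typing import Dict, Tuple


def allocate_ssrcs(n_sources: int, m_versions: int,
                   rng: random.Random) -> Dict[Tuple[int, int], int]:
    """Assign a distinct random 32-bit SSRC to every (source, version)."""
    used = set()
    table: Dict[Tuple[int, int], int] = {}
    for source in range(n_sources):
        for version in range(m_versions):
            ssrc = rng.getrandbits(32)
            while ssrc in used:  # re-draw on a local collision
                ssrc = rng.getrandbits(32)
            used.add(ssrc)
            table[(source, version)] = ssrc
    return table


# n = 3 media sources, m = 2 simulcast versions each
table = allocate_ssrcs(3, 2, random.Random(0))
print(len(table))  # 6 SSRCs, i.e. n*m
```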
In the currently used signaling system based on SDP [RFC4566] and
Offer/Answer [RFC3264], the properties of media streams are typically
negotiated on media block (m-line) level. Sending simulcast
alternatives as different SSRCs belonging to the same media
description is likely possible to achieve, but SSRC centric signaling
providing the needed media stream properties is currently almost non-
existent and it would require a considerable effort to make the
necessary SDP extensions.
A single RTP session can be described in SDP by more than a single
m-line, like for BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation], and
it can re-use the same m-line grouping [RFC5888] as would be used for
multiple RTP sessions (Section 5.3), but the RTP aspects described in
this section will still apply. This would enable the same signalling
expressiveness for multiple RTP sessions as for a single RTP session.
5.3. Using Multiple RTP sessions
Using multiple RTP sessions means that each different simulcast
version of an actual media source is transmitted in a separate RTP
session, using whatever session identifier to distinguish the
different versions. Since each RTP session is described by one or
more SDP m-lines, this solution needs explicit m-line grouping
[RFC5888] with semantics that indicate them as simulcast
alternatives. It is also important to identify the SSRCs in the
different sessions that are alternative encodings of the same media
source, if there are more than a single media source in each RTP
session. This could be accomplished using the same SSRC across the
sessions, but that is not robust against SSRC collisions and could
potentially force cascading SSRC changes between sessions. A better
choice would be to use different SSRCs, but relate streams through a
new SDES item proposed in [I-D.westerlund-avtext-rtcp-sdes-srcname].
Each RTP session will have its own set of configured RTP payload
types available for use with any SSRC in that session. In addition,
all other attributes for sessions or sources can be used as normal to
indicate the configuration of that particular alternative.
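As a rough illustration of relating streams across sessions, the
following Python sketch groups SSRCs observed in different RTP sessions
by a shared source name. It assumes that each SSRC's SRCNAME value has
already been learned, for example from RTCP SDES as proposed in
[I-D.westerlund-avtext-rtcp-sdes-srcname]. The session labels, SSRC
values, SRCNAME strings, and the function name group_by_srcname are
all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Report = Tuple[str, int, str]  # (session label, SSRC, SRCNAME value)


def group_by_srcname(reports: Iterable[Report]) -> Dict[str, List[Tuple[str, int]]]:
    """Collect the (session, SSRC) pairs that share one SRCNAME."""
    groups: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for session, ssrc, srcname in reports:
        groups[srcname].append((session, ssrc))
    return dict(groups)


# Hypothetical observations: two cameras, each sent in two sessions.
reports = [
    ("session-hi", 0x1111AAAA, "cam1"),
    ("session-lo", 0x2222BBBB, "cam1"),
    ("session-hi", 0x3333CCCC, "cam2"),
    ("session-lo", 0x4444DDDD, "cam2"),
]
groups = group_by_srcname(reports)
print(groups["cam1"])
```

Since the SSRCs differ per session, an SSRC collision in one session
only forces a change in that session; the SRCNAME binding is what
keeps the alternatives related.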
5.4. Conclusions
If it is at all desirable to support simulcast based on multicast,
the solution must support using multiple RTP sessions. The main
reason is that receiver based selection of simulcast version must be
possible, which is accomplished in multicast through receiver
selection of which multicast group(s) it joins. This also has the
advantage of being able to use the existing SDP media description
(m=) expressiveness to signal or negotiate simulcast versions.
When using simulcast based on unicast, it is desirable to be able to
use the same media description signalling expressiveness regardless
of whether multiple RTP sessions are used or not. Assuming that MMUSIC
decides to enable single RTP media stream negotiation per SDP media
description and combine that with BUNDLE to identify RTP sessions, it
appears that using one or more RTP sessions for simulcast over
unicast will be able to use the same signalling solution. Thus the
decision to use one or more RTP sessions can be taken based on other
limitations, such as cost of NAT/FW traversal, need for flow-based
QoS etc.
A solution proposal for an SDP media description level signaling for
Simulcast version parameters is outlined below.
6. Simulcast Signaling Proposal
Signaling simulcast is about negotiating between media sender and
receiver what the different simulcast versions should be, how to
identify them in terms of RTP streams, and how to relate those RTP
streams.
The proposed solution consists of:
o  Signaling simulcast capability as SDP media level attributes in a
   first round of Offer/Answer
   *  Separate send and receive simulcast capabilities
   *  Media properties that are supported as base for different
      simulcast versions are listed as parameters
o  Adding SDP media descriptions for the simulcast streams in a
   second round of Offer/Answer
   *  Grouping SDP media descriptions from the same media source,
      belonging to the same simulcast, using the SDP grouping
      framework [RFC5888]
   *  Separate send and receive simulcast groupings
   *  Negotiating parameters for simulcast version using regular,
      individual SDP media descriptions
   *  Identifying RTP media streams (SSRC) from same media source
      using new SDES Item SRCNAME
      [I-D.westerlund-avtext-rtcp-sdes-srcname]
This is further outlined below.
6.1. Simulcast Capability
There are numerous media properties that can be varied to construct a
set of simulcast versions. A simulcast enabled endpoint could also
support simulcast based on several of those properties. As long as
those properties are relatively independent and if each simulcast
version needs an explicit definition (an m-line) in the SDP, this would
lead to an exponential number of simulcast version candidates and a
very long SDP that is likely also hard to interpret. There is thus a
need to limit the simulcast version candidates included in the SDP to
cover as small a set of properties as possible.
If a legacy endpoint not supporting simulcast were to be presented
with an SDP including media descriptions for a set of simulcast
versions, it may not know how to correctly handle or interpret these
"surplus" media descriptions.
Based on the functionality that simulcast is intended to achieve, it
should be clear that the reasons to send simulcast versions are not
the same as to receive simulcast versions, seen from a single
endpoint.
For these reasons, it is proposed to define two new SDP media level
attributes, "a=sim-send" and "a=sim-recv", which explicitly signal
support for simulcast media transmission and simulcast media
reception, respectively, for that media description. "a=sim-send" and
"a=sim-recv" MAY be used independently and simultaneously. These
attributes are also proposed to have parameters indicating the media
properties used to create the simulcast versions. The meaning of the
attributes at SDP session level is undefined, and they MUST NOT be used there.
simulcast  = "a="( "sim-send:" / "sim-recv:" ) prop-list
prop-list  = prop-entry *(WSP prop-entry)
prop-entry = prop *("=" q-value)
prop       = "rtpmap"
           / "fmtp"
           / "imageattr"
           / "ptime"
           / "crypto"
           / token ; for future extensions
q-value    = ( "0" "." 1*2DIGIT )
           / ( "1" "." 1*2("0") )
; Values between 0.00 and 1.00
; WSP and DIGIT defined in [RFC5234]
; token defined in [RFC4566]
Figure 3: ABNF for Simulcast
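A minimal Python sketch of a parser for the attribute syntax in Figure 3
follows. It is an illustration rather than a reference implementation,
and it makes three labeled assumptions: the token rule is approximated
by letters, digits and hyphen; at most one q-value per property is
accepted (the ABNF permits repetition, while the q-value is described
as optional); and a property without a q-value is reported as None,
since no default is defined. The name parse_simulcast and the sample
attribute line are hypothetical.

```python
import re
from typing import List, Optional, Tuple

_Q = r"(?:0\.[0-9]{1,2}|1\.0{1,2})"  # q-value: 0.0 .. 1.00
_PROP = r"[A-Za-z0-9-]+"             # assumption: simplified token
_ENTRY = re.compile(rf"({_PROP})(?:=({_Q}))?")
_LINE = re.compile(r"a=(sim-send|sim-recv):(.+)")


def parse_simulcast(line: str) -> Tuple[str, List[Tuple[str, Optional[float]]]]:
    """Parse one "a=sim-send:" or "a=sim-recv:" line into
    (direction, [(property, q-value or None), ...])."""
    m = _LINE.fullmatch(line.strip())
    if m is None:
        raise ValueError("not a sim-send/sim-recv attribute")
    direction, prop_list = m.group(1), m.group(2)
    entries: List[Tuple[str, Optional[float]]] = []
    # prop-entries are separated by a single WSP (space or tab)
    for raw in re.split(r"[ \t]", prop_list):
        e = _ENTRY.fullmatch(raw)
        if e is None:
            raise ValueError(f"bad prop-entry: {raw!r}")
        q = float(e.group(2)) if e.group(2) is not None else None
        entries.append((e.group(1), q))
    return direction, entries


# Hypothetical attribute line: prefer imageattr, also allow fmtp.
print(parse_simulcast("a=sim-send:imageattr=0.8 fmtp"))
```

A value such as "imageattr=1.5" is rejected because it falls outside
the q-value production.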
The media property values are taken from existing (and could likely
be extended to cover future) SDP attributes that express media
properties that can be varied to create different simulcast versions:
rtpmap: Differences in codec type, sampling rate (see Section 6.4),
and number of channels
fmtp: Differences in codec-specific encoding parameters
imageattr: Differences in video resolution, aspect ratio, and
framerate [RFC6236]
ptime: Differences in frame aggregation per packet
crypto: Differences in encryption [RFC4568]
...:
The optional q-value expresses the relative preference to base a
simulcast version on that media property, with 1.00 meaning maximum
(100%) preference and 0.00 meaning no (0%) preference. Several media
properties can share the same q-value, in which case they are equally
preferred.
An offerer wanting to use simulcast SHALL include either one or both
of those attributes, depending on in which direction(s) simulcast
will be used. An offerer that receives an answer without "a=sim-
send" or "a=sim-recv" MUST NOT define or use any simulcast
alternatives belonging to that media description and in that
direction to the answerer.
An answerer that does not understand the concept of simulcast will
also not know those attributes and will remove them in the SDP
answer, as defined in existing SDP Offer/Answer procedures. An
answerer that does understand the attributes and that wants to
support simulcast in the indicated direction SHALL reverse the
directionality of the attribute ("sim-send" becomes "sim-recv" and
vice versa) and include it in the answer.
An offerer that intends to send simulcast alternatives and thus
includes "a=sim-send", MUST also include at least one media property
parameter that it intends to use to construct the simulcast
alternatives, but it MAY include more media property parameters.
Including multiple media property parameters in "a=sim-send" SHALL be
interpreted as an offer to send simulcast versions covering all
combinations thereof, but MAY be further restricted by other
information in the SDP such as for example the number of simulcast-
related media descriptions in the SDP or use of max-ssrc signaling
[I-D.westerlund-mmusic-max-ssrc].
An offerer that is capable of receiving simulcast alternatives and
thus includes "a=sim-recv", MUST also include at least one media
property parameter that it is willing to use as discriminator between
received simulcast alternatives, but MAY include more media property
parameters. Including multiple media property parameters in "a=sim-
recv" SHALL be interpreted as an offer to receive simulcast versions
covering all combinations thereof, but MAY be further restricted by
other information in the SDP such as for example the number of
simulcast-related media descriptions in the SDP or use of max-ssrc
signaling [I-D.westerlund-mmusic-max-ssrc].
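To make "all combinations thereof" concrete, the following sketch enumerates the simulcast versions implied by a set of media properties. The function name simulcast_versions and the sample property values (two resolutions, two packetization times) are assumptions chosen for illustration. The result is only an upper bound, since other information in the SDP may restrict it further.

```python
from itertools import product


def simulcast_versions(alternatives):
    """List every combination of the offered property alternatives.

    alternatives maps a media property name to the list of values the
    endpoint can produce (or accept) for that property.
    """
    names = sorted(alternatives)
    value_lists = [alternatives[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Hypothetical values: two resolutions combined with two ptime settings.
versions = simulcast_versions({
    "imageattr": ["640x360", "320x180"],
    "ptime": [20, 40],
})
for version in versions:
    print(version)
```

With two alternatives for each of two properties, the sketch lists four candidate versions.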
An answerer that lacks either the capability or the desire to use
simulcast versions based on a certain media property parameter in a
specific direction MUST remove that parameter from "a=sim-send" or
"a=sim-recv". The answerer MUST NOT add any media property
parameters that were not included in the offer.
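The answerer rules above can be sketched as a small function. This is a hedged illustration: the name answer_simulcast is hypothetical, and omitting the attribute entirely when no offered property is supported is an assumption of this sketch (the grammar in Figure 3 requires at least one property in the attribute).

```python
def answer_simulcast(offer_direction, offered_props, supported_props):
    """Build the answer-side simulcast attribute (hypothetical helper).

    Returns (direction, props) for the answer, or None when the
    attribute is left out of the answer.
    """
    reverse = {"sim-send": "sim-recv", "sim-recv": "sim-send"}
    # Keep only offered properties; never add one the offer lacked.
    kept = [prop for prop in offered_props if prop in supported_props]
    if not kept:
        # Assumption: with nothing left to list, omit the attribute.
        return None
    return reverse[offer_direction], kept


print(answer_simulcast("sim-send", ["imageattr", "fmtp"], {"imageattr"}))
```

The sample call mirrors Figures 5 and 6, where an offer listing "imageattr" and "fmtp" in "a=sim-send" is answered with "a=sim-recv:imageattr".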
6.2. Grouping Simulcast Media Descriptions
To relate media descriptions holding simulcast versions, two new
simulcast grouping semantics are defined, "SimulCast Receive" (SCR)
and "SimulCast Send" (SCS). There is a need to separate semantics
for the intent to send simulcast streams from the semantics that
describe capability to recognize and receive simulcast streams. Both
semantics act as an indicator that simulcast is desired and that the
grouped media descriptions (m-lines) carry simulcast versions of
media sources. There may be multiple sets of media descriptions that
carry simulcast versions.
6.2.1. Declarative Use
When used as a declarative media description, SCR indicates the
configured end-point's required capability to recognize and receive a
specified set of RTP streams as simulcast streams. In the same
fashion, SCS requests the end-point to send a specified set of RTP
streams as simulcast streams. SCR and SCS MAY be used independently
and at the same time and they need not specify the same or even the
same number of media descriptions in the group.
6.2.2. Offer/Answer Use
When used in an offer, SCS indicates the SDP providing agent's intent
of sending simulcast and the particular set of media descriptions,
and SCR indicates the agent's capability of receiving simulcast
streams within the configured set of media descriptions. SCS and SCR
MAY be used independently and at the same time and they need not
specify the same or even the same number of media descriptions in the
group. The answerer MUST change SCS to SCR and SCR to SCS in the
answer, given that it has and wants to use the corresponding
(reverse) capability. An answerer not supporting the SCS or SCR
direction, or not supporting SCS or SCR grouping semantics at all,
will remove that grouping attribute altogether, according to the
grouping framework [RFC5888]. However, this case should not occur,
or should at least be very rare, given the proposed two-phase approach
(Section 6.3). An offerer that receives an answer indicating lack of
simulcast support in one or both directions, where SCR and/or SCS
grouping are removed, MUST NOT use simulcast in the non-supported
direction(s).
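The reversal of grouping semantics in the answer can be sketched as follows. The function name answer_group and the "usable" parameter (the set of semantics the answerer is able and willing to place in its answer) are hypothetical and serve only to illustrate the rule above.

```python
def answer_group(offer_line, usable):
    """Return the answer's "a=group:" line, or None to remove the group.

    offer_line is a session-level line such as "a=group:SCR 2 3".
    usable is the set of semantics ("SCS", "SCR") that the answerer
    can and wants to use in its own answer.
    """
    swap = {"SCS": "SCR", "SCR": "SCS"}
    prefix = "a=group:"
    if not offer_line.startswith(prefix):
        raise ValueError("not a group attribute: %r" % offer_line)
    semantics, *mids = offer_line[len(prefix):].split()
    if semantics not in swap:
        raise ValueError("not a simulcast grouping: %r" % semantics)
    reversed_semantics = swap[semantics]
    if reversed_semantics not in usable:
        # No corresponding (reverse) capability: drop the grouping.
        return None
    return prefix + " ".join([reversed_semantics] + mids)


print(answer_group("a=group:SCR 2 3", {"SCS"}))
```

An offerer receiving an answer where this function returned None would then refrain from using simulcast in that direction.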
6.3. Two-Phase Negotiation
These new "a=sim-send" and "a=sim-recv" attributes are proposed to be
included in the SDP as a first phase in a two-phased approach, where
the first phase involves a first SDP Offer/Answer procedure that only
establishes simulcast capability at both the offerer and the
answerer. This has the additional advantage of avoiding sending media
descriptions related to simulcast to an endpoint that does not
support simulcast. It is also unlikely to incur any significant
extra signaling round-trips, given that many other recent SDP
techniques also make use of two Offer/Answer procedures, as long as
this phased approach can be used in parallel with those. Such
other two-phase techniques include ICE [RFC5245] and BUNDLE
[I-D.ietf-mmusic-sdp-bundle-negotiation].
Thus, the first Offer/Answer SHOULD NOT include any simulcast-grouped
media descriptions, which SHOULD then be added in a second Offer/
Answer phase. This second phase SHOULD be initiated by the simulcast
receiver, meaning the endpoint that included "a=sim-recv" in the
first phase SDP SHOULD be offerer in the second phase. If both
endpoints are simulcast receivers, it is not possible to define a
preferred offerer in the second phase and either endpoint MAY then
send the offer, using regular Offer/Answer rules to handle race
conditions.
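The choice of second-phase offerer can be sketched as below. The function name and the representation of each first-phase SDP as a set of attribute names are assumptions of this sketch. An endpoint is treated as a simulcast receiver only when its "sim-recv" was matched by the peer's "sim-send", in line with the rule that an unanswered attribute disables simulcast in that direction.

```python
def second_phase_offerers(first_offer_attrs, first_answer_attrs):
    """Return which first-phase roles may send the second-phase offer.

    Each argument is the set of simulcast attribute names ("sim-send",
    "sim-recv") present in the first-phase offer and answer.
    """
    roles = set()
    if "sim-recv" in first_offer_attrs and "sim-send" in first_answer_attrs:
        roles.add("offerer")    # first-phase offerer receives simulcast
    if "sim-send" in first_offer_attrs and "sim-recv" in first_answer_attrs:
        roles.add("answerer")   # first-phase answerer receives simulcast
    return roles


print(second_phase_offerers({"sim-send"}, {"sim-recv"}))
```

An empty result means simulcast was not established and no second phase is needed; a result containing both roles corresponds to the case where either endpoint MAY send the offer.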
The first phase of establishing capability cannot be used with
declarative SDP, in which case it SHALL be bypassed, using the
second phase media description grouping directly.
6.4. Media Stream Requirements
When doing simulcast, the media streams that are alternatives need to
meet certain constraints to ensure that switching between alternative
streams is as issue-free as possible. The following constraints are
needed:
Same Clock Base: To enable correct alignment of media packets on the
source time-line, all alternative streams (SSRCs) MUST use the
same underlying clock to relate their RTP timestamp values with
the network time protocol (NTP) formatted sender time in the RTCP
Sender Reports.
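To illustrate why a common clock base matters, the sketch below maps an RTP timestamp to sender wall-clock time using the (NTP timestamp, RTP timestamp) pair carried in an RTCP Sender Report [RFC3550]. The function name, the representation of NTP time as floating-point seconds, and the sample values are assumptions for illustration. When two simulcast streams share the same underlying clock, media sampled at the same instant maps to the same wall-clock time even though each stream has its own RTP timestamp offset.

```python
def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate):
    """Map an RTP timestamp to sender wall-clock seconds.

    sr_rtp_ts and sr_ntp_seconds are the RTP and NTP timestamps from
    the most recent Sender Report for the same SSRC.
    """
    # Signed 32-bit difference, so that timestamp wrap-around works.
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr_ntp_seconds + delta / clock_rate


# Two hypothetical streams with different RTP offsets, 90 kHz clock.
time_a = rtp_to_wallclock(91000, 1000, 100.0, 90000)
time_b = rtp_to_wallclock(37704, 4294960000, 100.5, 90000)
print(time_a, time_b)
```

In this example the second stream's RTP timestamp has wrapped around between the Sender Report and the packet, and both packets still map to the same instant on the source time-line.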
6.5. Relating Alternative Encodings
To ensure that simulcast streams can be related correctly also on RTP
level, the usage of SDES SRCNAME
[I-D.westerlund-avtext-rtcp-sdes-srcname] to label and relate
simulcast versions belonging to the same media source is RECOMMENDED.
6.6. Multiple Stream Handling
When using multiple SSRCs in a single media description, for example
when using simulcast for multiple independent media sources, the
grouping semantics SCR and SCS SHOULD be combined with the SDP
attributes "a=max-send-ssrc" and "a=max-recv-ssrc"
[I-D.westerlund-mmusic-max-ssrc] to indicate the number of
simultaneous streams of each encoding that may be sent or that can be
handled in the receive direction.
7. Simulcast Signaling Examples
For brevity and clarity, the SDP in all examples below does not
contain signaling for multiple streams, such as the ones related to
RTP level relations (Section 6.5) or multiple SSRC signaling
(Section 6.6).
This example covers a client calling a video conference service that
uses a centralized media topology with an RTP mixer. Alice and Bob
call into a conference server for a conference call with audio and
video sent to the RTP mixer, these clients being capable of sending a
few video simulcast versions. The conference server also dials out
to Fred, which is a legacy client, resulting in fallback behavior.
When dialing out to Joe, more functionality is enabled as Joe is a
client similar to Alice.
+---+ +-----------+ +---+
| A |<---->| |<---->| B |
+---+ | | +---+
| Mixer |
+---+ | | +---+
| F |<---->| |<---->| J |
+---+ +-----------+ +---+
Figure 4: Four-party Mixer-based Conference
Example of Media plane for RTP mixer based multi-party conference
with 4 participants.
7.1. Alice: Desktop Client
Alice is calling in to the mixer with an audiovisual single stream
desktop client, only adding capability to send video resolution
[RFC6236] ("imageattr") and framerate based simulcast compared to a
legacy client. The first phase offer from Alice looks like
v=0
o=alice 2362969037 2362969040 IN IP4 192.0.2.156
s=Simulcast enabled Desktop Client
c=IN IP4 192.0.2.156
b=AS:665
t=0 0
m=audio 49200 RTP/AVP 96 97 9 8
b=AS:145
a=rtpmap:96 G719/48000/2
a=rtpmap:97 G719/48000
a=rtpmap:9 G722/8000
a=rtpmap:8 PCMA/8000
m=video 49300 RTP/AVP 96
b=AS:520
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=sim-send:imageattr=1.0 fmtp=0.8
a=imageattr:* send [x=640,y=360] recv [x=640,y=360] [x=320,y=180]
a=content:main
Figure 5: Alice First Offer for a Simulcast Conference
In this first phase, the only thing in the SDP that indicates
simulcast capability is the line in the video media description
containing the "sim-send" attribute.
The answer from the server indicates both that it is simulcast
capable and that it would like to use video resolution
("imageattr") based simulcast only. Should it not have been
simulcast capable, the "a=sim-recv" line would not have been present
and communication would have started with the media negotiated in the
SDP.
v=0
o=server 823479283 1209384938 IN IP4 192.0.2.2
s=Answer to simulcast enabled Desktop Client
c=IN IP4 192.0.2.43
b=AS:665
t=0 0
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
m=video 49300 RTP/AVP 96
b=AS:520
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=sim-recv:imageattr
a=imageattr:* send [x=640,y=360] [x=320,y=180] recv [x=640,y=360]
a=content:main
Figure 6: Server First Answer for a Simulcast Conference
Since the server is the simulcast media receiver, it immediately
initiates another Offer/Answer including the simulcast versions. The
server also keeps the "sim-recv" as explicit simulcast capability
indication in this second Offer/Answer round. Note that the "non-
simulcast" media can be started already now, before the second phase
Offer/Answer, with the only restriction that the simulcast
functionality is not yet established.
v=0
o=server 823479283 1209384938 IN IP4 192.0.2.2
s=Server inviting simulcast enabled Desktop Client
c=IN IP4 192.0.2.43
b=AS:825
t=0 0
a=group:SCR 2 3
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
a=mid:1
m=video 49300 RTP/AVP 96
b=AS:520
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=sim-recv:imageattr
a=imageattr:* send [x=640,y=360] [x=320,y=180] recv [x=640,y=360]
a=mid:2
a=content:main
m=video 49400 RTP/AVP 96
b=AS:160
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=imageattr:96 recv [x=320,y=180]
a=mid:3
a=recvonly
Figure 7: Server Second Offer for a Simulcast Conference
The server has added one additional receive-only media description
with the simulcast version based on difference only in imageattr.
That the two media lines are considered to be simulcast versions is
seen from the SCR grouping tag and the two media IDs (2 and 3). The
first video version with media ID 2 prefers 360p resolution (signaled
via imageattr) and the second video version with media ID 3 prefers
180p resolution. The first video media line also acts as the single
send video (making media line sendrecv), while the second video media
line is only related to simulcast transmission and is thus offered
recvonly.
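A receiver of such an SDP could locate the simulcast versions by resolving the media IDs in the grouping attribute, roughly as sketched below. The function name simulcast_groups is hypothetical, the sketch handles only the lines it needs, and the sample SDP is an abbreviated form of Figure 7.

```python
def simulcast_groups(sdp):
    """Return [(semantics, [m-lines])] for each SCS/SCR group in sdp."""
    groups = []      # (semantics, [mid, ...]) from session-level a=group
    media = {}       # mid -> the m= line of its media description
    current_m = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            current_m = line
        elif line.startswith("a=group:") and current_m is None:
            semantics, *mids = line[len("a=group:"):].split()
            if semantics in ("SCS", "SCR"):
                groups.append((semantics, mids))
        elif line.startswith("a=mid:") and current_m is not None:
            media[line[len("a=mid:"):]] = current_m
    return [(semantics, [media[mid] for mid in mids if mid in media])
            for semantics, mids in groups]


# Abbreviated version of the SDP in Figure 7.
SAMPLE = """v=0
a=group:SCR 2 3
m=audio 49200 RTP/AVP 96
a=mid:1
m=video 49300 RTP/AVP 96
a=mid:2
m=video 49400 RTP/AVP 96
a=mid:3
a=recvonly
"""
print(simulcast_groups(SAMPLE))
```

For the sample, the two video media descriptions are reported as one SCR group, while the audio description is left out.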
The fact that fmtp for this second video is also different should be
seen as a secondary effect from the change of resolution and does not
create any kind of conflict. The capabilities of Alice's client are
very well aligned with this and the SDP answer is straightforward.
v=0
o=alice 2362969037 2362969040 IN IP4 192.0.2.156
s=Final answer from simulcast enabled Desktop Client
c=IN IP4 192.0.2.156
b=AS:825
t=0 0
a=group:SCS 2 3
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
a=mid:1
m=video 49300 RTP/AVP 96
b=AS:520
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=sim-send:imageattr
a=imageattr:* send [x=640,y=360] recv [x=640,y=360] [x=320,y=180]
a=mid:2
a=content:main
m=video 49400 RTP/AVP 96
b=AS:160
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=imageattr:96 send [x=320,y=180]
a=mid:3
a=sendonly
Figure 8: Alice Second Answer for a Simulcast Conference
8. IANA Considerations
This document requests registration of two new SDP attributes,
sim-send and sim-recv, with a new registry of defined parameters
taken from existing SDP attributes, and of two new SDP grouping
semantics, SCS and SCR.
Formal registrations to be written.
9. Security Considerations
The simulcast capability attributes and parameters are vulnerable to
attacks in signaling.
A false inclusion of simulcast attributes may result in generation of
a second phase SDP that potentially contains a large number of non-
supported media descriptions expressing simulcast alternatives. A
correct SDP implementation will however be able to reject any non-
supported media descriptions and the effect from that should be
limited.
A hostile removal of the simulcast attributes will cause any second
phase Offer/Answer to be skipped, so that simulcast is not used.
The simulcast grouping semantics are vulnerable to attacks in the
signaling.
A false grouping of non-simulcast streams as simulcast would risk
that some streams are incorrectly ignored by receivers that know
simulcast and that are not interested in the assumed simulcast
streams.
A hostile removal of simulcast grouping will prevent streams from
being interpreted as simulcast, which obviously prevents use of the
simulcast functionality. It will also risk that intended simulcast
streams are instead presented as separate, independent streams to a
receiver.
Neither of the above is likely to have any major consequences, and
both can be mitigated by signaling that is at least integrity
protected and source authenticated, preventing an attacker from
changing it.
10. Acknowledgements
11. References
11.1. Normative References
[I-D.westerlund-avtext-rtcp-sdes-srcname]
Westerlund, M., Burman, B., and P. Sandgren, "RTCP SDES
Item SRCNAME to Label Individual Sources",
draft-westerlund-avtext-rtcp-sdes-srcname-00 (work in
progress), October 2011.
[I-D.westerlund-mmusic-max-ssrc]
Holmberg, C., Westerlund, M., Burman, B., and F. Jansson,
"Multiple Synchronization Sources (SSRC) in SDP Media
Descriptions", draft-westerlund-mmusic-max-ssrc-00 (work
in progress), September 2012.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003.
[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
Description Protocol", RFC 4566, July 2006.
[RFC4568] Andreasen, F., Baugher, M., and D. Wing, "Session
Description Protocol (SDP) Security Descriptions for Media
Streams", RFC 4568, July 2006.
[RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific
Media Attributes in the Session Description Protocol
(SDP)", RFC 5576, June 2009.
[RFC5888] Camarillo, G. and H. Schulzrinne, "The Session Description
Protocol (SDP) Grouping Framework", RFC 5888, June 2010.
[RFC6236] Johansson, I. and K. Jung, "Negotiation of Generic Image
Attributes in the Session Description Protocol (SDP)",
RFC 6236, May 2011.
11.2. Informative References
[I-D.ietf-mmusic-sdp-bundle-negotiation]
Holmberg, C., Alvestrand, H., and C. Jennings,
"Multiplexing Negotiation Using Session Description
Protocol (SDP) Port Numbers",
draft-ietf-mmusic-sdp-bundle-negotiation-03 (work in
progress), February 2013.
[I-D.lennox-avtcore-rtp-multi-stream]
Lennox, J. and M. Westerlund, "Real-Time Transport
Protocol (RTP) Considerations for Endpoints Sending
Multiple Media Streams",
draft-lennox-avtcore-rtp-multi-stream-01 (work in
progress), October 2012.
[I-D.westerlund-avtcore-multiplex-architecture]
Westerlund, M., Burman, B., and C. Perkins, "RTP
Multiplexing Architecture",
draft-westerlund-avtcore-multiplex-architecture-00 (work
in progress), October 2011.
[I-D.westerlund-avtcore-transport-multiplexing]
Westerlund, M., "Multiple RTP Session on a Single Lower-
Layer Transport",
draft-westerlund-avtcore-transport-multiplexing-00 (work
in progress), October 2011.
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
with Session Description Protocol (SDP)", RFC 3264,
June 2002.
[RFC3569] Bhattacharyya, S., "An Overview of Source-Specific
Multicast (SSM)", RFC 3569, July 2003.
[RFC4588] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
July 2006.
[RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
January 2008.
[RFC5245] Rosenberg, J., "Interactive Connectivity Establishment
(ICE): A Protocol for Network Address Translator (NAT)
Traversal for Offer/Answer Protocols", RFC 5245,
April 2010.
[RFC6190] Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
"RTP Payload Format for Scalable Video Coding", RFC 6190,
May 2011.
Authors' Addresses
Magnus Westerlund
Ericsson
Farogatan 6
SE-164 80 Kista
Sweden
Phone: +46 10 714 82 87
Email: magnus.westerlund@ericsson.com
Bo Burman
Ericsson
Farogatan 6
SE-164 80 Kista
Sweden
Phone: +46 10 714 13 11
Email: bo.burman@ericsson.com
Morgan Lindqvist
Ericsson
Farogatan 6
Kista, SE-164 80
Sweden
Phone: +46 10 719 00 00
Email: morgan.lindqvist@ericsson.com
Fredrik Jansson
Ericsson
Farogatan 6
Kista, SE-164 80
Sweden
Phone: +46 10 719 00 00
Email: fredrik.k.jansson@ericsson.com