Session Description Protocol (SDP) Capability Negotiation
RFC 5939

Document Type RFC - Proposed Standard (September 2010) Errata
Updated by RFC 6871
Author Flemming Andreasen
Last updated 2020-01-21
RFC stream Internet Engineering Task Force (IETF)
IESG Responsible AD Robert Sparks
      ") as defined in Section
      3.4.1 for each attribute name and associated value (if any) that
      needs to be indicated as a capability in the offer.  Attribute
      capabilities may be included irrespective of whether or not they
      are referenced by a potential configuration.

      Session-level attributes and associated values MUST be provided in
      attribute capabilities only at the session level, whereas media-
      level attributes and associated values can be provided in
      attribute capabilities at either the media level or session level.
      Attributes that are allowed at either the session or media level
      can be provided in attribute capabilities at either level.

   o  Zero or more transport protocol capability attributes.  There MUST
      be transport protocol capabilities as defined in Section 3.4.2
      with values for each transport protocol that needs to be indicated
      as a capability in the offer.

      Transport protocol capabilities may be included irrespective of
      whether or not they are referenced by a potential configuration.
      Transport protocols that apply to multiple media descriptions
      SHOULD be provided as transport protocol capabilities at the
      session level whereas transport protocols that apply only to a
      specific media description ("m=" line), SHOULD be provided as
      transport protocol capabilities within that particular media
      description.  In either case, there MUST NOT be more than a single
      "a=tcap" attribute at the session level and a single "a=tcap"
      attribute in each media description.

Andreasen                    Standards Track                   [Page 32]
RFC 5939               SDP Capability Negotiation         September 2010

   o  Zero or more extension capability attributes.  There MUST be one
      or more extension capability attributes (as outlined in Section
      3.4.3) for each extension capability that is referenced by a
      potential configuration.  Extension capability attributes that are
      not referenced by a potential configuration can be provided as
      well.

   o  Zero or more potential configuration attributes.  There MUST be
      one or more potential configuration attributes ("a=pcfg"), as
      defined in Section 3.5.1, in each media description where
      alternative potential configurations are to be negotiated.  Each
      potential configuration attribute MUST adhere to the rules
      provided in Section 3.5.1 and the additional rules provided below.
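
   As an illustration of the "a=tcap" placement rule above, the
   following minimal Python sketch splits an SDP session description
   into its session-level part and its media descriptions and reports
   any part that carries more than one "a=tcap" attribute.  The
   function names are hypothetical, and the parsing is deliberately
   naive (it only looks at line prefixes).

```python
def split_sections(sdp: str) -> list[list[str]]:
    """Split an SDP body into [session_lines, media1_lines, media2_lines, ...]."""
    sections: list[list[str]] = [[]]
    for raw in sdp.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("m="):
            sections.append([])  # each "m=" line opens a new media description
        sections[-1].append(line)
    return sections


def tcap_violations(sdp: str) -> list[int]:
    """Return indexes of sections (0 = session level) with more than one "a=tcap"."""
    return [
        index
        for index, lines in enumerate(split_sections(sdp))
        if sum(line.startswith("a=tcap:") for line in lines) > 1
    ]
```

   An empty result means the offer respects the limit of a single
   "a=tcap" attribute at the session level and in each media
   description.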

   If the offerer requires support for one or more extensions (besides
   the base protocol defined here), then the offerer MUST include one or
   more "a=creq" attributes as follows:

   o  If support for one or more capability negotiation extensions is
      required for the entire session description, then option tags for
      those extensions MUST be included in a single session-level "creq"
      attribute.

   o  For each media description that requires support for one or more
      capability negotiation extensions not listed at the session level,
      a single "creq" attribute containing all the required extensions
      for that media description MUST be included within the media
      description (in accordance with Section 3.3.2).

   Note that extensions that only need to be supported by a particular
   potential configuration can use the "mandatory" extension prefix
   ("+") within the potential configuration (see Section 3.5.1).

   The offerer SHOULD furthermore include the following:

   o  A supported capability negotiation extension attribute ("a=csup")
      at the session level and/or media level as defined in Section
      3.3.2 for each capability negotiation extension supported by the
      offerer and not included in a corresponding "a=creq" attribute
      (i.e., at the session level or in the same media description).
      Option tags provided in a "a=csup" attribute at the session level
      indicate extensions supported for the entire session description,
      whereas option tags provided in a "a=csup" attribute in a media
      description indicate extensions supported for only that particular
      media description.

   Capabilities provided in an offer merely indicate what the offerer is
   capable of doing.  They do not constitute a commitment or even an
   indication to use them.  In contrast, each potential configuration
   constitutes an alternative offer that the offerer would like to use.
   The potential configurations MUST be used by the answerer to
   negotiate and establish the session.

   The offerer MUST include one or more potential configuration
   attributes ("a=pcfg") in each media description where the offerer
   wants to provide alternative offers (in the form of potential
   configurations).  Each potential configuration attribute in a given
   media description MUST contain a unique configuration number and
   zero, one or more potential configuration lists, as described in
   Section 3.5.1.  Each potential configuration list MUST refer to
   capabilities that are provided at the session level or within that
   particular media description; otherwise, the potential configuration
   is considered invalid.  The base SDP Capability Negotiation framework
   REQUIRES that potential configurations not reference any session-
   level attribute capabilities that contain media-level-only
   attributes; however, extensions may modify this behavior, as long as
   it is fully backwards compatible with the base specification.
   Furthermore, it is RECOMMENDED that potential configurations avoid
   use of session-level capabilities whenever possible; refer to Section
   3.5.1.
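
   The requirement that every potential configuration list refers
   only to capabilities visible at the session level or within the
   same media description can be checked mechanically.  The Python
   sketch below is an assumption-laden illustration, not part of the
   specification: the helper names are hypothetical, it handles only
   the "t=" and "a=" lists of a potential configuration, and it relies
   on the numbering rule of Section 3.4.2 under which the transport
   protocols listed in one "a=tcap" attribute are numbered
   consecutively from the given number.

```python
import re


def capability_numbers(lines, name):
    """Collect the capability numbers declared by "a=acap" or "a=tcap" lines."""
    numbers = set()
    for line in lines:
        match = re.match(rf"a={name}:(\d+)\s+(.+)", line)
        if not match:
            continue
        first = int(match.group(1))
        # One tcap line numbers its protocols consecutively; one acap line
        # declares exactly one capability.
        count = len(match.group(2).split()) if name == "tcap" else 1
        numbers.update(range(first, first + count))
    return numbers


def referenced_numbers(pcfg_line, key):
    """Return the numbers referenced by the "t=" or "a=" list of a pcfg line."""
    for token in pcfg_line.split()[1:]:  # skip the leading "a=pcfg:<n>" token
        if token.startswith(key + "="):
            return {int(n) for n in re.findall(r"\d+", token[len(key) + 1:])}
    return set()


def unresolved_references(session_lines, media_lines):
    """Map each pcfg line to the capability numbers it cannot resolve."""
    visible = session_lines + media_lines
    acaps = capability_numbers(visible, "acap")
    tcaps = capability_numbers(visible, "tcap")
    problems = {}
    for line in media_lines:
        if not line.startswith("a=pcfg:"):
            continue
        missing_a = sorted(referenced_numbers(line, "a") - acaps)
        missing_t = sorted(referenced_numbers(line, "t") - tcaps)
        if missing_a or missing_t:
            problems[line] = {"a": missing_a, "t": missing_t}
    return problems
```

   A potential configuration that appears in the returned mapping
   would be considered invalid under the rule above.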

   The current actual configuration is included in the "m=" line (as
   defined by [RFC3264]) and any associated parameters for the media
   description (e.g., attribute ("a=") and bandwidth ("b=") lines).
   Note that the actual configuration is by default the least-preferred
   configuration, and hence the answerer will seek to negotiate use of
   one of the potential configurations instead.  If the offerer wishes a
   different preference for the actual configuration, the offerer MUST
   include a corresponding potential configuration with the relevant
   configuration number (which indicates the relative preference between
   potential configurations); this corresponding potential configuration
   should simply duplicate the actual configuration.

      This can either be done implicitly (by not referencing any
      capabilities), or explicitly (by providing and using capabilities
      for the transport protocol and all the attributes that are part of
      the actual configuration).  The latter may help detect
      intermediaries that modify the actual configuration but are not
      SDP Capability Negotiation aware.

   Per [RFC3264], once the offerer generates the offer, it must be
   prepared to receive incoming media in accordance with that offer.
   That rule applies here as well, but only for the actual
   configurations provided in the offer: Media received by the offerer
   according to one of the potential configurations MAY be discarded,
   until the offerer receives an answer indicating what the actual
   selected configuration is.  Once that answer is received, incoming
   media MUST be processed in accordance with the actual selected
   configuration indicated and the answer received (provided the
   offer/answer exchange completed successfully).

   The above rule assumes that the offerer can determine whether
   incoming media adheres to the actual configuration offered or one of
   the potential configurations instead; this may not always be the
   case.  If the offerer wants to ensure it does not play out any
   garbage, the offerer SHOULD discard all media received before the
   answer SDP session description is received.  Conversely, if the
   offerer wants to avoid clipping, the offerer SHOULD attempt to play
   any incoming media as soon as it is received (at the risk of playing
   out garbage).  In either case, note that this document does not
   place any requirements on the offerer to process and play media
   before the answer is received.  For further details, refer to
   Section 3.9.

3.6.2.  Generating the Answer

   When receiving an offer, the answerer MUST check for the presence of
   a required capability negotiation extension attribute ("a=creq")
   provided at the session level.  If one is found, then capability
   negotiation MUST be performed.  If none is found, then the answerer
   MUST check each offered media description for the presence of a
   required capability negotiation extension attribute ("a=creq") and
   one or more potential configuration attributes ("a=pcfg").
   Capability negotiation MUST be performed for each media description
   where either of those is present in accordance with the procedures
   described below.
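
   The check described above can be sketched in Python as follows.
   This is only an illustration: the function name is hypothetical,
   and it assumes that a session-level "a=creq" attribute means
   capability negotiation applies to every media description.

```python
def negotiation_scope(session_lines, media_descriptions):
    """Return one boolean per media description: is capability negotiation performed?"""
    if any(line.startswith("a=creq:") for line in session_lines):
        # Assumption: a session-level "creq" covers all media descriptions.
        return [True] * len(media_descriptions)
    return [
        any(line.startswith(("a=creq:", "a=pcfg:")) for line in media)
        for media in media_descriptions
    ]
```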

   The answerer MUST first ensure that it supports any required
   capability negotiation extensions:

   o  If a session-level "creq" attribute is provided, and it contains
      an option tag that the answerer does not support, then the
      answerer MUST NOT use any of the potential configuration
      attributes provided for any of the media descriptions.  Instead,
      the normal offer/answer procedures MUST continue as per [RFC3264].
      Furthermore, the answerer MUST include a session-level supported
      capability negotiation extensions attribute ("a=csup") with option
      tags for the capability negotiation extensions supported by the
      answerer.

   o  If a media-level "creq" attribute is provided, and it contains an
      option tag that the answerer does not support, then the answerer
      MUST NOT use any of the potential configuration attributes
      provided for that particular media description.  Instead, the
      offer/answer procedures for that media description MUST continue
      as per [RFC3264] (SDP Capability Negotiation is still performed
      for other media descriptions in the SDP session description).
      Furthermore, the answerer MUST include a supported capability
      negotiation extensions attribute ("a=csup") in that media
      description with option tags for the capability negotiation
      extensions supported by the answerer for that media description.
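
   The two rules above share the same shape: compare the option tags
   in a "creq" attribute against what the answerer supports and, on a
   mismatch, fall back to [RFC3264] processing while advertising the
   supported extensions.  A minimal Python sketch follows; the
   function name is hypothetical, "foo" is a made-up option tag, and
   option tags are assumed to be comma-separated as in Section 3.3.2.

```python
def evaluate_creq(creq_line, supported_tags):
    """Return (may_use_pcfg, csup_line_or_None) for one "a=creq" line.

    The result applies to the whole session for a session-level "creq"
    and to a single media description for a media-level "creq".
    """
    required = [tag.strip() for tag in creq_line.split(":", 1)[1].split(",")]
    if all(tag in supported_tags for tag in required):
        return True, None
    # At least one required extension is unsupported: potential
    # configurations are not used, and the supported tags are advertised.
    return False, "a=csup:" + ",".join(sorted(supported_tags))
```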

   Assuming all required capability negotiation extensions are
   supported, the answerer now proceeds as follows.

   For each media description where capability negotiation is to be
   performed (i.e., all required capability negotiation extensions are
   supported and at least one valid potential configuration attribute is
   present), the answerer MUST perform capability negotiation by using
   the most preferred potential configuration that is valid to the
   answerer, subject to any local policies.  A potential configuration
   is valid to the answerer if:

   1. It is in accordance with the syntax and semantics provided in
      Section 3.5.1.

   2. It contains a configuration number that is unique within that
      media description.

   3. All attribute capabilities referenced by the potential
      configuration are valid themselves (as defined in Section 3.4.1)
      and each of them is provided either at the session level or within
      this particular media description.

      For session-level attribute capabilities referenced, the
      attributes contained inside them MUST NOT be media-level-only
      attributes.  Note that the answerer can only determine this for
      attributes supported by the answerer.  If an attribute is not
      supported, it will simply be ignored by the answerer and hence
      will not trigger an "invalid" potential configuration.

   4. All transport protocol capabilities referenced by the potential
      configuration are valid themselves (as defined in Section 3.4.2)
      and each of them is furthermore provided either at the session
      level or within this particular media description.

   5. All extension capabilities referenced by the potential
      configuration and supported by the answerer are valid themselves
      (as defined by that particular extension) and each of them is
      furthermore provided either at the session level or within this
      particular media description.  Unknown or unsupported extension
      capabilities MUST be ignored, unless they are prefixed with the
      plus ("+") sign, which indicates that the extension MUST be
      supported in order to use that potential configuration.  If the
      extension is not supported, that potential configuration is not
      valid to the answerer.

   The most preferred valid potential configuration in a media
   description is the valid potential configuration with the lowest
   configuration number.  The answerer MUST now process the offer for
   that media stream based on the most preferred valid potential
   configuration.  Conceptually, this entails the answerer constructing
   an (internal) offer as follows.  First, all capability negotiation
   parameters from the offer SDP session description are removed,
   thereby yielding an offer SDP session description with the actual
   configuration as if SDP Capability Negotiation was not done in the
   first place.  Secondly, this actual configuration SDP session
   description is modified as follows for each media stream offered,
   based on the capability negotiation parameters included originally:

   o  If a transport protocol capability is included in the potential
      configuration, then it replaces the transport protocol provided in
      the "m=" line for that media description.

   o  If attribute capabilities are present with a delete-attributes
      session indication ("-s") or media and session indication ("-ms"),
      then all session-level attributes from the actual configuration
      SDP session description MUST be deleted in the resulting potential
      configuration SDP session description in accordance with the
      procedures in Section 3.5.1.  If attribute capabilities are
      present with a delete-attributes media indication ("-m") or media
      and session indication ("-ms"), then all attributes from the
      actual configuration SDP session description inside this media
      description MUST be deleted.

   o  If a session-level attribute capability is included, the attribute
      (and its associated value, if any) contained in it MUST be added
      to the resulting SDP session description.  All such added session-
      level attributes MUST be listed before the session-level
      attributes that were initially present in the SDP session
      description.  Furthermore, the added session-level attributes MUST
      be added in the order they were provided in the potential
      configuration (see also Section 3.5.1).

         This allows for attributes with implicit preference ordering to
         be added in the desired order; the "crypto" attribute [RFC4568]
         is one such example.

   o  If a media-level attribute capability is included, then the
      attribute (and its associated value, if any) MUST be added to the
      resulting SDP session description within the media description in
      question.  All such added media-level attributes MUST be listed
      before the media-level attributes that were initially present in
      the media description in question.  Furthermore, the added media-
      level attributes MUST be added in the order they were provided in
      the potential configuration (see also Section 3.5.1).

   o  If a supported extension capability is included, then it MUST be
      processed in accordance with the rules provided for that
      particular extension capability.

   The above steps MUST be performed exactly once per potential
   configuration, i.e., there MUST NOT be any recursive processing of
   any additional capability negotiation parameters that may (illegally)
   have been nested inside capabilities themselves.

   As an example of this, consider the (illegal) attribute capability

    a=acap:1 acap:2 foo:a

   The resulting potential configuration SDP session description will,
   after the above processing has been done, contain the attribute
   capability

    a=acap:2 foo:a

   However, since we do not perform any recursive processing of
   capability negotiation parameters, this second attribute capability
   parameter will not be processed by the offer/answer procedure.
   Instead, it will simply appear as a (useless) attribute in the SDP
   session description that will be ignored by further processing.

   Note that a transport protocol from the potential configuration
   replaces the transport protocol in the actual configuration, but an
   attribute capability from the potential configuration is simply added
   to the actual configuration.  In some cases, this can result in
   having one or more meaningless attributes in the resulting potential
   configuration SDP session description, or worse, ambiguous or
   potentially even illegal attributes.  Use of delete-attributes for
   the session- and/or media-level attributes MUST be done to avoid such
   scenarios.  Nevertheless, it is RECOMMENDED that implementations
   ignore meaningless attributes that may result from potential
   configurations.

      For example, if the actual configuration was using Secure RTP and
      included an "a=crypto" attribute for the SRTP keying material,
      then use of a potential configuration that uses plain RTP would
      make the "crypto" attribute meaningless.  The answerer may or may
      not ignore such a meaningless attribute.  The offerer can here
      ensure correct operation by using delete-attributes to remove the
      "crypto" attribute (but will then need to provide attribute
      capabilities to reconstruct the SDP session description with the
      necessary attributes deleted, e.g., rtpmaps).

   Also note that, while it is permissible to include media-level
   attribute capabilities at the session level, the base SDP Capability
   Negotiation framework defined here does not define any procedures for
   use of them, i.e., the answerer effectively ignores them.

   Please refer to Section 3.6.2.1 for examples of how the answerer may
   conceptually "see" the resulting offered alternative potential
   configurations.

   The answerer MUST check that it supports all mandatory attribute
   capabilities from the potential configuration (if any), the transport
   protocol capability (if any) from the potential configuration, and
   all mandatory extension capabilities from the potential configuration
   (if any).  If it does not, the answerer MUST proceed to the second
   most preferred valid potential configuration for the media
   description, and so on.

   o  In the case of attribute capabilities, support implies that the
      attribute name contained in the capability is supported and it can
      (and will) be negotiated successfully in the offer/answer exchange
      with the value provided.  This does not necessarily imply that the
      value provided is supported in its entirety.  For example, the
      "a=fmtp" parameter is often provided with one or more values in a
      list, where the offerer and answerer negotiate use of some subset
      of the values provided.  Other attributes may include mandatory
      and optional parts to their values; support for the mandatory part
      is all that is required here.

         A side effect of the above rule is that whenever an "fmtp" or
         "rtpmap" parameter is provided as a mandatory attribute
         capability, the corresponding media format (codec) must be
         supported and use of it negotiated successfully.  If this is
         not the offerer's intent, the corresponding attribute
         capabilities must be listed as optional instead.

   o  In the case of transport protocol capabilities, support implies
      that the transport protocol contained in the capability is
      supported and the transport protocol can (and will) be negotiated
      successfully in the offer/answer exchange.

   o  In the case of extension capabilities, the extension MUST define
      the rules for when the extension capability is considered
      supported and those rules MUST be satisfied.

   If the answerer has exhausted all potential configurations for the
   media description, without finding a valid one that is also
   supported, then the answerer MUST process the offered media stream
   based on the actual configuration plus any session-level attributes
   added by a valid and supported potential configuration from another
   media description in the offered SDP session description.
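
   The selection loop described in this section (try potential
   configurations in order of increasing configuration number, and
   fall back to the actual configuration when none is both valid and
   supported) can be sketched as follows.  The function name and the
   two predicate parameters are hypothetical stand-ins for the
   validity and support checks described above.

```python
from typing import Callable, Iterable, Optional


def select_configuration(
    config_numbers: Iterable[int],
    is_valid: Callable[[int], bool],
    is_supported: Callable[[int], bool],
) -> Optional[int]:
    """Return the selected configuration number, or None for the actual configuration."""
    # The lowest configuration number is the most preferred one.
    for number in sorted(config_numbers):
        if is_valid(number) and is_supported(number):
            return number
    return None  # all potential configurations exhausted
```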

   The above process describes potential configuration selection as a
   per-media-stream process.  However, inter-media stream coordination
   of selected potential configurations is required in some cases.
   First of all, session-level attributes added by a potential
   configuration for one media description MUST NOT cause any problems
   for potential configurations selected by other media descriptions in
   the offer SDP session description.  If the session-level attributes
   are mandatory, then those session-level attributes MUST furthermore
   be supported by the session as a whole (i.e., all the media
   descriptions if relevant).  As mentioned earlier, this adds
   additional complexity to the overall processing and hence it is
   RECOMMENDED not to use session-level attribute capabilities in
   potential configurations, unless absolutely necessary.

   Once the answerer has selected a valid and supported offered
   potential configuration for all of the media streams (or has fallen
   back to the actual configuration plus any added session attributes),
   the answerer MUST generate a valid virtual answer SDP session
   description based on the selected potential configuration SDP session
   description, as "seen" by the answerer using normal offer/answer
   rules (see Section 3.6.2.1 for examples).  The actual answer SDP
   session description is formed from the virtual answer SDP session
   description as follows: if the answerer selected one of the potential
   configurations in a media description, the answerer MUST include an
   actual configuration attribute ("a=acfg") within that media
   description.  The "a=acfg" attribute MUST identify the configuration
   number for the selected potential configuration as well as the actual
   parameters that were used from that potential configuration; if the
   potential configuration included alternatives, only the selected
   alternatives MUST be included.  Only the known and supported
   parameters will be included.  Unknown or unsupported parameters MUST
   NOT be included in the actual configuration attribute.  In the case
   of attribute capabilities, only the known and supported capabilities
   are included; unknown or unsupported attribute capabilities MUST NOT
   be included.
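
   As a rough illustration of how the "a=acfg" attribute could be
   assembled from the answerer's selection, consider the Python sketch
   below.  The function and parameter names are hypothetical; it
   covers only the transport ("t=") and attribute ("a=") parameters
   and leaves out delete-attributes and extension parameters.

```python
def format_acfg(config_number, transport=None, attributes=(), supported=None):
    """Format an "a=acfg" line from the selected capability numbers.

    `attributes` holds the attribute capability numbers of the selected
    alternative; `supported`, when given, is the set of attribute
    capability numbers the answerer knows and supports.
    """
    if supported is not None:
        # Unknown or unsupported attribute capabilities are left out.
        attributes = [number for number in attributes if number in supported]
    parts = [f"a=acfg:{config_number}"]
    if transport is not None:
        parts.append(f"t={transport}")
    if attributes:
        parts.append("a=" + ",".join(str(number) for number in attributes))
    return " ".join(parts)
```

   For an offered "a=pcfg:1 t=1 a=1|2" where the answerer selected
   attribute capability 2, format_acfg(1, transport=1, attributes=[2])
   yields "a=acfg:1 t=1 a=2".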

   If the answerer supports one or more capability negotiation
   extensions that were not included in a required capability
   negotiation extensions attribute in the offer, then the answerer
   SHOULD furthermore include a supported capability negotiation
   attribute ("a=csup") at the session level with option tags for the
   extensions supported across media streams.  Also, if the answerer
   supports one or more capability negotiation extensions for only
   particular media descriptions, then a supported capability
   negotiation attribute with those option tags SHOULD be included
   within each relevant media description.  The required capability
   negotiation attribute ("a=creq") MUST NOT be used in an answer.

   The offerer's originally provided actual configuration is contained
   in the offer media description's "m=" line (and associated
   parameters).  The answerer MAY send media to the offerer in
   accordance with that actual configuration as soon as it receives the
   offer; however, it MUST NOT send media based on that actual
   configuration if it selects an alternative potential configuration.
   If the answerer selects one of the potential configurations, then the
   answerer MAY immediately start to send media to the offerer in
   accordance with the selected potential configuration; however, the
   offerer MAY discard such media or play out garbage until the offerer
   receives the answer.  Please refer to Section 3.9 for additional
   considerations and possible alternative solutions outside the base
   SDP Capability Negotiation framework.

   If the answerer selected a potential configuration instead of the
   actual configuration, then it is RECOMMENDED that the answerer send
   back an answer SDP session description as soon as possible.  This
   minimizes the risk of having media discarded or played out as garbage
   by the offerer.  In the case of SIP [RFC3261] without any extensions,
   this implies that if the offer was received in an INVITE message,
   then the answer SDP session description should be provided in the
   first non-100 provisional response sent back (per RFC 3261, the
   answer would need to be repeated in the 200 response as well, unless
   a relevant extension such as [RFC3262] is being used).

3.6.2.1.  Example Views of Potential Configurations

   The following examples illustrate how the answerer may conceptually
   "see" a potential configuration.  Consider the following offered SDP
   session description:

      v=0
      o=alice 2891092738 2891092738 IN IP4 lost.example.com
      s=
      t=0 0
      c=IN IP4 lost.example.com
      a=tool:foo
      a=acap:1 key-mgmt:mikey AQAFgM0XflABAAAAAAAAAAAAAAsAyO...
      a=tcap:1 RTP/SAVP RTP/AVP
      m=audio 59000 RTP/AVP 98
      a=rtpmap:98 AMR/8000
      a=acap:2 crypto:1 AES_CM_128_HMAC_SHA1_32
         inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32
      a=pcfg:1 t=1 a=1|2
      m=video 52000 RTP/AVP 31
      a=rtpmap:31 H261/90000
      a=acap:3 crypto:1 AES_CM_128_HMAC_SHA1_80
         inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^20|1:32
      a=pcfg:1 t=1 a=1|3

   This particular SDP session description offers an audio stream and a
   video stream, each of which can either use plain RTP (actual
   configuration) or Secure RTP (potential configuration).  Furthermore,
   two different keying mechanisms are offered, namely session-level Key
   Management Extensions using MIKEY (attribute capability 1) and media-
   level SDP security descriptions (attribute capabilities 2 and 3).
   There are several potential configurations here; however, below we
   show the one the answerer "sees" when using potential configuration 1
   for both audio and video, and furthermore using attribute capability
   1 (MIKEY) for both (we have removed all the capability negotiation
   attributes for clarity):

      v=0
      o=alice 2891092738 2891092738 IN IP4 lost.example.com
      s=
      t=0 0
      c=IN IP4 lost.example.com
      a=key-mgmt:mikey AQAFgM0XflABAAAAAAAAAAAAAAsAyO...
      a=tool:foo
      m=audio 59000 RTP/SAVP 98
      a=rtpmap:98 AMR/8000
      m=video 52000 RTP/SAVP 31
      a=rtpmap:31 H261/90000

   Note that the transport protocol in the media descriptions indicates
   use of Secure RTP.

   Below, we show the offer the answerer "sees" when using potential
   configuration 1 for both audio and video and furthermore using
   attribute capabilities 2 and 3 (SDP security descriptions) for the
   audio and video stream, respectively -- note the order in which the
   resulting attributes are provided:

      v=0
      o=alice 2891092738 2891092738 IN IP4 lost.example.com
      s=
      t=0 0
      c=IN IP4 lost.example.com
      a=tool:foo
      m=audio 59000 RTP/SAVP 98
      a=crypto:1 AES_CM_128_HMAC_SHA1_32
         inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32
      a=rtpmap:98 AMR/8000
      m=video 52000 RTP/SAVP 31
      a=crypto:1 AES_CM_128_HMAC_SHA1_80
         inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^20|1:32
      a=rtpmap:31 H261/90000

   Again, note that the transport protocol in the media descriptions
   indicates use of Secure RTP.

   And finally, we show the offer the answerer "sees" when using
   potential configuration 1 with attribute capability 1 (MIKEY) for the
   audio stream, and potential configuration 1 with attribute capability
   3 (SDP security descriptions) for the video stream:

      v=0
      o=alice 2891092738 2891092738 IN IP4 lost.example.com
      s=
      t=0 0
      c=IN IP4 lost.example.com
      a=key-mgmt:mikey AQAFgM0XflABAAAAAAAAAAAAAAsAyO...
      a=tool:foo
      m=audio 59000 RTP/SAVP 98
      a=rtpmap:98 AMR/8000
      m=video 52000 RTP/SAVP 31
      a=crypto:1 AES_CM_128_HMAC_SHA1_80
         inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^20|1:32
      a=rtpmap:31 H261/90000
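
   The media-level part of these views can be reproduced with a short
   Python sketch of the conceptual construction in Section 3.6.2:
   strip the capability negotiation attributes, replace the transport
   protocol in the "m=" line, and place the added attributes before
   the attributes of the actual configuration.  The function name,
   its parameters, and the prefix list are hypothetical; extension
   capabilities and session-level attributes are not handled.

```python
CAPNEG_PREFIXES = ("a=acap:", "a=tcap:", "a=pcfg:", "a=creq:", "a=csup:")


def potential_view(media_lines, transport=None, added_attributes=(),
                   delete_media=False):
    """Build the media description the answerer conceptually "sees".

    `added_attributes` are attribute capability values (without "a="),
    in the order given by the potential configuration.  `delete_media`
    models the "-m" delete-attributes indication.
    """
    m_line, *rest = media_lines
    fields = m_line.split()
    if transport is not None:
        fields[2] = transport  # "m=<media> <port> <proto> <fmt> ..."
    # Non-attribute lines (for example "c=" or "b=") stay where they are.
    other = [line for line in rest if not line.startswith("a=")]
    actual = [
        line for line in rest
        if line.startswith("a=") and not line.startswith(CAPNEG_PREFIXES)
    ]
    if delete_media:
        actual = []  # all media-level attributes of the actual configuration go
    added = [f"a={attribute}" for attribute in added_attributes]
    # Added attributes are listed before the ones initially present.
    return [" ".join(fields)] + other + added + actual
```

   Applied to the audio description of the offer above with transport
   "RTP/SAVP" and the value of attribute capability 2, the result is
   the "m=audio 59000 RTP/SAVP 98" line followed by the "a=crypto"
   attribute and then "a=rtpmap:98 AMR/8000", matching the second
   view.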

3.6.3.  Offerer Processing of the Answer

   When the offerer attempted to use SDP Capability Negotiation in the
   offer, the offerer MUST examine the answer for actual use of SDP
   Capability Negotiation.

   For each media description where the offerer included a potential
   configuration attribute ("a=pcfg"), the offerer MUST first examine
   that media description for the presence of a valid actual
   configuration attribute ("a=acfg").  An actual configuration
   attribute is valid if:

   o  it refers to a potential configuration that was present in the
      corresponding offer, and

   o  it contains the actual parameters that were used from that
      potential configuration; if the potential configuration included
      alternatives, only the selected alternatives MUST be included.
      Note that the answer will include only parameters and attribute
      capabilities that are known and supported by the answerer, as
      described in Section 3.6.2.
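
   A simplified version of this validity check is sketched below in
   Python.  The function names are hypothetical, and the comparison
   is stricter than the rule above: it requires the answer to repeat
   one offered alternative verbatim, so it does not account for an
   answerer that omitted unknown or unsupported capabilities from the
   selected alternative.

```python
def parse_config(line):
    """Parse a simple "a=pcfg" or "a=acfg" line into (number, {key: alternatives})."""
    head, *params = line.split()
    number = int(head.split(":", 1)[1])
    lists = {}
    for param in params:
        key, _, value = param.partition("=")
        lists[key] = value.split("|")  # alternatives are separated by "|"
    return number, lists


def acfg_matches_offer(acfg_line, offered_pcfg_lines):
    """Check that an "a=acfg" line selects one alternative of an offered "a=pcfg"."""
    number, chosen = parse_config(acfg_line)
    offered = dict(parse_config(line) for line in offered_pcfg_lines)
    if number not in offered:
        return False  # refers to a configuration that was never offered
    for key, alternatives in chosen.items():
        if len(alternatives) != 1:
            return False  # only the selected alternative may be listed
        if alternatives[0] not in offered[number].get(key, []):
            return False
    return True
```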

   If a valid actual configuration attribute is not present in a media
   description, then the offerer MUST process the answer SDP session
   description for that media stream per the normal offer/answer rules
   defined in [RFC3264].  However, if a valid one is found, the offerer
   MUST instead process the answer as follows:

   o  The actual configuration attribute specifies which of the
      potential configurations was used by the answerer to generate the
      answer for this media stream.  This includes all the supported
      attribute capabilities and the transport capabilities referenced
      by the potential configuration selected, where the attribute
      capabilities have any associated delete-attributes included.
      Extension capabilities supported by the answerer are included as
      well.

   o  The offerer MUST now process the answer in accordance with the
      rules in [RFC3264], except that it must be done as if the offer
      consisted of the selected potential configuration instead of the
      original actual configuration, including any transport protocol
      changes in the media ("m=") line(s), attributes added and deleted
      by the potential configuration at the media and session level, and
      any extensions used.  If this derived answer is not a valid answer
      to the potential configuration offer selected by the answerer, the
      offerer MUST instead continue further processing as it would have
      for a regular offer/answer exchange, where the answer received
      does not adhere to the rules of [RFC3264].
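
   A minimal sketch of the derivation described above, assuming the
   capabilities have already been parsed into dictionaries keyed by
   capability handle.  The function name apply_potential_config and
   its arguments are hypothetical; delete-attributes, session-level
   attributes, and extensions are deliberately left out.

```python
def apply_potential_config(media_lines, tcaps, acaps, transport, attr_handles):
    """Return one media description as if the selected potential
    configuration had been offered as the actual configuration.

    media_lines  -- lines of one media description, starting with 'm='
    tcaps, acaps -- dicts mapping capability handles to their values
    transport    -- selected transport capability handle, or None
    attr_handles -- selected attribute capability handles
    """
    result = []
    for line in media_lines:
        if line.startswith("m=") and transport is not None:
            # m=<media> <port> <proto> <fmt> ...
            media, port, _proto, fmts = line[2:].split(" ", 3)
            line = "m={} {} {} {}".format(media, port, tcaps[transport], fmts)
        if line.startswith(("a=tcap:", "a=acap:", "a=pcfg:")):
            continue  # negotiation attributes are not part of the result
        result.append(line)
    # Attributes added by the selected potential configuration.
    result.extend("a=" + acaps[h] for h in attr_handles)
    return result

offer_media = [
    "m=audio 53456 RTP/AVP 0 18",
    "a=tcap:1 RTP/SAVPF RTP/SAVP RTP/AVPF",
    "a=acap:2 rtcp-fb:0 nack",
    "a=pcfg:3 t=3 a=[2]",
]
tcaps = {1: "RTP/SAVPF", 2: "RTP/SAVP", 3: "RTP/AVPF"}
acaps = {2: "rtcp-fb:0 nack"}
for line in apply_potential_config(offer_media, tcaps, acaps, 3, [2]):
    print(line)
```

   The answer is then checked against this derived media description
   using the ordinary [RFC3264] rules.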

   If the offer/answer exchange was successful, and if the answerer
   selected one of the potential configurations from the offer as the
   actual configuration, and the selected potential configuration
   differs from the actual configuration in the offer (the "m=", "a=",
   etc., lines), then the offerer SHOULD initiate another offer/answer
   exchange.  This second offer/answer exchange will not modify the
   session in any way; however, it will help intermediaries (e.g.,
   middleboxes), which look at the SDP session description but do not
   support the capability negotiation extensions, understand the details
   of the media stream(s) that were actually negotiated.  This new offer
   MUST contain the selected potential configuration as the actual
   configuration, i.e., with the actual configuration used in the "m="
   line and any other relevant attributes, bandwidth parameters, etc.

   Note that, per normal offer/answer rules, the second offer/answer
   exchange still needs to update the version number in the "o=" line
   (<sess-version> in [RFC4566]).  Attribute lines carrying keying
   material SHOULD repeat the keys from the previous offer, unless
   re-keying is necessary, e.g., due to a previously forked SIP INVITE
   request.  Please refer to Section 3.12 for additional considerations
   related to intermediaries.
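
   The version update mentioned above can be sketched as follows.  The
   helper name bump_session_version is hypothetical, and the sketch
   assumes a well-formed "o=" line with the six fields defined in
   [RFC4566].

```python
def bump_session_version(origin_line):
    """Return the 'o=' line with <sess-version> incremented by one."""
    # o=<username> <sess-id> <sess-version> <nettype> <addrtype> <address>
    fields = origin_line[2:].split(" ")
    fields[2] = str(int(fields[2]) + 1)
    return "o=" + " ".join(fields)

print(bump_session_version("o=- 25678 753849 IN IP4 192.0.2.1"))
```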

3.6.4.  Modifying the Session

   Capabilities and potential configurations may be included in
   subsequent offers as defined in [RFC3264], Section 8.  The procedure
   for doing so is similar to that described above with the answer
   including an indication of the actual selected configuration used by
   the answerer.

   If the answer indicates use of a potential configuration from the
   offer, then the guidelines provided in Section 3.6.3 for doing a
   second offer/answer exchange using that potential configuration as
   the actual configuration apply.

3.7.  Interactions with ICE

   Interactive Connectivity Establishment (ICE) [RFC5245] provides a
   mechanism for verifying connectivity between two endpoints by sending
   Session Traversal Utilities for NAT (STUN) messages directly between
   the media endpoints.  The basic ICE specification [RFC5245] is only
   defined to support UDP-based connectivity; however, it allows for
   extensions to support other transport protocols, such as TCP, which
   is being specified in [ICETCP].  ICE defines a new "a=candidate"
   attribute, which, among other things, indicates the possible
   transport protocol(s) to use and then associates a priority with each
   of them.  The most preferred transport protocol that *successfully*
   verifies connectivity will end up being used.

   When using ICE, it is thus possible that the transport protocol that
   will be used differs from what is specified in the "m=" line.  Since
   both ICE and SDP Capability Negotiation may specify alternative
   transport protocols, there is a potentially unintended interaction
   when using these together.

   We provide the following guidelines for addressing that.

   There are two basic scenarios to consider:

   1) A particular media stream can run over different transport
      protocols (e.g., UDP, TCP, or TCP/TLS), and the intent is simply
      to use the one that works (in the preference order specified).

   2) A particular media stream can run over different transport
      protocols (e.g., UDP, TCP, or TCP/TLS) and the intent is to have
      the negotiation process decide which one to use (e.g., T.38 over
      TCP or UDP).

   In scenario 1, there should be ICE "a=candidate" attributes for UDP,
   TCP, etc., but otherwise nothing special in the potential
   configuration attributes to indicate the desire to use different
   transport protocols (e.g., UDP or TCP).  The ICE procedures
   essentially cover the capability negotiation required (by having the
   answerer select something it supports and then using trial-and-error
   connectivity checks).

   Scenario 2 does not require support for, or use of, ICE.  Instead,
   we simply use transport protocol capabilities and potential
   configuration attributes to indicate the desired outcome.

   The scenarios may be combined, e.g., by offering potential
   configuration alternatives where some of them can support only one
   transport protocol (e.g., UDP), whereas others can support multiple
   transport protocols (e.g., UDP or TCP).  In that case, there is a
   need for tight control over the ICE candidates that will be used for
   a particular configuration, yet the actual configuration may want to
   use all of the ICE candidates.  To achieve this, the ICE candidate
   attributes can be defined as attribute capabilities and the relevant
   ones should then be included in the proper potential configurations
   (for example, candidate attributes for UDP only for potential
   configurations that are restricted to UDP, whereas there could be
   candidate attributes for UDP, TCP, and TCP/TLS for potential
   configurations that can use all three).  Furthermore, the
   delete-attributes in a potential configuration can be used to ensure
   that ICE will not end up using a transport protocol that is not
   desired for a particular configuration.

   SDP Capability Negotiation recommends use of a second offer/answer
   exchange when the negotiated actual configuration was one of the
   potential configurations from the offer (see Section 3.6.3).
   Similarly, ICE requires use of a second offer/answer exchange if the
   chosen candidate is not the same as the one in the m/c-line from the
   offer.  When ICE and capability negotiation are used at the same
   time, the two secondary offer/answer exchanges SHOULD be combined
   into a single one.

3.8.  Interactions with SIP Option Tags

   SIP [RFC3261] allows for SIP extensions to define a SIP option tag
   that identifies the SIP extension.  Support for one or more such
   extensions can be indicated by use of the SIP Supported header, and
   required support for one or more such extensions can be indicated by
   use of the SIP Require header.  The "a=csup" and "a=creq" attributes
   defined by the SDP Capability Negotiation framework are similar,
   except that support for these two attributes by themselves cannot be
   guaranteed (since they are specified as extensions to the SDP
   specification [RFC4566] itself).

   SIP extensions with associated option tags can introduce enhancements
   to not only SIP, but also SDP.  This is for example the case for SIP
   preconditions defined in [RFC3312].  When using SDP Capability
   Negotiation, some potential configurations may include certain SDP
   extensions, whereas others may not.  Since the purpose of the SDP
   Capability Negotiation is to negotiate a session based on the
   features supported by both sides, use of the SIP Require header for
   such extensions may not produce the desired result.  For example, if
   one potential configuration requires SIP preconditions support,
   another does not, and the answerer does not support preconditions,
   then use of the SIP Require header for preconditions would result in
   a session failure, in spite of the fact that a valid and supported
   potential configuration was included in the offer.

   In general, this can be alleviated by use of mandatory and optional
   attribute capabilities in a potential configuration.  There are
   however cases where permissible SDP values are tied to the use of the
   SIP Require header.  SIP preconditions [RFC3312] is one such example,
   where preconditions with a "mandatory" strength-tag can only be used
   when a SIP Require header with the SIP option tag "precondition" is
   included.  Future SIP extensions that may want to use the SDP
   Capability Negotiation framework should avoid such coupling.

3.9.  Processing Media before Answer

   The offer/answer model [RFC3264] requires an offerer to be able to
   receive media in accordance with the offer prior to receiving the
   answer.  This property is retained with the SDP Capability
   Negotiation extensions defined here, but only when the actual
   configuration is selected by the answerer.  If a potential
   configuration is chosen, the offerer may decide not to process any
   media received before the answer is received.  This may lead to
   clipping.  Consequently, the SDP Capability Negotiation framework
   recommends sending back an answer SDP session description as soon as
   possible.

   The issue can be resolved by introducing a three-way handshake.  In
   the case of SIP, this can, for example, be done by defining a
   precondition [RFC3312] for capability negotiation (or by using an
   existing precondition that is known to generate a second offer/answer
   exchange before proceeding with the session).  However, preconditions
   are often viewed as complicated to implement and they may add to
   overall session establishment delay by requiring an extra
   offer/answer exchange.

   An alternative three-way handshake can be performed by use of ICE
   [RFC5245].  When ICE is being used, and the answerer receives a STUN
   Binding Request for any one of the accepted media streams from the
   offerer, the answerer knows the offerer has received his answer.  At
   that point, the answerer knows that the offerer will be able to
   process incoming media according to the negotiated configuration and
   hence he can start sending media without the risk of the offerer
   either discarding it or playing garbage.

   Please note that, the above considerations notwithstanding, this
   document does not place any requirements on the offerer to process
   and play media before answer; it merely provides recommendations for
   how to ensure that media sent by the answerer and received by the
   offerer prior to receiving the answer can in fact be rendered by the
   offerer.

   In some use cases, a three-way handshake is not needed.  An example
   is when the offerer does not need information from the answer, such
   as keying material in the SDP session description, in order to
   process incoming media.  The SDP Capability Negotiation framework
   does not define any such solutions; however, extensions may do so.
   For example, one technique proposed for best-effort SRTP in [BESRTP]
   is to provide different RTP payload type mappings for different
   transport protocols used, outside of the actual configuration, while
   still allowing them to be used by the answerer (exchange of keying
   material is still needed, e.g., inband).  The basic SDP Capability
   Negotiation framework defined here does not include the ability to do
   so; however, extensions that enable that may be defined.

3.10.  Indicating Bandwidth Usage

   The amount of bandwidth used for a particular media stream depends on
   the negotiated codecs, transport protocol and other parameters.  For
   example, the use of Secure RTP [RFC3711] with integrity protection
   requires more bandwidth than plain RTP [RFC3551].  SDP defines the
   bandwidth ("b=") parameter to indicate the proposed bandwidth for the
   session or media stream.

   In SDP, as defined by [RFC4566], each media description contains one
   transport protocol and one or more codecs.  When specifying the
   proposed bandwidth, the worst case scenario must be taken into
   account, i.e., use of the highest bandwidth codec provided, the
   transport protocol indicated, and the worst case (bandwidth-wise)
   parameters that can be negotiated (e.g., a 32-bit Hashed Message
   Authentication Code (HMAC) or an 80-bit HMAC).

   The base SDP Capability Negotiation framework does not provide a way
   to negotiate bandwidth parameters.  The issue thus remains; however,
   it is potentially worse than with SDP per [RFC4566], since it is
   easier to negotiate additional codecs, and furthermore possible to
   negotiate different transport protocols.  The recommended approach
   for addressing this is the same as for plain SDP; the worst case (now
   including potential configurations) needs to be taken into account
   when specifying the bandwidth parameters in the actual configuration.
   This can make the bandwidth value less accurate than in SDP per
   [RFC4566] (due to potential greater variability in the potential
   configuration bandwidth use).  Extensions can be defined to address
   this shortcoming.
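
   As a rough illustration of the worst-case approach, the sketch
   below estimates the bandwidth of each configuration and takes the
   maximum.  All inputs are assumptions made for the example: PCMU
   with 20 ms packetization (160 payload bytes, 50 packets per second),
   40 bytes of IPv4/UDP/RTP headers counted toward the "b=AS" value,
   and the SRTP authentication tag as the only additional overhead.
   The function names are hypothetical.

```python
import math

def stream_kbps(payload_bytes, packets_per_second, auth_tag_bits=0,
                header_bytes=40):
    """Estimate stream bandwidth in kbit/s, including packet headers.

    header_bytes defaults to 20 (IPv4) + 8 (UDP) + 12 (RTP), an assumption.
    """
    packet_bytes = payload_bytes + header_bytes + auth_tag_bits // 8
    return packet_bytes * 8 * packets_per_second / 1000

def worst_case_as(estimates_kbps):
    """Return a 'b=AS:' value that covers every offered configuration."""
    return math.ceil(max(estimates_kbps))

estimates = [
    stream_kbps(160, 50),                    # actual configuration: plain RTP
    stream_kbps(160, 50, auth_tag_bits=32),  # potential: SRTP, 32-bit HMAC
    stream_kbps(160, 50, auth_tag_bits=80),  # potential: SRTP, 80-bit HMAC
]
print("b=AS:{}".format(worst_case_as(estimates)))  # b=AS:84
```

   Under these assumptions the offer would carry the 84 kbit/s figure
   even though the plain RTP actual configuration needs only 80.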

   Note that when using RTP retransmission [RFC4588] with the RTCP-
   based feedback profile [RFC4585] (RTP/AVPF), the retransmitted
   packets are part of the media stream bandwidth when using
   synchronization source (SSRC) multiplexing.  If a feedback-based
   protocol is offered as the actual configuration transport protocol,
   and a non-feedback-based protocol is offered as a potential
   configuration transport protocol and ends up being used, the actual
   bandwidth usage may be lower than the indicated bandwidth value in
   the offer (and vice versa).

3.11.  Dealing with Large Number of Potential Configurations

   When using the SDP Capability Negotiation, it is easy to generate
   offers that contain a large number of potential configurations.  For
   example, in the offer:

      v=0
      o=- 25678 753849 IN IP4 192.0.2.1
      s=
      c=IN IP4 192.0.2.1
      t=0 0
      m=audio 53456 RTP/AVP 0 18
      a=tcap:1 RTP/SAVPF RTP/SAVP RTP/AVPF
      a=acap:1 crypto:1 AES_CM_128_HMAC_SHA1_80
         inline:WVNfX19zZW1jdGwgKCkgewkyMjA7fQp9CnVubGVz|2^20|1:4
         FEC_ORDER=FEC_SRTP
      a=acap:2 key-mgmt:mikey AQAFgM0XflABAAAAAAAAAAAAAAsAyO...
      a=acap:3 rtcp-fb:0 nack
      a=pcfg:1 t=1 a=1,3|2,3
      a=pcfg:2 t=2 a=1|2
      a=pcfg:3 t=3 a=3

   we have 5 potential configurations on top of the actual configuration
   for a single media stream.  Adding an extension capability with just
   two alternatives for each would double that number (to 10), and doing
   the equivalent with two media streams would again double that number
   (to 20).  While it is easy (and inexpensive) for the offerer to
   generate such offers, processing them at the answering side may not
   be.  Consequently, it is RECOMMENDED that offerers do not create
   offers with an unnecessarily large number of potential configurations
   in them.
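
   The count of 5 in the example above can be reproduced with a short
   sketch that multiplies the number of "|"-separated alternatives in
   each "t=" and "a=" parameter.  The function name is hypothetical and
   extension parameters are not considered.

```python
def count_potential_configs(pcfg_values):
    """Count the distinct configurations expressed by pcfg values."""
    total = 0
    for value in pcfg_values:
        count = 1
        for part in value.split()[1:]:
            key, _, val = part.partition("=")
            if key in ("t", "a"):
                # Each '|' separates one alternative from the next.
                count *= len(val.split("|"))
        total += count
    return total

print(count_potential_configs([
    "1 t=1 a=1,3|2,3",
    "2 t=2 a=1|2",
    "3 t=3 a=3",
]))  # 5
```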

   On the answering side, implementers MUST take care to avoid excessive
   memory and CPU consumption.  For example, a naive implementation that
   first generates all the valid potential configuration SDP session
   descriptions internally could exhaust its memory, especially if it
   supports a large number of endpoints.  Similarly, a naive
   implementation that simply performs iterative trial-and-error
   processing on each possible potential configuration SDP session
   description (in the preference order specified) could become CPU
   constrained.  An alternative strategy is to prune the
   search space first by discarding the set of offered potential
   configurations where the transport protocol indicated (if any) is not
   supported, and/or one or more mandatory attribute capabilities (if
   any) are either not supported or not valid.  Potential configurations
   with unsupported mandatory extension configurations in them can be
   discarded as well.
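
   One possible shape of the pruning strategy is sketched below.  It
   assumes the offer has already been parsed and that each alternative
   of a potential configuration is listed separately; the function
   name and data layout are hypothetical.  The sketch checks only
   whether attribute names are supported, not whether their values are
   valid, and it does not handle extension configurations.

```python
def prune_configs(configs, tcaps, acaps, supported_protos, supported_attrs):
    """Keep only configurations the answerer could possibly accept.

    configs -- list of (config_number, transport_handle, mandatory_handles),
               one entry per alternative; transport_handle may be None
    tcaps   -- transport capability handle -> protocol name
    acaps   -- attribute capability handle -> attribute name
    """
    kept = []
    for number, transport, mandatory in configs:
        if transport is not None and tcaps.get(transport) not in supported_protos:
            continue  # unsupported (or unknown) transport protocol
        if any(acaps.get(h) not in supported_attrs for h in mandatory):
            continue  # a mandatory attribute capability is not supported
        kept.append((number, transport, mandatory))
    return kept

# The offer from the beginning of this section, expanded into alternatives.
tcaps = {1: "RTP/SAVPF", 2: "RTP/SAVP", 3: "RTP/AVPF"}
acaps = {1: "crypto", 2: "key-mgmt", 3: "rtcp-fb"}
configs = [
    (1, 1, (1, 3)), (1, 1, (2, 3)),
    (2, 2, (1,)), (2, 2, (2,)),
    (3, 3, (3,)),
]
# A hypothetical answerer without RTP/SAVPF and without key-mgmt support.
print(prune_configs(configs, tcaps, acaps,
                    {"RTP/AVP", "RTP/SAVP", "RTP/AVPF"},
                    {"crypto", "rtcp-fb"}))
```

   Only the surviving entries then need to be evaluated in preference
   order.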

3.12.  SDP Capability Negotiation and Intermediaries

   An intermediary is here defined as an entity between a SIP user agent
   A and a SIP user agent B, that needs to perform some kind of
   processing on the SDP session descriptions exchanged between A and B,
   in order for the session establishment to operate as intended.
   Examples of such intermediaries include Session Border Controllers
   (SBCs) that may perform media relaying, Proxy Call Session Control
   Functions (P-CSCFs) that may authorize use of a certain amount of
   network resources (bandwidth), etc.  The presence and design of such
   intermediaries may not follow the "Internet" model or the SIP
   requirements for proxies (which are not supposed to look in message
   bodies such as SDP session descriptions); however, they are a fact of
   life in some deployment scenarios and hence deserve consideration.

   If the intermediary needs to understand the characteristics of the
   media sessions being negotiated, e.g., the amount of bandwidth used
   or the transport protocol negotiated, then use of the SDP Capability
   Negotiation framework may impact them.  For example, some
   intermediaries are known to disallow answers where the transport
   protocol differs from the one in the offer.  Use of the SDP
   Capability Negotiation framework in the presence of such
   intermediaries could lead to session failures.  Intermediaries that
   need to authorize use of network resources based on the negotiated
   media stream parameters are affected as well.  If they inspect only
   the offer, then they may authorize parameters assuming a different
   transport protocol, codecs, etc., than what is actually being
   negotiated.  For these, and other, reasons it is RECOMMENDED that
   implementers of intermediaries add support for the SDP Capability
   Negotiation framework.

   The SDP Capability Negotiation framework itself attempts to help out
   these intermediaries as well, by recommending a second offer/answer
   exchange when use of a potential configuration has been negotiated
   (see Section 3.6.3).  However, there are several limitations with
   this approach.  First of all, although the second offer/answer
   exchange is RECOMMENDED, it is not required and hence may not be
   performed.  Secondly, the intermediary may refuse the initial answer,
   e.g., due to perceived transport protocol mismatch.  Thirdly, the
   strategy is not foolproof since the offer/answer procedures [RFC3264]
   leave the original offer/answer exchange in effect when a subsequent
   one fails.  Consider the following example:

   1. Offerer generates an SDP session description offer with the actual
      configuration specifying a low-bandwidth configuration (e.g.,
      plain RTP) and a potential configuration specifying a high(er)
      bandwidth configuration (e.g., Secure RTP with integrity).

   2. An intermediary (e.g., an SBC or P-CSCF), that does not support
      SDP Capability Negotiation, authorizes the session based on the
      actual configuration it sees in the SDP session description.

   3. The answerer chooses the high(er) bandwidth potential
      configuration and generates an answer SDP session description
      based on that.

   4. The intermediary passes through the answer SDP session
      description.

   5. The offerer sees the accepted answer, and generates an updated
      offer that contains the selected potential configuration as the
      actual configuration.  In other words, the high(er) bandwidth
      configuration (which has already been negotiated successfully) is
      now the actual configuration in the offer SDP session description.

   6. The intermediary sees the new offer; however, it does not
      authorize the use of the high(er) bandwidth configuration, and
      consequently generates a rejection message to the offerer.

   7. The offerer receives the rejected offer.

   After step 7, per RFC 3264, the offer/answer exchange that completed
   in step 5 remains in effect; however, the intermediary may not have
   authorized the necessary network resources and hence the media stream
   may experience quality issues.  The solution to this problem is to
   upgrade the intermediary to support the SDP Capability Negotiation
   framework.

3.13.  Considerations for Specific Attribute Capabilities

3.13.1.  The "rtpmap" and "fmtp" Attributes

   The base SDP Capability Negotiation framework defines transport
   capabilities and attribute capabilities.  Media capabilities, which
   can be used to describe media formats and their associated
   parameters, are not defined in this document; however, the "rtpmap"
   and "fmtp" attributes can nevertheless be used as attribute
   capabilities.  Using such attribute capabilities in a potential
   configuration requires a bit of care though.

   The rtpmap parameter binds an RTP payload type to a media format
   (e.g., codec).  While it is possible to provide rtpmaps for payload
   types not found in the corresponding "m=" line, such rtpmaps provide
   no value in normal offer/answer exchanges, since only the payload
   types found in the "m=" line are part of the offer (or answer).  This
   applies to the base SDP Capability Negotiation framework as well.

   Only the media formats (e.g., RTP payload types) provided in the "m="
   line are actually offered; inclusion of "rtpmap" attributes with
   other RTP payload types in a potential configuration does not change
   this fact and hence they do not provide any useful information there.
   They may still be useful as pure capabilities though (outside a
   potential configuration) in order to inform a peer of additional
   codecs supported.

   It is possible to provide an "rtpmap" attribute capability with a
   payload type mapping to a different codec than a corresponding actual
   configuration "rtpmap" attribute for the media description has.  Such
   practice is permissible as a way of indicating a capability.  If that
   capability is included in a potential configuration, then delete-
   attributes (see Section 3.5.1) MUST be used to ensure that there are
   not multiple "rtpmap" attributes for the same payload type in a given
   media description (which would not be allowed by SDP [RFC4566]).

   Similar considerations and rules apply to the "fmtp" attribute.  An
   "fmtp" attribute capability for a media format not included in the
   "m=" line is useless in a potential configuration (but may be useful
   as a capability by itself).  An "fmtp" attribute capability in a
   potential configuration for a media format that already has an "fmtp"
   attribute in the actual configuration may lead to multiple fmtp
   format parameters for that media format and that is not allowed by
   SDP [RFC4566].  The delete-attributes MUST be used to ensure that
   there are not multiple "fmtp" attributes for a given media format in
   a media description.
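
   An implementation might guard against the duplication described
   above with a check along the following lines, applied to the media
   description that results from a potential configuration.  The
   function name is hypothetical and the sample lines are made up for
   the illustration.

```python
from collections import Counter

def duplicated_formats(media_lines, attr="rtpmap"):
    """Return the formats that have more than one 'a=<attr>:' line."""
    prefix = "a=" + attr + ":"
    counts = Counter(
        line[len(prefix):].split(" ", 1)[0]  # payload type / media format
        for line in media_lines
        if line.startswith(prefix)
    )
    return sorted(fmt for fmt, n in counts.items() if n > 1)

derived = [
    "m=audio 59000 RTP/AVP 98",
    "a=rtpmap:98 AMR/8000",    # from the actual configuration
    "a=rtpmap:98 PCMU/8000",   # added by an attribute capability
]
print(duplicated_formats(derived))          # ['98']
print(duplicated_formats(derived, "fmtp"))  # []
```

   A non-empty result indicates that the potential configuration
   should have used delete-attributes.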

   Extensions to the base SDP Capability Negotiation framework may
   change the above behavior.

3.13.2.  Direction Attributes

   SDP defines the "inactive", "sendonly", "recvonly", and "sendrecv"
   direction attributes.  The direction attributes can be applied at
   either the session level or the media level.  In either case, it is
   possible to define attribute capabilities for these direction
   attributes; if used by a potential configuration, the normal
   offer/answer procedures still apply.  For example, if an offered
   potential configuration includes the "sendonly" direction attribute,
   and it is selected as the actual configuration, then the answer MUST
   include a corresponding "recvonly" (or "inactive") attribute.
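
   The direction rules of the normal offer/answer procedures
   ([RFC3264], Section 6.1) that continue to apply can be captured in a
   small lookup table.  The names below are hypothetical; the sketch
   is an illustration rather than a complete offer/answer
   implementation.

```python
# Offered direction -> directions the answer is allowed to use.
ALLOWED_ANSWER_DIRECTIONS = {
    "sendrecv": {"sendrecv", "sendonly", "recvonly", "inactive"},
    "sendonly": {"recvonly", "inactive"},
    "recvonly": {"sendonly", "inactive"},
    "inactive": {"inactive"},
}

def answer_direction_ok(offered, answered):
    """Check the answer's direction against the selected configuration."""
    return answered in ALLOWED_ANSWER_DIRECTIONS[offered]

print(answer_direction_ok("sendonly", "recvonly"))  # True
print(answer_direction_ok("sendonly", "sendrecv"))  # False
```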

3.14.  Relationship to RFC 3407

   RFC 3407 defines capability descriptions with limited abilities to
   describe attributes, bandwidth parameters, transport protocols and
   media formats.  RFC 3407 does not define any negotiation procedures
   for actually using those capability descriptions.

   This document defines new attributes for describing attribute
   capabilities and transport capabilities.  It also defines procedures
   for using those capabilities as part of an offer/answer exchange.  In
   contrast to RFC 3407, this document does not define bandwidth
   parameters, and it also does not define how to express ranges of
   values.  Extensions to this document may be defined in order to fully
   cover all the capabilities provided by RFC 3407 (for example, more
   general media capabilities).

   It is RECOMMENDED that implementations use the attributes and
   procedures defined in this document instead of those defined in
   [RFC3407].  If capability description interoperability with legacy
   RFC 3407 implementations is desired, implementations MAY include both
   RFC 3407 capability descriptions and capabilities defined by this
   document.  The offer/answer negotiation procedures defined in this
   document will not use the RFC 3407 capability descriptions.

4.  Examples

   In this section, we provide examples showing how to use the SDP
   Capability Negotiation.

4.1.  Multiple Transport Protocols

   The following example illustrates how to use the SDP Capability
   Negotiation extensions to negotiate use of one out of several
   possible transport protocols.  The offerer uses the expected least-
   common-denominator (plain RTP) as the actual configuration, and the
   alternative transport protocols as the potential configurations.

   The example is illustrated by the offer/answer exchange below, where
   Alice sends an offer to Bob:

                Alice                               Bob

                  | (1) Offer (RTP/[S]AVP[F])        |
                  |--------------------------------->|
                  |                                  |
                  | (2) Answer (RTP/AVPF)            |
                  |<---------------------------------|
                  |                                  |
                  | (3) Offer (RTP/AVPF)             |
                  |--------------------------------->|
                  |                                  |
                  | (4) Answer (RTP/AVPF)            |
                  |<---------------------------------|
                  |                                  |

   Alice's offer includes plain RTP (RTP/AVP), RTP with RTCP-based
   feedback (RTP/AVPF), Secure RTP (RTP/SAVP), and Secure RTP with RTCP-
   based feedback (RTP/SAVPF) as alternatives.  RTP is the default, with
   RTP/SAVPF, RTP/SAVP, and RTP/AVPF as the alternatives and preferred
   in the order listed:

      v=0
      o=- 25678 753849 IN IP4 192.0.2.1
      s=
      c=IN IP4 192.0.2.1
      t=0 0
      m=audio 53456 RTP/AVP 0 18
      a=tcap:1 RTP/SAVPF RTP/SAVP RTP/AVPF
      a=acap:1 crypto:1 AES_CM_128_HMAC_SHA1_80
         inline:WVNfX19zZW1jdGwgKCkgewkyMjA7fQp9CnVubGVz|2^20|1:4
         FEC_ORDER=FEC_SRTP
      a=acap:2 rtcp-fb:0 nack
      a=pcfg:1 t=1 a=1,[2]
      a=pcfg:2 t=2 a=1
      a=pcfg:3 t=3 a=[2]

   The "m=" line indicates that Alice is offering to use plain RTP with
   PCMU or G.729.  The capabilities are provided by the "a=tcap" and
   "a=acap" attributes.  The "tcap" capability indicates that Secure RTP
   with RTCP-based feedback (RTP/SAVPF), Secure RTP (RTP/SAVP), and RTP
   with RTCP-based feedback (RTP/AVPF) are supported.  The first "acap"
   attribute provides an attribute capability with a handle of 1.  The
   capability is a "crypto" attribute, which provides the keying
   material for SRTP using SDP security descriptions [RFC4568].  The
   second "acap" attribute provides an attribute capability with a
   handle of 2.  The capability is an "rtcp-fb" attribute, which is used
   by the RTCP-based feedback profiles to indicate that payload type 0
   (PCMU) supports feedback type "nack".  The "a=pcfg" attributes
   provide the potential configurations included in the offer by
   reference to the capabilities.  There are three potential
   configurations:

   o  Potential configuration 1, which is the most preferred potential
      configuration, specifies use of transport protocol capability 1
      (RTP/SAVPF) and attribute capabilities 1 (the "crypto" attribute)
      and 2 (the "rtcp-fb" attribute).  Support for the first one is
      mandatory whereas support for the second one is optional.

   o  Potential configuration 2, which is the second most preferred
      potential configuration, specifies use of transport protocol
      capability 2 (RTP/SAVP) and mandatory attribute capability 1 (the
      "crypto" attribute).

   o  Potential configuration 3, which is the least preferred potential
      configuration (but the second least preferred configuration
      overall, since the actual configuration provided by the "m=" line
      is always the least preferred configuration), specifies use of
      transport protocol capability 3 (RTP/AVPF) and optional attribute
      capability 2 (the "rtcp-fb" attribute).

   Bob receives the SDP session description offer from Alice.  Bob does
   not support any Secure RTP profiles; however, he supports plain RTP
   and RTP with RTCP-based feedback, as well as the SDP Capability
   Negotiation extensions, and hence he accepts the potential
   configuration for RTP with RTCP-based feedback provided by Alice:

      v=0
      o=- 24351 621814 IN IP4 192.0.2.2
      s=
      c=IN IP4 192.0.2.2
      t=0 0
      m=audio 54568 RTP/AVPF 0 18
      a=rtcp-fb:0 nack
      a=acfg:3 t=3 a=[2]

   Bob includes the "a=acfg" attribute in the answer to inform Alice
   that he based his answer on an offer containing the potential
   configuration with transport protocol capability 3 and optional
   attribute capability 2 from the offer SDP session description (i.e.,
   the RTP/AVPF profile using the "rtcp-fb" value provided).  Bob also
   includes an "rtcp-fb" attribute with the value "nack" for RTP
   payload type 0.
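
   The selection Bob performs can be sketched as follows.  This is a
   minimal illustration, assuming that lower configuration numbers are
   more preferred and that an unsupported optional attribute capability
   is simply left out of the answer.  The function name build_acfg and
   the data structures are hypothetical; the capability tables are
   taken from the description of the offer above.

```python
def build_acfg(pcfgs, tcaps, acaps, transports_ok, attrs_ok):
    """Return an 'a=acfg' line for the first acceptable configuration."""
    for cfg in sorted(pcfgs, key=lambda c: c["number"]):
        if tcaps[cfg["transport"]] not in transports_ok:
            continue                      # unsupported transport protocol
        used = []
        for handle, mandatory in cfg["attributes"]:
            if acaps[handle] in attrs_ok:
                used.append(str(handle) if mandatory else f"[{handle}]")
            elif mandatory:
                break                     # mandatory capability unsupported
        else:
            line = f"a=acfg:{cfg['number']} t={cfg['transport']}"
            if used:
                line += " a=" + ",".join(used)
            return line
    return None                           # fall back to the "m=" line

# Capability tables as described for Alice's offer (names only).
tcaps = {1: "RTP/SAVPF", 2: "RTP/SAVP", 3: "RTP/AVPF"}
acaps = {1: "crypto", 2: "rtcp-fb"}
pcfgs = [
    {"number": 1, "transport": 1, "attributes": [(1, True), (2, False)]},
    {"number": 2, "transport": 2, "attributes": [(1, True)]},
    {"number": 3, "transport": 3, "attributes": [(2, False)]},
]

# Bob supports plain RTP and RTP with RTCP-based feedback, but no SRTP.
bob = build_acfg(pcfgs, tcaps, acaps, {"RTP/AVP", "RTP/AVPF"}, {"rtcp-fb"})
print(bob)
```

   With Bob's capabilities, configurations 1 and 2 are skipped because
   their transport protocols are unsupported, and configuration 3 is
   selected.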

   When Alice receives Bob's answer, session negotiation has completed;
   however, Alice nevertheless chooses to generate a new offer using the
   actual configuration.  This is done purely to assist any
   intermediaries that may reside between Alice and Bob but do not
   support the SDP Capability Negotiation framework (and hence may not
   understand the negotiation that just took place).

   Alice's updated offer includes only RTP/AVPF, and it is not using the
   SDP Capability Negotiation framework (Alice could have included the
   capabilities as well if she wanted):

      v=0
      o=- 25678 753850 IN IP4 192.0.2.1
      s=
      c=IN IP4 192.0.2.1
      t=0 0
      m=audio 53456 RTP/AVPF 0 18
      a=rtcp-fb:0 nack

   The "m=" line now indicates that Alice is offering to use RTP with
   RTCP-based feedback and using PCMU or G.729.  The "rtcp-fb" attribute
   provides the feedback type "nack" for payload type 0 again (but as
   part of the actual configuration).

   Bob receives the SDP session description offer from Alice, which he
   accepts, and then generates an answer to Alice:

      v=0
      o=- 24351 621815 IN IP4 192.0.2.2
      s=
      c=IN IP4 192.0.2.2
      t=0 0
      m=audio 54568 RTP/AVPF 0 18
      a=rtcp-fb:0 nack

   Bob includes the same "rtcp-fb" attribute as before, and the session
   proceeds without change.  Although Bob did not include any
   capabilities in his answer, he could have done so if he wanted.

   Note that in this particular example, the answerer supported the SDP
   Capability Negotiation framework and hence the attributes and
   procedures defined here; however, had he not, the answerer would
   simply have ignored the new attributes received in step 1 and
   accepted the offer to use normal RTP.  In that case, the following
   answer would have been generated in step 2 instead:

      v=0
      o=- 24351 621814 IN IP4 192.0.2.2
      s=
      c=IN IP4 192.0.2.2
      t=0 0
      m=audio 54568 RTP/AVP 0 18

4.2.  DTLS-SRTP or SRTP with Media-Level Security Descriptions

   The following example illustrates how to use the SDP Capability
   Negotiation framework to negotiate use of SRTP using either SDP
   security descriptions or DTLS-SRTP.  The offerer (Alice) wants to
   establish a Secure RTP audio stream but is willing to use plain RTP.
   Alice prefers to use DTLS-SRTP as the key management protocol, but
   supports SDP security descriptions as well (note that [RFC5763]
   contains additional DTLS-SRTP examples).

   The example is illustrated by the offer/answer exchange below, where
   Alice sends an offer to Bob:

             Alice                                     Bob

               | (1) Offer (RTP/[S]AVP,SDES | DTLS-SRTP)|
               |--------------------------------------->|
               |                                        |
               |<--------- DTLS-SRTP handshake -------->|
               |                                        |
               | (2) Answer (DTLS-SRTP)                 |
               |<---------------------------------------|
               |                                        |
               | (3) Offer (DTLS-SRTP)                  |
               |--------------------------------------->|
               |                                        |
               | (4) Answer (DTLS-SRTP)                 |
               |<---------------------------------------|
               |                                        |

   Alice's offer includes an audio stream that offers use of plain RTP
   and Secure RTP as alternatives.  For the Secure RTP stream, it can be
   established using either DTLS-SRTP or SDP security descriptions:

      v=0
      o=- 25678 753849 IN IP4 192.0.2.1
      s=
      t=0 0
      c=IN IP4 192.0.2.1
      a=acap:1 setup:actpass
      a=acap:2 fingerprint: SHA-1 \
            4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
      a=tcap:1 UDP/TLS/RTP/SAVP RTP/SAVP
      m=audio 59000 RTP/AVP 98
      a=rtpmap:98 AMR/8000
      a=acap:3 crypto:1 AES_CM_128_HMAC_SHA1_32
         inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32
      a=pcfg:1 t=1 a=1,2
      a=pcfg:2 t=2 a=3

   The first (and preferred) potential configuration for the audio
   stream specifies use of transport capability 1 (UDP/TLS/RTP/SAVP),
   i.e., DTLS-SRTP, and attribute capabilities 1 and 2 (active/passive
   mode and certificate fingerprint), both of which must be supported to
   choose this potential configuration.  The second (and less preferred)
   potential configuration specifies use of transport capability 2
   (RTP/SAVP) and mandatory attribute capability 3, i.e., the SDP
   security description.
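
   How the handles in a potential configuration map back to concrete
   values can be sketched as below.  The helper names
   index_capabilities and resolve are hypothetical, and the folded
   lines of the offer above are joined into single lines for the
   example.  The sketch relies on the "a=tcap" numbering shown above,
   where the handle given applies to the first protocol listed and
   later protocols take consecutive handles.

```python
def index_capabilities(lines):
    """Build handle -> value tables from 'a=tcap' and 'a=acap' lines."""
    tcaps, acaps = {}, {}
    for line in lines:
        if line.startswith("a=tcap:"):
            first, *protos = line[len("a=tcap:"):].split()
            for offset, proto in enumerate(protos):
                tcaps[int(first) + offset] = proto
        elif line.startswith("a=acap:"):
            handle, _, attribute = line[len("a=acap:"):].partition(" ")
            acaps[int(handle)] = attribute
    return tcaps, acaps

def resolve(transport, attributes, tcaps, acaps):
    """Return the profile and attribute lines a configuration refers to."""
    return tcaps[transport], ["a=" + acaps[h] for h in attributes]

# Capability lines from Alice's offer, with folded lines joined.
offer = [
    "a=acap:1 setup:actpass",
    "a=acap:2 fingerprint: SHA-1 "
    "4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB",
    "a=tcap:1 UDP/TLS/RTP/SAVP RTP/SAVP",
    "a=acap:3 crypto:1 AES_CM_128_HMAC_SHA1_32 "
    "inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32",
]
tcaps, acaps = index_capabilities(offer)

# "a=pcfg:1 t=1 a=1,2" and "a=pcfg:2 t=2 a=3" from the offer.
print(resolve(1, [1, 2], tcaps, acaps))
print(resolve(2, [3], tcaps, acaps))
```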

   Bob receives the SDP session description offer from Alice.  Bob
   supports DTLS-SRTP as preferred by Alice and Bob now initiates the
   DTLS-SRTP handshake to establish the DTLS-SRTP session (see [RFC5764]
   for details).

   Bob also sends back an answer to Alice as follows:

      v=0
      o=- 24351 621814 IN IP4 192.0.2.2
      s=
      a=setup:active
      a=fingerprint: SHA-1 \
        FF:FF:FF:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
      t=0 0
      c=IN IP4 192.0.2.2
      m=audio 54568 UDP/TLS/RTP/SAVP 98
      a=rtpmap:98 AMR/8000
      a=acfg:1 t=1 a=1,2

   For the audio stream, Bob accepted the use of DTLS-SRTP, and hence
   the profile in the "m=" line is "UDP/TLS/RTP/SAVP".  Bob also
   includes a "setup:active" attribute to indicate he is the active
   endpoint for the DTLS-SRTP session as well as the fingerprint for
   Bob's certificate.  Bob's "acfg" attribute indicates that he chose
   potential configuration 1 from Alice's offer.

   When Alice receives Bob's answer, session negotiation has completed
   (and Alice can verify the DTLS handshake using Bob's certificate
   fingerprint in the answer); however, Alice nevertheless chooses to
   generate a new offer using the actual configuration.  This is done
   purely to assist any intermediaries that may reside between Alice and
   Bob but do not support the capability negotiation extensions (and
   hence may not understand the negotiation that just took place).

   Alice's updated offer includes only DTLS-SRTP for the audio stream,
   and it is not using the SDP Capability Negotiation framework (Alice
   could have included the capabilities as well if she wanted):

      v=0
      o=- 25678 753850 IN IP4 192.0.2.1
      s=
      t=0 0
      c=IN IP4 192.0.2.1
      a=setup:actpass
      a=fingerprint: SHA-1 \
            4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
      m=audio 59000 UDP/TLS/RTP/SAVP 98
      a=rtpmap:98 AMR/8000

   The "m=" line for the audio stream now indicates that Alice is
   offering to use DTLS-SRTP, while the "setup" and "fingerprint"
   attributes indicate active/passive mode and her certificate
   fingerprint.

   Bob receives the SDP session description offer from Alice, which he
   accepts, and then generates an answer to Alice:

      v=0
      o=- 24351 621814 IN IP4 192.0.2.2
      s=
      a=setup:active
      a=fingerprint: SHA-1 \
        FF:FF:FF:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
      t=0 0
      c=IN IP4 192.0.2.2
      m=audio 54568 UDP/TLS/RTP/SAVP 98
      a=rtpmap:98 AMR/8000

   Bob includes the same "setup:active" and fingerprint attributes as
   before, and the session proceeds without change.  Although Bob did
   not include any capabilities in his answer, he could have done so if
   he wanted.

   Note that in this particular example, the answerer supported the
   capability extensions defined here; however, had he not, the answerer
   would simply have ignored the new attributes received in step 1 and
   accepted the offer to use normal RTP.  In that case, the following
   answer would have been generated in step 2 instead:

      v=0
      o=- 24351 621814 IN IP4 192.0.2.2
      s=
      t=0 0
      c=IN IP4 192.0.2.2
      m=audio 54568 RTP/AVP 98
      a=rtpmap:98 AMR/8000
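
   The fallback behavior can be modeled with a short sketch.  An
   endpoint that does not support the framework ignores unknown
   attributes rather than removing them, so filtering them out, as done
   here, only approximates what such an endpoint effectively sees.  The
   function name legacy_view is hypothetical; the attribute names are
   those defined by this framework.

```python
# Attributes defined by the SDP Capability Negotiation framework.
CAPNEG_ATTRIBUTES = ("csup", "creq", "acap", "tcap", "pcfg", "acfg")

def legacy_view(sdp_lines):
    """Drop the attributes a non-supporting endpoint would ignore."""
    kept = []
    for line in sdp_lines:
        name = line[2:].split(":", 1)[0] if line.startswith("a=") else None
        if name not in CAPNEG_ATTRIBUTES:
            kept.append(line)
    return kept

# Alice's offer from step 1, with folded lines joined.
offer = [
    "v=0",
    "o=- 25678 753849 IN IP4 192.0.2.1",
    "s=",
    "t=0 0",
    "c=IN IP4 192.0.2.1",
    "a=acap:1 setup:actpass",
    "a=acap:2 fingerprint: SHA-1 "
    "4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB",
    "a=tcap:1 UDP/TLS/RTP/SAVP RTP/SAVP",
    "m=audio 59000 RTP/AVP 98",
    "a=rtpmap:98 AMR/8000",
    "a=acap:3 crypto:1 AES_CM_128_HMAC_SHA1_32 "
    "inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32",
    "a=pcfg:1 t=1 a=1,2",
    "a=pcfg:2 t=2 a=3",
]
for line in legacy_view(offer):
    print(line)
```

   What remains is the actual configuration, a plain RTP/AVP audio
   stream with AMR, which is what the answer above accepts.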

   Finally, if Bob had chosen to use SDP security descriptions instead
   of DTLS-SRTP, the following answer would have been generated:

      v=0
      o=- 24351 621814 IN IP4 192.0.2.2
      s=
      t=0 0
      c=IN IP4 192.0.2.2
      m=audio 54568 RTP/SAVP 98
      a=rtpmap:98 AMR/8000
      a=crypto:1 AES_CM_128_HMAC_SHA1_32
         inline:WSJ+PSdFcGdUJShpX1ZjNzB4d1BINUAvLEw6UzF3|2^20|1:32
      a=acfg:2 t=2 a=3

4.3.  Best-Effort SRTP with Session-Level MIKEY and Media-Level Security
      Descriptions

   The following example illustrates how to use the SDP Capability
   Negotiation extensions to support so-called Best-Effort Secure RTP as
   well as alternative keying mechanisms, more specifically MIKEY
   [RFC3830] and SDP security descriptions.  The offerer (Alice) wants
   to establish an audio and video session.  Alice prefers to use
   session-level MIKEY as the key management protocol, but supports SDP
   security descriptions as well.

   The example is illustrated by the offer/answer exchange below, where
   Alice sends an offer to Bob:

             Alice                                     Bob

               | (1) Offer (RTP/[S]AVP[F], SDES|MIKEY)  |
               |--------------------------------------->|
               |                                        |
               | (2) Answer (RTP/SAVP, SDES)            |
               |<---------------------------------------|
               |                                        |
               | (3) Offer (RTP/SAVP, SDES)             |
               |--------------------------------------->|
               |                                        |
               | (4) Answer (RTP/SAVP, SDES)            |
               |<---------------------------------------|
               |                                        |

   Alice's offer includes an audio and a video stream.  The audio stream
   offers use of plain RTP and Secure RTP as alternatives, whereas the
   video stream offers use of plain RTP, RTP with RTCP-based feedback,
   Secure RTP, and Secure RTP with RTCP-based feedback as alternatives:

      v=0
      o=- 25678 753849 IN IP4 192.0.2.1
      s=
      t=0 0
      c=IN IP4 192.0.2.1
      a=acap:1 key-mgmt:mikey AQAFgM0XflABAAAAAAAAAAAAAAsAyO...
      a=tcap:1 RTP/SAVPF RTP/SAVP RTP/AVPF
      m=audio 59000 RTP/AVP 98
      a=rtpmap:98 AMR/8000
      a=acap:2 crypto:1 AES_CM_128_HMAC_SHA1_32
         inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32
      a=pcfg:1 t=2 a=1|2
      m=video 52000 RTP/AVP 31
      a=rtpmap:31 H261/90000
      a=acap:3 crypto:1 AES_CM_128_HMAC_SHA1_80
         inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^20|1:32
      a=acap:4 rtcp-fb:* nack
      a=pcfg:1 t=1 a=1,4|3,4
      a=pcfg:2 t=2 a=1|3
      a=pcfg:3 t=3 a=4

   The potential configuration for the audio stream specifies use of
   transport capability 2 (RTP/SAVP) and either attribute capability 1
   (session-level MIKEY as the keying mechanism) or 2 (SDP security
   descriptions as the keying mechanism).  Support for either of these
   attribute capabilities is mandatory.  There are three potential
   configurations for the video stream.

   o  The first configuration with configuration number 1 uses transport
      capability 1 (RTP/SAVPF) with either attribute capabilities 1 and
      4 (session-level MIKEY and the "rtcp-fb" attribute) or attribute
      capabilities 3 and 4 (SDP security descriptions and the "rtcp-fb"
      attribute).  In this example, the offerer insists on not only the
      keying mechanism being supported, but also that the "rtcp-fb"
      attribute is supported with the value indicated.  Consequently,
      all the attribute capabilities are marked as mandatory in this
      potential configuration.

   o  The second configuration with configuration number 2 uses
      transport capability 2 (RTP/SAVP) and either attribute capability
      1 (session-level MIKEY) or attribute capability 3 (SDP security
      descriptions).  Both attribute capabilities are mandatory in this
      configuration.

   o  The third configuration with configuration number 3 uses transport
      capability 3 (RTP/AVPF) and mandatory attribute capability 4 (the
      "rtcp-fb" attribute).

   Bob receives the SDP session description offer from Alice.  Bob
   supports Secure RTP, Secure RTP with RTCP-based feedback and the SDP
   Capability Negotiation extensions.  Bob also supports SDP security
   descriptions, but not MIKEY, and hence he generates the following
   answer:

      v=0
      o=- 24351 621814 IN IP4 192.0.2.2
      s=
      t=0 0
      c=IN IP4 192.0.2.2
      m=audio 54568 RTP/SAVP 98
      a=rtpmap:98 AMR/8000
      a=crypto:1 AES_CM_128_HMAC_SHA1_32
         inline:WSJ+PSdFcGdUJShpX1ZjNzB4d1BINUAvLEw6UzF3|2^20|1:32
      a=acfg:1 t=2 a=2
      m=video 55468 RTP/SAVPF 31
      a=rtpmap:31 H261/90000
      a=crypto:1 AES_CM_128_HMAC_SHA1_80
         inline:AwWpVLFJhQX1cfHJSojd0RmdmcmVCspeEc3QGZiN|2^20|1:32
      a=rtcp-fb:* nack
      a=acfg:1 t=1 a=3,4

   For the audio stream, Bob accepted the use of Secure RTP, and hence
   the profile in the "m=" line is "RTP/SAVP".  Bob also includes a
   "crypto" attribute with his own keying material, and an "acfg"
   attribute identifying potential configuration 1 for the audio media
   stream from the offer as the actual configuration, using transport
   capability 2 (RTP/SAVP) and attribute capability 2 (the "crypto"
   attribute from the offer).  For the video stream, Bob accepted the
   use of Secure RTP with RTCP-based feedback, and hence the profile in
   the "m=" line is "RTP/SAVPF".  Bob also includes a "crypto" attribute
   with his own keying material, an "rtcp-fb" attribute, and an "acfg"
   attribute identifying potential configuration 1 for the video stream
   from the offer as the actual configuration, using transport
   capability 1 (RTP/SAVPF) and attribute capabilities 3 (the "crypto"
   attribute from the offer) and 4 (the "rtcp-fb" attribute from the
   offer).
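
   Bob's choice among the alternatives separated by "|" can be sketched
   as follows.  The sketch assumes that potential configurations and
   the alternatives within them are tried in the order listed, and that
   every capability in an alternative is mandatory, as in this example.
   The names parse_alternatives and select are hypothetical, and the
   capability tables list attribute names only.

```python
def parse_alternatives(value):
    """'1,4|3,4' -> [[1, 4], [3, 4]] (mandatory handles only)."""
    return [[int(h) for h in alt.split(",")] for alt in value.split("|")]

def select(pcfgs, tcaps, acap_names, transports_ok, attrs_ok):
    """Return the 'a=acfg' line for the first usable alternative."""
    for number, t_handle, attr_list in pcfgs:
        if tcaps[t_handle] not in transports_ok:
            continue
        for alt in parse_alternatives(attr_list):
            if all(acap_names[h] in attrs_ok for h in alt):
                handles = ",".join(str(h) for h in alt)
                return f"a=acfg:{number} t={t_handle} a={handles}"
    return None

tcaps = {1: "RTP/SAVPF", 2: "RTP/SAVP", 3: "RTP/AVPF"}
acap_names = {1: "key-mgmt", 2: "crypto", 3: "crypto", 4: "rtcp-fb"}

# (configuration number, transport handle, attribute list) per stream.
audio = [(1, 2, "1|2")]
video = [(1, 1, "1,4|3,4"), (2, 2, "1|3"), (3, 3, "4")]

# Bob: SRTP with and without feedback, security descriptions, no MIKEY.
transports = {"RTP/SAVP", "RTP/SAVPF"}
attrs = {"crypto", "rtcp-fb"}
print(select(audio, tcaps, acap_names, transports, attrs))
print(select(video, tcaps, acap_names, transports, attrs))
```

   Because "key-mgmt" is not in Bob's supported set, the first
   alternative of each configuration is skipped and the one based on
   SDP security descriptions is chosen.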

   When Alice receives Bob's answer, session negotiation has completed;
   however, Alice nevertheless chooses to generate a new offer using the
   actual configuration.  This is done purely to assist any
   intermediaries that may reside between Alice and Bob but do not
   support the capability negotiation extensions (and hence may not
   understand the negotiation that just took place).

   Alice's updated offer includes only SRTP for the audio stream and
   SRTP with RTCP-based feedback for the video stream, and it is not
   using the SDP Capability Negotiation framework (Alice could have
   included the capabilities as well if she wanted):

      v=0
      o=- 25678 753850 IN IP4 192.0.2.1
      s=
      t=0 0
      c=IN IP4 192.0.2.1
      m=audio 59000 RTP/SAVP 98
      a=rtpmap:98 AMR/8000
      a=crypto:1 AES_CM_128_HMAC_SHA1_32
         inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32
      m=video 52000 RTP/SAVPF 31
      a=rtpmap:31 H261/90000
      a=crypto:1 AES_CM_128_HMAC_SHA1_80
         inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^20|1:32
      a=rtcp-fb:* nack

   The "m=" line for the audio stream now indicates that Alice is
   offering to use Secure RTP with AMR, whereas the "m=" line for the
   video stream indicates that Alice is offering to use Secure RTP with
   RTCP-based feedback and H.261.  Each media stream includes a "crypto"
   attribute, which provides the SRTP keying material, with the same
   value as before.

   Bob receives the SDP session description offer from Alice, which he
   accepts, and then generates an answer to Alice:

      v=0
      o=- 24351 621815 IN IP4 192.0.2.2
      s=
      t=0 0
      c=IN IP4 192.0.2.2
      m=audio 54568 RTP/SAVP 98
      a=rtpmap:98 AMR/8000
      a=crypto:1 AES_CM_128_HMAC_SHA1_32
         inline:WSJ+PSdFcGdUJShpX1ZjNzB4d1BINUAvLEw6UzF3|2^20|1:32
      m=video 55468 RTP/SAVPF 31
      a=rtpmap:31 H261/90000
      a=crypto:1 AES_CM_128_HMAC_SHA1_80
         inline:AwWpVLFJhQX1cfHJSojd0RmdmcmVCspeEc3QGZiN|2^20|1:32
      a=rtcp-fb:* nack

   Bob includes the same "crypto" attributes as before, and the session
   proceeds without change.  Although Bob did not include any
   capabilities in his answer, he could have done so if he wanted.

   Note that in this particular example, the answerer supported the
   capability extensions defined here; however, had he not, the answerer
   would simply have ignored the new attributes received in step 1 and
   accepted the offer to use normal RTP.  In that case, the following
   answer would have been generated in step 2 instead:

      v=0
      o=- 24351 621814 IN IP4 192.0.2.2
      s=
      t=0 0
      c=IN IP4 192.0.2.2
      m=audio 54568 RTP/AVP 98
      a=rtpmap:98 AMR/8000
      m=video 55468 RTP/AVP 31
      a=rtpmap:31 H261/90000

   Finally, if Bob had chosen to use session-level MIKEY instead of SDP
   security descriptions, the following answer would have been
   generated:

      v=0
      o=- 24351 621814 IN IP4 192.0.2.2
      s=
      t=0 0
      c=IN IP4 192.0.2.2
      a=key-mgmt:mikey AQEFgM0XflABAAAAAAAAAAAAAAYAyO...
      m=audio 54568 RTP/SAVP 98
      a=rtpmap:98 AMR/8000
      a=acfg:1 t=2 a=1
      m=video 55468 RTP/SAVPF 31
      a=rtpmap:31 H261/90000
      a=rtcp-fb:* nack
      a=acfg:1 t=1 a=1,4

   It should be noted that, although Bob could have chosen session-level
   MIKEY for one media stream and SDP security descriptions for another
   media stream, there are no well-defined offerer processing rules for
   the resulting answer, and hence the offerer may incorrectly assume
   use of MIKEY for both streams.  To avoid this, if the answerer
   chooses session-level MIKEY, then all Secure RTP-based media streams
   SHOULD use MIKEY (this applies irrespective of whether or not SDP
   Capability Negotiation is being used).  Use of media-level MIKEY does
   not have a similar constraint.
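
   A simple consistency check for this recommendation might look like
   the sketch below.  It is an illustration under stated assumptions:
   the function name is hypothetical, a stream is treated as Secure
   RTP-based when its profile contains "SAVP", and a "crypto" attribute
   in such a stream is taken as a sign that SDP security descriptions
   are being used alongside session-level MIKEY.

```python
def crypto_streams_under_session_mikey(sdp):
    """List 'm=' lines that carry 'crypto' despite session-level MIKEY."""
    session_mikey = False
    media = None            # current "m=" line; None at session level
    conflicts = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            media = line
        elif line.startswith("a=key-mgmt:mikey") and media is None:
            session_mikey = True
        elif line.startswith("a=crypto:") and media is not None:
            secure = "SAVP" in media.split()[2]
            if secure and media not in conflicts:
                conflicts.append(media)
    return conflicts if session_mikey else []

# Bob's MIKEY-based answer from above (MIKEY value elided as shown).
answer = "\n".join([
    "v=0",
    "o=- 24351 621814 IN IP4 192.0.2.2",
    "s=",
    "t=0 0",
    "c=IN IP4 192.0.2.2",
    "a=key-mgmt:mikey AQEFgM0XflABAAAAAAAAAAAAAAYAyO...",
    "m=audio 54568 RTP/SAVP 98",
    "a=rtpmap:98 AMR/8000",
    "a=acfg:1 t=2 a=1",
    "m=video 55468 RTP/SAVPF 31",
    "a=rtpmap:31 H261/90000",
    "a=rtcp-fb:* nack",
    "a=acfg:1 t=1 a=1,4",
])

# Hypothetical mixed answer: the video stream also carries "crypto".
mixed = answer + ("\na=crypto:1 AES_CM_128_HMAC_SHA1_80 "
                  "inline:AwWpVLFJhQX1cfHJSojd0RmdmcmVCspeEc3QGZiN|2^20|1:32")

print(crypto_streams_under_session_mikey(answer))
print(crypto_streams_under_session_mikey(mixed))
```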

4.4.  SRTP with Session-Level MIKEY and Media-Level Security
      Descriptions as Alternatives

   The following example illustrates how to use the SDP Capability
   Negotiation framework to negotiate use of either MIKEY or SDP
   security descriptions, when one of them is included as part of the
   actual configuration, and the other one is being selected.  The
   offerer (Alice) wants to establish an audio and video session.  Alice
   prefers to use session-level MIKEY as the key management protocol,
   but supports SDP security descriptions as well.

   The example is illustrated by the offer/answer exchange below, where
   Alice sends an offer to Bob:

             Alice                                     Bob

               | (1) Offer (RTP/[S]AVP[F], SDES|MIKEY)  |
               |--------------------------------------->|
               |                                        |
               | (2) Answer (RTP/SAVP, SDES)            |
               |<---------------------------------------|
               |                                        |

   Alice's offer includes an audio and a video stream.  Both the audio
   and the video stream offer use of Secure RTP:

      v=0
      o=- 25678 753849 IN IP4 192.0.2.1
      s=
      t=0 0
      c=IN IP4 192.0.2.1
      a=key-mgmt:mikey AQAFgM0XflABAAAAAAAAAAAAAAsAyO...
      m=audio 59000 RTP/SAVP 98
      a=rtpmap:98 AMR/8000
      a=acap:1 crypto:1 AES_CM_128_HMAC_SHA1_32
         inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32
      a=pcfg:1 a=-s:1
      m=video 52000 RTP/SAVP 31
      a=rtpmap:31 H261/90000
      a=acap:2 crypto:1 AES_CM_128_HMAC_SHA1_80
         inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^20|1:32
      a=pcfg:1 a=-s:2

   Alice does not know whether Bob supports MIKEY or SDP security
   descriptions.  She could include attributes for both; however, the
   resulting procedures and potential interactions are not well-
   defined.  Instead, she places a session-level "key-mgmt" attribute
   for MIKEY in the actual configuration with SDP security descriptions
   as an alternative in the potential configuration.  The potential
   configuration for the audio stream specifies that all session-level
   attributes are to be deleted (i.e., the session-level "a=key-mgmt"
   attribute) and that mandatory attribute capability 1 is to be used
   (i.e., the "crypto" attribute).  The potential configuration for the
   video stream is similar, except it uses its own mandatory "crypto"
   attribute capability (2).  Note how the deletion of the session-level
   attributes does not affect the media-level attributes.
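
   The effect of the delete-attributes parameter on the audio stream
   can be sketched as below.  The function name apply_delete and its
   argument layout are assumptions made for this example; the attribute
   values are copied from the offer above, with folded lines joined and
   the elided MIKEY value left as shown.

```python
def apply_delete(session_attrs, media_attrs, delete, added):
    """Return (session-level, media-level) attributes after applying
    delete-attributes ('', 's', 'm' or 'ms') and adding capabilities."""
    session = [] if "s" in delete else list(session_attrs)
    media = [] if "m" in delete else list(media_attrs)
    return session, media + list(added)

session_attrs = ["a=key-mgmt:mikey AQAFgM0XflABAAAAAAAAAAAAAAsAyO..."]
audio_attrs = ["a=rtpmap:98 AMR/8000"]
audio_crypto = ("a=crypto:1 AES_CM_128_HMAC_SHA1_32 "
                "inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32")

# "a=pcfg:1 a=-s:1": delete session-level attributes, add capability 1.
session, media = apply_delete(session_attrs, audio_attrs, "s",
                              [audio_crypto])
print(session)
print(media)
```

   The session-level "key-mgmt" attribute disappears from the resulting
   potential configuration, while the media-level "rtpmap" attribute is
   kept and the "crypto" attribute is added.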

   Bob receives the SDP session description offer from Alice.  Bob
   supports Secure RTP and the SDP Capability Negotiation framework.
   Bob also supports both SDP security descriptions and MIKEY.  Since
   the potential configuration is more preferred than the actual
   configuration, Bob (conceptually) generates an internal potential
   configuration SDP session description that contains the "crypto"
   attributes for the audio and video stream, but not the "key-mgmt"
   attribute for MIKEY, thereby avoiding any ambiguity between the two
   keying mechanisms.  As a result, he generates the following answer:

      v=0
      o=- 24351 621814 IN IP4 192.0.2.2
      s=
      t=0 0
      c=IN IP4 192.0.2.2
      m=audio 54568 RTP/SAVP 98
      a=rtpmap:98 AMR/8000
      a=crypto:1 AES_CM_128_HMAC_SHA1_32
         inline:WSJ+PSdFcGdUJShpX1ZjNzB4d1BINUAvLEw6UzF3|2^20|1:32
      a=acfg:1 a=-s:1
      m=video 55468 RTP/SAVP 31
      a=rtpmap:31 H261/90000
      a=crypto:1 AES_CM_128_HMAC_SHA1_80
         inline:AwWpVLFJhQX1cfHJSojd0RmdmcmVCspeEc3QGZiN|2^20|1:32
      a=acfg:1 a=-s:2

   For the audio stream, Bob accepted the use of Secure RTP using SDP
   security descriptions.  Bob therefore includes a "crypto" attribute
   with his own keying material, and an "acfg" attribute identifying
   potential configuration 1 for the audio media stream from the offer
   as the actual configuration, with the delete-attributes ("-s") and
   attribute capability 1 (the "crypto" attribute from the offer).  For
   the video stream, Bob also accepted the use of Secure RTP using SDP
   security descriptions.  Bob therefore includes a "crypto" attribute
   with his own keying material, and an "acfg" attribute identifying
   potential configuration 1 for the video stream from the offer as the
   actual configuration, with the delete-attributes ("-s") and attribute
   capability 2.

   Below, we illustrate the offer SDP session description when Alice
   instead offers the "crypto" attribute as the actual configuration
   keying mechanism and "key-mgmt" as the potential configuration:

      v=0
      o=- 25678 753849 IN IP4 192.0.2.1
      s=
      t=0 0
      c=IN IP4 192.0.2.1
      a=acap:1 key-mgmt:mikey AQAFgM0XflABAAAAAAAAAAAAAAsAyO...
      m=audio 59000 RTP/SAVP 98
      a=rtpmap:98 AMR/8000
      a=crypto:1 AES_CM_128_HMAC_SHA1_32
         inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32
      a=acap:2 rtpmap:98 AMR/8000
      a=pcfg:1 a=-m:1,2
      m=video 52000 RTP/SAVP 31
      a=rtpmap:31 H261/90000
      a=crypto:1 AES_CM_128_HMAC_SHA1_80
         inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^20|1:32
      a=acap:4 rtpmap:31 H261/90000
      a=pcfg:1 a=-m:1,4

   Note that this time the delete-attributes operation must be performed
   at the media level instead of the session level.  When doing that,
   all media-level attributes from the actual configuration SDP session
   description, including the rtpmaps provided, are removed.
   Consequently, these rtpmaps had to be included as capabilities as
   well and then referenced in the potential configuration, thereby
   effectively recreating the original "rtpmap" attributes in the
   resulting potential configuration SDP session description.
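
   The need to re-add the "rtpmap" attribute can be seen in a small
   sketch.  The helper names are hypothetical, the values come from the
   audio stream of the offer above (folded lines joined, MIKEY value
   elided as shown), and the sketch does not model whether an added
   attribute ends up at the session level or the media level.

```python
def parse_attribute_config(value):
    """'-m:1,2' -> ('m', [1, 2]); '3' -> ('', [3])."""
    delete = ""
    if value.startswith("-"):
        delete, _, value = value[1:].partition(":")
    handles = [int(h) for h in value.split(",")] if value else []
    return delete, handles

def attributes_after(media_attrs, acaps, value):
    """Attributes in effect for a stream after applying an 'a=' list."""
    delete, handles = parse_attribute_config(value)
    kept = [] if "m" in delete else list(media_attrs)
    return kept + ["a=" + acaps[h] for h in handles]

acaps = {
    1: "key-mgmt:mikey AQAFgM0XflABAAAAAAAAAAAAAAsAyO...",
    2: "rtpmap:98 AMR/8000",
}
audio_actual = [
    "a=rtpmap:98 AMR/8000",
    "a=crypto:1 AES_CM_128_HMAC_SHA1_32 "
    "inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32",
]

with_rtpmap = attributes_after(audio_actual, acaps, "-m:1,2")
without_rtpmap = attributes_after(audio_actual, acaps, "-m:1")
print(with_rtpmap)
print(without_rtpmap)
```

   With "a=-m:1,2" the "crypto" attribute is removed and the "rtpmap"
   attribute is restored through capability 2; with a hypothetical
   "a=-m:1" the "rtpmap" attribute would be lost.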

5.  Security Considerations

   The SDP Capability Negotiation framework is defined to be used within
   the context of the offer/answer model, and hence all the offer/answer
   security considerations apply here as well [RFC3264].  Similarly, the
   Session Initiation Protocol (SIP) uses SDP and the offer/answer
   model, and hence, when used in that context, the SIP security
   considerations apply as well [RFC3261].

   However, SDP Capability Negotiation introduces additional security
   issues.  Its use as a mechanism to enable alternative transport
   protocol negotiation (secure and non-secure) as well as its ability
   to negotiate use of more or less secure keying methods and material
   warrant further security considerations.  Also, the (continued)
   support for receiving media before answer combined with negotiation
   of alternative transport protocols (secure and non-secure) warrants
   further security considerations.  We discuss these issues below.

   The SDP Capability Negotiation framework allows for an offered media
   stream to both indicate and support various levels of security for
   that media stream.  Different levels of security can for example be
   negotiated by use of alternative attribute capabilities each
   indicating more or less secure keying methods as well as more or less
   strong ciphers.  Since the offerer indicates support for each of
   these alternatives, he will presumably accept the answerer seemingly
   selecting any of the offered alternatives.  If an attacker can modify
   the SDP session description offer, he can thereby force the
   negotiation of the weakest security mechanism that the offerer is
   willing to accept.  This may enable the attacker to compromise the
   security of the negotiated media stream.  Similarly, if the offerer
   wishes to negotiate use of a secure media stream (e.g., Secure RTP),
   but includes a non-secure media stream (e.g., plain RTP) as a valid
   (but less preferred) alternative, then an attacker that can modify
   the offered SDP session description will be able to force the
   establishment of an insecure media stream.  The solution to both of
   these problems involves the use of integrity protection over the SDP
   session description.  Ideally, this integrity protection provides
   end-to-end integrity protection in order to protect from any man-in-
   the-middle attack; secure multiparts such as Secure/Multipurpose
   Internet Mail Extensions (S/MIME) [RFC5751] provide one such
   solution; however, S/MIME requires use and availability of a Public
   Key Infrastructure (PKI).  A slightly less secure alternative when
   using SIP, but generally much easier to deploy in practice, is to use
   SIP Identity [RFC4474]; this requires the existence of an
   authentication service (see [RFC4474]).  Although this mechanism
   still requires a PKI, it only requires that servers (as opposed to
   end-users) have third-party validatable certificates, which
   significantly reduces the barrier to entry by ordinary users.  Yet
   another, and considerably less secure, alternative is to use
   hop-by-hop security only, e.g., TLS or IPsec, thereby ensuring the
   integrity of the offered SDP session description on a hop-by-hop
   basis.  This is less secure because SIP allows partially trusted
   intermediaries on the signaling path, and such intermediaries
   processing the SIP request at each hop would be able to perform a
   man-in-the-middle attack by modifying the offered SDP session
   description.  In simple architectures where the two UAs' proxies
   communicate directly, the security provided by this method is roughly
   comparable to that provided by the previously discussed
   signature-based mechanisms.

   Per the normal offer/answer procedures, as soon as the offerer has
   generated an offer, the offerer must be prepared to receive media in
   accordance with that offer.  The SDP Capability Negotiation framework
   preserves that behavior for the actual configuration in the offer;
   however, the offerer has no way of knowing which configuration
   (actual or potential) was selected by the answerer until an answer
   indication is received.  This opens up a new security issue where an
   attacker may be able to interject media towards the offerer until the
   answer is received.  For example, the offerer may use plain RTP as
   the actual configuration and Secure RTP as an alternative potential
   configuration.  Even though the answerer selects Secure RTP, the
   offerer will not know that until he receives the answer, and hence an
   attacker will be able to send media to the offerer meanwhile.  The
   easiest protection against such an attack is to not offer use of the
   non-secure media stream in the actual configuration; however, that
   may in itself have undesirable side effects: If the answerer does not
   support the secure media stream and also does not support the
   capability negotiation framework, then negotiation of the media
   stream will fail.  Alternatively, SDP security preconditions
   [RFC5027] can be used.  This will ensure that media is not flowing
   until session negotiation has completed and hence the selected
   configuration is known.  Use of preconditions however requires both
   sides to support them.  If they don't, and use of them is required,
   the session will fail.  As a (limited) workaround to this, it is
   RECOMMENDED that SIP entities generate an answer SDP session
   description and send it to the offerer as soon as possible, for
   example, in a 183 Session Progress message.  This will limit the time
   during which an attacker can send media to the offerer.  Section 3.9
   presents other alternatives as well.

   Additional security considerations apply to the answer SDP session
   description as well.  The actual configuration attribute tells the
   offerer on which potential configuration the answer was based, and
   hence an attacker that can either modify or remove the actual
   configuration attribute in the answer can cause session failure as
   well as extend the time window during which the offerer will accept
   incoming media that does not conform to the actual answer.  The
   solutions to this answer integrity problem are the same as for the
   offer, i.e., use of end-to-end integrity protection, SIP identity,
   or hop-by-hop protection.  The mechanism to use depends on the
   mechanisms supported by the offerer as well as the acceptable
   security trade-offs.

   As described in Sections 3.1 and 3.11, SDP Capability Negotiation
   conceptually allows an offerer to include many different offers in a
   single SDP session description.  This can cause the answerer to
   process a large number of alternative potential offers, which can
   consume significant memory and CPU resources.  An attacker can use
   this amplification feature to launch a denial-of-service attack
   against the answerer.  The answerer must protect itself from such
   attacks.  As explained in Section 3.11, the answerer can help reduce
   the effects of such an attack by first discarding all potential
   configurations that contain unsupported transport protocols,
   unsupported or invalid mandatory attribute capabilities, or
   unsupported mandatory extension configurations.  The answerer should
   also look out for potential configurations that are designed to pass
   the above test, but nevertheless produce a large number of potential
   configuration SDP session descriptions that cannot be supported.

      A possible way of achieving that is for an attacker to find a
      valid session-level attribute that causes conflicts or otherwise
      interferes with individual media description configurations.  At
      the time of publication of this document, we do not know of such
      an SDP attribute; however, this does not mean it does not exist,
      or that it will not exist in the future.  If such attributes are
      found to exist, implementers should explicitly protect against
      them.

   A significant number of valid and supported potential configurations
   may remain.  However, since all of those contain only valid and
   supported transport protocols and attributes, it is expected that
   only a few of them will need to be processed on average.  Still, the
   answerer must ensure that it does not needlessly consume large
   amounts of memory or CPU resources when processing them, and it
   must be prepared to handle the case where a large number of
   potential configurations still need to be processed.
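
   The pre-filtering described above can be sketched as follows.  This
   is a minimal, non-normative illustration in Python.  The
   PotentialConfig record, the prefilter function, and the max_configs
   limit are assumptions introduced for this sketch and are not
   defined by this document; each record is assumed to represent one
   already-expanded alternative with a single transport protocol.

```python
from dataclasses import dataclass, field


@dataclass
class PotentialConfig:
    """One expanded potential configuration (hypothetical model)."""
    number: int                                          # config-number from "a=pcfg"
    transport: str                                       # transport protocol it would use
    mandatory_attrs: list = field(default_factory=list)  # mandatory attribute names
    mandatory_exts: list = field(default_factory=list)   # mandatory extension names


def prefilter(configs, transports, attrs, exts, max_configs=16):
    """Drop unsupported configurations, then cap how many are kept.

    max_configs is an arbitrary local policy limit chosen for this
    sketch; it is not a value taken from the specification.
    """
    kept = []
    # Lower configuration numbers indicate higher preference.
    for cfg in sorted(configs, key=lambda c: c.number):
        if cfg.transport not in transports:
            continue  # unsupported transport protocol
        if any(a not in attrs for a in cfg.mandatory_attrs):
            continue  # unsupported mandatory attribute capability
        if any(e not in exts for e in cfg.mandatory_exts):
            continue  # unsupported mandatory extension configuration
        kept.append(cfg)
        if len(kept) >= max_configs:
            break  # bound the work done on the remaining configurations
    return kept


offered = [
    PotentialConfig(2, "RTP/SAVP", ["crypto"]),
    PotentialConfig(1, "RTP/SAVPF", ["crypto"]),
    PotentialConfig(3, "RTP/SAVP", ["key-mgmt"]),
    PotentialConfig(4, "RTP/SAVP", ["crypto"], ["foo"]),  # "foo" is hypothetical
]
survivors = prefilter(offered, {"RTP/AVP", "RTP/SAVP"}, {"crypto"}, set())
print([c.number for c in survivors])
```

   The cap only bounds the configurations retained for further
   processing; an implementation would presumably also limit the size
   of the SDP session description it is willing to parse in the first
   place.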

6.  IANA Considerations

6.1.  New SDP Attributes

   The IANA has registered the following new SDP attributes:

   Attribute name:      csup
   Long form name:      Supported capability negotiation extensions
   Type of attribute:   Session-level and media-level
   Subject to charset:  No
   Purpose:             Option tags for supported SDP Capability
                        Negotiation extensions
   Appropriate values:  See Section 3.3.1 of RFC 5939
   Contact name:        Flemming Andreasen, fandreas@cisco.com

   Attribute name:      creq
   Long form name:      Required capability negotiation extensions
   Type of attribute:   Session-level and media-level
   Subject to charset:  No
   Purpose:             Option tags for required SDP Capability
                        Negotiation extensions
   Appropriate values:  See Section 3.3.2 of RFC 5939
   Contact name:        Flemming Andreasen, fandreas@cisco.com

   Attribute name:      acap
   Long form name:      Attribute capability
   Type of attribute:   Session-level and media-level
   Subject to charset:  No
   Purpose:             Attribute capability containing an attribute
                        name and associated value
   Appropriate values:  See Section 3.4.1 of RFC 5939
   Contact name:        Flemming Andreasen, fandreas@cisco.com

   Attribute name:      tcap
   Long form name:      Transport Protocol Capability
   Type of attribute:   Session-level and media-level
   Subject to charset:  No
   Purpose:             Transport protocol capability listing one or
                        more transport protocols
   Appropriate values:  See Section 3.4.2 of RFC 5939
   Contact name:        Flemming Andreasen, fandreas@cisco.com

   Attribute name:      pcfg
   Long form name:      Potential Configuration
   Type of attribute:   Media-level
   Subject to charset:  No
   Purpose:             Potential configuration for SDP Capability
                        Negotiation
   Appropriate values:  See Section 3.5.1 of RFC 5939
   Contact name:        Flemming Andreasen, fandreas@cisco.com

   Attribute name:      acfg
   Long form name:      Actual configuration
   Type of attribute:   Media-level
   Subject to charset:  No
   Purpose:             Actual configuration for SDP Capability
                        Negotiation
   Appropriate values:  See Section 3.5.2 of RFC 5939
   Contact name:        Flemming Andreasen, fandreas@cisco.com
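
   The "Type of attribute" fields above can be turned into a simple
   placement check.  The following Python sketch is illustrative
   only; the ALLOWED_LEVELS table mirrors the registrations above,
   while the misplaced_attributes function and the sample session
   description are assumptions made for this example.

```python
# Levels at which each attribute may appear, per the registrations above.
ALLOWED_LEVELS = {
    "csup": {"session", "media"},
    "creq": {"session", "media"},
    "acap": {"session", "media"},
    "tcap": {"session", "media"},
    "pcfg": {"media"},
    "acfg": {"media"},
}


def misplaced_attributes(sdp_text):
    """Return (line_number, attribute) pairs found at a non-registered level."""
    level = "session"
    problems = []
    for line_no, line in enumerate(sdp_text.splitlines(), start=1):
        line = line.strip()
        if line.startswith("m="):
            level = "media"  # everything after an "m=" line is media-level
        elif line.startswith("a="):
            name = line[2:].split(":", 1)[0]
            allowed = ALLOWED_LEVELS.get(name)
            # Attributes not registered by this document are ignored here.
            if allowed is not None and level not in allowed:
                problems.append((line_no, name))
    return problems


sample = "\n".join([
    "v=0",
    "o=- 25678 753849 IN IP4 192.0.2.1",
    "s=-",
    "c=IN IP4 192.0.2.1",
    "t=0 0",
    "a=tcap:1 RTP/SAVP",
    "a=pcfg:1 t=1",            # misplaced: pcfg is media-level only
    "m=audio 53456 RTP/AVP 0 18",
    "a=pcfg:1 t=1",
])
print(misplaced_attributes(sample))
```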

6.2.  New SDP Capability Negotiation Option Tag Registry

   The IANA has created a new SDP Capability Negotiation Option Tag
   registry.  An IANA SDP Capability Negotiation Option Tag registration
   MUST be documented in an RFC in accordance with the [RFC5226] IETF
   Review policy.  The RFC MUST provide the name of the option tag, a
   syntax, and a semantic specification of any new SDP attributes and
   any extensions to the potential configuration ("a=pcfg") and actual
   configuration ("a=acfg") attributes provided in this document.  If
   the extension defines any new SDP attributes that are intended to be
   capabilities for use by the capability negotiation framework (e.g.,
   similar to "a=acap"), those capabilities MUST adhere to the
   guidelines provided in Section 3.4.3.  Extensions to the potential
   and actual configuration attributes MUST adhere to the syntax
   provided in Sections 3.5.1 and 3.5.2.

   The option tag "cap-v0" is defined in this document, and the IANA has
   registered this option tag.
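
   As an illustration of how registered option tags might be used, the
   Python sketch below compares the tags in a "creq" attribute against
   the set an endpoint supports.  The function names, the
   SUPPORTED_TAGS set, and the "foo" tag are hypothetical.  How an
   entity reacts to unsupported required tags is specified in Section
   3.3.2 and is not modeled here.

```python
# Assumption: this endpoint implements only the base framework.
SUPPORTED_TAGS = {"cap-v0"}


def parse_option_tags(line, attr):
    """Return the comma-separated option tags of an "a=<attr>:" line."""
    prefix = "a=" + attr + ":"
    if not line.startswith(prefix):
        raise ValueError("not an a=%s attribute: %r" % (attr, line))
    return [tag.strip() for tag in line[len(prefix):].split(",") if tag.strip()]


def unsupported_required(creq_line, supported=SUPPORTED_TAGS):
    """Return the required option tags that this endpoint does not support."""
    return [t for t in parse_option_tags(creq_line, "creq") if t not in supported]


print(unsupported_required("a=creq:cap-v0"))
print(unsupported_required("a=creq:cap-v0,foo"))  # "foo" is a made-up tag
```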

6.3.  New SDP Capability Negotiation Potential Configuration Parameter
      Registry

   The IANA has created a new SDP Capability Negotiation Potential
   Configuration Parameter registry.  An IANA SDP Capability Negotiation
   Potential Configuration registration MUST be documented in an RFC in
   accordance with the [RFC5226] IETF Review policy.  The RFC MUST
   define the syntax and semantics of each new potential configuration
   parameter.  The syntax MUST adhere to the syntax provided for
   extensions in Section 3.5.1 and the semantics MUST adhere to the
   semantics provided for extensions in Sections 3.5.1 and 3.5.2.
   Associated with each registration MUST be the encoding name for the
   parameter as well as a short descriptive name for it.

   The potential configuration parameters "a" for "attribute" and "t"
   for "transport protocol" are defined in this document, and the IANA
   has registered them.
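
   The following Python sketch shows one way a receiver might split an
   "a=pcfg" attribute into its parameters and identify those outside
   the two registered here.  It is a simplified illustration, not a
   complete parser for the syntax in Section 3.5.1; the function names
   are assumptions, and "foo" is a hypothetical extension parameter.

```python
# Encoding names registered by this document, with their descriptive names.
REGISTERED_PARAMS = {"a": "attribute", "t": "transport protocol"}


def parse_pcfg(line):
    """Split "a=pcfg:<number> name=value ..." into (number, {name: value})."""
    prefix = "a=pcfg:"
    if not line.startswith(prefix):
        raise ValueError("not a pcfg attribute: %r" % line)
    fields = line[len(prefix):].split()
    if not fields or not fields[0].isdigit():
        raise ValueError("missing configuration number: %r" % line)
    params = {}
    for token in fields[1:]:
        # Split on the first "=" only; values may contain further characters.
        name, sep, value = token.partition("=")
        if not sep or not name:
            raise ValueError("malformed parameter: %r" % token)
        params[name] = value
    return int(fields[0]), params


def unknown_params(params):
    """Return parameter names that are not registered in this document.

    A leading "+" is assumed to mark a mandatory extension parameter and
    is ignored when looking up the name.
    """
    return sorted(n for n in params if n.lstrip("+") not in REGISTERED_PARAMS)


number, params = parse_pcfg("a=pcfg:1 t=1 a=1,2")
print(number, params, unknown_params(params))
print(unknown_params(parse_pcfg("a=pcfg:2 t=1 +foo=3")[1]))
```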

7.  Acknowledgments

   The SDP Capability Negotiation solution defined in this document
   draws on the overall capability negotiation framework that was
   defined by [SDPng].  Also, the SDP Capability Negotiation solution is
   heavily influenced by the discussions and work done by the SDP
   Capability Negotiation Design Team.  The following people in
   particular provided useful comments and suggestions to either the
   document itself or the overall direction of the solution defined
   here: Francois Audet, John Elwell, Roni Even, Miguel Garcia, Robert
   Gilman, Cullen Jennings, Jonathan Lennox, Matt Lepinski, Jean-
   Francois Mule, Joerg Ott, Colin Perkins, Jonathan Rosenberg, Thomas
   Stach, and Dan Wing.

   General Area review comments were provided by Christian Vogt, and
   Stephen Kent provided Security Directorate review comments.  Eric
   Rescorla provided textual input to the Security Considerations.
   Alexey Melnikov, Robert Sparks, and Magnus Westerlund provided
   several review comments as well.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264, June
              2002.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
              May 2008.

   [RFC5234]  Crocker, D., Ed., and P. Overell, "Augmented BNF for
              Syntax Specifications: ABNF", STD 68, RFC 5234, January
              2008.

   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
              (ICE): A Protocol for Network Address Translator (NAT)
              Traversal for Offer/Answer Protocols", RFC 5245, April
              2010.

8.2.  Informative References

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3312]  Camarillo, G., Ed., Marshall, W., Ed., and J. Rosenberg,
              "Integration of Resource Management and Session Initiation
              Protocol (SIP)", RFC 3312, October 2002.

   [RFC3262]  Rosenberg, J. and H. Schulzrinne, "Reliability of
              Provisional Responses in Session Initiation Protocol
              (SIP)", RFC 3262, June 2002.

   [RFC3407]  Andreasen, F., "Session Description Protocol (SDP) Simple
              Capability Declaration", RFC 3407, October 2002.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

   [RFC3830]  Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K.
              Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830,
              August 2004.

   [RFC4145]  Yon, D. and G. Camarillo, "TCP-Based Media Transport in
              the Session Description Protocol (SDP)", RFC 4145,
              September 2005.

   [RFC4474]  Peterson, J. and C. Jennings, "Enhancements for
              Authenticated Identity Management in the Session
              Initiation Protocol (SIP)", RFC 4474, August 2006.

   [RFC4567]  Arkko, J., Lindholm, F., Naslund, M., Norrman, K., and E.
              Carrara, "Key Management Extensions for Session
              Description Protocol (SDP) and Real Time Streaming
              Protocol (RTSP)", RFC 4567, July 2006.

   [RFC4568]  Andreasen, F., Baugher, M., and D. Wing, "Session
              Description Protocol (SDP) Security Descriptions for Media
              Streams", RFC 4568, July 2006.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
              "Extended RTP Profile for Real-time Transport Control
              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July
              2006.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              July 2006.

   [RFC4756]  Li, A., "Forward Error Correction Grouping Semantics in
              Session Description Protocol", RFC 4756, November 2006.

   [RFC5027]  Andreasen, F. and D. Wing, "Security Preconditions for
              Session Description Protocol (SDP) Media Streams", RFC
              5027, October 2007.

   [RFC5124]  Ott, J. and E. Carrara, "Extended Secure RTP Profile for
              Real-time Transport Control Protocol (RTCP)-Based Feedback
              (RTP/SAVPF)", RFC 5124, February 2008.

   [RFC5751]  Ramsdell, B. and S. Turner, "Secure/Multipurpose Internet
              Mail Extensions (S/MIME) Version 3.2 Message
              Specification", RFC 5751, January 2010.

   [RFC5763]  Fischl, J., Tschofenig, H., and E. Rescorla, "Framework
              for Establishing a Secure Real-time Transport Protocol
              (SRTP) Security Context Using Datagram Transport Layer
              Security (DTLS)", RFC 5763, May 2010.

   [RFC5764]  McGrew, D. and E. Rescorla, "Datagram Transport Layer
              Security (DTLS) Extension to Establish Keys for the Secure
              Real-time Transport Protocol (SRTP)", RFC 5764, May 2010.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
              Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

   [BESRTP]   Kaplan, H. and F. Audet, "Session Description Protocol
              (SDP) Offer/Answer Negotiation For Best-Effort Secure
              Real-Time Transport Protocol", Work in Progress, October
              2006.

   [ICETCP]   Rosenberg, J., Keranen, A., Lowekamp, B., and A. Roach,
              "TCP Candidates with Interactive Connectivity
              Establishment (ICE)", Work in Progress, September 2010.

   [SDPMedCap]
              Gilman, R., Even, R., and F. Andreasen, "SDP media
              capabilities Negotiation", Work in Progress, July 2010.

   [SDPng]    Kutscher, D., Ott, J., and C. Bormann, "Session
              Description and Capability Negotiation", Work in Progress,
              February 2005.

Author's Address

   Flemming Andreasen
   Cisco Systems
   Iselin, NJ 08830
   USA

   EMail: fandreas@cisco.com
