Skip to main content

Interworking between the Session Initiation Protocol (SIP) and the Extensible Messaging and Presence Protocol (XMPP): Media Sessions.
draft-ietf-stox-media-02

The information below is for an old version of the document.
Document Type
This is an older version of an Internet-Draft whose latest revision state is "Expired".
Authors Peter Saint-Andre , Saúl Ibarra Corretgé , Emil Ivov
Last updated 2013-12-12
Replaces draft-saintandre-sip-xmpp-media
RFC stream Internet Engineering Task Force (IETF)
Formats
Additional resources Mailing list discussion
Stream WG state WG Document
Document shepherd (None)
IESG IESG state I-D Exists
Consensus boilerplate Unknown
Telechat date (None)
Responsible AD (None)
Send notices to (None)
draft-ietf-stox-media-02
Network Working Group                                     P. Saint-Andre
Internet-Draft                                       Cisco Systems, Inc.
Intended status: Standards Track                               S. Ibarra
Expires: June 15, 2014                                       AG Projects
                                                                 E. Ivov
                                                                   Jitsi
                                                       December 12, 2013

   Interworking between the Session Initiation Protocol (SIP) and the
   Extensible Messaging and Presence Protocol (XMPP): Media Sessions.
                        draft-ietf-stox-media-02

Abstract

   This document defines a bi-directional protocol mapping for use by
   gateways that enable the exchange of media signalling messages
   between systems that implement the Jingle extensions to the
   Extensible Messaging and Presence Protocol (XMPP) and those that
   implement the Session Initiation Protocol (SIP).

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 15, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect

Saint-Andre, et al.       Expires June 15, 2014                 [Page 1]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  3
   3.  Compatibility with Offer/Answer model  . . . . . . . . . . . .  4
   4.  Overview . . . . . . . . . . . . . . . . . . . . . . . . . . .  5
   5.  Syntax Mappings  . . . . . . . . . . . . . . . . . . . . . . .  5
   6.  Call Hold  . . . . . . . . . . . . . . . . . . . . . . . . . .  9
   7.  Early Media  . . . . . . . . . . . . . . . . . . . . . . . . . 10
   8.  Detecting Endless Loops  . . . . . . . . . . . . . . . . . . . 11
   9.  SDP Format-Specific Parameters . . . . . . . . . . . . . . . . 11
   10. Dialog Forking . . . . . . . . . . . . . . . . . . . . . . . . 12
   11. Sample Scenarios . . . . . . . . . . . . . . . . . . . . . . . 13
   12. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 18
   13. Security Considerations  . . . . . . . . . . . . . . . . . . . 18
   14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18
     14.1.  Normative References  . . . . . . . . . . . . . . . . . . 18
     14.2.  Informative References  . . . . . . . . . . . . . . . . . 20
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 20

Saint-Andre, et al.       Expires June 15, 2014                 [Page 2]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

1.  Introduction

   The Session Initiation Protocol [RFC3261] is a widely-deployed
   technology for the management of media sessions (such as voice and
   video calls) over the Internet.  SIP itself provides a signalling
   channel (sometimes via the User Datagram Protocol [RFC0768]), over
   which two or more parties can exchange messages for the purpose of
   negotiating a media session that uses a dedicated media channel such
   as the Real-time Transport Protocol [RFC3550].

   The Extensible Messaging and Presence Protocol (XMPP) [RFC6120] also
   provides a signalling channel, typically via the Transmission Control
   Protocol [RFC0793].  Given the significant differences between XMPP
   and SIP, it is difficult to combine the two technologies in a single
   user agent.  Therefore, developers wishing to add media session
   capabilities to XMPP clients have defined an XMPP-specific
   negotiation protocol called Jingle [XEP-0166].

   However, Jingle was designed to easily map to SIP for communication
   through gateways or other transformation mechanisms.  Therefore,
   consistent with existing specifications for mapping between SIP and
   XMPP (see [I-D.ietf-stox-core] and other related specifications),
   this document describes a bidirectional protocol mapping for use by
   gateways that enable the exchange of media signalling messages
   between systems that implement SIP and those that implement the XMPP
   Jingle extensions.

   It is important to note that SIP and Jingle sessions can be
   gateway-ed in a rather simple fashion if all media was always routed
   and potentially even transcoded in through a gateway.  This
   specification aims to define a mapping that goes beyond the above and
   allows gateways to (wherever possible) only intervene at the
   signalling level, letting user agents exchange media in an end-to-end
   manner.  Such gateways would likely focus on handling handling RTP
   session establishment and control within the context of what users
   would perceive as "calls".  This document is hence primarily dealing
   with calling scenarios as opposed to generic media sessions with SIP.

   The discussion venue for this document is the mailing list of the
   STOX WG; visit https://www.ietf.org/mailman/listinfo/stox for
   subscription information and discussion archives.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in

Saint-Andre, et al.       Expires June 15, 2014                 [Page 3]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

   [RFC2119].

   A number of technical terms used here are defined in [RFC3261],
   [RFC6120], [XEP-0166], and [XEP-0167].  The term "JID" is short for
   "Jabber Identifier".

3.  Compatibility with Offer/Answer model

   Even if Jingle semantics have many similarities with those used in
   SIP, there are some use cases that cannot be handled in exactly the
   same way due to the Offer/Answer model used in SIP in conjunction
   with SDP.

   More specifically, mapping SIP and SDP Offer/Answer to XMPP is often
   complicated due to the difference in how each handles backward
   compatibility.  Jingle, as most other XMPP extensions, relies heavily
   on the protocol's advanced service discovery [XEP-0030] mechanisms.
   In other words, XMPP entities are able to verify the capabilities of
   their intended peer before actually attempting to establish a session
   with it.

   SDP Offer/Answer on the other hand uses a least common denominator
   approach where every SDP offer has to be understandable by legacy
   endpoints.  Newer, unsupported aspects in this offer can therefore
   only appear as optional or their use be limited to subsequent Offer/
   Answer exchanges, once their support has been confirmed.

   Use of "trickle ICE" (see [I-D.ietf-mmusic-trickle-ice] and
   [I-D.ivov-mmusic-trickle-ice-sip]) is one example where the issue
   occurs.  SIP endpoints need to always behave as vanilla ICE agents
   when sending their first offer and make sure they gather all
   candidates before sending a SIP INVITE.  This is necessary because
   otherwise ICE agents with no support for trickle can prematurely
   declare failure.  Jingle endpoints, on the other hand can verify
   support for trickle ICE prior to engaging in a session and adapt
   their behaviour accordingly.

   In order to work around such issues, [XEP-0176] defines an Offer/
   Answer support mode through the "urn:ietf:rfc:3264" feature tag.  It
   indicates that a specific XMPP entity can only be contacted through
   the use of Offer/Answer semantics.  Implementations conforming to
   this specification MUST support Offer/Answer model with Jingle.  Note
   that such endpoints are not required to actually declare support for
   this tag because this would mean that they too would only be
   reachable through Offer/Answer semantics.

Saint-Andre, et al.       Expires June 15, 2014                 [Page 4]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

4.  Overview

   As mentioned, Jingle was designed in part to enable straightforward
   protocol mapping between XMPP and SIP.  However, given the
   significantly different technology assumptions underlying XMPP and
   SIP, Jingle is naturally different from SIP in several important
   respects:

   o  Base SIP messages and headers use a plaintext format similar in
      some ways to the Hypertext Transport Protocol [RFC2616], whereas
      Jingle messages are pure XML.  Mappings between SIP headers and
      Jingle message syntax are provided below.

   o  The SIP payloads defining session semantics use the Session
      Description Protocol [RFC4566], whereas the equivalent Jingle
      payloads are defined as XML child elements of the Jingle
      <content/> element.  However, the Jingle specifications defining
      such child elements specify mappings to SDP for all Jingle syntax,
      making the mapping relatively straightforward.

   o  The SIP signalling channel has historically often been transported
      over UDP, whereas the signalling channel for Jingle is XMPP over
      TCP.  Mapping between the transport layers typically happens
      within a gateway using techniques below the application level, and
      therefore is not addressed in this specification.

5.  Syntax Mappings

5.1.  Generic Jingle Syntax

   Jingle is designed in a modular fashion, so that session description
   data is generally carried in a payload within the generic Jingle
   elements, i.e., the <jingle/> element and its <content/> child.  The
   following example illustrates this structure, where the XMPP stanza
   is a request to initiate an audio session using RTP over a raw UDP
   transport.

Saint-Andre, et al.       Expires June 15, 2014                 [Page 5]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

   <iq from='romeo@example.net/v3rsch1kk3l1jk'
     id='ne91v36s'
     to='juliet@example.com/t3hr0zny'
     type='set'>
   <jingle xmlns='urn:xmpp:jingle:1'
           action='session-initiate'
           initiator='romeo@example.net/v3rsch1kk3l1jk'
           sid='a73sjjvkla37jfea'>
     <content creator='initiator'
              media='audio'
              name='this-is-the-audio-content'
              senders='both'>
       <description xmlns='urn:xmpp:jingle:app:rtp:1'>
         <payload-type id='96' name='speex' clockrate='16000'/>
         <payload-type id='97' name='speex' clockrate='8000'/>
         <payload-type id='18' name='G729'/>
         <payload-type channels='2'
                       clockrate='16000'
                       id='103'
                       name='L16'/>
         <payload-type id='98' name='x-ISAC' clockrate='8000'/>
       </description>
       <transport xmlns='urn:xmpp:jingle:transport:raw-udp'>
         <candidate ip='10.1.1.104' port='13540' generation='0'/>
       </transport>
     </content>
   </jingle>
   </iq>

   In the foregoing example, the syntax and semantics of the <jingle/>
   and <content/> elements are defined in [XEP-0166], the syntax and
   semantics of the <description/> element are defined in [XEP-0167],
   and the syntax and semantics of the <transport/> element are defined
   in [XEP-0177].  Other <description/> elements are defined in
   specifications for the appropriate application types (see for example
   [XEP-0167]) and other <transport/> elements are defined in the
   specifications for appropriate transport methods (see for example
   [XEP-0176], which defines an XMPP profile of [RFC5245]).

   At the core Jingle layer, the following mappings are defined.

Saint-Andre, et al.       Expires June 15, 2014                 [Page 6]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

   +--------------------------------+--------------------------------+
   |           Jingle               |             SIP                |
   +--------------------------------+--------------------------------+
   | <jingle/> 'action'             | [ see next table ]             |
   +--------------------------------+--------------------------------+
   | <jingle/> 'initiator'          | [ no mapping ]                 |
   +--------------------------------+--------------------------------+
   | <jingle/> 'responder'          | [ no mapping ]                 |
   +--------------------------------+--------------------------------+
   | <jingle/> 'sid'                | local-part of Dialog ID        |
   +--------------------------------+--------------------------------+
   | local-part of 'initiator'      | <username> in SDP o= line      |
   +--------------------------------+--------------------------------+
   | <content/> 'creator'           | [ no mapping ]                 |
   +--------------------------------+--------------------------------+
   | <content/> 'name'              | [ no mapping ]                 |
   +--------------------------------+--------------------------------+
   | <content/> 'profile'           | <proto> in SDP m= line         |
   +--------------------------------+--------------------------------+
   | <content/> 'senders' value of  | a= line of sendrecv, recvonly, |
   | both, initiator, or responder  | or sendonly                    |
   +--------------------------------+--------------------------------+

   The 'senders' attribute is optional in Jingle, thus in case it's
   absent it's RECOMMENDED that the direction value is considered as
   'sendrecv'.

   The 'action' attribute of the <jingle/> element has nine allowable
   values.  In general they should be mapped as shown in the following
   table, with some exceptions as described herein.

Saint-Andre, et al.       Expires June 15, 2014                 [Page 7]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

   +-------------------+-----------------+
   | Jingle Action     | SIP Method      |
   +-------------------+-----------------+
   | content-accept    | INVITE response |
   |                   | (1xx or 2xx)    |
   +-------------------+-----------------+
   | content-add       | INVITE request  |
   +-------------------+-----------------+
   | content-modify    | INVITE request  |
   +-------------------+-----------------+
   | content-remove    | INVITE request  |
   +-------------------+-----------------+
   | session-accept    | INVITE response |
   |                   | (1xx or 2xx)    |
   +-------------------+-----------------+
   | session-info      | [varies]        |
   +-------------------+-----------------+
   | session-initiate  | INVITE request  |
   +-------------------+-----------------+
   | session-terminate | BYE             |
   +-------------------+-----------------+
   | transport-info    | unnused         |
   +-------------------+-----------------+

5.2.  Application Formats

   Jingle application formats for audio and video exchange via RTP are
   specified in [XEP-0167].  These application formats effectively maps
   to the "RTP/AVP" profile specified in [RFC3551] and the "RTP/SAVP"
   profile specified in [RFC3711], where the media types are "audio" and
   "video", and the specific mappings to SDP syntax are provided in
   [XEP-0167].

   As stated in [XEP-0167] future versions of this specification might
   define how to use other RTP profiles such as "RTP/AVPF" and "RTP/
   SAVPF" as defined in RFC4585 and RFC5124 respectively.

5.3.  Raw UDP Transport Method

   A basic Jingle transport method for exchanging media over UDP is
   specified in [XEP-0177].  This transport method involves the
   negotiation of an IP address and port only.  It does not provide NAT
   traversal, effectively leaving the task to intermediary entries.  The
   Jingle 'ip' attribute maps to the connection-address parameter of the
   SDP c= line and the 'port' attribute maps to the port parameter of
   the SDP m= line.  Use of SIP without ICE would generally map to use
   of Raw UDP on the XMPP side of a session.

Saint-Andre, et al.       Expires June 15, 2014                 [Page 8]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

5.4.  ICE-UDP Transport Method

   A more advanced Jingle transport method for exchanging media over UDP
   is specified in [XEP-0176].  Under ideal conditions this transport
   method provides NAT traversal by following the Interactive
   Connectivity Exchange methodology specified in [RFC5245].

   The relevant SDP mappings are provided in [XEP-0176], however there
   are a few syntax incompatibilities which need to be addressed by
   gateways conforming to this specification:

   o  The 'foundation' attribute is defined as a number in Jingle
      (unsigned byte) whereas ICE [RFC5245] defines it as a string,
      which can contain letters, digits and the '+' and '/' symbols.
      Gateway applications MUST therefore convert ICE originating
      foundations into integer numbers and they MUST guarantee that such
      a conversion preserves foundation uniqueness.  The exact mechanism
      for the conversion is undefined.

   o  Jingle defines a 'generation' attribute which is used to determine
      if an ICE restart is required.  Such attribute has no counterpart
      in SIP as ICE restarts are detected by detecting a change in the
      ICE ufrag and password.  Gateways MUST therefore increase the
      generation number when they detect such changes.

   o  The 'id' attribute defined by Jingle has no SIP counterpart thus
      applications are free to choose means to generate unique
      identifiers across the different candidates of an ICE generation.

   o  The 'network' attribute defined by Jingle has no counterpart in
      SIP and SHOULD be ignored.

6.  Call Hold

   [RFC3264] stipulates that streams are placed on hold by setting their
   direction to "sendonly".  A session is placed on hold by doing this
   for all the streams it contains.  The same semantics are also
   supported by Jingle through the "senders" element and its "initiator"
   and "responder" values.

                 [example to follow]

   In addition to these semantics however Jingle also defines a more
   concise way for achieving the same, which consists in sending a
   "hold" command within a "session-info" action:

Saint-Andre, et al.       Expires June 15, 2014                 [Page 9]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

   <iq from='juliet@capulet.lit/balcony'
       id='xv39z423'
       to='romeo@montague.lit/orchard'
       type='set'>
     <jingle xmlns='urn:xmpp:jingle:1'
             action='session-info'
             initiator='romeo@montague.lit/orchard'
             sid='a73sjjvkla37jfea'>
       <hold xmlns='urn:xmpp:jingle:apps:rtp:info:1'/>
     </jingle>
   </iq>

   Gateways that receive a "hold" command from their Jingle side MUST
   generate a new offer on their SIP side, placing all streams in a
   "sendonly" state.

   When relaying offers from SIP to XMPP however, gateways are not
   required to translate "sendonly" attributes into a "hold" command as
   this would not always be possible (e.g. when not all streams have the
   same direction).  Additionally such conversions might introduce
   complications in case further offers placing a session of hold also
   contain other session modifications.

   [[OPEN ISSUE: do we need to mention double hold here?  That is, when
   you put me on hold after I did it first.  Direction would then be
   "inactive".]]

7.  Early Media

   [RFC3959] and [RFC3960] describe a number of scenarios relying on
   "early media".  While similar attempts have also been made for XMPP
   [XEP-0269] support for early media is not currently widely supported
   in Jingle implementations.  Therefore, gateways SHOULD NOT forward
   SDP answers from SIP to Jingle until a final response has been
   received, except in cases where the gateway is in a position to
   confirm specific support for early media by the endpoint (one
   approach to such support can be found in [XEP-0269] but it has not
   yet been standardized).

   Gateways MUST however store early media SDP answers when they are
   sent inside a reliable provisional response.  In such cases, a
   subsequent final response may follow without an actual answer and the
   one from the provisional response will need to be forwarded to the
   Jingle endpoint.

Saint-Andre, et al.       Expires June 15, 2014                [Page 10]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

8.  Detecting Endless Loops

   [RFC3261] defines a "Max-Forwards" header that allows intermediate
   entities such as SIP proxies to detect and prevent loops from
   occurring.  The specifics of XMPP make such a prevention mechanism
   unnecessary for XMPP-only environments.  With the introduction of
   SIP-to-XMPP gatewaying however, it would be possible for loops to
   occur where messages are being repeatedly forwarded from XMPP to SIP
   to XMPP to SIP, etc.

   To compensate for the lack of a "Max-Forwards" header in SIP,
   gateways MUST therefore keep track of all SIP transactions and Jingle
   sessions that they are currently serving and they MUST block re-
   entrant messages.

   [[OPEN ISSUE: In order for this to work, we need a consistent way of
   translating dialog IDs into Jingle sessions, and vice versa, so that
   the following can be verified: jingleSessID ==
   toJingleSessID(toSipCallID( jingleSessID )).  We need to mention
   mention spirals here as well.  Alice could call Bob, but Bob forwards
   his call to Romeo.  A spiral on the SIP side could end up becoming a
   loop if the gateway is in between.]]

9.  SDP Format-Specific Parameters

   [RFC4566] defines "a=fmtp" attributes for the transmission of format
   specific parameters as a single transparent string.  Such strings can
   be used to convey either a single value or a sequence of parameters,
   separated by semi-colons, comas or whatever delimiters are chosen by
   a particular payload type specification.

   [XEP-0177] on the other hand defines a "<format/>" element as
   follows:

     <format name="paramName" value="paramValue"/>

   A sequence of parameters is thus transmitted as an array of
   distinctive name/value couples.

   These differences make it impossible to devise a generic mechanism
   that accurately translates format parameters from Jingle to SDP
   without the specifics of the payload being known to the gateway.
   This specification therefore makes the following recommendations for
   a best-effort attempt at translation:

Saint-Andre, et al.       Expires June 15, 2014                [Page 11]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

   1.  Gateways that are aware of the formats in use SHOULD parse all
       format parameters and generate "<format/>" arrays and "a=fmtp"
       values accordingly.

   2.  When translating Jingle to SIP, gateways that have no explicit
       support for the formats that are being negotiated, SHOULD convert
       the list of "<format/>" elements into a single string, containing
       a sequence of "name=value" pairs, separated by a semi-colon and a
       space (i.e. "; ").

   3.  When translating SIP to Jingle, gateways that have no explicit
       support for the formats that are being negotiated, SHOULD
       tokenize the "a=fmtp" format string using the following list of
       delimiters: ";", "; ", ",", ", ".  The resulting tokens should
       then be parsed as "name=value" pairs.  If this process does
       actually yield any such pairs, they SHOULD be used for generating
       the respective "<format/>" elements.  If some of the tokens
       cannot be parsed into a "name=value" pair, or in case the format
       string couldn't be tokenized with the above delimiters, the
       remaining brute strings SHOULD be used as a value for the "name"
       attribute of the "<format/>" element and their corresponding
       "value" element SHOULD be left empty.

   [[OPEN ISSUE: we need to add examples for these transformations.]]

10.  Dialog Forking

   [RFC3261] defines semantics for dialog forking.  Such semantics have
   not been defined for Jingle and need to be hidden from XMPP
   endpoints.

   To achieve this SIP-to-XMPP MUST NOT forward more than one
   provisional response on their Jingle side.  Typically they would do
   so only for the first provisional response they receive and ignore
   the rest.  This provisional response SHOULD be forwarded as
   originating from a bare Jabber ID (JID) corresponding to the AOR URI
   found in the "From" header of the SIP provisional response.  The
   gateway MUST NOT attempt to translate GRUUs into full JIDs because it
   cannot know at this stage, which of the dialogs established by these
   provisional responses will be used for the actual session.

   Likewise, gateways conforming to this specification MUST NOT forward
   more than a single final response received through SIP to the Jingle
   side.  The gateway SHOULD terminate the SIP sessions whose received
   final response wasn't forwarded to the Jingle side.

Saint-Andre, et al.       Expires June 15, 2014                [Page 12]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

11.  Sample Scenarios

   The following sections provide sample scenarios (or "call flows")
   that illustrate the principles of interworking from Jingle to SIP.
   These scenarios are not exhaustive.

11.1.  Basic Voice Chat

   The protocol flow for a basic voice chat for which an XMPP user
   (juliet@example.com) is the initiator and a SIP user
   (romeo@example.net) is the responder.  The voice chat is consummated
   through a gateway.  To simplify the example, the transport method
   negotiated is "raw user datagram protocol" as specified in
   [XEP-0177].

     INITIATOR  ...XMPP...   GATEWAY   ...SIP...    RESPONDER
     |                        |                       |
     | session-initiate       |                       |
     |----------------------->|                       |
     | IQ-result (ack)        |                       |
     |<-----------------------|                       |
     |                        | INVITE                |
     |                        |---------------------->|
     |                        | 180 Ringing           |
     |                        |<----------------------|
     | session-info (ringing) |                       |
     |<-----------------------|                       |
     | IQ-result (ack)        |                       |
     |----------------------->|                       |
     |                        | 200 OK                |
     |                        |<----------------------|
     | session-accept         |                       |
     |<-----------------------|                       |
     | IQ-result (ack)        |                       |
     |----------------------->|                       |
     |                        | ACK                   |
     |                        |---------------------->|
     |                   MEDIA SESSION                |
     |<==============================================>|
     |                        | BYE                   |
     |                        |<----------------------|
     | session-terminate      |                       |
     |<-----------------------|                       |
     | IQ-result (ack)        |                       |
     |----------------------->|                       |
     |                        | 200 OK                |
     |                        |---------------------->|
     |                        |                       |

Saint-Andre, et al.       Expires June 15, 2014                [Page 13]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

   The packet flow is as follows.

   First the XMPP user sends a Jingle session-initiation request to the
   SIP user.

     <iq from='juliet@example.com/t3hr0zny'
       id='hu2s61f4'
       from='romeo@example.net/v3rsch1kk3l1jk'
       type='set'>
     <jingle xmlns='urn:xmpp:jingle:1'
             action='session-initiate'
             initiator='juliet@example.com/t3hr0zny'
             sid='a73sjjvkla37jfea'>
       <content creator='initiator'
                media='audio'
                name='this-is-the-audio-content'>
         <description xmlns='urn:xmpp:jingle:app:rtp:1'>
           <payload-type id='96' name='speex' clockrate='16000'/>
           <payload-type id='97' name='speex' clockrate='8000'/>
           <payload-type id='18' name='G729'/>
         </description>
         <transport xmlns='urn:xmpp:jingle:transport:raw-udp'>
           <candidate ip='192.0.2.101' port='49172' generation='0'/>
         </transport>
       </content>
     </jingle>
     </iq>

   The gateway returns an XMPP IQ-result to the initiator on behalf of
   the responder.

     <iq from='juliet@example.com/t3hr0zny'
       id='hu2s61f4'
       to='romeo@example.net/v3rsch1kk3l1jk'
       type='result'/>

   The gateway transforms the Jingle session-initiate action into a SIP
   INVITE.

Saint-Andre, et al.       Expires June 15, 2014                [Page 14]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

     INVITE sip:romeo@example.net SIP/2.0
     Via: SIP/2.0/TCP client.example.com:5060;branch=z9hG4bK74bf9
     Max-Forwards: 70
     From: Juliet Capulet <sip:juliet@example.com>;tag=t3hr0zny
     To: Romeo Montague <sip:romeo@example.net>
     Call-ID: 3848276298220188511@example.com
     CSeq: 1 INVITE
     Contact: <sip:juliet@client.example.com;transport=tcp>
     Content-Type: application/sdp
     Content-Length: 184

     v=0
     o=alice 2890844526 2890844526 IN IP4 client.example.com
     s=-
     c=IN IP4 192.0.2.101
     t=0 0
     m=audio 49172 RTP/AVP 18 96 97
     a=rtpmap:96 sppex/16000
     a=rtpmap:97 speex/8000
     a=rtpmap:18 G729

   The responder returns a SIP 180 Ringing message.

     SIP/2.0 180 Ringing
     Via: SIP/2.0/TCP client.example.com:5060;branch=z9hG4bK74bf9;\
          received=192.0.2.101
     From: Juliet Capulet <sip:juliet@example.com>;tag=t3hr0zny
     To: Romeo Montague <sip:romeo@example.net>;tag=v3rsch1kk3l1jk
     Call-ID: 3848276298220188511@example.com
     CSeq: 1 INVITE
     Contact: <sip:romeo@client.example.net;transport=tcp>
     Content-Length: 0

   The gateway transforms the ringing message into XMPP syntax.

     <iq from='romeo@montague.net/v3rsch1kk3l1jk'
       id='ol3ba71g'
       to='juliet@example.com/t3hr0zny'
       type='set'>
     <jingle xmlns='urn:xmpp:jingle:1'
             action='session-info'
             initiator='juliet@example.com/t3hr0zny'
             sid='a73sjjvkla37jfea'>
       <ringing xmlns='urn:xmpp:jingle:apps:rtp:info:1'/>
     </jingle>
     </iq>

   The initiator returns an IQ-result acknowledging receipt of the

Saint-Andre, et al.       Expires June 15, 2014                [Page 15]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

   ringing message, which is used only by the gateway and not
   transformed into SIP syntax.

     <iq from='juliet@example.com/t3hr0zny'
       id='ol3ba71g'
       to='romeo@example.net/v3rsch1kk3l1jk'
       type='result'/>

   The responder sends a SIP 200 OK to the initiator.

     SIP/2.0 200 OK
     Via: SIP/2.0/TCP client.example.com:5060;branch=z9hG4bK74bf9;\
          received=192.0.2.101
     From: Juliet Capulet <sip:juliet@example.com>;tag=t3hr0zny
     To: Romeo Montague <sip:romeo@example.net>;tag=v3rsch1kk3l1jk
     Call-ID: 3848276298220188511@example.com
     CSeq: 1 INVITE
     Contact: <sip:romeo@client.example.net;transport=tcp>
     Content-Type: application/sdp
     Content-Length: 147

     v=0
     o=romeo 2890844527 2890844527 IN IP4 client.example.net
     s=-
     c=IN IP4 192.0.2.201
     t=0 0
     m=audio 3456 RTP/AVP 97
     a=rtpmap:97 speex/8000

   The gateway transforms the 200 OK into a Jingle session-accept
   action.

Saint-Andre, et al.       Expires June 15, 2014                [Page 16]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

     <iq from='romeo@example.net/v3rsch1kk3l1jk'
       id='pd1bf839'
       to='juliet@example.com/t3hr0zny'
       type='set'>
     <jingle xmlns='urn:xmpp:jingle:1'
             action='session-accept'
             initiator='juliet@example.com/t3hr0zny'
             responder='romeo@example.net/v3rsch1kk3l1jk'
             sid='a73sjjvkla37jfea'>
       <content creator='initiator'
                media='audio'
                name='this-is-the-audio-content'>
         <description xmlns='urn:xmpp:jingle:app:rtp:1'>
           <payload-type id='97' name='speex' clockrate='8000'/>
         </description>
         <transport xmlns='urn:xmpp:jingle:transport:raw-udp'>
           <candidate ip='192.0.2.101' port='49172' generation='0'/>
         </transport>
       </content>
     </jingle>
     </iq>

   If the payload types and transport candidate can be successfully used
   by both parties, then the initiator acknowledges the session-accept
   action.

     <iq from='romeo@example.net/v3rsch1kk3l1jk'
       id='pd1bf839'
       to='juliet@example.com/t3hr0zny'
       type='result'/>

   The parties now begin to exchange media.  In this case they would
   exchange audio using the Speex codec at a clockrate of 8000 since
   that is the highest-priority codec for the responder (as determined
   by the XML order of the <payloadtype/> children).

   The parties may continue the session as long as desired.

   Eventually, one of the parties (in this case the responder)
   terminates the session.

     BYE sip:juliet@client.example.com SIP/2.0
     Via: SIP/2.0/TCP client.example.net:5060;branch=z9hG4bKnashds7
     Max-Forwards: 70
     From: Romeo Montague <sip:romeo@example.net>;tag=8321234356
     To: Juliet Capulet <sip:juliet@example.com>;tag=9fxced76sl
     Call-ID: 3848276298220188511@example.com
     CSeq: 1 BYE

Saint-Andre, et al.       Expires June 15, 2014                [Page 17]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

     Content-Length: 0

   The gateway transforms the SIP BYE into XMPP syntax.

     <iq from='romeo@example.net/v3rsch1kk3l1jk'
       id='rv301b47'
       to='juliet@example.com/t3hr0zny'
       type='set'>
     <jingle xmlns='urn:xmpp:jingle:1'
             action='session-terminate'
             initiator='juliet@example.com/t3hr0zny'
             reasoncode='no-error'
             sid='a73sjjvkla37jfea'/>
     </iq>

   The initiator returns an IQ-result acknowledging receipt of the
   session termination, which is used only by the gateway and not
   transformed into SIP syntax.

     <iq from='romeo@example.net/v3rsch1kk3l1jk'
       id='rv301b47'
       to='juliet@example.com/t3hr0zny'
       type='result'/>

12.  IANA Considerations

   This document has no actions for the IANA.

13.  Security Considerations

   Detailed security considerations for session management are given for
   SIP in [RFC3261] and for XMPP in [XEP-0166] (see also [RFC6120]).
   The security considerations provided in [I-D.ietf-stox-core] also
   apply.

14.  References

14.1.  Normative References

   [I-D.ietf-stox-core]
              Saint-Andre, P., Houri, A., and J. Hildebrand,
              "Interworking between the Session Initiation Protocol
              (SIP) and the Extensible Messaging and Presence Protocol
              (XMPP): Core", draft-ietf-stox-core-08 (work in progress),
              December 2013.

Saint-Andre, et al.       Expires June 15, 2014                [Page 18]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
              (ICE): A Protocol for Network Address Translator (NAT)
              Traversal for Offer/Answer Protocols", RFC 5245,
              April 2010.

   [RFC6120]  Saint-Andre, P., "Extensible Messaging and Presence
              Protocol (XMPP): Core", RFC 6120, March 2011.

   [XEP-0030]
              Hildebrand, J., Eatmon, R., and P. Saint-Andre, "Service
              Discovery", XSF XEP 0030, June 2008.

   [XEP-0166]
              Ludwig, S., Beda, J., Saint-Andre, P., McQueen, R., Egan,
              S., and J. Hildebrand, "Jingle", XSF XEP 0166, June 2007.

   [XEP-0167]
              Ludwig, S., Saint-Andre, P., Egan, S., and R. McQueen,
              "Jingle RTP Sessions", XSF XEP 0167, February 2009.

   [XEP-0176]
              Beda, J., Ludwig, S., Saint-Andre, P., Hildebrand, J., and
              S. Egan, "Jingle ICE-UDP Transport Method", XSF XEP 0176,
              February 2009.

   [XEP-0177]
              Beda, J., Saint-Andre, P., Ludwig, S., Hildebrand, J., and
              S. Egan, "Jingle Raw UDP Transport", XSF XEP 0177,
              February 2009.

Saint-Andre, et al.       Expires June 15, 2014                [Page 19]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

14.2.  Informative References

   [I-D.ietf-mmusic-trickle-ice]
              Ivov, E., Rescorla, E., and J. Uberti, "Trickle ICE:
              Incremental Provisioning of Candidates for the Interactive
              Connectivity Establishment (ICE) Protocol",
              draft-ietf-mmusic-trickle-ice-00 (work in progress),
              October 2013.

   [I-D.ivov-mmusic-trickle-ice-sip]
              Ivov, E., Marocco, E., and C. Holmberg, "A Session
              Initiation Protocol (SIP) usage for Trickle ICE",
              draft-ivov-mmusic-trickle-ice-sip-01 (work in progress),
              October 2013.

   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
              August 1980.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3959]  Camarillo, G., "The Early Session Disposition Type for the
              Session Initiation Protocol (SIP)", RFC 3959,
              December 2004.

   [RFC3960]  Camarillo, G. and H. Schulzrinne, "Early Media and Ringing
              Tone Generation in the Session Initiation Protocol (SIP)",
              RFC 3960, December 2004.

   [XEP-0269]
              Cionoiu, D. and P. Saint-Andre, "Jingle Early Media", XSF
              XEP 0269, May 2009.

Saint-Andre, et al.       Expires June 15, 2014                [Page 20]
Internet-Draft    SIP-XMPP Interworking: Media Sessions    December 2013

Authors' Addresses

   Peter Saint-Andre
   Cisco Systems, Inc.
   1899 Wynkoop Street, Suite 600
   Denver, CO  80202
   USA

   Phone: +1-303-308-3282
   Email: psaintan@cisco.com

   Saul Ibarra Corretge
   AG Projects
   Dr. Leijdsstraat 92
   Haarlem  2021RK
   The Netherlands

   Email: saul@ag-projects.com

   Emil Ivov
   Jitsi
   Strasbourg  67000
   France

   Phone: +33-177-624-330
   Email: emcho@jitsi.org

Saint-Andre, et al.       Expires June 15, 2014                [Page 21]