Internet Engineering Task Force                                   SIP WG
Internet Draft                                              G. Camarillo
                                                                Ericsson
                                                               E. Burger
                                                      SnowShore Networks
                                                          H. Schulzrinne
                                                     Columbia University
                                                             A. van Wijk
                                                                 Viataal
draft-camarillo-sip-deaf-02.txt
February 17, 2003
Expires: August, 2003


   Transcoding Services Invocation in the Session Initiation Protocol

STATUS OF THIS MEMO

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   To view the list Internet-Draft Shadow Directories, see
   http://www.ietf.org/shadow.html.


Abstract

   This document describes how to discover the need of transcoding
   services in a session established with SIP and how to invoke those
   transcoding services. Two models for transcoding services invocation
   are introduced; the conference bridge model and the third party call
   control model. Both models meet the requirements for SIP regarding
   transcoding services invocation to support deaf, hard of hearing and
   speech-impaired individuals.




G. Camarillo et. al.                                          [Page 1]


Internet Draft                    SIP                  February 17, 2003





                           Table of Contents



   1          Introduction ........................................    3
   2          Discovery of the Need for Transcoding Services ......    3
   3          Transcoding Services Invocation .....................    4
   3.1        Terminology .........................................    5
   3.2        Conference Bridge Transcoding Model .................    5
   3.2.1      Caller's Invocation .................................    6
   3.2.2      Callee's Invocation .................................    6
   3.3        Third Party Call Control Transcoding Model ..........    8
   3.3.1      Callee's Invocation .................................    8
   3.3.2      Caller's Invocation .................................   14
   3.3.3      Receiving the Original Stream .......................   16
   3.3.4      Transcoding Services in Parallel ....................   17
   3.3.5      Transcoding Services in Serial ......................   21
   4          Security Considerations .............................   21
   5          TODO List ...........................................   22
   6          Authors' Addresses ..................................   22
   7          Bibliography ........................................   22



























G. Camarillo et. al.                                          [Page 2]


Internet Draft                    SIP                  February 17, 2003


1 Introduction

   Two user agents involved in a SIP [1] dialog may find it impossible
   to establish a media session due to a variety of incompatibilities.
   Assuming that both user agents understand the same session
   description format (e.g., SDP), incompatibilities can be found at the
   user agent level and at the user level. At the user agent level, both
   terminals may not support any common codec or may not support common
   media types (e.g., a text-only terminal and an audio-only terminal).
   At the user level, a deaf person will not be able to understand what
   it is said over an audio stream.

   In order to make communications possible in the presence of
   incompatibilities, user agents need to introduce intermediaries that
   provide transcoding services to a session. From the SIP point of
   view, the introduction of a transcoder is done in the same way to
   resolve both user level and user agent level incompatibilities.
   Therefore, the invocation mechanisms described in this document are
   generally applicable to any type of incompatibility related to how
   the information that needs to be communicated is encoded.

   This document does not describe media server discovery. That is an
   orthogonal problem that one can address using user agent provisioning
   or other methods.

   All the examples provided in this document use the Session
   Description Protocol (SDP) [2]. However, other session description
   formats can be used with the same call flows.

   The remainder of this document is organized as follows. Section 2
   deals with the discovery of the need of transcoding services for a
   particular session. Section 3.2 introduces the conference bridge
   transcoding invocation model, and Section 3.3 introduces the third
   party call control model. Both models meet the requirements regarding
   transcoding services invocation in RFC3351 [3] to support deaf, hard
   of hearing and speech-impaired individuals.

2 Discovery of the Need for Transcoding Services

   Following the one-party consent model defined in RFC 3238 [4],
   transcoding invocation is best performed by one of the end-points
   involved in the communication. Following the same principle, one of
   the end-points should be the one detecting that transcoding is needed
   for a particular session.

   In order to decide whether or not transcoding is needed, a user agent
   needs to know the capabilities of the remote user agent. A user agent
   acting as an offerer typically obtains this knowledge by downloading



G. Camarillo et. al.                                          [Page 3]


Internet Draft                    SIP                  February 17, 2003


   a presence document that includes media capabilities (e.g., Bob is
   available on a terminal that only supports audio) or by getting an
   SDP description of media capabilities as defined in RFC 3264 [5].
   Presence documents are typically received in a NOTIFY request and SDP
   media capabilities descriptions are typically received in a 200 (OK)
   response to an OPTIONS request or in a 488 (Not Acceptable Here)
   response to an INVITE.

   A user agent client acting as an answerer typically gets an offer
   that it cannot accept. The user agent can send back a media
   capabilities description hoping that the offerer will invoke some
   type of transcoding services or it can invoke transcoding services
   itself.

   It is recommended that an offerer does not invoke transcoding
   services before making sure that the answerer does not support the
   capabilities needed for the session. Making wrong assumptions about
   the answerer's capabilities can lead to situations where two
   transcoders are introduced (one by the offerer and one by the
   answerer) in a session that would not need any transcoding services
   at all.

        An example of the situation above is a call between two GSM
        phones (without using transcoding-free operation). Both
        phones use a GSM codec, but the speech is converted from
        GSM to PCM by the originating MSC and from PCM back to GSM
        by the terminating MSC.

   Note that transcoding services can be symmetric (e.g., speech-to-text
   plus text-to-speech) or asymmetric (e.g., a one-way speech-to-text
   transcoding for a hearing impaired user that can talk).

3 Transcoding Services Invocation

   Once the need for transcoding for a particular session has been
   identified as described in Section 2, one of the user agents needs to
   invoke transcoding services.

   Invoking transcoding services from a server (T) for a session between
   two user agents (A and B) involves establishing two media sessions;
   one between A and T and another between T and B. How to invoke T's
   services (i.e., how to establish both A-T and T-B sessions) depends
   on how we model the transcoding service. We have considered two
   models for invoking a transcoding service. The first is to use a
   (dial-in and/or dial-out) conference bridge that negotiates the
   appropriate media parameters on each individual leg (i.e., A-T and
   T-B). The second is to use third party call control  [6], also
   referred to as 3pcc, to invoke the transcoding service. Section 3.2



G. Camarillo et. al.                                          [Page 4]


Internet Draft                    SIP                  February 17, 2003


   describes the conference bridge transcoding invocation model, and
   Section 3.3 describes the third party call control model.

3.1 Terminology

   All the figures in this document follow the naming convention below:

        SDP A: A session description generated by A. It contains, among
             other things, the transport address/es (IP address and port
             number) where A wants to receive media for each particular
             stream.

        SDP B: A session description generated by B. It contains, among
             other things, the transport address/es where B wants to
             receive media for each particular stream.

        SDP A+B: A session description that contains, among other
             things, the transport address/es where A wants to receive
             media and the transport address/es where B wants to receive
             media.

        SDP TA: A session description generated by T and intended for A.
             It contains, among other things, the transport address/es
             where T wants to receive media from A.

        SDP TB: A session description generated by T and intended for B.
             It contains, among other things, the transport address/es
             where T wants to receive media from B.

        SDP TA+TB: A session description generated by T that contains,
             among other things, the transport address/es where T wants
             to receive media from A and the transport address/es where
             T wants to receive media from B.

3.2 Conference Bridge Transcoding Model

   A conference server typically establishes an audio stream with each
   participant of a conference. The server sends over each individual
   stream the media received over the rest of the streams, typically
   performing some mixing. The conference server may have to send audio
   to different participants using different audio codecs. We can think
   of a transcoding service as a two-party conference server that may
   change not only the codec in use, but also the format of the media
   (e.g., audio to text). Using this model, the whole A-T-B session is
   established in the same way as a conference [7]. Typically, the user
   agent invoking the transcoding service sets up the media policy at
   the bridge (possibly using a media policy control protocol) and sends
   an INVITE to join the conference. The media policy for the session



G. Camarillo et. al.                                          [Page 5]


Internet Draft                    SIP                  February 17, 2003


   determines the type of transcoding the bridge will perform.

   Once the conference is set up and the invoker has joined it, the
   remote user has to be added as a participant as well. Users have two
   options to join a conference. A user can dial-in (i.e., send an
   INVITE request to the conference bridge) to join a conference, or the
   conference bridge can dial-out (i.e., send an INVITE request to the
   user) to add the user to the conference. Both dial-in and dial-out
   approaches are discussed in the following sections. Section 3.2.1
   deals with caller's invocation and Section 3.2.2 deals with callee's
   invocation of the service.

3.2.1 Caller's Invocation

   Once the caller has set up the conference bridge and joined the
   conference by sending an INVITE to the bridge, it has two options to
   add the callee to the session; sending a REFER  [8] to the bridge
   (that will instruct the bridge to dial-out) or sending a REFER to the
   callee (that will instruct the callee to dial-in).

   We recommend the first option (i.e., REFER sent to the bridge). The
   bridge, upon reception of the REFER, generates an INVITE towards the
   callee. The session description of the INVITE is generated according
   to the media policy set up by the caller. Figure 1 shows this
   scenario's message flow.


   Note that if the caller chooses to send the REFER directly to the
   callee (rather than to the bridge) the callee may generate an INVITE
   with a session description that contained media types the bridge was
   not configured to handle. In addition to that, some user agents may
   not support REFER or may not be able to handle out-of-the-blue REFER
   requests.

3.2.2 Callee's Invocation

   Similarly to the situation above, once the callee has set up the
   conference bridge and joined the conference by sending an INVITE to
   the bridge, it has two options to add the caller to the session;
   sending a REFER to the bridge (that will instruct the bridge to
   dial-out) or sending a REFER to the caller (that will instruct the
   caller to dial-in).

   We recommend the first option (i.e., REFER sent to the bridge). The
   bridge, upon reception of the REFER, generates an INVITE with a
   Replaces header field [9] header field towards the callee. The
   session description of the INVITE is generated according to the media
   policy set up by the callee. Figure 2 shows this scenario's message



G. Camarillo et. al.                                          [Page 6]


Internet Draft                    SIP                  February 17, 2003




      A                            T                            B

      |                            |                            |
      |------(1) INVITE SDP A----->|                            |
      |                            |                            |
      |<----(2) 200 OK SDP TA------|                            |
      |                            |                            |
      |----------(3) ACK---------->|                            |
      |                            |                            |
      | ************************** |                            |
      |*   Media Policy Set-up    *|                            |
      | ************************** |                            |
      |                            |                            |
      |---------(4) REFER--------->|                            |
      |                            |                            |
      |<--------(5) 200 OK---------|                            |
      |                            |                            |
      |                            |-----(6) INVITE SDP TB----->|
      |                            |                            |
      |                            |<-----(7) 200 OK SDP B------|
      |                            |                            |
      |                            |----------(8) ACK---------->|
      |                            |                            |
      |<--------(9) NOTIFY---------|                            |
      |                            |                            |
      |---------(10) 200 OK------->|                            |
      |                            |                            |
      | ************************** | ************************** |
      |*          MEDIA           *|*          MEDIA           *|
      | ************************** | ************************** |
      |



   Figure 1: Caller's invocation of a conference bridge


   flow.


   The flow in Figure 2 requires that the caller supports the Replaces
   header field. If the caller does not support it, the callee can send
   a 488 (Not Accpetable Here) for the original INVITE and attempt to
   establish the session acting as a caller (i.e., sending a new
   INVITE).

   Sending the REFER to the caller (instead of to the bridge) introduces



G. Camarillo et. al.                                          [Page 7]


Internet Draft                    SIP                  February 17, 2003


   a number of issues, since there is currently no way for the callee to
   inform the caller that the newly established session will substitute
   the original session.

3.3 Third Party Call Control Transcoding Model

   If we model T as a transcoding service rather than a special case of
   a conferencing server, a single INVITE transaction from the invoker
   of the service provides T with both A's and B's session descriptions.
   In order to provide in a single session description information about
   media streams that belong to different entities (A and B), the
   session description format in use should provide a means to define
   how these streams should be mapped. For instance, in a session
   description with two audio streams and one text stream, a possible
   mapping would be the following; the information received over the
   first audio stream should be sent over the text stream and over the
   second audio stream, and the incoming text should be sent only over
   the first audio stream. SDP [2] can convey this information using the
   source and sink attributes [10].

   As stated previously, the invocation of a transcoding service
   consists of establishing two sessions; A-T and T-B. How these
   sessions are established depends on which party, the caller (A) or
   the callee (B), invokes the transcoding services. However, we have
   followed a general principle to design our 3pcc flows; a 200 (OK)
   response from the transcoding service have to be received before
   contacting the callee. This tries to ensure that the transcoding
   service will be available when the callee accepts the session.

   However, note that the transcoding service does not know the exact
   type of transcoding it will be performing until the callee accepts
   the session. Therefore, there are always changes of failing to
   provide transcoding services after the callee has accepted the
   session. A system with tough requirements could use preconditions to
   avoid this situation. When preconditions are used, the callee is not
   alerted until everything is ready for the session.

3.3.1 Callee's Invocation

   In this scenario, B receives an INVITE from A, and B decides to
   introduce T in the session. Figure 3 shows the call flow for this
   scenario.


   In Figure 3 A can both hear and speak and B is a deaf user with a
   speech impairment. A proposes to establish a session that consists of
   an audio stream (1). B wants to send and receive only text, so it
   invokes a transcoding service T that will perform both speech-to-text



G. Camarillo et. al.                                          [Page 8]


Internet Draft                    SIP                  February 17, 2003




      A                            T                            B

      |                            |                            |
      |-------------------(1) INVITE SDP A--------------------->|
      |                            |                            |
      |                            |<-----(2) INVITE SDP B------|
      |                            |                            |
      |                            |------(3) 200 OK SDP TB---->|
      |                            |                            |
      |                            | ************************** |
      |                            |*   Media Policy Set-up    *|
      |                            | ************************** |
      |                            |                            |
      |                            |<--------(5) REFER----------|
      |                            |                            |
      |                            |---------(6) 200 OK-------->|
      |                            |                            |
      |<-----(7) INVITE SDP TA-----|                            |
      |                            |                            |
      |------(8) 200 OK SDP A----->|                            |
      |                            |                            |
      |<----------(9) ACK----------|                            |
      |                            |                            |
      |                            |---------(10) NOTIFY------->|
      |                            |                            |
      |                            |<--------(11) 200 OK--------|
      |                            |                            |
      |---------------------(12) CANCEL------------------------>|
      |                            |                            |
      |<--------------------(13) 200 OK-------------------------|
      |                            |                            |
      |<-------------(14) 487 Request Terminated----------------|
      |                            |                            |
      |-----------------------(15) ACK------------------------->|
      |                            |                            |
      | ************************** | ************************** |
      |*          MEDIA           *|*          MEDIA           *|
      | ************************** | ************************** |
      |                            |                            |








   Figure 2: Conference bridge transcoding model


G. Camarillo et. al.                                          [Page 9]


Internet Draft                    SIP                  February 17, 2003




      A                            T                            B

      |                            |                            |
      |--------------------(1) INVITE SDP A-------------------->|
      |                            |                            |
      |                            |<---(2) INVITE SDP A+B------|
      |                            |                            |
      |                            |---(3) 200 OK SDP TA+TB---->|
      |                            |                            |
      |                            |<---------(4) ACK-----------|
      |                            |                            |
      |<-------------------(5) 200 OK SDP TA--------------------|
      |                            |                            |
      |------------------------(6) ACK------------------------->|
      |                            |                            |
      | ************************** | ************************** |
      |*          MEDIA           *|*          MEDIA           *|
      | ************************** | ************************** |
      |                            |                            |



   Figure 3: Callee's invocation of a transcoding service


   and text-to-speech conversions (2). The session descriptions of
   Figure 3 are partially shown below.

   (1) INVITE SDP A

          m=audio 20000 RTP/AVP 0
          c=IN IP4 A.domain.com



   (2) INVITE SDP A+B

          m=audio 20000 RTP/AVP 0
          c=IN IP4 A.domain.com
          a=source:1
          a=sink:2
          m=text 40000 RTP/AVP 96
          c=IN IP4 B.domain.com
          a=rtpmap:96 t140/1000
          a=source:2
          a=sink:1




G. Camarillo et. al.                                         [Page 10]


Internet Draft                    SIP                  February 17, 2003


   (3) 200 OK SDP TA+TB

          m=audio 30000 RTP/AVP 0
          c=IN IP4 T.domain.com
          a=source:1
          a=sink:2
          m=text 30002 RTP/AVP 96
          c=IN IP4 T.domain.com
          a=rtpmap:96 t140/1000
          a=source:2
          a=sink:1



   (5) 200 OK SDP TA

          m=audio 30000 RTP/AVP 0
          c=IN IP4 T.domain.com



   Four media streams (i.e., two bi-directional streams) have been
   established at this point:

        1.   Audio from A to T.domain.com:30000

        2.   Text from T to B.domain.com:40000

        3.   Text from B to T.domain.com:30002

        4.   Audio from T to A.domain.com:20000

   When either A or B decide to terminate the session, B will send a BYE
   to T indicating that the session is over.

   If the first INVITE (1) received by B is empty (no session
   description), the call flow is slightly different. Figure 4 shows the
   messages involved.


   B may have different reasons for invoking T before knowing A's
   session description. B may want to hide its capabilities, and
   therefore it wants to return a session description with all the
   codecs B supports plus all the codecs T supports. Or T may provide
   recording services (besides transcoding), and B wants T to record the
   conversation, regardless of whether or not transcoding is needed.

   This scenario (Figure 4) is a bit more complex than the previous one.



G. Camarillo et. al.                                         [Page 11]


Internet Draft                    SIP                  February 17, 2003




      A                            T                            B

      |                            |                            |
      |----------------------(1) INVITE------------------------>|
      |                            |                            |
      |                            |<-----(2) INVITE SDP B------|
      |                            |                            |
      |                            |---(3) 200 OK SDP TA+TB---->|
      |                            |                            |
      |                            |<---------(4) ACK-----------|
      |                            |                            |
      |<-------------------(5) 200 OK SDP TA--------------------|
      |                            |                            |
      |-----------------------(6) ACK SDP A-------------------->|
      |                            |                            |
      |                            |<-------(7) INVITE----------|
      |                            |                            |
      |                            |---(8) 200 OK SDP TA+TB---->|
      |                            |                            |
      |<-----------------(9) INVITE SDP TA----------------------|
      |                            |                            |
      |------------------(10) 200 OK SDP A--------------------->|
      |                            |                            |
      |<-----------------------(11) ACK-------------------------|
      |                            |                            |
      |                            |<-----(12) ACK SDP A+B------|
      |                            |                            |
      | ************************** | ************************** |
      |*          MEDIA           *|*          MEDIA           *|
      | ************************** | ************************** |


   Figure 4: Callee's invocation after initial INVITE without SDP


   In INVITE (2), B still does not have SDP A, so it cannot provide T
   with that information. When B finally receives SDP A in (6), it has
   to send it to T. B sends an empty INVITE to T (7) and gets a 200 OK
   with SDP TA+TB (8). In general, this SDP TA+TB can be different than
   the one that was sent in (3). That is why B needs to send the updated
   SDP TA to A in (9). A then sends a possibly updated SDP A (10) and B
   sends it to T in (12). However, if T happens to return the same SDP
   TA+TB in (8) as in (3), B can skip messages (9), (10) and (11).
   Therefore, implementors of transcoding services are encouraged to
   return the same session description in (8) as in (3) in this type of
   scenario. The session descriptions of this flow are shown below:




G. Camarillo et. al.                                         [Page 12]


Internet Draft                    SIP                  February 17, 2003


   (2) INVITE SDP A+B

          m=audio 20000 RTP/AVP 0
          c=IN IP4 0.0.0.0
          a=source:1
          a=sink:2
          m=text 40000 RTP/AVP 96
          c=IN IP4 B.domain.com
          a=rtpmap:96 t140/1000
          a=source:2
          a=sink:1



   (3) 200 OK SDP TA+TB

          m=audio 30000 RTP/AVP 0
          c=IN IP4 T.domain.com
          a=source:1
          a=sink:2
          m=text 30002 RTP/AVP 96
          c=IN IP4 T.domain.com
          a=rtpmap:96 t140/1000
          a=source:2
          a=sink:1



   (5) 200 OK SDP TA

          m=audio 30000 RTP/AVP 0
          c=IN IP4 T.domain.com



   (6) ACK SDP A

          m=audio 20000 RTP/AVP 0
          c=IN IP4 A.domain.com



   (8) 200 OK SDP TA+TB

          m=audio 30004 RTP/AVP 0
          c=IN IP4 T.domain.com
          a=source:1
          a=sink:2



G. Camarillo et. al.                                         [Page 13]


Internet Draft                    SIP                  February 17, 2003


          m=text 30006 RTP/AVP 96
          c=IN IP4 T.domain.com
          a=rtpmap:96 t140/1000
          a=source:2
          a=sink:1



   (9) INVITE SDP TA

          m=audio 30004 RTP/AVP 0
          c=IN IP4 T.domain.com



   (10) 200 OK SDP A

          m=audio 20002 RTP/AVP 0
          c=IN IP4 A.domain.com



   (12) ACK SDP A+B

          m=audio 20002 RTP/AVP 0
          c=IN IP4 A.domain.com
          a=source:1
          a=sink:2
          m=text 40000 RTP/AVP 96
          c=IN IP4 B.domain.com
          a=rtpmap:96 t140/1000
          a=source:2
          a=sink:1



   Four media streams (i.e., two bi-directional streams) have been
   established at this point:

        1.   Audio from A to T.domain.com:30004

        2.   Text from T to B.domain.com:40000

        3.   Text from B to T.domain.com:30006

        4.   Audio from T to A.domain.com:20002

3.3.2 Caller's Invocation



G. Camarillo et. al.                                         [Page 14]


Internet Draft                    SIP                  February 17, 2003


   In this scenario, A wishes to establish a session with B using a
   transcoding service. A uses 3pcc to set up the session between T and
   B. The call flow we provide here is slightly different than the ones
   in [6]. In [6], the controller establishes a session between two user
   agents, being the user agents the ones deciding the characteristics
   of the streams. Here, A wants to establish a session between T and B,
   but A wants to decide how many and which types of streams are
   established. That is why A sends its session description in the first
   INVITE (1) to T, as opposed to the media-less initial INVITE
   recommended by [6]. Figure 5 shows the call flow for this scenario.



      A                            T                            B

      |                            |                            |
      |-------(1) INVITE SDP A---->|                            |
      |                            |                            |
      |<----(2) 200 OK SDP TA+TB---|                            |
      |                            |                            |
      |----------(3) ACK---------->|                            |
      |                            |                            |
      |--------------------(4) INVITE SDP TA------------------->|
      |                            |                            |
      |<--------------------(5) 200 OK SDP B--------------------|
      |                            |                            |
      |-------------------------(6) ACK------------------------>|
      |                            |                            |
      |--------(7) INVITE--------->|                            |
      |                            |                            |
      |<---(8) 200 OK SDP TA+TB  --|                            |
      |                            |                            |
      |--------------------(9) INVITE SDP TA------------------->|
      |                            |                            |
      |<-------------------(10) 200 OK SDP B--------------------|
      |                            |                            |
      |-------------------------(11) ACK----------------------->|
      |                            |                            |
      |------(12) ACK SDP A+B----->|                            |
      |                            |                            |
      | ************************** | ************************** |
      |*          MEDIA           *|*          MEDIA           *|
      | ************************** | ************************** |
      |                            |                            |


   Figure 5: Caller's invocation of a transcoding service





G. Camarillo et. al.                                         [Page 15]


Internet Draft                    SIP                  February 17, 2003


   We do not include the session descriptions of this flow, since they
   are very similar to the ones in Figure 4. In this flow, if T returns
   the same SDP TA+TB in (8) as in (2), messages (9), (10) and (11) can
   be skipped.

3.3.3 Receiving the Original Stream

   Sometimes, as pointed out in the requirements for SIP in support of
   deaf, hard of hearing and speech-impaired individuals [3], a user
   wants to receive both the original stream (e.g., audio) and the
   transcoded stream (e.g., the output of the speech-to-text
   conversion). There are various possible solutions for this problem.
   One solution consists of using the SDP group attribute with FID
   semantics [11]. FID allows requesting that a stream is sent to two
   different transport addresses in parallel, as shown below:

            a=group:FID 1 2
            m=audio 20000 RTP/AVP 0
            c=IN IP4 A.domain.com
            a=mid:1
            m=audio 30000 RTP/AVP 0
            c=IN IP4 T.domain.com
            a=mid:2



   The problem with this solution is that the majority of the SIP user
   agents do not support FID. And even if FID is supported, many user
   agents do not support sending simultaneous copies of the same media
   stream at the same time. In addition to that, both copies of the
   stream need to use the same codec.

   Therefore, we recommend that T (instead of a user agent) replicates
   the media stream. The following session description requests T to
   perform speech-to-text and text-to-speech conversions between the
   first audio stream and the text stream. In addition, it requests T to
   copy of the first audio stream to the second audio stream and send it
   to A.

            m=audio 40000 RTP/AVP 0
            c=IN IP4 B.domain.com
            a=source:1
            a=sink:2
            m=audio 20000 RTP/AVP 0
            c=IN IP4 A.domain.com
            a=recvonly
            a=sink:1
            m=text 20002 RTP/AVP 96



G. Camarillo et. al.                                         [Page 16]


Internet Draft                    SIP                  February 17, 2003


            c=IN IP4 A.domain.com
            a=rtpmap:96 t140/1000
            a=source:2
            a=sink:1



3.3.4 Transcoding Services in Parallel

   Transcoding services sometimes consist of human relays (e.g., a
   person performing speech-to-text and text-to-speech conversions for a
   session). If the same person is involved in both conversions (i.e.,
   from A to B and from B to A), he or she has access to all the
   conversation. In order to provide some degree of privacy, sometimes
   two different persons are allocated to do the job (i.e., one person
   handles A->B and the other B->A). This type of disposition is also
   useful for automated transcoding services, where one machine converts
   text to synthetic speech (text-to-speech) and a different machine
   performs voice recognition (speech-to-text).

   The scenario just described involves four different sessions; A-T1,
   T1-B, B-T2 and T2-A. Figure 6 shows the call flow where A invokes T1
   and T2.


   (1) INVITE SDP AT1

          m=text 20000 RTP/AVP 96
          c=IN IP4 A.domain.com
          a=rtpmap:96 t140/1000
          a=sendonly
          a=source:1
          m=audio 20000 RTP/AVP 0
          c=IN IP4 0.0.0.0
          a=recvonly
          a=sink:1



   (2) INVITE SDP AT2

          m=text 20002 RTP/AVP 96
          c=IN IP4 A.domain.com
          a=rtpmap:96 t140/1000
          a=recvonly
          a=sink:1
          m=audio 20000 RTP/AVP 0
          c=IN IP4 0.0.0.0



G. Camarillo et. al.                                         [Page 17]


Internet Draft                    SIP                  February 17, 2003


          a=sendonly
          a=source:1



   (3) 200 OK SDP T1A+T1B

          m=text 30000 RTP/AVP 96
          c=IN IP4 T1.domain.com
          a=rtpmap:96 t140/1000
          a=recvonly
          a=source:1
          m=audio 30002 RTP/AVP 0
          c=IN IP4 T1.domain.com
          a=sendonly
          a=sink:1



   (5) 200 OK SDP T2A+T2B

          m=text 40000 RTP/AVP 96
          c=IN IP4 T2.domain.com
          a=rtpmap:96 t140/1000
          a=sendonly
          a=sink:1
          m=audio 40002 RTP/AVP 0
          c=IN IP4 T2.domain.com
          a=recvonly
          a=source:1



   (7) INVITE SDP T1B+T2B

          m=audio 30002 RTP/AVP 0
          c=IN IP4 T1.domain.com
          a=sendonly
          m=audio 40002 RTP/AVP 0
          c=IN IP4 T2.domain.com
          a=recvonly



   (8) 200 OK SDP BT1+BT2

          m=audio 50000 RTP/AVP 0
          c=IN IP4 B.domain.com



G. Camarillo et. al.                                         [Page 18]


Internet Draft                    SIP                  February 17, 2003




  A                          T1                     T2            B

  |                          |                      |             |
  |----(1) INVITE SDP AT1--->|                      |             |
  |                          |                      |             |
  |----------------(2) INVITE SDP AT2-------------->|             |
  |                          |                      |             |
  |<-(3) 200 OK SDP T1A+T1B--|                      |             |
  |                          |                      |             |
  |---------(4) ACK--------->|                      |             |
  |                          |                      |             |
  |<---------------(5) 200 OK SDP T2A+T2B-----------|             |
  |                          |                      |             |
  |----------------------(6) ACK------------------->|             |
  |                          |                      |             |
  |-----------------------(7) INVITE SDP T1B+T2B----------------->|
  |                          |                      |             |
  |<----------------------(8) 200 OK SDP BT1+BT2------------------|
  |                          |                      |             |
  |------(9) INVITE--------->|                      |             |
  |                          |                      |             |
  |-------------------(10) INVITE------------------>|             |
  |                          |                      |             |
  |<-(11) 200 OK SDP T1A+T1B-|                      |             |
  |                          |                      |             |
  |<------------(12) 200 OK SDP T2A+T2B-------------|             |
  |                          |                      |             |
  |------------------(13) INVITE SDP T1B+T2B--------------------->|
  |                          |                      |             |
  |<-----------------(14) 200 OK SDP BT1+BT2----------------------|
  |                          |                      |             |
  |--------------------------(15) ACK---------------------------->|
  |                          |                      |             |
  |---(16) ACK SDP AT1+BT1-->|                      |             |
  |                          |                      |             |
  |------------(17) ACK SDP AT2+BT2---------------->|             |
  |                          |                      |             |
  | ************************ | ********************************** |
  |*          MEDIA         *|*               MEDIA              *|
  | ************************ | ********************************** |
  |                          |                      |             |
  | ***********************************************   ***********
  |*                      MEDIA                    *|*   MEDIA   *|
  | *********************************************** | *********** |
  |                          |                      |             |


   Figure 6: Transcoding services in parallel


G. Camarillo et. al.                                         [Page 19]


Internet Draft                    SIP                  February 17, 2003


          a=recvonly
          m=audio 50002 RTP/AVP 0
          c=IN IP4 B.domain.com
          a=sendonly



   (11) 200 OK SDP T1A+T1B

          m=text 30000 RTP/AVP 96
          c=IN IP4 T1.domain.com
          a=rtpmap:96 t140/1000
          a=recvonly
          a=source:1
          m=audio 30002 RTP/AVP 0
          c=IN IP4 T1.domain.com
          a=sendonly
          a=sink:1



   (12) 200 OK SDP T2A+T2B

          m=text 40000 RTP/AVP 96
          c=IN IP4 T2.domain.com
          a=rtpmap:96 t140/1000
          a=sendonly
          a=sink:1
          m=audio 40002 RTP/AVP 0
          c=IN IP4 T2.domain.com
          a=recvonly
          a=source:1



   Since T1 have returned the same SDP in (11) as in (3) and T2 has
   returned the same SDP in (12) as in (5), messages (13), (14) and (15)
   can be skipped.

   (16) ACK SDP AT1+BT1

          m=text 20000 RTP/AVP 96
          c=IN IP4 A.domain.com
          a=rtpmap:96 t140/1000
          a=sendonly
          a=source:1
          m=audio 50000 RTP/AVP 0
          c=IN IP4 B.domain.com



G. Camarillo et. al.                                         [Page 20]


Internet Draft                    SIP                  February 17, 2003


          a=recvonly
          a=sink:1



   (17) ACK SDP AT2+BT2

          m=text 20002 RTP/AVP 96
          c=IN IP4 A.domain.com
          a=rtpmap:96 t140/1000
          a=recvonly
          a=sink:1
          m=audio 50002 RTP/AVP 0
          c=IN IP4 B.domain.com
          a=sendonly
          a=source:1



   Four media streams have been established at this point:

        1.   Text from A to T1.domain.com:30000

        2.   Audio from T1 to B.domain.com:50000

        3.   Audio from B to T2.domain.com:40002

        4.   Text from T2 to A.domain.com:20002

   Note that B, the user agent server, needs to support two media
   streams; one sendonly and the other recvonly. At present, some user
   agents, although they support a single sendrecv media stream, they do
   not support a different media line per direction. Implementers are
   encouraged to build support for this feature.

3.3.5 Transcoding Services in Serial

   In a distributed environment, a complex transcoding service (e.g.,
   English text to Spanish speech) is often provided by several servers.
   For example, one server performs English text to Spanish text
   translation, and its output is feed into a server that performs
   text-to-speech conversion. The flow in Figure 7 shows how A invokes
   T1 and T2.


4 Security Considerations

   This document describes how to use the REFER method and third party



G. Camarillo et. al.                                         [Page 21]


Internet Draft                    SIP                  February 17, 2003


   call control to invoke transcoding services. It does not introduce
   new security considerations besides the ones discussed in  [8] and
   [6].

5 TODO List

   We need to see whether or not it is possible to use the media policy
   work in the 3pcc model as well (instead of source/sink).

6 Authors' Addresses

   Gonzalo Camarillo
   Ericsson
   Advanced Signalling Research Lab.
   FIN-02420 Jorvas
   Finland
   electronic mail:  Gonzalo.Camarillo@ericsson.com

   Eric W. Burger
   SnowShore Networks, Inc.
   Chelmsford, MA
   USA
   electronic mail:  eburger@snowshore.com

   Henning Schulzrinne
   Dept. of Computer Science
   Columbia University 1214 Amsterdam Avenue, MC 0401
   New York, NY 10027
   USA
   electronic mail:  schulzrinne@cs.columbia.edu

   Arnoud van Wijk
   Viataal
   Research & Development
   Afdeling RDS
   Theerestraat 42
   5271 GD Sint-Michielsgestel
   The Netherlands
   electronic mail:  a.vwijk@viataal.nl

7 Bibliography

   [1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. R. Johnston, J.
   Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: session
   initiation protocol," RFC 3261, Internet Engineering Task Force, June
   2002.

   [2] M. Handley and V. Jacobson, "SDP: session description protocol,"



G. Camarillo et. al.                                         [Page 22]


Internet Draft                    SIP                  February 17, 2003




  A                           T1                    T2            B

  |                           |                     |             |
  |----(1) INVITE SDP A-----> |                     |             |
  |                           |                     |             |
  |<-(2) 200 OK SDP T1A+T1T2- |                     |             |
  |                           |                     |             |
  |----------(3) ACK--------> |                     |             |
  |                           |                     |             |
  |-----------(4) INVITE SDP T1T2------------------>|             |
  |                           |                     |             |
  |<-----------(5) 200 OK SDP T2T1+T2B--------------|             |
  |                           |                     |             |
  |---------------------(6) ACK-------------------->|             |
  |                           |                     |             |
  |---------------------------(7) INVITE SDP T2B----------------->|
  |                           |                     |             |
  |<--------------------------(8) 200 OK SDP B--------------------|
  |                           |                     |             |
  |--------------------------------(9) ACK----------------------->|
  |                           |                     |             |
  |---(10) INVITE-----------> |                     |             |
  |                           |                     |             |
  |------------------(11) INVITE------------------->|             |
  |                           |                     |             |
  |<-(12) 200 OK SDP T1A+T1T2-|                     |             |
  |                           |                     |             |
  |<-------------(13) 200 OK SDP T2T1+T2B-----------|             |
  |                           |                     |             |
  |---(14) ACK SDP T1T2+B---> |                     |             |
  |                           |                     |             |
  |-----------------------(15) INVITE SDP T2B-------------------->|
  |                           |                     |             |
  |<----------------------(16) 200 OK SDP B-----------------------|
  |                           |                     |             |
  |----------------(17) ACK SDP T1T2+B------------->|             |
  |                           |                     |             |
  |----------------------------(18) ACK-------------------------->|
  |                           |                     |             |
  | ************************* | *******************   *********** |
  |*         MEDIA           *|*       MEDIA       *|*   MEDIA   *|
  | ************************* | ******************* | *********** |
  |                           |                     |             |


   Figure 7: Transcoding services in serial

   RFC 2327, Internet Engineering Task Force, Apr. 1998.

   [3] N. Charlton, M. Gasson, G. Gybels, M. Spanner, and A. van Wijk,
G. Camarillo et. al.                                         [Page 23]


Internet Draft                    SIP                  February 17, 2003


   RFC 3351, Internet Engineering Task Force, Aug. 2002.

   [4] S. Floyd and L. Daigle, "IAB architectural and policy
   considerations for open pluggable edge services," RFC 3238, Internet
   Engineering Task Force, Jan. 2002.

   [5] J. Rosenberg and H. Schulzrinne, "An offer/answer model with
   session description protocol (SDP)," RFC 3264, Internet Engineering
   Task Force, June 2002.

   [6] J. Rosenberg, J. Peterson, H. Schulzrinne, and G. Camarillo,
   "Best current practices for third party call control in the session
   initiation protocol," internet draft, Internet Engineering Task
   Force, June 2002.  Work in progress.

   [7] J. Rosenberg, "A framework for conferencing with the session
   initiation protocol," internet draft, Internet Engineering Task
   Force, Nov. 2002.  Work in progress.

   [8] R. Sparks, "The SIP refer method," internet draft, Internet
   Engineering Task Force, Dec. 2002.  Work in progress.

   [9] B. Biggs, R. Dean, and R. Mahy, "The session inititation protocol
   (SIP)," internet draft, Internet Engineering Task Force, May 2002.
   Work in progress.

   [10] G. Camarillo, H. Schulzrinne, and E. Burger, "The source and
   sink attributes for the session description protocol," internet
   draft, Internet Engineering Task Force, Sept. 2002.  Work in
   progress.

   [11] G. Camarillo, J. Holler, G. Eriksson, and H. Schulzrinne,
   "Grouping of m lines in SDP," internet draft, Internet Engineering
   Task Force, Feb. 2002.  Work in progress.



   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances of
   licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such



G. Camarillo et. al.                                         [Page 24]


Internet Draft                    SIP                  February 17, 2003


   proprietary rights by implementors or users of this specification can
   be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.

   Full Copyright Statement

   Copyright (c) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.















G. Camarillo et. al.                                         [Page 25]