
RTP Payload Format for the Speex Codec
RFC 5574

Document Type RFC - Proposed Standard (June 2009)
Authors Aymeric Moizard , Greg Herlein , Jean-Marc Valin , Alfred Heggestad
Last updated 2015-10-14
RFC stream Internet Engineering Task Force (IETF)
IESG Responsible AD Cullen Fluffy Jennings

   Some example SDP session descriptions utilizing Speex encodings
   follow.

5.1.  Example Supporting All Modes, Prefer Mode 4

   The offerer indicates that it wishes to receive a Speex stream at
   8000 Hz, and wishes to receive Speex 'mode 4'.  It is important to
   understand that any other mode might still be sent by the remote
   party: the device might have a bandwidth limitation or might only be
   able to send 'mode="3"'.  Thus, applications that support all
   decoding modes SHOULD include 'mode="any"' as shown in the example
   below:

             m=audio 8088 RTP/AVP 97
             a=rtpmap:97 speex/8000
             a=fmtp:97 mode="4,any"

5.2.  Example Supporting Only Modes 3 and 5

   The offerer indicates the mode it wishes to receive (Speex 'mode 3').
   This offer indicates mode 3 and mode 5 are supported and that no
   other modes are supported.  The remote party MUST NOT configure its
   encoder using another Speex mode.

             m=audio 8088 RTP/AVP 97
             a=rtpmap:97 speex/8000
             a=fmtp:97 mode="3,5"
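
   The difference between the two examples above (a mode list with
   or without "any") can be illustrated with a short sketch.  This is
   not part of the specification: the function names are
   hypothetical, and treating the first listed mode as the preferred
   one is a reading of the examples in Sections 5.1 and 5.2.

```python
def parse_fmtp_params(fmtp_value):
    """Split a string such as 'mode="3,5";vbr=on' into a dict."""
    params = {}
    for item in fmtp_value.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        # Parameter names are lower-cased; surrounding quotes are dropped.
        params[name.strip().lower()] = value.strip().strip('"')
    return params


def allowed_modes(fmtp_value):
    """Return (preferred, allowed); allowed is None when any mode is accepted."""
    mode = parse_fmtp_params(fmtp_value).get("mode")
    if mode is None:
        # No mode parameter: every mode is acceptable, none is preferred.
        return None, None
    entries = [m.strip() for m in mode.split(",") if m.strip()]
    numeric = [int(m) for m in entries if m != "any"]
    preferred = numeric[0] if numeric else None
    allowed = None if "any" in entries else set(numeric)
    return preferred, allowed


def encoder_may_use(fmtp_value, mode):
    """True if the remote encoder may be configured with the given mode."""
    _, allowed = allowed_modes(fmtp_value)
    return allowed is None or mode in allowed
```

   With 'mode="3,5"' the helper rejects mode 4, while with
   'mode="4,any"' every mode is accepted and mode 4 is reported as
   the preference.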

5.3.  Example with Variable Bit-Rate and Comfort Noise

   The offerer indicates that it wishes to receive variable bit-rate
   frames with comfort noise:

             m=audio 8088 RTP/AVP 97
             a=rtpmap:97 speex/8000
             a=fmtp:97 vbr=on;cng=on

Herlein, et al.             Standards Track                    [Page 10]
RFC 5574                         Speex                         June 2009

5.4.  Example with Voice Activity Detection

   The offerer indicates that it wishes to use silence suppression.  In
   this case, the vbr=vad parameter will be used:

             m=audio 8088 RTP/AVP 97
             a=rtpmap:97 speex/8000
             a=fmtp:97 vbr=vad

5.5.  Example with Multiple Sampling Rates

   The offerer indicates that it wishes to receive Speex audio at 16000
   Hz with mode 10 (42.2 kbit/s) or, alternatively, Speex audio at 8000
   Hz with mode 7 (24.6 kbit/s).  The offerer supports decoding all
   modes.

             m=audio 8088 RTP/AVP 97 98
             a=rtpmap:97 speex/16000
             a=fmtp:97 mode="10,any"
             a=rtpmap:98 speex/8000
             a=fmtp:98 mode="7,any"

5.6.  Example with Ptime and Multiple Speex Frames

   The "ptime" SDP attribute is used to denote the packetization
   interval (i.e., how many milliseconds of audio are encoded in a
   single RTP packet).  Since Speex uses 20 msec frames, ptime values
   that are multiples of 20 denote a whole number of Speex frames per
   packet.  It is recommended to use ptime values that are a multiple
   of 20.

   If ptime contains a value that is not a multiple of 20, the value
   should be rounded up internally to the nearest multiple of 20 before
   the number of Speex frames is calculated.  For example, if the
   "ptime" attribute is set to 30, it should be rounded up to 40,
   which yields two Speex frames per packet.
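
   The rounding rule can be written out as a small sketch.  The
   function name is hypothetical; the 20 msec frame duration and the
   round-up behaviour are taken from the text above.

```python
SPEEX_FRAME_MS = 20  # Speex frame duration stated in the text


def frames_per_packet(ptime_ms):
    """Number of Speex frames implied by a ptime value in milliseconds."""
    if ptime_ms <= 0:
        raise ValueError("ptime must be a positive number of milliseconds")
    # Ceiling division rounds ptime up to the next multiple of 20 ms.
    rounded_ms = -(-ptime_ms // SPEEX_FRAME_MS) * SPEEX_FRAME_MS
    return rounded_ms // SPEEX_FRAME_MS
```

   For instance, frames_per_packet(30) and frames_per_packet(40) both
   give 2, matching the example in the text.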

   In the example below, the ptime value is set to 40, indicating that
   there are two frames in each packet.

             m=audio 8088 RTP/AVP 97
             a=rtpmap:97 speex/8000
             a=ptime:40

   Note that the ptime parameter applies to all payloads listed in the
   media line and is not used as part of an a=fmtp directive.

   Care must be taken when setting the value of ptime so that the RTP
   packet size does not exceed the path MTU.
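
   As a rough illustration of this concern, the sketch below
   estimates the size of a constant bit-rate Speex packet on the
   wire.  It rests on assumptions not stated in this section: a
   12-octet RTP header with no CSRC entries or extensions, UDP over
   IPv4 without options, frames packed back to back with padding
   only in the final octet, and a hypothetical path MTU of 1500
   octets.  The bit-rates used in the usage note are those quoted in
   Section 5.5; all function names are illustrative.

```python
import math

RTP_HEADER = 12   # fixed RTP header, no CSRC list or extension (assumed)
UDP_HEADER = 8
IPV4_HEADER = 20  # IPv4 header without options (assumed)
FRAME_MS = 20     # Speex frame duration


def speex_packet_bytes(bitrate_bps, ptime_ms):
    """Estimate the IP packet size for constant bit-rate Speex."""
    frames = math.ceil(ptime_ms / FRAME_MS)
    bits_per_frame = bitrate_bps * FRAME_MS // 1000
    # Frames are assumed to be concatenated, then padded to a whole octet.
    payload = math.ceil(frames * bits_per_frame / 8)
    return IPV4_HEADER + UDP_HEADER + RTP_HEADER + payload


def fits_mtu(bitrate_bps, ptime_ms, mtu=1500):
    """True if the estimated packet fits in the (assumed) path MTU."""
    return speex_packet_bytes(bitrate_bps, ptime_ms) <= mtu
```

   Under these assumptions, mode 7 at 8000 Hz (24.6 kbit/s) with a
   ptime of 40 gives an estimated 163 octets, and mode 10 at 16000
   Hz (42.2 kbit/s) with a ptime of 20 gives 146 octets.  Variable
   bit-rate streams would need a worst-case rate instead.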

5.7.  Example with Complete Offer/Answer Exchange

   The offerer indicates that it wishes to receive Speex audio at 16000
   Hz or, alternatively, Speex audio at 8000 Hz.  Because no mode is
   specified, the offerer supports all modes.

             m=audio 8088 RTP/AVP 97 98
             a=rtpmap:97 speex/16000
             a=rtpmap:98 speex/8000

   The answerer indicates that it wishes to receive Speex audio at 8000
   Hz, which is the only sampling rate it supports.  Because no mode is
   specified, the answerer supports all modes.

             m=audio 8088 RTP/AVP 99
             a=rtpmap:99 speex/8000
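
   The outcome of this exchange can be sketched by matching the
   "a=rtpmap" entries of the offer and the answer on encoding name
   and clock rate.  This is only an illustration with hypothetical
   function names; it ignores fmtp parameters and the wider
   offer/answer rules.

```python
def parse_rtpmap(sdp_lines):
    """Map payload type -> (encoding name, clock rate) from a=rtpmap lines."""
    formats = {}
    prefix = "a=rtpmap:"
    for line in sdp_lines:
        line = line.strip()
        if not line.startswith(prefix):
            continue
        payload_type, _, encoding = line[len(prefix):].partition(" ")
        name, _, rest = encoding.strip().partition("/")
        # The clock rate may be followed by "/<channels>"; keep the rate only.
        rate = int(rest.split("/")[0])
        formats[int(payload_type)] = (name.lower(), rate)
    return formats


def common_formats(offer_lines, answer_lines):
    """List (offer PT, answer PT, format) for formats present in both."""
    offer = parse_rtpmap(offer_lines)
    answer = parse_rtpmap(answer_lines)
    matches = []
    for answer_pt, answer_fmt in answer.items():
        for offer_pt, offer_fmt in offer.items():
            if answer_fmt == offer_fmt:
                matches.append((offer_pt, answer_pt, answer_fmt))
    return matches
```

   Applied to the offer and answer above, the only match is Speex at
   8000 Hz, offered as payload type 98 and answered as payload type
   99.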

6.  Implementation Guidelines

   Implementations that support Speex are responsible for correctly
   decoding incoming Speex frames.

   Each Speex frame contains all the information needed to decode
   itself.  In particular, the 'mode' and 'ptime' values proposed in the
   SDP contents MUST NOT be used for decoding: those values are not
   needed to properly decode an RTP Speex stream.

7.  Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [RFC3550], and any appropriate RTP profile.  This
   implies that confidentiality of the media streams is achieved by
   encryption.  Because the data compression used with this payload
   format is applied end-to-end, encryption may be performed after
   compression so there is no conflict between the two operations.

   A potential denial-of-service threat exists for data encodings using
   compression techniques that have non-uniform receiver-end
   computational load.  The attacker can inject pathological datagrams
   into the stream that are complex to decode and cause the receiver to
   be overloaded.  However, this encoding does not exhibit any
   significant non-uniformity.

   As with any IP-based protocol, in some circumstances, a receiver may
   be overloaded simply by the receipt of too many packets, either
   desired or undesired.  Network-layer authentication may be used to
   discard packets from undesired sources, but the processing cost of
   the authentication itself may be too high.

8.  Acknowledgments

   The authors would like to thank Equivalence Pty Ltd of Australia for
   their assistance in attempting to standardize the use of Speex in
   H.323 applications, and for implementing Speex in their open-source
   OpenH323 stack.  The authors would also like to thank Brian C. Wiles
   <brian@streamcomm.com> of StreamComm for his assistance in developing
   the proposed standard for Speex use in H.323 applications.

   The authors would also like to thank the following members of the
   Speex and AVT communities for their input: Ross Finlayson, Federico
   Montesino Pouzols, Henning Schulzrinne, Magnus Westerlund, Colin
   Perkins, and Ivo Emanuel Goncalves.

   Thanks to the former authors of this document: Simon Morlat, Roger
   Hardiman, and Phil Kerr.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

9.2.  Informative References

   [CELP]     Schroeder, M. and B. Atal, "Code-excited linear
              prediction (CELP): High-quality speech at very low bit
              rates", Proc. International Conference on Acoustics,
              Speech, and Signal Processing (ICASSP), Vol 10,
              pp. 937-940, 1985, <http://www.ntis.gov/>.

   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
              Registration Procedures", BCP 13, RFC 4288, December 2005.

   [SPEEX]    Valin, J., "The Speex Codec Manual",
              <http://www.speex.org/docs/>.

Authors' Addresses

   Greg Herlein
   Independent
   2034 Filbert Street
   San Francisco, California  94123
   United States

   EMail: gherlein@herlein.com

   Jean-Marc Valin
   Xiph.Org Foundation

   EMail: jean-marc.valin@usherbrooke.ca

   Alfred E. Heggestad
   Creytiv.com
   Biskop J. Nilssonsgt. 20a
   Oslo  0659
   Norway

   EMail: aeh@db.org

   Aymeric Moizard
   Antisip
   5 Place Benoit Crepu
   Lyon,   69005
   France

   EMail: jack@atosc.org
