RTP Payload Format for the Speex Codec
RFC 5574
Document | Type | RFC - Proposed Standard (June 2009) | |
---|---|---|---|
Authors | Aymeric Moizard , Greg Herlein , Jean-Marc Valin , Alfred Heggestad | ||
Last updated | 2015-10-14 | ||
RFC stream | Internet Engineering Task Force (IETF) | ||
IESG | Responsible AD | Cullen Fluffy Jennings | |
Send notices to | (None) |
";vbr=on

   Some example SDP session descriptions utilizing Speex encodings
   follow.

5.1.  Example Supporting All Modes, Prefer Mode 4

   The offerer indicates that it wishes to receive a Speex stream at
   8000 Hz, and wishes to receive Speex 'mode 4'.

   It is important to understand that any other mode might still be
   sent by the remote party: the device might have a bandwidth
   limitation or might only be able to send 'mode="3"'.  Thus,
   applications that support all decoding modes SHOULD include
   'mode="any"' as shown in the example below:

      m=audio 8088 RTP/AVP 97
      a=rtpmap:97 speex/8000
      a=fmtp:97 mode="4,any"

5.2.  Example Supporting Only Modes 3 and 5

   The offerer indicates the mode it wishes to receive (Speex
   'mode 3').  This offer indicates that mode 3 and mode 5 are
   supported and that no other modes are supported.  The remote party
   MUST NOT configure its encoder using another Speex mode.

      m=audio 8088 RTP/AVP 97
      a=rtpmap:97 speex/8000
      a=fmtp:97 mode="3,5"

5.3.  Example with Variable Bit-Rate and Comfort Noise

   The offerer indicates that it wishes to receive variable bit-rate
   frames with comfort noise:

      m=audio 8088 RTP/AVP 97
      a=rtpmap:97 speex/8000
      a=fmtp:97 vbr=on;cng=on

5.4.  Example with Voice Activity Detection

   The offerer indicates that it wishes to use silence suppression.  In
   this case, the vbr=vad parameter will be used:

      m=audio 8088 RTP/AVP 97
      a=rtpmap:97 speex/8000
      a=fmtp:97 vbr=vad

5.5.  Example with Multiple Sampling Rates

   The offerer indicates that it wishes to receive Speex audio at
   16000 Hz with mode 10 (42.2 kbit/s) or, alternatively, Speex audio
   at 8000 Hz with mode 7 (24.6 kbit/s).  The offerer supports decoding
   all modes.

      m=audio 8088 RTP/AVP 97 98
      a=rtpmap:97 speex/16000
      a=fmtp:97 mode="10,any"
      a=rtpmap:98 speex/8000
      a=fmtp:98 mode="7,any"

5.6.  Example with Ptime and Multiple Speex Frames

   The "ptime" SDP attribute is used to denote the packetization
   interval (i.e., how many milliseconds of audio are encoded in a
   single RTP packet).  Since Speex uses 20 msec frames, ptime values
   that are multiples of 20 denote multiple Speex frames per packet.
   It is recommended to use ptime values that are a multiple of 20.

   If ptime contains a value that is not a multiple of 20, the internal
   interpretation of it should be rounded up to the nearest multiple of
   20 before the number of Speex frames is calculated.  For example, if
   the "ptime" attribute is set to 30, the internal interpretation
   should be rounded up to 40 and then used to calculate two Speex
   frames per packet.

   In the example below, the ptime value is set to 40, indicating that
   there are two frames in each packet.

      m=audio 8088 RTP/AVP 97
      a=rtpmap:97 speex/8000
      a=ptime:40

   Note that the ptime parameter applies to all payloads listed in the
   media line and is not used as part of an a=fmtp directive.

   Care must be taken when setting the value of ptime so that the RTP
   packet size does not exceed the path MTU.

5.7.  Example with Complete Offer/Answer Exchange

   The offerer indicates that it wishes to receive Speex audio at
   16000 Hz or, alternatively, Speex audio at 8000 Hz.  The offerer
   supports ALL modes because no mode is specified.

      m=audio 8088 RTP/AVP 97 98
      a=rtpmap:97 speex/16000
      a=rtpmap:98 speex/8000

   The answerer indicates that it wishes to receive Speex audio at
   8000 Hz, which is the only sampling rate it supports.  The answerer
   supports ALL modes because no mode is specified.

      m=audio 8088 RTP/AVP 99
      a=rtpmap:99 speex/8000

6.  Implementation Guidelines

   Implementations that support Speex are responsible for correctly
   decoding incoming Speex frames.

   Each Speex frame contains all the information needed to decode
   itself.
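
   As an illustration of the SDP handling described in Section 5, the
   following minimal Python sketch shows how a sender might turn a
   ptime value into a number of Speex frames per packet (rounding up to
   a multiple of 20 ms, as in Section 5.6) and how a 'mode' value such
   as "4,any" might be split into its parts.  The function names
   frames_per_packet and parse_mode_param are hypothetical; they are
   not part of this specification or of any Speex library, and the
   mode parser assumes only the value shapes shown in the examples
   above.

```python
from typing import List, Tuple

SPEEX_FRAME_MS = 20  # Speex frame duration, as stated in Section 5.6


def frames_per_packet(ptime_ms: int) -> int:
    """Round ptime up to a multiple of 20 ms and return the frame count."""
    if ptime_ms <= 0:
        raise ValueError("ptime must be a positive number of milliseconds")
    # Integer ceiling division: 30 -> 2 frames (40 ms), 40 -> 2 frames.
    return (ptime_ms + SPEEX_FRAME_MS - 1) // SPEEX_FRAME_MS


def parse_mode_param(value: str) -> Tuple[List[int], bool]:
    """Split a mode value such as '"4,any"' into (listed modes, accepts_any)."""
    modes: List[int] = []
    accepts_any = False
    # Drop surrounding whitespace and the double quotes used in the examples.
    for item in value.strip().strip('"').split(","):
        item = item.strip()
        if item == "any":
            accepts_any = True
        elif item:
            modes.append(int(item))
    return modes, accepts_any


if __name__ == "__main__":
    print(frames_per_packet(30))
    print(parse_mode_param('"4,any"'))
```

   Both helpers concern SDP negotiation and packetization on the
   sending side only.
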
   In particular, the 'mode' and 'ptime' values proposed in the SDP
   contents MUST NOT be used for decoding: those values are not needed
   to properly decode an RTP Speex stream.

7.  Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [RFC3550], and any appropriate RTP profile.  This
   implies that confidentiality of the media streams is achieved by
   encryption.  Because the data compression used with this payload
   format is applied end-to-end, encryption may be performed after
   compression so there is no conflict between the two operations.

   A potential denial-of-service threat exists for data encodings using
   compression techniques that have non-uniform receiver-end
   computational load.  The attacker can inject pathological datagrams
   into the stream that are complex to decode and cause the receiver to
   be overloaded.  However, this encoding does not exhibit any
   significant non-uniformity.

   As with any IP-based protocol, in some circumstances, a receiver may
   be overloaded simply by the receipt of too many packets, either
   desired or undesired.  Network-layer authentication may be used to
   discard packets from undesired sources, but the processing cost of
   the authentication itself may be too high.

8.  Acknowledgments

   The authors would like to thank Equivalence Pty Ltd of Australia for
   their assistance in attempting to standardize the use of Speex in
   H.323 applications, and for implementing Speex in their open-source
   OpenH323 stack.  The authors would also like to thank Brian C. Wiles
   <brian@streamcomm.com> of StreamComm for his assistance in
   developing the proposed standard for Speex use in H.323
   applications.

   The authors would also like to thank the following members of the
   Speex and AVT communities for their input: Ross Finlayson, Federico
   Montesino Pouzols, Henning Schulzrinne, Magnus Westerlund, Colin
   Perkins, and Ivo Emanuel Goncalves.

   Thanks to former authors of this document: Simon Morlat, Roger
   Hardiman, and Phil Kerr.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

9.2.  Informative References

   [CELP]     Schroeder, M. and B. Atal, "Code-excited linear
              prediction (CELP): High-quality speech at very low bit
              rates", Proc. International Conference on Acoustics,
              Speech, and Signal Processing (ICASSP), Vol 10, pp.
              937-940, 1985, <http://www.ntis.gov/>.

   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
              Registration Procedures", BCP 13, RFC 4288, December
              2005.

   [SPEEX]    Valin, J., "The Speex Codec Manual",
              <http://www.speex.org/docs/>.

Authors' Addresses

   Greg Herlein
   Independent
   2034 Filbert Street
   San Francisco, California  94123
   United States

   EMail: gherlein@herlein.com


   Jean-Marc Valin
   Xiph.Org Foundation

   EMail: jean-marc.valin@usherbrooke.ca


   Alfred E. Heggestad
   Creytiv.com
   Biskop J. Nilssonsgt. 20a
   Oslo  0659
   Norway

   EMail: aeh@db.org


   Aymeric Moizard
   Antisip
   5 Place Benoit Crepu
   Lyon, 69005
   France

   EMail: jack@atosc.org