Internet Engineering Task Force             Ari Lakaniemi, Nokia
Audio Video Transport WG                    Pasi Ojala, Nokia
INTERNET-DRAFT                              Johan Sjöberg, Ericsson
February 23, 2001                           Magnus Westerlund, Ericsson
Expires: August 23, 2001
RTP payload format for AMR-WB
<draft-lakaniemi-avt-amrwb-00.txt>
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or cite them other than as "work in progress".
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/lid-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This document is an individual submission to the IETF. Comments
should be directed to the authors.
Abstract
This document specifies a real-time transport protocol (RTP) payload
format for Adaptive Multi-Rate Wideband (AMR-WB) speech encoded
signals. The AMR-WB payload format is designed to be able to
interoperate with existing AMR-WB transport formats. This document
also includes a MIME type registration for AMR-WB.
Lakaniemi/Ojala/Sjoberg/Westerlund [Page 1]
INTERNET-DRAFT RTP Payload Format for AMR-WB February 23, 2001
1. Introduction
The Adaptive Multi-Rate Wideband (AMR-WB) speech codec [1] was
originally developed by the Third Generation Partnership Project
(3GPP) to be used in GSM and 3G systems, so the codec is expected to
be widely used in cellular systems. The AMR-WB codec was developed to
preserve high speech quality under a wide range of transmission
conditions.
The AMR-WB codec is a multi-mode speech codec with 9 wideband speech
coding modes with bit-rates between 6.6 and 23.85 kbps. The sampling
frequency is 16000 Hz and processing is performed on 20 ms frames,
i.e. 320 speech samples per frame. The AMR-WB modes are closely
related to each other and employ the same coding framework. Mode
adaptation functionality is one valuable aspect of the AMR-WB
operation. In mobile radio systems (GSM) it allows the system to
adapt the balance between speech coding and error protection to
enable best possible speech quality in prevailing transmission
conditions. In addition, AMR-WB mode adaptation can be used to adapt
to varying available transmission bandwidth. In principle, a change
to any mode can occur at any time.
The name and operational principles of the AMR-WB codec largely
resemble those of the Adaptive Multi-Rate (AMR-NB) codec [2,12].
However, these are two separate speech codecs, the principal
difference being that AMR-NB is so-called narrow band speech coding,
using 8000 Hz sampling frequency, compared to 16000 Hz of the AMR-WB.
The AMR-WB codec is designed with a voice activity detector (VAD) [6]
and generation of comfort noise (CN) parameters during silence
periods [5]. Hence, the AMR-WB codec can reduce the number of
transmitted bits and packets during silence periods to a minimum. The
operation of sending silence descriptor (SID) frames containing CN
parameters at regular intervals during non-speech periods is usually called
discontinuous transmission (DTX) or source controlled rate (SCR)
operation [4].
AMR-WB implementations must support all 9 speech coding modes. AMR-WB
mode switching can occur between any speech frames, and current mode
must be indicated by transmitting the mode information together with
the speech encoded bits. The objective of AMR-WB design has been to
enable highest possible speech quality under a variety of
transmission channel conditions. To realize the mode adaptation the
receiver needs to signal the AMR-WB mode it prefers to receive to the
transmitter.
Due to the flexibility and robustness of AMR-WB, it is suitable also
for other purposes than circuit switched cellular systems. Other
suitable applications are real-time services over packet switched
networks. The payload format should be designed for robustness
against both bit errors and packet loss. The speech encoded bits have
different perceptual sensitivity to bit errors and cellular systems
exploit this by using unequal error protection and detection (UEP and
UED).
The UED/UEP mechanisms focus the correction and detection of corrupted
bits on the perceptually most sensitive bits. A speech frame is only
declared damaged if there are bit errors in the most sensitive bits,
i.e. class A bits. It is acceptable to have some bit errors in the
other bits, i.e. class B and C. Also a damaged frame is still useful
for error concealment in the decoding, which uses some of the less
sensitive bits of the damaged data. This improves the speech quality
compared to discarding the data.
Today there exist some link layers that do not discard packets with
bit errors, e.g. SLIP and some wireless links (with the Internet
traffic pattern shifting towards a more media-centric one, more link
layers of such nature may emerge in the future). With transport layer
support for partial checksums, for example those supported by UDP-
Lite [14], bit error tolerant AMR-WB traffic could achieve better
performance over these types of links.
There are at least two basic approaches for carrying AMR-WB traffic
over bit error tolerant networks:
1) Utilizing a partial checksum to cover headers and the most
important AMR-WB speech bits of the payload. It is recommended
that at least all class A bits are covered by the checksum.
2) Utilizing a partial checksum to cover only the headers, but with a
frame CRC covering the class A bits of each AMR-WB frame in the
payload.
In either approach, at least part of the class B/C bits are left
without error-check and thus bit error tolerance is achieved.
It is still important that the network designer pays attention to the
class B and C residual bit error rate. Though less sensitive to error
than class A bits, class B and C bits are not insignificant and
undetected errors in these bits cause degradation in speech quality.
An example of residual error rates considered acceptable for AMR-WB
in UMTS can be found in [17].
Approach 1 is bit-efficient, flexible, and simple, but comes with
two disadvantages, namely, a) bit errors in protected speech bits
will cause the payload to be discarded, and b) when transporting
multiple frames in a payload there is the possibility that a single
bit error in protected bits gets all the frames discarded.
These disadvantages can be avoided if needed, with some overhead in
the form of a frame-wise CRC (Approach 2). Regarding a), the CRC
makes it possible to detect bit errors in class A bits and use the
frame for error concealment, which gives a small improvement in
speech quality. Regarding b), when transporting multiple frames in a
payload, the CRCs remove the possibility that a single bit error in a
class A bit gets all the frames discarded. Avoiding this improves
speech quality when multiple frames are transported over links
subject to bit errors.
The choice between the two approaches must be made based on the
available bandwidth, and desired tolerance to bit errors. Neither
solution is appropriate to all cases.
To achieve better robustness against packet loss the payload supports
Forward Error Correction (FEC). The simple scheme of repetition of
previously sent data is one possibility. Another possible scheme,
which is more bandwidth efficient, is to use payload external FEC,
e.g. RFC 2733, which generates extra packets containing repair data.
The whole payload can also be sorted in sensitivity order to support
external FEC schemes using UEP. There is work in progress on a
generic version of such a scheme [15].
Yet another mechanism to enhance error robustness is the interleaving
of AMR-WB speech frames. Several frames are sometimes encapsulated
into a single RTP packet to decrease protocol overhead. One drawback
of this approach is that a packet loss then means the loss of several
consecutive speech frames, which usually causes clearly audible
distortion in the reconstructed speech. Interleaving the frames can
improve the speech quality in such cases by distributing the
consecutive losses into a series of single frame losses. However,
interleaving and bundling several frames per payload also increase
end-to-end delay and are therefore not applicable to all usage
scenarios. Streaming applications, for example, are likely to be able
to exploit interleaving to improve speech quality in lossy
transmission conditions.
2. Requirements
The AMR-WB RTP payload format was designed to meet the following
requirements:
o Different levels of robustness must be supported, from no
redundant data to extreme robustness capable of handling very high
packet loss rates with no or small speech quality degradation.
o Fast, bandwidth efficient, frame-wise AMR-WB mode adaptation must
be supported. This means that it must be possible to send Codec
Mode Requests back from the receiving side to the transmitting
side with information on the preferred mode.
o Source controlled rate operation (SCR) (also called DTX) and
comfort noise parameter (CN) transmission defined in AMR-WB must
be supported.
3. Payload format
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [9].
The AMR-WB payload format supports transmission of multiple frames
per payload, the use of fast codec mode adaptation, and robustness
against packet losses and bit errors.
The AMR-WB payload format consists of one payload header, a table of
content, optionally one CRC per payload frame, and zero or more AMR-
WB payload frames. The payload format is made as bandwidth efficient
as possible by not using octet alignment for the payload header,
table of content or the payload frames. However, the full payload is
octet aligned. Therefore any unused bits in the last octet MUST be
padded with zeros.
If the option to transmit a robust sorted payload is enabled by the
receiver, the transmitter may choose to sort the bits in the payload
according to descending bit error sensitivity in order to enable
UEP/UED outside RTP (e.g. UDP-lite). The sensitivity order for AMR-WB
encoded speech bits for each mode is defined in Annex B of [3], the
original bit order being as delivered by the AMR-WB speech encoder
[1]. The AMR-WB frame types, or modes, are defined in [3].
Robustness against packet loss can be accomplished by using the
possibility to retransmit previously transmitted frames together with
the current (new) frame or frames. Another approach is using
interleaving to reduce the effect of packet losses on speech quality.
Note that the usage of these options can be restricted by the MIME
parameters during the session set-up. The AMR-WB performance over
error tolerant links can be improved by delivering also the speech
frames that have been corrupted with bit errors. However, UEP/UED
MUST be used in such a way that the bit errors are allowed only in
the least error sensitive bits. Bit errors in class A bits MUST NOT
be allowed in any circumstances. This payload format provides two
alternative methods to implement UED:
A. CRC calculation over the class A speech bits
If several consecutive speech frames are encapsulated into each
payload, the optional CRC may be used to protect the class A speech
bits of each frame, see table 1. The number of class A bits is
specified as informative in [3] and therefore copied into table 1
as normative for this payload format. Speech frames with errors in
class A bits MUST be marked with SPEECH_BAD for corrupted speech
frames (FT=0..8) or SID_BAD for corrupted SID frames (FT=9), and be
sent to the speech decoder to assist error concealment, see [7]. In
this case the RTP header, payload header, and table of content
should be covered by a transport layer CRC, e.g. UDP-lite. A packet
MUST be discarded if the transport layer CRC detects errors in
these bits.
B. Robust sorting of payload bits
Robust behavior can also be accomplished by robust sorting of the
payload. This enables the use of UED (e.g. UDP-lite) and UEP (e.g.
ULP [15]). Note that payloads containing a single frame are sorted
in the same robust way regardless of the use of simple or robust
sorting. The UED and/or UEP is recommended to cover at least the
RTP header, payload header, table of content and all class A bits
from all frames in the payload.
Support for unequal error detection is OPTIONAL. If either scheme is
to be used, it MUST be signaled out of band (see section 8).
                         Class A    total speech
Index   Mode              bits         bits
------------------------------------------------
  0     AMR-WB 6.6         54           132
  1     AMR-WB 8.85        64           177
  2     AMR-WB 12.65       72           253
  3     AMR-WB 14.25       72           285
  4     AMR-WB 15.85       72           317
  5     AMR-WB 18.25       72           365
  6     AMR-WB 19.85       72           397
  7     AMR-WB 23.05       72           461
  8     AMR-WB 23.85       72           477
  9     AMR-WB SID         40            40
Table 1. Specification of the number of class A bits for AMR-WB.
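As a non-normative illustration, the per-mode frame sizes can be held in a small lookup. The sketch below derives the total number of speech bits from the mode bit-rate and the 20 ms frame length given in section 1 (for example 12650 / 50 = 253 bits, which matches the frame length used in section 7.1) and takes the class A counts from Table 1. The function names are illustrative only, and treating FT 10..13 as unknown is an assumption of this sketch, since those values are not described in this document.

```c
#include <assert.h>

/* Class A bit counts for FT 0..9, copied from Table 1. */
static const int amrwb_class_a[10] = {
    54, 64, 72, 72, 72, 72, 72, 72, 72, 40
};

/* Mode bit-rates in bit/s for FT 0..8 (section 1). */
static const int amrwb_mode_bps[9] = {
    6600, 8850, 12650, 14250, 15850, 18250, 19850, 23050, 23850
};

/* Bits carried by one frame of type ft.
 * Returns 0 for FT=14/15 (no payload frame present) and -1 for
 * FT 10..13, which this sketch treats as unknown (assumption). */
int amrwb_frame_bits(int ft)
{
    assert(ft >= 0 && ft <= 15);
    if (ft <= 8)
        return amrwb_mode_bps[ft] / 50;  /* 50 frames per second */
    if (ft == 9)
        return 40;                       /* SID frame, Table 1 */
    if (ft >= 14)
        return 0;
    return -1;
}

/* Class A bits of one frame of type ft, or 0 if no frame data. */
int amrwb_class_a_bits(int ft)
{
    assert(ft >= 0 && ft <= 15);
    return ft <= 9 ? amrwb_class_a[ft] : 0;
}
```

The number of class B and C bits of a speech frame then follows as amrwb_frame_bits(ft) - amrwb_class_a_bits(ft).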
The speech quality in channel error conditions can be improved by
also delivering to the receiver frames that were corrupted, e.g. in
transmission over a radio link. Despite the bit errors, providing
damaged frames to the error concealment unit can improve the speech
quality compared to the case where corrupted frames are dropped.
However, to accomplish this, a frame quality indicator is needed to
mark the corrupted frames for the decoder. In many communication scenarios the
AMR-WB frames will be transmitted from one IP/UDP/RTP terminal to a
terminal in a system with another transport format and/or vice versa.
The transport format transcoding will be done in a gateway. A second
likely scenario is that IP/UDP/RTP is used as transport between other
systems, i.e. IP is originated and terminated in gateways on both
sides of the IP transport.
AMR-WB over    +------+                        +----------+
3G Iu or       |      |   IP/UDP/RTP/AMR-WB    |          |
-------------->|  GW  |----------------------->| TERMINAL |
GSM Abis       |      |                        |          |
etc.           +------+                        +----------+
Figure 1: GW to VoIP terminal scenario.
AMR-WB over    +------+                     +------+ AMR-WB over
3G Iu or       |      |  IP/UDP/RTP/AMR-WB  |      | 3G Iu or
-------------->|  GW  |-------------------->|  GW  |--------------->
GSM Abis       |      |                     |      | GSM Abis
etc.           +------+                     +------+ etc.
Figure 2: GW to GW scenario.
The speech quality in case of packet losses when transmitting several
AMR-WB frames per packet can be improved by using OPTIONAL frame
interleaving. The interleaving improves perceived speech quality
since it introduces single frame errors instead of several
consecutive frame errors. Note that interleaving can be applied only
if the receiver has signaled support for it in the capability
description.
3.1. The payload header
The length of the payload header is either 7 or 15 bits, depending on
whether the interleaving is used or not. Figures 3a and 3b illustrate
the header structure. Header bits are specified in the following two
subclauses.
3.1.1. Required fields of the payload header
S (1 bit): Indicates, if set, that the bits in the payload are robust
sorted. If not set, simple payload sorting is employed. Note that
this bit can be set only if the receiver has signaled support for the
OPTIONAL robust payload sorting.
C (1 bit): Indicates the existence of OPTIONAL CRC fields in the
payload table of content. Note that this bit can be set only if the
receiver has signaled support for the OPTIONAL CRC.
I (1 bit): Indicates, if set, that frames in this payload are
interleaved, and that ILL and ILP fields are present in the payload
header. If not set, frames in this payload are successive frames and
ILL and ILP fields are not present in the payload header. Note that
this bit can be set only if the receiver has signaled support for
interleaving.
CMR (4 bits): Indicates Codec Mode Requested for the other
communication direction. It is only allowed to request one of the
AMR-WB speech modes (frame type index 0...8, see Table 1a in [3]).
CMR value 15 indicates that no mode request is present; other values
are reserved for future use.
3.1.2. Optional fields of the payload header
ILL (4 bits): OPTIONAL field that is present only if I=1. The value
of this field specifies the interleaving length used for frames in
this payload.
ILP (4 bits): OPTIONAL field that is present only if I=1. The value
of this field indicates the interleaving index for frames in this
payload. The value of ILP MUST be smaller than or equal to the value
of ILL. An erroneous value of ILP SHOULD cause the payload to be
discarded.
The value of the ILL field defines the length of an interleave group:
ILL=L implies that frames in (L+1)-frame intervals are picked into
the same interleaved payload, and the interleave group consists of
L+1 payloads. The value of ILP=p in payloads belonging to the same
group runs from 0 to L. The interleaving is meaningful only when the
number of frames per payload N is greater than or equal to 2. Thus,
when N frames are transmitted in each payload of a group, the
interleave group consists of payloads with sequence numbers s...s+L,
and frames encapsulated into these payloads are f...f+N*(L+1)-1.
Expressed as an equation: assume that the first frame of an
interleave group is n, the first payload of the group is s, the
number of frames per payload is N, ILL=L, and ILP=p (p in the range
0...L). The frames contained in payload s+p are then n + p + k*(L+1),
where k runs from 0 to N-1. That is:
The first packet of an interleave group: ILL=L, ILP=0
Payload: s
Frames: n, n+(L+1), n+2*(L+1), ..., n+(N-1)*(L+1)
The second packet of an interleave group: ILL=L, ILP=1
Payload: s+1
Frames: n+1, n+1+(L+1), n+1+2*(L+1), ..., n+1+(N-1)*(L+1)
...
The last packet of an interleave group: ILL=L, ILP=L
Payload: s+L
Frames: n+L, n+L+(L+1), n+L+2*(L+1), ..., n+L+(N-1)*(L+1)
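The frame selection rule n + p + k*(L+1) can be sketched in C as follows. This is a non-normative illustration; the function names are hypothetical, and frame numbers are assumed to be counted from the first frame n of the interleave group.

```c
#include <assert.h>

/* Number of the k-th frame (k = 0..N-1) carried in the payload with
 * ILP = p of an interleave group with ILL = L and first frame n. */
long interleaved_frame_index(long n, int L, int p, int k)
{
    assert(L >= 0 && L <= 15);  /* ILL is a 4-bit field */
    assert(p >= 0 && p <= L);   /* ILP must not exceed ILL */
    assert(k >= 0);
    return n + p + (long)k * (L + 1);
}

/* Inverse mapping: for frame number idx of the group starting at
 * frame n, find the payload (ILP value *p) and the position *k of
 * the frame inside that payload. */
void interleaved_frame_slot(long n, int L, long idx, int *p, int *k)
{
    long off = idx - n;

    assert(L >= 0 && L <= 15);
    assert(off >= 0);
    *p = (int)(off % (L + 1));
    *k = (int)(off / (L + 1));
}
```

For example, with L=2 and N=3 the payload with ILP=1 carries frames n+1, n+4 and n+7, so the loss of that payload leaves two intact frames between each pair of lost frames.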
Interleaved frames MUST be stored in the payload in timestamp-
increasing order. Furthermore, the interleaved payloads within an
interleave group MUST be sent according to increasing order of ILP
field, and each payload of an interleave group MUST contain equal
number of frames. It is RECOMMENDED that ILL remains constant
throughout the session. If ILL is to be changed, the change SHOULD be
done between interleaving groups, i.e. the ILP of the previous packet
was L. Furthermore, because of the inter-frame dependent nature of
AMR-WB coding, it is RECOMMENDED that ILL values greater than or
equal to 2 are used to enable better error recovery in the decoder if
an interleaved payload is lost. Note also that using the value ILL=0,
or using interleaving for a payload carrying only one frame, is not
meaningful.
    0
    0 1 2 3 4 5 6
   +-+-+-+-+-+-+-+
   |S|C|I|  CMR  |
   +-+-+-+-+-+-+-+
Figure 3a: AMR-WB payload header, I=0.
    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|C|I|  CMR  |  ILL  |  ILP  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3b: AMR-WB payload header, I=1.
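A receiver might parse the payload header roughly as sketched below. This is a non-normative illustration: the bit_reader type and the function names are hypothetical, and the sketch assumes that bit 0 in the figures is the most significant bit of the first octet, as in the examples of section 7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader over a received payload. */
typedef struct {
    const uint8_t *data;
    size_t len_bits;   /* payload length in bits */
    size_t pos;        /* next bit to read */
} bit_reader;

/* Read n (<= 8) bits, MSB first. Returns -1 if past the end. */
static int br_read(bit_reader *br, int n)
{
    int v = 0;

    assert(n >= 0 && n <= 8);
    if (br->pos + (size_t)n > br->len_bits)
        return -1;
    while (n-- > 0) {
        int bit = (br->data[br->pos / 8] >> (7 - br->pos % 8)) & 1;
        v = (v << 1) | bit;
        br->pos++;
    }
    return v;
}

typedef struct { int s, c, i, cmr, ill, ilp; } amrwb_hdr;

/* Parse the 7-bit (I=0) or 15-bit (I=1) payload header.
 * Returns 0 on success, -1 if truncated or if ILP > ILL. */
int amrwb_parse_header(bit_reader *br, amrwb_hdr *h)
{
    assert(br != NULL && h != NULL);
    h->ill = 0;
    h->ilp = 0;
    if ((h->s = br_read(br, 1)) < 0 ||
        (h->c = br_read(br, 1)) < 0 ||
        (h->i = br_read(br, 1)) < 0 ||
        (h->cmr = br_read(br, 4)) < 0)
        return -1;
    if (h->i) {
        if ((h->ill = br_read(br, 4)) < 0 ||
            (h->ilp = br_read(br, 4)) < 0)
            return -1;
        if (h->ilp > h->ill)
            return -1;  /* erroneous ILP: payload SHOULD be discarded */
    }
    return 0;
}
```

After a successful call, br->pos is 7 or 15 and points at the first table of content entry.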
3.2. The payload table of content and CRCs
The table of content (ToC) consists of one table of content entry for
each speech frame in the payload. A table of content entry includes
several specified fields as follows:
F (1 bit): Indicates whether this frame is followed by further frames
in this payload: F=1 means further frames follow, F=0 means this is
the last frame.
FT (4 bits): Frame type indicator, indicating the AMR-WB speech
coding mode or comfort noise (CN) mode. The mapping of AMR-WB modes
to FT is given in Table 1a in [3]. If FT=14 (lost frame) or FT=15 (no
transmission/no reception), no CRC or payload frame is present.
Q (1 bit): The frame quality bit indicates, if not set, that the
payload is corrupted and the receiver should set the RX_TYPE (see
[4]) to SPEECH_BAD or SID_BAD depending on the frame type (FT).
    0
    0 1 2 3 4 5
   +-+-+-+-+-+-+
   |F|  FT   |Q|
   +-+-+-+-+-+-+
Figure 4: Table of content (ToC) entry field.
CRC (8 bits): OPTIONAL field, exists if the payload header bit C is
set (C=1). The 8 bit CRC is used for error detection. These 8 parity
bits are generated according to section 4.1.4 in [3].
    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |      CRC      |
   +-+-+-+-+-+-+-+-+
Figure 5: CRC field.
The ToC and CRCs are arranged with all table of content entry fields
first, followed by all CRC fields. The ToC starts with the entry
belonging to the oldest speech frame in the payload.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|  FT   |Q|F|  FT   |Q|F|  FT   |Q|      CRC      |    CRC    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |      CRC      |
   +-+-+-+-+-+-+-+-+-+-+
Figure 6: The ToC and CRCs for a payload with three speech frames.
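Reading the ToC entries up to and including the entry with F=0 could look like the following non-normative sketch. The names and the AMRWB_MAX_FRAMES limit are illustrative assumptions, not part of this format; bits are assumed to be read most significant bit first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AMRWB_MAX_FRAMES 16  /* illustrative limit, not from this format */

typedef struct { int f, ft, q; } toc_entry;

/* Read n (<= 8) bits MSB first, starting at bit offset *pos. */
static unsigned get_bits(const uint8_t *d, size_t *pos, int n)
{
    unsigned v = 0;

    while (n-- > 0) {
        v = (v << 1) | ((d[*pos / 8] >> (7 - *pos % 8)) & 1u);
        (*pos)++;
    }
    return v;
}

/* Parse ToC entries starting at bit offset *pos, which on entry is
 * the payload header length (7 or 15). Returns the number of
 * entries, or -1 if the ToC runs past len_bits or is too long. */
int amrwb_parse_toc(const uint8_t *d, size_t len_bits, size_t *pos,
                    toc_entry toc[AMRWB_MAX_FRAMES])
{
    int n = 0;

    assert(d != NULL && pos != NULL && toc != NULL);
    do {
        if (n == AMRWB_MAX_FRAMES || *pos + 6 > len_bits)
            return -1;
        toc[n].f  = (int)get_bits(d, pos, 1);
        toc[n].ft = (int)get_bits(d, pos, 4);
        toc[n].q  = (int)get_bits(d, pos, 1);
    } while (toc[n++].f);   /* F=0 marks the last entry */
    return n;
}
```

If the C bit of the payload header is set, the CRC fields start at the bit offset left in *pos after the call.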
3.3. AMR-WB speech frame
An AMR-WB speech frame represents one speech frame encoded
using the mode according to the FT field in ToC entry corresponding
to this frame. The length of this field is implicitly defined by the
AMR-WB mode in the FT field. The AMR-WB speech bits SHALL be sorted
according to Appendix B of [3].
3.4. Compound AMR-WB payload
The compound AMR-WB payload consists of one AMR-WB payload header,
the table of content, and one or more AMR-WB payload frames, see
sections 3.1, 3.2, and 3.3. These can be combined by using either
robust or simple payload sorting. The S-bit in the AMR-WB payload
header indicates which method is used.
Definitions for describing the compound AMR-WB payload:
b(m) - bit m of the compound AMR-WB payload
t(n,m) - bit m in the table of content entry for speech frame n
p(n,m) - bit m in the CRC for speech frame n
f(n,m) - bit m in speech frame n
F(n) - number of bits in speech frame n, defined by FT
h(m) - bit m of payload header
H - number of bits in payload header, 7 or 15 bits
C - number of CRC bits, 0 or 8 bits
N - number of payload frames in the payload
S - number of unused bits in the last octet of the payload
Payload frames f(n,m) are ordered in the order they are delivered by
the AMR-WB speech encoder, i.e. frame n is preceding frame n+1. All
frames between the oldest one and the most recent one MUST be present
in the payload; the only exception is interleaving, in which case the
frame order is defined in section 3.1.2. If some of the frames are not
available because of a frame loss or they are not transmitted, e.g.
due to DTX, those MUST be replaced by lost speech or by no
transmission/no reception type frames, respectively.
3.4.1. Robust payload sorting
As described earlier, a bit error in a more sensitive bit is
subjectively more annoying than in a less sensitive bit. Therefore,
to enable protection of only the most sensitive bits of a payload
with a forward error detection code, e.g. a CRC outside RTP, the bits
inside a payload can be ordered into sensitivity order. The
protection SHOULD cover an appropriate number of octets from the
beginning of the payload, covering at least the AMR-WB payload
header, ToC, and class A bits (see Table 1). Exactly how many octets
need protection depends on the network and application. To
maintain sensitivity ordering inside the AMR-WB payload, when more
than one speech frame is transmitted in one payload, reordering of
the bits in the payload is needed.
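As a non-normative illustration, the minimum number of leading octets that contains the payload header, the ToC, the CRCs and every class A bit of a robust sorted payload can be computed as sketched below. The sketch assumes that the class A bits are the first bits of each frame in sensitivity order, that every frame in the payload carries data (no FT=14 or FT=15 entries), and that frames are sorted one bit at a time in alternation. The function name is hypothetical.

```c
#include <assert.h>

/* hdr_bits: 7 or 15; crc_bits: 0 or 8; frame_bits[j] and
 * class_a_bits[j] give the total and class A bit counts of frame j.
 * Returns the number of octets, counted from the start of the
 * payload, up to and including the last class A bit. */
int robust_coverage_octets(int hdr_bits, int crc_bits, int n_frames,
                           const int *frame_bits, const int *class_a_bits)
{
    int bits, max_a = 0, last = 0, j;

    assert(hdr_bits == 7 || hdr_bits == 15);
    assert(crc_bits == 0 || crc_bits == 8);
    assert(n_frames > 0);

    bits = hdr_bits + n_frames * (6 + crc_bits);

    /* Largest class A count and the last frame that has it. */
    for (j = 0; j < n_frames; j++) {
        assert(class_a_bits[j] <= frame_bits[j]);
        if (class_a_bits[j] > 0 && class_a_bits[j] >= max_a) {
            max_a = class_a_bits[j];
            last = j;
        }
    }
    /* Rounds 0..max_a-2 are emitted in full; round max_a-1 is
     * emitted only up to frame 'last'. */
    for (j = 0; max_a > 0 && j < n_frames; j++) {
        int rounds = max_a - 1;
        bits += frame_bits[j] < rounds ? frame_bits[j] : rounds;
        if (j <= last && frame_bits[j] >= max_a)
            bits += 1;
    }
    return (bits + 7) / 8;
}
```

For two frames of 285 and 317 bits with 72 class A bits each, no CRCs and a 7-bit header, this gives 7 + 12 + 144 = 163 bits, i.e. 21 octets. A partial checksum would need to cover at least this many octets of the payload in addition to the RTP header.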
The AMR-WB payload header, ToC and CRCs SHALL still be placed
unchanged in the beginning of the robust sorted payload. Thereafter,
the payload frames are sorted with one bit alternating from each AMR-
WB payload frame.
The robust payload sorting algorithm is defined in C-style as:
   /* payload header */
   k=0;
   for (i = 0; i < H; i++){
      b(k++) = h(i);
   }
   /* table of content */
   for (j = 0; j < N; j++){
      for (i = 0; i < 6; i++){
         b(k++) = t(j,i);
      }
   }
   /* CRCs */
   for (j = 0; j < N; j++){
      for (i = 0; i < C; i++){
         b(k++) = p(j,i);
      }
   }
   /* payload frames */
   max = max(F(0),..,F(N-1));
   for (i = 0; i < max; i++){
      for (j = 0; j < N; j++){
         if (i < F(j)){
            b(k++) = f(j,i);
         }
      }
   }
   /* padding */
   S = 8 - k%8;
   if (S < 8){
      for (i = 0; i < S; i++){
         b(k++) = 0;
      }
   }
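The receiver has to undo the bit alternation of the payload frames part before decoding. The following non-normative sketch shows both directions for the frame bits only, using one array element per bit for clarity. The function names are hypothetical; "simple" denotes the frames concatenated one after another, and F[j] is the bit count of frame j.

```c
#include <assert.h>
#include <stdint.h>

/* Sender side: reorder concatenated frame bits into robust order.
 * Returns the number of bits written to robust[]. */
int robust_from_simple(const uint8_t *simple, const int *F, int N,
                       uint8_t *robust)
{
    int max = 0, k = 0, i, j, start;

    assert(N > 0);
    for (j = 0; j < N; j++)
        if (F[j] > max)
            max = F[j];
    for (i = 0; i < max; i++) {
        start = 0;                     /* offset of frame j in simple[] */
        for (j = 0; j < N; j++) {
            if (i < F[j])
                robust[k++] = simple[start + i];
            start += F[j];
        }
    }
    return k;
}

/* Receiver side: the same traversal with the assignment reversed.
 * Returns the number of bits consumed from robust[]. */
int simple_from_robust(const uint8_t *robust, const int *F, int N,
                       uint8_t *simple)
{
    int max = 0, k = 0, i, j, start;

    assert(N > 0);
    for (j = 0; j < N; j++)
        if (F[j] > max)
            max = F[j];
    for (i = 0; i < max; i++) {
        start = 0;
        for (j = 0; j < N; j++) {
            if (i < F[j])
                simple[start + i] = robust[k++];
            start += F[j];
        }
    }
    return k;
}
```

Because both functions walk the bits in the same order, applying one after the other restores the original frame bits.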
3.4.2. Simple payload sorting
If multiple frames are encapsulated into the payload and robust
payload sorting is not used, the payload is formed as concatenation
of the AMR-WB payload header, ToC, possibly optional CRC fields, and
the AMR-WB speech frames. However, the bits inside each AMR-WB
payload frame are ordered into sensitivity order as defined in Annex
B of [3].
The simple payload sorting algorithm is defined in C-style as:
   /* payload header */
   k=0;
   for (i = 0; i < H; i++){
      b(k++) = h(i);
   }
   /* table of content */
   for (j = 0; j < N; j++){
      for (i = 0; i < 6; i++){
         b(k++) = t(j,i);
      }
   }
   /* CRCs */
   for (j = 0; j < N; j++){
      for (i = 0; i < C; i++){
         b(k++) = p(j,i);
      }
   }
   /* payload frames */
   for (j = 0; j < N; j++){
      for (i = 0; i < F(j); i++){
         b(k++) = f(j,i);
      }
   }
   /* padding */
   S = 8 - k%8;
   if (S < 8){
      for (i = 0; i < S; i++){
         b(k++) = 0;
      }
   }
3.5. Decoding security consideration
If the payload length calculation based on C, I, F and FT fields does
not indicate the same length as the actually received payload size,
the payload should be dropped as erroneous. Decoding AMR-WB frames
that are parsed based on erroneous header information could severely
degrade the speech quality.
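The length check can be sketched as follows. This is a non-normative illustration with hypothetical function names. Frame sizes are taken as the mode bit-rate times 20 ms, consistent with the frame lengths used in the examples of section 7, and FT values 10..13 are treated as errors, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Bits of payload frame data for frame type ft; 0 for FT=14/15,
 * -1 for values this sketch does not know (assumption). */
static int frame_bits_for_ft(int ft)
{
    static const int bits[10] = {
        132, 177, 253, 285, 317, 365, 397, 461, 477, 40
    };

    if (ft >= 0 && ft <= 9)
        return bits[ft];
    if (ft == 14 || ft == 15)
        return 0;
    return -1;
}

/* Expected payload size in octets, from the I and C header bits and
 * the FT values of the n ToC entries. Returns -1 for an unknown FT. */
long amrwb_expected_octets(int i_flag, int c_flag, const int *ft, int n)
{
    long bits = i_flag ? 15 : 7;
    int j;

    assert(ft != NULL && n > 0);
    for (j = 0; j < n; j++) {
        int fb = frame_bits_for_ft(ft[j]);
        if (fb < 0)
            return -1;
        bits += 6;                       /* ToC entry */
        if (fb > 0)                      /* no CRC/frame for FT=14/15 */
            bits += fb + (c_flag ? 8 : 0);
    }
    return (bits + 7) / 8;               /* payload is octet aligned */
}

/* Nonzero if the received size matches the size implied by the headers. */
int amrwb_length_ok(int i_flag, int c_flag, const int *ft, int n,
                    size_t received_octets)
{
    long expected = amrwb_expected_octets(i_flag, c_flag, ft, n);

    return expected >= 0 && (size_t)expected == received_octets;
}
```

A receiver would call amrwb_length_ok() after parsing the payload header and ToC, and drop the payload if it returns zero.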
4. RTP header usage
The RTP header marker bit (M) is used to mark (M=1) the payloads
containing the first speech frame after a CN period. For all other
payloads the marker bit is set to 0 (M=0).
The timestamp corresponds to the sampling time of the first sample of
the first encoded AMR-WB frame in the payload. A frame can either be
encoded speech, comfort noise parameters, LOST_FRAME, or
NO_TRANSMISSION. The unit used to compute timestamp is one sample.
The duration of one AMR-WB speech frame is 20 ms and the sampling
frequency is 16 kHz, corresponding to 320 speech samples per frame.
Thus, the timestamp is increased by 320 for each consecutive frame.
If the optional interleaving functionality is not used, all frames in
a packet MUST be successive frames, stored in the same order as
delivered by the AMR-WB speech encoder. If the interleaving is
employed, the frames encapsulated into a payload MUST be picked as
defined in section 3.1.2.
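Given the RTP timestamp of a payload, the sampling time of each frame it carries follows from the 320-sample frame length. In the non-normative sketch below (hypothetical function name), the interleaving case uses the fact that consecutive frames in an interleaved payload are L+1 frames apart, as defined in section 3.1.2.

```c
#include <assert.h>
#include <stdint.h>

#define AMRWB_SAMPLES_PER_FRAME 320u  /* 20 ms at 16 kHz */

/* RTP timestamp of the k-th frame (k = 0, 1, ...) in a payload whose
 * RTP timestamp is ts. ill is the ILL value if interleaving is used
 * (I=1) and 0 otherwise. The result wraps modulo 2^32, as RTP
 * timestamps do. */
uint32_t amrwb_frame_timestamp(uint32_t ts, unsigned k, unsigned ill)
{
    assert(ill <= 15);  /* ILL is a 4-bit field */
    return ts + AMRWB_SAMPLES_PER_FRAME * k * (ill + 1u);
}
```

Without interleaving the third frame of a payload with timestamp 1000 thus has timestamp 1640; with ILL=2 it has timestamp 2920.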
5. Congestion Control
The need for congestion control for data transported with RTP has to
be considered. AMR-WB speech data has some elastic properties due to
the different bandwidth demand of each mode. Another parameter that
can reduce the bandwidth demand for AMR-WB is the number of speech
frames encapsulated in each payload. Encapsulating more frames per
payload reduces the number of packets and thus the overhead from
IP/UDP/RTP headers. If forward error correction (FEC) is used, the
amount of FEC also needs to be regulated so that the FEC itself does
not worsen the problem. Therefore, it is RECOMMENDED that applications
using this payload format implement congestion control. The actual
mechanism for congestion control is not specified but should be
suitable for real-time flows, e.g. [16].
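The effect of the number of frames per payload on the total bandwidth can be estimated with a short calculation. The non-normative sketch below assumes IPv4 without options (20 octets), UDP (8 octets) and an RTP header without CSRC entries or extensions (12 octets), a 7-bit payload header, no CRCs, no FEC, and that all frames in a packet use the same mode. The function name is hypothetical.

```c
#include <assert.h>

/* IPv4 (20) + UDP (8) + RTP (12) header octets per packet; assumes
 * no IP options and no RTP CSRC list or header extension. */
#define IP_UDP_RTP_OCTETS 40

/* Gross bit-rate in bit/s when n_frames frames of frame_bits bits
 * each are carried in every packet. */
long amrwb_gross_bps(int frame_bits, int n_frames)
{
    long payload_bits, packet_bits;

    assert(frame_bits > 0 && n_frames > 0);
    payload_bits = 7 + (long)n_frames * (6 + frame_bits);
    packet_bits = 8L * (IP_UDP_RTP_OCTETS + (payload_bits + 7) / 8);
    /* 50 frames per second give 50 / n_frames packets per second */
    return packet_bits * 50 / n_frames;
}
```

Under these assumptions the 12.65 kbps mode (253 bits per frame) needs 29600 bit/s with one frame per packet and 17100 bit/s with four frames per packet, at the cost of 60 ms of additional packetization delay.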
6. Security Considerations
RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP
specification [10]. This implies that confidentiality of the media
streams is achieved by encryption. Because the payload format is
arranged end-to-end, encryption MAY be performed after encapsulation
so there is no conflict between the two operations.
This payload type does not exhibit any significant non-uniformity in
the receiver side computational complexity for packet processing to
cause a potential denial-of-service threat.
As this format transports encoded speech data, the main security
issues are confidentiality and authentication of the speech itself.
Some other smaller issues also exist. The payload format itself does
not have any support for security. These issues have to be solved by
a payload external mechanism.
6.1. Confidentiality
To achieve confidentiality of the encoded speech, all speech data bits
must be encrypted. There is less need to encrypt the payload header
or the frame header, as they only carry information about the
requested AMR-WB mode, AMR-WB frame type, and frame quality. This
information could be useful to some third party, e.g. for quality
monitoring. The type of encryption used can affect not only
confidentiality but also error robustness. Robustness against bit
errors is lost unless an encryption method without error propagation
is used, e.g. a stream cipher. This is only an issue when using
UEP/UED, when bit errors can be accepted in some part of the payload.
6.2. Authentication
To authenticate the sender of the speech, an external mechanism has
to be added. It is recommended that such a mechanism protects all the
speech data bits. Note that the use of UED/UEP is difficult to
combine with authentication. To prevent a man in the middle from
tampering with the packetization of the speech data, some extra data
could be protected: the RTP timestamp, the RTP sequence number, and
the RTP marker bit. Tampering could result in erroneous
decapsulation/decoding that could lower speech quality. Tampering
with the AMR-WB mode request field can result in the sender
receiving speech at a different quality than desired.
7. Examples
7.1. Simple example
In the simple example one AMR-WB frame is encapsulated into the
payload. Simple payload sorting is used (S=0), no CRC fields are
present (C=0), and interleaving is not used (I=0). A 23.05 kbps mode
is requested for the reverse link (CMR=7), and the payload was not
damaged at IP origin (Q=1). The AMR-WB mode is the 12.65 kbps mode
(FT=2). The speech encoded bits are put into f(0...252) in descending
sensitivity order according to [3].
   |                            Bit no.                            |
Oct|   0       1       2       3       4       5       6       7   |
---+-------+-------+-------+-------+-------+-------+-------+-------+
 0 |  S=0  |  C=0  |  I=0  |   0   |   1   |   1   |   1   |  F=0  |
---+-------+-------+-------+-------+-------+-------+-------+-------+
 1 |   0   |   0   |   1   |   0   |  Q=1  | f(0)  | f(1)  |  ...  |
---+-------+-------+-------+-------+-------+-------+-------+-------+
32 |  ...  |  ...  |  ...  |  ...  |  ...  |  ...  |f(249) |f(250) |
---+-------+-------+-------+-------+-------+-------+-------+-------+
33 |f(251) |f(252) |   0   |   0   |   0   |   0   |   0   |   0   |
---+-------+-------+-------+-------+-------+-------+-------+-------+
Figure 7: One AMR-WB frame per payload example.
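The layout of Figure 7 can be reproduced with a small bit writer. The following non-normative sketch builds a single-frame payload with S=0, C=0 and I=0; the function names are hypothetical, bit 0 of each octet is taken to be the most significant bit, and the speech bits are passed with one array element per bit, already in sensitivity order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append the n low-order bits of v, MSB first, at bit offset *pos.
 * The buffer must have been zeroed beforehand. */
static void put_bits(uint8_t *buf, size_t *pos, unsigned v, int n)
{
    while (n-- > 0) {
        if ((v >> n) & 1u)
            buf[*pos / 8] |= (uint8_t)(0x80u >> (*pos % 8));
        (*pos)++;
    }
}

/* Build a one-frame payload. buf must hold at least
 * (13 + n_bits + 7) / 8 octets. Returns the length in octets. */
size_t amrwb_pack_single(uint8_t *buf, unsigned cmr, unsigned ft,
                         unsigned q, const uint8_t *speech, int n_bits)
{
    size_t pos = 0, len;
    int i;

    assert(cmr <= 15 && ft <= 15 && q <= 1 && n_bits >= 0);
    len = (13 + (size_t)n_bits + 7) / 8;
    memset(buf, 0, len);            /* padding bits stay zero */
    put_bits(buf, &pos, 0, 3);      /* S=0, C=0, I=0 */
    put_bits(buf, &pos, cmr, 4);    /* codec mode request */
    put_bits(buf, &pos, 0, 1);      /* F=0: last (only) frame */
    put_bits(buf, &pos, ft, 4);     /* frame type */
    put_bits(buf, &pos, q, 1);      /* frame quality */
    for (i = 0; i < n_bits; i++)
        put_bits(buf, &pos, speech[i] & 1u, 1);
    return len;
}
```

With CMR=7, FT=2, Q=1 and 253 speech bits, the result is 34 octets long, the first octet is 0x0E, and the upper five bits of the second octet are 00101, as in the figure.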
7.2. Example with CRCs
In this example, two frames are transmitted in one payload. Simple
payload sorting is used (S=0), CRC fields are present (C=1), and
interleaving is not used (I=0). No mode request is sent (CMR=15), and
neither of the frames is corrupted (Q=1). The payload contains one
frame in the 14.25 kbps mode (FT=3) and one frame in the 15.85 kbps
mode (FT=4). Bits p1(0...7) and p2(0...7) denote the CRC checksums
for the first and second frames, respectively. The bits of the first
frame are denoted by f1(0...284), and the bits of the second frame by
f2(0...316).
| Bit no. |
Oct| 0 1 2 3 4 5 6 7 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
0 | S=0 | C=1 | I=0 | 1 | 1 | 1 | 1 | F=1 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
1 | 0 | 0 | 1 | 1 | Q=1 | F=0 | 0 | 1 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
2 | 0 | 0 | Q=1 | p1(0) | p1(1) | p1(2) | p1(3) | p1(4) |
---+-------+-------+-------+-------+-------+-------+-------+-------+
3 | p1(5) | p1(6) | p1(7) | p2(0) | p2(1) | p2(2) | p2(3) | p2(4) |
---+-------+-------+-------+-------+-------+-------+-------+-------+
4 | p2(5) | p2(6) | p2(7) | f1(0) | f1(1) | ... | ... | ... |
---+-------+-------+-------+-------+-------+-------+-------+-------+
39 | ... | ... | ... | ... | ... | ... |f1(283)|f1(284)|
---+-------+-------+-------+-------+-------+-------+-------+-------+
40 | f2(0) | f2(1) | ... | ... | ... | ... | ... | ... |
---+-------+-------+-------+-------+-------+-------+-------+-------+
79 | ... | ... | ... |f2(315)|f2(316)| 0 | 0 | 0 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
Figure 8: Example with two AMR-WB frames and CRCs.
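Field positions in a payload such as the one in Figure 8 follow from
simple bit counting. The sketch below is a hedged illustration, not a
normative procedure: the function name layout and the FRAME_BITS
table are hypothetical, the table lists only the frame sizes quoted
in the examples of this section, and the per-field widths are read
off the figures (3 flag bits, 4 CMR bits, 6 bits of F/FT/Q per frame,
8 CRC bits per frame).

```python
# Speech bits per frame for the frame types used in the examples of this section.
FRAME_BITS = {2: 253, 3: 285, 4: 317}


def layout(frame_types, crc=False):
    """Return (start offset in bits of each field, payload length in octets).

    Assumes simple sorting without interleaving: S, C, I and CMR (7 bits),
    then F + FT + Q (6 bits) per frame, then one 8-bit CRC per frame if
    `crc` is set, then the frames back to back.
    """
    offset = 7 + 6 * len(frame_types)
    starts = {}
    if crc:
        for n in range(1, len(frame_types) + 1):
            starts[f"p{n}"] = offset
            offset += 8
    for n, ft in enumerate(frame_types, start=1):
        starts[f"f{n}"] = offset
        offset += FRAME_BITS[ft]
    # Round the total bit count up to whole octets (zero padding).
    return starts, (offset + 7) // 8


starts, octets = layout([3, 4], crc=True)
for name, bit in starts.items():
    print(name, "starts at octet", bit // 8, "bit", bit % 8)
print("payload length:", octets, "octets")
```

For the two-frame case with CRCs, this places p1(0) at octet 2 bit 3
and f1(0) at octet 4 bit 3, and gives a payload length of 80 octets.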
7.3. Example with multiple frames per payload and robust sorting
In this example, two frames are transmitted in one payload with
robust sorting (S=1). No CRC is used (C=0), interleaving is not used
(I=0), and the 8.85 kbps mode is requested for the reverse link
(CMR=1). Both frames are undamaged (Q=1), and the two frames in the
payload are encoded in the 14.25 kbps (FT=3) and 15.85 kbps (FT=4)
modes. The first frame is represented by f1(0...284) and the
subsequent frame by f2(0...316).
| Bit no. |
Oct| 0 1 2 3 4 5 6 7 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
0 | S=1 | C=0 | I=0 | 0 | 0 | 0 | 1 | F=1 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
1 | 0 | 0 | 1 | 1 | Q=1 | F=0 | 0 | 1 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
2 | 0 | 0 | Q=1 | f1(0) | f2(0) | f1(1) | f2(1) | ... |
---+-------+-------+-------+-------+-------+-------+-------+-------+
73 | ... |f1(283)|f2(283)|f1(284)|f2(284)|f2(285)|f2(286)| ... |
---+-------+-------+-------+-------+-------+-------+-------+-------+
77 | ... | ... | ... |f2(315)|f2(316)| 0 | 0 | 0 |
---+-------+-------+-------+-------+-------+-------+-------+-------+
Figure 9: Example with two AMR-WB frames per payload and robust
sorting.
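The bit ordering shown in Figure 9 can be sketched as follows. This
is an illustration based only on what the figure shows for two
frames: bits are taken alternately from each frame, and once the
shorter frame is exhausted the remaining bits of the longer frame
follow. The function name robust_sort is hypothetical, and the
extension to more than two frames is an assumption.

```python
def robust_sort(frames):
    """Interleave frames bit by bit: bit 0 of every frame, then bit 1, and so on.

    Frames that have run out of bits are skipped, so the tail holds only
    the remaining bits of the longer frames.
    """
    order = []
    for i in range(max(len(f) for f in frames)):
        for f in frames:
            if i < len(f):
                order.append(f[i])
    return order


# Labels stand in for the actual bits so that the ordering is visible.
f1 = [f"f1({i})" for i in range(285)]   # 14.25 kbps frame
f2 = [f"f2({i})" for i in range(317)]   # 15.85 kbps frame
order = robust_sort([f1, f2])
print(order[:4])
print(order[568:572])
```

The first slice shows the alternation at the start of the speech
data, and the second shows the transition where f1 ends and only f2
bits remain.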
8. The AMR-WB MIME type registration
This chapter defines the MIME type for the Adaptive Multi-Rate
Wideband (AMR-WB) speech codec. AMR-WB implementations according to
[1] MUST support all nine coding modes. Fast mode adaptation is
supported by transmitting the mode information in-band together with
the encoded speech data, which allows mode changes without any
additional signaling. Furthermore, fast mode adaptation requires
transmission of a codec mode request inside the payload.
In addition to the speech codec, AMR-WB specifications also include
Discontinuous Transmission / comfort noise (DTX/CN) functionality
[4]. The DTX/CN functionality switches transmission off during silent
periods of the speech, and only SID frames containing CN parameter
updates are sent at regular intervals. The AMR-WB DTX/CN MUST also
be supported.
A receiver may want to receive only a certain AMR-WB mode or a subset
of AMR-WB modes because of link limitations in some cellular systems;
for example, the GSM/GERAN radio link can require that only a subset
of AMR-WB modes be used. Therefore, it is possible to request a
specific set of AMR-WB modes in the capability description, and the
encoder MUST abide by this request. If no mode set is requested, any
mode may be used or requested.
The AMR-WB codec can in principle perform a mode change at any time
between any two modes. To support interoperability with GSM through a
gateway, it is possible to set limitations on mode changes. The
decoder can define the minimum number of frames between mode changes
and can restrict mode changes to neighboring modes only.
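A sender could check a planned sequence of frame modes against these
restrictions along the following lines. This is a sketch under stated
assumptions rather than a normative algorithm: the function name and
its arguments are hypothetical, the period restriction is interpreted
as "all mode changes fall on a common grid of N frames whose phase is
fixed by the first change", and "neighboring" is taken to mean
adjacent within the sorted set of allowed modes.

```python
def check_mode_sequence(modes, mode_set=None, period=1, neighbor_only=False):
    """Return True if the per-frame mode sequence respects the restrictions.

    modes         -- codec mode of each successive frame
    mode_set      -- allowed modes, or None for all nine modes (0..8)
    period        -- mode changes must fall on a common grid of this spacing
    neighbor_only -- changes may only go to an adjacent mode in the set
    """
    allowed = sorted(mode_set) if mode_set is not None else list(range(9))
    if any(m not in allowed for m in modes):
        return False
    phase = None
    for i in range(1, len(modes)):
        if modes[i] == modes[i - 1]:
            continue
        # The phase of the grid is arbitrary; the first change fixes it.
        if phase is None:
            phase = i % period
        elif i % period != phase:
            return False
        step = abs(allowed.index(modes[i]) - allowed.index(modes[i - 1]))
        if neighbor_only and step != 1:
            return False
    return True


print(check_mode_sequence([2, 2, 3, 3, 4, 4], {2, 3, 4}, period=2, neighbor_only=True))
print(check_mode_sequence([2, 4], {2, 3, 4}, neighbor_only=True))
```

The first call is accepted because every change occurs two frames
after the previous one and moves to an adjacent mode; the second is
rejected because mode 3 lies between modes 2 and 4 in the allowed
set.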
The receiver can limit the number of AMR-WB frames encapsulated into
one RTP packet. If a maximum number of frames per packet is given in
the capability description, the transmitter MUST comply with this
limitation. This is an OPTIONAL feature; if no such parameter is
given in the capability description, the transmitter can encapsulate
any number of AMR-WB speech frames into one RTP packet.
The payload CRC UED MUST only be used if the receiver has signaled
support for this functionality in the capability description.
To enable unequal error protection and/or detection outside RTP, the
payload format supports robust payload sorting. The robust payload
sorting is an optional feature and MUST only be used if the receiver
has signaled support for this functionality in the capability
description.
The speech quality in case of packet losses when transmitting several
AMR-WB frames per packet can be improved by using OPTIONAL frame
interleaving. Interleaving improves perceived speech quality
since it introduces a series of single-frame errors instead of several
consecutive frame errors. Interleaving MUST only be applied if the
receiver has signaled support for it, and if used, the interleaving
length MUST NOT exceed the limitation given in the capability
description. Note that the receiver can use the MIME parameters to
limit the increased buffering requirements caused by interleaving.
For example, when maxframes=N and interleaving=L are specified, the
maximum size of an interleave group is N*(L+1) frames (see section
3.1.2 for details on interleaving).
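The buffering bound can be worked through numerically. The short
sketch below is only an illustration: the function name is
hypothetical, and the 20 ms frame duration is a property of the
AMR-WB codec that is assumed here rather than stated in this section.

```python
def max_interleave_group_frames(maxframes, interleaving):
    """Upper bound on the frames in one interleave group: N*(L+1)."""
    return maxframes * (interleaving + 1)


FRAME_MS = 20  # assumed AMR-WB frame duration in milliseconds

frames = max_interleave_group_frames(maxframes=4, interleaving=3)
print(frames, "frames, about", frames * FRAME_MS, "ms of speech to buffer")
```

With maxframes=4 and interleaving=3 the receiver has to be prepared
to buffer up to 16 frames, which under the 20 ms assumption
corresponds to 320 ms of speech.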
8.1. MIME Registration
The MIME name for the AMR-WB codec is allocated from the IETF tree
since AMR-WB is expected to be a widely used speech codec in VoIP
applications.
Media Type name: audio
Media subtype name: AMR-WB
Required parameters: none
Optional parameters:
mode-set: Requested AMR-WB mode set. Restricts the active codec
mode set to a subset of all modes. The value is a
comma-separated list of modes from 0,...,8 (see Table 1a [3];
an example is given in section 8.2). If not present, all
speech modes are available.
mode-change-period: Defines a number N which restricts mode
changes so that they are only allowed at multiples of N
frames; the initial state of the phase is arbitrary. If
this parameter is not present, a mode change can happen
at any time.
mode-change-neighbor: If present, mode changes SHALL only be made to
neighboring modes in the active codec mode set. If not
present, change between any two modes in the active codec
mode set is allowed.
maxframes: Maximum number of AMR-WB speech frames in one RTP packet.
The receiver may set this parameter in order to limit the
buffering requirements or delay.
crc: If present, transmission of CRCs in the payload is
supported, otherwise not supported.
robust-sorting: If present, robust payload sorting is supported,
otherwise not supported and simple payload sorting SHALL
be used.
interleaving: Indicates that the frame interleaving is supported and
defines a maximum value for interleaving length field ILL
(see section 3.1.2). If this parameter is not present,
the interleaving is not supported.
Encoding considerations: See section 3 in this document.
Security considerations: see chapter 6 "Security Consideration".
Public specification: please refer to chapter 9 "References".
Person & email address to contact for further information:
ari.lakaniemi@nokia.com
pasi.s.ojala@nokia.com
Intended usage: COMMON. It is expected that many VoIP applications
(as well as mobile applications) will use this type.
Author/Change controller:
ari.lakaniemi@nokia.com
pasi.s.ojala@nokia.com
8.2. Mapping to SDP Parameters
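The media type and subtype are mapped to the SDP [11] "m=" and
"a=rtpmap" lines, and the optional parameters are carried in the
"a=fmtp" attribute as a semicolon-separated list.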
Example usage in SDP:
m=audio 49120 RTP/AVP 97
a=rtpmap:97 AMR-WB/16000
a=fmtp:97 mode-set=2,3,4,5,6; maxframes=1
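A receiver of such a session description needs to recover the
parameters from the "a=fmtp" line. The following sketch shows one way
to do so. It is an illustration only: the function name parse_fmtp is
hypothetical, and the treatment of value-less parameters (such as crc
or robust-sorting) as bare names is an assumption, since the example
above shows only parameters that carry values.

```python
def parse_fmtp(line):
    """Parse an 'a=fmtp:' line into (payload type, parameter dict).

    Parameters given without a value (assumed form for flags such as
    crc) map to True.
    """
    head, _, rest = line.partition(" ")
    payload_type = int(head.split(":", 1)[1])
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        params[name.strip()] = value.strip() if sep else True
    # Convert the parameters whose values are numeric.
    if "mode-set" in params:
        params["mode-set"] = [int(m) for m in params["mode-set"].split(",")]
    if "maxframes" in params:
        params["maxframes"] = int(params["maxframes"])
    return payload_type, params


pt, params = parse_fmtp("a=fmtp:97 mode-set=2,3,4,5,6; maxframes=1")
print(pt, params)
```

For the example above this yields payload type 97 with the mode set
2, 3, 4, 5, 6 and a limit of one frame per packet.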
9. References
[1] 3GPP TS 26.190 "AMR Wideband speech codec; Transcoding
functions".
[2] 3GPP TS 26.090 "AMR speech codec; Transcoding functions".
[3] 3GPP TS 26.201 "AMR Wideband speech codec; Frame Structure".
[4] 3GPP TS 26.193 "AMR Wideband Speech Codec; Source Controlled
Rate operation".
[5] 3GPP TS 26.192 "AMR Wideband speech codec; Comfort Noise
aspects".
[6] 3GPP TS 26.194 "AMR Wideband speech codec; Voice Activity
Detector (VAD)".
[7] 3GPP TS 26.191 "AMR Wideband speech codec; Error concealment of
lost frames".
[8] 3GPP TS 25.415 "UTRAN Iu Interface User Plane Protocols".
[9] IETF RFC 2119, "Key words for use in RFCs to Indicate
Requirement Levels".
[10] IETF RFC 1889, "RTP: A Transport Protocol for Real-Time
Applications".
[11] IETF RFC 2327, "SDP: Session Description Protocol", April 1998.
[12] IETF draft-ietf-avt-rtp-amr-03.txt, "RTP payload format for
AMR", work in progress.
[13] IETF draft-westberg-realtime-cellular-01.txt, "Realtime Traffic
over Cellular Access Networks", work in progress.
[14] IETF draft-larzon-udplite-03.txt, "The UDP Lite Protocol", work
in progress.
[15] IETF draft-ietf-avt-ulp-00.txt, "An RTP Payload Format for
Generic FEC with Uneven Level Protection", work in progress.
[16]S. Floyd, M. Handley, J. Padhye, J. Widmer, "Equation-Based
Congestion Control for Unicast Applications", ACM SIGCOMM 2000,
Stockholm, Sweden.
[17] 3GPP TS 26.202 "AMR Wideband speech codec; Interface to Iu and
Uu".
10. Authors' addresses
Ari Lakaniemi
Nokia Research Center
P.O.Box 407
FIN-00045 Nokia Group
Finland
E-mail: ari.lakaniemi@nokia.com
Pasi Ojala
Nokia Research Center
P.O.Box 100
FIN-33721 Tampere
Finland
E-mail: pasi.s.ojala@nokia.com
Johan Sjöberg
Ericsson Research
Ericsson Radio System AB
Torshamsgatan 23
SE-164 80 Stockholm
SWEDEN
E-mail: johan.sjoberg@ericsson.com
Magnus Westerlund
Ericsson Research
Ericsson Radio System AB
Torshamsgatan 23
SE-164 80 Stockholm
SWEDEN
E-mail: magnus.westerlund@ericsson.com
This Internet-Draft expires on August 23, 2001.