RTP Payload Format for G.711.0
draft-ietf-payload-g7110-03

The information below is for an old version of the document.
Document	Type	This is an older version of an Internet-Draft that was ultimately published as RFC 7655.
	Authors	Michael A. Ramalho , Paul Jones , Noboru Harada , Muthu Arul Mozhi Perumal , Miao Lei
	Last updated	2014-12-04 (Latest revision 2014-08-22)
	RFC stream	Internet Engineering Task Force (IETF)
	Reviews	OPSDIR Last Call review by David Black Ready GENART Last Call review by David Black On the Right Track SECDIR Last Call review by Steve Hanna Has issues
Stream	WG state	Submitted to IESG for Publication
	Document shepherd	Roni Even
	Shepherd write-up	Show Last changed 2014-04-01
IESG	IESG state	Became RFC 7655 (Proposed Standard)
	Consensus boilerplate	Yes
	Telechat date	(None) Needs a YES. Needs 10 more YES or NO OBJECTION positions to pass.
	Responsible AD	Richard Barnes
	Send notices to	payload-chairs@tools.ietf.org, draft-ietf-payload-g7110@tools.ietf.org
IANA	IANA review state	IANA OK - Actions Needed
   The following figure illustrates the case of one or more G.711.0
   frames per RTP payload, where the number of G.711.0 frames placed in
   the RTP payload is N.  When N is equal to 1, this case is identical
   to the previous example.

              One or More G.711.0 Frames in RTP Payload Case

       |----------|---------|----------|---------|----------------|
       | First    | Second  |          | Nth     | Zero or more   |
       | G.711.0  | G.711.0 |   ...    | G.711.0 |     0x00       |
       | Frame    | Frame   |          | Frame   | Padding Octets |
       |__________|_________|__________|_________|________________|

                                 Figure 3

   Note that when a payload carries multiple G.711.0 frames, the
   individual frames can be, and generally are, of different lengths.
   The decoding process in the following section is used to determine
   the frame boundaries.

   Encoding Process: One or more G.711.0 frames are placed in the RTP
   payload simply by concatenating the G.711.0 frames together.  The
   amount of time represented by the G.711 symbols compressed in all the
   G.711.0 frames in the RTP payload MUST correspond to the ptime
   signaled for applications using SDP.  Although not generally desired,
   padding in the RTP payload SHOULD be placed after the last G.711.0
   frame in the payload and MAY be created by placing one or more 0x00
   octets after the last G.711.0 frame.  Such padding may be desired
   based on security considerations (see Section 10).
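
   As a non-normative illustration, the encoding process above can be
   sketched in Python.  The function name build_payload and its pad_to
   parameter are assumptions made for this sketch only; the frames are
   assumed to have been produced already by a G.711.0 encoder, and no
   check of the ptime constraint is attempted.

```python
def build_payload(frames, pad_to=0):
    """Concatenate already-encoded G.711.0 frames into one RTP payload.

    frames: non-empty list of bytes objects, each assumed to be one
            complete G.711.0 frame (no encoding is performed here).
    pad_to: hypothetical knob; if the payload is shorter than this many
            octets, 0x00 padding is appended after the last frame.
    """
    if not frames:
        raise ValueError("at least one G.711.0 frame is required")
    for frame in frames:
        # 0x00 cannot be the first octet of a valid G.711.0 frame.
        if len(frame) == 0 or frame[0] == 0x00:
            raise ValueError("not a valid G.711.0 frame")
    payload = b"".join(frames)
    if len(payload) < pad_to:
        payload += b"\x00" * (pad_to - len(payload))
    return payload
```

   Padding is appended only after the last frame, which follows the
   SHOULD in the text above.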

   Decoding Process: As G.711.0 frames can be of varying length, the
   payload decoding process described in the following section is used
   to determine where the individual G.711.0 frame boundaries are.  Any
   padding octets inserted before or after any G.711.0 frame in the RTP
   payload are silently (and safely) ignored by the G.711.0 decoding
   process.

4.2.3.  G.711.0 RTP Payload Decoding Process

   The G.711.0 decoding process is a standard part of G.711.0 bit stream
   decoding and is implemented in the ITU-T Rec. G.711.0 reference code.
   The decoding process algorithm described in this section is a slight
   enhancement of the ITU-T reference code to explicitly accommodate RTP
   padding (as described above).

Ramalho, et al.         Expires February 23, 2015              [Page 12]
Internet-Draft           G.711.0 Payload Format              August 2014

   Before describing the decoding, we note here that the largest
   possible G.711.0 frame is created whenever the largest number of
   G.711 symbols is encoded (320 from Section 3.2, property A5) and
   these 320 symbols are "uncompressible" by the G.711.0 encoder.  In
   this case (via property A6 in Section 3.2) the G.711.0 output frame
   will be 321 octets long.  We also note that the value 0x00 chosen for
   the optional padding cannot be the first octet of a valid ITU-T Rec.
   G.711.0 frame (see [G.711.0]).  We also note that whenever more than
   one G.711.0 frame is contained in the RTP payload, the decoding of
   the individual G.711.0 frames will occur multiple times.

   For the decoding algorithm below, let N be the number of octets in
   the RTP payload (i.e., excluding any RTP padding, but including any
   RTP payload padding), let P equal the number of RTP payload octets
   processed by the G.711.0 decoding process, let K be the number of
   G.711 symbols presently in the output buffer, let Q be the number of
   octets contained in the G.711.0 frame being processed and let "!="
   represent not equal to.  The keyword "STOP" is used below to indicate
   the end of the processing of G.711.0 frames in the RTP payload.  The
   algorithm below assumes an output buffer for the decoded G.711 source
   symbols of length sufficient to accommodate the expected number of
   G.711 symbols and an input buffer of length 321 octets.

   G.711.0 RTP Payload Decoding Heuristic:

   H1  Initialization of counters: Initialize P, the number of processed
         octets counter, to zero.  Initialize K, the counter for how
         many G.711 symbols are in the output buffer, to zero.
         Initialize N to the number of octets in the RTP payload
         (including any RTP payload padding).  Go to H2.

   H2  Read internal buffer: Read min{320+1, (N-P)} octets into the
         internal buffer, starting from octet (P+1) of the RTP payload.
         Note that at this point N-P octets have yet to be processed and
         that 320+1 octets is the largest possible G.711.0 frame.  Also
         note that in the common case of zero-based array indexing of a
         uint8 array of octets, this operation will read octets from
         index P through index P + min{320+1, (N-P)} - 1 of the RTP
         payload.  Go to H3.

   H3  Analyze the first octet in the internal buffer: If this octet
         is 0x00 (a padding octet) go to H4, otherwise go to H5 (process
         a G.711.0 frame).

   H4  Process padding octet (no G.711 symbols generated): Increment the
         processed octets counter by one (set P = P + 1).  If the
         result of this increment is P >= N then STOP (as all RTP
         payload octets have been processed), otherwise go to H2.

   H5  Process an individual G.711.0 frame (produce G.711 samples in the
         output frame): Pass the internal buffer to the G.711.0 decoder.
         The G.711.0 decoder will read the first octet (called the
         "prefix code" octet in ITU-T Rec. G.711.0 [G.711.0]) to
         determine the number, M, of source G.711 samples contained in
         this G.711.0 frame.  The G.711.0 decoder will produce exactly M
         G.711 source symbols.  If K = 0, these M symbols will be the
         first in the output buffer and are placed at the beginning of
         the output buffer.  If K != 0, concatenate these M symbols with
         the prior symbols in the output buffer (there are K prior
         symbols in the buffer).  Set K = K + M (as there are now this
         many G.711 source symbols in the output buffer).  The G.711.0
         decoder will have consumed some number of octets, Q, in the
         internal buffer to produce the M G.711 symbols.  Increment the
         processed octets counter by this quantity (set P = P + Q).  If
         the result of this increment is P >= N then STOP (as all RTP
         payload octets have been processed), otherwise go to H2.

   At this point, the output buffer will contain precisely K G.711
   source symbols which should correspond to the ptime signaled if SDP
   was used and the encoding process was without error.
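
   As a non-normative illustration, steps H1 through H5 can be sketched
   in Python as follows.  The real G.711.0 frame decoder is not
   reproduced here; decode_frame is a stand-in supplied by the caller,
   and toy_decode_frame is a deliberately fake format (it is NOT
   G.711.0) that exists only so the loop can be exercised.  Both names
   are assumptions of this sketch.

```python
MAX_FRAME_OCTETS = 320 + 1  # largest possible G.711.0 frame


def decode_payload(payload, decode_frame):
    """Walk an RTP payload following steps H1 to H5.

    decode_frame is a stand-in for a real G.711.0 frame decoder: it
    takes the internal buffer and returns (symbols, q), the decoded
    G.711 symbols and the number of octets it consumed.
    """
    n = len(payload)       # H1: N, octets in the RTP payload
    p = 0                  # H1: P, octets processed so far
    output = bytearray()   # H1: K is len(output), initially zero
    while p < n:
        # H2: read at most 321 of the N-P octets not yet processed,
        # starting at zero-based index P.
        buf = payload[p:p + min(MAX_FRAME_OCTETS, n - p)]
        if buf[0] == 0x00:
            p += 1         # H3/H4: padding octet, no symbols produced
            continue
        symbols, q = decode_frame(buf)   # H5: decode one frame
        if q <= 0 or q > len(buf):
            raise ValueError("decoder reported an impossible length")
        output += symbols  # K = K + M
        p += q             # P = P + Q
    return bytes(output)


def toy_decode_frame(buf):
    """NOT G.711.0: a toy format used only to exercise the loop.

    The first octet L (1..255) is followed by L octets that are copied
    to the output unchanged.
    """
    length = buf[0]
    if len(buf) < 1 + length:
        raise ValueError("truncated toy frame")
    return bytes(buf[1:1 + length]), 1 + length
```

   With the toy decoder, a payload such as 0x00, 0x02 0xAA 0xBB, 0x00
   0x00, 0x01 0xCC, 0x00 yields the three symbols 0xAA 0xBB 0xCC, with
   the padding octets skipped wherever they appear between frames.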

   We also note, as an aside, that the algorithm above (and the ITU-T
   G.711.0 reference code) accommodates padding octets (0x00) placed
   anywhere between G.711.0 frames in the RTP payload as well as prior
   to or after any or all G.711.0 frames.  The ITU-T G.711.0 reference
   code does not have Step H3 and H4 as separate steps (i.e., Step H5
   immediately follows H2) at the added computational cost of some
   additional buffer passing to/from the G.711.0 frame decoder
   functions.  That is, the G.711.0 decoder in the reference code
   "silently ignores" 0x00 padding octets at the beginning of what it
   believes to be a G.711.0 encoded frame boundary.  Thus Step H3 and
   Step H4 above are an optimization over the reference code, shown for
   clarity.

   If the decoder is at a playout endpoint location, this G.711 buffer
   SHOULD be used in the same manner as a received G.711 RTP payload
   would have been used (passed to a playout buffer, to a PLC
   implementation, etc.).

4.2.4.  G.711.0 RTP Payload for Multiple Channels

   In this section we describe the use of multiple "channels" of G.711
   data encoded by G.711.0 compression.

   The dominant use of G.711 in RTP transport has been for single
   channel use cases.  For this case, the above G.711.0 encoding and
   decoding process is used.  However, the multiple channel case for
   G.711.0 (a frame-based compression) is different from G.711 (a
   sample-based encoding) and is described separately here.

   RFC 3551 [RFC3551] provides guidelines for encoding audio channels
   (Section 4) and for the ordering of the channels within the RTP
   payload (Section 4.1).  The ordering guidelines in RFC 3551,
   Section 4.1 SHOULD be used unless an application-specific channel
   ordering is more appropriate.

   An implicit assumption in RFC 3551 is that all the channel data
   multiplexed into an RTP payload MUST represent the same physical time
   span.  The case for G.711.0 is no different; the underlying G.711
   data for all channels in a G.711.0 RTP payload MUST span the same
   interval in time (e.g., the same "ptime" for a SDP-specified codec
   negotiation).

   RFC 3551 provides guidelines for sample-based encodings such as G.711
   in Section 4.2.  This guidance is tantamount to interleaving the
   individual samples in that they SHOULD be packed in consecutive
   octets.

   RFC 3551 provides guidelines for frame-based encodings in which the
   frames are interleaved.  This guidance stems from the assumption
   that "the frame size for frame-oriented codecs is a given".  That
   assumption is not valid for G.711.0, in that individual consecutive
   G.711.0 frames (as per Section 4.2.2) can:

      1) represent different time spans (e.g., two 5 ms G.711.0 frames
      in lieu of one 10 ms G.711.0 frame), and

      2) be of different lengths in octets (and typically are).

   Therefore a different, but also simple, concatenation-based approach
   is specified in this RFC.

   For the multiple channel G.711.0 case, each G.711 channel is
   independently encoded into one or more G.711.0 frames defined here as
   a "G.711.0 channel superframe".  Each one of these superframes is
   identical to the multiple G.711.0 frame case illustrated in Figure 3
   of Section 4.2.2 in which each superframe can have one or more
   individual G.711.0 frames within it.  Then each G.711.0 channel
   superframe is concatenated - in channel order - into a G.711.0 RTP
   payload.  Then, if optional G.711.0 padding octets (0x00) are
   desired, it is RECOMMENDED that these octets are placed after the
   last G.711.0 channel superframe.  As per above, such padding may be
   desired based on security considerations (see Section 10).  This is
   depicted in Figure 4 below.

            Multiple G.711.0 Channel Superframes in RTP Payload

           |----------|---------|----------|---------|---------|
           | First    | Second  |          | Nth     | Zero    |
           | G.711.0  | G.711.0 |   ...    | G.711.0 | or more |
           | Channel  | Channel |          | Channel | 0x00    |
           | Super-   | Super-  |          | Super-  | Padding |
           | Frame    | Frame   |          | Frame   | Octets  |
           |__________|_________|__________|_________|_________|

                                 Figure 4

   Note that although the individual superframes can be of different
   lengths in octets (and usually are), the number of G.711 source
   symbols represented - in compressed form - in each channel superframe
   is identical (since all the channels represent the same time
   interval).

   The G.711.0 decoder at the receiving end simply decodes the entire
   G.711.0 (multiple channel) payload into individual G.711 symbols.  If
   M such G.711 symbols result and there were N channels, then the first
   M/N G.711 samples would be from the first channel, the second M/N
   G.711 samples would be from the second channel, and so on until the
   Nth set of G.711 samples are found.  Similarly, if the number of
   channels was not known, but the payload "ptime" was known, one could
   infer (knowing the sampling rate) how many G.711 symbols each channel
   contained; then with this knowledge determine how many channels of
   data were contained in the payload.  When SDP is used, the number of
   channels is known because the optional "channels" parameter MUST be
   present when more than one channel is negotiated (see Section 5.1).
   Additionally, when SDP is used, the parameter ptime is a RECOMMENDED
   optional parameter.  If both the channels and ptime parameters are
   known, each can serve as a check on the other.  Whichever algorithm
   is used to determine the number of channels, if the number of source
   G.711 symbols in the payload (M) is not an integer multiple of the
   number of channels (N), then the packet SHOULD be discarded.
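
   As a non-normative illustration, the channel separation and the
   ptime-based inference described above can be sketched in Python.
   The function names split_channels and infer_channels, and the
   convention of returning None for a packet that should be discarded,
   are assumptions of this sketch.

```python
def split_channels(symbols, channels):
    """Split M decoded G.711 symbols into N equal per-channel blocks.

    Returns None when M is not an integer multiple of N, mirroring the
    recommendation that such a packet be discarded.
    """
    if channels < 1:
        raise ValueError("channels must be a positive integer")
    m = len(symbols)
    if m % channels != 0:
        return None
    per_channel = m // channels
    return [symbols[i * per_channel:(i + 1) * per_channel]
            for i in range(channels)]


def infer_channels(symbol_count, ptime_ms, rate_hz=8000):
    """Infer the channel count from ptime when it was not signaled."""
    per_channel = rate_hz * ptime_ms // 1000  # symbols per channel
    if per_channel == 0 or symbol_count % per_channel != 0:
        return None
    return symbol_count // per_channel
```

   For example, at 8000 Hz with a ptime of 20 ms each channel carries
   160 symbols, so a payload that decodes to 320 symbols implies two
   channels.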

   Lastly we note that although any padding for the multiple channel
   G.711.0 payload is RECOMMENDED to be placed at the end of the
   payload, the G.711.0 decoding algorithm described in Section 4.2.3
   will successfully decode the payload in Figure 4 if the 0x00 padding
   octet is placed anywhere before or after any individual G.711.0 frame
   in the RTP payload.  The number of padding octets introduced at any
   G.711.0 frame boundary therefore does not affect the number M of the
   source G.711 symbols produced.  Thus the decision for padding MAY be
   made on a per-superframe basis.

5.  Payload Format Parameters

   This section defines the parameters that may be used to configure
   optional features in the G.711.0 RTP transmission.

   The parameters are defined here as part of the media subtype
   registration for the G.711.0 codec.  Mapping of the parameters into
   Session Description Protocol (SDP) RFC 4566 [RFC4566] is also
   provided for those applications that use SDP.

5.1.  Media Type Registration

   Type name: audio

   Subtype name: G711-0

   Required parameters:

      clock rate: The RTP timestamp clock rate, which is equal to the
      sampling rate.  The typical rate used with G.711 encoding is 8000,
      but other rates may be specified.  The default rate is 8000.

      complaw: This format-specific parameter, specified on the
      "a=fmtp:" line, indicates the companding law (A-law or mu-law)
      employed.  This format-specific parameter, as per RFC 4566
      [RFC4566], is given unchanged to the media tool using this format.
      The case-insensitive values "complaw=al" and "complaw=mu" are used
      for A-law and mu-law, respectively.

   Optional parameters:

      channels: See RFC 4566 [RFC4566] for definition.  Specifies how
      many audio streams are represented in the G.711.0 payload and MUST
      be present if the number of channels is greater than one.  This
      parameter defaults to 1 if not present (as per RFC 4566) and is
      typically a non-zero small-valued positive integer.  It is
      expected that implementations that specify multiple channels will
      also define a mechanism to map the channels appropriately within
      their system design, otherwise the channel order specified in RFC
      3551 [RFC3551] Section 4.1 will be assumed (e.g., left, right,
      center, ... ).  Similar to the usual interpretation in RFC 3551
      [RFC3551], the number of channels SHALL be a non-zero positive
      integer.

      maxptime: See RFC 4566 [RFC4566] for definition.

      ptime: See RFC 4566 [RFC4566] for definition.  The inclusion of
      "ptime" is RECOMMENDED and SHOULD be in the SDP unless there is an
      application specific reason not to include it (e.g., an
      application that has a variable ptime on a packet-by-packet
      basis).  For constant ptime applications, it is considered good
      form to include "ptime" in the SDP for session diagnostic
      purposes.  For the constant ptime multiple channel case described
      in Section 4.2.4, the inclusion of "ptime" can provide a desirable
      payload check.

   Encoding considerations:

      This media type is framed binary data (see Section 4.8 in RFC 6838
      [RFC6838]) compressed as per ITU-T Rec. G.711.0.

   Security considerations:

      See Section 10.

   Interoperability considerations: none

   Published specification:

      ITU-T Rec. G.711.0 and RFC XXXX.

      [ RFC Editor: please replace XXXX with a reference to this RFC ]

   Applications that use this media type:

      Although initially conceived for VoIP, the use of G.711.0, like
      G.711 before it, may find use within audio and video streaming
      and/or conferencing applications for the audio portion of those
      applications.

   Additional information:

   The following applies to stored-file transfer methods:

         Magic numbers: #!G7110A\n or #!G7110M\n (for A-law or MU-law
         encodings respectively, see Section 6).

         File Extensions: None

         Macintosh file type code: None

         Object identifier or OID: None

   Person & email address to contact for further information:

      Michael A.  Ramalho <mramalho@cisco.com> or <mar42@cornell.edu>

   Intended usage: COMMON

   Restrictions on usage:

      This media type depends on RTP framing, and hence is only defined
      for transfer via RTP [RFC3550].  Transport within other framing
      protocols is not defined at this time.

   Author: Michael A.  Ramalho

   Change controller:

      IETF Payload working group delegated from the IESG.

5.2.  Mapping to SDP Parameters

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP),
   which is commonly used to describe RTP sessions.  When SDP is used to
   specify sessions employing G.711.0, the mapping is as follows:

   o  The media type ("audio") goes in SDP "m=" as the media name.

   o  The media subtype ("G711-0") goes in SDP "a=rtpmap" as the
      encoding name.

   o  The required parameter "rate" also goes in "a=rtpmap" as the clock
      rate.

   o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
      "a=maxptime" attributes, respectively.

   o  Remaining parameters go in the SDP "a=fmtp" attribute by copying
      them directly from the media type string as a semicolon-separated
      list of parameter=value pairs.
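
   As a non-normative illustration, the mapping above can be sketched
   in Python.  The function name sdp_attributes and its argument names
   are assumptions of this sketch.  Only the "a=" lines are generated,
   and the channel count is appended to "a=rtpmap" in the manner shown
   in the examples of Section 5.4.

```python
def sdp_attributes(pt, complaw, rate=8000, channels=1,
                   ptime=None, maxptime=None):
    """Return the SDP attribute lines for one G.711.0 payload type."""
    law = complaw.lower()  # complaw values are case-insensitive
    if law not in ("al", "mu"):
        raise ValueError("complaw must be 'al' or 'mu'")
    if channels < 1:
        raise ValueError("channels must be a positive integer")
    rtpmap = f"a=rtpmap:{pt} G711-0/{rate}"
    if channels > 1:       # MUST be present for more than one channel
        rtpmap += f"/{channels}"
    lines = [rtpmap]
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    lines.append(f"a=fmtp:{pt} complaw={law}")
    return lines
```

   Calling sdp_attributes(98, "mu") produces the two attribute lines
   used in SDP Example 1.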

5.3.  Offer/Answer Considerations

   The following considerations apply when using the SDP offer/answer
   RFC 3264 [RFC3264] mechanism to negotiate the "channels" attribute.

   o  If the offering endpoint specifies a value greater than one for
      the optional channels parameter and the answering endpoint
      understands the parameter but cannot support the requested value,
      the answer MUST contain the optional channels parameter with the
      highest value it can support.

   o  If the offering endpoint specifies a value for the optional
      channels parameter the answer MUST contain the optional channels
      parameter unless the only value the answering endpoint can support
      is one, in which case the answer MAY contain the optional channels
      parameter with value of 1.

   o  If the offering endpoint specifies a value for the ptime parameter
      that the answering endpoint cannot support, the answer MUST
      contain the optional ptime parameter.

   o  If the offering endpoint specifies a value for the maxptime
      parameter that the answering endpoint cannot support, the answer
      MUST contain the optional maxptime parameter.
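
   As a non-normative illustration, the two rules for the channels
   parameter can be sketched in Python.  The function name
   answer_channels and the (value, must_signal) return convention are
   assumptions of this sketch, and it covers only the case in which the
   offer explicitly carried a channels value.

```python
def answer_channels(offered, max_supported):
    """Pick the channels value for an SDP answer.

    offered:       channels value explicitly present in the offer.
    max_supported: highest channel count the answerer can support.

    Returns (value, must_signal).  must_signal is False only when the
    answerer can support nothing but one channel, in which case the
    parameter MAY be omitted from the answer.
    """
    if offered < 1 or max_supported < 1:
        raise ValueError("channel counts must be positive integers")
    value = min(offered, max_supported)
    must_signal = max_supported != 1
    return value, must_signal
```

   An offer of two channels to an answerer that supports only one gives
   (1, False), which corresponds to the situation in SDP Example 2.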

5.4.  SDP Examples

   The following examples illustrate how to signal G.711.0 via SDP.

5.4.1.  SDP Example 1

         m=audio RTP/AVP 98
         a=rtpmap:98 G711-0/8000
         a=fmtp:98 complaw=mu

   In the above example the dynamic payload type 98 is mapped to G.711.0
   via the "a=rtpmap" parameter.  The mandatory "complaw" is on the
   "a=fmtp" parameter line.  Note that neither of the optional
   parameters "ptime" and "channels" is present, although it is
   generally good form to include "ptime" in the SDP for session
   diagnostic purposes.

5.4.2.  SDP Example 2

   The following example illustrates an offering endpoint requesting 2
   channels, but the answering endpoint can only support (or render) one
   channel.

   Offer:

         m=audio RTP/AVP 98
         a=rtpmap:98 G711-0/8000/2
         a=ptime:20
         a=fmtp:98 complaw=al

   Answer:

         m=audio RTP/AVP 98
         a=rtpmap:98 G711-0/8000/1
         a=ptime:20
         a=fmtp:98 complaw=al

   In this example the offer had an optional channels parameter.  The
   answer must also have the optional channels parameter unless the
   value in the answer is one.  Shown here is the case in which the
   answer explicitly contains the channels parameter (it need not, in
   which case it would be interpreted as one channel).  As mentioned
   previously, it is considered good form to include "ptime" in the SDP
   for session diagnostic purposes if the session is a constant ptime
   session.

6.  G.711.0 Storage Mode Conventions and Definition

   The G.711.0 storage mode definition in this section is similar to
   many other IETF codecs (e.g., iLBC, EVRC-NW) and is essentially a
   concatenation of individual G.711.0 frames.

   We note that something must be stored for any G.711.0 frames that are
   not received at the receiving endpoint, no matter what the cause.  In
   this section we describe two mechanisms, a "G.711.0 PLC Frame" and a
   "G.711.0 Erasure Frame".  These G.711.0 PLC and G.711.0 Erasure
   Frames are described prior to the G.711.0 storage mode definition for
   clarity.

6.1.  G.711.0 PLC Frame

   When G.711 RTP payloads are not received by a rendering endpoint, a
   Packet Loss Concealment (PLC) mechanism is typically employed to
   "fill in" the missing G.711 symbols with something that is
   auditorially pleasing, so that the loss may not be noticed by a
   listener.  Such a PLC mechanism for G.711 is specified in ITU-T Rec.
   G.711 - Appendix 1 [G.711-AP1].

   A natural extension when creating G.711.0 frames for storage
   environments is to employ such a PLC mechanism to create G.711
   symbols for the span of time in which G.711.0 payloads were not
   received - and then to compress the resulting "G.711 PLC symbols" via
   G.711.0 compression.  The G.711.0 frame(s) created by such a process
   are called "G.711.0 PLC Frames".

   Since PLC mechanisms are designed to render missing audio data with
   the best fidelity and intelligibility, G.711.0 frames created via
   such processing are likely best for most recording situations (such
   as voicemail storage) unless there is a requirement not to fabricate
   (audio) data not actually received.

   After such PLC G.711 symbols have been generated and then encoded by
   a G.711.0 encoder, the resulting frames may be stored in G.711.0
   frame format.  As a result, there is nothing to specify here - the
   G.711.0 PLC Frames are stored as if they were received by the
   receiving endpoint.  In other words, PLC-generated G.711.0 frames
   appear as "normal" or "ordinary" G.711.0 frames in the storage mode
   file.

6.2.  G.711.0 Erasure Frame

   "Erasure Frames", or equivalently "Null Frames", have been designed
   for many frame-based codecs since G.711 was standardized.  These
   null/erasure frames explicitly represent data from incoming audio
   that were either not received by the receiving system or represent
   data that a transmitting system decided not to send.  Transmitting
   systems may choose not to send data for a variety of reasons (e.g.,
   not enough wireless link capacity in radio-based systems) and can
   choose to send a "null frame" in lieu of the actual audio.  It is
   also envisioned that erasure frames would be used in storage mode
   applications for specific archival purposes where there is a
   requirement not to fabricate audio data that was not actually
   received.

   Thus, a G.711.0 erasure frame is a representation of the amount of
   time in G.711.0 frames that were not received or not encoded by the
   transmitting system.

   Prior to defining a G.711.0 erasure frame it is beneficial to note
   what many G.711 RTP systems send when the endpoint is "muted".  When
   muted, many of these systems will send an entire G.711 payload of
   either 0+ or 0- (i.e., one of the two levels closest to "analog zero"
   in either G.711 companding law).  Next we note that a desirable
   property for a G.711.0 erasure frame is for "non G.711.0 Erasure
   Frame aware" endpoints to be able to playback a G.711.0 erasure frame
   with the existing G.711.0 ITU-T reference code.

   A G.711.0 Erasure Frame is defined as any G.711.0 frame for which the
   corresponding G.711 sample values are either the value 0++ or the
   value 0-- for the entirety of the G.711.0 frame.  The levels of 0++
   and 0-- are defined to be the two levels above or below analog zero,
   respectively.  An entire frame of value 0++ or 0-- is expected to be
   extraordinarily rare when the frame was in fact generated by a
   natural signal (on the order of one in 2^{ptime in samples, minus
   one}), as analog inputs such as speech and music are zero-mean and
   are typically acoustically coupled to digital sampling systems.  Note
   that the playback of a G.711.0 frame characterized as an erasure
   frame is auditorially equivalent to a muted signal (a very low value
   constant).
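
   As a non-normative illustration, the definition above can be checked
   mechanically once a frame has been decoded to G.711 symbols.  The
   octet values that represent 0++ and 0-- depend on the companding law
   and are not listed in this section, so the sketch takes them as
   arguments; the names is_erasure_frame, zero_pp and zero_mm are
   hypothetical.  The sketch also assumes the definition means "all
   0++ or all 0--" rather than a mixture of the two.

```python
def is_erasure_frame(symbols, zero_pp, zero_mm):
    """True if every decoded G.711 symbol is 0++ or every one is 0--.

    symbols: decoded G.711 symbols of one G.711.0 frame (bytes).
    zero_pp: octet value of the 0++ level, supplied by the caller.
    zero_mm: octet value of the 0-- level, supplied by the caller.
    """
    if len(symbols) == 0:
        return False
    first = symbols[0]
    if first not in (zero_pp, zero_mm):
        return False
    # The whole frame must hold the same one of the two levels.
    return all(s == first for s in symbols)
```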

   These G.711.0 erasure frames can be reasonably characterized as null
   or erasure frames while meeting the desired playback goal of being
   decoded by the G.711.0 ITU-T reference code.  Thus, similarly to
   G.711.0 PLC frames, the G.711.0 erasure frames appear as "normal" or
   "ordinary" G.711.0 frames in the storage mode format.

6.3.  G.711.0 Storage Mode Definition

   The storage format is used for storing G.711.0 encoded frames.  The
   format for the G.711.0 storage mode file defined by this RFC is shown
   below.

                        G.711.0 Storage Mode Format

          |---------------------------|----------|--------------|
          |       Magic Number        |          |              |
          |                           |  Version | Concatenated |
          | "#!G7110A\n" (for A-law)  |   Octet  |   G.711.0    |
          |            or             |          |    Frames    |
          | "#!G7110M\n" (for mu-law) |  "0x00"  |              |
          |___________________________|__________|______________|

                                 Figure 5

   The storage mode file consists of a magic number and a version octet
   followed by the individual G.711.0 frames concatenated together.

   The magic number for G.711.0 A-law corresponds to the ASCII character
   string "#!G7110A\n", i.e., "0x23 0x21 0x47 0x37 0x31 0x31 0x30 0x41
   0x0A".  Likewise, the magic number for G.711.0 mu-law corresponds to
   the ASCII character string "#!G7110M\n", i.e., "0x23 0x21 0x47 0x37
   0x31 0x31 0x30 0x4D 0x0A".

   The version number octet allows for the future specification of other
   G.711.0 storage mode formats.  The specification of other storage
   mode formats may be desirable as G.711.0 frames are of variable
   length and a future format may include an indexing methodology that
   would enable playout far into a long G.711.0 recording without the
   necessity of decoding all the G.711.0 frames since the beginning of
   the recording.  Other future format specifications may include support
   for multiple channels, metadata and the like.  For these reasons it
   was determined that a versioning strategy was desirable for the
   G.711.0 storage mode definition specified by this RFC.  This RFC only
   specifies Version 0 and thus the value of "0x00" MUST be used for the
   storage mode defined by this RFC.

   The G.711.0 codec data frames, including any necessary erasure or PLC
   frames, are stored in consecutive order concatenated together as
   shown in Section 4.2.2.  As the Version 0 storage mode only supports
   a single channel, the RTP payload format supporting multiple channels
   defined in Section 4.2.4 is not supported in this storage mode
   definition.

   The algorithm presented in Section 4.2.3 may be used to decode the
   individual G.711.0 frames.
   If the version octet is determined not to be zero, the remainder of
   the payload MUST NOT be passed to the G.711.0 decoder, as the ITU-T
   G.711.0 reference decoder can only decode concatenated G.711.0 frames
   and has not been designed to decode elements in yet to be specified
   future storage mode formats.
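
   As a non-normative illustration, writing and checking the Version 0
   storage mode header can be sketched in Python.  The function names
   write_storage_file and parse_storage_header and the "al"/"mu" law
   labels are assumptions of this sketch; the frames are assumed to be
   already-encoded G.711.0 frames.

```python
MAGIC_ALAW = b"#!G7110A\n"
MAGIC_MULAW = b"#!G7110M\n"
VERSION_0 = 0x00


def write_storage_file(law, frames):
    """Build Version 0 storage mode file contents (single channel)."""
    magic = {"al": MAGIC_ALAW, "mu": MAGIC_MULAW}[law]
    return magic + bytes([VERSION_0]) + b"".join(frames)


def parse_storage_header(data):
    """Return (law, offset), where offset indexes the first frame.

    Raises ValueError for an unknown magic number or a non-zero version
    octet, since data in other versions must not reach the decoder.
    """
    if data.startswith(MAGIC_ALAW):
        law = "al"
    elif data.startswith(MAGIC_MULAW):
        law = "mu"
    else:
        raise ValueError("not a G.711.0 storage mode file")
    offset = len(MAGIC_ALAW)  # both magic numbers are 9 octets long
    if len(data) <= offset:
        raise ValueError("missing version octet")
    if data[offset] != VERSION_0:
        raise ValueError("unsupported storage mode version")
    return law, offset + 1
```

   The octets from the returned offset onward are the concatenated
   G.711.0 frames and can be handed to the frame decoding loop.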

7.  Acknowledgements

   There have been many people contributing to G.711.0 in the course of
   its development.  The people listed here deserve special mention:
   Takehiro Moriya, Claude Lamblin, Herve Taddei, Simao Campos, Yusuke
   Hiwasaki, Jacek Stachurski, Lorin Netsch, Paul Coverdale, Patrick
   Luthi, Paul Barrett, Jari Hagqvist, Pengjun (Jeff) Huang, John Gibbs,
   Yutaka Kamamoto, and Csaba Kos.  The review and oversight by the IETF
   Payload Working Group chairs Ali Begen and Roni Even during the
   development of this RFC is appreciated.  Additionally, the careful
   review and comments by Richard Barnes are likewise very much
   appreciated.

8.  Contributors

   The authors thank everyone who has contributed to this document.
   The people listed here deserve special mention: Ali Begen, Roni Even,
   and Hadriel Kaplan.

9.  IANA Considerations

   One media type (audio/G711-0) has been defined and requires IANA
   registration in the media types registry.  See Section 5.1 for
   details.

10.  Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [RFC3550], and in any appropriate RTP profile (for
   example RFC 3551 [RFC3551] or [RFC4585]).  This implies that
   confidentiality of the media streams is achieved by encryption; for
   example, through the application of SRTP [RFC3711].  Because the data
   compression used with this payload format is applied end-to-end, any
   encryption needs to be performed after compression.

   Note that the appropriate mechanism to ensure confidentiality and
   integrity of RTP packets and their payloads is very dependent on the
   application and on the transport and signaling protocols employed.
   Thus, although SRTP is given as an example above, other possible
   choices exist.

   Note that end-to-end security with any of authentication, integrity,
   or confidentiality protection will prevent a network element not
   within the security context from performing media-aware operations
   other than discarding complete packets.  For a media-aware
   intermediate network element to perform its operations, it must be
   a trusted entity that is included in the security context
   establishment.

   G.711.0 has no known denial-of-service attacks due to decoding, as
   data posing as a desired G.711.0 payload will be decoded into
   something (as per the decoding algorithm) with a finite amount of
   computation.  This is due to the decompression algorithm having a
   finite worst-case processing path (no infinite computational loops
   are possible).  We also note that the data read by the G.711.0
   decoder is controlled by the length of the individual encoded G.711.0
   frame(s) contained in the RTP payload.  The decoding algorithm
   specified in Section 4.2.3 above ensures that the G.711.0 decoder
   will not read beyond the length of the internal buffer specified
   (which is in turn specified to be no greater than the largest
   possible G.711.0 frame of 321 octets).  Therefore a G.711.0 payload
   does not carry "active content" that could impose malicious side-
   effects upon the receiver.
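
   The bounded-read property described above can be illustrated with
   the following Python sketch.  It does not reproduce the algorithm of
   Section 4.2.3.  The `decode_frame` callback and its return
   convention (decoded symbols, octets consumed) are hypothetical
   stand-ins for a real G.711.0 decoder.

```python
from typing import Callable, List, Tuple

MAX_G7110_FRAME_OCTETS = 321  # largest possible G.711.0 frame, per the text

# Hypothetical decoder interface: given a bounded buffer, return the decoded
# G.711 symbols and the number of octets consumed from that buffer.
FrameDecoder = Callable[[bytes], Tuple[bytes, int]]


def decode_payload(payload: bytes, decode_frame: FrameDecoder) -> List[bytes]:
    """Decode concatenated frames without reading past the payload."""
    frames: List[bytes] = []
    offset = 0
    while offset < len(payload):
        # The decoder sees at most 321 octets and never octets that lie
        # beyond the end of the RTP payload.
        window = payload[offset:offset + MAX_G7110_FRAME_OCTETS]
        symbols, consumed = decode_frame(window)
        if consumed <= 0 or consumed > len(window):
            raise ValueError("decoder reported an impossible frame length")
        frames.append(symbols)
        offset += consumed  # always advances, so the loop terminates
    return frames
```

   Because every iteration consumes at least one octet, the loop runs
   at most once per payload octet, which mirrors the finite worst-case
   processing argument made above.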

   G.711.0 is a variable bit rate (VBR) audio codec.  There have been
   recent concerns with VBR speech codecs where a passive observer can
   identify phrases from a standard speech corpus by means of the
   lengths produced by the encoder even when the payload is encrypted
   [IEEE].  That paper determined that some code excited linear
   prediction (CELP) codecs would produce discrete packet lengths for
   some phonemes and, furthermore, that with the use of appropriately
   designed Hidden Markov Models (HMMs) such a system could predict
   phrases with unexpected accuracy.  One CELP codec studied, SPEEX, had
   the property that it produced 21 different packet lengths in its
   wideband mode and that these packet lengths probabilistically mapped
   to phonemes that an HMM system could be trained on.  The paper also
   determined that a mitigation technique would be to pad the output of
   the encoder with random padding lengths, to the effect that 1) more
   discrete payload sizes would result, and 2) the probabilistic
   mapping to phonemes would become less clear.  G.711 is not a speech
   model based codec, and neither is G.711.0.  A G.711.0 encoding,
   during talking periods, produces frames of varying frame
   lengths which are not likely to have a strong mapping to phonemes.
   Thus G.711.0 is not expected to have this same vulnerability.  It
   should be noted that "silence" (only one value of G.711 in the
   entire G.711 input frame) or "near silence" (only a few G.711
   values) is easily detectable as G.711.0 frame lengths of one or a
   few octets.  If one desires to mitigate silence/non-silence
   detection, statistically variable padding should be added to
   G.711.0 frames that are very small (less than about 20% of the
   number of symbols in the corresponding G.711 input frame).  Methods
   of introducing padding in the G.711.0 payloads have been provided in
   the G.711.0 RTP payload definition in Section 4.2.2.
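
   The padding mitigation described above might be sketched as follows
   in Python.  This is an illustration under stated assumptions, not a
   normative procedure.  The 0.20 threshold follows the "about 20%"
   guidance in the text; the upper bound of 32 padding octets and the
   function name are hypothetical choices, and the caller remains
   responsible for respecting MTU limits.

```python
import secrets
from typing import Callable

PADDING_OCTET = b"\x00"   # G.711.0 payload padding uses 0x00 octets
SMALL_FRAME_RATIO = 0.20  # "less than about 20%" of the input symbols
MAX_RANDOM_PAD = 32       # hypothetical bound; not taken from this document


def pad_if_small(frame: bytes, input_symbols: int,
                 randbelow: Callable[[int], int] = secrets.randbelow) -> bytes:
    """Append a random number of 0x00 octets after a very small frame.

    `frame` is one encoded G.711.0 frame and `input_symbols` is the number
    of G.711 symbols it was encoded from (e.g. 160 for 20 ms at 8 kHz).
    """
    if len(frame) >= SMALL_FRAME_RATIO * input_symbols:
        return frame  # frame is not "very small"; leave it untouched
    pad_len = randbelow(MAX_RANDOM_PAD + 1)  # uniform in 0..MAX_RANDOM_PAD
    return frame + PADDING_OCTET * pad_len
```

   The random source is injectable so that the behavior can be tested
   deterministically; by default a cryptographically strong source is
   used so that an observer cannot predict the padding lengths.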

11.  Congestion Control

   The G.711 codec is a Constant Bit Rate (CBR) codec which does not
   have a means to regulate the bitrate.  The G.711.0 lossless
   compression algorithm typically compresses the G.711 CBR stream into
   a smaller VBR stream.  However, being lossless, it does not possess
   means of further reducing the bitrate beyond the G.711.0-based
   compression result.  The G.711.0 RTP payloads can be made arbitrarily
   large by means of adding optional padding bytes (subject only to MTU
   limitations).

   Therefore, there are no explicit ways to regulate the bit rate of the
   transmissions outlined in this RTP payload format except by
   modulating the number of optional padding bytes in the RTP payload.
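
   As a worked illustration of how padding affects the payload bit
   rate, the following Python sketch computes the rate from the payload
   size and the packet interval.  The function name and the example
   frame and padding sizes are hypothetical; only the 160-octet, 20 ms
   G.711 case (64 kbit/s) is a fixed property of G.711.

```python
def payload_bitrate_bps(payload_octets: int, packet_interval_ms: float) -> float:
    """Bit rate of the RTP payload alone (RTP/UDP/IP headers excluded)."""
    return payload_octets * 8 * 1000.0 / packet_interval_ms


# Uncompressed G.711: 160 octets every 20 ms.
g711_rate = payload_bitrate_bps(160, 20)

# Hypothetical G.711.0 frame of 40 octets, without and with 24 padding octets.
compressed_rate = payload_bitrate_bps(40, 20)
padded_rate = payload_bitrate_bps(40 + 24, 20)
```

   Here `g711_rate` is 64000.0 bit/s, while the padded payload rate is
   higher than the unpadded compressed rate; padding can only raise the
   rate, never lower it below the compression result.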

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
              Specifications and Registration Procedures", BCP 13, RFC
              6838, January 2013.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              July 2003.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
              "Extended RTP Profile for Real-time Transport Control
              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July
              2006.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264, June
              2002.

   [G.711.0]  ITU-T G.711.0, "Recommendation ITU-T G.711.0 - Lossless
              Compression of G.711 Pulse Code Modulation", September
              2009.

   [G.711]    ITU-T G.711, "Recommendation ITU-T G.711: Pulse Code
              Modulation (PCM) of Voice Frequencies", November 1988.

   [G.711-AP1]
              ITU-T G.711 Appendix 1, "Recommendation G.711
              Appendix 1: A high quality low-complexity algorithm for
              packet loss concealment with G.711", September 1999.

   [G.711-A1]
              ITU-T G.711 Amendment 1, "Recommendation ITU-T G.711
              Amendment 1 - Amendment 1: New Annex A on Lossless
              Encoding of PCM Frames", September 2009.

12.2.  Informative References

   [G.729]    ITU-T G.729, "Recommendation ITU-T G.729 - Coding of
              speech at 8 kbit/s using conjugate-structure algebraic-
              code-excited linear prediction (CS-ACELP)", January 2007.

   [G.722]    ITU-T G.722, "Recommendation ITU-T G.722 - 7 kHz audio-
              coding within 64 kbit/s", November 1988.

   [ICASSP]   N. Harada, Y. Yamamoto, T. Moriya, Y. Hiwasaki, M. A.
              Ramalho, L. Netsch, Y. Stachurski, Miao Lei, H. Taddei,
              and Q. Fengyan, "Emerging ITU-T Standard G.711.0 -
              Lossless Compression of G.711 Pulse Code Modulation",
              International Conference on Acoustics Speech and Signal
              Processing (ICASSP), 2010, ISBN 978-1-4244-4295-9, March
              2010.

   [IEEE]     C.V. Wright, L. Ballard, S.E. Coull, F. Monrose, and G.M.
              Masson, "Spot Me if You Can: Uncovering Spoken Phrases in
              Encrypted VoIP Conversations", IEEE Symposium on Security
              and Privacy, 2008, ISBN: 978-0-7695-3168-7, May 2008.

Authors' Addresses

   Michael A. Ramalho (editor)
   Cisco Systems, Inc.
   6310 Watercrest Way Unit 203
   Lakewood Ranch, FL  34202
   USA

   Phone: +1 919 476 2038
   Email: mramalho@cisco.com

   Paul E. Jones
   Cisco Systems, Inc.
   7025 Kit Creek Rd.
   Research Triangle Park, NC  27709
   USA

   Phone: +1 919 476 2048
   Email: paulej@packetizer.com

   Noboru Harada
   NTT Communications Science Labs.
   3-1 Morinosato-Wakamiya
   Atsugi, Kanagawa  243-0198
   JAPAN

   Phone: +81 46 240 3676
   Email: harada.noboru@lab.ntt.co.jp

   Muthu Arul Mozhi Perumal
   Ericsson
   Ferns Icon
   Doddanekundi, Mahadevapura
   Bangalore, Karnataka  560037
   India

   Phone: +91 9449288768
   Email: muthu.arul@gmail.com

   Lei Miao
   Huawei Technologies Co. Ltd
   Q22-2-A15R, Environment Protection Park
   No. 156 Beiqing Road
   HaiDian District
   Beijing  100095
   China

   Phone: +86 1059728300
   Email: lei.miao@huawei.com
