A More Loss-Tolerant RTP Payload Format for MP3 Audio
RFC 3119

Document Type RFC - Proposed Standard (June 2001; Errata)
Obsoleted by RFC 5219
Author Ross Finlayson 
Last updated 2020-01-21
Stream IETF
Network Working Group                                       R. Finlayson
Request for Comments: 3119                                      LIVE.COM
Category: Standards Track                                      June 2001

         A More Loss-Tolerant RTP Payload Format for MP3 Audio

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2001).  All Rights Reserved.


Abstract

   This document describes an RTP (Real-time Transport Protocol) payload
   format for transporting MPEG (Moving Picture Experts Group) 1 or 2,
   layer III audio (commonly known as "MP3").  This format is an
   alternative to that described in RFC 2250, and performs better if
   there is packet loss.

1. Introduction

   While the RTP payload format defined in RFC 2250 [2] is generally
   applicable to all forms of MPEG audio or video, it is sub-optimal for
   MPEG 1 or 2, layer III audio (commonly known as "MP3").  The reason
   for this is that an MP3 frame is not a true "Application Data Unit" -
   it contains a back-pointer to data in earlier frames, and so cannot
   be decoded independently of these earlier frames.  Because RFC 2250
   defines that packet boundaries coincide with frame boundaries, it
   handles packet loss inefficiently when carrying MP3 data.  The loss
   of an MP3 frame will render some data in previous (or future) frames
   useless, even if they are received without loss.

   In this document we define an alternative RTP payload format for MP3
   audio.  This format uses a data-preserving rearrangement of the
   original MPEG frames, so that packet boundaries now coincide with
   true MP3 "Application Data Units", which can also (optionally) be
   rearranged in an interleaving pattern.  This new format is therefore
   more data-efficient than RFC 2250 in the face of packet loss.

Finlayson                   Standards Track                     [Page 1]
RFC 3119     Loss-Tolerant RTP Payload Format for MP3 Audio    June 2001

2. The Structure of MP3 Frames

   In this section we give a brief overview of the structure of an MP3
   frame.  (For a more detailed description, see the MPEG 1 audio [3]
   and MPEG 2 audio [4] specifications.)

   Each MPEG audio frame begins with a 4-byte header.  Information
   defined by this header includes:

   -  Whether the audio is MPEG 1 or MPEG 2.
   -  Whether the audio is layer I, II, or III.
      (The remainder of this document assumes layer III, i.e., "MP3"
      frames.)
   -  Whether the audio is mono or stereo.
   -  Whether or not there is a 2-byte CRC field following the header.
   -  (indirectly) The size of the frame.
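
   As an illustration of the header fields listed above, the following
   is a minimal Python sketch that decodes a 4-byte header.  The
   function name, the returned dictionary keys, and the table names are
   this example's own.  The bit positions, bit-rate tables, and the
   frame-size formula are taken from the MPEG audio specifications
   [3][4], not from this document, and the sketch handles layer III
   with a fixed bit rate only.

```python
# Illustrative sketch only; names are hypothetical, tables follow [3] and [4].
MPEG1_L3_KBPS = [None, 32, 40, 48, 56, 64, 80, 96,
                 112, 128, 160, 192, 224, 256, 320, None]
MPEG2_L3_KBPS = [None, 8, 16, 24, 32, 40, 48, 56,
                 64, 80, 96, 112, 128, 144, 160, None]
SAMPLE_RATES_HZ = {1: [44100, 48000, 32000, None],
                   2: [22050, 24000, 16000, None]}


def parse_mp3_header(header: bytes) -> dict:
    """Decode the fields of a 4-byte MPEG audio header (layer III only)."""
    if len(header) < 4:
        raise ValueError("an MPEG audio header is 4 bytes long")
    word = int.from_bytes(header[:4], "big")
    if word >> 21 != 0x7FF:
        raise ValueError("missing 11-bit sync word")
    version = {3: 1, 2: 2}.get((word >> 19) & 0x3)  # MPEG 1 or MPEG 2
    if version is None:
        raise ValueError("not MPEG 1 or MPEG 2 audio")
    if (word >> 17) & 0x3 != 1:
        raise ValueError("not layer III")
    has_crc = ((word >> 16) & 0x1) == 0  # protection bit 0: CRC follows
    table = MPEG1_L3_KBPS if version == 1 else MPEG2_L3_KBPS
    kbps = table[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[version][(word >> 10) & 0x3]
    if kbps is None or sample_rate is None:
        raise ValueError("free-format or reserved rate index")
    padding = (word >> 9) & 0x1
    is_mono = ((word >> 6) & 0x3) == 3  # channel mode 3: single channel
    # The frame size is not stored; it follows from the other fields.
    factor = 144 if version == 1 else 72
    frame_size = factor * kbps * 1000 // sample_rate + padding
    return {
        "mpeg_version": version,
        "is_mono": is_mono,
        "has_crc": has_crc,
        "frame_size": frame_size,
    }
```

   For example, the header bytes FF FB 90 00 (hexadecimal) describe an
   MPEG 1, layer III, stereo frame without a CRC, at 128 kbps and
   44100 Hz, giving a frame size of 417 bytes.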

   The following structures appear after the header:

   -  (optionally) A 2-byte CRC field
   -  A "side info" structure.  This has the following length:
      -  32 bytes for MPEG 1 stereo
      -  17 bytes for MPEG 1 mono, or for MPEG 2 stereo
      -  9 bytes for MPEG 2 mono
   -  Encoded audio data, plus optional ancillary data (filling out the
      rest of the frame)
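
   The layout above can be sketched in Python as follows.  The function
   names are hypothetical; the sizes are those listed above.  The
   second function gives the offset, from the start of the frame, at
   which the encoded audio data begins.

```python
# Illustrative sketch only; function names are this example's own.
def side_info_size(mpeg_version: int, is_mono: bool) -> int:
    """Return the length in bytes of the "side info" structure."""
    if mpeg_version == 1:
        return 17 if is_mono else 32
    if mpeg_version == 2:
        return 9 if is_mono else 17
    raise ValueError("mpeg_version must be 1 or 2")


def main_data_offset(mpeg_version: int, is_mono: bool, has_crc: bool) -> int:
    """Return the offset of the encoded audio data within the frame."""
    header_size = 4                   # every frame begins with a 4-byte header
    crc_size = 2 if has_crc else 0    # optional 2-byte CRC field
    return header_size + crc_size + side_info_size(mpeg_version, is_mono)
```

   For an MPEG 1 stereo frame with a CRC, for instance, the encoded
   audio data begins 4 + 2 + 32 = 38 bytes into the frame.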

   For the purpose of this document, the "side info" structure is the
   most important, because it defines the location and size of the
   "Application Data Unit" (ADU) that an MP3 decoder will process.  In
   particular, the "side info" structure defines:

   -  "main_data_begin": This is a back-pointer (in bytes) to the start
      of the ADU.  The back-pointer is counted from the beginning of the
      frame, and counts only encoded audio data and any ancillary data
      (i.e., ignoring any header, CRC, or "side info" fields).
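
   A minimal Python sketch of reading this field follows.  It assumes
   that "main_data_begin" occupies the first 9 bits of the "side info"
   structure for MPEG 1 and the first 8 bits for MPEG 2; these widths
   come from the MPEG audio specifications [3][4] and are not stated in
   this document.  The function name is hypothetical.

```python
# Illustrative sketch only; field widths are assumed from [3] and [4].
def main_data_begin(side_info: bytes, mpeg_version: int) -> int:
    """Return the back-pointer (in bytes) stored at the start of side info."""
    if mpeg_version == 1:
        if len(side_info) < 2:
            raise ValueError("need at least 2 bytes of side info")
        # 9 bits: all of the first byte plus the top bit of the second.
        return (side_info[0] << 1) | (side_info[1] >> 7)
    if mpeg_version == 2:
        if len(side_info) < 1:
            raise ValueError("need at least 1 byte of side info")
        # 8 bits: the first byte.
        return side_info[0]
    raise ValueError("mpeg_version must be 1 or 2")
```

   A value of zero means that the ADU starts at the beginning of the
   frame's own encoded audio data.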

   An MP3 decoder processes each ADU independently.  The ADUs will
   generally vary in length, but their average length will, of course,
   be that of the MP3 frames (minus the length of the header,
   CRC, and "side info" fields).  (In MPEG literature, this ADU is
   sometimes referred to as a "bit reservoir".)
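
   To show how the back-pointer relates ADUs to frames, the following
   Python sketch locates the start of each ADU within the stream formed
   by concatenating the encoded audio data (plus ancillary data) of
   successive frames.  The function name and the input representation
   are assumptions made for this example.  The sketch finds only where
   each ADU starts; finding its length needs further "side info" fields
   that are not covered here.

```python
# Illustrative sketch only; the input format is this example's own.
def adu_start_offsets(frames):
    """Locate each ADU's start in the concatenated audio data stream.

    frames: iterable of (data_len, back_pointer) pairs, one per frame,
    where data_len counts the frame's bytes after the header, CRC, and
    side info, and back_pointer is that frame's main_data_begin value.
    """
    offsets = []
    frame_data_start = 0  # where the current frame's own data begins
    for data_len, back_pointer in frames:
        start = frame_data_start - back_pointer
        if start < 0:
            raise ValueError("back-pointer reaches before the first frame")
        offsets.append(start)
        frame_data_start += data_len
    return offsets


# Three 417-byte MPEG 1 stereo frames without CRC: 417 - 4 - 32 = 381.
example = [(381, 0), (381, 100), (381, 250)]
starts = adu_start_offsets(example)  # [0, 281, 512]
```

   In this example the third frame's ADU begins at offset 512, which
   lies inside the second frame's data (offsets 381 through 761).  If
   the second frame were lost, part of the third frame's ADU would be
   lost with it.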

3. A New Payload Format

   As noted in [5], a payload format should be designed so that packet
   boundaries coincide with "codec frame boundaries" - i.e., with ADUs.
   In the RFC 2250 payload format for MPEG audio [2], each RTP packet
   payload contains MP3 frames.  In this new payload format for MP3
   audio, however, each RTP packet payload contains "ADU frames", each
   preceded by an "ADU descriptor".

3.1 ADU frames

   An "ADU frame" is defined as:

      -  The 4-byte MPEG header
         (the same as the original MP3 frame, except that the first 11
         bits are (optionally) replaced by an "Interleaving Sequence