RTP Payload Format for the 1998 Version of ITU-T Rec. H.263 Video (H.263+)
draft-ietf-avt-rtp-h263-video-01

The information below is for an old version of the document that is already published as an RFC.
Document Type
This is an older version of an Internet-Draft that was ultimately published as RFC 2429.
Authors Chad Zhu, Gary Sullivan, Carsten Bormann, Linda Cline, Gim L. Deisher, Dr. Thomas R. Gardos, Christian Maciocco, Donald Newell, Joerg Ott, Stephan Wenger
Last updated 2013-03-02 (Latest revision 1998-04-07)
RFC stream Internet Engineering Task Force (IETF)
Intended RFC status Proposed Standard
IESG IESG state Became RFC 2429 (Proposed Standard)
Internet Engineering Task Force                 Audio-Video Transport WG
INTERNET-DRAFT                                 C. Bormann / Univ. Bremen
                                                        L. Cline / Intel
                                                      G. Deisher / Intel
                                                       T. Gardos / Intel
                                                     C. Maciocco / Intel
                                                       D. Newell / Intel
                                                   J. Ott / Univ. Bremen
                                                G. Sullivan / PictureTel
                                                   S. Wenger / TU Berlin
                                                          C. Zhu / Intel

                                            Date Generated: 14 Jan. 1998

               RTP Payload Format for the 1998 Version of
                    ITU-T Rec. H.263 Video (H.263+)
                 <draft-ietf-avt-rtp-h263-video-01.txt>

Status of This Memo

This document is an Internet-Draft.  Internet-Drafts are working 
documents of the Internet Engineering Task Force (IETF), its areas, and 
its working groups.  Note that other groups may also distribute working 
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months 
and may be updated, replaced, or made obsolete by other documents at any 
time.  It is inappropriate to use Internet-Drafts as reference material 
or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), 
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or 
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

1. Introduction

This document specifies an RTP payload header format applicable to the 
transmission of video streams generated based on the 1998 version of
ITU-T Recommendation H.263 [4].  Because the 1998 version of H.263 is a 
superset of the 1996 syntax, this format can also be used with the 1996 
version of H.263.

The 1998 version of ITU-T Recommendation H.263 added numerous coding 
options to improve codec performance over the 1996 version.  The 1998 
version is referred to as H.263+ in this document.  Among the new 
options, the ones with the biggest impact on the RTP payload 
specification and the error resilience of the video content are the 
slice structured mode, the independent segment decoding mode (ISD), the 
reference picture selection mode, and the scalability mode.  This 
section summarizes the impact of these new coding options on 
packetization.  Refer to [4] for more information on coding options.

The slice structured mode was added to H.263+ for three purposes: to 
provide enhanced error resilience capability, to make the bitstream more 
amenable to use with an underlying packet transport such as RTP, and to 
minimize video delay.  The slice structured mode supports fragmentation 
at macroblock boundaries.

With the independent segment decoding option, a video picture frame is 
broken into segments and encoded in such a way that each segment is 
independently decodable.  Utilizing ISD in a lossy network environment 
helps to prevent the propagation of errors from one segment of the 
picture to others.

The reference picture selection mode allows the use of an older 
reference picture rather than the one immediately preceding the current 
picture.  Usually, the last transmitted frame is implicitly used as the 
reference picture for inter-frame prediction.  If the reference picture 
selection mode is used, the data stream carries information on what 
reference frame should be used, indicated by the temporal reference as 
an ID for that reference frame.  The reference picture selection mode 
can be used with or without a back channel, which provides information 
to the encoder about the internal status of the decoder.  However, no 
special provision is made herein for carrying back channel information.

H.263+ also includes bitstream scalability as an optional coding mode.
Three kinds of scalability are defined: temporal, signal-to-noise ratio
(SNR), and spatial scalability.  Temporal scalability is achieved via 
the disposable nature of bi-directionally predicted frames, or B-frames.  
SNR scalability permits refinement of encoded video frames, thereby 
improving the quality (or SNR).  Spatial scalability is similar to SNR 
scalability except the refinement layer is twice the size of the base 
layer in the horizontal dimension, vertical dimension, or both.

2. Usage of RTP

When transmitting H.263+ video streams over the Internet, the output of 
the encoder can be packetized directly.  All the bits resulting from the 
bitstream including the fixed length codes and variable length codes 
will be included in the packet, with the only exception being that when 
the payload of a packet begins with a Picture, GOB, Slice, EOS, or EOSBS 
start code, the first two (all-zero) bytes of the start code are removed 
and replaced by setting an indicator bit in the payload header.

For H.263+ bitstreams coded with temporal, spatial, or SNR scalability, 
each layer may be transported to a different network address.  More 
specifically, each layer may use a unique IP address and port number 
combination.  The temporal relations between layers shall be expressed 
using the RTP timestamp so that they can be synchronized at the 
receiving ends in multicast or unicast applications.

The H.263+ video stream will be carried as payload data within RTP 
packets.  A new H.263+ payload header is defined in section 4.  This 
section defines the usage of the RTP fixed header and H.263+ video 
packet structure.

2.1 RTP Header Usage

Each RTP packet starts with a fixed RTP header.  The following fields of 
the RTP fixed header are used for H.263+ video streams:

Marker bit (M bit): The Marker bit of the RTP header is set to 1 when 
the current packet carries the end of the current frame, and is 0 
otherwise.

Payload Type (PT): The Payload Type shall specify the H.263+ video 
payload format.

Timestamp: The RTP Timestamp encodes the sampling instant of the first 
video frame data contained in the RTP data packet.  The RTP timestamp 
shall be the same on successive packets if a video frame occupies more 
than one packet.  In a multilayer scenario, all pictures corresponding 
to the same temporal reference should use the same timestamp.  If 
temporal scalability is used (if B-frames are present), the timestamp 
may not be monotonically increasing in the RTP stream.  If B-frames are 
transmitted on a separate layer and address, they must be synchronized 
properly with the reference frames.  Refer to the 1998 ITU-T 
Recommendation H.263 [4] for information on required transmission order 
to a decoder.  For an H.263+ video stream, the RTP timestamp is based on 
a 90 kHz clock, the same as that of the RTP payload for H.261 streams 
[5].  Since both the H.263+ data and the RTP header contain timing 
information, the two sets of timing information are required to run 
synchronously.  That is, both the RTP timestamp and the temporal 
reference (TR in the picture header of H.263) should carry the same 
relative timing information.  If necessary, mathematical rounding should 
be applied to the information of the H.263+ data stream to generate the 
RTP timestamp (this is especially true for the standard picture clock 
frequency of 30000/1001 Hz, and may also be true if custom picture clock 
frequencies are to be used; see [4] for details).
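
The rounding step described above can be sketched as follows.  This is 
a minimal illustration, not part of the payload format: the function 
name and its parameters are hypothetical, and the picture clock 
frequency is passed in as an exact fraction so that the only rounding is 
the final one.  Wraparound of the TR field is not handled.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90_000  # RTP timestamp clock rate given in section 2.1


def tr_to_rtp_ticks(tr_delta: int, picture_clock_hz: Fraction) -> int:
    """Convert a temporal reference difference into 90 kHz RTP ticks.

    tr_delta is the number of picture clock intervals elapsed since some
    anchor picture (hypothetical parameter); picture_clock_hz is the
    picture clock frequency in Hz.
    """
    exact = Fraction(tr_delta) * RTP_CLOCK_HZ / picture_clock_hz
    # Round half up using integer arithmetic only, so no floating-point
    # error accumulates over a long session.
    return (2 * exact.numerator + exact.denominator) // (2 * exact.denominator)
```

A sender would add the returned value to the RTP timestamp of the anchor 
picture, for example tr_to_rtp_ticks(3, Fraction(30000, 1001)) for a 
picture three intervals later at the standard picture clock frequency.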

2.2 Video Packet Structure

A section of an H.263+ compressed bitstream is carried as a payload 
within each RTP packet.  For each RTP packet, the RTP header is followed 
by an H.263+ payload header, which is followed by a number of bytes of a 
standard H.263+ compressed bitstream.  The size of the H.263+ payload 
header is variable depending on the payload involved, as detailed in 
section 4.  The layout of the RTP H.263+ video packet is as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    RTP Header                                               ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    H.263+ Payload Header                                    ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    H.263+ Compressed Data Stream                            ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Any H.263+ start codes can be byte aligned by an encoder by using the 
stuffing mechanisms of H.263+.  As specified in H.263+, picture, slice, 
and EOSBS start codes shall always be byte aligned, and GOB and EOS 
start codes may be byte aligned.  For packetization purposes, GOB start 
codes should be byte aligned, although this is not absolutely required 
herein since it is not required in H.263+.

All H.263+ start codes (Picture, GOB, Slice, EOS, and EOSBS) begin with 
16 zero-valued bits.  If a start code is byte aligned and it occurs at 
the beginning of a packet, these two bytes shall be removed from the 
H.263+ compressed data stream in the packetization process and shall 
instead be represented by setting a bit (the P bit) in the payload 
header.
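
The start code handling above can be sketched on the sender side as 
follows.  This is an illustrative sketch only: the function name is 
hypothetical, and it assumes the caller hands over a chunk of the 
bitstream whose first byte is byte aligned in the H.263+ stream.  It 
relies on the fact that every start code consists of 16 zero-valued bits 
followed by a '1' bit.

```python
def strip_start_code_prefix(chunk: bytes) -> tuple[int, bytes]:
    """Return (P, payload) for a bitstream chunk about to be packetized.

    If the chunk begins with a byte-aligned start code, the two all-zero
    bytes are dropped and P is 1; otherwise the chunk is returned
    unchanged with P equal to 0.
    """
    starts_with_code = (
        len(chunk) >= 3
        and chunk[0] == 0x00
        and chunk[1] == 0x00
        and (chunk[2] & 0x80) != 0  # 17th bit of every start code is '1'
    )
    if starts_with_code:
        return 1, chunk[2:]
    return 0, chunk
```

The returned P value is what the sender would place in the P bit of the 
payload header defined in section 4.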

3. Design Considerations

The goals of this payload format are to specify an efficient way of 
encapsulating an H.263+ standard compliant bitstream and to enhance the 
resiliency towards packet losses.  Due to the large number of different 
possible coding schemes in H.263+, a copy of the picture header with 
configuration information is inserted into the payload header when 
appropriate.  The use of that copy of the picture header along with the 
payload data can allow decoding of a received packet even when another 
packet containing the original picture header is lost.

There are a few assumptions and constraints associated with this H.263+ 
payload header design.  The purpose of this section is to point out 
various design issues and also to discuss several coding options 
provided by H.263+ that may impact the performance of network-based 
H.263+ video.

o The optional slice structured mode described in Annex K of H.263+ [4]
  enables more flexibility for packetization.  Similar to a picture
  segment that begins with a GOB header, the motion vector predictors in
  a slice are restricted to reside within its boundaries.  However,
  slices provide much greater freedom in the selection of the size and
  shape of the area which is represented as a distinct decodable region.  
  In particular, slices can have a size which is dynamically selected to 
  allow the data for each slice to fit into a chosen packet size.  
  Slices can also be chosen to have a rectangular shape which is
  conducive to minimizing the impact of errors and packet losses on 
  motion compensated prediction.  For these reasons, the use of the 
  slice structured mode is strongly recommended for any applications 
  used in environments where significant packet loss occurs.

o In non-rectangular slice structured mode, only complete slices should
  be included in a packet.  In other words, slices should not be
  fragmented across packet boundaries.  The only reasonable need for a 
  slice to be fragmented across packet boundaries is when the encoder 
  which generated the H.263+ data stream could not be influenced by an 
  awareness of the packetization process (such as when sending H.263+ 
  data through a network other than the one to which the encoder is 
  attached, as in network gateway implementations).  Optimally, each 
  packet will contain only one slice.

o The independent segment decoding (ISD) described in Annex R of [4]
  prevents any data dependency across slice or GOB boundaries in the
  reference picture.  It can be utilized to further improve resiliency
  in high loss conditions.

o If ISD is used in conjunction with the slice structure, the 
  rectangular slice submode shall be enabled and the dimensions and 
  quantity of the slices present in a frame shall remain the same 
  between each two intra-coded frames (I-frames), as required in H.263+.  
  The individual ISD segments may also be entirely intra coded from time 
  to time to realize quick error recovery without adding the latency 
  time associated with sending complete INTRA-pictures.

o When the slice structure is not applied, the insertion of a 
  (preferably byte-aligned) GOB header can be used to provide resync 
  boundaries in the bitstream, as the presence of a GOB header 
  eliminates the dependency of motion vector prediction across GOB 
  boundaries.  These resync boundaries provide natural locations for 
  packet payload boundaries.

o H.263+ allows picture headers to be sent in an abbreviated form in 
  order to prevent repetition of overhead information that does not 
  change from picture to picture.  For resiliency, sending a complete 
  picture header for every frame is often advisable.  This means that, 
  especially in cases with high packet loss probability in which picture 
  header contents are not expected to be highly predictable, the sender 
  may always set the subfield UFEP in PLUSPTYPE to '001' in the H.263+ 
  video bitstream.

o In a multi-layer scenario, each layer may be transmitted to a 
  different network address.  The configuration of each layer such as 
  the enhancement layer number (ELNUM), reference layer number (RLNUM), 
  and scalability type should be determined at the start of the session 
  and should not change during the course of the session.

o All start codes can be byte aligned, and picture, slice, and EOSBS 
  start codes are always byte aligned.  The boundaries of these
  syntactical elements provide ideal locations for placing packet 
  boundaries.

o We assume that a maximum Picture Header size of 504 bits is
  sufficient.  The syntax of H.263+ does not explicitly prohibit larger 
  picture header sizes, but the use of such extremely large picture 
  headers is not expected.

4. H.263+ Payload Header

For H.263+ video streams, each RTP packet carries only one H.263+ video 
packet.  The H.263+ payload header is always present for each H.263+ 
video packet.  The payload header is of variable length.  A 16 bit field 
of the basic payload header may be followed by an 8 bit field for Video 
Redundancy Coding information, and/or by a variable length picture 
header as indicated by PLEN. These optional fields appear in the order 
given above when present.

If a picture header is included in the payload header, the length of the 
picture header in number of bytes is specified by PLEN.  The minimum 
length of the payload header is 16 bits, corresponding to PLEN equal to 
0 and no VRC information present.

The remainder of this section defines the various components of the RTP 
payload header.  Section 5 defines the various packet types that are 
used to carry different types of H.263+ coded data, and section 6 
summarizes how to distinguish between the various packet types.

4.1 General H.263+ payload header

The H.263+ payload header is structured as follows:

 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  RR     |P|V|  PLEN     |PEBIT|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

RR: 5 bits
  Reserved bits.  Shall be zero.

P: 1 bit
  Indicates the picture start or a picture segment (GOB/Slice) start or 
  a video sequence end (EOS or EOSBS).  Two bytes of zero bits then have 
  to be prefixed to the payload of such a packet to compose a complete
  picture/GOB/slice/EOS/EOSBS start code.  This bit allows the omission 
  of the two first bytes of the start codes, thus improving the 
  compression ratio.

V: 1 bit
  Indicates the presence of an 8 bit field containing information for 
  Video Redundancy Coding (VRC), which follows immediately after the 
  initial 16 bits of the payload header if present.  For syntax and 
  semantics of that 8 bit VRC field see section 4.2.

PLEN: 6 bits
  Picture header length in number of bytes.  If no additional picture 
  header is attached, PLEN is 0.  If PLEN>0, the additional picture 
  header is attached immediately following the rest of the payload 
  header.

PEBIT: 3 bits
  Indicates the number of bits that shall be ignored in the last byte of 
  the picture header.  If PLEN is zero, then PEBIT shall also be zero.
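
As an illustration of the bit layout above, the following sketch packs 
and parses the 16-bit basic payload header.  The class and function 
names are hypothetical and not part of this specification.  Note that 
the 6-bit PLEN field can express at most 63 bytes, which matches the 
504-bit maximum picture header size assumed in section 3.

```python
import struct
from typing import NamedTuple


class PayloadHeader(NamedTuple):
    p: int      # 1 if two zero bytes of a start code were omitted
    v: int      # 1 if an 8-bit VRC field follows the basic header
    plen: int   # length in bytes of the attached picture header
    pebit: int  # bits to ignore in the last picture header byte


def pack_payload_header(h: PayloadHeader) -> bytes:
    """Encode the basic payload header in network byte order."""
    if not (0 <= h.plen <= 63 and 0 <= h.pebit <= 7):
        raise ValueError("PLEN is a 6-bit field and PEBIT a 3-bit field")
    if h.plen == 0 and h.pebit != 0:
        raise ValueError("PEBIT shall be zero when PLEN is zero")
    word = (h.p & 1) << 10 | (h.v & 1) << 9 | h.plen << 3 | h.pebit
    return struct.pack("!H", word)  # the five RR bits stay zero


def parse_payload_header(data: bytes) -> PayloadHeader:
    """Decode the first two bytes of an RTP payload."""
    (word,) = struct.unpack_from("!H", data)
    return PayloadHeader(
        p=(word >> 10) & 1,
        v=(word >> 9) & 1,
        plen=(word >> 3) & 0x3F,
        pebit=word & 0x07,
    )
```

This sketch ignores the RR bits on reception; a stricter receiver might 
choose to reject packets in which they are nonzero.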

4.2 Video Redundancy Coding Header Extension

Video Redundancy Coding (VRC) is an optional mechanism intended to 
improve error resilience over packet networks.  Implementing VRC in 
H.263+ will require the Reference Picture Selection option described in 
Annex N.  By having multiple "threads" of independently inter-frame 
predicted pictures, damage to an individual frame causes distortions 
only within its own thread and leaves the other threads unaffected.  From 
time to time, all threads converge to a so-called sync frame (an INTRA 
picture or a non-INTRA picture which is redundantly represented within 
multiple threads); from this sync frame, the independent threads are 
started again.  For a more complete description of VRC see [7].

While a VRC data stream is, like all H.263+ data, totally 
self-contained, it may be useful for the transport hierarchy 
implementation to have knowledge about the current damage status of each 
thread.  On the Internet, this status can easily be determined by 
observing the marker bit and the sequence number of the RTP header, 
together with the thread ID and a cyclic "packet per thread" number.  
The latter two numbers are coded in the VRC header extension.

The format of the VRC header extension is as follows:

 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
| TID | Trun  |S|
+-+-+-+-+-+-+-+-+

TID: 3 bits
  Thread ID.  Up to 7 threads are allowed. Each frame of H.263+ VRC data 
  will use as reference information only sync frames or frames within 
  the same thread.  By convention, thread 0 is expected to be the 
  "canonical" thread, which is the thread from which the sync frame 
  should ideally be used.  In the case of corruption or loss of the 
  thread 0 representation, a representation of the sync frame with a 
  higher thread number can be used by the decoder.  Lower thread numbers 
  are expected to contain equal or better representations of the sync 
  frames than higher thread numbers in the absence of data corruption or 
  loss.  See [7] for details.

Trun: 4 bits
  Monotonically increasing (modulo 16) 4 bit number counting the packet
  number within each thread.

S: 1 bit
  A bit that indicates that the packet content is for a sync frame.  An
  encoder using VRC may send several representations of the same "sync"
  picture, in order to ensure that regardless of which thread of 
  pictures is corrupted by errors or packet losses, the reception of at 
  least one representation of a particular picture is ensured (within at 
  least one thread).  The sync picture can then be used for the 
  prediction of any thread.  If packet losses have not occurred, then 
  the sync frame contents of thread 0 can be used and those of other 
  threads can be discarded (and similarly for other threads).  Thread 0 
  is considered the "canonical" thread, the use of which is preferable 
  to all others.  The contents of packets having lower thread numbers 
  shall be considered as generally preferred over those with higher 
  thread numbers.
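
The VRC field layout above can be decoded as in the following sketch.  
The names are hypothetical.  The second function shows one way a 
receiver might use Trun to estimate loss within a thread; because Trun 
counts modulo 16, a run of 16 or more consecutive losses in one thread 
cannot be distinguished from a shorter run.

```python
from typing import NamedTuple


class VrcHeader(NamedTuple):
    tid: int   # thread ID (3 bits)
    trun: int  # per-thread packet counter, modulo 16 (4 bits)
    s: int     # 1 if the packet carries sync frame content


def parse_vrc(byte: int) -> VrcHeader:
    """Split the 8-bit VRC header extension into its three fields."""
    return VrcHeader(tid=(byte >> 5) & 0x07, trun=(byte >> 1) & 0x0F, s=byte & 1)


def packets_missing(prev_trun: int, trun: int) -> int:
    """Packets skipped between two consecutively received packets of
    the same thread, computed modulo 16."""
    return (trun - prev_trun - 1) % 16
```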

5. Packetization schemes

5.1 Picture Segment Packets and Sequence Ending Packets (P=1)

A picture segment packet is defined as a packet that starts at the 
location of a Picture, GOB, or slice start code in the H.263+ data 
stream.  This corresponds to the definition of the start of a video 
picture segment as defined in H.263+.  For such packets, P=1 always.

An extra picture header can sometimes be attached in the payload header 
of such packets.  Whenever an extra picture header is attached as 
signified by PLEN>0, only the last six bits of its picture start code, 
'100000', are included in the payload header.  A complete H.263+ picture 
header with byte aligned picture start code can be conveniently 
assembled on the receiving end by prepending the sixteen leading '0' 
bits.

When PLEN>0, the end bit position corresponding to the last byte of the 
picture header data is indicated by PEBIT.  The actual bitstream data 
shall begin on an 8-bit byte boundary following the payload header.

A sequence ending packet is defined as a packet that starts at the 
location of an EOS or EOSBS code in the H.263+ data stream.  This 
delineates the end of a sequence of H.263+ video data (more H.263+ video 
data may still follow later, however, as specified in ITU-T 
Recommendation H.263).  For such packets, P=1 and PLEN=0 always.

The optional header extension for VRC may or may not be present as 
indicated by the V bit flag.

5.1.1 Packets that begin with a Picture Start Code

Any packet that contains the whole or the start of a coded picture shall 
start at the location of the picture start code (PSC), and should 
normally be encapsulated with no extra copy of the picture header. In 
other words, normally PLEN=0 in such a case.   However, if the coded 
picture contains an incomplete picture header (UFEP = "000"), then a 
representation of the complete (UFEP = "001") picture header may be 
attached during packetization in order to provide greater error 
resilience.  Thus, for packets that start at the location of a picture 
start code, PLEN shall be zero unless both of the following conditions 
apply:
1) The picture header in the H.263+ bitstream payload is incomplete
   (PLUSPTYPE present and UFEP="000"), and
2) The additional picture header which is attached is not incomplete
   (UFEP="001").

A packet which begins at the location of a Picture, GOB, slice, EOS, or 
EOSBS start code shall omit the first two (all zero) bytes from the 
H.263+ bitstream, and signify their presence by setting P=1 in the 
payload header.

Here is an example of encapsulating the first packet in a frame (without 
an attached redundant complete picture header):

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  RR     |1|V|0|0|0|0|0|0|0|0|0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-------------------------------+
| bitstream data without the first two 0 bytes of the PSC       |
+---------------------------------------------------------------+

5.1.2 Packets that begin with GBSC or SSC

For a packet that begins at the location of a GOB or slice start code, 
PLEN may be zero or may be nonzero, depending on whether a redundant 
picture header is attached to the packet.  In environments with very low 
packet loss rates, or when picture header contents are very seldom 
likely to change (except as can be detected from the GFID syntax of 
H.263+), a redundant copy of the picture header is not required.  
However, in less ideal circumstances a redundant picture header should 
be attached for enhanced error resilience, and its presence is indicated 
by PLEN>0.

Assuming a PLEN of 9, below is an example of a packet that begins with a
GBSC or a SSC:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  RR     |1|V|0 0 1 0 0 1|PEBIT|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1 0 0 0 0 0| picture header starting with TR, PTYPE, ...       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ...                                                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ...           | bitstream data begins with GBSC/SSC ...       .
+-+-+-+-+-+-+-+-+-----------------------------------------------+

Notice that only the last six bits of the picture start code, '100000', 
are included in the payload header.  A complete H.263+ picture header 
with byte aligned picture start code can be conveniently assembled if 
needed on the receiving end by prepending the sixteen leading '0' bits.
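
A receiver might separate the attached picture header from the 
bitstream data along the following lines.  This is a sketch under stated 
assumptions: the function name is hypothetical, and its input is the RTP 
payload, that is, everything after the RTP fixed header.

```python
def split_packet(packet: bytes) -> tuple[bytes, int, bytes]:
    """Return (picture_header, header_bits, bitstream).

    picture_header is empty when PLEN is 0; otherwise it is the attached
    copy with the sixteen leading zero bits of the PSC restored, and
    header_bits is the number of valid bits in it (PEBIT bits at the end
    of the last byte are to be ignored).
    """
    word = int.from_bytes(packet[:2], "big")
    p = (word >> 10) & 1
    v = (word >> 9) & 1
    plen = (word >> 3) & 0x3F
    pebit = word & 0x07
    offset = 2 + v                          # skip the optional VRC byte
    if len(packet) < offset + plen:
        raise ValueError("packet shorter than its payload header")
    attached = packet[offset:offset + plen]
    bitstream = packet[offset + plen:]
    if p:
        bitstream = b"\x00\x00" + bitstream  # restore start code prefix
    if plen == 0:
        return b"", 0, bitstream
    picture_header = b"\x00\x00" + attached
    return picture_header, 8 * len(picture_header) - pebit, bitstream
```

The bitstream always begins on a byte boundary after the attached 
header, so PEBIT affects only how many bits of the picture header copy 
are meaningful.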

5.1.3 Packets that Begin with an EOS or EOSBS Code

For a packet that begins with an EOS or EOSBS code, PLEN shall be zero, 
and no Picture, GOB, or Slice start codes shall be included within the 
same packet.  As with other packets beginning with start codes, the two 
all-zero bytes that begin the EOS or EOSBS code at the beginning of the 
packet shall be omitted, and their presence shall be indicated by 
setting the P bit to 1 in the payload header.

System designers should be aware that some decoders may interpret the 
loss of a packet containing only EOS or EOSBS information as the loss of 
essential video data and may thus respond by not displaying some 
subsequent video information.  Since EOS and EOSBS codes do not actually 
affect the decoding of video pictures, there is little need to send them 
at all.  Because of the danger of misinterpretation of the loss of such 
a packet, encoders are generally discouraged from sending EOS and 
EOSBS.

Below is an example of a packet containing an EOS code:

 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  RR     |1|V|0|0|0|0|0|0|0|0|0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|1|1|1|1|1|0|0|
+-+-+-+-+-+-+-+-+

5.2 Encapsulating Follow-On Packet (P=0)

A Follow-on packet contains a number of bytes of coded H.263+ data which 
does not start at a synchronization point.  That is, a Follow-On packet 
does not start with a Picture, GOB, Slice, EOS, or EOSBS header, and it 
may or may not start at a macroblock boundary.  Since Follow-on packets 
do not start at synchronization points, the data at the beginning of a 
follow-on packet is not independently decodable.  For such packets, P=0 
always.  If the packet preceding a Follow-on packet is lost, the 
receiver may discard that Follow-on packet as well as all subsequent 
Follow-on packets.  Better behavior, of course, would be for 
the receiver to scan the interior of the packet payload content to 
determine whether any start codes are found in the interior of the 
packet which can be used as resync points.  The use of an attached copy 
of a picture header for a follow-on packet is useful only if the 
interior of the packet or some subsequent follow-on packet contains a 
resync code such as a GOB or slice start code.  PLEN>0 is allowed, since 
it may allow resync in the interior of the packet.  The decoder may also 
be resynchronized at the next segment or picture packet.

Here is an example of a follow-on packet (with PLEN=0):

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  RR     |0|V|0|0|0|0|0|0|0|0|0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-------------------------------+
| bitstream data                                                |
+---------------------------------------------------------------+
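
The simple discard strategy described above can be sketched as follows.  
The class name is hypothetical, and the sketch assumes packets are 
handed to it in arrival order with no reordering; it does not attempt 
the better behavior of scanning the payload for interior start codes.

```python
from typing import Optional


class FollowOnFilter:
    """Decide whether a packet's bitstream data should be decoded."""

    def __init__(self) -> None:
        self.expected_seq: Optional[int] = None
        self.in_sync = False

    def accept(self, seq: int, p_bit: int) -> bool:
        """seq is the RTP sequence number, p_bit the P bit of the packet."""
        lost = self.expected_seq is not None and seq != self.expected_seq
        self.expected_seq = (seq + 1) & 0xFFFF  # sequence numbers wrap
        if p_bit:
            self.in_sync = True    # a start code is a resync point
        elif lost:
            self.in_sync = False   # the chain of follow-on data is broken
        return self.in_sync
```

Once a gap is seen, every Follow-on packet is rejected until the next 
packet with P=1 arrives.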

6. Use of this payload specification

There is no syntactical difference between a picture segment packet and 
a Follow-on packet, other than the indication P=1 for picture segment or 
sequence ending packets and P=0 for Follow-on packets.  The table below 
summarizes all the packet types and ways to distinguish between them.

For a more detailed discussion on how to use the payload specification, 
the reader should refer to [8].

It is possible to distinguish between the different packet types by 
checking the P bit and the first 6 bits of the payload along with the 
header information.  The following table shows the packet type for 
permutations of this information (see also the picture/GOB/Slice header 
descriptions in H.263+ for details):

--------------+--------------+----------------------+-------------------
 First 6 bits | P-Bit | PLEN |  Packet              |  Remarks
 of Payload   |(payload hdr.)|                      |
--------------+--------------+----------------------+-------------------
 100000       |   1   |  0   |  Picture             |  Typical Picture
 100000       |   1   | > 0  |  Picture             |  Note UFEP
 1xxxxx       |   1   |  0   |  GOB/Slice/EOS/EOSBS |  See possible GNs
 1xxxxx       |   1   | > 0  |  GOB/Slice           |  See possible GNs
 xxxxxx       |   0   |  0   |  Follow-on           |
 xxxxxx       |   0   | > 0  |  Follow-on           |  Interior Resync
--------------+--------------+----------------------+-------------------

See [4] for details regarding the possible values of the six bits (a "1" 
bit followed by a five bit GN field explicit or emulated) of GOB, Slice, 
EOS, and EOSBS codes.
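
The table can be turned into a small classification routine such as the 
sketch below.  The function name and the returned strings are 
hypothetical.  The sketch assumes that "first 6 bits of payload" refers 
to the bitstream data that follows the complete payload header 
(including any VRC field and attached picture header), and it reports 
combinations not listed in the table as "invalid".

```python
def classify(p_bit: int, plen: int, bitstream: bytes) -> str:
    """Name the packet type from P, PLEN, and the first payload bits."""
    if not p_bit:
        return "Follow-on"
    # With P=1 the payload continues a start code, so the first bit of
    # the remaining data must be the '1' that follows the 16 zero bits.
    if not bitstream or not bitstream[0] & 0x80:
        return "invalid"
    first6 = bitstream[0] >> 2
    if first6 == 0b100000:
        return "Picture"
    # EOS and EOSBS packets always have PLEN equal to zero.
    return "GOB/Slice" if plen > 0 else "GOB/Slice/EOS/EOSBS"
```

Telling GOB, Slice, EOS, and EOSBS apart requires inspecting the five 
bit GN field as described in [4], which this sketch does not do.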

As defined in this specification, every start of a coded frame (as 
indicated by the presence of a PSC) has to be encapsulated as a picture 
segment packet.  If the whole coded picture fits into one packet of 
reasonable size (which is dependent on the connection characteristics), 
this is the only type of packet that needs to be observed.  Due to the 
high compression ratio achieved by H.263+ it is often possible to use 
this mechanism, especially for small spatial picture formats such as 
QCIF and typical Internet packet sizes around 1500 bytes.

If the complete coded frame does not fit into a single packet, two 
different ways for the packetization may be chosen.  In case of very low 
or zero packet loss probability, one or more Follow-on packets may be 
used for coding the rest of the picture.  Doing so leads to minimal 
coding and packetization overhead as well as to an optimal use of the 
maximal packet size, but does not provide any added error resilience.

The alternative is to break the picture into reasonably small partitions 
- called Segments - (by using the Slice or GOB mechanism), that do offer 
synchronization points.  By doing so and using the Picture Segment 
payload with PLEN>0, decoding of the transmitted packets is possible 
even in such cases in which the Picture packet containing the picture 
header was lost (provided any necessary reference picture is available). 
Picture Segment packets can also be used in conjunction with Follow-on 
packets for large segment sizes.
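
The first of the two approaches above, one picture segment packet 
followed by Follow-on packets, can be sketched as follows.  The function 
name and the max_payload parameter are hypothetical; max_payload is 
taken to be the room available for the payload header plus bitstream 
data.  No VRC field or redundant picture header is attached (V=0, 
PLEN=0), and the marker value returned with each payload is meant for 
the M bit of the RTP header.

```python
def packetize_frame(frame: bytes, max_payload: int) -> list[tuple[bytes, int]]:
    """Split one coded frame into (rtp_payload, marker_bit) pairs.

    frame must begin with a byte-aligned picture start code.
    """
    if frame[:2] != b"\x00\x00":
        raise ValueError("frame must begin with a start code")
    if max_payload <= 2:
        raise ValueError("no room for data after the payload header")
    data = frame[2:]          # the P bit stands in for these two bytes
    room = max_payload - 2    # each packet carries a 16-bit header
    packets = []
    for start in range(0, len(data), room):
        p_bit = 1 if start == 0 else 0
        header = (p_bit << 10).to_bytes(2, "big")  # V=0, PLEN=0, PEBIT=0
        marker = 1 if start + room >= len(data) else 0
        packets.append((header + data[start:start + room], marker))
    return packets
```

An encoder that is aware of the packet size would instead align packet 
boundaries with GOB or slice start codes, so that each packet begins at 
a synchronization point.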

7. Security Considerations

RTP packets using the payload format defined in this specification are 
subject to the security considerations discussed in the RTP 
specification [1], and any appropriate RTP profile (for example [3]).
This implies that confidentiality of the media streams is achieved by 
encryption.  Because the data compression used with this payload format 
is applied end-to-end, encryption may be performed after compression so 
there is no conflict between the two operations.

A potential denial-of-service threat exists for data encodings using 
compression techniques that have non-uniform receiver-end computational 
load.  The attacker can inject pathological datagrams into the stream 
which are complex to decode and cause the receiver to be overloaded.
However, this encoding does not exhibit any significant non-uniformity.

As with any IP-based protocol, in some circumstances a receiver may be 
overloaded simply by the receipt of too many packets, either desired or 
undesired.  Network-layer authentication may be used to discard packets 
from undesired sources, but the processing cost of the authentication 
itself may be too high.  In a multicast environment, pruning of specific 
sources may be implemented in future versions of IGMP and in 
multicast routing protocols to allow a receiver to select which sources 
are allowed to reach it.

A security review of this payload format found no additional 
considerations beyond those in the RTP specification.

8. References

[1] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A
    Transport Protocol for Real-Time Applications", RFC 1889.

[2] "Video Codec for Audiovisual Services at px64 kbits/s", ITU-T
    Recommendation H.261, 1993.

[3] "RTP Profile for Audio and Video Conference with Minimal Control",
    RFC 1890.

[4] "Video Coding for Low Bitrate Communication", Draft ITU-T
    Recommendation H.263, Draft 20, September 1997.

[5] T. Turletti, C. Huitema, "RTP Payload Format for H.261 Video
    Streams", RFC 2032.

[6] C. Zhu, "RTP Payload Format for H.263 Video Streams", RFC 2190.

[7] S. Wenger, "Video Redundancy Coding in H.263+", Proc. AVSPN97, 
    Aberdeen, U.K.

[8] S. Wenger, G. Knorr, J. Ott, "Error resilience support in H.263 
    V.2", submitted for publication to IEEE T-CSVT, 1997.