Network Working Group                                         R.  Braden
Request for Comments: 468                                       UCLA/CCN
NIC: 14742                                                 March 8, 1973

                          FTP DATA COMPRESSION

I.  INTRODUCTION

APOLOGIA

   Major design objectives of the proposed File Transfer Protocol (FTP)
   are reliability and efficiency for transmission of large files.
   Efficiency has two faces: efficiency of the host CPU's, and efficient
   use of the Network bandwidth.  Block mode is intended to minimize CPU
   overhead; for bandwidth efficiency, there is a mode called "HASP" in
   RFC 454.  The "HASP" mode of FTP is really transmission with data
   compression, i.e., an encoding scheme to reduce the information
   redundancy in the messages.

   RFC 454 contains no explicit definition of the "HASP" or compressed
   mode, but instead notes that a future RFC by yours truly will define
   the mode.  Students of FTP may find this scarcely credible, but you
   are now reading the promised RFC.  It turned out to be much farther
   in the future than any of us expected.  Mea Culpa.

GENERAL CONSIDERATIONS

   In the early years of the Network, its major uses have been remote
   terminal interactions and the small-to-medium-sized file transmission
   typical of remote job entry.  As facilities such as the Illiac IV and
   the Data Machine become operational on the Network, and the Network
   community begins to include users with heavy data transmission
   requirements, large file transmission will become a major mode of
   Network use.  For example, one user of CCN expects to send 2 x 10**8
   bits of data _each_ _day_ over the Network.

   Local byte compression of the type proposed here is particularly
   effective for reducing the size of "printer" files such as those
   transmitted under the Network RJE protocol.  Experience with CCN's
   RJS service has shown a typical compression of print files by a
   factor of between two and three.  Since FTP was intended to contain
   the data transfer part of the Network RJE protocol as a subset, it is
   appropriate to include a print file compression mechanism in FTP.
   These considerations led the FTP committee to include a compressed
   mode within FTP.

   The two main arguments for data compression are economics and
   convenience (usability).  Consider first economics, which is
   essentially a trade-off between CPU time and transmission costs.  Of
   course, as long as Network use is a free commodity, the economics of
   data compression are all bad.  That happy state won't last forever.
   What does data compression cost?

   Let us consider only simple linear compression schemes, such as the
   one proposed here.  By linear, I mean that the CPU time to examine a
   source record is proportional to the number of bytes in the record.
   A simple linear scheme could detect repeated single characters, for
   example.  One could imagine quadratic schemes, which detected
   repeated substrings; but except for possible special circumstances
   where the source strings have some structure known to the compression
   algorithm, the CPU economics don't favor quadratic compression.
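
   As a concrete illustration (this sketch is mine, not the encoding
   proposed in this RFC), the following Python fragment shows a linear
   scheme of exactly this kind: it detects runs of a repeated single
   character in one pass, touching each source byte a bounded number
   of times.

      # Illustrative sketch only: NOT the FTP/HASP wire format.
      # A simple linear compression scheme that detects runs of a
      # repeated single character.

      def rle_encode(data: bytes) -> list:
          """One linear pass; returns (count, byte) pairs."""
          runs = []
          i = 0
          while i < len(data):
              j = i
              while j < len(data) and data[j] == data[i]:
                  j += 1
              runs.append((j - i, data[i]))
              i = j
          return runs

      def rle_decode(runs: list) -> bytes:
          """Reverse the encoding."""
          return b"".join(bytes([b]) * n for n, b in runs)

      # A "printer" line padded with blanks compresses well:
      line = b"TOTAL" + b" " * 60
      assert rle_decode(rle_encode(line)) == line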

   Assuming a reasonable figure for large-scale CPU costs in the
   generation of CCN's 360/91, we concluded that an upper bound on CPU
   costs for total compression and decompression would be 5 cents per
   megabit; this is based on very loose coding of a simple linear
   algorithm.  This may be compared with the projected Network
   transmission costs of over 30 cents per megabit (possibly a lot
   over).
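
   To make the trade-off concrete (an illustrative calculation from
   the figures above, not part of the original cost study): the user
   cited earlier, sending 2 x 10**8 bits, i.e. 200 megabits, per day,
   would pay roughly 200 x 5 cents = $10 per day in CPU charges for
   compression and decompression, against some 200 x 30 cents = $60
   per day in uncompressed transmission charges.  A compression factor
   of two to three cuts the transmission bill to $20-$30, for a net
   daily saving on the order of $20-$30.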

   Thus, the CPU time to conserve bandwidth costs significantly less
   than the bandwidth saved.  Both CPU costs and bandwidth costs are
   trending downward, but it seems exceedingly unlikely that the ratio
   of CPU cost to bandwidth cost for linear compression will reverse in
   the next few years.  On the other hand, this calculation clearly
   discourages one from using quadratic compression.

WHY HASP

   CCN's batch remote job entry protocol NETRJS (see RFC #189, July 15,
   1971) was designed to include two data transfer modes, truncated and
   compressed.  The NETRJS truncated mode is essentially identical to
   current FTP block mode record structure (except for minor bit format
   differences).  The compressed mode of NETRJS uses an adaptation of
   the particular compression scheme which is incorporated in the
   "Multileaving protocol" of the binary synchronous rje support in
   IBM's HASP system.
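
   The bit-level format proposed for FTP appears later in this
   document.  Purely to convey the flavor of such byte compression,
   here is a minimal Python sketch of a decompressor for an assumed
   HASP-like descriptor-byte layout (the layout is my illustration and
   is not taken from this RFC): a byte 0nnnnnnn introduces n literal
   data bytes; 10nnnnnn followed by a byte d stands for n replications
   of d; 11nnnnnn alone stands for n filler (blank) bytes.

      # Decompressor for an ASSUMED descriptor-byte layout
      # (illustration only; see the caveats above):
      #   0nnnnnnn       -> the next n bytes are literal data
      #   10nnnnnn, d    -> the byte d replicated n times
      #   11nnnnnn       -> n filler bytes (blank)

      FILLER = 0x20  # blank: the natural filler for print files

      def decompress(stream: bytes) -> bytes:
          out = bytearray()
          i = 0
          while i < len(stream):
              desc = stream[i]
              i += 1
              if desc & 0x80 == 0:        # 0nnnnnnn: literal string
                  n = desc & 0x7F
                  out += stream[i:i + n]
                  i += n
              elif desc & 0x40 == 0:      # 10nnnnnn: replicated byte
                  n = desc & 0x3F
                  out += bytes([stream[i]]) * n
                  i += 1
              else:                       # 11nnnnnn: filler run
                  n = desc & 0x3F
                  out += bytes([FILLER]) * n
          return bytes(out)

      # "TOTAL" plus 61 blanks travels in 7 bytes instead of 66:
      encoded = bytes([0x05]) + b"TOTAL" + bytes([0xC0 | 61])
      assert decompress(encoded) == b"TOTAL" + b" " * 61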

   Although it isn't really necessary for the purpose of defining a
   compression scheme in FTP, I have included an appendix summarizing
   very briefly the nature of HASP and its rje package.  That appendix
   may be considered cultural enrichment for those in the Network
   community.