INTERNET-DRAFT                  J. McCann, Digital Equipment Corporation
November 6, 1995                                  S. Deering, Xerox PARC



                  Path MTU Discovery for IP version 6

                    draft-ietf-ipngwg-pmtuv6-00.txt



Abstract

   This document describes Path MTU Discovery for IP version 6.  It is
   largely derived from RFC-1191, which describes Path MTU Discovery for
   IP version 4.


Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited.


Expiration

   May 6, 1996














draft-ietf-ipngwg-pmtuv6-00.txt                                 [Page 1]


INTERNET-DRAFT      draft-ietf-ipngwg-pmtuv6-00.txt     November 6, 1995


Contents

   Abstract

   Status of this Memo

   Contents

   1. Introduction

   2. Protocol overview

   3. Protocol Requirements

   4. Implementation suggestions
   4.1. Layering
   4.2. Storing PMTU information
   4.3. Purging stale PMTU information
   4.4. TCP layer actions
   4.5. Issues for other transport protocols
   4.6. Management interface

   5. Security considerations

   Acknowledgements

   References

   Authors' Addresses


1. Introduction

   When one IPv6 node has a large amount of data to send to another
   node, the data is transmitted in a series of IPv6 packets.  It is
   usually preferable that these packets be of the largest size that can
   successfully traverse the path from the source node to the
   destination node.  This packet size is referred to as the Path MTU
   (PMTU), and it is equal to the minimum of the MTUs of the hops in a
   path.  IPv6 defines a standard mechanism for a node to discover the
   PMTU of an arbitrary path.

   A PMTU is associated with a path.  In IPv6, a path is identified by a
   particular combination of source and destination IPv6 addresses, flow
   id, and perhaps IPv6 Routing header information.

   Nodes not implementing Path MTU Discovery use the IPv6 minimum link
   MTU as defined in [IPv6-SPEC] as the maximum packet size.  In most
   cases, this will result in the use of smaller packets than necessary,
   because most paths have a PMTU greater than the IPv6 minimum link
   MTU.  A node sending packets much smaller than the Path MTU allows is
   wasting network resources and probably getting suboptimal throughput.


2. Protocol overview

   This memo describes a technique to dynamically discover the PMTU of a
   path.  The basic idea is that a source node initially assumes that
   the PMTU of a path is the (known) MTU of the first hop in the path.
   If any of the packets sent on that path are too large to be forwarded
   by some router along the path, that router will discard them and
   return ICMPv6 Packet Too Big messages [ICMPv6].  Upon receipt of such
   a message, the source node reduces its assumed PMTU for the path to
   be equal to the MTU of the constricting hop as reported in the Packet
   Too Big message.

   The PMTU discovery process ends when the node's estimate of the PMTU
   is less than or equal to the actual PMTU.  Note that several
   iterations of the packet-sent/Packet-Too-Big-message-received cycle
   may occur before the PMTU discovery process ends, as there may be
   hops with smaller MTUs further along the path.
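
   As an illustration only, the cycle described above can be simulated
   as follows.  This is a minimal sketch, not part of the protocol: the
   function name discover_pmtu and the representation of a path as a
   list of per-hop MTUs are assumptions made for the example.

```python
def discover_pmtu(hop_mtus):
    """Simulate PMTU discovery along a path given as a list of hop MTUs.

    Returns (final_estimate, number_of_packet_too_big_messages).
    """
    if not hop_mtus:
        raise ValueError("a path needs at least one hop")
    estimate = hop_mtus[0]          # start with the (known) first-hop MTU
    too_big_count = 0
    while True:
        # Find the first hop that cannot carry a packet of this size.
        blocking = next((m for m in hop_mtus if m < estimate), None)
        if blocking is None:
            return estimate, too_big_count   # estimate <= actual PMTU
        # The router before that hop drops the packet and reports its MTU.
        estimate = blocking
        too_big_count += 1
```

   For a hypothetical path with hop MTUs of 4352, 1500, 4352 and 1400
   octets, the sketch needs two Packet Too Big messages before the
   estimate settles at 1400, the minimum of the hop MTUs.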

   Alternatively, the node may elect to end the discovery process by
   ceasing to send packets larger than the IPv6 minimum link MTU.

   The PMTU of a path may change over time, due to changes in the
   routing topology.  Reductions of the PMTU are detected by Packet Too
   Big messages.  To detect increases in a path's PMTU, a node
   periodically increases its assumed PMTU.  This will almost always
   result in packets being discarded and Packet Too Big messages being
   generated, because in most cases the PMTU of the path will not have
   changed.  Therefore, attempts to detect increases in a path's PMTU
   should be done infrequently.


3. Protocol Requirements

   When a node receives a Packet Too Big message, it MUST reduce its
   estimate of the PMTU for the relevant path, based on the value of the
   MTU field in the message.  The precise behavior of a node in this
   circumstance is not specified, since different applications may have
   different requirements, and since different implementation
   architectures may favor different strategies.

   After receiving a Packet Too Big message, a node MUST attempt to
   avoid eliciting more such messages in the near future.  The node MUST
   reduce the size of the packets it is sending along the path.  Using a
   PMTU estimate larger than the IPv6 minimum link MTU may continue to
   elicit Packet Too Big messages.  Since each of these messages (and
   the dropped packets they respond to) consumes network resources, the
   node MUST force the PMTU Discovery process to end.

   Nodes using PMTU Discovery MUST detect decreases in Path MTU as fast
   as possible.  Nodes MAY detect increases in Path MTU, but because
   doing so requires sending packets larger than the current estimated
   PMTU, and because the likelihood is that the PMTU will not have
   increased, this MUST be done at infrequent intervals.  An attempt to
   detect an increase (by sending a packet larger than the current
   estimate) MUST NOT be done less than 5 minutes after a Packet Too Big
   message has been received for the given path.  The recommended
   setting for this timer is twice its minimum value (10 minutes).

   A node MUST NOT reduce its estimate of the Path MTU below the IPv6
   minimum link MTU [IPv6-SPEC].

   A node MUST NOT increase its estimate of the Path MTU in response to
   the contents of a Packet Too Big message.  A message purporting to
   announce an increase in the Path MTU might be a stale packet that has
   been floating around in the network, a false packet injected as part
   of a denial-of-service attack, or the result of having multiple paths
   to the destination.
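
   The requirements of this section might be captured in per-path state
   along the following lines.  This is a sketch under stated
   assumptions, not a required design: the class PathState and its
   method names are hypothetical, and the value given for MIN_LINK_MTU
   is a placeholder for whatever [IPv6-SPEC] defines.

```python
import time

MIN_LINK_MTU = 1280        # assumption: substitute the value from [IPv6-SPEC]
MIN_PROBE_DELAY = 5 * 60   # seconds; the minimum stated above
PROBE_DELAY = 10 * 60      # seconds; the recommended setting


class PathState:
    """PMTU estimate for one path (hypothetical structure)."""

    def __init__(self, first_hop_mtu):
        self.pmtu = first_hop_mtu
        self.last_too_big = None   # time of the last Packet Too Big

    def on_packet_too_big(self, reported_mtu, now=None):
        """Apply a Packet Too Big message; return True if the estimate fell."""
        self.last_too_big = time.monotonic() if now is None else now
        new = max(reported_mtu, MIN_LINK_MTU)   # never go below the minimum
        if new >= self.pmtu:
            return False                        # never increase from a message
        self.pmtu = new
        return True

    def may_probe_for_increase(self, now=None, delay=PROBE_DELAY):
        """True once enough time has passed to try a larger packet."""
        if delay < MIN_PROBE_DELAY:
            raise ValueError("delay must be at least 5 minutes")
        if self.last_too_big is None:
            return False           # estimate was never reduced; nothing to probe
        now = time.monotonic() if now is None else now
        return now - self.last_too_big >= delay
```

   The "now" parameter exists only so that the timer logic can be
   exercised without waiting; a real implementation would read a clock.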


4. Implementation suggestions

   This section discusses how PMTU Discovery may be implemented.  This
   is not a specification, but rather a set of suggestions.

   The issues include:

   - What layer or layers implement PMTU Discovery?

   - Where is the PMTU information cached?

   - How is stale PMTU information removed?

   - What must transport and higher layers do?


4.1. Layering

   In the IP architecture, the choice of what size packet to send is
   made by a protocol at a layer above IP.  This memo refers to such a
   protocol as a "packetization protocol".  Packetization protocols are
   usually transport protocols (for example, TCP) but can also be
   higher-layer protocols (for example, protocols built on top of UDP).

   Implementing PMTU Discovery in the packetization layers simplifies
   some of the inter-layer issues, but has several drawbacks: the
   implementation may have to be redone for each packetization protocol,
   it becomes hard to share PMTU information between different
   packetization layers, and the connection-oriented state maintained by
   some packetization layers may not easily extend to save PMTU
   information for long periods.

   It is therefore suggested that the IP layer store PMTU information
   and that the ICMP layer process received Packet Too Big messages.
   The packetization layers may respond to changes in the PMTU by
   changing the size of the messages they send.  To support this
   layering, packetization layers require a way to learn of changes in
   the value of MMS_S, the "maximum send transport-message size".  The
   MMS_S is derived from the Path MTU by subtracting the size of the
   IPv6 header plus space reserved by the IP layer for additional
   headers (if any).
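
   The derivation of MMS_S can be written out as a short calculation.
   The sketch below assumes the 40-octet fixed IPv6 header; the
   function name mms_s and the parameter name "reserved" are chosen
   for illustration only.

```python
IPV6_HEADER_LEN = 40   # octets in the fixed IPv6 header


def mms_s(pmtu, reserved=0):
    """Maximum send transport-message size for a path.

    `reserved` is the space the IP layer sets aside for additional
    headers; it is zero when none are used.
    """
    if reserved < 0:
        raise ValueError("reserved must be non-negative")
    return pmtu - IPV6_HEADER_LEN - reserved
```

   For example, with a PMTU of 1500 octets and no additional headers,
   the sketch yields an MMS_S of 1460 octets.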

   It is possible that a packetization layer, perhaps a UDP application
   outside the kernel, is unable to change the size of messages it
   sends.  This may result in a packet size that exceeds the Path MTU.
   To accommodate such situations, IPv6 defines a mechanism that allows
   large payloads to be divided into fragments, with each fragment sent
   in a separate packet (see [IPv6-SPEC] section "Fragment Header").
   However, packetization layers are encouraged to avoid sending
   messages that will require fragmentation (for the case against
   fragmentation, see [FRAG]).


4.2. Storing PMTU information

   In general, each PMTU value learned should be associated with a
   specific path.  A path is identified by a source IPv6 address, a
   destination IPv6 address, a flow id, and possibly IPv6 Routing header
   information.

      Note: Some paths may be further distinguished by different
      security classifications.  The details of such classifications are
      beyond the scope of this memo.

   The obvious place to store this association is as a field in the
   routing table entries.  A node will not have a route for every
   possible destination, but it should be able to cache a per-
   destination route for every active destination.  (This requirement is
   already imposed by the need to process ICMP Redirect messages.)

   When the first packet is sent to a destination for which no per-
   destination route exists, a route is chosen from the set of more
   aggregated routes, for example a subnet route or a default route.
   The PMTU fields in these route entries should be initialized to be
   the MTU of the associated first-hop link, and must never be changed
   by the PMTU Discovery process.  (PMTU Discovery creates or changes
   entries only for per-destination routes.)  Until a Packet Too Big
   message is received, the PMTU associated with the initially chosen
   route is presumed to be accurate.

   When a Packet Too Big message is received, the ICMP layer determines
   a new estimate for the Path MTU (from the value in the MTU field in
   the Packet Too Big message).  If a per-destination route for this
   path does not exist, then one is created (the new route uses the same
   first-hop router as the current route).  If the PMTU estimate
   associated with the per-destination route is higher than the new
   estimate, then the value in the routing entry is changed.
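
   One possible shape for this bookkeeping is sketched below.  The
   classes Route and RouteTable are hypothetical, paths are keyed by
   destination address only (source address, flow id and Routing
   header information are left out for brevity), and a single default
   route stands in for the set of aggregated routes.  The lower bound
   on the estimate required by section 3 is also omitted here.

```python
class Route:
    def __init__(self, next_hop, first_hop_mtu, per_destination=False):
        self.next_hop = next_hop
        self.first_hop_mtu = first_hop_mtu
        self.pmtu = first_hop_mtu          # initialised to the first-hop MTU
        self.per_destination = per_destination


class RouteTable:
    def __init__(self, default_route):
        self.default = default_route       # aggregated route, never modified
        self.host_routes = {}              # destination -> per-destination Route

    def lookup(self, dest):
        return self.host_routes.get(dest, self.default)

    def on_packet_too_big(self, dest, reported_mtu):
        """Record a new estimate; return True if the PMTU was decreased."""
        route = self.host_routes.get(dest)
        if route is None:
            # Create a per-destination route using the same first-hop
            # router as the route currently in use.
            cur = self.default
            route = Route(cur.next_hop, cur.first_hop_mtu, per_destination=True)
            self.host_routes[dest] = route
        if route.pmtu > reported_mtu:
            route.pmtu = reported_mtu
            return True
        return False
```

   The return value tells the caller whether packetization layers
   using the path need to be notified of a decrease.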

   The packetization layers must be notified about decreases in the
   PMTU.  Any packetization layer instance (for example, a TCP
   connection) that is actively using the path must be notified if the
   PMTU estimate is decreased.

      Note: even if the Packet Too Big message contains an Original
      Packet Header that refers to a UDP packet, the TCP layer must be
      notified if any of its connections use the given path.

   Also, the instance that sent the packet that elicited the Packet Too
   Big message should be notified that its packet has been dropped, even
   if the PMTU estimate has not changed, so that it may retransmit the
   dropped data.

      Note: An implementation can avoid the use of an asynchronous
      notification mechanism for PMTU decreases by postponing
      notification until the next attempt to send a packet larger than
      the PMTU estimate.  In this approach, when an attempt is made to
      SEND a packet that is larger than the PMTU estimate, the SEND
      function should fail and return a suitable error indication.  This
      approach may be more suitable to a connectionless packetization
      layer (such as one using UDP), which (in some implementations) may
      be hard to "notify" from the ICMP layer.  In this case, the normal
      timeout-based retransmission mechanisms would be used to recover
      from the dropped packets.
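
   The synchronous alternative described in this note might look like
   the following sketch.  The exception PacketTooLargeError, the
   function send and its "transmit" callback are all hypothetical
   names; no particular socket interface is implied.

```python
class PacketTooLargeError(Exception):
    """Raised by send() when a packet exceeds the current PMTU estimate."""

    def __init__(self, size, pmtu):
        super().__init__(f"packet of {size} octets exceeds PMTU {pmtu}")
        self.pmtu = pmtu


def send(packet, pmtu, transmit):
    """Hand `packet` (bytes) to `transmit` unless it exceeds `pmtu`."""
    if len(packet) > pmtu:
        # The caller learns of the PMTU decrease here, at send time,
        # instead of through an asynchronous notification.
        raise PacketTooLargeError(len(packet), pmtu)
    transmit(packet)
```

   Carrying the current estimate in the error lets the caller resize
   its next message without a separate query.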


   It is important to understand that the notification of the
   packetization layer instances using the path about the change in the
   PMTU is distinct from the notification of a specific instance that a
   packet has been dropped.  The latter should be done as soon as
   practical (i.e., asynchronously from the point of view of the
   packetization layer instance), while the former may be delayed until
   a packetization layer instance wants to create a packet.
   Retransmission should be done only for those packets that are
   known to be dropped, as indicated by a Packet Too Big message.


4.3. Purging stale PMTU information

   Internetwork topology is dynamic; routes change over time.  The PMTU
   discovered for a given destination may be wrong if a new route comes
   into use.  Thus, PMTU information cached by a node can become stale.

   If the stale PMTU value is too large, this will be discovered almost
   immediately once a large enough packet is sent to the given
   destination.  No such mechanism exists for realizing that a stale
   PMTU value is too small, so an implementation should "age" cached
   values.  When a PMTU value has not been decreased for a while (on the
   order of 10 minutes), the PMTU estimate should be set to the MTU of
   the first-hop link, and the packetization layers should be notified
   of the change.  This will cause the complete PMTU Discovery process
   to take place again.

      Note: an implementation should provide a means for changing the
      timeout duration, including setting it to "infinity".  For
      example, nodes attached to an FDDI link which is then attached to
      the rest of the Internet via a small MTU serial line are never
      going to discover a new non-local PMTU, so they should not have to
      put up with dropped packets every 10 minutes.

   An upper layer must not retransmit data in response to an increase in
   the PMTU estimate, since this increase never comes in response to an
   indication of a dropped packet.

   One approach to implementing PMTU aging is to add a timestamp field
   to the routing table entry.  This field is initialized to a
   "reserved" value, indicating that the PMTU has never been changed.
   Whenever the PMTU is decreased in response to a Packet Too Big
   message, the timestamp is set to the current time.

   Once a minute, a timer-driven procedure runs through the routing
   table, and for each entry whose timestamp is not "reserved" and is
   older than the timeout interval:

   - The PMTU estimate is set to the MTU of the first hop link.

   - Packetization layers using this route are notified of the increase.
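
   The timer-driven procedure above can be sketched as follows.  The
   names Entry and age_entries are hypothetical, None plays the role
   of the "reserved" timestamp, and resetting the timestamp to
   "reserved" after the estimate is raised is an assumption of this
   sketch rather than something stated above.

```python
RESERVED = None            # timestamp value meaning "PMTU never decreased"
TIMEOUT = 10 * 60          # seconds; should be configurable, may be infinite


class Entry:
    def __init__(self, first_hop_mtu):
        self.first_hop_mtu = first_hop_mtu
        self.pmtu = first_hop_mtu
        self.timestamp = RESERVED

    def decrease(self, new_pmtu, now):
        """Lower the estimate after a Packet Too Big message."""
        if new_pmtu < self.pmtu:
            self.pmtu = new_pmtu
            self.timestamp = now


def age_entries(entries, now, timeout=TIMEOUT, notify=lambda entry: None):
    """Run once a minute: reset estimates older than `timeout`."""
    reset = []
    for entry in entries:
        if entry.timestamp is RESERVED:
            continue
        if now - entry.timestamp > timeout:
            entry.pmtu = entry.first_hop_mtu
            entry.timestamp = RESERVED   # assumption: avoid repeated resets
            notify(entry)                # report the increase to upper layers
            reset.append(entry)
    return reset
```

   Passing float("inf") as the timeout disables aging, which matches
   the "infinity" setting suggested in the note above.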


   PMTU estimates may disappear from the routing table if the per-
   destination routes are removed; this can happen in response to an
   ICMPv6 Redirect message, or because certain routing-table daemons
   delete old routes after several minutes.  Also, on a multi-homed node
   a topology change may result in the use of a different source
   interface.  When this happens, if the packetization layer is not
   notified then it may continue to use a cached PMTU value that is now
   too small.  One solution is to notify the packetization layer of a
   possible PMTU change whenever a Redirect message causes a route
   change, and whenever a route is simply deleted from the routing
   table.


4.4. TCP layer actions

   The TCP layer must track the PMTU for the destination of a
   connection; it should not send segments that would result in packets
   larger than the PMTU.  A simple implementation could ask the IP layer
   for this value each time it created a new segment, but this could be
   inefficient.  Moreover, TCP implementations that follow the "slow-
   start" congestion-avoidance algorithm [CONG] typically calculate and
   cache several other values derived from the PMTU.  It may be simpler
   to receive asynchronous notification when the PMTU changes, so that
   these variables may be updated.

   A TCP implementation must also store the MSS value received from its
   peer, and must not send any segment larger than this MSS, regardless
   of the PMTU.  In 4.xBSD-derived implementations, this may require
   adding an additional field to the TCP state record.

   The value sent in the TCP MSS option is independent of the PMTU.
   This MSS option value is used by the other end of the connection,
   which may be using an unrelated PMTU value.  See [IPv6-SPEC] sections
   "Packet Size Issues" and "Maximum Upper-Layer Payload Size" for
   information on selecting a value for the TCP MSS option.

   When a Packet Too Big message is received, it implies that a packet
   was dropped by the router that sent the ICMP message.  It is
   sufficient to treat this as any other dropped segment, and wait until
   the retransmission timer expires to cause retransmission of the
   segment.  If the PMTU Discovery process requires several steps to
   find the PMTU of the full path, this could delay the connection by
   many round-trip times.

   Alternatively, the retransmission could be done in immediate response
   to a notification that the Path MTU has changed, but only for the
   specific connection specified by the Packet Too Big message.  The
   packet size used in the retransmission should, of course, be no
   larger than the new PMTU.

      Note: A packetization layer must not retransmit in response to
      every Packet Too Big message, since a burst of several oversized
      segments will give rise to several such messages and hence several
      retransmissions of the same data.  If the new estimated PMTU is
      still wrong, the process repeats, and there is an exponential
      growth in the number of superfluous segments sent!

      This means that the TCP layer must be able to recognize when a
      Packet Too Big notification actually decreases the PMTU that it
      has already used to send a packet on the given connection, and
      should ignore any other notifications.
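
   The filtering rule in this note can be illustrated with a small
   sketch.  The class Connection and its fields are hypothetical, and
   the retransmission itself is reduced to a counter so that the
   example stays self-contained.

```python
class Connection:
    """Per-connection state for Packet Too Big handling (hypothetical)."""

    def __init__(self, pmtu):
        self.pmtu = pmtu              # PMTU this connection has been using
        self.retransmissions = 0

    def on_packet_too_big(self, new_pmtu):
        """Return True if this notification triggered a retransmission."""
        if new_pmtu >= self.pmtu:
            return False              # does not decrease the PMTU: ignore
        self.pmtu = new_pmtu
        self.retransmissions += 1     # resend once, in segments that fit
        return True
```

   With this check, a burst of four notifications reporting the same
   MTU causes one retransmission rather than four.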

   Modern TCP implementations incorporate "congestion avoidance" and
   "slow-start" algorithms to improve performance [CONG].  Unlike a
   retransmission caused by a TCP retransmission timeout, a
   retransmission caused by a Packet Too Big message should not change
   the congestion window.  It should, however, trigger the slow-start
   mechanism (i.e., only one segment should be retransmitted until
   acknowledgements begin to arrive again).

   TCP performance can be reduced if the sender's maximum window size is
   not an exact multiple of the segment size in use (this is not the
   congestion window size, which is always a multiple of the segment
   size).  In many systems (such as those derived from 4.2BSD), the
   segment size is often set to 1024 octets, and the maximum window size
   (the "send space") is usually a multiple of 1024 octets, so the
   proper relationship holds by default.  If PMTU Discovery is used,
   however, the segment size may not be a submultiple of the send space,
   and it may change during a connection; this means that the TCP layer
   may need to change the transmission window size when PMTU Discovery
   changes the PMTU value.  The maximum window size should be set to the
   greatest multiple of the segment size that is less than or equal to
   the sender's buffer space size.
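
   The window calculation in the last sentence is simple integer
   arithmetic.  The sketch below uses the hypothetical function name
   max_window; both arguments are in octets.

```python
def max_window(buffer_space, segment_size):
    """Greatest multiple of segment_size that is <= buffer_space."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return (buffer_space // segment_size) * segment_size
```

   With a 4096-octet send space, a 1024-octet segment size leaves the
   window at 4096, while a 1440-octet segment size reduces it to 2880.
   The result is zero when the buffer is smaller than one segment, a
   case an implementation would need to handle separately.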


4.5. Issues for other transport protocols

   Some transport protocols (such as ISO TP4 [ISOTP]) are not allowed to
   repacketize when doing a retransmission.  That is, once an attempt is
   made to transmit a segment of a certain size, the transport cannot
   split the contents of the segment into smaller segments for
   retransmission.  In such a case, the original segment can be
   fragmented by the IP layer during retransmission.  Subsequent
   segments, when transmitted for the first time, should be no larger
   than allowed by the Path MTU.

   The Sun Network File System (NFS) uses a Remote Procedure Call (RPC)
   protocol [RPC] that, in many cases, sends payloads that must be
   fragmented even for the first-hop link.  This might improve
   performance in certain cases, but it is known to cause reliability
   and performance problems, especially when the client and server are
   separated by routers.

   It is recommended that NFS implementations use PMTU Discovery
   whenever routers are involved.  Most NFS implementations allow the
   RPC datagram size to be changed at mount-time (indirectly, by
   changing the effective file system block size), but might require
   some modification to support changes later on.

   Also, since a single NFS operation cannot be split across several UDP
   datagrams, certain operations (primarily, those operating on file
   names and directories) require a minimum payload size that, if sent
   in a single packet, would exceed the PMTU.  NFS implementations should
   not reduce the payload size below this threshold, even if PMTU
   Discovery suggests a lower value.  (Of course, in this case the
   payload will be fragmented by the IP layer.)


4.6. Management interface

   It is suggested that an implementation provide a way for a system
   utility program to:

   - Specify that PMTU Discovery not be done on a given route.

   - Change the PMTU value associated with a given route.

   The former can be accomplished by associating a flag with the routing
   entry; when a packet is sent via a route with this flag set, the IP
   layer does not send packets larger than the IPv6 minimum link MTU.

   These features might be used to work around an anomalous situation,
   or by a routing protocol implementation that is able to obtain Path
   MTU values.

   The implementation should also provide a way to change the timeout
   period for aging stale PMTU information.


5. Security considerations

   This Path MTU Discovery mechanism makes possible two denial-of-
   service attacks, both based on a malicious party sending false Packet
   Too Big messages to a node.

   In the first attack, the false message indicates a PMTU much smaller
   than reality.  This should not entirely stop data flow, since the
   victim node should never set its PMTU estimate below the IPv6 minimum
   link MTU.  It will, however, result in suboptimal performance.

   In the second attack, the false message indicates a PMTU larger than
   reality.  If believed, this could cause temporary blockage as the
   victim sends packets that will be dropped by some router.  Within one
   round-trip time, the node would discover its mistake (receiving


draft-ietf-ipngwg-pmtuv6-00.txt                                [Page 10]


INTERNET-DRAFT      draft-ietf-ipngwg-pmtuv6-00.txt     November 6, 1995


   Packet Too Big messages from that router), but frequent repetition of
   this attack could cause many packets to be dropped.  A node,
   however, should never raise its estimate of the PMTU based on a
   Packet Too Big message, so it should not be vulnerable to this attack.

   A malicious party could also cause problems if it could stop a victim
   from receiving legitimate Packet Too Big messages, but in this case
   there are simpler denial-of-service attacks available.


Acknowledgements

   We would like to acknowledge the authors of and contributors to
   [RFC-1191], from which the majority of this document was derived.


References

   [CONG]      Van Jacobson.  Congestion Avoidance and Control.  Proc.
               SIGCOMM '88 Symposium on Communications Architectures and
               Protocols, pages 314-329.  Stanford, CA, August, 1988.

   [FRAG]      C. Kent and J. Mogul.  Fragmentation Considered Harmful.
               In Proc. SIGCOMM '87 Workshop on Frontiers in Computer
               Communications Technology.  August, 1987.

   [ICMPv6]    A. Conta and S. Deering, "Internet Control Message
               Protocol (ICMPv6) for the Internet Protocol Version 6
               (IPv6) Specification", June 1995
               <draft-ietf-ipngwg-icmp-02.txt>

   [IPv6-SPEC] S. Deering and R. Hinden, "Internet Protocol Version 6
               (IPv6) Specification", Internet Draft, June 1995
               <draft-ietf-ipngwg-ipv6-spec-02.txt>

   [ISOTP]     ISO.  ISO Transport Protocol Specification: ISO DP 8073.
               RFC 905, SRI Network Information Center, April, 1984.

   [RFC-1191]  J. Mogul and S. Deering, "Path MTU Discovery", RFC 1191,
               November 1990

   [RPC]       Sun Microsystems, Inc.  RPC: Remote Procedure Call
               Protocol.  RFC 1057, SRI Network Information Center,
               June, 1988.


Authors' Addresses

    Jack McCann
    Digital Equipment Corporation
    110 Spitbrook Road, ZKO3-3/U14
    Nashua, NH 03062
    Phone: +1 603 881 2608
    Fax:   +1 603 881 0120
    Email: mccann@zk3.dec.com

    Stephen E. Deering
    Xerox Palo Alto Research Center
    3333 Coyote Hill Road
    Palo Alto, CA 94304
    Phone: +1 415 812 4839
    Fax:   +1 415 812 4471
    Email: deering@parc.xerox.com


Expiration

    May 6, 1996