Internet-Draft Path MTU Option March 2020
Hinden & Fairhurst Expires 10 September 2020
Workgroup:
Network Working Group
Internet-Draft:
draft-ietf-6man-mtu-option-02
Published:
9 March 2020
Intended Status:
Experimental
Expires:
10 September 2020
Authors:
R. Hinden
Check Point Software
G. Fairhurst
University of Aberdeen

IPv6 Minimum Path MTU Hop-by-Hop Option

Abstract

This document specifies a new Hop-by-Hop IPv6 option that is used to record the minimum Path MTU along the forward path from a source host to a destination host. The recorded value can then be communicated back to the source using the Return PMTU field in the option.

This Hop-by-Hop option is intended to be used in environments like Data Centers and on paths between Data Centers, to allow them to better take advantage of paths able to support a large Path MTU. The method could also be useful in other environments, including the general Internet.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 10 September 2020.

1. Introduction

This draft proposes a new Hop-by-Hop Option to record the minimum MTU along the forward path between the source and destination hosts. The source host creates a packet with this Hop-by-Hop Option and fills the Min-PMTU field of the option (the Reported PMTU) with the MTU of the outbound link that will be used to forward the packet towards the destination.

At each subsequent hop where the option is processed, the router compares the Reported PMTU in the option with the MTU of its outgoing link. If the MTU of the outgoing link is less than the Reported PMTU, the router rewrites the Reported PMTU in the option with the smaller value. When the packet arrives at the destination host, the destination host can send the minimum Reported PMTU back to the source host using the Rtn-PMTU field in the option.

The figure below illustrates the operation of the method. The path between the source and destination hosts comprises three links: the sender's link has an MTU of MTU-S, the link between routers R1 and R2 has an MTU of 9000 bytes, and the final link to the destination has an MTU of MTU-D.


   +--------+         +----+        +----+         +-------+
   |        |         |    |        |    |         |       |
   | Sender +---------+ R1 +--------+ R2 +---------+ Dest. |
   |        |         |    |        |    |         |       |
   +--------+  MTU-S  +----+  9000B +----+  MTU-D  +-------+

Three scenarios are considered:

Scenario 1 considers all links to have a 9000-byte MTU and the method to be supported by both routers.

Scenario 2 considers the link to the destination host (MTU-D) to have an MTU of 1500 bytes. This is the smallest MTU on the path: router R2 reduces the Reported PMTU to 1500 bytes, and the method detects this. Had there been a still smaller MTU on a link further along the path, and had the router preceding that link supported the method, the lower PMTU would also have been detected.

Scenario 3 considers the case where the router preceding the smallest link (R2) does not support the method, so the method fails to detect the actual PMTU. In this scenario, the lower PMTU would also fail to be detected had PMTUD been used and an ICMPv6 PTB message not been delivered to the sender.

These scenarios are summarized in the table below.


   +-+-----+-----+----+----+----------+-----------------------+
   | |MTU-S|MTU-D| R1 | R2 | Rec PMTU | Note                  |
   +-+-----+-----+----+----+----------+-----------------------+
   |1|9000B|9000B| H  | H  |  9000 B  | Endpoints attempt to  |
   | |     |     |    |    |          | use a 9000 B PMTU.    |
   +-+-----+-----+----+----+----------+-----------------------+
   |2|9000B|1500B| H  | H  |  1500 B  | Endpoints attempt to  |
   | |     |     |    |    |          | use a 1500 B PMTU.    |
   +-+-----+-----+----+----+----------+-----------------------+
   |3|9000B|1500B| H  | -  |  9000 B  | Endpoints attempt to  |
   | |     |     |    |    |          | use a 9000 B PMTU,    |
   | |     |     |    |    |          | but need to implement |
   | |     |     |    |    |          | a method to fall back |
   | |     |     |    |    |          | to a 1500 B PMTU.     |
   +-+-----+-----+----+----+----------+-----------------------+

In the table, "H" means that the router processes the option, "-" means that it does not, and "Rec PMTU" is the Reported PMTU received by the destination.
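The table above can be reproduced with a short simulation. This is a minimal Python sketch, not part of the method: the `reported_pmtu()` helper is hypothetical, each router is modelled as a pair of (processes the option, outgoing link MTU), and only the Min-PMTU field is tracked (no packet loss or PTB messages are modelled).

```python
from typing import Sequence, Tuple

# One entry per router, in path order: (processes_option, outgoing_link_mtu).
Hop = Tuple[bool, int]


def reported_pmtu(sender_link_mtu: int, hops: Sequence[Hop]) -> int:
    """Return the Min-PMTU value that reaches the destination."""
    value = sender_link_mtu  # the source fills Min-PMTU with its link MTU
    for processes_option, link_mtu in hops:
        # A router that does not process the option leaves the value unchanged.
        if processes_option and link_mtu < value:
            value = link_mtu
    return value


# (MTU-S, [R1, R2]) for the three scenarios; R2's outgoing link is MTU-D.
SCENARIOS = {
    1: (9000, [(True, 9000), (True, 9000)]),
    2: (9000, [(True, 9000), (True, 1500)]),
    3: (9000, [(True, 9000), (False, 1500)]),
}

for number, (mtu_s, hops) in SCENARIOS.items():
    print(f"Scenario {number}: Rec PMTU = {reported_pmtu(mtu_s, hops)} B")
# Scenario 1: Rec PMTU = 9000 B
# Scenario 2: Rec PMTU = 1500 B
# Scenario 3: Rec PMTU = 9000 B
```

Scenario 3 shows why a Reported PMTU still needs to be confirmed by the transport before it is used.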

IPv6 as specified in [RFC8200] allows nodes to optionally process Hop-by-Hop Options headers. Specifically, from Section 4 of [RFC8200]:

  • The Hop-by-Hop Options header is not inserted or deleted, but may be examined or processed by any node along a packet's delivery path, until the packet reaches the node (or each of the set of nodes, in the case of multicast) identified in the Destination Address field of the IPv6 header. The Hop-by-Hop Options header, when present, must immediately follow the IPv6 header. Its presence is indicated by the value zero in the Next Header field of the IPv6 header.
  • NOTE: While [RFC2460] required that all nodes must examine and process the Hop-by-Hop Options header, it is now expected that nodes along a packet's delivery path only examine and process the Hop-by-Hop Options header if explicitly configured to do so.

The Hop-by-Hop Option defined in this document is designed to take advantage of this property of how Hop-by-Hop options are processed. Nodes that do not support this Option SHOULD ignore it. As a result, the value returned in the response message might not account for all links along a path.

2. Motivation and Problem Solved

The current state of Path MTU Discovery on the Internet is problematic. The mechanisms defined in [RFC8201] are known not to work well in all environments. Nodes in the middle of the network may not send ICMPv6 Packet Too Big (PTB) messages, or may rate-limit them to the point where they are no longer a useful mechanism.

This results in many transport connections defaulting to 1280 bytes and makes it very difficult to take advantage of links with a larger MTU where they exist. Applications that need to send large packets (e.g., using UDP) are forced to use IPv6 Fragmentation [RFC8200].

Transport encapsulations and network-layer tunnels reduce the PMTU available for a transport to use. For example, Network Virtualization Using Generic Routing Encapsulation (NVGRE) [RFC7637] encapsulates L2 packets in an outer IP header and does not allow IP Fragmentation.

The potential of multi-gigabit Ethernet will not be realized if the packet size is limited to 1280 bytes, because the packet rate needed to fill the link then exceeds the rate at which most nodes can send. For example, reaching wire speed on a 10G Ethernet link requires about 977K packets per second (pps) with 1280-byte packets, compared with about 139K pps with 9000-byte packets, roughly a sevenfold difference.
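These figures follow from dividing the link rate by the packet size in bits. A minimal Python sketch of the arithmetic, assuming a hypothetical `packets_per_second()` helper that, like the figures above, ignores Ethernet framing overhead (preamble, inter-frame gap, and link-layer header):

```python
def packets_per_second(link_bps: float, packet_bytes: int) -> float:
    """Packets per second needed to fill a link with packets of one size.

    Framing overhead is ignored, so real wire-speed rates are slightly lower.
    """
    return link_bps / (packet_bytes * 8)


TEN_GBPS = 10e9
for size in (1280, 9000):
    print(f"{size} B: {packets_per_second(TEN_GBPS, size) / 1000:.0f}K pps")
# 1280 B: 977K pps
# 9000 B: 139K pps
```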

The purpose of this draft is to improve the situation by defining a mechanism that does not rely on nodes in the middle of the network to send ICMPv6 Packet Too Big messages. Instead, it provides the destination host with information on the minimum Path MTU, and the destination host can send this information back to the source host. This is expected to work better than the current mechanisms based on [RFC8201].

3. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

4. Applicability Statements

This Hop-by-Hop Option is intended to be used in environments such as Data Centers and on paths between Data Centers, to allow them to better take advantage of a path that is able to support a large PMTU. For example, it can inform a sender that the path includes links that have an MTU of 9000 bytes. This has many performance advantages compared with the current practice of limiting packets to 1280 bytes.

The design of the option is sufficiently simple that it could be executed on a router's fast path. For this to happen, there will need to be strong demand from the customers of router vendors. This could be the case for connections within and between Data Centers.

The method could also be useful in other environments, including the general Internet.

5. IPv6 Minimum Path MTU Hop-by-Hop Option

The Minimum Path MTU Hop-by-Hop Option has the following format:


 Option    Option    Option
  Type    Data Len   Data
+--------+--------+--------+--------+---------+-------+-+
|BBCTTTTT|00000100|     Min-PMTU    |     Rtn-PMTU    |R|
+--------+--------+--------+--------+---------+-------+-+

  Option Type:

  BB     00   Skip over this option and continue processing.

  C       1   Option data can change en route to the packet's final
              destination.

  TTTTT 10000 Option Type assigned from IANA [IANA-HBH].

  Length:  4  The length of the Option Data field in octets.  Each
              value field in the Option Data supports Path MTU values
              from 0 to 65,535 octets.

  Min-PMTU: n 16-bits.  The minimum PMTU in octets, reflecting the
              smallest link MTU that the packet experienced across
              the path.  This is called the Reported PMTU.  A value
              less than the IPv6 minimum link MTU [RFC8200]
              should be ignored.

  Rtn-PMTU: n 15-bits.  The returned minimum PMTU, carrying the 15
              most significant bits of the latest received Min-PMTU
              field.  The value zero means that no Reported PMTU is
              being returned.

  R        n  1-bit.  R-Flag.   Set by the source to signal that
              the destination should include the received
              Reported PMTU in Rtn-PMTU field.

NOTE: The encoding of the final two octets (Rtn-PMTU and R-Flag) could be implemented by masking the latest received Min-PMTU value with 0xFFFE, which discards the right-most bit, and then performing a logical OR with the R-Flag value of the sender.
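The layout above can be made concrete with a small encoder and decoder. This is a Python sketch, not a reference implementation: the helper names are hypothetical, the type value 0x30 is simply the BBCTTTTT bit pattern shown above (00, 1, 10000), and `hop_by_hop_header()` assumes the option is the only one carried in the Hop-by-Hop Options header.

```python
import struct
from dataclasses import dataclass

OPTION_TYPE = 0b00110000  # BB=00, C=1, TTTTT=10000 (0x30)
OPTION_DATA_LEN = 4       # Min-PMTU (2 octets) + Rtn-PMTU/R-Flag (2 octets)


@dataclass
class MinPmtuOption:
    min_pmtu: int   # Reported PMTU, in octets
    rtn_pmtu: int   # returned PMTU; only its 15 most significant bits are sent
    r_flag: bool    # asks the destination to return the Reported PMTU


def encode_option(opt: MinPmtuOption) -> bytes:
    """Encode Option Type, Opt Data Len, and the 4 octets of Option Data."""
    for value in (opt.min_pmtu, opt.rtn_pmtu):
        if not 0 <= value <= 0xFFFF:
            raise ValueError("PMTU values must fit in 16 bits")
    # Clear the right-most bit of the returned value, then OR in the R-Flag.
    last_two_octets = (opt.rtn_pmtu & 0xFFFE) | int(opt.r_flag)
    return struct.pack("!BBHH", OPTION_TYPE, OPTION_DATA_LEN,
                       opt.min_pmtu, last_two_octets)


def decode_option(data: bytes) -> MinPmtuOption:
    """Decode the first 6 octets of `data` as a Minimum Path MTU option."""
    if len(data) < 6:
        raise ValueError("option is truncated")
    opt_type, opt_len, min_pmtu, last_two_octets = struct.unpack("!BBHH", data[:6])
    if opt_type != OPTION_TYPE or opt_len != OPTION_DATA_LEN:
        raise ValueError("not an IPv6 Minimum Path MTU option")
    return MinPmtuOption(min_pmtu, last_two_octets & 0xFFFE,
                         bool(last_two_octets & 0x0001))


def hop_by_hop_header(next_header: int, option: bytes) -> bytes:
    """Wrap a single option in a Hop-by-Hop Options header [RFC8200].

    Next Header (1 octet) + Hdr Ext Len (1 octet) + option (6 octets) is
    exactly 8 octets, so Hdr Ext Len is 0 and no padding is needed.
    """
    if len(option) != 6:
        raise ValueError("expected a 6-octet option")
    return bytes([next_header, 0]) + option


probe_option = encode_option(MinPmtuOption(min_pmtu=9000, rtn_pmtu=0, r_flag=True))
print(probe_option.hex())                        # 300423280001
print(hop_by_hop_header(6, probe_option).hex())  # 0600300423280001 (6 = TCP)
```

Because the right-most bit is not carried, decoding an option that was encoded with a Rtn-PMTU of 1501 yields 1500.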

6. Router, Host, and Transport Behaviors

6.1. Router Behaviour

Routers that do not support Hop-by-Hop options SHOULD ignore this option and SHOULD forward the packet.

Routers that support Hop-by-Hop Options, but do not recognize this option SHOULD ignore the option and SHOULD forward the packet.

Routers that recognize this option SHOULD compare the Reported PMTU in the Min-PMTU field with the MTU configured for the outgoing link. If the MTU of the outgoing link is less than the Reported PMTU, the router rewrites the Reported PMTU in the option with the smaller value.

The router MUST ignore and not change the Rtn-PMTU field and R-Flag in the option.

Discussion:

  • The design of this Hop-by-Hop Option makes it feasible to implement within the fast path of a router, because the required processing is simple.
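The router processing described in this section amounts to one comparison and an optional two-octet rewrite. A minimal Python sketch, assuming a hypothetical `router_process_option()` that is handed the 6 octets of the option already located within the Hop-by-Hop Options header; it illustrates the logic only and is not a fast-path implementation. No checksum needs updating, because IPv6 has no header checksum and the upper-layer checksum does not cover extension headers.

```python
OPTION_TYPE = 0x30             # BB=00, C=1, TTTTT=10000 (Section 5)
MIN_PMTU_FIELD = slice(2, 4)   # Min-PMTU follows the Type and Opt Data Len octets


def router_process_option(option: bytearray, outgoing_link_mtu: int) -> None:
    """Lower the Min-PMTU field in place if the outgoing link MTU is smaller.

    `option` holds the 6 octets of the option as carried in the packet. The
    Rtn-PMTU and R-Flag octets (4 and 5) are never modified.
    """
    if len(option) < 6 or option[0] != OPTION_TYPE or option[1] != 4:
        raise ValueError("not an IPv6 Minimum Path MTU option")
    reported = int.from_bytes(option[MIN_PMTU_FIELD], "big")
    if outgoing_link_mtu < reported:
        option[MIN_PMTU_FIELD] = outgoing_link_mtu.to_bytes(2, "big")


carried = bytearray.fromhex("300423280001")  # Min-PMTU 9000, R-Flag set
router_process_option(carried, 1500)         # outgoing link MTU of 1500
print(carried.hex())                         # 300405dc0001
```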

6.2. Host Behavior

A source host that supports this option SHOULD create a packet with this Hop-by-Hop Option and fill the Min-PMTU field of the option with the MTU configured for the link over which it will send the packet on the next hop towards the destination.

The source host may request that the destination host return the received minimum MTU value by setting the R-Flag in the option. This will cause the destination host to include a PMTU option in an outgoing packet.

Discussion:

  • This option does not need to be sent in all packets belonging to a flow. A transport protocol (or packetization layer [I-D.ietf-tsvwg-datagram-plpmtud]) can set this option only on specific packets used to test the path.
  • In the case of TCP, the option could be included in packets carrying a SYN segment as part of the connection setup, or could periodically be sent in packets carrying other segments. Including this option in a SYN could increase the probability that the SYN segment is lost when routers on the path drop packets with this option.
  • Including this option in a large packet (e.g., greater than the present PMTU) is not likely to be useful, since the large packet might itself also be dropped by a link along the path with a smaller MTU, preventing the Reported PMTU information from reaching the destination host.
  • The use with datagram transport protocols (e.g., UDP) is harder to characterize, because applications using datagram transports range from very short-lived exchanges (low data-volume applications) to longer (bulk) exchanges of packets between the source and destination hosts [RFC8085].
  • For applications that use Anycast, this option should be included in all packets, because the actual destination can vary due to the nature of Anycast.
  • Simple-exchange protocols (i.e., low data-volume applications [RFC8085]) that only send one or a few packets per transaction could be optimized by assuming that the Path MTU is symmetrical, that is, that the Path MTU is the same in both directions, or at least not smaller in the return path. This optimisation does not hold when the paths are not symmetric.
  • The use of this option with DNS and DNSSEC over UDP ought to work as long as the paths are symmetric: the DNS server will learn the Path MTU from the DNS query messages. If the return Path MTU is smaller, then a large DNSSEC response may be dropped and the known problems with PMTUD will occur. DNS and DNSSEC over transport protocols that can carry the Path MTU should work.

The source host can request the destination host to send a packet carrying the PMTU Option using the R-Flag.

A destination host SHOULD respond to each packet received with the R-Flag set by including the PMTU Option in the next packet that it sends to the source host using the same upper layer protocol instance.

The upper layer protocol MAY generate a packet when the R-Flag is set in the PMTU Option and either:

  • It is the first Reported PMTU value it has received from the source.
  • The Reported PMTU value is lower than previously received.

The R-Flag SHOULD NOT be set when the PMTU Option was sent solely to carry the feedback of a Reported PMTU.

The PMTU Option sent back to the source SHOULD contain the outgoing link MTU in the Min-PMTU field and SHOULD carry the last Received PMTU in the Rtn-PMTU field. If a value is not available, the corresponding field MUST be set to zero.

For a connection-oriented upper layer protocol, this could be implemented by saving the last received Min-PMTU value within the connection context. This value is then used to set the Rtn-PMTU field in all packets belonging to this flow that carry the IPv6 Minimum Path MTU Hop-by-Hop Option.

A connectionless protocol (e.g., based on UDP) requires the application to be updated to cache the Received PMTU value, and to ensure that this value is used to set the Rtn-PMTU field of any PMTU Option that it sends.

NOTE: The Rtn-PMTU value is specific to the instance of the upper layer protocol (i.e., matching the IPv6 Flow Label, the port fields in UDP, or the SPI in IPsec), not to the protocol itself. Network devices can make forwarding decisions that affect the PMTU based on the presence and values of these upper layer fields. The feedback therefore needs to relate to packets whose fields correspond to those of the flow received by the destination host, to ensure that it is provided to the corresponding source host.
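The destination behavior described in this section can be sketched as a per-flow cache. This Python sketch is illustrative only: `DestinationFeedback` and its methods are hypothetical, `flow` stands for whatever key identifies the upper layer protocol instance (for example, addresses, Flow Label, and UDP ports), and ignoring Min-PMTU values below 1280 follows Section 5.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Optional, Tuple

IPV6_MIN_MTU = 1280  # Min-PMTU values below this are ignored (Section 5)


@dataclass
class FlowState:
    last_received_pmtu: Optional[int] = None  # latest Reported PMTU from the source
    feedback_pending: bool = False             # an R-Flag not yet answered


class DestinationFeedback:
    """Hypothetical per-flow cache of Reported PMTU values at a destination."""

    def __init__(self, outgoing_link_mtu: int) -> None:
        self.outgoing_link_mtu = outgoing_link_mtu
        self.flows: Dict[Hashable, FlowState] = {}

    def on_receive(self, flow: Hashable, min_pmtu: int, r_flag: bool) -> bool:
        """Cache a received option.

        Returns True when the R-Flag is set and this is the first or a lower
        Reported PMTU, i.e. when a feedback packet MAY be sent immediately.
        """
        if min_pmtu < IPV6_MIN_MTU:
            return False
        state = self.flows.setdefault(flow, FlowState())
        previous = state.last_received_pmtu
        state.last_received_pmtu = min_pmtu
        if not r_flag:
            return False
        state.feedback_pending = True
        return previous is None or min_pmtu < previous

    def has_pending_feedback(self, flow: Hashable) -> bool:
        """True if the next packet on `flow` should carry the PMTU Option."""
        state = self.flows.get(flow)
        return state is not None and state.feedback_pending

    def build_option(self, flow: Hashable) -> Tuple[int, int, bool]:
        """Return (Min-PMTU, Rtn-PMTU, R-Flag) for the next packet on `flow`."""
        state = self.flows.get(flow)
        if state is None or state.last_received_pmtu is None:
            return self.outgoing_link_mtu, 0, False  # zero: nothing to return
        state.feedback_pending = False
        # This sketch always leaves the R-Flag clear: the option carries feedback only.
        return self.outgoing_link_mtu, state.last_received_pmtu, False


flow = ("2001:db8::1", "2001:db8::2", 0, 17, 40000, 53)  # example flow key
dest = DestinationFeedback(outgoing_link_mtu=9000)
print(dest.on_receive(flow, 1500, r_flag=True))   # True: first value
print(dest.build_option(flow))                    # (9000, 1500, False)
print(dest.on_receive(flow, 1500, r_flag=True))   # False: not lower
```

The cache grows with every new flow key, which is the state-creation concern raised in the Security Considerations; a real implementation would need to bound or age it.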

NOTE: An upper layer protocol that sends packets from the destination host towards the source host less frequently than the destination host receives packets from the source host provides less frequent feedback of the received Min-PMTU value. However, it always needs to send the most recent value.

Discussion:

  • A simple mechanism could send a PMTU Option with the Rtn-PMTU field filled in only the first time this option is received, or when the Received PMTU is reduced. This limits the number of options sent, but makes no provision for retransmission if the PMTU Option fails to reach the sender, or if the sender loses state.
  • The Reported PMTU value could increase or decrease over time. For instance, it would increase when the path changes and the packets are then forwarded over a link with an MTU larger than that of the link previously used.

6.3. Transport Behavior

An upper layer protocol (e.g., transport endpoint) using this option needs to use a method to verify the information provided by this option.

The Received PMTU does not necessarily reflect the actual PMTU between the sender and destination. Care therefore needs to be exercised in using this value at the sender. Specifically:

  • If the Received PMTU value returned by the destination is the same as the initial Reported PMTU value, there could still be a router or layer 2 device on the path that does not support this PMTU. The usable PMTU therefore needs to be confirmed.
  • If the Received PMTU value returned by the destination is smaller than the initial Reported PMTU value, this is an indication that there is at least one router in the path with a smaller MTU. There could still be another router or layer 2 device on the path that does not support this MTU.
  • If the Received PMTU value returned by the destination is larger than the initial Reported PMTU value, this may be a corrupted, delayed or mis-ordered response, and SHOULD be ignored.
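The three cases above can be summarized as a small decision function. This Python sketch is an illustration under stated assumptions: `classify_received_pmtu()` and `Verdict` are hypothetical names, and the masking step assumes the Rtn-PMTU encoding from Section 5, which does not carry the right-most bit. In every case other than IGNORE, the value still needs to be confirmed before use.

```python
from enum import Enum


class Verdict(Enum):
    NO_FEEDBACK = "Rtn-PMTU is zero: no Reported PMTU was returned"
    CONFIRM = "same as sent: an unsupporting device may still be smaller"
    SMALLER = "a smaller MTU is on the path: confirm before use"
    IGNORE = "larger than sent: corrupted, delayed or mis-ordered"


def classify_received_pmtu(received_pmtu: int, initial_reported_pmtu: int) -> Verdict:
    """Compare the returned value with the Min-PMTU the sender put in the option."""
    if received_pmtu == 0:
        return Verdict.NO_FEEDBACK
    # Rtn-PMTU carries only the 15 most significant bits, so compare against
    # the sent value with its right-most bit cleared.
    sent = initial_reported_pmtu & 0xFFFE
    if received_pmtu > sent:
        return Verdict.IGNORE
    if received_pmtu < sent:
        return Verdict.SMALLER
    return Verdict.CONFIRM


print(classify_received_pmtu(1500, 9000).name)  # SMALLER
```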

A sender needs to discriminate between a Received PMTU value returned by the destination in the Rtn-PMTU field of this option and a PTB message received from a router on the path.

A PMTUD or PLPMTUD method could use the Received PMTU value as an initial target size to probe the path. This can significantly decrease the number of probe attempts (and hence time taken) to arrive at a workable PMTU. It has the potential to complete discovery of the correct value in a single Round Trip Time (RTT), even over paths that may have successive links configured with lower MTUs.
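As an illustration of the saving, the following Python sketch tries the Received PMTU first and otherwise falls back to a plain binary search between 1280 octets and the link MTU. The binary search stands in for whatever search a PMTUD or PLPMTUD implementation actually uses, and `search_pmtu()` and the `probe` callback are hypothetical; real probing must also cope with loss that is unrelated to packet size.

```python
from typing import Callable, Optional, Tuple

IPV6_MIN_MTU = 1280  # assumed to work on every IPv6 path


def search_pmtu(probe: Callable[[int], bool], link_mtu: int,
                hint: Optional[int] = None) -> Tuple[int, int]:
    """Return (usable PMTU, number of probes sent).

    `probe(size)` reports whether a probe packet of `size` octets got through.
    `hint` is the Received PMTU, if any; it is tried before the search.
    """
    low, high = IPV6_MIN_MTU, link_mtu  # invariant: `low` is known to work
    probes = 0
    if hint is not None and low < hint <= high:
        probes += 1
        if probe(hint):
            # The Reported PMTU is a minimum over the links that processed the
            # option, so a successful probe at the hint ends this search.
            return hint, probes
        high = hint - 1
    while low < high:
        mid = (low + high + 1) // 2
        probes += 1
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low, probes


ACTUAL_PMTU = 1500


def path(size: int) -> bool:
    """Stand-in for sending a real probe packet over the path."""
    return size <= ACTUAL_PMTU


print(search_pmtu(path, 9000, hint=1500))  # (1500, 1)
print(search_pmtu(path, 9000))             # (1500, 13)
```

With an accurate hint a single probe suffices, whereas the unguided search in this example needs 13 probes. Searching for a larger PMTU later is left to the periodic probing described below.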

Since the method can delay notification of an increase in the actual PMTU, a sender with a link MTU larger than the current PMTU SHOULD periodically probe for a PMTU value that is larger than the Received PMTU value. This specification does not define an interval for the time between probes.

Since the option consumes less capacity than a full probe packet, there may be an advantage in using it to detect a change in the path characteristics.

NOTE: Further details to be included in next version.

NOTE: A future version of the document will consider more the impact of Equal Cost Multipath (ECMP) [RFC6438]. Specifically, whether a Received PMTU value should be maintained by the method for each transport endpoint, or for each network address, and how these are best used by methods such as PLPMTUD or DPLPMTUD.

7. IANA Considerations

No IANA actions are requested in this document.

Earlier IANA assigned and registered a new IPv6 Hop-by-Hop Option type from the "Destination Options and Hop-by-Hop Options" registry [IANA-HBH]. This assignment is shown in Section 5.

8. Security Considerations

The method has no way to protect the destination from an off-path attack that uses this option in packets that do not originate from the source. If the Rtn-PMTU value is used directly to update the PMTU, this attack could cause the receiver to inflate or reduce the size of the reported PMTU. The attack can be mitigated in DPLPMTUD [I-D.ietf-tsvwg-datagram-plpmtud] when the Rtn-PMTU value is used to trigger a rate-limited probe that first confirms that a packet of size Rtn-PMTU can use the current path, before the PMTU is updated.

The method solicits a response from the destination, which should be used to generate a response to the IPv6 host originating the option packet. A malicious attacker could generate a packet to the destination for a previously inactive flow or one that advertises a change in the size of the MTU for an active flow. This would create additional work at the destination, and could induce creation of state when a new flow is created. It could potentially result in additional traffic on the return path to the sender, which could be mitigated by limiting the rate at which responses are generated.
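One common way to limit the rate at which responses are generated is a token bucket. The Python sketch below is an assumption, not something this document specifies: the `ResponseRateLimiter` class, the rate, and the burst size are hypothetical, and no particular limit is defined here. The clock is injectable so that the example is deterministic.

```python
import time
from typing import Callable


class ResponseRateLimiter:
    """Token bucket: up to `burst` responses at once, refilled at `rate_per_s`."""

    def __init__(self, rate_per_s: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate_per_s = rate_per_s
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Return True if a feedback response may be sent now."""
        now = self.clock()
        # Refill in proportion to the elapsed time, up to the bucket size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate_per_s)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


fake_now = [0.0]  # simulated clock, in seconds
limiter = ResponseRateLimiter(rate_per_s=10, burst=2, clock=lambda: fake_now[0])
print([limiter.allow() for _ in range(3)])  # [True, True, False]
fake_now[0] = 0.5
print(limiter.allow())                      # True: the bucket has refilled
```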

TBD

9. Acknowledgments

A somewhat similar mechanism was proposed for IPv4 in 1988 in [RFC1063] by Jeff Mogul, C. Kent, Craig Partridge, and Keith McCloghrie. It was obsoleted in 1990 by [RFC1191], the currently deployed approach to Path MTU Discovery.

Helpful comments were received from Tom Herbert, Tom Jones, Fred Templin, Ole Troan, [Your name here], and other members of the 6MAN working group.

10. Change log [RFC Editor: Please remove]

draft-ietf-6man-mtu-option-02, 2020-March-9

draft-ietf-6man-mtu-option-01, 2019-September-13

  • Changes to show IANA assigned code point.
  • Editorial changes to make text and terminology more consistent.
  • Added a reference to RFC8200 in Section 2 and a reference to RFC6438 in Section 6.3.

draft-ietf-6man-mtu-option-00, 2019-August-9

  • First 6man w.g. draft version.
  • Changes to request IANA allocation of code point.
  • Editorial changes.

draft-hinden-6man-mtu-option-02, 2019-July-5

  • Changed option format to also include the Returned MTU value and Return flag and made related text changes in Section 6.2 to describe this behaviour.
  • ICMP Packet Too Big messages are no longer used for feedback to the source host.
  • Added to Acknowledgements Section that a similar mechanism was proposed for IPv4 in 1988 in [RFC1063].
  • Editorial changes.

draft-hinden-6man-mtu-option-01, 2019-March-05

  • Changed requested status from Standards Track to Experimental to allow use of experimental option type (11110) to allow for experimentation. Removed request for IANA Option assignment.
  • Added Section 2 "Motivation and Problem Solved" section to better describe what the purpose of this document is.
  • Added Appendix A describing planned experiments and how the results will be measured.
  • Editorial changes.

draft-hinden-6man-mtu-option-00, 2018-Oct-16

  • Initial draft.

11. References

11.1. Normative References

[IANA-HBH]
IANA, "Destination Options and Hop-by-Hop Options", <https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8200]
Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", STD 86, RFC 8200, DOI 10.17487/RFC8200, <https://www.rfc-editor.org/info/rfc8200>.
[RFC8201]
McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., "Path MTU Discovery for IP version 6", STD 87, RFC 8201, DOI 10.17487/RFC8201, <https://www.rfc-editor.org/info/rfc8201>.

11.2. Informative References

[I-D.ietf-tsvwg-datagram-plpmtud]
Fairhurst, G., Jones, T., Tuexen, M., Ruengeler, I., and T. Voelker, "Packetization Layer Path MTU Discovery for Datagram Transports", Work in Progress, Internet-Draft, draft-ietf-tsvwg-datagram-plpmtud-16, <https://tools.ietf.org/html/draft-ietf-tsvwg-datagram-plpmtud-16>.
[RFC1063]
Mogul, J., Kent, C., Partridge, C., and K. McCloghrie, "IP MTU discovery options", RFC 1063, DOI 10.17487/RFC1063, <https://www.rfc-editor.org/info/rfc1063>.
[RFC1191]
Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, DOI 10.17487/RFC1191, <https://www.rfc-editor.org/info/rfc1191>.
[RFC2460]
Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460, <https://www.rfc-editor.org/info/rfc2460>.
[RFC6438]
Carpenter, B. and S. Amante, "Using the IPv6 Flow Label for Equal Cost Multipath Routing and Link Aggregation in Tunnels", RFC 6438, DOI 10.17487/RFC6438, <https://www.rfc-editor.org/info/rfc6438>.
[RFC7637]
Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network Virtualization Using Generic Routing Encapsulation", RFC 7637, DOI 10.17487/RFC7637, <https://www.rfc-editor.org/info/rfc7637>.
[RFC8085]
Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, <https://www.rfc-editor.org/info/rfc8085>.

Appendix A. Planned Experiments

TBD

This section will describe a set of experiments planned for the use of the option defined in this document. There are many aspects of the design that require experimental data or experience to evaluate this experimental specification.

This includes experiments to understand the pathology of packets sent with the specified option to determine the likelihood that they are lost within specific types of network segment.

This includes consideration of the cost and alternatives for providing the feedback required by the mechanism and how to effectively limit the rate of transmission.

This includes consideration of the potential for integration in frameworks such as that offered by DPLPMTUD.

There are also security-related topics to be understood as described in the Security Considerations (Section 8).

Authors' Addresses

Robert M. Hinden
Check Point Software
959 Skyway Road
San Carlos, CA 94070
United States of America
Godred Fairhurst
University of Aberdeen
School of Engineering
Fraser Noble Building
Aberdeen
AB24 3UE
United Kingdom