Generic UDP Encapsulation
draft-ietf-intarea-gue-02
The information below is for an old version of the document.
Document | Type |
This is an older version of an Internet-Draft whose latest revision state is "Expired".
|
|
---|---|---|---|
Authors | Tom Herbert , Lucy Yong , Osama Zia | ||
Last updated | 2017-04-26 (Latest revision 2017-03-13) | ||
Replaces | draft-ietf-nvo3-gue | ||
RFC stream | Internet Engineering Task Force (IETF) | ||
Formats | |||
Reviews |
TSVART Early review
(of
-07)
by David Black
On the Right Track
INTDIR Early review
(of
-06)
by Charles Perkins
Almost ready
|
||
Additional resources | Mailing list discussion | ||
Stream | WG state | WG Document | |
Document shepherd | (None) | ||
IESG | IESG state | I-D Exists | |
Consensus boilerplate | Unknown | ||
Telechat date | (None) | ||
Responsible AD | (None) | ||
Send notices to | (None) |
draft-ietf-intarea-gue-02
the requirements for using zero IPv6 UDP checksums in [RFC6935] and [RFC6936] MUST be met. When a decapsulator receives a packet, the UDP checksum field MUST be processed. If the UDP checksum is non-zero, the decapsulator MUST verify the checksum before accepting the packet. By default a decapsulator MUST only accept UDP packets with a zero checksum if the GUE header checksum is used and is verified. If verification of a non-zero checksum fails, a decapsulator lacks the capability to verify a non-zero checksum, or a packet with a zero-checksum and no GUE header checksum was received, the packet MUST be dropped. 5.8. MTU and fragmentation Standard conventions for handling of MTU (Maximum Transmission Unit) and fragmentation in conjunction with networking tunnels (encapsulation of layer 2 or layer 3 packets) SHOULD be followed. Details are described in MTU and Fragmentation Issues with In-the- Network Tunneling [RFC4459]. If a packet is fragmented before encapsulation in GUE, all the related fragments MUST be encapsulated using the same UDP source port. An operator SHOULD set MTU to account for encapsulation overhead and reduce the likelihood of fragmentation. Alternative to IP fragmentation, the GUE fragmentation extension can be used. GUE fragmentation is described in [GUEEXTENS]. 5.9. Congestion control Per requirements of [RFC5405], if the IP traffic encapsulated with GUE implements proper congestion control no additional mechanisms should be required. In the case that the encapsulated traffic does not implement any or sufficient control, or it is not known whether a transmitter will consistently implement proper congestion control, then congestion control at the encapsulation layer MUST be provided per [RFC5405]. Note that this case applies to a significant use case in network virtualization in which guests run third party networking stacks that cannot be implicitly trusted to implement conformant congestion control. Out of band mechanisms such as rate limiting, Managed Circuit Breaker [CIRCBRK], or traffic isolation MAY be used to provide rudimentary congestion control. For finer-grained congestion control that allows alternate congestion control algorithms, reaction time within an RTT, and interaction with ECN, in-band mechanisms might be Herbert, Yong, Zia Expires October, 2017 [Page 21] Internet Draft Generic UDP Encapsulation April 26, 2017 warranted. 5.10. Multicast GUE packets can be multicast to decapsulators using a multicast destination address in the encapsulating IP headers. Each receiving host will decapsulate the packet independently following normal decapsulator operations. The receiving decapsulators need to agree on the same set of GUE parameters and properties; how such an agreement is reached is outside the scope of this document. GUE allows encapsulation of unicast, broadcast, or multicast traffic. Flow entropy (the value in the UDP source port) can be generated from the header of encapsulated unicast or broadcast/multicast packets at an encapsulator. The mapping mechanism between the encapsulated multicast traffic and the multicast capability in the IP network is transparent and independent of the encapsulation and is otherwise outside the scope of this document. 5.11. Flow entropy for ECMP 5.11.1. Flow classification A major objective of using GUE is that a network device can perform flow classification corresponding to the flow of the inner encapsulated packet based on the contents in the outer headers. Hardware devices commonly perform hash computations on packet headers to classify packets into flows or flow buckets. Flow classification is done to support load balancing of flows across a set of networking resources. Examples of such load balancing techniques are Equal Cost Multipath routing (ECMP), port selection in Link Aggregation, and NIC device Receive Side Scaling (RSS). Hashes are usually either a three-tuple hash of IP protocol, source address, and destination address; or a five-tuple hash consisting of IP protocol, source address, destination address, source port, and destination port. Typically, networking hardware will compute five- tuple hashes for TCP and UDP, but only three-tuple hashes for other IP protocols. Since the five-tuple hash provides more granularity, load balancing can be finer-grained with better distribution. When a packet is encapsulated with GUE and connection semantics are not applied, the source port in the outer UDP packet is set to a flow entropy value that corresponds to the flow of the inner packet. When a device computes a five-tuple hash on the outer UDP/IP header of a GUE packet, the resultant value classifies the packet per its inner flow. Herbert, Yong, Zia Expires October, 2017 [Page 22] Internet Draft Generic UDP Encapsulation April 26, 2017 Examples of deriving flow entropy for encapsulation are: o If the encapsulated packet is a layer 4 packet, TCP/IPv4 for instance, the flow entropy could be based on the canonical five- tuple hash of the inner packet. o If the encapsulated packet is an AH transport mode packet with TCP as next header, the flow entropy could be a hash over a three-tuple: TCP protocol and TCP ports of the encapsulated packet. o If a node is encrypting a packet using ESP tunnel mode and GUE encapsulation, the flow entropy could be based on the contents of the clear-text packet. For instance, a canonical five-tuple hash for a TCP/IP packet could be used. [RFC6438] discusses methods to compute and set flow entropy value for IPv6 flow labels. Such methods can also be used to create flow entropy values for GUE. 5.11.2. Flow entropy properties The flow entropy is the value set in the UDP source port of a GUE packet. Flow entropy in the UDP source port SHOULD adhere to the following properties: o The value set in the source port is within the ephemeral port range (49152 to 65535 [RFC6335]). Since the high order two bits of the port are set to one, this provides fourteen bits of entropy for the value. o The flow entropy has a uniform distribution across encapsulated flows. o An encapsulator MAY occasionally change the flow entropy used for an inner flow per its discretion (for security, route selection, etc). To avoid thrashing or flapping the value, the flow entropy used for a flow SHOULD NOT change more than once every thirty seconds (or a configurable value). o Decapsulators, or any networking devices, SHOULD NOT attempt to interpret flow entropy as anything more than an opaque value. Neither should they attempt to reproduce the hash calculation used by an encapasulator in creating a flow entropy value. They MAY use the value to match further receive packets for steering decisions, but MUST NOT assume that the hash uniquely or permanently identifies a flow. Herbert, Yong, Zia Expires October, 2017 [Page 23] Internet Draft Generic UDP Encapsulation April 26, 2017 o Input to the flow entropy calculation is not restricted to ports and addresses; input could include flow label from an IPv6 packet, SPI from an ESP packet, or other flow related state in the encapsulator that is not necessarily conveyed in the packet. o The assignment function for flow entropy SHOULD be randomly seeded to mitigate denial of service attacks. The seed SHOULD be changed periodically. 5.12 Negotiation of acceptable flags and extension fields An encapsulator and decapsulator need to achieve agreement about GUE parameters will be used in communications. Parameters include GUE version, flags and extension fields that can be used, security algorithms and keys, supported protocols and control messages, etc. This document proposes different general methods to accomplish this, however the details of implementing these are considered out of scope. General methods for this are: o Configuration. The parameters used for a tunnel are configured at each endpoint. o Negotiation. A tunnel negotiation can be performed. This could be accomplished in-band of GUE using control messages or private data. o Via a control plane. Parameters for communicating with a tunnel endpoint can be set in a control plane protocol (such as that needed for nvo3). o Via security negotiation. Use of security typically implies a key exchange between endpoints. Other GUE parameters may be conveyed as part of that process. 6. Motivation for GUE This section presents the motivation for GUE with respect to other encapsulation methods. 6.1. Benefits of GUE * GUE is a generic encapsulation protocol. GUE can encapsulate protocols that are represented by an IP protocol number. This includes layer 2, layer 3, and layer 4 protocols. * GUE is an extensible encapsulation protocol. Standardized Herbert, Yong, Zia Expires October, 2017 [Page 24] Internet Draft Generic UDP Encapsulation April 26, 2017 optional data such as security, virtual networking identifiers, fragmentation are being defined. * For extensilbity, GUE uses flag fields as opposed to TLVs as some other encapsulation protocols do. Flag fields are strictly ordered, allow random access, and are efficient in use of header space. * GUE allows private data to be sent as part of the encapsulation. This permits experimentation or customization in deployment. * GUE allows sending of control messages such as OAM using the same GUE header format (for routing purposes) as normal data messages. * GUE maximizes deliverability of non-UDP and non-TCP protocols. * GUE provides a means for exposing per flow entropy for ECMP for atypical protocols such as SCTP, DCCP, ESP, etc. 6.2 Comparison of GUE to other encapsulations A number of different encapsulation techniques have been proposed for the encapsulation of one protocol over another. EtherIP [RFC3378] provides layer 2 tunneling of Ethernet frames over IP. GRE [RFC2784], MPLS [RFC4023], and L2TP [RFC2661] provide methods for tunneling layer 2 and layer 3 packets over IP. NVGRE [RFC7637] and VXLAN [RFC7348] are proposals for encapsulation of layer 2 packets for network virtualization. IPIP [RFC2003] and Generic packet tunneling in IPv6 [RFC2473] provide methods for tunneling IP packets over IP. Several proposals exist for encapsulating packets over UDP including ESP over UDP [RFC3948], TCP directly over UDP [TCPUDP], VXLAN [RFC7348], LISP [RFC6830] which encapsulates layer 3 packets, MPLS/UDP [7510], and Generic UDP Encapsulation for IP Tunneling (GRE over UDP)[RFC8086]. Generic UDP tunneling [GUT] is a proposal similar to GUE in that it aims to tunnel packets of IP protocols over UDP. GUE has the following discriminating features: o UDP encapsulation leverages specialized network device processing for efficient transport. The semantics for using the UDP source port for flow entropy as input to ECMP are defined in section 5.11. o GUE permits encapsulation of arbitrary IP protocols, which includes layer 2 3, and 4 protocols. Herbert, Yong, Zia Expires October, 2017 [Page 25] Internet Draft Generic UDP Encapsulation April 26, 2017 o Multiple protocols can be multiplexed over a single UDP port number. This is in contrast to techniques to encapsulate protocols over UDP using a protocol specific port number (such as ESP/UDP, GRE/UDP, SCTP/UDP). GUE provides a uniform and extensible mechanism for encapsulating all IP protocols in UDP with minimal overhead (four bytes of additional header). o GUE is extensible. New flags and extension fields can be defined. o The GUE header includes a header length field. This allows a network node to inspect an encapsulated packet without needing to parse the full encapsulation header. o Private data in the encapsulation header allows local customization and experimentation while being compatible with processing in network nodes (routers and middleboxes). o GUE includes both data messages (encapsulation of packets) and control messages (such as OAM). o The flags-field model facilitates efficient implementation of extensibility in hardware. For instance a TCAM can be use to parse a known set of N flags where the number of entries in the TCAM is 2^N. For comparison, the number of TCAM entries needed to parse a set of N arbitrarily ordered TLVS is approximately e*N!. 7. Security Considerations There are two important considerations of security with respect to GUE. o Authentication and integrity of the GUE header. o Authentication, integrity, and confidentiality of the GUE payload. GUE security is provided by extensions for security defined in [GUEEXTENS]. These extensions include methods to authenticate the GUE header and encrypt the GUE payload. The GUE header can be authenticated using a security extension for an HMAC. Securing the GUE payload can be accomplished use of the GUE Payload Transform. This extension can be used to perform DTLS in the payload of a GUE packet to encrypt the payload. Herbert, Yong, Zia Expires October, 2017 [Page 26] Internet Draft Generic UDP Encapsulation April 26, 2017 A hash function for computing flow entropy (section 5.11) SHOULD be randomly seeded to mitigate some possible denial service attacks. 8. IANA Consideration 8.1. UDP source port A user UDP port number assignment for GUE has been assigned: Service Name: gue Transport Protocol(s): UDP Assignee: Tom Herbert <tom@herbertland.com> Contact: Tom Herbert <tom@herbertland.com> Description: Generic UDP Encapsulation Reference: draft-herbert-gue Port Number: 6080 Service Code: N/A Known Unauthorized Uses: N/A Assignment Notes: N/A Herbert, Yong, Zia Expires October, 2017 [Page 27] Internet Draft Generic UDP Encapsulation April 26, 2017 8.2. GUE version number IANA is requested to set up a registry for the GUE version number. The GUE version number is 2 bits containing four possible values. This document defines version 0 and 1. New values are assigned in accordance with RFC Required policy [RFC5226]. +----------------+-------------+---------------+ | Version number | Description | Reference | +----------------+-------------+---------------+ | 0 | Version 0 | This document | | | | | | 1 | Version 1 | This document | | | | | | 2..3 | Unassigned | | +----------------+-------------+---------------+ 8.3. Control types IANA is requested to set up a registry for the GUE control types. Control types are 8 bit values. New values for control types 1-127 are assigned in accordance with RFC Required policy [RFC5226]. +----------------+------------------+---------------+ | Control type | Description | Reference | +----------------+------------------+---------------+ | 0 | Need further | This document | | | interpretation | | | | | | | 1..127 | Unassigned | | | | | | | 128..255 | User defined | This document | +----------------+------------------+---------------+ 8.4. Flag-fields IANA is requested to create a "GUE flag-fields" registry to allocate flags and extension fields used with GUE. This shall be a registry of bit assignments for flags, length of extension fields for corresponding flags, and descriptive strings. There are sixteen bits for primary GUE header flags (bit number 0-15). New values are assigned in accordance with RFC Required policy [RFC5226]. Herbert, Yong, Zia Expires October, 2017 [Page 28] Internet Draft Generic UDP Encapsulation April 26, 2017 +-------------+--------------+-------------+--------------------+ | Flags bits | Field size | Description | Reference | +-------------+--------------+-------------+--------------------+ | Bit 0 | 4 bytes | VNID | [GUE4NVO3] | | | | | | | Bit 1..3 | 001->8 bytes | Security | [GUEEXTENS] | | | 010->16 bytes| | | | | 011->32 bytes| | | | | | | | | Bit 4 | 8 bytes | Fragmen- | [GUEEXTENS] | | | | tation | | | | | | | | Bit 5 | 4 bytes | Payload | [GUEEXTENS] | | | | transform | | | | | | | | Bit 6 | 4 bytes | Remote | [GUEEXTENS] | | | | checksum | | | | | offload | | | | | | | | Bit 7 | 4 bytes | Checksum | [GUEEXTENS] | | | | | | | Bit 8..15 | | Unassigned | | +-------------+--------------+-------------+--------------------+ New flags are to be allocated from high to low order bit contiguously without holes. 9. Acknowledgements The authors would like to thank David Liu, Erik Nordmark, Fred Templin, Adrian Farrel, Bob Briscoe, and Murray Kucherawy for valuable input on this draft. 10. References 10.1. Normative References [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, DOI 10.17487/RFC0768, August 1980, <http://www.rfc- editor.org/info/rfc768>. [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, DOI 10.17487/RFC1122, October 1989, <http://www.rfc- editor.org/info/rfc1122>. Herbert, Yong, Zia Expires October, 2017 [Page 29] Internet Draft Generic UDP Encapsulation April 26, 2017 [RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", RFC 2434, DOI 10.17487/RFC2434, October 1998, <http://www.rfc- editor.org/info/rfc2434>. [RFC2983] Black, D., "Differentiated Services and Tunnels", RFC 2983, DOI 10.17487/RFC2983, October 2000, <http://www.rfc- editor.org/info/rfc2983>. [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion Notification", RFC 6040, DOI 10.17487/RFC6040, November 2010, <http://www.rfc-editor.org/info/rfc6040>. [RFC6935] Eubanks, M., Chimento, P., and M. Westerlund, "IPv6 and UDP Checksums for Tunneled Packets", RFC 6935, DOI 10.17487/RFC6935, April 2013, <http://www.rfc- editor.org/info/rfc6935>. [RFC6936] Fairhurst, G. and M. Westerlund, "Applicability Statement for the Use of IPv6 UDP Datagrams with Zero Checksums", RFC 6936, DOI 10.17487/RFC6936, April 2013, <http://www.rfc-editor.org/info/rfc6936>. [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the- Network Tunneling", RFC 4459, DOI 10.17487/RFC4459, April 2006, <http://www.rfc-editor.org/info/rfc4459>. 10.2. Informative References [RFC3828] Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., Ed., and G. Fairhurst, Ed., "The Lightweight User Datagram Protocol (UDP-Lite)", RFC 3828, July 2004, <http://www.rfc-editor.org/info/rfc3828>. [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, L., Sridhar, T., Bursell, M., and C. Wright, "Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks", RFC 7348, August 2014, <http://www.rfc- editor.org/info/rfc7348>. [RFC7605] Touch, J., "Recommendations on Using Assigned Transport Port Numbers", BCP 165, RFC 7605, DOI 10.17487/RFC7605, August 2015, <http://www.rfc-editor.org/info/rfc7605>. [RFC7637] Garg, P., Ed., and Y. Wang, Ed., "NVGRE: Network Virtualization Using Generic Routing Encapsulation", RFC 7637, DOI 10.17487/RFC7637, September 2015, Herbert, Yong, Zia Expires October, 2017 [Page 30] Internet Draft Generic UDP Encapsulation April 26, 2017 <http://www.rfc-editor.org/info/rfc7637>. [RFC8086] Yong, L., Ed., Crabbe, E., Xu, X., and T. Herbert, "GRE- in-UDP Encapsulation", RFC 8086, DOI 10.17487/RFC8086, March 2017, <http://www.rfc-editor.org/info/rfc8086>. [RFC7510] Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black, "Encapsulating MPLS in UDP", RFC 7510, DOI 10.17487/RFC7510, April 2015, <http://www.rfc- editor.org/info/rfc7510>. [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram Congestion Control Protocol (DCCP)", RFC 4340, DOI 10.17487/RFC4340, March 2006, <http://www.rfc- editor.org/info/rfc4340>. [RFC4787] Audet, F., Ed., and C. Jennings, "Network Address Translation (NAT) Behavioral Requirements for Unicast UDP", BCP 127, RFC 4787, DOI 10.17487/RFC4787, January 2007, <http://www.rfc-editor.org/info/rfc4787>. [RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing, "Session Traversal Utilities for NAT (STUN)", RFC 5389, DOI 10.17487/RFC5389, October 2008, <http://www.rfc- editor.org/info/rfc5389>. [RFC5285] Rosenberg, J., "Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols", RFC 5245, DOI 10.17487/RFC5245, April 2010, <http://www.rfc- editor.org/info/rfc5245>. [RFC5405] Eggert, L. and G. Fairhurst, "Unicast UDP Usage Guidelines for Application Designers", BCP 145, RFC 5405, DOI 10.17487/RFC5405, November 2008, <http://www.rfc- editor.org/info/rfc5405>. [RFC6438] Carpenter, B. and S. Amante, "Using the IPv6 Flow Label for Equal Cost Multipath Routing and Link Aggregation in Tunnels", RFC 6438, DOI 10.17487/RFC6438, November 2011, <http://www.rfc-editor.org/info/rfc6438>. [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, DOI 10.17487/RFC2003, October 1996, <http://www.rfc- editor.org/info/rfc2003>. [RFC3948] Huttunen, A., Swander, B., Volpe, V., DiBurro, L., and M. Stenberg, "UDP Encapsulation of IPsec ESP Packets", RFC Herbert, Yong, Zia Expires October, 2017 [Page 31] Internet Draft Generic UDP Encapsulation April 26, 2017 3948, DOI 10.17487/RFC3948, January 2005, <http://www.rfc- editor.org/info/rfc3948>. [RFC6830] Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The Locator/ID Separation Protocol (LISP)", RFC 6830, DOI 10.17487/RFC6830, January 2013, <http://www.rfc- editor.org/info/rfc6830>. [RFC3378] Housley, R. and S. Hollenbeck, "EtherIP: Tunneling Ethernet Frames in IP Datagrams", RFC 3378, DOI 10.17487/RFC3378, September 2002, <http://www.rfc- editor.org/info/rfc3378>. [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, DOI 10.17487/RFC2784, March 2000, <http://www.rfc- editor.org/info/rfc2784>. [RFC4023] Worster, T., Rekhter, Y., and E. Rosen, Ed., "Encapsulating MPLS in IP or Generic Routing Encapsulation (GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005, <http://www.rfc-editor.org/info/rfc4023>. [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", RFC 2661, DOI 10.17487/RFC2661, August 1999, <http://www.rfc-editor.org/info/rfc2661>. [GUEEXTENS] Herbert, T., Yong, L., and Templin, F., "Extensions for Generic UDP Encapsulation" draft-herbert-gue-extensions-00 [GUE4NVO3] Yong, L., Herbert, T., Zia, O., "Generic UDP Encapsulation (GUE) for Network Virtualization Overlay" draft-hy-nvo3-gue-4-nvo-03 [GUESEC] Yong, L., Herbert, T., "Generic UDP Encapsulation (GUE) for Secure Transport" draft-hy-gue-4-secure-transport-03 [TCPUDP] Chesire, S., Graessley, J., and McGuire, R., "Encapsulation of TCP and other Transport Protocols over UDP" draft-cheshire-tcp-over-udp-00 [TOU] Herbert, T., "Transport layer protocols over UDP" draft- herbert-transports-over-udp-00 [GUT] Manner, J., Varia, N., and Briscoe, B., "Generic UDP Tunnelling (GUT) draft-manner-tsvwg-gut-02.txt" Herbert, Yong, Zia Expires October, 2017 [Page 32] Internet Draft Generic UDP Encapsulation April 26, 2017 [CIRCBRK] Fairhurst, G., "Network Transport Circuit Breakers", draft-ietf-tsvwg-circuit-breaker-15 [LCO] Cree, E., https://www.kernel.org/doc/Documentation/ networking/checksum-offloads.txt Appendix A: NIC processing for GUE This appendix provides some guidelines for Network Interface Cards (NICs) to implement common offloads and accelerations to support GUE. Note that most of this discussion is generally applicable to other methods of UDP based encapsulation. A.1. Receive multi-queue Contemporary NICs support multiple receive descriptor queues (multi- queue). Multi-queue enables load balancing of network processing for a NIC across multiple CPUs. On packet reception, a NIC selects the appropriate queue for host processing. Receive Side Scaling is a common method which uses the flow hash for a packet to index an indirection table where each entry stores a queue number. Flow Director and Accelerated Receive Flow Steering (aRFS) allow a host to program the queue that is used for a given flow which is identified either by an explicit five-tuple or by the flow's hash. GUE encapsulation is compatible with multi-queue NICs that support five-tuple hash calculation for UDP/IP packets as input to RSS. The flow entropy in the UDP source port ensures classification of the encapsulated flow even in the case that the outer source and destination addresses are the same for all flows (e.g. all flows are going over a single tunnel). By default, UDP RSS support is often disabled in NICs to avoid out- of-order reception that can occur when UDP packets are fragmented. As discussed above, fragmentation of GUE packets is be mostly avoided by fragmenting packets before entering a tunnel, GUE fragmentation, path MTU discovery in higher layer protocols, or operator adjusting MTUs. Other UDP traffic might not implement such procedures to avoid fragmentation, so enabling UDP RSS support in the NIC might be a considered tradeoff during configuration. A.2. Checksum offload Many NICs provide capabilities to calculate standard ones complement payload checksum for packets in transmit or receive. When using GUE encapsulation, there are at least two checksums that are of interest: the encapsulated packet's transport checksum, and the UDP checksum in the outer header. Herbert, Yong, Zia Expires October, 2017 [Page 33] Internet Draft Generic UDP Encapsulation April 26, 2017 A.2.1. Transmit checksum offload NICs can provide a protocol agnostic method to offload transmit checksum (NETIF_F_HW_CSUM in Linux parlance) that can be used with GUE. In this method, the host provides checksum related parameters in a transmit descriptor for a packet. These parameters include the starting offset of data to checksum, the length of data to checksum, and the offset in the packet where the computed checksum is to be written. The host initializes the checksum field to pseudo header checksum. In the case of GUE, the checksum for an encapsulated transport layer packet, a TCP packet for instance, can be offloaded by setting the appropriate checksum parameters. NICs typically can offload only one transmit checksum per packet, so simultaneously offloading both an inner transport packet's checksum and the outer UDP checksum is likely not possible. If an encapsulator is co-resident with a host, then checksum offload may be performed using remote checksum offload (described in [GUEEXTENS]). Remote checksum offload relies on NIC offload of the simple UDP/IP checksum which is commonly supported even in legacy devices. In remote checksum offload, the outer UDP checksum is set and the GUE header includes an option indicating the start and offset of the inner "offloaded" checksum. The inner checksum is initialized to the pseudo header checksum. When a decapsulator receives a GUE packet with the remote checksum offload option, it completes the offload operation by determining the packet checksum from the indicated start point to the end of the packet, and then adds this into the checksum field at the offset given in the option. Computing the checksum from the start to end of packet is efficient if checksum-complete is provided on the receiver. Another alternative when an encapsulator is co-resident with a host is to perform Local Checksum Offload [LCO]. In this method, the inner transport layer checksum is offloaded and the outer UDP checksum can be deduced based on the fact that the portion of the packet covered by the inner transport checksum will sum to zero (or at least the bit wise "not" of the inner pseudo header). A.2.2. Receive checksum offload GUE is compatible with NICs that perform a protocol agnostic receive checksum (CHECKSUM_COMPLETE in Linux parlance). In this technique, a NIC computes a ones complement checksum over all (or some predefined portion) of a packet. The computed value is provided to the host stack in the packet's receive descriptor. The host driver can use Herbert, Yong, Zia Expires October, 2017 [Page 34] Internet Draft Generic UDP Encapsulation April 26, 2017 this checksum to "patch up" and validate any inner packet transport checksum, as well as the outer UDP checksum if it is non-zero. Many legacy NICs don't provide checksum-complete but instead provide an indication that a checksum has been verified (CHECKSUM_UNNECESSARY in Linux). Usually, such validation is only done for simple TCP/IP or UDP/IP packets. If a NIC indicates that a UDP checksum is valid, the checksum-complete value for the UDP packet is the "not" of the pseudo header checksum. In this way, checksum-unnecessary can be converted to checksum-complete. So, if the NIC provides checksum-unnecessary for the outer UDP header in an encapsulation, checksum conversion can be done so that the checksum-complete value is derived and can be used by the stack to validate checksums in the encapsulated packet. A.3. Transmit Segmentation Offload Transmit Segmentation Offload (TSO) is a NIC feature where a host provides a large (>MTU size) TCP packet to the NIC, which in turn splits the packet into separate segments and transmits each one. This is useful to reduce CPU load on the host. The process of TSO can be generalized as: - Split the TCP payload into segments which allow packets with size less than or equal to MTU. - For each created segment: 1. Replicate the TCP header and all preceding headers of the original packet. 2. Set payload length fields in any headers to reflect the length of the segment. 3. Set TCP sequence number to correctly reflect the offset of the TCP data in the stream. 4. Recompute and set any checksums that either cover the payload of the packet or cover header which was changed by setting a payload length. Following this general process, TSO can be extended to support TCP encapsulation in GUE. For each segment the Ethernet, outer IP, UDP header, GUE header, inner IP header (if tunneling), and TCP headers are replicated. Any packet length header fields need to be set properly (including the length in the outer UDP header), and checksums need to be set correctly (including the outer UDP checksum if being used). Herbert, Yong, Zia Expires October, 2017 [Page 35] Internet Draft Generic UDP Encapsulation April 26, 2017 To facilitate TSO with GUE, it is recommended that extension fields do not contain values that need to be updated on a per segment basis. For example, extension fields should not include checksums, lengths, or sequence numbers that refer to the payload. If the GUE header does not contain such fields then the TSO engine only needs to copy the bits in the GUE header when creating each segment and does not need to parse the GUE header. A.4. Large Receive Offload Large Receive Offload (LRO) is a NIC feature where packets of a TCP connection are reassembled, or coalesced, in the NIC and delivered to the host as one large packet. This feature can reduce CPU utilization in the host. LRO requires significant protocol awareness to be implemented correctly and is difficult to generalize. Packets in the same flow need to be unambiguously identified. In the presence of tunnels or network virtualization, this may require more than a five-tuple match (for instance packets for flows in two different virtual networks may have identical five-tuples). Additionally, a NIC needs to perform validation over packets that are being coalesced, and needs to fabricate a single meaningful header from all the coalesced packets. The conservative approach to supporting LRO for GUE would be to assign packets to the same flow only if they have identical five- tuple and were encapsulated the same way. That is the outer IP addresses, the outer UDP ports, GUE protocol, GUE flags and fields, and inner five tuple are all identical. Appendix B: Implementation considerations This appendix is informational and does not constitute a normative part of this document. B.1. Priveleged ports Using the source port to contain a flow entropy value disallows the security method of a receiver enforcing that the source port be a privileged port. Privileged ports are defined by some operating systems to restrict source port binding. Unix, for instance, considered port number less than 1024 to be privileged. Enforcing that packets are sent from a privileged port is widely considered an inadequate security mechanism and has been mostly deprecated. To approximate this behavior, an implementation could restrict a user from sending a packet destined to the GUE port without proper credentials. Herbert, Yong, Zia Expires October, 2017 [Page 36] Internet Draft Generic UDP Encapsulation April 26, 2017 B.2. Setting flow entropy as a route selector An encapsulator generating flow entropy in the UDP source port could modulate the value to perform a type of multipath source routing. Assuming that networking switches perform ECMP based on the flow hash, a sender can affect the path by altering the flow entropy. For instance, a host can store a flow hash in its PCB for an inner flow, and might alter the value upon detecting that packets are traversing a lossy path. Changing the flow entropy for a flow SHOULD be subject to hysteresis (at most once every thirty seconds) to limit the number of out of order packets. B.3. Hardware protocol implementation considerations Low level data path protocol, such is GUE, are often supported in high speed network device hardware. Variable length header (VLH) protocols like GUE are often considered difficult to efficiently implement in hardware. In order to retain the important characteristics of an extensible and robust protocol, hardware vendors may practice "constrained flexibility". In this model, only certain combinations or protocol header parameterizations are implemented in hardware fast path. Each such parameterization is fixed length so that the particular instance can be optimized as a fixed length protocol. In the case of GUE this constitutes specific combinations of GUE flags, fields, and next protocol. The selected combinations would naturally be the most common cases which form the "fast path", and other combinations are assumed to take the "slow path". In time, needs and requirements of the protocol may change which may manifest themselves as new parameterizations to be supported in the fast path. To allow allow this extensibility, a device practicing constrained flexibility should allow the fast path parameterizations to be programmable. Authors' Addresses Tom Herbert Quantonium 4701 Patrick Henry Santa Clara, CA 95054 US Email: tom@herbertland.com Lucy Yong Huawei USA 5340 Legacy Dr. Herbert, Yong, Zia Expires October, 2017 [Page 37] Internet Draft Generic UDP Encapsulation April 26, 2017 Plano, TX 75024 US Email: lucy.yong@huawei.com Osama Zia Microsoft 1 Microsoft Way Redmond, WA 98029 US Email: osamaz@microsoft.com Herbert, Yong, Zia Expires October, 2017 [Page 38]