Network Virtualization Overlays (nvo3)                        T. Herbert
Internet-Draft                                                  Facebook
Intended status: Standards Track                                 L. Yong
Expires December 12, 2016                                     Huawei USA
                                                                  O. Zia
                                                               Microsoft
                                                           June 10, 2016

                       Generic UDP Encapsulation
                         draft-ietf-nvo3-gue-03

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on December 12, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents



Herbert, Yong, Zia       Expires December 2016                  [Page 1]


Internet Draft         Generic UDP Encapsulation           June 10, 2016


   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Abstract

   This specification describes Generic UDP Encapsulation (GUE), which
   is a scheme for using UDP to encapsulate packets of arbitrary IP
   protocols for transport across layer 3 networks. By encapsulating
   packets in UDP, specialized capabilities in networking hardware for
   efficient handling of UDP packets can be leveraged. GUE specifies
   basic encapsulation methods upon which higher level constructs, such
   as tunnels and overlay networks for network virtualization, can be
   constructed. GUE is extensible by allowing optional data fields as
   part of the encapsulation, and is generic in that it can encapsulate
   packets of various IP protocols.

Table of Contents

   1. Introduction  . . . . . . . . . . . . . . . . . . . . . . . . .  4
   2. Base packet format  . . . . . . . . . . . . . . . . . . . . . .  5
     2.1. GUE version . . . . . . . . . . . . . . . . . . . . . . . .  5
   3. Version 0 . . . . . . . . . . . . . . . . . . . . . . . . . . .  5
     3.1. Header format . . . . . . . . . . . . . . . . . . . . . . .  6
     3.2. Proto/ctype field . . . . . . . . . . . . . . . . . . . . .  7
     3.3. Flags and optional fields . . . . . . . . . . . . . . . . .  8
     3.4. Private data  . . . . . . . . . . . . . . . . . . . . . . .  8
     3.5. Message types . . . . . . . . . . . . . . . . . . . . . . .  9
       3.5.1. Control messages  . . . . . . . . . . . . . . . . . . .  9
       3.5.2. Data messages . . . . . . . . . . . . . . . . . . . . .  9
   4. Version 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
     4.1. Direct encapsulation of IPv4  . . . . . . . . . . . . . . . 10
     4.2. Direct encapsulation of IPv6  . . . . . . . . . . . . . . . 11
   5. Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
     5.1. Network tunnel encapsulation  . . . . . . . . . . . . . . . 12
     5.2. Transport layer encapsulation . . . . . . . . . . . . . . . 12
     5.3. Encapsulator operation  . . . . . . . . . . . . . . . . . . 12
     5.4. Decapsulator operation  . . . . . . . . . . . . . . . . . . 13
     5.5. Router and switch operation . . . . . . . . . . . . . . . . 13
     5.6. Middlebox interactions  . . . . . . . . . . . . . . . . . . 13
       5.6.1. Inferring connection semantics  . . . . . . . . . . . . 13
       5.6.2. Identification of GUE packets . . . . . . . . . . . . . 14
     5.7. NAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
     5.8. Checksum Handling . . . . . . . . . . . . . . . . . . . . . 15
       5.8.1. Checksum requirements . . . . . . . . . . . . . . . . . 15
       5.8.2. GUE header checksum . . . . . . . . . . . . . . . . . . 15
       5.8.3. UDP Checksum with IPv4  . . . . . . . . . . . . . . . . 16
       5.8.4. UDP Checksum with IPv6  . . . . . . . . . . . . . . . . 16
     5.9. MTU and fragmentation . . . . . . . . . . . . . . . . . . . 17
     5.10. Congestion control . . . . . . . . . . . . . . . . . . . . 17
     5.11. Multicast  . . . . . . . . . . . . . . . . . . . . . . . . 18
   6. Inner flow identifier properties  . . . . . . . . . . . . . . . 18
     6.1. Flow classification . . . . . . . . . . . . . . . . . . . . 18
     6.2. Inner flow identifier properties  . . . . . . . . . . . . . 19
   7. Motivation for GUE  . . . . . . . . . . . . . . . . . . . . . . 20
   8. Security Considerations . . . . . . . . . . . . . . . . . . . . 21
   9. IANA Consideration  . . . . . . . . . . . . . . . . . . . . . . 21
   10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 21
   11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 22
     11.1. Normative References . . . . . . . . . . . . . . . . . . . 22
     11.2. Informative References . . . . . . . . . . . . . . . . . . 22
   Appendix A: NIC processing for GUE . . . . . . . . . . . . . . . . 23
     A.1. Receive multi-queue . . . . . . . . . . . . . . . . . . . . 23
     A.2. Checksum offload  . . . . . . . . . . . . . . . . . . . . . 23
       A.2.1. Transmit checksum offload . . . . . . . . . . . . . . . 23
       A.2.2. Receive checksum offload  . . . . . . . . . . . . . . . 24
     A.3. Transmit Segmentation Offload . . . . . . . . . . . . . . . 25
     A.4. Large Receive Offload . . . . . . . . . . . . . . . . . . . 26
   Appendix B: Privileged ports . . . . . . . . . . . . . . . . . . . 26
   Appendix C: Inner flow identifier as a route selector  . . . . . . 27
   Appendix D: Hardware protocol implementation considerations  . . . 27
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 27

1. Introduction

   This specification describes Generic UDP Encapsulation (GUE) which is
   a general method for encapsulating packets of arbitrary IP protocols
   within User Datagram Protocol (UDP) [RFC0768] packets. Encapsulating
   packets in UDP facilitates efficient transport across networks.
   Networking devices widely provide protocol specific processing and
   optimizations for UDP (as well as TCP) packets. Packets for atypical
   IP protocols (those not usually parsed by networking hardware) can be
   encapsulated in UDP packets to maximize deliverability and to
   leverage flow specific mechanisms for routing and packet steering.

   GUE provides an extensible header format for including optional data
   in the encapsulation header. This data potentially covers items such
   as virtual networking identifier, security data for validating or
   authenticating the GUE header, congestion control data, etc. GUE also
   allows private optional data in the encapsulation header. This
   feature can be used by a site or implementation to define local
   custom optional data, and allows experimentation with options that
   may eventually become standard.

2. Base packet format

   A GUE packet consists of a UDP packet whose payload is a GUE
   header followed by a payload which is either an encapsulated packet
   of some IP protocol or a control message (like an OAM message). A GUE
   packet has the general format:

   +-------------------------------+
   |                               |
   |        UDP/IP header          |
   |                               |
   |-------------------------------|
   |                               |
   |         GUE Header            |
   |                               |
   |-------------------------------|
   |                               |
   |      Encapsulated packet      |
   |      or control message       |
   |                               |
   +-------------------------------+

   The GUE header is variable length as determined by the presence of
   optional fields.

2.1. GUE version

   The first two bits of the GUE header contain the GUE protocol version
   number. The rest of the fields after the GUE version number are
   defined based on the version number. Versions 0x0 and 0x1 are
   described in this specification; versions 0x2 and 0x3 are reserved.

3. Version 0

   Version 0 of GUE defines a generic extensible format to encapsulate
   packets by IP protocol number.

3.1. Header format

   The header format for version 0x0 of GUE in UDP is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Source port            |      Destination port         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Length              |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0x0|C|   Hlen  |  Proto/ctype  |            Flags            |E|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                       Fields (optional)                       ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Extension flags (optional)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                    Extension fields (optional)                ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                    Private data (optional)                    ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The contents of the UDP header are:

      o Source port (inner flow identifier): This should be set to a
        value that represents the encapsulated flow. The properties of
        the inner flow identifier are described in section 6.

      o Destination port: The GUE assigned port number, 6080.

      o Length: Canonical length of the UDP packet (length of UDP header
        and payload).

      o Checksum: Standard UDP checksum (see section 5.8).

   The GUE header consists of:

      o Ver: GUE protocol version (0x0).

      o C: Control flag. When set, indicates a control message; when not
        set, indicates a data message.

      o Hlen: Length in 32-bit words of the GUE header, including
        optional fields but not the first four bytes of the header.
        Computed as (header_len - 4) / 4. All GUE headers are a multiple
        of four bytes in length. Maximum header length is 128 bytes.

      o Proto/ctype: When the C bit is set this field contains a control
        message type for the payload. When C bit is not set, the field
        holds the IP protocol number for the encapsulated packet in the
        payload. The control message or encapsulated packet begins at
        the offset provided by Hlen.

      o Flags: Header flags that may be allocated for various purposes
        and may indicate presence of optional fields. Undefined header
        flag bits must be set to zero on transmission.

      o E: Extension flag. Indicates presence of the extension flags
        field among the optional fields.

      o Fields: Optional fields whose presence is indicated by
        corresponding flags.

      o Extension flags: An optional field indicated by the E bit. This
        field provides an additional set of thirty-two bits for flags.

      o Extension fields: Optional fields whose presence is indicated by
        corresponding extension flags.

      o Private data: Optional private data. If private data is present
        it immediately follows the last field present in the header.
        The length of this data is determined by subtracting the
        starting offset of the private data from the header length.
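
   As a non-normative illustration, the first four bytes of the version
   0 GUE header described above could be decoded as in the Python sketch
   below. The names GueBaseHeader and parse_gue_base are hypothetical
   and are not defined by this specification; the bit positions are
   taken from the header diagram in this section.

```python
import struct
from typing import NamedTuple


class GueBaseHeader(NamedTuple):
    """First four bytes of a version 0 GUE header (hypothetical helper type)."""
    version: int      # 2-bit GUE version
    control: bool     # C bit: True for a control message
    hlen: int         # Hlen: 32-bit words, not counting the first four bytes
    proto_ctype: int  # IP protocol number, or control type when C is set
    flags: int        # the 15 primary flag bits, high-order flag in bit 14
    extension: bool   # E bit: extension flags field is present

    @property
    def header_len(self) -> int:
        # Inverse of Hlen = (header_len - 4) / 4
        return 4 + self.hlen * 4


def parse_gue_base(payload: bytes) -> GueBaseHeader:
    """Decode the fixed part of the GUE header from a UDP payload."""
    if len(payload) < 4:
        raise ValueError("UDP payload too short for a GUE header")
    b0, proto_ctype, flags16 = struct.unpack("!BBH", payload[:4])
    return GueBaseHeader(
        version=b0 >> 6,
        control=bool(b0 & 0x20),
        hlen=b0 & 0x1F,
        proto_ctype=proto_ctype,
        flags=flags16 >> 1,
        extension=bool(flags16 & 0x1),
    )
```

   Because Hlen is a five bit field, the largest value it can carry is
   31, which gives the 128 byte maximum header length stated above.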

3.2. Proto/ctype field

   When the C bit is not set, the proto/ctype field must be set to a
   valid IP protocol number. The IP protocol number serves as an
   indication of the type of next protocol header which is contained in
   the GUE payload at the offset indicated in Hlen. Intermediate devices
   may parse the GUE payload per the IP protocol number in the
   proto/ctype field, and header flags cannot affect the interpretation
   of the proto/ctype field.

   IP protocol number 59 ("No next header") may be set to indicate that
   the GUE payload does not begin with the header of an IP protocol.
   This would be the case, for instance, if the GUE payload were a
   fragment when performing GUE level fragmentation. The interpretation
   of the payload is performed through other means (such as flags and
   optional fields), and intermediate devices must not parse the packet
   based on the IP protocol number in this case.

   When the C bit is set, the proto/ctype field must be set to a valid
   control message type. A value of zero indicates that the GUE payload
   requires further interpretation to deduce the control type. This
   might be the case when the payload is a fragment of a control
   message, where only the reassembled packet can be interpreted as a
   control message.

3.3. Flags and optional fields

   Flags and associated optional fields are the primary mechanism of
   extensibility in GUE. There are sixteen flag bits in the primary GUE
   header with one being reserved to indicate that an optional extension
   flags field is present. The extension flags field contains an
   additional thirty-two flag bits.

   A flag may indicate presence of optional fields. The size of an
   optional field indicated by a flag must be fixed.

   Flags may be paired together to allow different lengths for an
   optional field. For example, if two flag bits are paired, a field may
   possibly be three different lengths. Regardless of how flag bits may
   be paired, the lengths and offsets of optional fields corresponding
   to a set of flags must be well defined.

   Optional fields are placed in order of the flags. New flags should be
   allocated contiguously from the high order bit to the low order bit.
   Flags allow random access: to inspect the field corresponding to the
   Nth flag bit, an implementation only considers the previous N-1 flags
   to determine the offset. Flags after the Nth flag are not pertinent
   in calculating the offset of the field for the Nth flag.
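
   The random access property can be sketched as follows. This is a
   non-normative Python example: the FIELD_SIZES table is invented for
   illustration (this section does not assign any flags), and flags are
   indexed from 0 for the highest order primary flag bit. Offsets are
   relative to the start of the optional fields.

```python
NUM_FLAGS = 15  # primary flag bits, excluding the E bit

# Hypothetical field sizes in bytes, indexed by flag order (0 = highest-order
# flag). A size of 0 stands for a flag that carries no optional field.
FIELD_SIZES = [4, 8, 4] + [0] * 12


def flag_is_set(flags: int, n: int) -> bool:
    """True if the n-th flag (0 = highest-order of the 15 bits) is set."""
    return bool(flags & (1 << (NUM_FLAGS - 1 - n)))


def field_offset(flags: int, n: int) -> int:
    """Byte offset of the field for flag n, from the start of optional fields.

    Only flags before n are examined; later flags never change the result.
    """
    if not flag_is_set(flags, n):
        raise ValueError("flag %d is not set, so its field is absent" % n)
    return sum(FIELD_SIZES[i] for i in range(n) if flag_is_set(flags, i))
```

   With the first and third flags set, the third field starts at offset
   4; setting any lower order flag leaves that offset unchanged.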

   Flags (or paired flags) are idempotent such that new flags should not
   cause reinterpretation of old flags. Also, new flags should not alter
   interpretation of other elements in the GUE header nor how the
   message is parsed (for instance, in a data message the proto/ctype
   field always holds an IP protocol number as an invariant).

3.4. Private data

   An implementation may use private data for its own use. The private
   data immediately follows the last field in the GUE header and is not
   a fixed length. This data is considered part of the GUE header and
   must be accounted for in header length (Hlen). The length of the
   private data must be a multiple of four and is determined by
   subtracting the offset of private data in the GUE header from the
   header length. Specifically:

      Private_length = (Hlen * 4) - Length(flags)

   Where "Length(flags)" returns the sum of lengths of all the optional
   fields present in the GUE header. When there is no private data
   present, length of the private data is zero.
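
   A minimal, non-normative Python sketch of this calculation follows.
   The function name private_data_length is hypothetical; its second
   argument corresponds to "Length(flags)" and is assumed to have been
   computed by the caller from the flags that are set.

```python
def private_data_length(hlen: int, optional_fields_len: int) -> int:
    """Return Private_length = (Hlen * 4) - Length(flags), in bytes.

    hlen is the Hlen value from the GUE header (32-bit words after the
    first four bytes); optional_fields_len is the summed size in bytes of
    all optional fields indicated by the flags.
    """
    private_len = hlen * 4 - optional_fields_len
    if private_len < 0:
        raise ValueError("Hlen is too small for the indicated optional fields")
    if private_len % 4:
        raise ValueError("private data length must be a multiple of four")
    return private_len
```

   For example, Hlen of 5 with 12 bytes of optional fields leaves 8
   bytes of private data, and Hlen of 3 with the same fields leaves
   none.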

   The semantics and interpretation of private data are implementation
   specific. An encapsulator and decapsulator MUST agree on the meaning
   of private data before using it. The private data may be structured
   as necessary, for instance it might contain its own set of flags and
   optional fields.

   If a decapsulator receives a GUE packet with private data, it MUST
   validate the private data appropriately. If a decapsulator does not
   expect private data from an encapsulator the packet MUST be dropped.
   If a decapsulator cannot validate the contents of private data per
   the provided semantics the packet MUST also be dropped. An
   implementation may place security data in GUE private data which must
   be verified for packet acceptance.

3.5. Message types

3.5.1. Control messages

   Control messages are indicated in the GUE header when the C bit is
   set. The payload is interpreted as a control message with type
   specified in the proto/ctype field. The format and contents of the
   control message are indicated by the type and can be variable length.

   Other than interpreting the proto/ctype field as a control message
   type, the meaning and semantics of the rest of the elements in the
   GUE header are the same as that of data messages. Forwarding and
   routing of control messages should be the same as that of a data
   message with the same outer IP and UDP headers and GUE flags; this
   ensures that control messages can be created that follow the same
   path as data messages.

   Control messages can be defined for OAM type messages. For instance,
   an echo request and corresponding echo reply message may be defined
   to test for liveness.

3.5.2. Data messages

   Data messages are indicated in the GUE header when the C bit is not
   set. The payload of a data message is interpreted as an encapsulated
   packet of the IP protocol indicated in the proto/ctype field. The
   packet immediately follows the GUE header.

   Data messages are a primary means of encapsulation and can be used to
   create tunnels for overlay networks.

4. Version 1

   Version 1 of GUE allows direct encapsulation of IPv4 and IPv6 in UDP.
   In this version there is no GUE header; a UDP packet carries an IP
   packet directly. The first two bits of the UDP payload indicate the
   GUE version and coincide with the first two bits of the version
   number in the IP header. The first two version bits of both IPv4
   (0100) and IPv6 (0110) are 01, which corresponds to GUE version 1.

4.1. Direct encapsulation of IPv4

   The format for encapsulating IPv4 directly in UDP is demonstrated
   below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Source port            |      Destination port         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Length              |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|1|0|0|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |   Protocol    |   Header Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source IPv4 Address                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Destination IPv4 Address                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Note that the value 0100 in the IP version field expresses both the
   GUE version as 1 (bits 01) and the IP version as 4 (bits 0100).

4.2. Direct encapsulation of IPv6

   The format for encapsulating IPv6 directly in UDP is demonstrated
   below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Source port            |      Destination port         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Length              |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|1|1|0| Traffic Class |           Flow Label                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Payload Length        |     NextHdr   |   Hop Limit   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                     Outer Source IPv6 Address                 +
   |                                                               |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                  Outer Destination IPv6 Address               +
   |                                                               |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Note that the value 0110 in the IP version field expresses both the
   GUE version as 1 (bits 01) and the IP version as 6 (bits 0110).
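
   The following non-normative Python sketch shows how a receiver might
   use the first byte of the UDP payload to tell version 0 GUE headers
   apart from directly encapsulated IPv4 and IPv6 packets. The function
   name and the returned strings are hypothetical.

```python
def classify_gue_payload(payload: bytes) -> str:
    """Classify a UDP payload received on the GUE port by its first byte."""
    if not payload:
        raise ValueError("empty UDP payload")
    first = payload[0]
    gue_version = first >> 6          # first two bits of the payload
    if gue_version == 0:
        return "gue-v0"               # a version 0 GUE header follows
    if gue_version == 1:
        ip_version = first >> 4       # the same bits begin the IP version
        if ip_version == 4:
            return "ipv4"             # 0100: directly encapsulated IPv4
        if ip_version == 6:
            return "ipv6"             # 0110: directly encapsulated IPv6
        raise ValueError("GUE version 1 payload is neither IPv4 nor IPv6")
    raise ValueError("reserved GUE version %d" % gue_version)
```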

5. Operation

   The figure below illustrates the use of GUE encapsulation between two
   servers. Server 1 is sending packets to server 2. An encapsulator
   performs encapsulation of packets from server 1. These encapsulated
   packets traverse the network as UDP packets. At the decapsulator,
   packets are decapsulated and sent on to server 2. Packet flow in the
   reverse direction need not be symmetric; GUE encapsulation is not
   required in the reverse path.

   +---------------+                       +---------------+
   |               |                       |               |
   |   Server 1    |                       |    Server 2   |
   |               |                       |               |
   +---------------+                       +---------------+
          |                                        ^
          V                                        |
   +---------------+   +---------------+   +---------------+
   |               |   |               |   |               |
   | Encapsulator  |-->|    Layer 3    |-->| Decapsulator  |
   |               |   |    Network    |   |               |
   +---------------+   +---------------+   +---------------+

   The encapsulator and decapsulator may be co-resident with the
   corresponding servers, or may be on separate nodes in the network.

5.1. Network tunnel encapsulation

   Network tunneling can be achieved by encapsulating layer 2 or layer 3
   packets. In this case the encapsulator and decapsulator nodes are the
   tunnel endpoints. These could be routers that provide network tunnels
   on behalf of communicating servers.

5.2. Transport layer encapsulation

   When encapsulating layer 4 packets, the encapsulator and decapsulator
   should be co-resident with the servers. In this case, the
   encapsulation headers are inserted between the IP header and the
   transport packet. The addresses in the IP header refer to both the
   endpoints of the encapsulation and the endpoints for terminating the
   transport protocol.

5.3. Encapsulator operation

   Encapsulators create GUE data messages, set the source port to the
   inner flow identifier, set flags and optional fields in the GUE
   header, and forward packets to a decapsulator.
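
   A minimal, non-normative Python sketch of these steps for a version 0
   data message with no flags is shown below. Deriving the source port
   from a CRC32 of a flow key, and mapping it into the range 49152 to
   65535, are assumptions made for illustration only. The UDP checksum
   is left as zero here; the requirements for it are covered in section
   5.8.

```python
import struct
import zlib

GUE_PORT = 6080  # GUE assigned destination port


def flow_source_port(flow_key: bytes) -> int:
    """Map a flow key to a source port (hypothetical hashing scheme)."""
    return 49152 + (zlib.crc32(flow_key) % 16384)


def encapsulate(inner: bytes, proto: int, flow_key: bytes) -> bytes:
    """Prepend UDP and GUE headers to an inner packet of IP protocol `proto`."""
    # Version 0, C bit clear, Hlen 0, no flags, no E bit.
    gue = struct.pack("!BBH", 0, proto, 0)
    udp_len = 8 + len(gue) + len(inner)
    udp = struct.pack("!HHHH", flow_source_port(flow_key), GUE_PORT, udp_len, 0)
    return udp + gue + inner
```

   Packets that share a flow key get the same source port, so devices
   that hash on the outer headers keep the flow on one path.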

   An encapsulator may be an end host originating the packets of a flow,
   or may be a network device performing encapsulation on behalf of
   servers (routers implementing tunnels for instance). In either case,
   the intended target (decapsulator) is indicated by the outer
   destination IP address.

   If an encapsulator is tunneling packets, that is encapsulating
   packets of layer 2 or layer 3 protocols (e.g. EtherIP, IPIP, ESP
   tunnel mode), it should follow standard conventions for tunneling of
   one IP protocol over another. Diffserv interaction with tunnels is
   described in [RFC2983]; ECN propagation for tunnels is described in
   [RFC6040].

5.4. Decapsulator operation

   A decapsulator performs decapsulation of GUE packets. A decapsulator
   is addressed by the outer destination IP address of a GUE packet.
   The decapsulator validates packets, including fields of the GUE
   header. If a packet is acceptable, the UDP and GUE headers are
   removed and the packet is resubmitted for IP protocol processing or
   control message processing if it is a control message.

   If a decapsulator receives a GUE packet with an unsupported version,
   unknown flag, bad header length (too small for included optional
   fields), unknown control message type, or an otherwise malformed
   header, it must drop the packet and may log the event. No error
   message is returned back to the encapsulator. Note that set flags in
   GUE that are unknown to a decapsulator MUST NOT be ignored. If a GUE
   packet is received by a decapsulator with unknown flags, the packet
   MUST be dropped.
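
   The drop rules above might be implemented along the lines of the
   following non-normative Python sketch. The KNOWN_FLAGS and
   KNOWN_CTYPES tables are hypothetical, and the sketch assumes a
   decapsulator that handles only version 0, knows no extension flags,
   and expects no private data (see section 3.4).

```python
import struct

# Hypothetical table: primary flag bit (14 = highest-order) -> field size.
KNOWN_FLAGS = {14: 4, 13: 8}
KNOWN_CTYPES = frozenset()  # this sketch knows no control message types


def should_accept(payload: bytes) -> bool:
    """Return True to accept a GUE packet, False to drop it."""
    if len(payload) < 4:
        return False
    b0, proto_ctype, flags16 = struct.unpack("!BBH", payload[:4])
    if b0 >> 6 != 0:                     # unsupported version
        return False
    hlen_bytes = (b0 & 0x1F) * 4         # header bytes after the first four
    if 4 + hlen_bytes > len(payload):    # header runs past the datagram
        return False
    flags, ext = flags16 >> 1, flags16 & 1
    needed = 0
    for bit in range(15):
        if flags & (1 << bit):
            if bit not in KNOWN_FLAGS:   # unknown set flags are never ignored
                return False
            needed += KNOWN_FLAGS[bit]
    if ext:
        needed += 4                      # the extension flags field itself
    if hlen_bytes != needed:             # too small, or unexpected private data
        return False
    if ext:
        # The extension flags follow the primary fields, at offset `needed`.
        (ext_flags,) = struct.unpack_from("!I", payload, needed)
        if ext_flags:                    # no extension flags are known here
            return False
    if b0 & 0x20 and proto_ctype not in KNOWN_CTYPES:
        return False                     # unknown control message type
    return True
```

   No error is returned to the encapsulator in any of the drop cases; a
   caller would at most log the event.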

5.5. Router and switch operation

   Routers and switches should forward GUE packets as standard UDP/IP
   packets. The outer five-tuple should contain sufficient information
   to perform flow classification corresponding to the flow of the inner
   packet. A router should not normally need to parse a GUE header, and
   none of the flags or optional fields in the GUE header should affect
   routing.

   A router should not modify a GUE header when forwarding a packet. It
   may encapsulate a GUE packet in another GUE packet, for instance to
   implement a network tunnel. In this case the router takes the role of
   an encapsulator, and the corresponding decapsulator is the logical
   endpoint of the tunnel.

5.6. Middlebox interactions

   A middlebox may interpret some flags and optional fields of the GUE
   header for classification purposes, but is not required to understand
   all flags and fields in GUE packets. A middlebox should not drop a
   GUE packet because there are flags unknown to it. The header length
   in the GUE header allows a middlebox to inspect the payload packet
   without needing to parse the flags or optional fields.

5.6.1. Inferring connection semantics

   A middlebox may infer bidirectional connection semantics for a UDP
   flow. For instance, a stateful firewall may create a five-tuple rule
   to match flows on egress, and a corresponding five-tuple rule for
   matching ingress packets where the roles of source and destination
   are reversed for the IP addresses and UDP port numbers. To operate in
   this environment, a GUE tunnel must assume connected semantics
   defined by the UDP five-tuple, and the use of GUE encapsulation must
   be symmetric between both endpoints. The source port set in the UDP
   header must be the destination port the peer would set for replies.
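
   The firewall behavior described in this section can be modeled with
   the following non-normative Python sketch. The FiveTuple and
   StatefulFirewall classes are hypothetical and only model the rule
   matching, not a real firewall.

```python
from typing import NamedTuple, Set


class FiveTuple(NamedTuple):
    proto: int
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def reverse(self) -> "FiveTuple":
        """Swap the source and destination roles."""
        return FiveTuple(self.proto, self.dst_ip, self.dst_port,
                         self.src_ip, self.src_port)


class StatefulFirewall:
    """Admits ingress packets only if they mirror a flow seen on egress."""

    def __init__(self) -> None:
        self._ingress_rules: Set[FiveTuple] = set()

    def egress(self, pkt: FiveTuple) -> None:
        # Seeing a flow on egress installs the reversed rule for ingress.
        self._ingress_rules.add(pkt.reverse())

    def ingress_allowed(self, pkt: FiveTuple) -> bool:
        return pkt in self._ingress_rules
```

   If an encapsulator sends from port 51000 to port 6080, only replies
   sent from port 6080 to port 51000 match the installed rule; a reply
   using a different port pair would not be admitted.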

5.6.2. Identification of GUE packets

   For a middlebox to be able to parse GUE packets it must be able to
   positively identify GUE packets. The obvious way to do this would be
   by destination port number; that is, if the destination UDP port
   matches the GUE port number the packet would be considered GUE. This
   method is problematic since port numbers do not have global meaning
   ([RFC7605]) and a packet which is not GUE but destined to the same
   port number could be misinterpreted.

   To allow middleboxes to accurately identify GUE packets, UDP magic
   numbers may be used ([UDPMAG]). A UDP magic number is a sixty-four
   bit value inserted between the UDP header and the encapsulated
   protocol headers. The value is a constant for each defined protocol
   and can be matched by a middlebox to identify (with very high
   probability) the encapsulated protocol.

   The sixty-four bit magic number for GUE is 0xffd871a2:0x217431f9, so
   a GUE packet with a UDP magic number would have the format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Source port            |      Destination port         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Length              |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           0xffd871a2                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           0x217431f9                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0x0|C|   Hlen  |  Proto/ctype  |            Flags            |E|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                       Fields (optional)                       ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
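
   As a rough illustration of the check a middlebox could make, the
   Python sketch below looks for the magic number directly after the
   eight byte UDP header and then unpacks the first four bytes of the
   GUE header using the bit widths drawn in the figure. The function
   name and return shape are hypothetical, and interpretation of Hlen,
   the flags, and any optional fields is deliberately left out.

```python
import struct

# Sixty-four bit GUE magic number from the text above.
GUE_MAGIC = bytes.fromhex("ffd871a2217431f9")
UDP_HDR_LEN = 8


def parse_gue_with_magic(udp_datagram: bytes):
    """Return (version, control, hlen, proto_ctype, flags), or None.

    udp_datagram starts at the UDP header. None means the magic number
    is absent or the datagram is too short to hold a GUE header.
    """
    off = UDP_HDR_LEN
    if udp_datagram[off:off + 8] != GUE_MAGIC:
        return None
    off += 8
    if len(udp_datagram) < off + 4:
        return None
    first, proto_ctype, flags = struct.unpack_from("!BBH", udp_datagram, off)
    version = first >> 6          # two bit version field
    control = (first >> 5) & 0x1  # C bit
    hlen = first & 0x1F           # five bit Hlen field
    return version, control, hlen, proto_ctype, flags
```

   A match only identifies the packet as GUE with very high
   probability; a middlebox would still be expected to validate the
   header before acting on it.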





5.7. NAT

   IP address and port translation can be performed on the UDP/IP
   headers adhering to the requirements for NAT with UDP [RFC4787]. In
   the case of stateful NAT, connection semantics must be applied to a
   GUE tunnel as described above.

   When using transport mode encapsulation and traversing a NAT, the IP
   addresses may be changed such that the pseudo header checksum used
   for checksum calculation is modified and the checksum will be found
   invalid at the receiver. To compensate for this, a GUE option can be
   added which contains the checksum over the source and destination
   addresses when the packet is transmitted. Upon receiving this option,
   the delta of the pseudo header checksum is computed by subtracting
   the checksum over the received source and destination addresses from
   the checksum value in the option. The resultant value is then added
   into the checksum calculation when validating the inner transport
   checksum.
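
   The arithmetic can be sketched in Python as follows. This is only
   an illustration under stated assumptions: the function names are
   hypothetical, an IPv4 style pseudo header is assumed, and the
   encoding of the GUE option itself is not modeled. The option value
   is the ones complement sum over the addresses at the sender, and
   the delta is that value minus the sum over the addresses seen by
   the receiver.

```python
import ipaddress
import struct


def fold(total: int) -> int:
    """Fold carries back in to obtain a 16-bit ones complement sum."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ones_sum(data: bytes) -> int:
    """Ones complement sum of data read as big-endian 16-bit words."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"
    return fold(sum(word for (word,) in struct.iter_unpack("!H", data)))


def addr_sum(src: str, dst: str) -> int:
    """Sum over source and destination addresses (the option value)."""
    packed = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    return ones_sum(packed)


def pseudo_delta(option_value: int, src: str, dst: str) -> int:
    """Option value minus the sum over the addresses seen on receive."""
    return fold(option_value + (~addr_sum(src, dst) & 0xFFFF))


def inner_sum(src: str, dst: str, proto: int, segment: bytes,
              delta: int = 0) -> int:
    """Pseudo header plus segment sum; 0xFFFF means the checksum verifies."""
    pseudo_rest = struct.pack("!BBH", 0, proto, len(segment))
    return fold(addr_sum(src, dst) + ones_sum(pseudo_rest)
                + ones_sum(segment) + delta)
```

   A receiver would call pseudo_delta() with the option value and the
   addresses in the received IP header, then pass the result as delta
   when validating the inner transport checksum.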

5.8. Checksum Handling

   This section describes the requirements around the UDP checksum and
   GUE header checksum. Checksums are an important consideration in
   that they can provide end-to-end validation and protect against
   packet mis-delivery. The latter is made possible by the inclusion of
   a pseudo header that covers the IP addresses and UDP ports of the
   encapsulating headers.

5.8.1. Checksum requirements

   The potential for mis-delivery of packets due to corruption of IP,
   UDP, or GUE headers must be considered. One of the following
   requirements must be met:

  o UDP checksums are enabled (for IPv4 or IPv6).

  o The GUE header checksum is used.

  o Zero UDP checksums are used in accordance with applicable
    requirements in [GREUDP], [RFC6935], and [RFC6936].

5.8.2. GUE header checksum

    The GUE header checksum (in version 0x0) provides a UDP-lite
    [RFC3828] type of checksum capability as an optional field of the
    GUE header. The GUE header checksum minimally covers the GUE header
    and a GUE pseudo header. The GUE pseudo header includes the
    corresponding IP addresses as well as the UDP ports of the
    encapsulating headers. This checksum should provide adequate
    protection against address corruption in IPv6 when the UDP checksum
    is zero. Additionally, the GUE checksum provides protection of the
    GUE header when the UDP checksum is set to zero with either IPv4 or
    IPv6. The GUE header checksum is defined in [GUECSUM].

5.8.3. UDP Checksum with IPv4

    For UDP in IPv4, the UDP checksum MUST be processed as specified in
    [RFC768] and [RFC1122] for both transmit and receive. An
    encapsulator MAY set the UDP checksum to zero for performance or
    implementation considerations. The IPv4 header includes a checksum
    that protects against mis-delivery of the packet due to corruption
    of IP addresses. The UDP checksum potentially provides protection
    against corruption of the UDP header, GUE header, and GUE payload.
    Enabling or disabling the use of checksums is a deployment
    consideration that should take into account the risk and effects of
    packet corruption, and whether the packets in the network are
    already adequately protected by other, possibly stronger mechanisms
    such as the Ethernet CRC. If an encapsulator sets a zero UDP
    checksum for IPv4 it SHOULD use the GUE header checksum as described
    in section 5.8.2.

    When a decapsulator receives a packet, the UDP checksum field MUST
    be processed. If the UDP checksum is non-zero, the decapsulator MUST
    verify the checksum before accepting the packet. By default a
    decapsulator SHOULD accept UDP packets with a zero checksum. A node
    MAY be configured to disallow zero checksums per [RFC1122]; this may
    be done selectively, for instance disallowing zero checksums from
    certain hosts that are known to be sending over paths subject to
    packet corruption. If verification of a non-zero checksum fails, a
    decapsulator lacks the capability to verify a non-zero checksum, or
    a packet with a zero checksum was received and the decapsulator is
    configured to disallow zero checksums, the packet MUST be dropped
    and an event MAY be logged.

5.8.4. UDP Checksum with IPv6

    For UDP in IPv6, the UDP checksum MUST be processed as specified in
    [RFC768] and [RFC2460] for both transmit and receive. Unlike IPv4,
    there is no header checksum in IPv6 that protects against mis-
    delivery due to address corruption. Therefore, when GUE is used over
    IPv6, either the UDP checksum must be enabled or the GUE header
    checksum must be used. An encapsulator MAY set a zero UDP checksum
    for performance or implementation reasons, in which case the GUE
    header checksum MUST be used or applicable requirements for using
    zero UDP checksums in [GREUDP] MUST be met. If the UDP checksum is
    enabled, then the GUE header checksum should not be used since it is
    mostly redundant.

    When a decapsulator receives a packet, the UDP checksum field MUST
    be processed. If the UDP checksum is non-zero, the decapsulator MUST
    verify the checksum before accepting the packet. By default a
    decapsulator MUST only accept UDP packets with a zero checksum if
    the GUE header checksum is used and is verified. If verification of
    a non-zero checksum fails, a decapsulator lacks the capability to
    verify a non-zero checksum, or a packet with a zero-checksum and no
    GUE header checksum was received, the packet MUST be dropped and an
    event MAY be logged.
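
    The receive rules of sections 5.8.3 and 5.8.4 can be summarized as
    a small decision function. The Python sketch below is illustrative
    only: the function and parameter names are hypothetical, and
    udp_csum_valid is assumed to be False both when verification fails
    and when the decapsulator is unable to verify a non-zero checksum.

```python
def accept_packet(ip_version: int, udp_csum: int, udp_csum_valid: bool,
                  gue_csum_verified: bool = False,
                  allow_zero_v4: bool = True) -> bool:
    """Return True if the packet may be accepted, False if it MUST be dropped.

    udp_csum          -- value of the UDP checksum field in the packet
    udp_csum_valid    -- result of verifying a non-zero UDP checksum
    gue_csum_verified -- GUE header checksum present and verified
    allow_zero_v4     -- local policy for zero UDP checksums over IPv4
    """
    if udp_csum != 0:
        # A non-zero checksum must verify before the packet is accepted.
        return udp_csum_valid
    if ip_version == 4:
        # Zero checksums are accepted by default, subject to configuration.
        return allow_zero_v4
    # IPv6: a zero UDP checksum needs a verified GUE header checksum.
    return gue_csum_verified
```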

5.9. MTU and fragmentation

    Standard conventions for handling of MTU (Maximum Transmission Unit)
    and fragmentation in conjunction with networking tunnels
    (encapsulation of layer 2 or layer 3 packets) should be followed.
    Details are described in "MTU and Fragmentation Issues with
    In-the-Network Tunneling" [RFC4459].

    If a packet is fragmented before encapsulation in GUE, all the
    related fragments must be encapsulated using the same source port
    (inner flow identifier). An operator may set the MTU to account for
    encapsulation overhead and reduce the likelihood of fragmentation.
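
    As a worked example of accounting for encapsulation overhead, the
    Python sketch below computes the largest encapsulated packet that
    fits in a given path MTU. It assumes an outer IP header without
    IPv4 options or IPv6 extension headers, an eight byte UDP header,
    and the four byte base GUE header; the names are hypothetical.

```python
OUTER_IP_HDR_LEN = {4: 20, 6: 40}  # no IPv4 options or IPv6 extension headers
UDP_HDR_LEN = 8
GUE_BASE_HDR_LEN = 4


def inner_mtu(path_mtu: int, ip_version: int, gue_optional_len: int = 0) -> int:
    """Largest encapsulated packet that avoids fragmenting the GUE packet."""
    overhead = (OUTER_IP_HDR_LEN[ip_version] + UDP_HDR_LEN
                + GUE_BASE_HDR_LEN + gue_optional_len)
    return path_mtu - overhead
```

    With a 1500 byte path MTU this gives 1468 bytes over IPv4 and 1448
    bytes over IPv6 when no optional GUE fields are present.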

5.10. Congestion control

    Per requirements of [RFC5405], if the IP traffic encapsulated with
    GUE implements proper congestion control, no additional mechanisms
    should be required.

    In the case that the encapsulated traffic does not implement any or
    sufficient congestion control, or it is not known whether a
    transmitter will consistently implement proper congestion control,
    then congestion control at the encapsulation layer must be provided.
    Note this case applies to a significant use case in network
    virtualization in which guests run third party networking stacks
    that cannot be implicitly trusted to implement conformant congestion
    control.

    Out of band mechanisms such as rate limiting, Managed Circuit
    Breaker, or traffic isolation may be used to provide rudimentary
    congestion control. For finer grained congestion control that allows
    alternate congestion control algorithms, reaction time within an
    RTT, and interaction with ECN, in-band mechanisms may be warranted.

    DCCP may be used to provide congestion control for encapsulated
    flows. In this case, the protocol stack for an IP tunnel may be IP-
    GUE-DCCP-IP. Alternatively, GUE can be extended to include
    congestion control (related data carried in GUE optional fields).
    Congestion control mechanisms for GUE will be elaborated in other
    specifications.

5.11. Multicast

    GUE packets may be multicast to decapsulators using a multicast
    destination address in the encapsulating IP headers. Each receiving
    host will decapsulate the packet independently following normal
    decapsulator operations. The receiving decapsulators should agree on
    the same set of GUE parameters and properties.

    GUE allows encapsulation of unicast, broadcast, or multicast
    traffic. Entropy for the inner flow identifier (UDP source port) may
    be generated from the header of encapsulated unicast or
    broadcast/multicast packets at an encapsulator. The mapping
    mechanism between the encapsulated multicast traffic and the
    multicast capability in the IP network is transparent to and
    independent of the encapsulation and is otherwise outside the scope
    of this document.

6. Inner flow identifier properties

6.1. Flow classification

    A major objective of using GUE is that a network device can perform
    flow classification corresponding to the flow of the inner
    encapsulated packet based on the contents in the outer headers.

    Hardware devices commonly perform hash computations on packet
    headers to classify packets into flows or flow buckets. Flow
    classification is done to support load balancing (statistical
    multiplexing) of flows across a set of networking resources.
    Examples of such load balancing techniques are Equal Cost Multipath
    routing (ECMP), port selection in Link Aggregation, and NIC device
    Receive Side Scaling (RSS).  Hashes are usually either a three-tuple
    hash of IP protocol, source address, and destination address; or a
    five-tuple hash consisting of IP protocol, source address,
    destination address, source port, and destination port. Typically,
    networking hardware will compute five-tuple hashes for TCP and UDP,
    but only three-tuple hashes for other IP protocols. Since the five-
    tuple hash provides more granularity, load balancing can be finer
    grained with better distribution. When a packet is encapsulated with
    GUE, the source port in the outer UDP packet is set to reflect the
    flow of the inner packet. When a device computes a five-tuple hash
    on the outer UDP/IP header of a GUE packet, the resultant value
    classifies the packet per its inner flow.

    To support flow classification, the source port of the UDP header in
    GUE is set to a value that maps to the inner flow. This is referred
    to as the inner flow identifier. The inner flow identifier is set by
    the encapsulator; it can be computed on the fly based on packet
    contents or retrieved from a state maintained for the inner flow.

    Examples of deriving an inner flow identifier are:

      o If the encapsulated packet is a layer 4 packet, TCP/IPv4 for
        instance, the inner flow identifier could be based on the
        canonical five-tuple hash of the inner packet.

      o If the encapsulated packet is an AH transport mode packet with
        TCP as next header, the inner flow identifier could be a hash
        over a three-tuple: TCP protocol and TCP ports of the
        encapsulated packet.

      o If a node is encrypting a packet using ESP tunnel mode and GUE
        encapsulation, the inner flow identifier could be based on the
        contents of the clear-text packet. For instance, a canonical
        five-tuple hash for a TCP/IP packet could be used.

6.2. Inner flow identifier properties

        The inner flow identifier is the value set in the UDP source
        port of a GUE packet. The inner flow identifier should adhere to
        the following properties:

      o The value set in the source port should be within the ephemeral
        port range. IANA suggests this range to be 49152 to 65535, where
        the high order two bits of the port are set to one. This
        provides fourteen bits of entropy for the inner flow identifier.

      o The inner flow identifier should have a uniform distribution
        across encapsulated flows.

      o An encapsulator may occasionally change the inner flow
        identifier used for an inner flow at its discretion (for
        security, route selection, etc.). Changing the value should
        happen no more than once every thirty seconds.

      o Decapsulators, or any networking devices, should not attempt any
        interpretation of the inner flow identifier, nor should they
        attempt to reproduce any hash calculation. They may use the
        value to match further received packets for steering decisions,
        but cannot assume that the value uniquely or permanently
        identifies a flow.

      o Input to the inner flow identifier is not restricted to ports
        and addresses; input could include flow label from an IPv6
        packet, SPI from an ESP packet, or other flow related state in
        the encapsulator that is not necessarily conveyed in the packet.

      o The assignment function for inner flow identifiers should be
        randomly seeded to mitigate denial of service attacks. The seed
        may be changed periodically.
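
    One way to satisfy these properties is sketched below in Python.
    It is an assumption-laden illustration rather than a prescribed
    algorithm: the keyed BLAKE2 hash stands in for whatever hash an
    implementation actually uses, the function name is hypothetical,
    and the seed is simply drawn once from the operating system's
    random source. The result is forced into the 49152 to 65535 range
    by setting the two high order bits and keeping fourteen bits of
    hash output.

```python
import hashlib
import ipaddress
import os
import struct

EPHEMERAL_BASE = 0xC000  # 49152: the two high order bits set to one
ENTROPY_MASK = 0x3FFF    # fourteen bits of entropy

# Random seed chosen at start-up; an implementation may rotate it.
_SEED = os.urandom(16)


def inner_flow_id(proto: int, src: str, dst: str, sport: int, dport: int,
                  seed: bytes = _SEED) -> int:
    """Map an inner five-tuple to a UDP source port in the ephemeral range."""
    key = (bytes([proto])
           + ipaddress.ip_address(src).packed
           + ipaddress.ip_address(dst).packed
           + struct.pack("!HH", sport, dport))
    digest = hashlib.blake2b(key, key=seed, digest_size=2).digest()
    return EPHEMERAL_BASE | (int.from_bytes(digest, "big") & ENTROPY_MASK)
```

    Other inputs named above, such as an IPv6 flow label or an ESP SPI,
    could be appended to the hashed key in the same manner.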

7. Motivation for GUE

   This section presents the motivation for GUE with respect to other
   encapsulation methods.

   A number of different encapsulation techniques have been proposed for
   the encapsulation of one protocol over another. EtherIP [RFC3378]
   provides layer 2 tunneling of Ethernet frames over IP. GRE [RFC2784],
   MPLS [RFC4023], and L2TP [RFC2661] provide methods for tunneling
   layer 2 and layer 3 packets over IP. NVGRE [NVGRE] and VXLAN
   [RFC7348] are proposals for encapsulation of layer 2 packets for
   network virtualization. IPIP [RFC2003] and Generic packet tunneling
   in IPv6 [RFC2473] provide methods for tunneling IP packets over IP.

   Several proposals exist for encapsulating packets over UDP including
   ESP over UDP [RFC3948], TCP directly over UDP [TCPUDP], VXLAN, LISP
   [RFC6830] which encapsulates layer 3 packets, and Generic UDP
   Encapsulation for IP Tunneling (GRE over UDP) [GREUDP]. Generic UDP
   tunneling [GUT] is a proposal similar to GUE in that it aims to
   tunnel packets of IP protocols over UDP.

   GUE has the following discriminating features:

      o UDP encapsulation leverages specialized network device
        processing for efficient transport. The semantics for using the
        UDP source port as an identifier for an inner flow are defined.

      o GUE permits encapsulation of arbitrary IP protocols, which
        includes layer 2, 3, and 4 protocols. This potentially allows
        nearly all traffic within a data center to be normalized to be
        either TCP or UDP on the wire.

      o Multiple protocols can be multiplexed over a single UDP port
        number. This is in contrast to techniques to encapsulate
        protocols over UDP using a protocol specific port number (such
        as ESP/UDP, GRE/UDP, SCTP/UDP). GUE provides a uniform and
        extensible mechanism for encapsulating all IP protocols in UDP
        with minimal overhead (four bytes of additional header).

      o GUE is extensible. New flags and optional fields can be defined.

      o The GUE header includes a header length field. This allows a
        network node to inspect an encapsulated packet without needing
        to parse the full encapsulation header.

      o Private data in the encapsulation header allows local
        customization and experimentation while being compatible with
        processing in network nodes (routers and middleboxes).

      o GUE includes both data messages (encapsulation of packets) and
        control messages (such as OAM).

8. Security Considerations

   Encapsulation of IP protocols within GUE should not increase security
   risk, nor provide additional security in itself. As suggested in
   section 6.2, the source port of UDP packets in GUE should be
   randomly seeded to mitigate some possible denial of service attacks.

   Security for Generic UDP Encapsulation, including security for the
   GUE header and payload, is described in detail in [GUESEC].

9. IANA Considerations

   A user UDP port number has been assigned for GUE:

          Service Name: gue
          Transport Protocol(s): UDP
          Assignee: Tom Herbert <therbert@google.com>
          Contact: Tom Herbert <therbert@google.com>
          Description: Generic UDP Encapsulation
          Reference: draft-herbert-gue
          Port Number: 6080
          Service Code: N/A
          Known Unauthorized Uses: N/A
          Assignment Notes: N/A

   IANA is requested to create a "GUE flag-fields" registry to allocate
   flags and optional fields for the primary GUE header flags and
   extension flags. This shall be a registry of bit assignments for
   flags, length of optional fields for corresponding flags, and
   descriptive strings. There are sixteen bits for primary GUE header
   flags (bit number 0-15) where bit 15 is reserved as the extension
   flag in this document. There are thirty-two bits for extension flags.

10. Acknowledgements

   The authors would like to thank David Liu, Erik Nordmark, and Fred
   Templin for valuable input on this draft.

11. References

11.1. Normative References


11.2. Informative References

   [RFC3828]  Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., Ed.,
              and G. Fairhurst, Ed., "The Lightweight User Datagram
              Protocol (UDP-Lite)", RFC 3828, July 2004,
              <http://www.rfc-editor.org/info/rfc3828>.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, August 2014, <http://www.rfc-
              editor.org/info/rfc7348>.

   [RFC7605]  Touch, J., "Recommendations on Using Assigned Transport
              Port Numbers", BCP 165, RFC 7605, DOI 10.17487/RFC7605,
              August 2015, <http://www.rfc-editor.org/info/rfc7605>.

   [NVGRE]    Garg, P. and Y. Wang, "NVGRE: Network Virtualization
              using Generic Routing Encapsulation", draft-sridharan-
              virtualization-nvgre-08.

   [TCPUDP]   Cheshire, S., Graessley, J., and R. McGuire,
              "Encapsulation of TCP and other Transport Protocols over
              UDP", draft-cheshire-tcp-over-udp-00.

   [GREUDP]   Crabbe, E., Yong, L., Xu, X., and T. Herbert, "Generic UDP
              Encapsulation for IP Tunneling", draft-ietf-tsvwg-gre-in-
              udp-encap-06.

   [GUESEC]   Yong, L. and T. Herbert, "Generic UDP Encapsulation (GUE)
              for Secure Transport", draft-hy-gue-4-secure-transport-02,
              work in progress.

   [UDPMAG]   Herbert, T. and L. Yong, "UDP Magic Numbers", draft-
              herbert-udp-magic-numbers-01.

   [GUT]      Manner, J., Varia, N., and B. Briscoe, "Generic UDP
              Tunnelling (GUT)", draft-manner-tsvwg-gut-02.txt.

   [REMCSUM]  Herbert, T., "Remote Checksum Offload", draft-herbert-
              remotecsumoffload-00.

   [LCO]      Cree, E., "Checksum Offloads in the Linux Networking
              Stack", Linux Documentation,
              Documentation/networking/checksum-offloads.txt.

Appendix A: NIC processing for GUE

   This appendix provides some guidelines for Network Interface Cards
   (NICs) to implement common offloads and accelerations to support GUE.
   Note that most of this discussion is generally applicable to other
   methods of UDP based encapsulation.

A.1. Receive multi-queue

   Contemporary NICs support multiple receive descriptor queues (multi-
   queue). Multi-queue enables load balancing of network processing for
   a NIC across multiple CPUs. On packet reception, a NIC must select
   the appropriate queue for host processing. Receive Side Scaling is a
   common method which uses the flow hash for a packet to index an
   indirection table where each entry stores a queue number. Flow
   Director and Accelerated Receive Flow Steering (aRFS) allow a host to
   program the queue that is used for a given flow which is identified
   either by an explicit five-tuple or by the flow's hash.

   GUE encapsulation should be compatible with multi-queue NICs that
   support five-tuple hash calculation for UDP/IP packets as input to
   RSS. The inner flow identifier (source port) ensures classification
   of the encapsulated flow even in the case that the outer source and
   destination addresses are the same for all flows (e.g. all flows are
   going over a single tunnel).
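
   The effect can be illustrated with a toy model of RSS in Python.
   The sketch is hypothetical throughout: CRC-32 stands in for the
   hash a NIC would actually compute, the table size and queue count
   are arbitrary, and only IPv4 addresses are handled. Packets of a
   single tunnel share every field of the outer five-tuple except the
   source port, yet they still index different entries of the
   indirection table.

```python
import ipaddress
import struct
import zlib

NUM_QUEUES = 4
# Indirection table: each entry stores a receive queue number.
INDIRECTION_TABLE = [i % NUM_QUEUES for i in range(128)]


def flow_hash(proto: int, src: str, dst: str, sport: int, dport: int) -> int:
    """Five-tuple hash over the outer UDP/IP headers (CRC-32 as a stand-in)."""
    key = struct.pack("!B4s4sHH", proto,
                      ipaddress.ip_address(src).packed,
                      ipaddress.ip_address(dst).packed,
                      sport, dport)
    return zlib.crc32(key)


def select_queue(proto: int, src: str, dst: str, sport: int, dport: int) -> int:
    """Index the indirection table with the flow hash to pick a queue."""
    index = flow_hash(proto, src, dst, sport, dport) % len(INDIRECTION_TABLE)
    return INDIRECTION_TABLE[index]
```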

   By default, UDP RSS support is often disabled in NICs to avoid out of
   order reception that can occur when UDP packets are fragmented. As
   discussed above, fragmentation of GUE packets should be mitigated by
   fragmenting packets before entering a tunnel, path MTU discovery in
   higher layer protocols, or operators adjusting MTUs. Other UDP
   traffic may not implement such procedures to avoid fragmentation, so
   enabling UDP RSS support in the NIC should be a considered tradeoff
   during configuration.

A.2. Checksum offload

   Many NICs provide capabilities to calculate standard ones complement
   payload checksum for packets in transmit or receive. When using GUE
   encapsulation there are at least two checksums that may be of
   interest: the encapsulated packet's transport checksum, and the UDP
   checksum in the outer header.

A.2.1. Transmit checksum offload

   NICs may provide a protocol agnostic method to offload transmit
   checksum (NETIF_F_HW_CSUM in Linux parlance) that can be used with
   GUE. In this method the host provides checksum related parameters in
   a transmit descriptor for a packet. These parameters include the
   starting offset of data to checksum, the length of data to checksum,
   and the offset in the packet where the computed checksum is to be
   written. The host initializes the checksum field to the pseudo
   header checksum.

   In the case of GUE, the checksum for an encapsulated transport layer
   packet, a TCP packet for instance, can be offloaded by setting the
   appropriate checksum parameters.

   NICs typically can offload only one transmit checksum per packet.
   Multiple checksums may simultaneously be offloaded using Local
   Checksum Offload [LCO]. LCO is a technique for efficiently computing
   the outer checksum of an encapsulated datagram when the inner
   checksum is due to be offloaded. The ones-complement sum of a
   correctly checksummed TCP or UDP packet is equal to the complement of
   the sum of the pseudo header, because everything else gets 'canceled
   out' by the checksum field.  This is because the sum was complemented
   before being written to the checksum field. More generally, this
   holds in any case where the 'IP-style' ones complement checksum is
   used, and thus any checksum that TX Checksum Offload supports. That
   is, if we have set up TX Checksum Offload with a start/offset pair,
   we know that after the device has filled in that checksum, the ones
   complement sum from starting offset to the end of the packet will be
   equal to the complement of whatever value we put in the checksum
   field beforehand.
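
   The property can be checked numerically. The Python sketch below is
   a simplified model, not driver code: the helper names are
   hypothetical, the device is modeled as summing from the start
   offset to the end of the frame, and the pseudo header sums are
   passed in as plain 16-bit values. It shows the outer checksum being
   derived from the headers in front of the start offset plus the
   complement of the value seeded into the inner checksum field,
   without reading the payload.

```python
import struct


def fold(total: int) -> int:
    """Fold carries back in to obtain a 16-bit ones complement sum."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ones_sum(data: bytes) -> int:
    """Ones complement sum of data read as big-endian 16-bit words."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"
    return fold(sum(word for (word,) in struct.iter_unpack("!H", data)))


def tx_checksum_offload(frame: bytearray, csum_start: int,
                        csum_offset: int) -> None:
    """Model of the device: sum from csum_start to the end, complement, store."""
    csum = ~ones_sum(bytes(frame[csum_start:])) & 0xFFFF
    struct.pack_into("!H", frame, csum_start + csum_offset, csum)


def lco_outer_checksum(outer_pseudo_sum: int, headers_before_start: bytes,
                       inner_field_seed: int) -> int:
    """Outer checksum computed without touching the offloaded region.

    headers_before_start must have even length and carry a zero outer
    checksum field; inner_field_seed is the value the host placed in
    the inner checksum field before handing the frame to the device.
    """
    tail = ~inner_field_seed & 0xFFFF  # sum of the region once offloaded
    total = outer_pseudo_sum + ones_sum(headers_before_start) + tail
    return ~fold(total) & 0xFFFF
```

   Equality here is in ones complement arithmetic, where 0x0000 and
   0xFFFF both represent zero.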

   If an encapsulator is co-resident with a host, then checksum offload
   may be performed using remote checksum offload [REMCSUM]. Remote
   checksum offload relies on NIC offload of the simple UDP/IP checksum
   which is commonly supported even in legacy devices. In remote
   checksum offload the outer UDP checksum is set and the GUE header
   includes an option indicating the start and offset of the inner
   "offloaded" checksum. The inner checksum is initialized to the pseudo
   header checksum. When a decapsulator receives a GUE packet with the
   remote checksum offload option, it completes the offload operation by
   determining the packet checksum from the indicated start point to the
   end of the packet, and then adds this into the checksum field at the
   offset given in the option. Computing the checksum from the start to
   end of packet is efficient if checksum-complete is provided on the
   receiver.

A.2.2. Receive checksum offload

   GUE is compatible with NICs that perform a protocol agnostic receive
   checksum (CHECKSUM_COMPLETE in Linux parlance). In this technique, a
   NIC computes a ones complement checksum over all (or some predefined
   portion) of a packet. The computed value is provided to the host
   stack in the packet's receive descriptor. The host driver can use
   this checksum to "patch up" and validate any inner packet transport
   checksum, as well as the outer UDP checksum if it is non-zero.

   Many legacy NICs don't provide checksum-complete but instead provide
   an indication that a checksum has been verified (CHECKSUM_UNNECESSARY
   in Linux). Usually, such validation is only done for simple TCP/IP or
   UDP/IP packets. If a NIC indicates that a UDP checksum is valid, the
   checksum-complete value for the UDP packet is the "not" of the pseudo
   header checksum. In this way, checksum-unnecessary can be converted
   to checksum-complete. So if the NIC provides checksum-unnecessary for
   the outer UDP header in an encapsulation, checksum conversion can be
   done so that the checksum-complete value is derived and can be used
   by the stack to validate any checksums in the encapsulated packet.
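
   The conversion can be sketched in Python as follows. The names are
   hypothetical and the pseudo header sums are passed in as plain
   16-bit values; the checksum-complete value is taken to cover the
   packet from the start of the outer UDP header, and the headers in
   front of the inner transport header are assumed to have even
   length.

```python
import struct


def fold(total: int) -> int:
    """Fold carries back in to obtain a 16-bit ones complement sum."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ones_sum(data: bytes) -> int:
    """Ones complement sum of data read as big-endian 16-bit words."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"
    return fold(sum(word for (word,) in struct.iter_unpack("!H", data)))


def complete_from_unnecessary(outer_pseudo_sum: int) -> int:
    """Checksum-complete value implied by a verified outer UDP checksum."""
    return ~outer_pseudo_sum & 0xFFFF


def inner_checksum_ok(csum_complete: int, headers_before_inner: bytes,
                      inner_pseudo_sum: int) -> bool:
    """Validate an inner transport checksum from a checksum-complete value.

    headers_before_inner holds the received bytes from the outer UDP
    header up to the start of the inner transport header.
    """
    # Subtract the leading headers to get the sum over the inner segment.
    inner = fold(csum_complete + (~ones_sum(headers_before_inner) & 0xFFFF))
    return fold(inner + inner_pseudo_sum) == 0xFFFF
```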

A.3. Transmit Segmentation Offload

   Transmit Segmentation Offload (TSO) is a NIC feature where a host
   provides a large (>MTU size) TCP packet to the NIC, which in turn
   splits the packet into separate segments and transmits each one. This
   is useful to reduce CPU load on the host.

   The process of TSO can be generalized as:

      - Split the TCP payload into segments which allow packets with
        size less than or equal to MTU.

      - For each created segment:

        1. Replicate the TCP header and all preceding headers of the
           original packet.

        2. Set payload length fields in any headers to reflect the
           length of the segment.

        3. Set TCP sequence number to correctly reflect the offset of
           the TCP data in the stream.

        4. Recompute and set any checksums that either cover the payload
           of the packet or cover a header that was changed by setting a
           payload length.

   Following this general process, TSO can be extended to support TCP
   encapsulation in GUE.  For each segment the Ethernet, outer IP, UDP
   header, GUE header, inner IP header if tunneling, and TCP headers are
   replicated. Any packet length header fields need to be set properly
   (including the length in the outer UDP header), and checksums need to
   be set correctly (including the outer UDP checksum if being used).

   To facilitate TSO with GUE it is recommended that optional fields
   not contain values that must be updated on a per segment basis; for
   example, the GUE fields should not include checksums, lengths, or
   sequence numbers that refer to the payload. If the GUE header does
   not contain such fields then the TSO engine only needs to copy the
   bits in the GUE header when creating each segment and does not need
   to parse the GUE header.
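
   The generalized TSO process applied to GUE can be modeled in a few
   lines of Python. This is a simplified sketch with hypothetical
   names: the replicated headers are treated as an opaque byte string
   that is copied unchanged, only the outer UDP length and the TCP
   sequence number are tracked per segment, and recomputation of
   checksums and of the IP length fields is omitted.

```python
from dataclasses import dataclass
from typing import List

UDP_HDR_LEN = 8


@dataclass(frozen=True)
class Segment:
    copied_headers: bytes  # Ethernet through TCP headers, replicated as-is
    udp_length: int        # length field of the outer UDP header
    tcp_seq: int           # TCP sequence number of this segment
    payload: bytes


def tso_segments(copied_headers: bytes, after_udp_hdr_len: int,
                 tcp_seq: int, payload: bytes, mss: int) -> List[Segment]:
    """Split one large TCP-in-GUE packet into MSS sized segments.

    after_udp_hdr_len is the combined length of the GUE header, any
    inner IP header, and the TCP header.
    """
    segments = []
    for off in range(0, len(payload), mss):
        chunk = payload[off:off + mss]
        segments.append(Segment(
            copied_headers=copied_headers,
            udp_length=UDP_HDR_LEN + after_udp_hdr_len + len(chunk),
            tcp_seq=(tcp_seq + off) & 0xFFFFFFFF,
            payload=chunk,
        ))
    return segments
```

   Because the GUE header is part of the bytes copied verbatim, the
   sketch only works when the GUE optional fields carry no per segment
   values, which is the recommendation made above.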

A.4. Large Receive Offload

   Large Receive Offload (LRO) is a NIC feature where packets of a TCP
   connection are reassembled, or coalesced, in the NIC and delivered to
   the host as one large packet. This feature can reduce CPU utilization
   in the host.

   LRO requires significant protocol awareness to be implemented
   correctly and is difficult to generalize. Packets in the same flow
   need to be unambiguously identified. In the presence of tunnels or
   network virtualization, this may require more than a five-tuple match
   (for instance packets for flows in two different virtual networks may
   have identical five-tuples). Additionally, a NIC needs to perform
   validation over packets that are being coalesced, and needs to
   fabricate a single meaningful header from all the coalesced packets.

   The conservative approach to supporting LRO for GUE would be to
   assign packets to the same flow only if they have identical five-
   tuples and were encapsulated the same way. That is, the outer IP
   addresses, the outer UDP ports, GUE protocol, GUE flags and fields,
   and inner five-tuple are all identical.
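
   The conservative match can be expressed as equality over a compound
   key, as in the Python sketch below. The type and field names are
   hypothetical, and the GUE optional fields are compared as raw
   bytes.

```python
from typing import NamedTuple, Tuple

# (IP protocol, source address, destination address, source port, dest port)
FiveTuple = Tuple[int, str, str, int, int]


class LroKey(NamedTuple):
    outer_src: str
    outer_dst: str
    outer_sport: int
    outer_dport: int
    gue_proto: int
    gue_flags: int
    gue_fields: bytes  # raw optional fields of the GUE header
    inner: FiveTuple


def same_lro_flow(a: LroKey, b: LroKey) -> bool:
    """Packets may be coalesced only if every element of the key matches."""
    return a == b
```

   Two packets with identical inner five-tuples but different GUE
   fields, for instance because they belong to different virtual
   networks, compare unequal and are not coalesced.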

Appendix B: Privileged ports

   Using the source port to contain an inner flow identifier value
   disallows the security method of a receiver enforcing that the source
   port be a privileged port. Privileged ports are defined by some
   operating systems to restrict source port binding. Unix, for
   instance, considers port numbers less than 1024 to be privileged.

   Enforcing that packets are sent from a privileged port is widely
   considered an inadequate security mechanism and has been mostly
   deprecated. To approximate this behavior, an implementation could
   restrict a user from sending a packet destined to the GUE port
   without proper credentials.

Appendix C: Inner flow identifier as a route selector

   An encapsulator generating an inner flow identifier may modulate the
   value to perform a type of multipath source routing. Assuming that
   networking switches perform ECMP based on the flow hash, a sender can
   affect the path by altering the inner flow identifier.  For instance,
   a host may store a flow hash in its PCB for an inner flow, and may
   alter the value upon detecting that packets are traversing a lossy
   path. Changing the inner flow identifier for a flow should be subject
   to hysteresis (at most once every thirty seconds) to limit the number
   of out of order packets.
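
   The hysteresis described above can be sketched as follows. This is
   an illustration under stated assumptions: the class and method names
   are hypothetical, the identifier is assumed to be a 16-bit value,
   and loss detection is left to the caller, who also supplies the
   current time in seconds (for example from a monotonic clock).

```python
import random


class InnerFlowIdentifier:
    """Flow identifier that changes at most once per min_interval."""

    def __init__(self, value: int, min_interval: float = 30.0):
        self.value = value
        self.min_interval = min_interval
        self.last_change = None   # time of the most recent change, if any

    def maybe_change(self, now: float, path_is_lossy: bool,
                     rng=random) -> bool:
        """Pick a new identifier if the path is lossy and enough time
        has passed since the previous change. Returns True on change."""
        if not path_is_lossy:
            return False
        if (self.last_change is not None
                and now - self.last_change < self.min_interval):
            return False   # within the hysteresis window; keep the value
        new_value = self.value
        while new_value == self.value:
            new_value = rng.getrandbits(16)
        self.value = new_value
        self.last_change = now
        return True
```

   A host would keep one such object per inner flow (for instance in
   its PCB) and call maybe_change() whenever it evaluates path quality.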

Appendix D: Hardware protocol implementation considerations

   A low-level protocol such as GUE is likely to be of interest for
   support in high-speed network devices. Variable length header (VLH)
   protocols like GUE are often considered difficult to efficiently
   implement in hardware. In order to retain the important
   characteristics of an extensible and robust protocol, hardware
   vendors may practice "constrained flexibility". In this model, only
   certain combinations or protocol header parameterizations are
   implemented in the hardware fast path. Each such parameterization is
   fixed length so that the particular instance can be optimized as a
   fixed length protocol. In the case of GUE this constitutes specific
   combinations of GUE flags, fields, and next protocol. The selected
   combinations would naturally be the most common cases which form the
   "fast path", and other combinations are assumed to take the "slow
   path".

   Over time, the needs and requirements of the protocol may change,
   which may manifest as new parameterizations to be supported in the
   fast path. To allow this extensibility, a device practicing
   constrained flexibility should allow the fast path parameterizations
   to be programmable.
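
   A programmable fast path of this kind can be modeled as a small
   lookup table. The sketch below is illustrative only: the class name,
   the choice of (flags, next protocol) as the key, and the example
   values are assumptions rather than anything specified by this
   document.

```python
class FastPathTable:
    """Programmable set of fixed-length GUE parameterizations."""

    def __init__(self):
        # (flags, next_protocol) -> fixed header length in bytes
        self._entries = {}

    def program(self, flags: int, next_protocol: int,
                header_len: int) -> None:
        """Add or update a parameterization handled in the fast path."""
        self._entries[(flags, next_protocol)] = header_len

    def remove(self, flags: int, next_protocol: int) -> None:
        """Stop handling a parameterization in the fast path."""
        self._entries.pop((flags, next_protocol), None)

    def classify(self, flags: int, next_protocol: int):
        """Return ("fast", header_len) for a programmed combination,
        otherwise ("slow", None)."""
        header_len = self._entries.get((flags, next_protocol))
        if header_len is None:
            return ("slow", None)
        return ("fast", header_len)
```

   Because every programmed entry has a fixed header length, each one
   can be processed as if it were a fixed-length protocol, while any
   combination that has not been programmed falls back to the slow
   path.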

Authors' Addresses

   Tom Herbert
   Facebook
   1 Hacker Way
   Menlo Park, CA 94052
   US

   Email: tom@herbertland.com

   Lucy Yong
   Huawei USA
   5340 Legacy Dr.
   Plano, TX 75024
   US

   Email: lucy.yong@huawei.com

   Osama Zia
   Microsoft
   1 Microsoft Way
   Redmond, WA 98029
   US

   Email: osamaz@microsoft.com