Geneve: Generic Network Virtualization Encapsulation
Summary: Has 8 DISCUSSes. Needs 5 more YES or NO OBJECTION positions to pass.
Alissa Cooper Discuss
Exciting to see this work progressing. Section 3.5 (and Section 7): "Type (8 bits): Type indicating the format of the data contained in this option. Options are primarily designed to encourage future extensibility and innovation and so standardized forms of these options will be defined in a separate document." I'm a little confused about what is expected to happen with the option classes and types. Are all future option types in the 0x0000..0x00FF range expected to be specified in a single separate document? If not, that should be clarified. I also think there needs to be a normative requirement that such future specifications define all of the types associated with the option classes. In the registry defined in Section 7, I think the table needs a column for the document to reference for each option class definition. That way when option classes are defined in the 0x0000..0x00FF range, implementers and operators will be able to find the reference and understand the semantics of the types. For the vendor-specific options this can be optional, but still would be nice to list if such documentation exists.
Section 1: s/Current work/Work/ What is meant by "service based context for interposing advanced middleboxes?" (I think the verb tense is tripping me up -- are the middleboxes already there?) Section 1.2: "A transit device MAY be capable of understanding the Geneve packet format but does not originate or terminate Geneve packets." I don't think normative MAY is appropriate here. Section 2.1: s/the VXLAN spec/the VXLAN spec [RFC7348]/ Section 2.2: "Transit devices MAY be able to interpret the options" Normative MAY is not appropriate here. The normative requirement is captured in the last sentence of the paragraph. Section 4.6: "Conversely, when performing LRO, a NIC MAY assume that a binary comparison of the options (including unknown options) is sufficient to ensure equality" Normative MAY is not appropriate here.
Roman Danyliw Discuss
(1) The threat model assumed by geneve appears to be expressed in conflicting ways. Section 4.1 notes that RFC8085’s definition of “controlled environment” applies. However, - Section 6 notes “When crossing an untrusted link, such as the public Internet, …” - Section 6.1 notes “Geneve data traffic between tenant systems across such separated networks should be protected from threats when traversing public networks. Any Geneve overlay data leaving the data center network beyond the operator's security domain SHOULD be secured by encryption mechanisms such as IPsec or other VPN mechanisms to protect the communications between the NVEs when they are geographically separated over untrusted network links.” The advice provided in Section 6.x is sound. Nevertheless, it doesn’t appear to describe a “controlled environment”. (2) Section 6. Per “Compromised tunnel endpoints may also spoof identifiers in the tunnel header to gain access to networks owned by other tenants”, couldn’t compromised transit devices do the same? (3) Section 6.1. Similar to what is discussed in Section 6.2 (for integrity), please refer to the impact of a compromised node on confidentiality. For example (not verbatim) “A compromised network node or a transit device within a data center may passively monitor Geneve packet data between NVEs; or route traffic for further inspection.” (4) Section 6.1. Per “Due to the nature of multi-tenancy in such environments, a tenant system may expect data confidentiality to ensure its packet data is not tampered with (active attack) in transit or a target of unauthorized monitoring (passive attack).”, please provide additional precision on the confidentiality. It is only relative to other tenants, but not from the provider (who can engage in tampering and passive monitoring).
I support Ben Kaduk’s DISCUSS position. To reiterate part of his write-up, the role of the transit device which is only permitted to inspect the geneve traffic isn’t clear, especially if end-to-end security is applied. RFC7365 didn’t provide insight into this architectural element.
Benjamin Kaduk Discuss
This first point is a "discuss discuss" for which I'd like to get a sense of what the rest of the IESG feels. I've read the discussion at https://mailarchive.ietf.org/arch/msg/last-call/ywRKREnxWAlunHR7MSaTM4ScsDs but I'm left with a similar sense of uncertainty that Daniel has as to whether the question is fully resolved. Specifically, "the question" that I have in mind is to what extent the Geneve architecture includes support for middleboxes that inspect (but do not modify!) the Geneve header and inner payload, to what extent the Geneve architecture is intended to be applicable to scenarios where (end-to-end per-tunnel) underlay confidentiality protection is necessary, and whether those requirements are both strong enough to be deemed an internal inconsistency of requirements/applicability. "Interposing advanced middleboxes" and "service interposition" are conceived as possible uses for Geneve metadata in Sections 1 and 2.2 as a consideration for why structured tagging is needed on the data plane and not just the control plane, which to me suggests that such usage is considered a first-class use case for Geneve. Section 6.1.1 discusses encryption for traffic traversing untrusted links between geographically separated data centers (though perhaps in this case an encrypted tunnel would be used just for that untrusted transit and leaving the in-datacenter traffic visible to middleboxes), but Section 6.1 discusses cases where the tenant may expect the service provider to provide confidentiality as part of the service. Would this be above or below the Geneve encapsulation? Might some customers insist on one or the other? The consideration from Section 6.1 that the provider of the underlay and the provider of the overlay may not be the same could be taken to imply that the overlay provider itself wants (cryptographic) protection from the underlay provider. I don't have a clear picture of how these considerations interact. 
(I also note that, since DTLS is mentioned, DTLS 1.3 is going the way of TLS 1.3 and not defining any authentication-only ciphersuites, so if authentication-only service is desired, DTLS may not be the way of the future, leaving IPsec AH as the leading candidate.) Some other section-by-section discuss-level points follow, mostly self-contained/localized issues. Section 3.5.1 o Some options may be defined in such a way that the position in the option list is significant. Options MUST NOT be changed by transit devices. o An option SHOULD NOT be dependent upon any other option in the packet, i.e., options can be processed independently of one another. [...] As was already noted, I don't see how these two requirements are self-consistent. size. A particular option is specified to have either a fixed length, which is constant, or a variable length, which may change over time or for different use cases. This property is part of the definition of the option and conveyed by the 'Type'. For fixed This text is written as if this specification is going to specify further substructure for the "Type", with respect to certain types that have fixed length and others that may vary. Otherwise the property would be attached to the option value and not the type value, in my understanding. With the current way the registry is laid out it seems like we need to explicitly say that the entity allocating the option class value needs to specify the interpretation of the 'type' field when used with that option class. Section 4.3.1 2. If Geneve is used with zero UDP checksum over IPv6 then such tunnel endpoint implementation MUST meet all the requirements specified in section 4 of [RFC6936] and requirements 1 as specified in section 5 of [RFC6936]. This seems to implicitly be saying that the other numbered requirements in Section 5 of RFC 6936 can be ignored, which is updating the behavior of a standards-track document. 
We need to either be explicit about the update or justify why (the rest of) that applicability statement is not applicable here. If, as the paragraph following the enumerated list says, the requirements specified in RFC 6936 continue to apply in full, why do we need to call out a MUST-level requirement here? 4. The Geneve tunnel endpoint that encapsulates the tunnel MAY use different IPv6 source addresses for each Geneve tunnel that uses Zero UDP checksum mode in order to strengthen the decapsulator's check of the IPv6 source address (i.e the same IPv6 source address is not to be used with more than one IPv6 destination address, irrespective of whether that destination address is a unicast or multicast address). When this is not possible, it is RECOMMENDED to use each source address for as few Geneve tunnels that use zero UDP checksum as is feasible. This functionality is not usable without some mechanism to signal from encapsulator to decapsulator that it is in use. The requirement to check the source IPv6 address in addition to the destination IPv6 address, [...] I do not see this specified as a requirement, only a MAY-level suggestion. Section 4.6 o When performing LSO, a NIC MUST replicate the entire Geneve header and all options, including those unknown to the device, onto each resulting segment. However, a given option definition may override this rule and specify different behavior in supporting devices. [...] This second sentence makes the MUST in the first no longer a MUST.
Section 2.2.1 recipient. As new functionality becomes sufficiently well defined to add to tunnel endpoints, supporting options can be designed using ordering restrictions and other techniques to ease parsing. I'm having trouble parsing the second half of this sentence -- what does "supporting options" mean as a noun? Further, either tunnel endpoints or transit devices MAY use offload capabilities of NICs such as checksum offload to improve the performance of Geneve packet processing. The presence of a Geneve variable length header SHOULD NOT prevent the tunnel endpoints and transit devices from using such offload capabilities. I agree with the directorate reviewer that this implementation guidance is unenforceable as normative keywords.
Section 3.1, 3.2 If we're going to give concrete values for the IPv4 protocol/IPv6 NextHeader (17) and destination port (6081), shouldn't we also use the concrete value for Geneve protocol type (0x6558) that corresponds to the inner Ethernet frame? I'd also suggest some visual distinction that the "Variable Length Options" do in fact have variable length, perhaps using the '~' character in vertical lines. Similarly, the original Ethernet payload need not be 4-byte-aligned and the figure could make that more prominent. It's a little awkward to expand FCS on second usage, not first usage.
Section 3.4 The critical bit allows hardware implementations the flexibility to handle options processing in the hardware fastpath or in the exception (slow) path without the need to process all the options. For example, a critical option such as secure hash to provide Geneve header integrity check must be processed by tunnel endpoints and typically processed in the hardware fastpath. I think I'm failing to make a connection between some of these steps. How does having a critical bit let a header integrity check happen in the hardware fastpath while deferring other option processing to software?
Transit devices MUST maintain consistent forwarding behavior irrespective of the value of 'Opt Len', including ECMP link selection. These devices SHOULD be able to forward packets containing options without resorting to a slow path. There seem to be two broad aspects in play here. First, requiring insensitivity to "Opt Len" might be because the value would change as a packet traverses the network. I think this is forbidden by virtue of transit devices not being allowed to add/delete options, but please confirm. Second, this affects the ability of transit devices to look past the Geneve header to the inner Ethernet header and payload. Given the substantial discussion we've had in the broader IETF about IPv6 extension headers and the inability of hardware to examine such variable-length chains to get to the actual upper layer protocol (with the result that extension headers are largely unusable on substantial portions of the internet), it seems like we might conclude from this statement that either we expect transit devices to not inspect the upper-layer content or there's a significant chance that this requirement will be ignored (possibly just by capping the 'Opt Len' value that is supported), or both. What makes this setup different from IPv6 EH such that we expect hardware compliance and a usable deployment? This is particularly poignant given that we claim this to be a requirement on transit devices but allow (in Section 4.5) for endpoints to use profiles that have a restricted maximum length for the options. If such profiles are common, the incentive for transit devices to slip and use the lower maximum length increases.
Section 3.5 The high order bit of the option type indicates that this is a critical option. If the receiving tunnel endpoint does not recognize this option and this bit is set then the packet MUST be dropped. If the 'C' bit (critical bit) is set in any option then the 'C' bit in the Geneve base header MUST also be set.
Transit devices MUST NOT drop packets on the basis of this bit. The nit: since we mention the Geneve header, one might claim that "this bit" in "MUST NOT drop packets on the basis of this bit" is ambiguous (but since we said this before for the Geneve header one, I assume we're talking about the one in the Type field now). Section 4.4.1 It is strongly RECOMMENDED that Path MTU Discovery ([RFC1191], [RFC8201]) be used by setting the DF bit in the IP header when Geneve packets are transmitted over IPv4 (this is the default with IPv6). Is it the default or the only specified behavior for IPv6? Section 4.4.3 outside of the scope of this document. When physical multicast is in use, the 'C' bit in the Geneve header may be used with groups of devices with heterogeneous capabilities as each device can interpret only the options that are significant to it if they are not critical. Please double-check this sentence, particularly the "may be used". If the intent is, as written, to note that the packets with the 'C' bit set might take paths with heterogenous paths, I suggest being more explicit about the consequences that the traffic might only be delivered to some but not all endpoints. Section 6 untrusted boundaries. In addition, tunnel endpoints should only be operated in environments controlled by the service provider, such as the hypervisor itself rather than within a customer VM. Can you say a bit more about how this "should only be operated in environments controlled by the service provider" meshes with the note in Section 4.1 that "[i]t is intended for use in public or private data center environments" (specifically the "public data center" portion) and the note in Section 6.1 that the provider of the overlay may not be the same as the provider of the underlay? Section 6.1.1 traversing public networks. 
Any Geneve overlay data leaving the data center network beyond the operator's security domain SHOULD be secured by encryption mechanisms such as IPsec or other VPN mechanisms to protect the communications between the NVEs when they are geographically separated over untrusted network links. Since we use "mechanisms" in both the IPsec clause and the "other VPN" clause, the "encryption" does not automatically bind to both clauses from a grammatical perspective. Given that "VPN" is currently in use for both encrypted and non-encrypted schemes (much to my chagrin), please clarify that the other VPN mechanisms also need to provide cryptographic confidentiality protection. (Replacing "VPN mechanisms" with "VPN technologies" would probably suffice.)
Section 6.2 network. To prevent such attacks, an NVE MUST NOT propagate Geneve packets beyond the NVE to tenant systems and SHOULD employ packet We also care about not propagating Geneve packets from the tenant systems past the NVE, right? filtering mechanisms so as not to forward unauthorized traffic between TSs in different tenant networks. What does "TS" stand for, here?
Section 10.2 RFCs 1191, 2460 (er, 8200), 6040, and 8201 should be listed as normative references. [ETYPES] The IEEE Registration Authority, "IEEE 802 Numbers", 2013, <http://www.iana.org/assignments/ieee-802-numbers/ieee-802-numbers.xml>. Hmm, firefox claims the content of this resource is invalid XML, sigh.
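To make the 'Opt Len' concern from Section 3.4 concrete, here is a minimal sketch of what a device has to do before it can look at the inner frame. It assumes the base header layout from the draft (2-bit version, 6-bit Opt Len counted in 4-byte words, O and C flags, 16-bit protocol type, 24-bit VNI, 8 reserved bits); the function name and the returned dictionary keys are illustrative only.

```python
import struct

GENEVE_BASE_LEN = 8  # fixed part of the Geneve header, in bytes


def parse_geneve_base(udp_payload: bytes) -> dict:
    """Parse the fixed 8-byte Geneve base header at the start of a UDP payload."""
    if len(udp_payload) < GENEVE_BASE_LEN:
        raise ValueError("truncated Geneve header")
    first, flags, proto, vni_rsvd = struct.unpack(
        "!BBHI", udp_payload[:GENEVE_BASE_LEN]
    )
    opt_len_bytes = (first & 0x3F) * 4  # Opt Len is expressed in 4-byte words
    return {
        "version": first >> 6,
        "opt_len_bytes": opt_len_bytes,
        "oam": bool(flags & 0x80),       # 'O' bit
        "critical": bool(flags & 0x40),  # 'C' bit
        "protocol_type": proto,          # 0x6558 for an inner Ethernet frame
        "vni": vni_rsvd >> 8,            # upper 24 bits; low 8 bits are reserved
        # The inner frame starts only after the variable-length options.
        "payload_offset": GENEVE_BASE_LEN + opt_len_bytes,
    }
```

The point of the sketch is that `payload_offset` depends on a per-packet field, which is the same property that makes IPv6 extension header chains awkward for fixed-function hardware.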
Suresh Krishnan Discuss
* Section 3.3. This might be an easy DISCUSS to resolve. Since the specification requires the Destination port to be configurable, it is not clear to me how the "transit" devices will identify Geneve packets being sent to a non-default port (i.e. not 6081). Can you please clarify?
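As a rough illustration of the question, a transit device that classifies Geneve by destination port would presumably need the same port configuration as the endpoints. The sketch below assumes a hypothetical `configured_ports` set pushed to the device by some out-of-band means; nothing in the draft defines such a mechanism.

```python
DEFAULT_GENEVE_PORT = 6081  # IANA-assigned well-known port
IPPROTO_UDP = 17


def is_geneve(ip_proto: int, udp_dst_port: int,
              configured_ports: frozenset = frozenset({DEFAULT_GENEVE_PORT})) -> bool:
    """Classify a packet as Geneve purely from protocol and destination port.

    `configured_ports` is a hypothetical, operator-supplied set; a device that
    only knows the default will miss tunnels on any other port.
    """
    return ip_proto == IPPROTO_UDP and udp_dst_port in configured_ports
```

With only the default, `is_geneve(17, 7000)` is false even if the endpoints were configured to use port 7000, which is the gap the DISCUSS asks about.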
I support Ben's DISCUSS position and I would like to ensure that the concerns brought up regarding transit devices and UDP zero checksums are resolved. I would also like to ensure that RFC8200 is used as the reference for the IPv6 protocol as stated in Eric's DISCUSS. * Section 3.3 Have you considered the use of the flow label instead of the source port in the IPv6 tunnel case? I highly recommend looking at [RFC6438] for further details as it specifically addresses ECMP for IP-in-IPv6 tunneled traffic.
Mirja Kühlewind Discuss
Thanks for the really well written document that addresses all transport related questions well (and thanks to David for the early TSV review!). I only have one minor process point that needs to be addressed before publication: In line with RFC6335 the Assignee and Contact of the port entry should also be updated to IESG <email@example.com> and IETF Chair <firstname.lastname@example.org> respectively.
1) One small comment/question on the editorial note in sec 4.4.1: "It was discussed during TSVART early review if the level of requirement for maintaining tunnel MTU at the ingress has to be "MAY" or "SHOULD". The discussion concluded that it was appropriate to leave this as "MAY", considering the high level of state to be maintained." I would have preferred a SHOULD and I'm not sure I understand what state you are talking about...? 2) And one more small question on sec 4.4.1. in general: Is the assumption that all tunnel packets have the same options (and therefore same Geneve header length) at a certain ingress, or should the announced MTU always consider the maximum length that a certain ingress could produce? Would be good to clarify this in the document! 3) Section 6: "When crossing an untrusted link, such as the public Internet, IPsec [RFC4301] may be used to provide authentication and/or encryption of the IP packets formed as part of Geneve encapsulation." Should this maybe be a normative SHOULD and not a lower case "may"? 4) And one random thought on the protocol design (given we all love to design protocols :-) ): Was it considered to require to have critical options first in order to speed up processing?
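To put numbers on question 2, the following sketch computes how much room is left for the encapsulated frame under different assumptions about option length. It assumes an outer IPv4 header without options (20 bytes) or an outer IPv6 header without extension headers (40 bytes), an 8-byte UDP header, the 8-byte Geneve base header, and the 6-bit 'Opt Len' field counted in 4-byte words; the function name is illustrative.

```python
GENEVE_BASE_LEN = 8
UDP_HEADER_LEN = 8
MAX_OPTIONS_LEN = 63 * 4  # 6-bit Opt Len in 4-byte words gives 252 bytes


def max_inner_frame_len(underlay_mtu: int, ipv6: bool,
                        options_len: int = MAX_OPTIONS_LEN) -> int:
    """Largest encapsulated frame that fits in one unfragmented outer packet.

    Defaults to the worst case (maximum option length) unless the caller
    knows the ingress always emits a fixed set of options.
    """
    if options_len % 4 or not 0 <= options_len <= MAX_OPTIONS_LEN:
        raise ValueError("options_len must be a multiple of 4 in [0, 252]")
    outer_ip = 40 if ipv6 else 20  # assumes no IPv4 options / IPv6 extension headers
    return underlay_mtu - (outer_ip + UDP_HEADER_LEN + GENEVE_BASE_LEN + options_len)
```

On a 1500-byte underlay over IPv4 the difference between no options and the maximum is 252 bytes of inner frame, so which assumption the ingress announces is not a small detail.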
Barry Leiba Discuss
This will be trivial to address: — Section 1.2 — The NVO3 framework [RFC7365] defines many of the concepts commonly used in network virtualization. Indeed, and it seems a critical normative reference here. So why is it in the informative section?
I support Ben’s DISCUSS and comments. In addition: — Section 3.3 — In the description of the UDP Checksum, the first paragraph says the checksum MUST be set for v6, then the second paragraph contradicts that. You really should note when the MUST is specified that there are exceptions. — Section 3.5 — In the description of the Type field, I believe it confuses things to say that it’s 8 bits, and then to say that the first bit is not really part of the type, but has a special meaning. Why do you not show the C bit and Type field in the main diagram as it is shown in the mini-figure, describe the C bit separately, and define the Type field as 7 bits?
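The two readings of the Type field can be seen side by side in a short parsing sketch. It assumes the option header layout in the draft (16-bit option class, 8-bit type whose high-order bit is the C bit, 3 reserved bits, 5-bit length in 4-byte words excluding the option header); the key names are illustrative.

```python
import struct


def parse_option_header(data: bytes) -> dict:
    """Parse the 4-byte header of a single Geneve option."""
    if len(data) < 4:
        raise ValueError("truncated option header")
    opt_class, type_byte, last = struct.unpack("!HBB", data[:4])
    return {
        "option_class": opt_class,
        "type": type_byte,                    # 8-bit reading, as the draft defines it
        "critical": bool(type_byte & 0x80),   # high-order bit of the type
        "type_low7": type_byte & 0x7F,        # 7-bit reading suggested in the comment
        "data_len_bytes": (last & 0x1F) * 4,  # 3 reserved bits, then 5-bit length
    }
```

Under the 8-bit reading, types 0x01 and 0x81 are distinct values; under the 7-bit reading they are the same type with and without the critical flag, which is why the presentation matters for the registry.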
Éric Vyncke Discuss
Thank you for the work put into this document. It solves an interesting problem and the document is easy to read. I have one DISCUSS that is **trivial to fix** and some COMMENTs, feel free to ignore my COMMENTs even if I would appreciate your answers to those COMMENTs. Regards, -éric == DISCUSS == -- Section 3.3 -- Please use RFC 8200 the 'new' IPv6 standard rather than RFC 2460 ;-)
== COMMENTS == -- Generic -- Is it worth mentioning that when transporting an Ethernet frame neither the preamble nor the inter-frame gap are included? (AFAIR, IEEE considers those parts as integral part of the IEEE 802.3 frame) Is a length of 24 bits for the VNI enough? -- Section 1 -- In the list of protocols, rather than presenting the current list as comprehensive, I would suggest to clearly present this list as non-exhaustive. Is it worth mentioning the reasoning behind "one additional defining requirement is the need to carry system state along with the packet data" (besides common sense)? -- Section 4.4.1 -- It is unclear to me whether Geneve endpoints can fragment the Geneve UDP-encapsulated packet itself as the transit routers see only unfragmentable packets.
Magnus Westerlund Discuss
I want to discuss the implications of the source port usage and if that needs a bit more consideration of failure cases and ICMP. So Section 3.3 says: Source port: A source port selected by the originating tunnel endpoint. This source port SHOULD be the same for all packets belonging to a single encapsulated flow to prevent reordering due to the use of different paths. To encourage an even distribution of flows across multiple links, the source port SHOULD be calculated using a hash of the encapsulated packet headers using, for example, a traditional 5-tuple. Since the port represents a flow identifier rather than a true UDP connection, the entire 16-bit range MAY be used to maximize entropy. I think using the different source ports to enable flow hashing is a nice idea. However, I am a bit worried about the implications of using the full 16-bit range without caveats, specifically in cases where a network error or other failure to forward the Geneve encapsulated packet results in some form of return traffic towards the tunnel ingress, such as ICMP Packet Too Big messages or Port / Host unreachable. These messages need to be consumed by the Geneve tunneling endpoint to effect the right response to them. However, if the source port corresponds to any port where there exists a listener or bi-directional server on the tunnel ingress host, such as SSH, Echo etc., the ICMP messages may be consumed by the wrong entity that only filters on source port and not the destination port. I believe this issue may require at least an explicit consideration in the document. Otherwise thanks for thinking through many transport issues for tunnels.
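One way to picture the trade-off is a sketch of the hash-based port selection with an optional restriction of the output range. The restriction to 49152-65535 (the dynamic port range) is an assumption made here for illustration, not something the draft specifies, and SHA-256 stands in for whatever cheap hash a real data plane would use.

```python
import hashlib
import struct
from ipaddress import ip_address


def geneve_source_port(src_ip: str, dst_ip: str, proto: int,
                       sport: int, dport: int,
                       lo: int = 49152, hi: int = 65535) -> int:
    """Derive a stable outer UDP source port from the inner flow's 5-tuple.

    `lo`/`hi` bound the result; pass lo=0, hi=65535 for the full 16-bit range
    the draft permits. The default bounds are an illustrative assumption.
    """
    key = (ip_address(src_ip).packed + ip_address(dst_ip).packed
           + struct.pack("!BHH", proto, sport, dport))
    h = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return lo + h % (hi - lo + 1)
```

Restricting the range costs two bits of entropy (16384 values instead of 65536) but keeps the outer source port away from well-known service ports on the ingress host, which is the collision the comment above worries about.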
Alvaro Retana Yes
Martin Vigoureux Yes
Deborah Brungard No Objection
Warren Kumari No Objection
Adam Roach No Objection
Thanks for the work that went into consolidating network tunneling protocols into a single, unified design. I have one comment that I think is rather important to Geneve's success. In fact, I'm on the fence about whether this comment should be a DISCUSS, since I think the current design will render Geneve broadly unusable in a number of important use-cases.
> Dest port: IANA has assigned port 6081 as the fixed well-known
> destination port for Geneve. Although the well-known value should
> be used by default, it is RECOMMENDED that implementations make
> this configurable. The chosen port is used for identification of
> Geneve packets and MUST NOT be reversed for different ends of a
> connection as is done with TCP.
This behavior -- using 6081 as the destination in both directions -- has the unfortunate property of violating NAT and Firewall assumptions about the nature of UDP traffic (see RFC 4787 for a discussion of UDP behavior in NATs). For example, while RTP was originally specified to typically work in the way described here (using two unrelated unidirectional flows when a bidirectional flow was desired), all (or nearly all) modern implementations use a technique known as "symmetric RTP" (see RFC 4961), which uses port numbers in the same way as TCP does. I can't find any discussion of NAT traversal in this document. One might assume that such responsibility is delegated to the control plane, but it should be noted that this specific requirement is going to frustrate every NAT traversal technique that I'm aware of (save for the mostly undeployed NAT-PMP and similar approaches), regardless of how well-designed the control plane is. If the working group has already considered NAT/Firewall traversal and decided to use the specified design anyway, please add text laying out the rationale in this document.
If this point has not yet been discussed, I urge the working group to withdraw its request for publication and to carefully reconsider the implications of this specific normative requirement. (I take the point about the design being applicable to "controlled networks," but that doesn't necessarily imply the absence of a NAT or a non-NAT Firewall; and, as Roman notes in his DISCUSS, the applicability statement appears to be overstated anyway: if crossing public networks -- as this document clearly anticipates -- using IPv4, the presence of a CG-NAT device will become increasingly likely as time goes on.) ____  I searched mailarchive.ietf.org and found no such discussion, but did not search meeting minutes