Geneve: Generic Network Virtualization Encapsulation
draft-ietf-nvo3-geneve-14

Summary: Has 8 DISCUSSes. Needs 5 more YES or NO OBJECTION positions to pass.

Alissa Cooper Discuss

Discuss (2019-12-03)
Exciting to see this work progressing.

Section 3.5 (and Section 7):

"Type (8 bits):  Type indicating the format of the data contained in
      this option.  Options are primarily designed to encourage future
      extensibility and innovation and so standardized forms of these
      options will be defined in a separate document."
    
I'm a little confused about what is expected to happen with the option classes and types. Are all future option types in the 0x0000..0x00FF range expected to be specified in a single separate document? If not, that should be clarified. I also think there needs to be a normative requirement that such future specifications define all of the types associated with the option classes.

In the registry defined in Section 7, I think the table needs a column for the document to reference for each option class definition. That way when option classes are defined in the 0x0000..0x00FF range, implementers and operators will be able to find the reference and understand the semantics of the types. For the vendor-specific options this can be optional, but still would be nice to list if such documentation exists.
Comment (2019-12-03)
Section 1: 

s/Current work/Work/

What is meant by "service based context for interposing advanced middleboxes"? (I think the verb tense is tripping me up -- are the middleboxes already there?)

Section 1.2:

"A transit device MAY be capable of understanding the Geneve packet
   format but does not originate or terminate Geneve packets."
   
I don't think normative MAY is appropriate here.

Section 2.1:

s/the VXLAN spec/the VXLAN spec [RFC7348]/

Section 2.2:

"Transit devices MAY be able to interpret the options"

Normative MAY is not appropriate here. The normative requirement is captured in the last sentence of the paragraph.

Section 4.6:

"Conversely, when performing LRO, a NIC MAY assume that a
      binary comparison of the options (including unknown options) is
      sufficient to ensure equality"
    
Normative MAY is not appropriate here.

Roman Danyliw Discuss

Discuss (2019-12-04)
(1) The threat model assumed by Geneve appears to be expressed in conflicting ways.  Section 4.1 notes that RFC8085’s definition of “controlled environment” applies.  However, 

- Section 6 notes “When crossing an untrusted link, such as the public Internet, …”

- Section 6.1 notes “Geneve data traffic between tenant systems across such separated networks should be protected from threats when traversing public networks. Any Geneve overlay data leaving the data center network beyond the operator's security domain SHOULD be secured by encryption mechanisms such as IPsec or other VPN mechanisms to protect the communications between the NVEs when they are geographically separated over untrusted network links.”  

The advice provided in Section 6.x is sound.  Nevertheless, it doesn’t appear to describe a “controlled environment”.

(2) Section 6.  Per “Compromised tunnel endpoints may also spoof identifiers in the tunnel header to gain access to networks owned by other tenants”, couldn’t compromised transit devices do the same?

(3) Section 6.1.  Similar to what is discussed in Section 6.2 (for integrity), please refer to the impact of a compromised node on confidentiality.  For example (not verbatim) “A compromised network node or a transit device within a data center may passively monitor Geneve packet data between NVEs; or route traffic for further inspection.”

(4) Section 6.1.  Per “Due to the nature of multi-tenancy in such environments, a tenant system may expect data confidentiality to ensure its packet data is not tampered with (active attack) in transit or a target of unauthorized monitoring (passive attack).”, please provide additional precision on the confidentiality. It is only relative to other tenants, but not from the provider (who can engage in tampering and passive monitoring).
Comment (2019-12-04)
I support Ben Kaduk’s DISCUSS position.  To reiterate part of his write-up, the role of the transit device, which is only permitted to inspect the Geneve traffic, isn’t clear, especially if end-to-end security is applied.  RFC7365 didn’t provide insight into this architectural element.

Benjamin Kaduk Discuss

Discuss (2019-12-04)
This first point is a "discuss discuss" for which I'd like to get a
sense of what the rest of the IESG feels.  I've read the discussion at
https://mailarchive.ietf.org/arch/msg/last-call/ywRKREnxWAlunHR7MSaTM4ScsDs
but I'm left with a similar sense of uncertainty that Daniel has as to
whether the question is fully resolved.  Specifically, "the question"
that I have in mind is to what extent the Geneve architecture includes
support for middleboxes that inspect (but do not modify!) the Geneve
header and inner payload, to what extent the Geneve architecture is
intended to be applicable to scenarios where (end-to-end per-tunnel)
underlay confidentiality protection is necessary, and whether those
requirements are both strong enough to be deemed an internal
inconsistency of requirements/applicability.  "Interposing advanced
middleboxes" and "service interposition" are conceived as possible uses
for Geneve metadata in Sections 1 and 2.2 as a consideration for why
structured tagging is needed on the data plane and not just the control
plane, which to me suggests that such usage is considered a first-class
use case for Geneve.  Section 6.1.1 discusses encryption for traffic
traversing untrusted links between geographically separated data
centers (though perhaps in this case an encrypted tunnel would be used
just for that untrusted transit and leaving the in-datacenter traffic
visible to middleboxes), but Section 6.1 discusses cases where the tenant
may expect the service provider to provide confidentiality as part of
the service.  Would this be above or below the Geneve encapsulation?
Might some customers insist on one or the other?  The consideration from
Section 6.1 that the provider of the underlay and the provider of the
overlay may not be the same could be taken to imply that the overlay
provider itself wants (cryptographic) protection from the underlay
provider.  I don't have a clear picture of how these considerations
interact.  (I also note that, since DTLS is mentioned, DTLS 1.3 is going
the way of TLS 1.3 and not defining any authentication-only
ciphersuites, so if authentication-only service is desired, DTLS may not
be the way of the future, leaving IPsec AH as the leading candidate.)

Some other section-by-section discuss-level points follow, mostly
self-contained/localized issues.

Section 3.5.1

   o  Some options may be defined in such a way that the position in the
      option list is significant.  Options MUST NOT be changed by
      transit devices.

   o  An option SHOULD NOT be dependent upon any other option in the
      packet, i.e., options can be processed independently of one
      another.  [...]

As was already noted, I don't see how these two requirements are
self-consistent.

   size.  A particular option is specified to have either a fixed
   length, which is constant, or a variable length, which may change
   over time or for different use cases.  This property is part of the
   definition of the option and conveyed by the 'Type'.  For fixed

This text is written as if this specification is going to specify
further substructure for the "Type", with respect to certain types that
have fixed length and others that may vary.  Otherwise the property
would be attached to the option value and not the type value, in my
understanding.  With the current way the registry is laid out it seems
like we need to explicitly say that the entity allocating the option
class value needs to specify the interpretation of the 'type' field when
used with that option class.

Section 4.3.1

   2.  If Geneve is used with zero UDP checksum over IPv6 then such
       tunnel endpoint implementation MUST meet all the requirements
       specified in section 4 of [RFC6936] and requirements 1 as
       specified in section 5 of [RFC6936].

This seems to implicitly be saying that the other numbered requirements
in Section 5 of RFC 6936 can be ignored, which is updating the behavior
of a standards-track document.  We need to either be explicit about the
update or justify why (the rest of) that applicability statement is not
applicable here.  If, as the paragraph following the enumerated list
says, the requirements specified in RFC 6936 continue to apply in full,
why do we need to call out a MUST-level requirement here?

   4.  The Geneve tunnel endpoint that encapsulates the tunnel MAY use
       different IPv6 source addresses for each Geneve tunnel that uses
       Zero UDP checksum mode in order to strengthen the decapsulator's
       check of the IPv6 source address (i.e the same IPv6 source
       address is not to be used with more than one IPv6 destination
       address, irrespective of whether that destination address is a
       unicast or multicast address).  When this is not possible, it is
       RECOMMENDED to use each source address for as few Geneve tunnels
       that use zero UDP checksum as is feasible.

This functionality is not usable without some mechanism to signal from
encapsulator to decapsulator that it is in use.

   The requirement to check the source IPv6 address in addition to the
   destination IPv6 address, [...]

I do not see this specified as a requirement, only a MAY-level
suggestion.

Section 4.6

   o  When performing LSO, a NIC MUST replicate the entire Geneve header
      and all options, including those unknown to the device, onto each
      resulting segment.  However, a given option definition may
      override this rule and specify different behavior in supporting
      devices.  [...]

This second sentence makes the MUST in the first no longer a MUST.
Comment (2019-12-04)
Section 2.2.1

   recipient.  As new functionality becomes sufficiently well defined to
   add to tunnel endpoints, supporting options can be designed using
   ordering restrictions and other techniques to ease parsing.

I'm having trouble parsing the second half of this sentence -- what does
"supporting options" mean as a noun?

   Further, either tunnel endpoints or transit devices MAY use offload
   capabilities of NICs such as checksum offload to improve the
   performance of Geneve packet processing.  The presence of a Geneve
   variable length header SHOULD NOT prevent the tunnel endpoints and
   transit devices from using such offload capabilities.

I agree with the directorate reviewer that this implementation guidance
is unenforceable as normative keywords.

Section 3.1, 3.2

If we're going to give concrete values for the IPv4 protocol/IPv6
NextHeader (17) and destination port (6081), shouldn't we also use the
concrete value for Geneve protocol type (0x6558) that corresponds to the
inner Ethernet frame?

I'd also suggest some visual distinction that the "Variable Length
Options" do in fact have variable length, perhaps using the '~'
character in vertical lines.
Similarly, the original Ethernet payload need not be 4-byte-aligned and
the figure could make that more prominent.

It's a little awkward to expand FCS on second usage, not first usage.

Section 3.4

      The critical bit allows hardware implementations the flexibility
      to handle options processing in the hardware fastpath or in the
      exception (slow) path without the need to process all the options.
      For example, a critical option such as secure hash to provide
      Geneve header integrity check must be processed by tunnel
      endpoints and typically processed in the hardware fastpath.

I think I'm failing to make a connection between some of these steps.
How does having a critical bit let a header integrity check happen in
the hardware fastpath while deferring other option processing to
software?

   Transit devices MUST maintain consistent forwarding behavior
   irrespective of the value of 'Opt Len', including ECMP link
   selection.  These devices SHOULD be able to forward packets
   containing options without resorting to a slow path.

There seem to be two broad aspects in play here.  First, requiring
insensitivity to "Opt Len" might be because the value would change as a
packet traverses the network.  I think this is forbidden by virtue of
transit devices not being allowed to add/delete options, but please
confirm.  Second, this affects the ability of transit devices to look
past the Geneve header to the inner Ethernet header and payload.  Given
the substantial discussion we've had in the broader IETF about IPv6
extension headers and the inability of hardware to examine such
variable-length chains to get to the actual upper layer protocol (with
the result that extension headers are largely unusable on substantial
portions of the Internet), it seems like we might conclude from this
statement that either we expect transit devices to not inspect the
upper-layer content or there's a significant chance that this
requirement will be ignored (possibly just by capping the 'Opt Len'
value that is supported), or both.  What makes this setup different from
IPv6 EH such that we expect hardware compliance and a usable deployment?
This is particularly poignant given that we claim this to be a
requirement on transit devices but allow (in Section 4.5) for endpoints
to use profiles that have a restricted maximum length for the options.
If such profiles are common, the incentive for transit devices to slip
and use the lower maximum length increases.
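
To make the 'Opt Len' concern concrete, here is a minimal sketch of what a device has to do before it can look at the inner frame. It assumes the fixed 8-byte header layout from Section 3.4 of the draft (Ver, Opt Len, O, C, Rsvd, Protocol Type, VNI, Reserved); the function and key names are illustrative and do not come from the draft.

```python
import struct

GENEVE_BASE_LEN = 8  # fixed part of the Geneve header, in bytes


def parse_geneve_base(header: bytes) -> dict:
    """Decode the fixed Geneve header and locate the inner payload."""
    if len(header) < GENEVE_BASE_LEN:
        raise ValueError("need at least 8 bytes of Geneve header")
    # Byte 0: Ver (2 bits) + Opt Len (6 bits); byte 1: O, C, 6 reserved bits;
    # bytes 2-3: Protocol Type; bytes 4-7: VNI (24 bits) + Reserved (8 bits).
    first, flags, proto, vni_rsvd = struct.unpack("!BBHI", header[:GENEVE_BASE_LEN])
    opt_len_bytes = (first & 0x3F) * 4  # Opt Len counts 4-byte multiples
    return {
        "version": first >> 6,
        "opt_len_bytes": opt_len_bytes,
        "oam": bool(flags & 0x80),
        "critical": bool(flags & 0x40),
        "protocol_type": proto,
        "vni": vni_rsvd >> 8,
        # The inner frame starts only after the variable-length options.
        "payload_offset": GENEVE_BASE_LEN + opt_len_bytes,
    }
```

The payload offset ranges from 8 to 260 bytes depending on 'Opt Len', which is the variable-depth lookup that the IPv6 extension header comparison is about.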

Section 3.5

      The high order bit of the option type indicates that this is a
      critical option.  If the receiving tunnel endpoint does not
      recognize this option and this bit is set then the packet MUST be
      dropped.  If the 'C' bit (critical bit) is set in any option then
      the 'C' bit in the Geneve base header MUST also be set.  Transit
      devices MUST NOT drop packets on the basis of this bit.  The

nit: since we mention the Geneve header, one might claim that "this bit"
in "MUST NOT drop packets on the basis of this bit" is ambiguous (but
since we said this before for the Geneve header one, I assume we're
talking about the one in the Type field now).

Section 4.4.1

   It is strongly RECOMMENDED that Path MTU Discovery ([RFC1191],
   [RFC8201]) be used by setting the DF bit in the IP header when Geneve
   packets are transmitted over IPv4 (this is the default with IPv6).

Is it the default or the only specified behavior for IPv6?

Section 4.4.3

   outside of the scope of this document.  When physical multicast is in
   use, the 'C' bit in the Geneve header may be used with groups of
   devices with heterogeneous capabilities as each device can interpret
   only the options that are significant to it if they are not critical.

Please double-check this sentence, particularly the "may be used".  If
the intent is, as written, to note that the packets with the 'C' bit set
might take paths through devices with heterogeneous capabilities, I suggest being more explicit
about the consequences that the traffic might only be delivered to some
but not all endpoints.

Section 6

   untrusted boundaries.  In addition, tunnel endpoints should only be
   operated in environments controlled by the service provider, such as
   the hypervisor itself rather than within a customer VM.

Can you say a bit more about how this "should only be operated in
environments controlled by the service provider" meshes with the note in
Section 4.1 that "[i]t is intended for use in public or private data
center environments" (specifically the "public data center" portion) and
the note in Section 6.1 that the provider of the overlay may not be the
same as the provider of the underlay?

Section 6.1.1

   traversing public networks.  Any Geneve overlay data leaving the data
   center network beyond the operator's security domain SHOULD be
   secured by encryption mechanisms such as IPsec or other VPN
   mechanisms to protect the communications between the NVEs when they
   are geographically separated over untrusted network links.

Since we use "mechanisms" in both the IPsec clause and the "other VPN"
clause, the "encryption" does not automatically bind to both clauses
from a grammatical perspective.  Given that "VPN" is currently in use
for both encrypted and non-encrypted schemes (much to my chagrin),
please clarify that the other VPN mechanisms also need to provide
cryptographic confidentiality protection.  (Replacing "VPN mechanisms"
with "VPN technologies" would probably suffice.)

Section 6.2

   network.  To prevent such attacks, an NVE MUST NOT propagate Geneve
   packets beyond the NVE to tenant systems and SHOULD employ packet

We also care about not propagating Geneve packets from the tenant
systems past the NVE, right?

   filtering mechanisms so as not to forward unauthorized traffic
   between TSs in different tenant networks.

What does "TS" stand for, here?

Section 10.2

RFCs 1191, 2460 (er, 8200), 6040, and 8201 should be listed as normative
references.

   [ETYPES]   The IEEE Registration Authority, "IEEE 802 Numbers", 2013,
              <http://www.iana.org/assignments/ieee-802-numbers/ieee-
              802-numbers.xml>.

Hmm, Firefox claims the content of this resource is invalid XML, sigh.

Suresh Krishnan Discuss

Discuss (2019-12-05)
* Section 3.3.

This might be an easy DISCUSS to resolve. Since the specification requires the Destination port to be configurable, it is not clear to me how the "transit" devices will identify Geneve packets being sent to a non-default port (i.e. not 6081). Can you please clarify?
Comment (2019-12-05)
I support Ben's DISCUSS position and I would like to ensure that the concerns brought up regarding transit devices and UDP zero checksums are resolved. I would also like to ensure that RFC8200 is used as the reference for the IPv6 protocol as stated in Eric's DISCUSS.

* Section 3.3

Have you considered the use of the flow label instead of the source port in the IPv6 tunnel case? I highly recommend looking at [RFC6438] for further details, as it specifically addresses ECMP for IP-in-IPv6 tunneled traffic.

Mirja Kühlewind Discuss

Discuss (2019-12-04)
Thanks for the really well written document that addresses all transport-related questions well (and thanks to David for the early TSV review!). I only have one minor process point that needs to be addressed before publication:

In line with RFC6335, the Assignee and Contact of the port entry should also be updated to IESG <iesg@ietf.org> and IETF Chair <chair@ietf.org> respectively.
Comment (2019-12-04)
1) One small comment/question on the editorial note in sec 4.4.1: 
"It was discussed during TSVART early review if the level of
   requirement for maintaining tunnel MTU at the ingress has to be "MAY"
   or "SHOULD".  The discussion concluded that it was appropriate to
   leave this as "MAY", considering the high level of state to be
   maintained."
I would have preferred a SHOULD and I'm not sure I understand what state you are talking about...?

2) And one more small question on sec 4.4.1. in general:
Is the assumption that all tunnel packets have the same options (and therefore the same Geneve header length) at a certain ingress, or should the announced MTU always consider the maximum length that a certain ingress could produce? It would be good to clarify this in the document!
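
The difference between the two readings can be put in numbers. The sketch below assumes outer headers without IP options or extension headers and uses the 6-bit 'Opt Len' field (4-byte units, so at most 252 bytes of options); the constant and function names are illustrative only.

```python
OUTER_IPV4 = 20              # IPv4 header without IP options
OUTER_IPV6 = 40              # fixed IPv6 header, no extension headers
UDP = 8
GENEVE_BASE = 8              # fixed Geneve header
MAX_GENEVE_OPTIONS = 63 * 4  # 6-bit Opt Len in 4-byte units -> 252 bytes


def tunnel_mtu(underlay_mtu: int, options_len: int, ipv6: bool = False) -> int:
    """Largest inner frame (inner Ethernet header included) that fits unfragmented."""
    if options_len % 4 or not 0 <= options_len <= MAX_GENEVE_OPTIONS:
        raise ValueError("options length must be a multiple of 4 in 0..252")
    outer_ip = OUTER_IPV6 if ipv6 else OUTER_IPV4
    return underlay_mtu - outer_ip - UDP - GENEVE_BASE - options_len


# Ingress that never sends options vs. one that budgets for the worst case.
print(tunnel_mtu(1500, 0))                   # 1464
print(tunnel_mtu(1500, MAX_GENEVE_OPTIONS))  # 1212
```

On a 1500-byte IPv4 underlay the two assumptions differ by 252 bytes, so which one an ingress announces matters.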

3) Section 6:
"When crossing an untrusted link, such as the public Internet, IPsec
   [RFC4301] may be used to provide authentication and/or encryption of
   the IP packets formed as part of Geneve encapsulation."
Should this maybe be a normative SHOULD and not a lower case "may"?

4) And one random thought on the protocol design (given we all love to design protocols :-) ): Was it considered to require critical options to come first in order to speed up processing?

Barry Leiba Discuss

Discuss (2019-12-04)
This will be trivial to address:

— Section 1.2 —

   The NVO3 framework [RFC7365] defines many of the concepts commonly
   used in network virtualization.

Indeed, and it seems a critical normative reference here.  So why is it in the informative section?
Comment (2019-12-04)
I support Ben’s DISCUSS and comments.  In addition:

— Section 3.3 —
In the description of the UDP Checksum, the first paragraph says the checksum MUST be set for v6, then the second paragraph contradicts that.  You really should note when the MUST is specified that there are exceptions.

— Section 3.5 —
In the description of the Type field, I believe it confuses things to say that it’s 8 bits, and then to say that the first bit is not really part of the type, but has a special meaning.  Why do you not show the C bit and Type field in the main diagram as it is shown in the mini-figure, describe the C bit separately, and define the Type field as 7 bits?
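
The two readings of the Type field can be compared in a short sketch. It assumes the 4-byte option header described in Section 3.5 of the draft (16-bit Option Class, 8-bit Type whose high-order bit is the 'C' bit, 3 reserved bits, 5-bit Length in 4-byte multiples); the function and key names are illustrative.

```python
import struct


def parse_option_header(data: bytes) -> dict:
    """Decode one Geneve option header, exposing both views of the Type byte."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes of option header")
    opt_class, type_byte, rsvd_len = struct.unpack("!HBB", data[:4])
    return {
        "option_class": opt_class,
        "critical": bool(type_byte & 0x80),  # high-order bit of Type
        "type_8bit": type_byte,              # Type as the draft describes it
        "type_7bit": type_byte & 0x7F,       # Type if the C bit were split out
        "data_len_bytes": (rsvd_len & 0x1F) * 4,
    }
```

Under the 8-bit reading, 0x01 and 0x81 are two distinct types; under the 7-bit reading they are the same type with and without the critical flag.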

Éric Vyncke Discuss

Discuss (2019-12-04)
Thank you for the work put into this document. It solves an interesting problem and the document is easy to read.

I have one DISCUSS that is **trivial to fix** and some COMMENTs. Feel free to ignore my COMMENTs, although I would appreciate your answers to them.

Regards,

-éric

== DISCUSS ==

-- Section 3.3 --
Please use RFC 8200 the 'new' IPv6 standard rather than RFC 2460 ;-)
Comment (2019-12-04)
== COMMENTS ==

-- Generic --
Is it worth mentioning that when transporting an Ethernet frame, neither the preamble nor the inter-frame gap is included? (AFAIR, IEEE considers those an integral part of the IEEE 802.3 frame.)

Is a length of 24 bits for the VNI enough?

-- Section 1 --
In the list of protocols, rather than presenting the current list as comprehensive, I would suggest clearly presenting this list as non-exhaustive.

Is it worth mentioning the reasoning behind "one additional defining requirement is the need to carry system state along with the packet data" (besides common sense)?

-- Section 4.4.1 --
It is unclear to me whether Geneve endpoints can fragment the Geneve UDP-encapsulated packet itself as the transit routers see only unfragmentable packets.

Magnus Westerlund Discuss

Discuss (2019-12-05)
I want to discuss the implications of the source port usage and if that needs a bit more consideration of failure cases and ICMP. So Section 3.3 says:

   Source port:  A source port selected by the originating tunnel
      endpoint.  This source port SHOULD be the same for all packets
      belonging to a single encapsulated flow to prevent reordering due
      to the use of different paths.  To encourage an even distribution
      of flows across multiple links, the source port SHOULD be
      calculated using a hash of the encapsulated packet headers using,
      for example, a traditional 5-tuple.  Since the port represents a
      flow identifier rather than a true UDP connection, the entire
      16-bit range MAY be used to maximize entropy.

I think using different source ports to enable flow hashing is a nice idea. However, I am a bit worried about the implications of using the full 16-bit range without caveats, specifically in cases where a network error or other failure to forward the Geneve-encapsulated packet results in any form of return traffic towards the tunnel ingress, such as ICMP Packet Too Big messages or Port / Host unreachable. These messages need to be consumed by the Geneve tunneling endpoint to effect the right response to them. However, if the source port corresponds to any port where there exists a listener or bi-directional server on the tunnel ingress host, such as SSH, Echo, etc., the ICMP messages may be consumed by the wrong entity, one that only filters on source port and not the destination port.

I believe this issue may require at least an explicit consideration in the document.
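
One possible caveat, sketched below, is to confine the hashed source port to the dynamic range (49152-65535 per RFC 6335) so that it never coincides with a well-known service port on the ingress host. This is an illustrative assumption, not something the draft specifies; the function name and the use of SHA-256 as the hash are stand-ins chosen for a self-contained example.

```python
import hashlib

DYNAMIC_PORT_MIN = 49152  # start of the dynamic/private range (RFC 6335)
DYNAMIC_PORT_MAX = 65535


def geneve_source_port(src_ip: str, dst_ip: str, proto: int,
                       src_port: int, dst_port: int,
                       restrict: bool = True) -> int:
    """Derive a stable outer UDP source port from the inner 5-tuple."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    # SHA-256 is only a stand-in; a real data plane would use a cheaper hash.
    h = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    if restrict:
        span = DYNAMIC_PORT_MAX - DYNAMIC_PORT_MIN + 1  # 16384 values
        return DYNAMIC_PORT_MIN + h % span
    return h % 65536  # full 16-bit range, as the draft currently permits
```

The restricted variant keeps all packets of a flow on one port, as the draft asks, at the cost of two bits of entropy (16384 instead of 65536 possible values).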

Otherwise thanks for thinking through many transport issues for tunnels.

Alvaro Retana Yes

Martin Vigoureux Yes

Deborah Brungard No Objection

Warren Kumari No Objection

Adam Roach No Objection

Comment (2019-12-04)
Thanks for the work that went into consolidating network tunneling protocols
into a single, unified design. I have one comment that I think is rather
important to Geneve's success.

In fact, I'm on the fence about whether this comment should be a DISCUSS, since
I think the current design will render Geneve broadly unusable in a number of
important use-cases.

>  Dest port:  IANA has assigned port 6081 as the fixed well-known
>     destination port for Geneve.  Although the well-known value should
>     be used by default, it is RECOMMENDED that implementations make
>     this configurable.  The chosen port is used for identification of
>     Geneve packets and MUST NOT be reversed for different ends of a
>     connection as is done with TCP.

This behavior -- using 6081 as the destination in both directions -- has the
unfortunate property of violating NAT and Firewall assumptions about the
nature of UDP traffic (see RFC 4787 for a discussion of UDP behavior in NATs).
For example, while RTP was originally specified to typically work in the way
described here (using two unrelated unidirectional flows when a bidirectional
flow was desired), all (or nearly all) modern implementations use a technique
known as "symmetric RTP" (see RFC 4961), which uses port numbers in the same
way as TCP does.

I can't find any discussion of NAT traversal in this document. One might
assume that such responsibility is delegated to the control plane, but it
should be noted that this specific requirement is going to frustrate every NAT
traversal technique that I'm aware of (save for the mostly undeployed NAT-PMP
and similar approaches), regardless of how well-designed the control plane is.

If the working group has already considered NAT/Firewall traversal and decided
to use the specified design anyway [1], please add text laying out the
rationale in this document. If this point has not yet been discussed, I urge
the working group to withdraw its request for publication and to carefully
reconsider the implications of this specific normative requirement.

(I take the point about the design being applicable to "controlled networks,"
but that doesn't necessarily imply the absence of a NAT or a non-NAT Firewall;
and, as Roman notes in his DISCUSS, the applicability statement appears to be
overstated anyway: if crossing public networks -- as this document clearly
anticipates -- using IPv4, the presence of a CG-NAT device will become
increasingly likely as time goes on.)

____
[1] I searched mailarchive.ietf.org and found no such discussion, but did
    not search meeting minutes

Ignas Bagdonas No Record

Alexey Melnikov No Record