Ballot for draft-ietf-tsvwg-datagram-plpmtud
Yes
No Objection
Note: This ballot was opened for revision 17 and is now closed.
{Yes} [nits] S1.1 * It's not clear to me that "stateful" in the phrase "stateful firewall" matters here. I imagine a "stateless" firewall could just as easily block incoming ICMP messages. S4.5 * The "[RFC8201]" didn't render as a clickable link. Source XML weirdness? Same in S4.6.1. S5. * First sentence of second paragraph seems like two sentences, one sentence with a parenthetical comment, or a run-on sentence. Also, I think "only be performed once" here means "only performed at one layer" rather than "only performed at one point in time"? S5.1.1 * The "section 3.1.1" link seems to be internal to the draft rather than linked into RFC 8085, as I would expect from the text. (this happens twice in this section) S5.3.3 * Is there any additional detail worth including here? It asserts that a PL sender is able to detect inconsistencies, but I wonder whether more guidance (or an example) might be helpful to implementors. S6.2.2 * s/is to be added// S6.2.3.4 * Maybe check the XML for the "[RFC4960]" reference, since it doesn't seem to have been converted into a link.
Kudos on a very approachable document. The thin air up in the ART layers had me fearing something that said "TSVWG" on it. Section 2: * The definition for EMTU_R appears to double up on itself. Section 3: * In bullet 2, "On request, a DPLPMTUD sender is REQUIRED to be able to transmit a packet ..." -- I read this as "If I ask you to do X, you MUST be able to do X", versus "you MUST do X". Was that the intent? * In bullet 3, there's reference to a "feedback method" that is REQUIRED, but this method is unspecified. Is that defined elsewhere, or is out of scope here? Section 4.1: Nits: * "... uses a probe packet carrying an application data ..." -- s/an// * "... this probe packet, could perform ..." -- s/,// * "This retransmited data block ..." -- typo: "retransmitted" Section 4.3: * PROBE_COUNT and MAX_PROBES are first used here, though they are not defined until later in the document in the DPLPMTUD section. Nit: * "... data is sent again, MAY choose ..." -- s/,// Section 4.4: Nit: * "... over clearing the DF-bit in the IPv4 header ..." -- s/-/ / Section 4.6.1: * "For example, by checking the value ..." -- Suggest: "For example, it could check the value..." Nit: * "... from a router or middlebox, performs ..." -- s/,// Section 5: Nit: * "... in a lower layer, DPLPMTUD SHOULD only ..." -- s/,/./ Section 5.1.1: Nit: * "... use an up-to-data PMTU once ..." -- I think you mean "up-to-date" Section 5.1.2: Nit: * "... from a MAX_PROBES valugreater than 1 because ..." -- s/valugreater/value greater/ Section 5.3.3: Nit: * "... inconsistent, when, for example, ..." -- remove the first comma Section 6.1.1: Nit: * "... from off-path insertion of data [RFC8085], suitable methods include ..." -- s/, s/. S/ Section 6.3.2: Nit: * "... packets of the required size, this sets the ..." -- either s/, t/. T/, or s/this/which/ Section 9: "Parallel forwarding paths SHOULD be considered." What is the specific action being recommended here? Nits: * "This protection if provided ..." -- s/if/is/ * "... design (see Section 1.1), this method therefore ..." -- I think "this" should begin a new sentence. * "An on-path attacker, able to create ..." -- remove comma * "This could occur when there are multiple paths are concurrently in use." -- s/there are//
Thank you for your updates in response to the SECDIR review by Stephen Farrell (and thanks for the review Stephen!) ** As previously mentioned, please consider the stability of the [I-D.ietf-quic-transport] when advancing this document ** Section 1. Per “ICMP messages can be filtered by middleboxes (including firewalls) [RFC4890]. A stateful firewall could be configured …”, couldn’t all firewalls (stateless and stateful) also do this? ** Editorial Nits: Section 4.1. Typo. s/retransmited/retransmitted/ Section 4.1. s/… which could need the PL to to use a smaller packet size…/…which could force the PL to use a smaller packet size …/ Section 5.1.2. Typo. s/valugreater/value greater/
Thank you for the work put into this document. The document is clear, easy to read and quite useful (perhaps mixing too many PL protocols though). Please find below nee non-blocking NIT. I hope that this helps to improve the document, Nice to read about the outcomes of the NEAT project Regards, -éric ==NIT == -- Section 2 -- Looking for details in " A Packet is the IP header plus the IP payload." is it worth mentioning the IPv6 extension headers or using "IP headers" (plural form) ?
I'm excited for this to go to RFC once the QUIC reference clears. Normative issue: Sec 3, item #9 "An update to the PLPMTU (or MPS) MUST NOT modify the congestion window". Would it not be OK to round down the congestion window to remain a multiple of the MPS? Probably not a big issue in practice, but I would hate to constrain an implementer in this way. I would suggest "MUST NOT increase". Nits: Sec 2. In the definition for EMTU_R, put "the largest datagram size that can be reassembled" in quotes, and delete "by EMTU_R (Effective MTU to receive)" Sec 4.1. delete word in brackets: "A PL that uses a probe packet carrying [an] application data..." Sec 4.6.2 I think this list of inequalities would be easier to read if it was ranked in order of increasing PL_PTB_SIZE. 5.1.1. s/up to data/up to date
[I'm reviewing the native HTML, so line breaks in quoted text are locally generated] We could perhaps benefit from more clarity in several places about whether a given size is measured at the IP layer or the PL (some locations mentioned in the per-section comments). Should RFC 8085 be cited as BCP 145? It would be nice to have a bit more detail about how to effectuate things like robustness to inconsistent paths, but I don't think the lack of such concrete guidance should hold up publication of this document. I mention in the per-section comments a few instances of apparent internal inconsistency; they don't quite rise to DISCUSS-level, but please take a look. Abstract This document updates RFC 4821 to specify the method for datagram PLs, and updates RFC 8085 as the method to use in place of RFC 4821 with UDP datagrams. [...] nit(?): the wording here is a bit odd; I think we want to say that it "updates RFC 8085 to refer to the method specified in this document instead of the method in RFC 4821 for use with UDP datagrams". Section 1.1 Classical PMTUD is subject to protocol failures. One failure arises when traffic using a packet size larger than the actual PMTU is black-holed (all datagrams sent with this size, or larger, are discarded). nit: if "this size" is "the actual PMTU", shouldn't this just be "strictly larger than this size", since packets of exactly that size will pass? When the router issuing the ICMP message drops a tunneled packet, the resulting ICMP message will be directed to the tunnel ingress. This tunnel endpoint is responsible for forwarding the ICMP message and also processing the quoted packet within the payload field to remove the effect of the tunnel, and return a correctly formatted ICMP message to the sender [I-D.ietf-intarea-tunnels]. Failure to do this prevents the PTB message reaching the original sender. Is it supposed to be implie that this is a little bit complicated and many existing tunnels don't do it properly? In this case, when a packet sent by the server encounters a problem after the ECMP router, then any resulting ICMP message also needs to be directed by the ECMP router towards the original sender. Similarly, is it implicit that it's complicated for the ECMP router to do this, it's not currently implemented well, etc.? When an ICMP message is generated by a router in a network segment that has inserted a header into a packet, the quoted packet could contain additional protocol header information that was not included in the original sent packet, and which the PL sender does not process or may not know how to process. This could disrupt the ability of the sender to validate this PTB message. Should/can the entity that added the extra header be inspecting ICMP to remove the header from the quoted packet in the reverse direction? (We could perhaps try to normalize the language between this point and the following on on NAT behavior.) Section 10.2 of [RFC4821] recommended a PLPMTUD probing method for the Stream Control Transport Protocol (SCTP). SCTP utilizes probe packets consisting of a minimal sized HEARTBEAT chunk bundled with a PAD chunk as defined in [RFC4820]. However, RFC 4821 did not provide a complete specification. The present document replaces this by providing a complete specification. nit(?): s/replaces this/replaces that description/? Section 2 Packetization Layer (PL): The PL is a layer of the network stack that places data into packets and performs transport protocol functions. Examples of a PL include: TCP, SCTP, SCTP over DTLS or QUIC. nit: Serial comma, please! PTB_SIZE: The PTB_SIZE is a value reported in a validated PTB message that indicates next hop link MTU of a router along the path nit: missing article ("the next hop link MTU"?) Section 3 A PTB message MUST NOT be used to increase the PLPMTU [RFC8201], but could trigger a probe to test for a larger PLPMTU. A PL_PTB_SIZE that is greater than that currently probed MUST be ignored. A valid PTB_SIZE is converted to a PL_PTB_SIZE before it is to be used in the DPLPMTUD state machine. Perhaps these last two sentences should be swapped, so we talk about how the PL_PTB_SIZE is computed before we talk about ignoring invalid values? The decision about when to send a probe packet does not need to be limited by the congestion controller. Does the congestion controller need to account for the size/bandwidth of the probe packets in its pacing mechanism, though? Path validation: It is RECOMMENDED that methods are robust to path changes that could have occurred since the path characteristics were last confirmed, and to the possibility of inconsistent path information being received. What would such robustness look like? (Why is it not REQUIRED?) Section 4.1 A PL that uses a probe packet carrying application data and needs protection from the loss of this probe packet could perform transport-layer retransmission/repair of the data block (e.g., by retransmission after loss is detected or by duplicating the data block in a datagram without the padding data). This retransmitted data block might possibly need to be sent using a smaller PLPMTU, which could need the PL to to use a smaller packet size to traverse the end-to-end path. (This could utilize endpoint network-layer or a PL that can re-segment the data block into multiple datagrams). nit: check the grammar around "utilize endpoint network-layer"? Section 4.2 Transport protocols can include end-to-end methods that detect and report reception of specific datagrams that they send (e.g., DCCP and SCTP provide keep-alive/heartbeat features). Does a packet-level ACK mechanism (e.g., like QUIC but not TCP) suffice? Section 4.3 When the method detects the current PLPMTU is not supported, DPLPMTUD sets a lower PLPMTU, and sets a lower MPS. The PL then confirms that the new PLPMTU can be successfully used across the path. A probe packet could need to have a size less than the size of the data block generated by the application. Why is "less than half the size" noteworthy? Section 4.4 Operational experience reveals that IP fragmentation can reduce the reliability of Internet communication [I-D.ietf-intarea-frag-fragile], which may reduce the success of retransmission. nit: if "success" is a binary yes/no, then this could be "reduce the probability of success of the retransmission". (I add "the" in "the retransmission" as well, since we refer to the specific retransmission after the initial loss and MPS change, not retransmission in general.) Section 4.6.1 A PL that receives a PTB message from a router or middlebox performs ICMP validation as specified in Section 5.2 of [RFC8085][RFC8201]. If we're going to have a section number, we need only one RFC reference! PTB messages that have been validated MAY be utilized by the DPLPMTUD algorithm, but MUST NOT be used directly to set the PLPMTU. The PL_PTB_SIZE is smaller than the PTB_SIZE because it is reduced by headers below the PL including any IP options or extensions added to the PL packet. nit: this transition is pretty abrupt. Section 4.6.2 Before using the size reported in the PTB message it must first be converted to a PL_PTB_SIZE. A set of checks are intended to provide protection from a router that reports an unexpected PTB_SIZE. The PL also needs to check that the indicated PL_PTB_SIZE is less than the size used by probe packets and at least the minimum size accepted. How do these "checks" relate to the "validation" that was discussed in the previous section? Section 5 DPLPMTUD SHOULD NOT be used by an upper PL or application if it is already used in a lower layer DPLPMTUD SHOULD only be performed once between a pair of endpoints. This seems to imply an expectation of a communications channel between PLs on the same endpoint. I don't see anything in this document discussing such a channel; is there anything (more) that's useful that we can say? Section 5.1.1 DPLPMTUD MAY inhibit sending probe packets when no application data has been sent since the previous probe packet. A PL preferring to use an up-to-date PMTU once user data is sent again, can choose to continue PMTU discovery for each path. However, this could result in sending additional packets. How is this "could" and not "will"/"would"? (Twice) The various timers could be implemented using a single timer Is this an incomplete thought? (There's no full stop, and it doesn't give much clarity for me.) Section 5.1.2 MIN_PLPMTU: The MIN_PLPMTU is the smallest allowed probe packet size. For IPv6, this value is 1280 bytes, as specified in [RFC8200]. For IPv4, the minimum value is 68 bytes. This feels like slightly curious naming, as the "PL" infix suggests that it is the size usable at the packetization layer, but the quoted constants include the IP headers. BASE_PLPMTU: The BASE_PLPMTU is a configured size expected to work for most paths. The size is equal to or larger than the MIN_PLPMTU and smaller than the MAX_PLPMTU. In the case of IPv6, this value is derived from the IPv6 minimum link MTU of 1280 bytes "Derived from" but using what derivation mechanism and/or arriving at what value? Section 5.1.3 PROBED_SIZE: The PROBED_SIZE is the size of the current probe packet. This is a tentative value for the PLPMTU, which is awaiting confirmation by an acknowledgment. Size at which layer? Section 5.1.4 The Base Phase confirms connectivity to the remote peer using packets of the BASE_PLPMTU. This phase is implicit for a connection-oriented PL (where it can be performed in a PL connection handshake). This seems slightly internally inconsistent, in that it seems to say that BASE_PLPMTU size packets are always used, but the implicit connection-handshake method seems unlikely to actually do so. (Also, if we always did so, then the following paragraph about confirming BASE_PLPMTU support would be redundant.) Search Complete: The Search Complete Phase is entered when the PLPMTU is supported across the network path It seems like this would be trivially true for (e.g.) the initial PLPMTU set to BASE_PLPMTU on the Base->Search transition, yet I don't expect we terminate Search immediately in that case! Presumably this is intended to say also that the PLPMTU corresponds to the actual PMTU as well? Section 5.2 nit(?): the diagram shows PTB: PLPTB_SIZE < BASE_PLPMTU as cause for a transition from BASE to ERROR, though of course this would only occur after the PTB is validated. Perhaps that doesn't need to be in the figure, though. Section 5.3.1 The PROBE_COUNT is initialized to zero when the first probe with a size greater than or equal to PLPMTUD is sent. A timer is used to trigger the sending of probe packets of size PROBED_SIZE, larger than the PLPMTU. This seems inconsistent about whether we use PROBE_COUNT when probing with the current PLPMTU size -- we initialize the count to zero when we first send one, but the timer only triggers sending of larger packets. Section 5.3.2 I confess I was expecting a more detailed specification of how to pick probe sizes, here. Section 6.1 To use common method for managing the PLPMTU has benefits, both in the ability to share state between different processes and opportunities to coordinate probing. Nit: singular/plural mismatch "common"/"method" Section 6.2 Section 10.2 of [RFC4821] specified a recommended PLPMTUD probing method for SCTP and Section 7.3 of [RFC4960] and recommended an endpoint apply the techniques in RFC4821 on a per-destination-address basis. nit: spurious "and" after "[RFC4960]". Section 6.9 of [RFC4960] describes dividing the user messages into data chunks sent by the PL when using SCTP. This notes that once an SCTP message has been sent, it cannot be re-segmented. [RFC4960] describes the method to retransmit data chunks when the MPS has reduced, and the use of IP fragmentation for this case. Should we say something about "such behavior is unchanged by this document"? Section 6.2.1.2 The HEARTBEAT chunk carries a Heartbeat Information parameter which includes, besides the information suggested in [RFC4960], the probe size, which is the size of the complete datagram. IIUC the Heartbeat Information layout is entirely at the sender's discretion, so the implementation will have to pick a way to convey the probe size in it; should we be more explicit about this? Probing starts directly after the PL handshake, before data is sent. Assuming this behavior (i.e., the PMTU is smaller than or equal to the interface MTU), this process will take several round trip time periods, dependent on the number of DPLPMTUD probes sent. The Heartbeat timer can be used to implement the PROBE_TIMER. (But sending data doesn't have to wait for DPLPMTUD to complete, right?) Section 6.2.2.2 Packet probing can be performed as specified in Section 6.2.1.2. The maximum payload is reduced by 8 bytes, which has to be considered when filling the PAD chunk. (8 bytes of UDP header?) Section 6.2.3.2 (Interesting that we don't have a note about payload reduction to be considered when filling the PAD chunk, as we did in Section 6.2.2.2.) Section 6.2.3.4 [RFC4960] does not specify a way to validate SCTP/DTLS ICMP message payload. ("and neither does this document"?) Section 6.3.2 QUIC provides an acknowledged PL, a sender can therefore enter the BASE state as soon as connectivity has been confirmed. [This seems entirely redundant with the content of Section 6.3.1.] Section 9 An on-path attacker able to create a PTB message could forge PTB messages that include a valid quoted IP packet. Such an attack could be used to drive down the PLPMTU. Are there cases where such an on-path attacker would not also be able to actually drop traffic larger than the forged PTB PMTU value? It is possible that the information about a path is not stable. This could be a result of forwarding across more than one path that has a different actual PMTU or a single path presents a varying PMTU. The design of a PLPMTUD implementation SHOULD consider how to mitigate the effects of varying path information. One possible mitigation is to provide robustness (see Section 5.4) in the method that avoids oscillation in the MPS. This document specifies some PLPMTUD mechanics; should we say what we do (and don't) say about this topic for those protocols we do cover? A node performing DPLPMTUD could experience conflicting information about the size of supported probe packets. This could occur when multiple paths are concurrently in use and these exhibit a different PMTU. If not considered, this could result in packets not being delivered (black holed) when the PLPMTU results in a packet larger than the smallest actual PMTU. It feels like this content has high overlap with the previous paragraph; is it reasonable to try consolidating them?
Firstly I would like to say thank you for writing this document. I have a general concern, not specifically with this document, but with the overall complexity of the solution. I.e. the algorithm that is described has to contend with a lot of unreliable information (e.g. are packets dropped because of congestion or some other routing failure, hard to know whether PTB messages can be relied upon etc), there are also several different ways that the probes can be sent etc. Not the subject of this document, but it feels like life could be significantly simpler if somehow there was a mechanism to getting a set of agree a set of defined minimum PLPMTU that network may support. E.g. perhaps 1280, 2000, 4000, 8000, 16000 octets. My assumption here is that the underlying core link MTUs would be higher to cope with header overheads. A few comments on specific sections of the document: 1) I would include PTB in the terminology section. 3. Features Required to Provide Datagram PLPMTUD Most of the the requirements in this section use RFC 2119 language, but a few don't: 7. Probing and congestion control: The decision about when to send a probe packet does not need to be limited by the congestion controller. When not controlled by the congestion controller, the interval between probe packets MUST be at least one RTT. If transmission of probe packets is limited by the congestion controller, this could result in transmission of probe packets being delayed or suspended during congestion. Rather than "does not need to be limited", would this be better stated as "SHOULD NOT be limited" or "MAY not be limited"? 11. Probing and flow control: Flow control at the PL concerns the end-to-end flow of data using the PL service. This does not apply to DPLPMTU when probe packets use a design that does not carry user data to the remote application. The second sentence could be stated something like: "Flow control MUST NOT apply to DPLPMTU when probe packets use a design that does not carry user data to the remote application" 5.1.2. Constants MIN_PLPMTU: The MIN_PLPMTU is the smallest allowed probe packet size. For IPv6, this value is 1280 bytes, as specified in [RFC8200]. For IPv4, the minimum value is 68 bytes. Does it really make sense to probe for a path MTU of 68 bytes. Would it make sense for applications to be able to control the MIN_PLPMTU that they find acceptable? E.g. if IPv6 has this at 1280 and QUIC has uses 1280, perhaps many applications/implementations would like to use 1280 as a reasonable lower bound rather than 68 bytes?