NFS Version 4.1 Update for Multi-Server Namespace
Summary: Needs a YES. Has a DISCUSS. Needs 7 more YES or NO OBJECTION positions to pass.
Benjamin Kaduk Discuss
I think this is a useful document to clarify the proper operation of NFS 4.1, and to clarify the distinction between trunking address changes and migration, and intend to change my ballot to Yes once the following issues are resolved. The Security Considerations state or imply in multiple places that the use of RPCSEC_GSS to access a designated server can be used to verify the target name and address resolution, but this only holds when the GSSAPI (and Kerberos) implementation refrains from using insecure DNS to canonicalize the target principal hostname. (Several major ones do not, by default.) It is irresponsible to not note this risk here. Specifically, the following text from Section 1.3 of RFC 4120 is honored more in the breach: Implementations of Kerberos and protocols based on Kerberos MUST NOT use insecure DNS queries to canonicalize the hostname components of the service principal names (i.e., they MUST NOT use insecure DNS queries to map one name to another to determine the host part of the principal name with which one is to communicate). We are deleting Section 11.7 of 5661, but it talks about encoding "opaque" values conveyed from server to server via client in big-endian order, and don't seem to cover that in this document. Is that problematic? I also have a few other issues/questions that don't quite rise to DISCUSS level, but are still important to address; I tagged those with "IMPORTANT" in the comment section, so they don't get drowned in the nits.
In general, my remarks from mv0-trunking-update about clarity of references to "the current document" will apply to this document as well. But I mention this only so that the authors/AD can note it in their interactions with the RFC Editor; I don't think we need to discuss it at all at this point in the process. There's some stuff in here that, taken literally, is removing functionality and/or changing assumptions that a client could make, in a way that is sufficiently incompatible with NFS 4.1 that it should arguably be a manner for a new minor version of the protocol. But, I'm inclined to agree with the assessment of the WG that these are essentially giant errata fixes and are being made to reflect the intent of the protocol, remedying internal inconsistencies in 5661. Section 2 Use the RFC 8174 boilerplate please. (Sorry, I know this is the Nth time you're hearing it.) Section 3.1 o Each NFSv4 server is assumed to have a set of IP addresses to which NFSv4 requests may be sent by clients. These are referred nit: do we want to limit ourselves to only *IP* addresses? I don't remember to what extent RDMA connections do/don't use IP. o Two network addresses connected to the same server such that those addresses can be used to support a single common session are referred to as session-trunkable. Note that two addresses may be server-trunkable without being session-trunkable and that when two connections of different connection types are made to the same network address and are based on a single file system location entry they are always session-trunkable, independent of the connection type, as specified by [RFC5661], since their derivation from the same file system location entry together with the identity of their network addresses assures that both connections are to the same server and will return server-owner information allowing session trunking to be used. We may need to hear from some of my IESG colleagues, but I'm not sure there's anything in principle that prevents a load-balancer or similar device that presents a single public-facing IP address but dispatches to different backends based on TCP/UDP port (which could break this assumption of session-trunkability). Section 3.2 o Reorganization made necessary by the fact that two network access paths to the same file system instance needs to be distinguished clearly from two different replicas since the former share locking state and can share session state. nit: there's maybe a singular/plural mismatch here, but I think the clarity is best if we say "the case of two network access paths" and "the case of two different replicas". o The need for a clear statement regarding the desirability of transparent transfer of state together with a recommendation that either that or a single-fs grace period be provided. nit: maybe expand that the transfer of state is across replicas. o A clarification of the fs_locations_info attribute to specify which portions of the information provided apply to a specific network access path and which to the replica which that path is used to access. nit: s/which to/to which/ Section 4.2 o In some cases, a server will have a namespace more extensive than its local namespace, by using features associated with attributes that provide file system location information. These features, which allow construction of a multi-server namespace are all nit: comma after "namespace" o File system location elements are derived from location entries [...] location element. File system location entries that contain a host name, are resolved using DNS, and may result in one or more nit: no comma after "host name" Section 4.3 NFSv4.1 contains RECOMMENDED attributes that provide information about how (i.e. at what network address and namespace position) a nit: comma after "i.e.". Section 4.4 We don't seem to talk about Section 4.5.3 at all, here. Section 4.5 Where a file system was previously absent, specification of file nit: "previously absent" might be a strange wording; "is absent without a transition to absent" is perhaps the intended scenario (though it's also awkward wording). Section 4.5.2 o The client may fetch the file system location attribute for the file system. This will provide either the name of the server (which can be turned into a set of network addresses using DNS), or a set of server-trunkable location entries. [...] If the set of addresses in the server response are not all guaranteed to be server-trunkable (which is my understanding, since replication addresses for different instsances can also appear), then this text should probably acknowledge that somehow. Section 4.5.3 Fs_locations_info includes a flag, FSLI4TF_RDMA, which, when set indicates that RPC-over-RDMA support is available using the specified location entry, by "stepping up" an existing TCP connection to include support for RDMA operation. This flag makes it convenient for a client wishing to use RDMA to establish a TCP connection and then convert to use of RDMA. [...] nit: ambiguous parsing "use RDMA to establish a TCP connection", perhaps s/ to/, as it can/ Section 4.5.5 Although a single successor location is typical, multiple locations may be provided. When multiple locations are provided, the client will typically use the first one provided. If that is inaccessible for some reason, later ones can be used. In such cases the client might consider that the transition to the new replica as a migration event, even though some of the servers involved might not be aware of the use of the server which was inaccessible. In such a case, a client might lose access to locking state as a result of the access transfer. Just to double-check: "such cases" are only when the first location is inaccessible? Section 4.5.6 (soapbox) There already is an example of a global filesystem namespace, in the form of AFS. I trust the WG to make the right decision as to whether an informational reference is appropriate, though, since I have a conflict of interest. However, there are facilities provided that allow different clients to be directed to different sets of data, to enable adaptation to such client characteristics as CPU architecture. These facilities are described in Section 11.10.3 of [RFC5661] and in Section 12.2.3 of the current document. IMPORTANT: There is no fundamental restriction of this technology to that use, so some hedging language like "such as to enable adaptation" may be in order. Section 4.5.7 o Additions to the list of network addresses for the current file system instance need not be acted on promptly. However the client can choose to use the new address whenever it needs to switch access to a new replica. Is this "replica" correct, or are we talking about the case of switching to a different endpoint for the current instance? Section 5 o The additional Section 9 in the current document discusses the case in which a shift to a different replica is made and state is transferred to allow the client the ability to have continues nit: "continued" access to the accumulated locking state on the new server. nit: maybe a comma before "on the new server", and/or s/the accumulated/its accumulated/ o Sections 11.7 and 11.7.1 of [RFC5661] are to be replaced by Sections 8 8.1 respectively of the current document These sections would appear as Section 11.9 and 11.9.1 in an eventual consolidated document. nits: "8 and 8.1, respectively,", and full stop before "These". o Section 11.77 of [RFC5661] is to be replaced by Section 8.9. nit: "11.7.7" Section 7 o When there are no potential replacement addresses in use but there are valid addresses session-trunkable with the one whose use is to be discontinued, the client can use BIND_CONN_TO_SESSION to access the existing session using the new address. Although the target session will generally be accessible, there may be cases in which that session in no longer accessible, in which case a new session can be created to provide the client continued access to the existing instance. IMPORTANT: When this new session is created, do I still get to use "the same filehandles, stateids, and client IDs" as before and have continuity of lock state? If not, the intro paragraph needs to have an appropriate disclaimer on that list. Section 8 IMPORTANT: I think (but am not 100% sure) that some of the deletions from Section 11.7.2 and subsections are nominally behavior changes by their removal, such as preserving the fsid in both locations or fileid values not changing across the transition. Can someone help point me to the right place for all of these bullet points? Section 8.2 [There appears to be no difference between this text and Section 11.7.3 of RFC 5661.] Section 8.5 [There appears to be no difference between this text and Section 11.7.6 of RFC 5661.] Section 10.1 Note that the specified actions only need to be taken if they are not already going on. For example, when NFS4ERR_MOVED is received when accessing a file system for which transition recovery already going on, the client merely waits for that recovery to be completed while the receipt of SEQ4_STATUS_LEASE_MOVED indication only needs to initiate migration discovery for a server if it is not going on for that server. nit: "it" is perhaps unclear; I suggest "if such discovery is not already underway". Section 10.3 1. Performing an EXCHANGE_ID directed at the location address. This operation is used to register the client-owner with the server, I'm not seeing "client-owner" used as a term much. Do we mean a "client id string" in the terminology introduced in Section 3.1? (Of course, we do have client_owner and eia_clientowner and such as other options...) Section 10.4 In some situations, it is possible for a BIND_CONN_TO_SESSION to succeed without session migration having occurred. If state merger has taken place then the associated client ID may have already had a set of existing sessions, with it being possible that the sessionid of a given session is the same as one that might have been migrated. Is this a situation that the client can detect and act on? Section 10.5 Access to appropriate locking state should need no actions beyond access to the session. [...] I understand that this is an 8174 lowercase "should", but I think that we would benefit from some clarity as to this being the expected state in normal operation but that some exceptional circumstances (examples?) could cause it to not be the case, and as such the following checks are not just pro forma. Section 11 In the event of file system migration, when the client connects to the destination server, it needs to be able to provide the client continued to access the files it had open on the source server. There are two ways to provide this: nit: I think the "it" in "it needs to be able to provide" formally binds to the client, as the subject of the previous clause. I think we mean the destination server, though. These features are discussed separately in Sections 11.2 and 11.3, which discuss Transparent State Migration and session migration respectively. nit: this text is indented as if it is part of the second bullet point, but it seems to be general discussion that applies to both bullet points. Section 11.2 the file system being migrated. In addition to client id string and verifier, the source server needs to provide, for each stateid: IMPORTANT: There seem to be security considerations relating to trusting the client id string (which is, IIRC, spoofable), that are not discussed in the security considerations section. Could an attacker, e.g., claim to be a different client that the attacker knows is going to be scheduled for migration? Section 11.3 One part of the necessary adaptation to these sorts of issues would restrict enforcement of normal slot sequence enforcement semantics until the client itself, by issuing a request using a particular slot on the destination server, established the new starting sequence for that slot on the migrated session. (This means that any reordering that affected the first request received on a particular slot would cause the loss/rejection of requests sent prior to it, right?) Section 12.2.1 Note that both of these items specify attributes of the file system replica and should not be different when there are multiple fs_locations_server4 structures for the same replica, each specifying a network path to the chosen replica. I'm not sure what the "both" is referring to, not seeing any class of things listed where exactly two elements are present. We may also want to specify some error-handling for when multiple fs_locations_server4 structures appear for the same replica with conflicting information. With the exception of the transport-flag field (at offset FSLIBX_TFLAGS with the fls_info array), all of this data applies to the replica specified by the entry, rather that the specific network path used to access it. nit: FSLI4BX_TFLAGS o Explicit definition of the various specific data items within XDR would limit expandability in that any extension within would require yet another attribute, leading to specification and implementation clumsiness. In the context of the NFSv4 extension model in effect at the time fs_locations_info was designed (i.e. that described in [RFC5661]), this would necessitate a new minor to effect any Standards Track extension to the data in in fls_info. nit: "minor version" In light of the new extension model defined in [RFC8178] and the fact that the individual items within fls_info are not explicitly referenced in the XDR, the following practices should be followed when extending or otherwise changing the structure of the data returned in fls_info within the scope of a single minor version. (This text and the items following it suggests that we should have an IANA registry, which would then have a column for replica vs. path, etc. Not a good fit for this document, though.) Section 12.2.3 [There appears to be no difference between this text and Section 11.10.3 of RFC 5661.] Section 13.1 They are all based upon attributes that allow one file system to specify alternate, additional, and new location information which specifies how the client may access to access that file system. nit: s/to access// Also maybe s/which/that/ but the RFC Editor is great about this. Section 13.2 Thus, the identifiers generated by two servers within that set can be assumed compatible so that, in some cases, identifiers by one server in that set that set may be presented to another server of the same scope. Presumably this is "identifiers *generated* by one server"? Section 13.3 Please mention the Section number 18.104.22.168 of RFC 5661 here as well as in the toplevel Section 13. Setion 13.7.3 to another client. This can occur because the reclaim has been done outside of a grace period of implemented by the server, after the nit: s/of implemented/implemented/ Section 14.3 In situations in which the registration of the client_owner has not occurred previously, the client ID must first be used, along with the returned eir_sequenceid, in creating an associated session using CREATE_SESSION. Do we want to mention the BIND_CONN_TO_SESSION usage here as well? A server MUST NOT provide the same client ID to two different incarnations of an eir_clientowner. I see this is basically unchanged from 5661 so maybe it is out of scope, but what does the server use to distinguish different "incarnations" or an eir_clientowner? Section 15.3 It should be noted that there are situations in which a client needs to issue both forms of RECLAIM_COMPLETE. An example is an instance of file system migration in which the file system is migrated to a server for which the client has no clientid. As a result, the client needs to obtain a clientid from the server (incurring the responsibility to do RECLAIM_COMPLETE with rca_one_fs set to FALSE) as well as RECLAIM_COMPLETE with rca_one_fs set to TRUE to complete the per-fs grace period associated with the file system migration. IMPORTANT: Is there an ordering requirement between the rca_one_fs == TRUE/FALSE RECLAIM_COMPLETEs? Section 15.4 o When servers have no support for becoming the destination server of a file system subject to migration, there is no possibility of a per-fs RECLAIM_COMPLETE being done legitimately and occurrences of it SHOULD be ignored. However, the negative consequences of accepting mistaken use are quite limited as long as the does not issue it before all necessary reclaims are done. As long as who does not issue it? o When a server might become the destination for a file system being migrated, inappropriate use per-fs RECLAIM_COMPLETE is more concerning. In the case in which the file system designated is nit: "use of" not within a per-fs grace period, it SHOULD be ignored, with the negative consequences of accepting it being limited, as in the case in which migration is not supported. However, if it should encounter a file system undergoing migration, it cannot be accepted as if it were a global RECLAIM_COMPLETE without invalidating its intended use. We seem to be using "it" throughout here to refer both to RECLAIM_COMPLETE and to the server, which is kind of confusing. Section 16 It seems that neither RFC 5661 nor this document explicitly note that if the attacker gets a client to use bad locations and sends the client to an attacker-controlled server, the attacker can give back arbitrary data for the filesystem hierarchy for the client to use. Hopefully this is obvious, but it may be worth explicitly noting as the motivation for taking the location information so seriously. I agree with the secdir reviewer that: In light of the above, a server should present file system location entries that correspond to file systems on other servers using a host name. This would allow the client to interrogate the should be an RFC 2119 "SHOULD" (not sure what the WG poll turned up). I do see several good proposed changes in response to the secdir review; I'd propose further changing "avoid corruption of data in flight" to "avoid modification of data in flight" since "corruption" can have a connotation of randon bit flips, but the intended meaning is that an RFC 3552 attacker can deliberately change things. (I'm also fairly sympathetic to the desire to not change the MTI algorithms in this document, even though they really are quite overdue for updates. My understanding is that this document is intended as a glorified errata about the migration/trunking behavior and is trying to not change anything else.) Appendix B o Sections 11.8, 11.8.1, 11.8.2, and 11.9, are unchanged although they would be renumber as Sections 11.13 (with included nit: "renumbered"
(Spencer Dawkins) Yes
Deborah Brungard No Objection
Mirja Kühlewind No Objection
I didn't have the time to review this document in depth but I agree with others that the selected approach is hard to follow. I would have preferred a bis document or a stand-alone document that updates RFC5661 content-wise but does not have a patch-style and therefore can "simply" be consumed in addition to RFC5661.
Alexey Melnikov No Objection
I am borderline between Abstain and No Objections on usefulness of this document as a set of patches to another RFC.
(Ben Campbell) Abstain
Thanks for the huge amount of work in this draft, but I cannot recommend publishing it in it's current form. I am balloting "ABSTAIN" rather than "DISCUSS", so that I do not get in the way of publication if the rest of the IESG thinks it's fine as is. I apologize for not doing a more technical review, but I think the structural issues need to be addressed first. I once spent several weeks over a summer in high school applying page updates to a shelf full of US Tax Code binders in a CPAs office. The IRS, (or whoever published the code) would send out boxes of new page-sets, each labeled for which original pages it would replace. When we were done in the CPA office, we had a coherent set of documents, at least to the degree you can apply words like "coherent" to tax codes. This draft does something akin to that, except that we don't end up with a coherent document as a result. If the RFC Editor applied patches from RFCs that "UPDATE" other RFCs to render final text, things would be different. But that doesn't happen with RFCs. That being said, I don't object to RFC updates in general, as long as those updates are simple and fairly self-contained. But this draft is over 100 pages of intricate updates, reorganizations, and explanatory text. This will make life very hard for readers that were not involved in the standards process. I think the right approach for an update of this complexity is to just do a 5661bis draft, and use this one as an input to that. Such a bis draft could be something as simple as applying the changes in this draft (and maybe trunking-update) and publishing the result.
Alissa Cooper Abstain
I agree with Ben Campbell and am therefore abstaining. It seems like the proper path to a more broadly usable specification would have been to obsolete RFC 5661 and produce a bis with the changes applied. I don't see an explanation anywhere for why that was not done. The RFC 8174 boilerplate should be used in Section 2. The shepherd write-up notes an issue with "minor copyright dates." The issue is not one of dates but rather about whether the pre-RFC5378 disclaimer is needed or not.
Suresh Krishnan Abstain
Thanks to the authors for doing the hard work in creating this document. I sincerely tried to give this document a proper review but, like my abstaining co-ADs, I gave up along the way. The biggest issues for me were the reorganizations, lack of information about interoperability and backward compatibility with unupdated implementations, code snippet replacements with no apparent change (e.g. section 14.1 replacing 18.35.1, section 14.2 replacing 18.35.2. etc.) and so on. I also think that doing a new version of RFC5661 with the changes applied is the right way to go about this, but I will abstain so that the WG and the responsible AD can make the call.
(Terry Manderson) Abstain
My review of this document took me some time as I had to do a side my side read with RFC5661. While not normally a bad way to implement 'updates' I found myself lost more times than not. The complexity of the updates in this way loses readability and technical coherence, even though a lot of thought and effort has been invested in this work. I think that the loss of readability and technical coherence should be avoided for many reasons if not based alone on what we currently see in the set of documents which describes DNS. On this document I am balloting ABSTAIN as I feel that there are better ways to handle the depth breadth of this update, however I will not stand in the way of publication.
(Eric Rescorla) Abstain
I concur with my colleagues that a 100 page document consisting of diffs is not really something we can expect people to process
Alvaro Retana Abstain
I agree with Ben's opinion. As it is being discussed on the nsfv4 list, there are enough issues and changes to rfc5661 to justify a bis , the most significant seems to be what is already included in this document.  https://mailarchive.ietf.org/arch/msg/nfsv4/Hj5dTm-qYG3dZFyhy1VBqWv44XA
Adam Roach Abstain
I agree with Ben's comments that this format is hostile to implementors, and that the community would be far better served by a bis document.