Size-Limited Bi-directional Remote Procedure Call On Remote Direct Memory Access Transports
draft-ietf-nfsv4-rpcrdma-bidirection-00
The information below is for an old version of the document.
Document | Type | This is an older version of an Internet-Draft that was ultimately published as RFC 8167. |
---|---|---|---|
Author | Chuck Lever | ||
Last updated | 2015-06-01 | ||
RFC stream | Internet Engineering Task Force (IETF) | ||
Additional resources | Mailing list discussion | ||
Stream | WG state | WG Document | |
Document shepherd | (None) | ||
IESG | IESG state | Became RFC 8167 (Proposed Standard) | |
Consensus boilerplate | Unknown | ||
Telechat date | (None) | ||
Responsible AD | (None) | ||
Send notices to | (None) |
...'s advertised forward direction credit value.

The credit value is a guaranteed minimum. However, a receiver can post more receive buffers than its credit value. There is no requirement in the RPC-over-RDMA protocol for a receiver to indicate a credit overrun. Operation continues as long as there are enough receive buffers to handle incoming messages.

2.1.2.  Backward Credits

Credits work the same way in the backward direction as they do in the forward direction. However, forward direction credits and backward direction credits are accounted separately.

In other words, the forward direction credit value is the same whether or not there are backward direction resources associated with an RPC-over-RDMA transport connection. The backward direction credit value MAY be different than the forward direction credit value. The rdma_credit field in a backward direction RPC-over-RDMA message MUST NOT contain the value zero.

A backward direction caller (an RPC-over-RDMA service endpoint) requests credits from the responder (an RPC-over-RDMA client endpoint). The responder reports how many credits it can grant. This is the number of backward direction calls the responder is prepared to handle at once.

When an RPC-over-RDMA server endpoint is operating correctly, it sends no more outstanding requests at a time than the client endpoint's advertised backward direction credit value.

2.2.  Managing Receive Buffers

An RPC-over-RDMA transport endpoint must pre-post receive buffers before it can receive and process incoming RPC-over-RDMA messages. If a sender transmits a message to a receiver that has no prepared receive buffer, the RDMA provider is allowed to drop the RDMA connection.

Lever                   Expires November 30, 2015                [Page 9]
Internet-Draft          RPC-over-RDMA Bidirection                May 2015

2.2.1.  Client Receive Buffers

Typically an RPC-over-RDMA caller posts only as many receive buffers as there are outstanding RPC calls.
A client endpoint without backward direction support might therefore at times have no pre-posted receive buffers.

To receive incoming backward direction calls, an RPC-over-RDMA client endpoint must pre-post enough additional receive buffers to match its advertised backward direction credit value. Each outstanding forward direction RPC requires an additional receive buffer above this minimum.

When an RDMA transport connection is lost, all active receive buffers are flushed and are no longer available to receive incoming messages. When a fresh transport connection is established, a client endpoint must re-post a receive buffer to handle the reply for each retransmitted forward direction call, and a full set of receive buffers to handle backward direction calls.

2.2.2.  Server Receive Buffers

A forward direction RPC-over-RDMA service endpoint posts as many receive buffers as it expects incoming forward direction calls. That is, it posts no fewer buffers than the number of RPC-over-RDMA credits it advertises in the rdma_credit field of forward direction RPC replies.

To receive incoming backward direction replies, an RPC-over-RDMA server endpoint must pre-post a receive buffer for each backward direction call it sends.

When the existing transport connection is lost, all active receive buffers are flushed and are no longer available to receive incoming messages. When a fresh transport connection is established, a server endpoint must re-post a receive buffer to handle the reply for each retransmitted backward direction call, and a full set of receive buffers for receiving forward direction calls.

2.2.3.  In the Absence of Backward Direction Support

An RPC-over-RDMA transport endpoint might not support backward direction operation. There might be no mechanism in the implementation to do so.
Or the Upper Layer Protocol consumer might not yet have configured the transport to handle backward direction traffic.

A loss of the RDMA connection may result if the receiver is not prepared to receive an incoming message. Thus a denial-of-service could result if a sender continues to send backchannel messages after every transport reconnect to an endpoint that is not prepared to receive them.

Generally, for RPC-over-RDMA version 1 transports, the Upper Layer Protocol consumer is responsible for informing its peer when it has no support for the backward direction. Otherwise even a simple backward direction NULL probe from a peer would result in a lost connection.

An NFSv4.1 server should never send backchannel messages to an NFSv4.1 client before the NFSv4.1 client has sent a CREATE_SESSION or a BIND_CONN_TO_SESSION operation. As long as an NFSv4.1 client has prepared appropriate backchannel resources before sending one of these operations, denial-of-service is avoided. Legacy versions of NFS should never send backchannel operations.

Therefore, an Upper Layer Protocol consumer MUST NOT perform backward direction ONC RPC operations unless the peer consumer has indicated it is prepared to handle them. A description of Upper Layer Protocol mechanisms used for this indication is outside the scope of this document.

2.3.  Backward Direction Retransmission

In rare cases, an ONC RPC transaction cannot be completed within a certain time. This can be because the transport connection was lost, the call or reply message was dropped, or the Upper Layer consumer delayed or dropped the ONC RPC request. Typically, the caller sends the transaction again, reusing the same RPC XID. This is known as an "RPC retransmission".

In the forward direction, the caller is the ONC RPC client. The client is always responsible for establishing a transport connection before sending again.
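
The forward direction retransmission behavior described above can be illustrated with a minimal sketch. It is not taken from any implementation: the names ForwardCaller, PendingCall, connect, and drop_connection are hypothetical, and the "sent" list merely stands in for the wire. The sketch shows only the two points made in the text: a retransmission reuses the original XID, and the forward direction caller re-establishes the connection itself before sending again.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class PendingCall:
    """Hypothetical record of one outstanding ONC RPC transaction."""
    xid: int
    payload: bytes
    attempts: int = 0


class ForwardCaller:
    """Hypothetical forward direction caller (an ONC RPC client)."""

    def __init__(self):
        self._next_xid = count(1)   # assumed XID allocator, for illustration
        self.connected = False
        self.sent = []              # (xid, payload) pairs; stands in for the wire

    def connect(self):
        self.connected = True

    def drop_connection(self):
        self.connected = False

    def call(self, payload):
        # A new transaction gets a fresh XID.
        pending = PendingCall(next(self._next_xid), payload)
        self._transmit(pending)
        return pending

    def retransmit(self, pending):
        # A retransmission reuses the XID of the original transaction.
        self._transmit(pending)

    def _transmit(self, pending):
        # The forward direction caller establishes the connection itself
        # before sending (or sending again).
        if not self.connected:
            self.connect()
        pending.attempts += 1
        self.sent.append((pending.xid, pending.payload))


caller = ForwardCaller()
first = caller.call(b"example call")
caller.drop_connection()            # simulate a lost transport connection
caller.retransmit(first)
```

After the simulated connection loss, both entries in caller.sent carry the same XID, and the caller has reconnected on its own initiative.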

In the backward direction, the caller is the ONC RPC server. Because an ONC RPC server does not establish transport connections with clients, it cannot send a retransmission if there is no transport connection. It must wait for the ONC RPC client to re-establish the transport connection before it can retransmit ONC RPC transactions in the backward direction.

If an ONC RPC client has no work to do, it may be some time before it re-establishes a transport connection. Backward direction callers must be prepared to wait indefinitely for a connection to be established before a pending backward direction ONC RPC call can be retransmitted.

2.4.  Backward Direction Message Size

RPC-over-RDMA backward direction messages are transmitted and received using the same buffers as messages in the forward direction. Therefore they are constrained to be no larger than receive buffers posted for forward messages. Typical implementations have chosen to use 1024-byte buffers.

It is expected that the Upper Layer Protocol consumer establishes an appropriate payload size limit for backward direction operations, either by advertising that size limit to its peers, or by convention. If that is done, backward direction messages would not exceed the size of receive buffers at either endpoint.

If a sender transmits a backward direction message that is larger than the receiver is prepared for, the RDMA provider drops the message and the RDMA connection.

If a sender transmits an RDMA message that is too small to convey a complete and valid RPC-over-RDMA and RPC message in either direction, the receiver MUST NOT use any value in the fields that were transmitted. Namely, the rdma_credit field MUST be ignored, and the message dropped.

2.5.  Sending A Backward Direction Call

To form a backward direction RPC-over-RDMA call message on an RPC-over-RDMA version 1 transport, an ONC RPC service endpoint constructs an RPC-over-RDMA header containing a fresh RPC XID in the rdma_xid field (see Section 1.3.4 for full requirements).

The rdma_vers field MUST contain the value one. The number of requested credits is placed in the rdma_credit field (see Section 2.1).

The rdma_proc field in the RPC-over-RDMA header MUST contain the value RDMA_MSG. All three chunk lists MUST be empty.

The ONC RPC call header MUST follow immediately, starting with the same XID value that is present in the RPC-over-RDMA header. The call header's msg_type field MUST contain the value CALL.

2.6.  Sending A Backward Direction Reply

To form a backward direction RPC-over-RDMA reply message on an RPC-over-RDMA version 1 transport, an ONC RPC client endpoint constructs an RPC-over-RDMA header containing a copy of the matching ONC RPC call's RPC XID in the rdma_xid field (see Section 1.3.4 for full requirements).

The rdma_vers field MUST contain the value one. The number of granted credits is placed in the rdma_credit field (see Section 2.1).

The rdma_proc field in the RPC-over-RDMA header MUST contain the value RDMA_MSG. All three chunk lists MUST be empty.

The ONC RPC reply header MUST follow immediately, starting with the same XID value that is present in the RPC-over-RDMA header. The reply header's msg_type field MUST contain the value REPLY.

3.  Limits To This Approach

3.1.  Payload Size

The major drawback to the approach described in this document is the limit on payload size in backward direction requests.

o  Some NFSv4.1 callback operations can have potentially large arguments or results. For example, CB_GETATTR on a file with a large ACL; or CB_NOTIFY, which can provide a large, complex argument.
o  Any backward direction operation protected by RPCSEC_GSS may have additional header information that makes it difficult to send backward direction operations with large arguments or results.

o  Larger payloads could potentially require the use of RDMA data transfers, which are complex and make it more difficult to detect backward direction requests. The msg_type field in the ONC RPC header would no longer be at a fixed location in backward direction requests.

3.2.  Preparedness To Handle Backward Requests

A second drawback is the exposure of the client transport endpoint to backward direction calls before it has posted receive buffers to handle them.

Clients that do not support backward direction operation typically drop messages they do not recognize. However, this does not allow bi-direction-capable servers to quickly identify clients that cannot handle backward direction requests.

The conventions in this document rely on Upper Layer Protocol consumers to decide when backward direction transport operation is appropriate.

3.3.  Long Term

To address the limitations described in this section in the long run, a new version of the RPC-over-RDMA protocol would be required. The use of the conventions described in this document to enable backward direction operation is thus a transitional approach that is appropriate only while RPC-over-RDMA version 1 is the predominantly deployed version of the RPC-over-RDMA protocol.

4.  Security Considerations

As a consequence of limiting the size of backward direction RPC-over-RDMA messages, the use of RPCSEC_GSS integrity and confidentiality services (see [RFC2203]) in the backward direction may be challenging due to the size of the additional RPC header information required for RPCSEC_GSS.

5.  IANA Considerations

This document does not require actions by IANA.

6.  Acknowledgements

Tom Talpey was an indispensable resource, in addition to creating the foundation upon which this work is based. Our warmest regards go to him for his help and support.

Dave Noveck provided excellent review, constructive suggestions, and navigational guidance throughout the process of drafting this document.

Dai Ngo was a solid partner and collaborator. Together we constructed and tested independent prototypes of the conventions described in this document.

The author wishes to thank Bill Baker for his unwavering support of this work. In addition, the author gratefully acknowledges the expert contributions of Karen Deitke, Chunli Zhang, Mahesh Siddheshwar, and Tom Tucker.

Special thanks go to the nfsv4 Working Group chair Spencer Shepler and the WG Editor Tom Haynes for their support.

7.  Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2203]  Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol Specification", RFC 2203, September 1997.

[RFC5531]  Thurlow, R., "RPC: Remote Procedure Call Protocol Specification Version 2", RFC 5531, May 2009.

[RFC5661]  Shepler, S., Eisler, M., and D. Noveck, "Network File System (NFS) Version 4 Minor Version 1 Protocol", RFC 5661, January 2010.

[RFC5666]  Talpey, T. and B. Callaghan, "Remote Direct Memory Access Transport for Remote Procedure Call", RFC 5666, January 2010.

[RFC7530]  Haynes, T. and D. Noveck, "Network File System (NFS) Version 4 Protocol", RFC 7530, March 2015.

Author's Address

Charles Lever
Oracle Corporation
1015 Granger Avenue
Ann Arbor, MI 48104
US

Phone: +1 734 274 2396
Email: chuck.lever@oracle.com