Size-Limited Bi-directional Remote Procedure Call On Remote Direct Memory Access Transports
draft-ietf-nfsv4-rpcrdma-bidirection-00

Document type: This is an older version of an Internet-Draft that was ultimately published as RFC 8167.
Author: Chuck Lever
Last updated: 2015-06-01
RFC stream: Internet Engineering Task Force (IETF)
WG state: WG Document
IESG state: Became RFC 8167 (Proposed Standard)
   [...]'s advertised forward direction credit value.

   The credit value is a guaranteed minimum.  However, a receiver can
   post more receive buffers than its credit value.  There is no
   requirement in the RPC-over-RDMA protocol for a receiver to indicate
   a credit overrun.  Operation continues as long as there are enough
   receive buffers to handle incoming messages.

2.1.2.  Backward Credits

   Credits work the same way in the backward direction as they do in the
   forward direction.  However, forward direction credits and backward
   direction credits are accounted separately.

   In other words, the forward direction credit value is the same
   whether or not there are backward direction resources associated with
   an RPC-over-RDMA transport connection.  The backward direction credit
   value MAY differ from the forward direction credit value.  The
   rdma_credit field in a backward direction RPC-over-RDMA message MUST
   NOT contain the value zero.

   A backward direction caller (an RPC-over-RDMA service endpoint)
   requests credits from the responder (an RPC-over-RDMA client
   endpoint).  The responder reports how many credits it can grant.
   This is the number of backward direction calls the responder is
   prepared to handle at once.

   When an RPC-over-RDMA server endpoint is operating correctly, it
   sends no more outstanding requests at a time than the client
   endpoint's advertised backward direction credit value.
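
   The backward direction credit accounting described above can be
   sketched as follows.  This is a minimal illustration, not part of
   the protocol: the class name, its methods, the initial credit of
   one, and the choice to raise an error on a zero credit value are
   all assumptions made for the example.

```python
class BackwardCreditGate:
    """Hypothetical sender-side credit accounting for backward direction calls."""

    def __init__(self, initial_credits=1):
        # Assumption: one credit is usable before the first reply arrives.
        self.granted = initial_credits
        self.outstanding = 0

    def may_send(self):
        # True while fewer calls are outstanding than the responder granted.
        return self.outstanding < self.granted

    def on_call_sent(self):
        if not self.may_send():
            raise RuntimeError("backward direction credit limit reached")
        self.outstanding += 1

    def on_reply_received(self, rdma_credit):
        # rdma_credit in a backward direction message must not be zero.
        # Raising here is this sketch's choice, not a protocol requirement.
        if rdma_credit == 0:
            raise ValueError("rdma_credit is zero")
        self.granted = rdma_credit
        self.outstanding -= 1
```

   A caller would check may_send() before each backward direction call
   and update the granted value from every reply it receives.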

2.2.  Managing Receive Buffers

   An RPC-over-RDMA transport endpoint must pre-post receive buffers
   before it can receive and process incoming RPC-over-RDMA messages.
   If a sender transmits a message for a receiver which has no prepared

Lever                   Expires November 30, 2015               [Page 9]
Internet-Draft          RPC-over-RDMA Bidirection               May 2015

   receive buffer, the RDMA provider is allowed to drop the RDMA
   connection.

2.2.1.  Client Receive Buffers

   Typically an RPC-over-RDMA caller posts only as many receive buffers
   as there are outstanding RPC calls.  A client endpoint without
   backward direction support might therefore at times have no pre-
   posted receive buffers.

   To receive incoming backward direction calls, an RPC-over-RDMA client
   endpoint must pre-post enough additional receive buffers to match its
   advertised backward direction credit value.  Each outstanding forward
   direction RPC requires an additional receive buffer above this
   minimum.

   When an RDMA transport connection is lost, all active receive buffers
   are flushed and are no longer available to receive incoming messages.
   When a fresh transport connection is established, a client endpoint
   must re-post a receive buffer to handle the reply for each
   retransmitted forward direction call, and a full set of receive
   buffers to handle backward direction calls.
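
   The client-side buffer arithmetic above can be written out as a
   short sketch.  The function names are hypothetical and exist only
   for this example.

```python
def client_buffers_to_post(backward_credits, outstanding_forward_calls):
    """Receive buffers a client endpoint keeps posted (hypothetical helper)."""
    if backward_credits < 0 or outstanding_forward_calls < 0:
        raise ValueError("counts must be non-negative")
    # One buffer per advertised backward direction credit, plus one for
    # each forward direction call that is still awaiting its reply.
    return backward_credits + outstanding_forward_calls


def client_buffers_after_reconnect(backward_credits, retransmitted_forward_calls):
    # Every previously posted buffer was flushed with the old connection,
    # so the full set is posted again on the fresh connection.
    return client_buffers_to_post(backward_credits, retransmitted_forward_calls)
```

   For example, a client advertising two backward direction credits
   with three forward direction calls outstanding keeps five receive
   buffers posted.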

2.2.2.  Server Receive Buffers

   A forward direction RPC-over-RDMA service endpoint posts as many
   receive buffers as it expects incoming forward direction calls.  That
   is, it posts no fewer buffers than the number of RPC-over-RDMA
   credits it advertises in the rdma_credit field of forward direction
   RPC replies.

   To receive incoming backward direction replies, an RPC-over-RDMA
   server endpoint must pre-post a receive buffer for each backward
   direction call it sends.

   When the existing transport connection is lost, all active receive
   buffers are flushed and are no longer available to receive incoming
   messages.  When a fresh transport connection is established, a server
   endpoint must re-post a receive buffer to handle the reply for each
   retransmitted backward direction call, and a full set of receive
   buffers for receiving forward direction calls.
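
   A minimal sketch of the server-side ordering follows.  It assumes
   a hypothetical ServerReceiveBuffers class that only counts buffers
   and records events.  Consumption and re-posting of buffers as
   messages arrive is omitted for brevity.

```python
class ServerReceiveBuffers:
    """Hypothetical model of server endpoint receive buffer accounting."""

    def __init__(self, forward_credits):
        # No fewer buffers than the advertised forward direction credits.
        self.forward_credits = forward_credits
        self.posted = forward_credits
        self.events = []

    def send_backward_call(self, xid):
        # The buffer for the reply is posted before the call is sent.
        self.posted += 1
        self.events.append(("post_receive", xid))
        self.events.append(("send_call", xid))

    def on_reconnect(self, retransmit_xids):
        # Flushed buffers are gone: post a full forward direction set,
        # then one buffer per retransmitted backward direction call.
        self.posted = self.forward_credits
        for xid in retransmit_xids:
            self.send_backward_call(xid)
```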

2.2.3.  In the Absence of Backward Direction Support

   An RPC-over-RDMA transport endpoint might not support backward
   direction operation.  There might be no mechanism in the
   implementation to do so.  Or the Upper Layer Protocol consumer might
   not yet have configured the transport to handle backward direction
   traffic.

   A loss of the RDMA connection may result if the receiver is not
   prepared to receive an incoming message.  Thus a denial-of-service
   could result if a sender continues to send backchannel messages after
   every transport reconnect to an endpoint that is not prepared to
   receive them.

   Generally, for RPC-over-RDMA version 1 transports, the Upper Layer
   Protocol consumer is responsible for informing its peer when it has
   no support for the backward direction.  Otherwise even a simple
   backward direction NULL probe from a peer would result in a lost
   connection.

   An NFSv4.1 server should never send backchannel messages to an
   NFSv4.1 client before the NFSv4.1 client has sent a CREATE_SESSION or
   a BIND_CONN_TO_SESSION operation.  As long as an NFSv4.1 client has
   prepared appropriate backchannel resources before sending one of
   these operations, denial-of-service is avoided.  Legacy versions of
   NFS should never send backchannel operations.

   Therefore, an Upper Layer Protocol consumer MUST NOT perform backward
   direction ONC RPC operations unless the peer consumer has indicated
   it is prepared to handle them.  A description of Upper Layer Protocol
   mechanisms used for this indication is outside the scope of this
   document.
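
   One way an Upper Layer Protocol consumer might enforce this
   requirement is sketched below.  The class and method names are
   hypothetical.  How the readiness indication arrives is specific to
   the Upper Layer Protocol and is not modeled here.

```python
class BackwardDirectionGuard:
    """Hypothetical ULP-side check made before any backward direction call."""

    def __init__(self):
        self._peer_ready = False

    def peer_indicated_ready(self):
        # Called when the ULP learns the peer can handle backward direction
        # calls, for example after an NFSv4.1 CREATE_SESSION is processed.
        self._peer_ready = True

    def send(self, transmit, message):
        # Refuse to transmit until the peer has indicated readiness.
        if not self._peer_ready:
            raise PermissionError("peer has not indicated backward direction readiness")
        return transmit(message)
```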

2.3.  Backward Direction Retransmission

   In rare cases, an ONC RPC transaction cannot be completed within a
   certain time.  This can be because the transport connection was lost,
   the call or reply message was dropped, or the Upper Layer Protocol
   consumer delayed or dropped the ONC RPC request.  Typically, the
   caller sends the transaction again, reusing the same RPC XID.  This
   is known as an "RPC retransmission".

   In the forward direction, the caller is the ONC RPC client.  The
   client is always responsible for establishing a transport connection
   before sending again.

   In the backward direction, the caller is the ONC RPC server.  Because
   an ONC RPC server does not establish transport connections with
   clients, it cannot send a retransmission if there is no transport
   connection.  It must wait for the ONC RPC client to re-establish the
   transport connection before it can retransmit ONC RPC transactions in
   the backward direction.

   If an ONC RPC client has no work to do, it may be some time before it
   re-establishes a transport connection.  Backward direction callers
   must be prepared to wait indefinitely for a connection to be
   established before a pending backward direction ONC RPC call can be
   retransmitted.
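
   The retransmission behavior in this section can be sketched as
   bookkeeping on the backward direction caller.  The class below is
   hypothetical.  It ignores credit limits and receive buffer posting,
   and it assumes the transport invokes on_client_reconnected() once
   the client has re-established the connection.

```python
class PendingBackwardCalls:
    """Hypothetical bookkeeping for backward direction calls awaiting replies."""

    def __init__(self):
        self._pending = {}  # XID -> encoded call message, in send order

    def call_sent(self, xid, message):
        self._pending[xid] = message

    def reply_received(self, xid):
        # Returns True when the reply matched a pending call.
        return self._pending.pop(xid, None) is not None

    def on_client_reconnected(self, transmit):
        # Only the client can re-establish the connection.  The server then
        # resends each pending call, reusing the call's original XID.
        for xid, message in self._pending.items():
            transmit(xid, message)
        return len(self._pending)
```

   Until on_client_reconnected() is invoked, pending calls simply
   remain queued, however long that takes.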

2.4.  Backward Direction Message Size

   RPC-over-RDMA backward direction messages are transmitted and
   received using the same buffers as messages in the forward direction.
   Therefore they are constrained to be no larger than receive buffers
   posted for forward messages.  Typical implementations have chosen to
   use 1024-byte buffers.

   It is expected that the Upper Layer Protocol consumer establishes an
   appropriate payload size limit for backward direction operations,
   either by advertising that size limit to its peers, or by convention.
   If that is done, backward direction messages would not exceed the
   size of receive buffers at either endpoint.

   If a sender transmits a backward direction message that is larger
   than the receiver is prepared for, the RDMA provider drops the
   message and the RDMA connection.

   If a sender transmits an RDMA message that is too small to convey a
   complete and valid RPC-over-RDMA and RPC message in either direction,
   the receiver MUST NOT use any value in the fields that were
   transmitted.  Namely, the rdma_credit field MUST be ignored, and the
   message dropped.
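
   The two size checks in this section can be illustrated as follows.
   This is a sketch under stated assumptions: the 1024-byte default
   mirrors the typical buffer size mentioned above, and the 36-byte
   lower bound (a 28-byte RPC-over-RDMA header with empty chunk lists
   plus the RPC XID and msg_type fields) is a necessary condition for
   a valid message, not a sufficient one.  The function names are
   hypothetical.

```python
import struct

RPCRDMA_HDR_LEN = 28                    # xid, vers, credit, proc, three empty chunk lists
MIN_MESSAGE_LEN = RPCRDMA_HDR_LEN + 8   # plus the RPC XID and msg_type fields


def fits_receive_buffer(message, buffer_size=1024):
    # Sender-side check: a larger message risks loss of the connection.
    return len(message) <= buffer_size


def credit_from_message(message):
    """Return rdma_credit, or None when the message must be dropped."""
    if len(message) < MIN_MESSAGE_LEN:
        # Too small to be valid: no field, including rdma_credit, is used.
        return None
    (credit,) = struct.unpack_from(">I", message, 8)
    return credit
```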

2.5.  Sending A Backward Direction Call

   To form a backward direction RPC-over-RDMA call message on an RPC-
   over-RDMA version 1 transport, an ONC RPC service endpoint constructs
   an RPC-over-RDMA header containing a fresh RPC XID in the rdma_xid
   field (see Section 1.3.4 for full requirements).

   The rdma_vers field MUST contain the value one.  The number of
   requested credits is placed in the rdma_credit field (see
   Section 2.1).

   The rdma_proc field in the RPC-over-RDMA header MUST contain the
   value RDMA_MSG.  All three chunk lists MUST be empty.

   The ONC RPC call header MUST follow immediately, starting with the
   same XID value that is present in the RPC-over-RDMA header.  The call
   header's msg_type field MUST contain the value CALL.
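
   The construction rules above can be sketched as shown below.  The
   function name is hypothetical, and the caller is assumed to supply
   an already XDR-encoded ONC RPC call message carrying a fresh XID.
   The numeric values of RDMA_MSG and CALL come from [RFC5666] and
   [RFC5531] respectively.  Each empty chunk list is encoded as a
   single zero word.

```python
import struct

RDMA_MSG = 0  # rdma_proc value defined in RFC 5666
CALL = 0      # msg_type value defined in RFC 5531


def build_backward_call(rpc_call, requested_credits):
    """Prefix an encoded ONC RPC call with an RPC-over-RDMA version 1 header."""
    if requested_credits == 0:
        raise ValueError("rdma_credit must not be zero")
    if len(rpc_call) < 8:
        raise ValueError("RPC call is too short to hold XID and msg_type")
    xid, msg_type = struct.unpack_from(">II", rpc_call)
    if msg_type != CALL:
        raise ValueError("msg_type must be CALL")
    # rdma_xid, rdma_vers = 1, rdma_credit, rdma_proc, then three empty
    # chunk lists (read list, write list, reply chunk).
    header = struct.pack(">7I", xid, 1, requested_credits, RDMA_MSG, 0, 0, 0)
    return header + rpc_call
```

   Taking the XID from the RPC call header itself ensures that both
   headers carry the same value.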

2.6.  Sending A Backward Direction Reply

   To form a backward direction RPC-over-RDMA reply message on an RPC-
   over-RDMA version 1 transport, an ONC RPC client endpoint constructs
   an RPC-over-RDMA header containing a copy of the matching ONC RPC
   call's RPC XID in the rdma_xid field (see Section 1.3.4 for full
   requirements).

   The rdma_vers field MUST contain the value one.  The number of
   granted credits is placed in the rdma_credit field (see Section 2.1).

   The rdma_proc field in the RPC-over-RDMA header MUST contain the
   value RDMA_MSG.  All three chunk lists MUST be empty.

   The ONC RPC reply header MUST follow immediately, starting with the
   same XID value that is present in the RPC-over-RDMA header.  The
   reply header's msg_type field MUST contain the value REPLY.
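
   A corresponding sketch for the reply side follows.  The function
   name is hypothetical, and the caller is assumed to supply the XID
   of the matching call together with an already XDR-encoded ONC RPC
   reply message.  The numeric values of RDMA_MSG and REPLY come from
   [RFC5666] and [RFC5531] respectively.

```python
import struct

RDMA_MSG = 0  # rdma_proc value defined in RFC 5666
REPLY = 1     # msg_type value defined in RFC 5531


def build_backward_reply(call_xid, rpc_reply, granted_credits):
    """Prefix an encoded ONC RPC reply with an RPC-over-RDMA version 1 header."""
    if granted_credits == 0:
        raise ValueError("rdma_credit must not be zero")
    if len(rpc_reply) < 8:
        raise ValueError("RPC reply is too short to hold XID and msg_type")
    xid, msg_type = struct.unpack_from(">II", rpc_reply)
    if xid != call_xid:
        raise ValueError("reply XID does not match the call XID")
    if msg_type != REPLY:
        raise ValueError("msg_type must be REPLY")
    # rdma_xid copies the call's XID; all three chunk lists are empty.
    header = struct.pack(">7I", call_xid, 1, granted_credits, RDMA_MSG, 0, 0, 0)
    return header + rpc_reply
```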

3.  Limits To This Approach

3.1.  Payload Size

   The major drawback to the approach described in this document is the
   limit on payload size in backward direction requests.

   o  Some NFSv4.1 callback operations can have potentially large
      arguments or results.  For example, CB_GETATTR on a file with a
      large ACL; or CB_NOTIFY, which can provide a large, complex
      argument.

   o  Any backward direction operation protected by RPCSEC_GSS may have
      additional header information that makes it difficult to send
      backward direction operations with large arguments or results.

   o  Larger payloads could potentially require the use of RDMA data
      transfers, which are complex and make it more difficult to detect
      backward direction requests.  The msg_type field in the ONC RPC
      header would no longer be at a fixed location in backward
      direction requests.
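
   The point about the fixed location of the msg_type field can be
   illustrated with a receiver-side sketch.  When rdma_proc is
   RDMA_MSG and all three chunk lists are empty, the RPC-over-RDMA
   header occupies 28 bytes, so msg_type sits at byte offset 32.  The
   function name and its return strings are hypothetical.

```python
import struct

RDMA_MSG = 0          # rdma_proc value defined in RFC 5666
CALL, REPLY = 0, 1    # msg_type values defined in RFC 5531
MSG_TYPE_OFFSET = 32  # 28-byte header with empty chunk lists, then 4-byte RPC XID


def classify(message):
    """Classify a received message by its msg_type, if that field is at a fixed offset."""
    if len(message) < MSG_TYPE_OFFSET + 4:
        return "too-short"
    proc, reads, writes, reply = struct.unpack_from(">4I", message, 12)
    if proc != RDMA_MSG or reads or writes or reply:
        # A non-empty chunk list moves msg_type to a variable offset.
        return "not-fixed"
    (msg_type,) = struct.unpack_from(">I", message, MSG_TYPE_OFFSET)
    if msg_type == CALL:
        return "call"
    if msg_type == REPLY:
        return "reply"
    return "invalid"
```

   On a client endpoint, a result of "call" identifies a backward
   direction request.  Once chunk lists are present, the receiver
   would have to parse them before it could locate msg_type.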

3.2.  Preparedness To Handle Backward Requests

   A second drawback is the exposure of the client transport endpoint to
   backward direction calls before it has posted receive buffers to
   handle them.

   Clients that do not support backward direction operation typically
   drop messages they do not recognize.  However, this does not allow
   bi-direction-capable servers to quickly identify clients that cannot
   handle backward direction requests.

   The conventions in this document rely on Upper Layer Protocol
   consumers to decide when backward direction transport operation is
   appropriate.

3.3.  Long Term

   To address the limitations described in this section in the long run,
   a new version of the RPC-over-RDMA protocol would be required.  The
   use of the conventions described in this document to enable backward
   direction operation is thus a transitional approach that is
   appropriate only while RPC-over-RDMA version 1 is the predominantly
   deployed version of the RPC-over-RDMA protocol.

4.  Security Considerations

   As a consequence of limiting the size of backward direction RPC-over-
   RDMA messages, the use of RPCSEC_GSS integrity and confidentiality
   services (see [RFC2203]) in the backward direction may be challenging
   due to the size of the additional RPC header information required for
   RPCSEC_GSS.

5.  IANA Considerations

   This document does not require actions by IANA.

6.  Acknowledgements

   Tom Talpey was an indispensable resource, in addition to creating the
   foundation upon which this work is based.  Our warmest regards go to
   him for his help and support.

   Dave Noveck provided excellent review, constructive suggestions, and
   navigational guidance throughout the process of drafting this
   document.

   Dai Ngo was a solid partner and collaborator.  Together we
   constructed and tested independent prototypes of the conventions
   described in this document.

   The author wishes to thank Bill Baker for his unwavering support of
   this work.  In addition, the author gratefully acknowledges the
   expert contributions of Karen Deitke, Chunli Zhang, Mahesh
   Siddheshwar, and Tom Tucker.

   Special thanks go to the nfsv4 Working Group chair Spencer Shepler
   and the WG Editor Tom Haynes for their support.

7.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2203]  Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol
              Specification", RFC 2203, September 1997.

   [RFC5531]  Thurlow, R., "RPC: Remote Procedure Call Protocol
              Specification Version 2", RFC 5531, May 2009.

   [RFC5661]  Shepler, S., Eisler, M., and D. Noveck, "Network File
              System (NFS) Version 4 Minor Version 1 Protocol", RFC
              5661, January 2010.

   [RFC5666]  Talpey, T. and B. Callaghan, "Remote Direct Memory Access
              Transport for Remote Procedure Call", RFC 5666, January
              2010.

   [RFC7530]  Haynes, T. and D. Noveck, "Network File System (NFS)
              Version 4 Protocol", RFC 7530, March 2015.

Author's Address

   Charles Lever
   Oracle Corporation
   1015 Granger Avenue
   Ann Arbor, MI  48104
   US

   Phone: +1 734 274 2396
   Email: chuck.lever@oracle.com