Skip to main content

Network File System (NFS) Remote Direct Memory Access (RDMA) Problem Statement
draft-ietf-nfsv4-nfs-rdma-problem-statement-08

The information below is for an old version of the document that is already published as an RFC.
Document Type
This is an older version of an Internet-Draft that was ultimately published as RFC 5532.
Authors Chet Juszczak , Tom Talpey
Last updated 2015-10-14 (Latest revision 2008-02-21)
RFC stream Internet Engineering Task Force (IETF)
Intended RFC status Informational
Formats
Reviews
Additional resources Mailing list discussion
Stream WG state (None)
Document shepherd (None)
IESG IESG state Became RFC 5532 (Informational)
Action Holders
(None)
Consensus boilerplate Unknown
Telechat date (None)
Responsible AD Lars Eggert
Send notices to (None)
draft-ietf-nfsv4-nfs-rdma-problem-statement-08
NFSv4 Working Group                                      Tom Talpey
Internet-Draft                                               NetApp
Intended status: Informational                        Chet Juszczak
Expires: August 23, 2008                          February 21, 2008

                       NFS RDMA Problem Statement
             draft-ietf-nfsv4-nfs-rdma-problem-statement-08

Status of this Memo

     By submitting this Internet-Draft, each author represents that any
     applicable patent or other IPR claims of which he or she is aware
     have been or will be disclosed, and any of which he or she becomes
     aware will be disclosed, in accordance with Section 6 of BCP 79.

     Internet-Drafts are working documents of the Internet Engineering
     Task Force (IETF), its areas, and its working groups.  Note that
     other groups may also distribute working documents as Internet-
     Drafts.

     Internet-Drafts are draft documents valid for a maximum of six
     months and may be updated, replaced, or obsoleted by other
     documents at any time.  It is inappropriate to use Internet-Drafts
     as reference material or to cite them other than as "work in
     progress."

     The list of current Internet-Drafts can be accessed at
         http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
         http://www.ietf.org/shadow.html.

     This Internet-Draft will expire on August 23, 2008.

Copyright Notice

     Copyright (C) The IETF Trust (2008).

Talpey and Juszczak        Expires August 2008                  [Page 1]
Internet-Draft         NFS RDMA Problem Statement          February 2008

Abstract

     This draft addresses enabling the use of Remote Direct Memory
     Access (RDMA) by the Network File System (NFS) protocols.  NFS
     implementations historically incur significant overhead due to data
     copies on end-host systems, as well as other processing overhead.
     The potential benefits of RDMA to these implementations are
     explored, and the reasons why RDMA is especially well-suited to NFS
     and network file protocols in general are evaluated.

Table Of Contents

     1.   Introduction . . . . . . . . . . . . . . . . . . . . . . .   2
     2.   Problem Statement  . . . . . . . . . . . . . . . . . . . .   5
     3.   File Protocol Architecture . . . . . . . . . . . . . . . .   6
     4.   Sources of Overhead  . . . . . . . . . . . . . . . . . . .   8
     4.1.   Savings from TOE . . . . . . . . . . . . . . . . . . . .   9
     4.2.   Savings from RDMA  . . . . . . . . . . . . . . . . . . .  10
     5.   Application of RDMA to NFS . . . . . . . . . . . . . . . .  10
     6.   Conclusions  . . . . . . . . . . . . . . . . . . . . . . .  11
          Security Considerations  . . . . . . . . . . . . . . . . .  12
          IANA Considerations  . . . . . . . . . . . . . . . . . . .  13
          Acknowledgements . . . . . . . . . . . . . . . . . . . . .  13
          Normative References . . . . . . . . . . . . . . . . . . .  13
          Informative References . . . . . . . . . . . . . . . . . .  13
          Authors' Addresses . . . . . . . . . . . . . . . . . . . .  16
          Intellectual Property and Copyright Statements . . . . . .  16
     Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . .  17

1.  Introduction

     The Network File System (NFS) protocol (as described in [RFC1094],
     [RFC1813], and [RFC3530]) is one of several remote file access
     protocols used in the class of processing architecture sometimes
     called Network Attached Storage (NAS).

     Historically, remote file access has proven to be a convenient,
     cost-effective way to share information over a network, a concept
     proven over time by the popularity of the NFS protocol.  However,
     there are issues in such a deployment.

     As compared to a local (direct-attached) file access architecture,
     NFS removes the overhead of managing the local on-disk filesystem
     state and its metadata, but interposes at least a transport network
     and two network endpoints between an application process and the
     files it is accessing.  This tradeoff has to date usually resulted
     in a net performance loss as a result of reduced bandwidth,

Talpey and Juszczak        Expires August 2008                  [Page 2]
Internet-Draft         NFS RDMA Problem Statement          February 2008

     increased application server CPU utilization, and other overheads.

     Several classes of applications, including those directly
     supporting enterprise activities in high performance domains such
     as database applications and shared clusters, have therefore
     encountered issues with moving to NFS architectures.  While this
     has been due principally to the performance costs of NFS versus
     direct attached files, other reasons are relevant, such as the lack
     of strong consistency guarantees being provided by NFS
     implementations.

     Replication of local file access performance on NAS using
     traditional network protocol stacks has proven difficult, not
     because of protocol processing overheads, but because of data copy
     costs in the network endpoints.  This is especially true since host
     buses are now often the main bottleneck in NAS architectures
     [MOG03] [CHA+01].

     The External Data Representation [RFC4506] employed beneath NFS and
     RPC [RFC1831bis] can add more data copies, exacerbating the
     problem.

     Data copy-avoidance designs have not been widely adopted for a
     variety of reasons.  [BRU99] points out that "many copy avoidance
     techniques for network I/O are not applicable or may even backfire
     if applied to file I/O."  Other designs that eliminate unnecessary
     copies, such as [PAI+00], are incompatible with existing APIs and
     therefore force application changes.

     In recent years, an effort to standardize a set of protocols for
     Remote Direct Memory Access, RDMA, over the standard Internet
     Protocol Suite has been chartered [RDDP].  A complete IP-based RDMA
     procotol suite is available in the published Standards Track
     specifications.

     RDMA is a general solution to the problem of CPU overhead incurred
     due to data copies, primarily at the receiver.  Substantial
     research has addressed this and has borne out the efficacy of the
     approach.  An overview of this is the RDDP "Remote Direct Memory
     Access (RDMA) over IP Problem Statement" document, [RFC4297].

     In addition to the per-byte savings of off-loading data copies,
     RDMA-enabled NICs (RNICS) offload the underlying protocol layers as
     well, e.g., TCP, further reducing CPU overhead due to NAS
     processing.

Talpey and Juszczak        Expires August 2008                  [Page 3]
Internet-Draft         NFS RDMA Problem Statement          February 2008

1.1.  Background

     The RDDP Problem Statement [RFC4297] asserts:

          "High costs associated with copying are an issue primarily for
          large scale systems ... with high bandwidth feeds, usually
          multiprocessors and clusters, that are adversely affected by
          copying overhead.  Examples of such machines include all
          varieties of servers: database servers, storage servers,
          application servers for transaction processing, for e-
          commerce, and web serving, content distribution, video
          distribution, backups, data mining and decision support, and
          scientific computing.

          Note that such servers almost exclusively service many
          concurrent sessions (transport connections), which, in
          aggregate, are responsible for > 1 Gbits/s of communication.
          Nonetheless, the cost of copying overhead for a particular
          load is the same whether from few or many sessions."

     Note that each of the servers listed above could be accessing their
     file data as an NFS client, or NFS serving the data to such
     clients, or acting as both.

     The CPU overhead of the NFS and TCP/IP protocol stacks (including
     data copies or reduced copy workarounds) becomes a significant
     matter in these clients and servers.  File access using locally
     attached disks imposes relatively low overhead due to the highly
     optimized I/O path and direct memory access afforded to the storage
     controller.  This is not the case with NFS, which must pass data
     to, and especially from, the network and network processing stack
     to the NFS stack.  Frequently, data copies are imposed on this
     transfer, in some cases several such copies in each direction.

     Copies are potentially encountered in an NFS implementation
     exchanging data to and from user address spaces, within kernel
     buffer caches, in XDR marshalling and unmarshalling, and within
     network stacks and network drivers.  Other overheads such as
     serialization among multiple threads of execution sharing a single
     NFS mount point and transport connection are additionally
     encountered.

     Numerous upper layer protocols achieve extremely high bandwidth and
     low overhead through the use of RDMA.  [MAF+02] show that the RDMA-
     based Direct Access File System (with a user-level implementation
     of the file system client) can outperform even a zero-copy
     implementation of NFS [CHA+01] [CHA+99] [GAL+99] [KM02].  Also,
     file data access implies the use of large ULP messages.  These

Talpey and Juszczak        Expires August 2008                  [Page 4]
Internet-Draft         NFS RDMA Problem Statement          February 2008

     large messages tend to amortize any increase in per-message costs
     due to the offload of protocol processing incurred when using RNICs
     while gaining the benefits of reduced per-byte costs.  Finally, the
     direct memory addressing afforded by RDMA avoids many sources of
     contention on network resources.

2.  Problem Statement

     The principal performance problem encountered by NFS
     implementations is the CPU overhead required to implement the
     protocol.  Primary among the sources of this overhead is the
     movement of data from NFS protocol messages to its eventual
     destination in user buffers or aligned kernel buffers.  Due to the
     nature of the RPC and XDR protocols, the NFS data payload arrives
     at arbitrary alignment, necessitating a copy at the receiver, and
     the NFS requests are completed in an arbitrary sequence.

     The data copies consume system bus bandwidth and CPU time, reducing
     the available system capacity for applications [RFC4297].
     Achieving zero-copy with NFS has, to date, required sophisticated,
     version-specific "header cracking" hardware and/or extensive
     platform-specific virtual memory mapping tricks.  Such approaches
     become even more difficult for NFS version 4 due to the existence
     of the COMPOUND operation and presence of Kerberos and other
     security information, which further reduce alignment and greatly
     complicate ULP offload.

     Furthermore, NFS is challenged by high-speed network fabrics such
     as 10 Gbits/s Ethernet.  Performing even raw network I/O such as
     TCP is an issue at such speeds with today's hardware.  The problem
     is fundamental in nature and has led the IETF to explore RDMA
     [RFC4297].

     Zero-copy techniques benefit file protocols extensively, as they
     enable direct user I/O, reduce the overhead of protocol stacks,
     provide perfect alignment into caches, etc.  Many studies have
     already shown the performance benefits of such techniques [SKE+01]
     [DCK+03] [FJNFS] [FJDAFS] [KM02] [MAF+02].

     RDMA is compelling here for another reason; hardware offloaded
     networking support in itself does not avoid data copies, without
     resorting to implementing part of the NFS protocol in the NIC.
     Support of RDMA by NFS enables the highest performance at the
     architecture level rather than by implementation; this enables
     ubiquitous and interoperable solutions.

     By providing file access performance equivalent to that of local
     file systems, NFS over RDMA will enable applications running on a

Talpey and Juszczak        Expires August 2008                  [Page 5]
Internet-Draft         NFS RDMA Problem Statement          February 2008

     set of client machines to interact through an NFS file system, just
     as applications running on a single machine might interact through
     a local file system.

3.  File Protocol Architecture

     NFS runs as an ONC RPC [RFC1831bis] application.  Being a file
     access protocol, NFS is very "rich" in data content (versus control
     information).

     NFS messages can range from very small (under 100 bytes) to very
     large (from many kilobytes to a megabyte or more).  They are all
     contained within an RPC message and follow a variable length RPC
     header.  This layout provides an alignment challenge for the data
     items contained in an NFS call (request) or reply (response)
     message.

     In addition to the control information in each NFS call or reply
     message, sometimes there are large "chunks" of application file
     data, for example read and write requests.  With NFS version 4 (due
     to the existence of the COMPOUND operation) there can be several of
     these data chunks interspersed with control information.

     ONC RPC is a remote procedure call protocol that has been run over
     a variety of transports.  Most implementations today use UDP or
     TCP.  RPC messages are defined in terms of an eXternal Data
     Representation (XDR) [RFC4506] which provides a canonical data
     representation across a variety of host architectures.  An XDR data
     stream is conveyed differently on each type of transport.  On UDP,
     RPC messages are encapsulated inside datagrams, while on a TCP byte
     stream, RPC messages are delineated by a record marking protocol.
     An RDMA transport also conveys RPC messages in a unique fashion
     that must be fully described if client and server implementations
     are to interoperate.

     The RPC transport is responsible for conveying an RPC message from
     a sender to a receiver.  An RPC message is either an RPC call from
     a client to a server, or an RPC reply from the server back to the
     client.  An RPC message contains an RPC call header followed by
     arguments if the message is an RPC call, or an RPC reply header
     followed by results if the message is an RPC reply.  The call
     header contains a transaction ID (XID) followed by the program and
     procedure number as well as a security credential.  An RPC reply
     header begins with an XID that matches that of the RPC call
     message, followed by a security verifier and results.  All data in
     an RPC message is XDR encoded.

Talpey and Juszczak        Expires August 2008                  [Page 6]
Internet-Draft         NFS RDMA Problem Statement          February 2008

     The encoding of XDR data into transport buffers is referred to as
     "marshalling", and the decoding of XDR data contained within
     transport buffers and into destination RPC procedure result
     buffers, is referred to as "unmarshalling".  The process of
     marshalling takes place therefore at the sender of any particular
     message, be it an RPC request or an RPC response.  Unmarshalling,
     of course, takes place at the receiver.

     Normally, any bulk data is moved (copied) as a result of the
     unmarshalling process, because the destination address is not known
     until the RPC code receives control and subsequently invokes the
     XDR unmarshalling routine.  In other words, XDR-encoded data is not
     self-describing, and it carries no placement information.  This
     results in a data copy in most NFS implementations.

     One mechanism by which the RPC layer may overcome this is for each
     request to include placement information, to be used for direct
     placement during XDR encode.  This "write chunk" can avoid sending
     bulk data inline in an RPC message and generally results in one or
     more RDMA Write operations.

     Similarly, a "read chunk", where placement information referring to
     bulk data which may be directly fetched via one or more RDMA Read
     operations during XDR decode, may be conveyed.  The "read chunk"
     will therefore be useful in both RPC calls and replies, while the
     "write chunk" is used solely in replies.

     These "chunks" are the key concept in an existing proposal
     [RPCRDMA].  They convey what are effectively pointers to remote
     memory across the network.  They allow cooperating peers to
     exchange data outside of XDR encodings but still use XDR for
     describing the data to be transferred.  And, finally, through use
     of XDR they maintain a large degree of on-the-wire compatibility.

     The central concept of the RDMA transport is to provide the
     additional encoding conventions to convey this placement
     information in transport-specific encoding, and to modify the XDR
     handling of bulk data.

Talpey and Juszczak        Expires August 2008                  [Page 7]
Internet-Draft         NFS RDMA Problem Statement          February 2008

                             Block Diagram

     +------------------------+-----------------------------------+
     |         NFS            |            NFS + RDMA             |
     +------------------------+----------------------+------------+
     |           Operations / Procedures             |            |
     +-----------------------------------------------+            |
     |                   RPC/XDR                     |            |
     +--------------------------------+--------------+            |
     |       Stream Transport         |      RDMA Transport       |
     +--------------------------------+---------------------------+

4.  Sources of Overhead

     Network and file protocol costs can be categorized as follows:

     o    per-byte costs - data touching costs such as checksum or data
          copy.  Today's network interface hardware commonly offloads
          the checksum, which leaves the other major source of per-byte
          overhead, data copy.

     o    per-packet costs - interrupts and lower-layer processing.
          Today's network interface hardware also commonly coalesce
          interrupts to reduce per-packet costs.

     o    per-message (request or response) costs - LLP and ULP
          processing.

     Improvement from optimization becomes more important if the
     overhead it targets is a larger share of the total cost.  As other
     sources of overhead, such as the checksumming and interrupt
     handling above are eliminated, the remaining overheads (primarily
     data copy) loom larger.

     With copies crossing the bus twice per copy, network processing
     overhead is high whenever network bandwidth is large in comparison
     to CPU and memory bandwidths.  Generally with today's end-systems,
     the effects are observable at network speeds at or above 1 Gbits/s.

     A common question is whether an increase in CPU processing power
     alleviates the problem of high processing costs of network I/O.
     The answer is no, it is the memory bandwidth that is the issue.
     Faster CPUs do not help if the CPU spends most of its time waiting
     for memory [RFC4297].

     TCP offload engine (TOE) technology aims to offload the CPU by
     moving TCP/IP protocol processing to the NIC.  However, TOE
     technology by itself does nothing to avoid necessary data copies

Talpey and Juszczak        Expires August 2008                  [Page 8]
Internet-Draft         NFS RDMA Problem Statement          February 2008

     within upper layer protocols.  [MOG03] provides a description of
     the role TOE can play in reducing per-packet and per-message costs.
     Beyond the offloads commonly provided by today's network interface
     hardware, TOE alone (w/o RDMA) helps in protocol header processing,
     but this has been shown to be a minority component of the total
     protocol processing overhead. [CHA+01]

     Numerous software approaches to the optimization of network
     throughput have been made.  Experience has shown that network I/O
     interacts with other aspects of system processing such as file I/O
     and disk I/O.  [BRU99] [CHU96] Zero-copy optimizations based on
     page remapping [CHU96] can be dependent upon machine architecture,
     and are not scalable to multi-processor architectures.  Correct
     buffer alignment and sizing together are needed to optimize the
     performance of zero-copy movement mechanisms [SKE+01].  The NFS
     message layout described above does not facilitate the splitting of
     headers from data nor does it facilitate providing correct data
     buffer alignment.

4.1.  Savings from TOE

     The expected improvement of TOE specifically for NFS protocol
     processing can be quantified and shown to be fundamentally limited.
     [SHI+03] presents a set of "LAWS" parameters which serve to
     illustrate the issues.  In the TOE case, the copy cost can be
     viewed as part of the application processing "a".  Application
     processing increases the LAWS "gamma", which is shown by the paper
     to result in a diminished benefit for TOE.

     For example, if the overhead is 20% TCP/IP, 30% copy and 50% real
     application work, then gamma is 80/20 or 4, which means the maximum
     benefit of TOE is 1/gamma, or only 25%.

     For RDMA (with embedded TOE) and the same example, the "overhead"
     (o) offloaded or eliminated is 50% (20%+30%).  Therefore in the
     RDMA case, gamma is 50/50 or 1, and the inverse gives the potential
     benefit of 1 (100%), a factor of two.

                        CPU overhead reduction factor

                   No Offload   TCP Offload   RDMA Offload
                   -----------+-------------+-------------
                      1.00x        1.25x         2.00x

     The analysis in the paper shows that RDMA could improve throughput
     by the same factor of two, even when the host is (just) powerful
     enough to drive the full network bandwidth without RDMA.  It can

Talpey and Juszczak        Expires August 2008                  [Page 9]
Internet-Draft         NFS RDMA Problem Statement          February 2008

     also be shown that the speedup may be higher if network bandwidth
     grows faster than Moore's Law, although the higher benefits will
     apply to a narrow range of applications.

4.2.  Savings from RDMA

     Performance measurements directly comparing an NFS over RDMA
     prototype with conventional network-based NFS processing are
     described in [CAL+03].  Comparisons of Read throughput and CPU
     overhead were performed on two types of Gigabit Ethernet adapters,
     one type being a conventional adapter, and another type with RDMA
     capability.  The prototype RDMA protocol performed all transfers
     via RDMA Read.  The NFS layer in the study was measured while
     performing read transfers, varying the transfer size and readahead
     depth across ranges used by typical NFS deployments.

     In these results, conventional network-based throughput was
     severely limited by the client's CPU being saturated at 100% for
     all transfers.  Read throughput reached no more than 60MBytes/s.

            I/O Type      Size    Read Throughput     CPU Utilization
            Conventional    2KB          20MB/s              100%
            Conventional   16KB          40MB/s              100%
            Conventional  256KB          60MB/s              100%

     However, over RDMA, throughput rose to the theoretical maximum
     throughput of the platform, while saturating the single-CPU system
     only at maximum throughput.

            I/O Type      Size    Read Throughput     CPU Utilization
            RDMA            2KB          10MB/s               45%
            RDMA           16KB          40MB/s               70%
            RDMA          256KB         100MB/s              100%

     The lower relative throughput of the RDMA prototype at the small
     blocksize may be attributable to the RDMA Read imposed by the
     prototype protocol, which reduced the operation rate since it
     introduces additional latency.  As well, it may reflect the
     relative increase of per-packet setup costs within the DMA portion
     of the transfer.

5.  Application of RDMA to NFS

     Efficient file protocols require efficient data positioning and
     movement.  The client system knows the client memory address where
     the application has data to be written or wants read data
     deposited.  The server system knows the server memory address where

Talpey and Juszczak        Expires August 2008                 [Page 10]
Internet-Draft         NFS RDMA Problem Statement          February 2008

     the local filesystem will accept write data or has data to be read.
     Neither peer however is aware of the others' data destination in
     the current NFS, RPC or XDR protocols.  Existing NFS
     implementations have struggled with the performance costs of data
     copies when using traditional Ethernet transports.

     With the onset of faster networks, the network I/O bottleneck will
     worsen.  Fortunately, new transports that support RDMA have
     emerged.  RDMA excels at bulk transfer efficiency; it is an
     efficient way to deliver direct data placement and remove a major
     part of the problem: data copies.  RDMA also addresses other
     overheads, e.g., underlying protocol offload, and offers separation
     of control information from data.

     The current NFS message layout provides the performance enhancing
     opportunity for an NFS over RDMA protocol that separates the
     control information from data chunks while meeting the alignment
     needs of both.  The data chunks can be copied "directly" between
     the client and server memory addresses above (with a single
     occurrence on each memory bus) while the control information can be
     passed "inline".  [RPCRDMA] describes such a protocol.

6.  Conclusions

     NFS version 4 [RFC3530] has been granted "Proposed Standard"
     status.  The NFSv4 protocol was developed along several design
     points, important among them: effective operation over wide- area
     networks, including the Internet itself;  strong security
     integrated into the protocol;  extensive cross-platform
     interoperability including integrated locking semantics compatible
     with multiple operating systems; and (this is key), protocol
     extension.

     NFS version 4 is an excellent base on which to add the needed
     performance enhancements and improved semantics described above.
     The minor versioning support defined in NFS version 4 was designed
     to support protocol improvements without disruption to the
     installed base.  Evolutionary improvement of the protocol via minor
     versioning is a conservative and cautious approach to current and
     future problems and shortcomings.

     Many arguments can be made as to the efficacy of the file
     abstraction in meeting the future needs of enterprise data service
     and the Internet.  Fine grained Quality of Service (QoS) policies
     (e.g., data delivery, retention, availability, security, ...) are
     high among them.

Talpey and Juszczak        Expires August 2008                 [Page 11]
Internet-Draft         NFS RDMA Problem Statement          February 2008

     It is vital that the NFS protocol continue to provide these
     benefits to a wide range of applications, without its usefulness
     being compromised by concerns about performance and semantic
     inadequacies.  This can reasonably be addressed in the existing NFS
     protocol framework.  A cautious evolutionary improvement of
     performance and semantics allows building on the value already
     present in the NFS protocol, while addressing new requirements that
     have arisen from the application of networking technology.

7.  Security Considerations

     The NFS protocol, in conjunction with its layering on RPC, provides
     a rich and widely interoperable security model to applications and
     systems.  Any layering of NFS over RDMA transports must address the
     NFS security requirements, and additionally must ensure that no new
     vulnerabilities are introduced.  For RDMA, the integrity, and any
     privacy, of the data stream are of particular importance.

     The core goals of an NFS-to-RDMA binding are to reduce overhead and
     to enable high performance.  To support these goals while
     maintaining required NFS security protection presents a special
     challenge.  Historically, the provision of integrity and privacy
     have been implemented within the RPC layer, and their operation
     requires local processing of messages exchanged with the RPC peer.
     This procesing imposes memory and processing overhead on a per-
     message basis, exactly the overhead that RDMA is designed to avoid.

     Therefore, it is a requirement that the RDMA transport binding
     provide a means to delegate the integrity and privacy processing to
     the RDMA hardware, in order to maintain the high level of
     performance desired from the approach, while simultaneously
     providing the existing highest levels of security required by the
     NFS protocol.  This in turn requires a means by which the RPC layer
     may invoke these services from the RDMA provider, and for the NFS
     layer to negotiate their use end-to-end.

     The "Channel Binding" concept [RFC5056] provides a means by which
     the RPC and NFS layers may delegate their session protection to the
     lower RDMA layers.  An extension to the RPCSEC_GSS protocol
     [RPCSECGSSV2] may then be specified to negotiate the use of these
     bindings, and to establish the shared secrets necessary to protect
     the sessions.

     The protocol described in [RPCRDMA] specifies the use of these
     mechanisms, and they are required to implement the protocol.

     An additional consideration is protection of the integrity and
     privacy of local memory by the RDMA transport itself.  The use of

Talpey and Juszczak        Expires August 2008                 [Page 12]
Internet-Draft         NFS RDMA Problem Statement          February 2008

     RDMA by NFS must not introduce any vulnerabilities to system memory
     contents, or to memory owned by user processes.  These protections
     are provided by the RDMA layer specifications, and specifically
     their security models.  It is required that any RDMA provider used
     for NFS transport be conformant to the requirements of [RFC5042] in
     order to satisfy these protections.

8.  IANA Considerations

     This document has no IANA considerations.

9.  Acknowledgements

     The authors wish to thank Jeff Chase who provided many useful
     suggestions.

10.  Normative References

     [RFC3530]
          S. Shepler, et al., "NFS Version 4 Protocol", Standards Track
          RFC

     [RFC1831bis]
          R. Thurlow, Ed., "RPC: Remote Procedure Call Protocol
          Specification Version 2", Standards Track RFC

     [RFC4506]
          M. Eisler, Ed. "XDR: External Data Representation Standard",
          Standards Track RFC

     [RFC1813]
          B. Callaghan, B. Pawlowski, P. Staubach, "NFS Version 3
          Protocol Specification", Informational RFC

     [RPCSECGSSV2]
          M. Eisler, "RPCSEC_GSS Version 2", Internet Draft Work In
          Progress, draft-ietf-nfsv4-rpcsec-gss-v2

     [RFC5056]
          N. Williams, "On the Use of Channel Bindings to Secure
          Channels", Standards Track RFC

     [RFC5042]
          J. Pinkerton, E. Deleganes, "Direct Data Placement Protocol
          (DDP) / Remote Direct Memory Access Protocol (RDMAP) Security"
          Standards Track RFC

Talpey and Juszczak        Expires August 2008                 [Page 13]
Internet-Draft         NFS RDMA Problem Statement          February 2008

11.  Informative References

     [BRU99]
          J. Brustoloni, "Interoperation of copy avoidance in network
          and file I/O", in Proc. INFOCOM '99, pages 534-542, New York,
          NY, Mar. 1999., IEEE.  Also available from
          http://www.cs.pitt.edu/~jcb/publs.html

     [CAL+03]
          B. Callaghan, T. Lingutla-Raj, A.  Chiu, P. Staubach, O. Asad,
          "NFS over RDMA", in Proceedings of ACM SIGCOMM Summer 2003
          NICELI Workshop.

     [CHA+01]
          J. S. Chase, A. J. Gallatin, K. G. Yocum, "Endsystem
          optimizations for high-speed TCP", IEEE Communications,
          39(4):68-74, April 2001.

     [CHA+99]
          J. S. Chase, D. C. Anderson, A. J. Gallatin, A. R. Lebeck, K.
          G. Yocum, "Network I/O with Trapeze", in 1999 Hot
          Interconnects Symposium, August 1999.

     [CHU96]
          H.K. Chu, "Zero-copy TCP in Solaris", Proc. of the USENIX 1996
          Annual Technical Conference, San Diego, CA, January 1996

     [DCK+03]
          M. DeBergalis, P. Corbett, S. Kleiman, A. Lent, D. Noveck, T.
          Talpey, M. Wittle, "The Direct Access File System", in
          Proceedings of 2nd USENIX Conference on File and Storage
          Technologies (FAST '03), San Francisco, CA, March 31 - April
          2, 2003

     [FJDAFS]
          Fujitsu Prime Software Technologies, "Meet the DAFS
          Performance with DAFS/VI Kernel Implementation using cLAN",
          available from
          http://www.pst.fujitsu.com/english/dafsdemo/index.html, 2001.

     [FJNFS]
          Fujitsu Prime Software Technologies, "An Adaptation of VIA to
          NFS on Linux", available from
          http://www.pst.fujitsu.com/english/nfs/index.html, 2000.

     [GAL+99]
          A. Gallatin, J. Chase, K. Yocum, "Trapeze/IP: TCP/IP at Near-
          Gigabit Speeds", 1999 USENIX Technical Conference (Freenix

Talpey and Juszczak        Expires August 2008                 [Page 14]
Internet-Draft         NFS RDMA Problem Statement          February 2008

          Track), June 1999.

     [KM02]
          K. Magoutis, "Design and Implementation of a Direct Access
          File System (DAFS) Kernel Server for FreeBSD", in Proceedings
          of USENIX BSDCon 2002 Conference, San Francisco, CA, February
          11-14, 2002.

     [MAF+02]
          K. Magoutis, S. Addetia, A. Fedorova, M. Seltzer, J. Chase, D.
          Gallatin, R. Kisley, R. Wickremesinghe, E. Gabber, "Structure
          and Performance of the Direct Access File System (DAFS)", in
          Proceedings of 2002 USENIX Annual Technical Conference,
          Monterey, CA, June 9-14, 2002.

     [MOG03]
          J. Mogul, "TCP offload is a dumb idea whose time has come",
          9th Workshop on Hot Topics in Operating Systems (HotOS IX),
          Lihue, HI, May 2003. USENIX.

     [NFSv4.1]
          S. Shepler, ed., "NFSv4 Minor Version 1" Internet Draft work-
          in-progress, draft-ietf-nfsv4-minorversion1

     [PAI+00]
          V. S. Pai, P. Druschel, W. Zwaenepoel, "IO-Lite: a unified I/O
          buffering and caching system", ACM Trans. Computer Systems,
          18(1):37-66, Feb. 2000.

     [RDDP]
          RDDP Working Group charter,
          http://www.ietf.org/html.charters/rddp-charter.html

     [RFC4297]
          A. Romanow, J. Mogul, T. Talpey, S. Bailey, "Remote Direct
          Memory Access (RDMA) over IP Problem Statement", Informational
          RFC

     [RFC1094]
          Sun Microsystems, "NFS: Network File System Protocol
          Specification"

     [RPCRDMA]
          T. Talpey, B. Callaghan, "RDMA Transport for ONC RPC",
          Internet Draft Work in Progress, draft-ietf-nfsv4-rpcrdma

     [SHI+03]
          P. Shivam, J. Chase, "On the Elusive Benefits of Protocol

Talpey and Juszczak        Expires August 2008                 [Page 15]
Internet-Draft         NFS RDMA Problem Statement          February 2008

          Offload", Proceedings of ACM SIGCOMM Summer 2003 NICELI
          Workshop, also available from
          http://issg.cs.duke.edu/publications/niceli03.pdf

     [SKE+01]
          K.-A. Skevik, T. Plagemann, V. Goebel, P. Halvorsen,
          "Evaluation of a Zero-Copy Protocol Implementation", in
          Proceedings of the 27th Euromicro Conference - Multimedia and
          Telecommunications Track (MTT'2001), Warsaw, Poland, September
          2001.

Authors' Addresses

     Tom Talpey
     Network Appliance, Inc.
     1601 Trapelo Road, #16
     Waltham, MA 02451 USA

     Phone: +1 781 768 5329
     Email: thomas.talpey@netapp.com

     Chet Juszczak
     Chet's Boathouse Co.
     P.O. Box 1467
     Merrimack, NH 03054

     Email: chetnh@earthlink.net

Intellectual Property and Copyright Statements

Full Copyright Statement

     Copyright (C) The IETF Trust (2008).
     This document is subject to the rights, licenses and restrictions
     contained in BCP 78, and except as set forth therein, the authors
     retain all their rights.

     This document and the information contained herein are provided on
     an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
     REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
     IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
     WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
     WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
     ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
     FOR A PARTICULAR PURPOSE.

Talpey and Juszczak        Expires August 2008                 [Page 16]
Internet-Draft         NFS RDMA Problem Statement          February 2008

Intellectual Property
     The IETF takes no position regarding the validity or scope of any
     Intellectual Property Rights or other rights that might be claimed
     to pertain to the implementation or use of the technology described
     in this document or the extent to which any license under such
     rights might or might not be available; nor does it represent that
     it has made any independent effort to identify any such rights.
     Information on the procedures with respect to rights in RFC
     documents can be found in BCP 78 and BCP 79.

     Copies of IPR disclosures made to the IETF Secretariat and any
     assurances of licenses to be made available, or the result of an
     attempt made to obtain a general license or permission for the use
     of such proprietary rights by implementers or users of this
     specification can be obtained from the IETF on-line IPR repository
     at http://www.ietf.org/ipr.

     The IETF invites any interested party to bring to its attention any
     copyrights, patents or patent applications, or other proprietary
     rights that may cover technology that may be required to implement
     this standard.  Please address the information to the IETF at ietf-
     ipr@ietf.org.

Acknowledgment
     Funding for the RFC Editor function is provided by the IETF
     Administrative Support Activity (IASA).

Talpey and Juszczak        Expires August 2008                 [Page 17]