Parallel NFS (pNFS) Flexible File Layout
draft-ietf-nfsv4-flex-files-17

The information below is for an old version of the document.
Document type: Older version of an Internet-Draft that was ultimately published as RFC 8435
Authors: Benny Halevy, Thomas Haynes
Last updated: 2018-03-02 (latest revision 2018-02-27)
RFC stream: Internet Engineering Task Force (IETF)
Stream WG state: Submitted to IESG for Publication
Document shepherd: Spencer Shepler
Shepherd write-up last changed: 2017-10-25
IESG state: Became RFC 8435 (Proposed Standard)
Consensus boilerplate: Yes
Telechat date: (None)
Needs a YES. Needs 9 more YES or NO OBJECTION positions to pass.
Responsible AD: Spencer Dawkins
Send notices to: Spencer Shepler <spencer.shepler@gmail.com>
IANA review state: IANA OK - Actions Needed
   " access to a file it would not
   normally be allowed to report on.

9.2.3.  ff_iostats4

   <CODE BEGINS>

   /// struct ff_iostats4 {
   ///         offset4           ffis_offset;
   ///         length4           ffis_length;
   ///         stateid4          ffis_stateid;
   ///         io_info4          ffis_read;
   ///         io_info4          ffis_write;
   ///         deviceid4         ffis_deviceid;
   ///         ff_layoutupdate4  ffis_layoutupdate;
   /// };
   ///

   <CODE ENDS>

   Recall that [RFC7862] defines io_info4 as:

   <CODE BEGINS>

   struct io_info4 {
           uint64_t        ii_count;
           uint64_t        ii_bytes;
   };

   <CODE ENDS>

Halevy & Haynes          Expires August 31, 2018               [Page 32]
Internet-Draft              Flex File Layout               February 2018

   With pNFS, data transfers are performed directly between the pNFS
   client and the storage devices.  Therefore, the metadata server has
   no direct knowledge of the I/O operations being done and thus cannot
   create on its own statistical information about client I/O to
   optimize data storage location.  ff_iostats4 MAY be used by the
   client to report I/O statistics back to the metadata server upon
   returning the layout.

   Since it is not feasible for the client to report every I/O that used
   the layout, the client MAY identify "hot" byte ranges for which to
   report I/O statistics.  The definition and/or configuration mechanism
   of what is considered "hot" and the size of the reported byte range
   are out of the scope of this document.  It is suggested that client
   implementations provide reasonable default values and an optional
   run-time management interface to control these parameters.  For
   example, a client can define the default byte range resolution to be
   1 MB in size and the thresholds for reporting to be 1 MB/second or 10
   I/O operations per second.
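
   As a non-normative illustration of the example thresholds above, the
   following Python sketch shows one way a client might pick out "hot"
   byte ranges.  Every name in it (RangeCounters, record_io, hot_ranges,
   and the constants) is hypothetical and not defined by this document.
   It assumes that 1 MB means 2^20 bytes, that each I/O is charged
   entirely to the range containing its starting offset, and that the
   client evaluates the thresholds over an observation interval of its
   own choosing.

```python
from collections import defaultdict
from dataclasses import dataclass

BUCKET_SIZE = 1 << 20               # example byte-range resolution: 1 MB
BYTES_PER_SEC_THRESHOLD = 1 << 20   # example threshold: 1 MB/second
OPS_PER_SEC_THRESHOLD = 10          # example threshold: 10 I/Os per second


@dataclass
class RangeCounters:
    """Hypothetical per-range counters mirroring io_info4 (count, bytes)."""
    count: int = 0
    nbytes: int = 0


def record_io(table, offset, length):
    """Charge one I/O to the bucket that contains its starting offset."""
    bucket = offset // BUCKET_SIZE
    table[bucket].count += 1
    table[bucket].nbytes += length


def hot_ranges(table, interval_seconds):
    """Return (offset, length) for every bucket exceeding either threshold."""
    hot = []
    for bucket, c in sorted(table.items()):
        if (c.nbytes / interval_seconds >= BYTES_PER_SEC_THRESHOLD
                or c.count / interval_seconds >= OPS_PER_SEC_THRESHOLD):
            hot.append((bucket * BUCKET_SIZE, BUCKET_SIZE))
    return hot


reads = defaultdict(RangeCounters)
for _ in range(25):                      # 25 small reads in the first MB
    record_io(reads, 4096, 512)
record_io(reads, 5 * BUCKET_SIZE, 4096)  # one read in the sixth MB
print(hot_ranges(reads, interval_seconds=2.0))   # [(0, 1048576)]
```

   Here only the first range qualifies, because 25 reads in 2 seconds
   exceed the example rate of 10 I/O operations per second.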

   For each byte range, ffis_offset and ffis_length represent the
   starting offset of the range and the range length in bytes.
   ffis_read.ii_count, ffis_read.ii_bytes, ffis_write.ii_count, and
   ffis_write.ii_bytes represent, respectively, the number of contiguous
   read and write I/Os and the respective aggregate number of bytes
   transferred within the reported byte range.
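
   For illustration only, the io_info4 counters can be encoded with the
   XDR rules of [RFC4506], under which each uint64_t is an 8-byte
   big-endian quantity.  The helper names below are hypothetical, and
   the sketch covers only io_info4, not the whole of ff_iostats4.

```python
import struct


def encode_io_info4(ii_count, ii_bytes):
    """XDR-encode io_info4: two unsigned 64-bit integers, big-endian."""
    return struct.pack(">QQ", ii_count, ii_bytes)


def decode_io_info4(data):
    """Inverse of encode_io_info4; returns (ii_count, ii_bytes)."""
    return struct.unpack(">QQ", data)


# Example: three reads totalling 12288 bytes within the reported range.
wire = encode_io_info4(3, 12288)
print(wire.hex())               # 00000000000000030000000000003000
print(decode_io_info4(wire))    # (3, 12288)
```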

   The combination of ffis_deviceid and ffl_addr uniquely identifies
   both the storage path and the network route to it.  Finally, the
   ffl_fhandle allows the metadata server to differentiate between
   multiple read-only copies of the file on the same storage device.

9.3.  ff_layoutreturn4

   <CODE BEGINS>

   /// struct ff_layoutreturn4 {
   ///         ff_ioerr4     fflr_ioerr_report<>;
   ///         ff_iostats4   fflr_iostats_report<>;
   /// };
   ///

   <CODE ENDS>

   When data file I/O operations fail, fflr_ioerr_report<> is used to
   report these errors to the metadata server as an array of elements of
   type ff_ioerr4.  Each element in the array represents an error that
   occurred on the data file identified by ffie_errors.de_deviceid.  If
   no errors are to be reported, the size of the fflr_ioerr_report<>
   array is set to zero.  The client MAY also use fflr_iostats_report<>
   to report a list of I/O statistics as an array of elements of type
   ff_iostats4.  Each element in the array represents statistics for a
   particular byte range.  Byte ranges are not guaranteed to be disjoint
   and MAY repeat or intersect.
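
   The two arrays in ff_layoutreturn4 are XDR variable-length arrays,
   which [RFC4506] encodes as a 4-byte element count followed by the
   elements.  The following sketch is a non-normative illustration of
   that framing.  The function names are hypothetical, and the elements
   are assumed to be already XDR-encoded ff_ioerr4 or ff_iostats4
   values, which the sketch does not produce.

```python
import struct


def encode_xdr_array(encoded_elements):
    """XDR variable-length array: 4-byte big-endian count, then elements."""
    count = struct.pack(">I", len(encoded_elements))
    return count + b"".join(encoded_elements)


def encode_ff_layoutreturn4(ioerr_elements, iostats_elements):
    """Concatenate fflr_ioerr_report<> and fflr_iostats_report<>.

    Each element must already be an XDR-encoded byte string; building
    those encodings is outside the scope of this sketch.
    """
    return (encode_xdr_array(ioerr_elements)
            + encode_xdr_array(iostats_elements))


# No errors and no statistics to report: both arrays have size zero.
empty = encode_ff_layoutreturn4([], [])
print(empty.hex())   # 0000000000000000
```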

10.  Flexible Files Layout Type LAYOUTERROR

   If the client is using NFSv4.2 to communicate with the metadata
   server, then instead of waiting for a LAYOUTRETURN to send error
   information to the metadata server (see Section 9.1), it MAY use
   LAYOUTERROR (see Section 15.6 of [RFC7862]) to communicate that
   information.  For the flexible files layout type, this means that
   LAYOUTERROR4args is treated the same as ff_ioerr4.

11.  Flexible Files Layout Type LAYOUTSTATS

   If the client is using NFSv4.2 to communicate with the metadata
   server, then instead of waiting for a LAYOUTRETURN to send I/O
   statistics to the metadata server (see Section 9.2), it MAY use
   LAYOUTSTATS (see Section 15.7 of [RFC7862]) to communicate that
   information.  For the flexible files layout type, this means that
   LAYOUTSTATS4args.lsa_layoutupdate is overloaded with the same
   contents as in ffis_layoutupdate.

12.  Flexible File Layout Type Creation Hint

   The layouthint4 type is defined in [RFC5661] as follows:

   <CODE BEGINS>

   struct layouthint4 {
       layouttype4        loh_type;
       opaque             loh_body<>;
   };

   <CODE ENDS>

   The layouthint4 structure is used by the client to pass a hint about
   the type of layout it would like created for a particular file.  If
   the loh_type layout type is LAYOUT4_FLEX_FILES, then the loh_body
   opaque value is defined by the ff_layouthint4 type.

12.1.  ff_layouthint4

   <CODE BEGINS>

   /// union ff_mirrors_hint switch (bool ffmc_valid) {
   ///     case TRUE:
   ///         uint32_t    ffmc_mirrors;
   ///     case FALSE:
   ///         void;
   /// };
   ///

   /// struct ff_layouthint4 {
   ///     ff_mirrors_hint    fflh_mirrors_hint;
   /// };
   ///

   <CODE ENDS>

   This type conveys hints for the desired data map.  All parameters are
   optional so the client can give values for only the parameters it
   cares about.
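
   As a non-normative sketch, the hint can be serialized with the XDR
   rules of [RFC4506]: a bool discriminant is a 4-byte integer, and a
   variable-length opaque is a 4-byte length followed by the data,
   padded to a multiple of four bytes.  The function names are
   hypothetical; LAYOUT4_FLEX_FILES is taken as 0x4, the value listed
   for it in the IANA Considerations of this document.

```python
import struct

LAYOUT4_FLEX_FILES = 0x4


def encode_ff_layouthint4(mirrors=None):
    """Encode ff_layouthint4; mirrors=None means no mirror-count hint."""
    if mirrors is None:
        return struct.pack(">I", 0)          # ffmc_valid = FALSE, void arm
    return struct.pack(">II", 1, mirrors)    # ffmc_valid = TRUE, ffmc_mirrors


def encode_layouthint4(body):
    """Wrap an encoded hint as loh_type followed by opaque loh_body<>."""
    pad = (4 - len(body) % 4) % 4
    header = struct.pack(">II", LAYOUT4_FLEX_FILES, len(body))
    return header + body + b"\x00" * pad


print(encode_ff_layouthint4().hex())     # 00000000
print(encode_ff_layouthint4(3).hex())    # 0000000100000003
print(encode_layouthint4(encode_ff_layouthint4(3)).hex())
# 00000004000000080000000100000003
```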

13.  Recalling a Layout

   While Section 12.5.5 of [RFC5661] discusses reasons for recalling a
   layout that are independent of layout type, the flexible file layout
   type metadata server should recall outstanding layouts in the
   following cases:

   o  When the file's security policy changes, i.e., Access Control
      Lists (ACLs) or permission mode bits are set.

   o  When the file's layout changes, rendering outstanding layouts
      invalid.

   o  When existing layouts are inconsistent with the need to enforce
      locking constraints.

   o  When existing layouts are inconsistent with the requirements
      regarding resilvering as described in Section 8.3.

13.1.  CB_RECALL_ANY

   The metadata server can use the CB_RECALL_ANY callback operation to
   notify the client to return some or all of its layouts.  Section 22.3
   of [RFC5661] defines the allowed types of the "NFSv4 Recallable
   Object Types Registry".

   <CODE BEGINS>

   /// const RCA4_TYPE_MASK_FF_LAYOUT_MIN     = 16;
   /// const RCA4_TYPE_MASK_FF_LAYOUT_MAX     = 17;
   [[RFC Editor: please insert assigned constants]]
   ///

   struct  CB_RECALL_ANY4args      {
       uint32_t        craa_layouts_to_keep;
       bitmap4         craa_type_mask;
   };

   <CODE ENDS>

   Typically, CB_RECALL_ANY will be used to recall client state when the
   server needs to reclaim resources.  The craa_type_mask bitmap
   specifies the type of resources that are recalled and the
   craa_layouts_to_keep value specifies how many of the recalled
   flexible file layouts the client is allowed to keep.  The flexible
   file layout type mask flags are defined as follows:

   <CODE BEGINS>

   /// enum ff_cb_recall_any_mask {
   ///     PNFS_FF_RCA4_TYPE_MASK_READ = -2,
   ///     PNFS_FF_RCA4_TYPE_MASK_RW   = -1
   [[RFC Editor: please insert assigned constants]]
   /// };
   ///

   <CODE ENDS>

   They represent the iomode of the recalled layouts.  In response, the
   client SHOULD return layouts of the recalled iomode that it needs the
   least, keeping at most craa_layouts_to_keep Flexible File Layouts.

   The PNFS_FF_RCA4_TYPE_MASK_READ flag notifies the client to return
   layouts of iomode LAYOUTIOMODE4_READ.  Similarly, the
   PNFS_FF_RCA4_TYPE_MASK_RW flag notifies the client to return layouts
   of iomode LAYOUTIOMODE4_RW.  When both mask flags are set, the client
   is notified to return layouts of either iomode.
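
   The client-side handling described above might be sketched as
   follows.  This is a non-normative illustration with several
   assumptions: the READ and RW flags are taken to be bits 16 and 17
   (the RCA4_TYPE_MASK_FF_LAYOUT_MIN and _MAX constants), since the
   enum values in this draft are placeholders; "needs the least" is
   modeled as least recently used; and the function names and the
   (name, iomode, last_used) tuples are hypothetical.

```python
RCA4_TYPE_MASK_FF_LAYOUT_MIN = 16
RCA4_TYPE_MASK_FF_LAYOUT_MAX = 17

# Assumption for illustration only: READ is the MIN bit, RW the MAX bit.
READ_BIT = RCA4_TYPE_MASK_FF_LAYOUT_MIN
RW_BIT = RCA4_TYPE_MASK_FF_LAYOUT_MAX


def bit_is_set(bitmap, bit):
    """bitmap is a list of 32-bit words; bit N lives in word N // 32."""
    word, pos = divmod(bit, 32)
    return word < len(bitmap) and bool(bitmap[word] & (1 << pos))


def layouts_to_return(layouts, craa_type_mask, craa_layouts_to_keep):
    """Choose which layouts to return, least recently used first."""
    recalled = set()
    if bit_is_set(craa_type_mask, READ_BIT):
        recalled.add("READ")
    if bit_is_set(craa_type_mask, RW_BIT):
        recalled.add("RW")
    candidates = sorted((l for l in layouts if l[1] in recalled),
                        key=lambda l: l[2])
    excess = max(0, len(candidates) - craa_layouts_to_keep)
    return candidates[:excess]


held = [("a", "READ", 10), ("b", "RW", 5), ("c", "READ", 1), ("d", "READ", 7)]
read_only_mask = [1 << READ_BIT]
print([l[0] for l in layouts_to_return(held, read_only_mask, 1)])  # ['c', 'd']
```

   With only the READ flag set and one layout to keep, the two least
   recently used READ layouts are returned and the RW layout is left
   untouched.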

14.  Client Fencing

   In cases where clients are uncommunicative and their lease has
   expired or when clients fail to return recalled layouts within a
   lease period, the server MAY revoke client layouts and reassign these
   resources to other clients (see Section 12.5.5 in [RFC5661]).  To
   avoid data corruption, the metadata server MUST fence off the revoked
   clients from the respective data files as described in Section 2.2.

15.  Security Considerations

   The pNFS feature partitions the NFSv4.1+ file system protocol into
   two parts, the control path and the data path (storage protocol).
   The control path contains all the new operations described by this
   feature; all existing NFSv4 security mechanisms and features apply to
   the control path (see Sections 1.7.1 and 2.2.1 of [RFC5661]).  The
   combination of components in a pNFS system is required to preserve
   the security properties of NFSv4.1+ with respect to an entity
   accessing data via a client, including security countermeasures to
   defend against threats that NFSv4.1+ provides defenses for in
   environments where these threats are considered significant.

   The metadata server is primarily responsible for securing the data
   path.  It has to authenticate the client access and provide
   appropriate credentials to the client to access data files on the
   storage device.  Finally, it is responsible for revoking access for a
   client to the storage device.

   The metadata server enforces the file access-control policy at
   LAYOUTGET time.  The client should use RPC authorization credentials
   for getting the layout for the requested iomode (READ or RW) and the
   server verifies the permissions and ACL for these credentials,
   possibly returning NFS4ERR_ACCESS if the client is not allowed the
   requested iomode.  If the LAYOUTGET operation succeeds the client
   receives, as part of the layout, a set of credentials allowing it I/O
   access to the specified data files corresponding to the requested
   iomode.  When the client acts on I/O operations on behalf of its
   local users, it MUST authenticate and authorize the user by issuing
   respective OPEN and ACCESS calls to the metadata server, similar to
   having NFSv4 data delegations.

   The combination of file handle, synthetic uid, and gid in the layout
   is the way that the metadata server enforces access control to the
   data server.  The directory namespace on the storage device SHOULD
   only be accessible to the metadata server and not the clients.  In
   that case, the client only has access to file handles of file objects
   and not directory objects.  Thus, given a file handle in a layout, it
   is not possible to guess the parent directory file handle.  Further,
   as the data file permissions only allow the given synthetic uid read/
   write permission and the given synthetic gid read permission, knowing
   the synthetic ids of one file does not necessarily allow access to
   any other data file on the storage device.

   The metadata server can also deny access at any time by fencing the
   data file, which means changing the synthetic ids.  In turn, that
   forces the client to return its current layout and get a new layout
   if it wants to continue I/O to the data file.
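
   The access model of the two preceding paragraphs can be illustrated
   with a conventional owner/group/other mode-bit check, where the data
   file carries mode 0640: read/write for the synthetic uid and read
   for the synthetic gid.  This is a non-normative sketch; the function
   name and the numeric ids are hypothetical, and special treatment of
   a superuser is not modeled.

```python
import stat

DATA_FILE_MODE = 0o640   # owner read/write, group read, others nothing


def may_access(file_uid, file_gid, cred_uid, cred_gid, want_write):
    """Mode-bit check of AUTH_SYS-style credentials against a data file."""
    if cred_uid == file_uid:
        read_bit, write_bit = stat.S_IRUSR, stat.S_IWUSR
    elif cred_gid == file_gid:
        read_bit, write_bit = stat.S_IRGRP, stat.S_IWGRP
    else:
        read_bit, write_bit = stat.S_IROTH, stat.S_IWOTH
    return bool(DATA_FILE_MODE & (write_bit if want_write else read_bit))


# Hypothetical synthetic ids chosen by the metadata server for one file.
uid, gid = 1066, 1067
print(may_access(uid, gid, 1066, 1067, want_write=True))    # True
print(may_access(uid, gid, 9999, 1067, want_write=False))   # True
print(may_access(uid, gid, 9999, 1067, want_write=True))    # False
print(may_access(uid, gid, 2001, 2002, want_write=False))   # False

# Fencing: the metadata server changes the synthetic ids on the data
# file, so credentials from the old layout no longer match.
uid, gid = 1068, 1069
print(may_access(uid, gid, 1066, 1067, want_write=False))   # False
```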

   If the configuration of the storage device is such that clients can
   access the directory namespace, then the access control degrades to
   that of a typical NFS server with exports with a security flavor of
   AUTH_SYS.  Any client which is allowed access can forge credentials
   to access any data file.  The caveat is that the rogue client might
   have no knowledge of the data file's type or position in the metadata
   directory namespace.

   If access is allowed, the client uses the corresponding (READ or RW)
   credentials to perform the I/O operations at the data file's storage
   devices.  When the metadata server receives a request to change a
   file's permissions or ACL, it SHOULD recall all layouts for that file
   and then MUST fence off any clients still holding outstanding layouts
   for the respective files by implicitly invalidating the previously
   distributed credential on all data files comprising the file in
   question.  It is REQUIRED that this be done before committing to the
   new permissions and/or ACL.  By requesting new layouts, the clients
   will reauthorize access against the modified access control metadata.
   Recalling the layouts in this case is intended to prevent clients
   from getting an error on I/Os done after the client was fenced off.
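
   The ordering required above (recall, then fence any remaining layout
   holders, and only then commit the change) can be sketched as
   follows.  This is a non-normative illustration; the function and
   callback names are hypothetical, and recall is assumed to report
   whether a given client returned its layout.

```python
def apply_permission_change(holders, recall, fence_data_files, commit):
    """Recall layouts, fence if any remain outstanding, then commit."""
    # SHOULD: recall every outstanding layout for the file.
    remaining = [c for c in sorted(holders) if not recall(c)]
    # MUST: invalidate the distributed credential if layouts remain.
    if remaining:
        fence_data_files()
    # Only now may the new permissions and/or ACL be committed.
    commit()
    return remaining


events = []


def recall(client):
    events.append("recall " + client)
    return client != "client-b"      # client-b never returns its layout


remaining = apply_permission_change(
    {"client-a", "client-b"},
    recall,
    lambda: events.append("fence"),
    lambda: events.append("commit"),
)
print(events)      # ['recall client-a', 'recall client-b', 'fence', 'commit']
print(remaining)   # ['client-b']
```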

15.1.  RPCSEC_GSS and Security Services

   Because of the special use of principals within the loose coupling
   model, the issues are different depending on the coupling model.

15.1.1.  Loosely Coupled

   RPCSEC_GSS version 3 (RPCSEC_GSSv3) [RFC7861] contains facilities
   that would allow it to be used to authorize the client to the storage
   device on behalf of the metadata server.  Doing so would require that
   each of the metadata server, storage device, and client implement
   RPCSEC_GSSv3 using an RPC-application-defined structured privilege
   assertion in a manner described in Section 4.9.1 of [RFC7862].  The
   specifics necessary to do so are not described in this document.
   This is principally because any such specification
   would require extensive implementation work on a wide range of
   storage devices, which would be unlikely to result in a widely usable
   specification for a considerable time.

   As a result, the layout type described in this document will not
   provide support for use of RPCSEC_GSS together with the loosely
   coupled model.  However, future layout types could be specified which
   would allow such support, either through the use of RPCSEC_GSSv3, or
   in other ways.

15.1.2.  Tightly Coupled

   With tight coupling, the principal used to access the metadata file
   is exactly the same as used to access the data file.  The storage
   device can use the control protocol to validate any RPC credentials.
   As a result, there are no security issues related to using RPCSEC_GSS
   with a tightly coupled system.  For example, if Kerberos V5 GSS-API
   [RFC4121] is used as the security mechanism, then the storage device
   could use a control protocol to validate the RPC credentials to the
   metadata server.

16.  IANA Considerations

   [RFC5661] introduced the "pNFS Layout Types Registry" and
   as such, new layout type numbers need to be assigned by IANA.  This
   document defines the protocol associated with the existing layout
   type number, LAYOUT4_FLEX_FILES (see Table 1).

     +--------------------+-------+----------+-----+----------------+
     | Layout Type Name   | Value | RFC      | How | Minor Versions |
     +--------------------+-------+----------+-----+----------------+
     | LAYOUT4_FLEX_FILES | 0x4   | RFCTBD10 | L   | 1              |
     +--------------------+-------+----------+-----+----------------+

                     Table 1: Layout Type Assignments

   [RFC5661] also introduced a registry called "NFSv4 Recallable Object
   Types Registry".  This document defines new recallable objects for
   RCA4_TYPE_MASK_FF_LAYOUT_MIN and RCA4_TYPE_MASK_FF_LAYOUT_MAX (see
   Table 2).

   +------------------------------+-------+----------+-----+-----------+
   | Recallable Object Type Name  | Value | RFC      | How | Minor     |
   |                              |       |          |     | Versions  |
   +------------------------------+-------+----------+-----+-----------+
   | RCA4_TYPE_MASK_FF_LAYOUT_MIN | 16    | RFCTBD10 | L   | 1         |
   | RCA4_TYPE_MASK_FF_LAYOUT_MAX | 17    | RFCTBD10 | L   | 1         |
   +------------------------------+-------+----------+-----+-----------+

                Table 2: Recallable Object Type Assignments

   Note, [RFC5661] should have also defined (see Table 3):

   +---------------------------------+-------+-----------+-----+----------+
   | Recallable Object Type Name     | Value | RFC       | How | Minor    |
   |                                 |       |           |     | Versions |
   +---------------------------------+-------+-----------+-----+----------+
   | RCA4_TYPE_MASK_OTHER_LAYOUT_MIN | 12    | [RFC5661] | L   | 1        |
   | RCA4_TYPE_MASK_OTHER_LAYOUT_MAX | 15    | [RFC5661] | L   | 1        |
   +---------------------------------+-------+-----------+-----+----------+

                Table 3: Recallable Object Type Assignments

17.  References

17.1.  Normative References

   [LEGAL]    IETF Trust, "Legal Provisions Relating to IETF Documents",
              November 2008, <http://trustee.ietf.org/docs/
              IETF-Trust-License-Policy.pdf>.

   [RFC1813]  Callaghan, B., Pawlowski, B., and P. Staubach, "NFS
              Version 3 Protocol Specification", RFC 1813, June 1995.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4121]  Zhu, L., Jaganathan, K., and S. Hartman, "The Kerberos
              Version 5 Generic Security Service Application Program
              Interface (GSS-API) Mechanism Version 2", RFC 4121, July
              2005.

   [RFC4506]  Eisler, M., "XDR: External Data Representation Standard",
              STD 67, RFC 4506, May 2006.

   [RFC5531]  Thurlow, R., "RPC: Remote Procedure Call Protocol
              Specification Version 2", RFC 5531, May 2009.

   [RFC5661]  Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
              "Network File System (NFS) Version 4 Minor Version 1
              Protocol", RFC 5661, January 2010.

   [RFC5662]  Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
              "Network File System (NFS) Version 4 Minor Version 1
              External Data Representation Standard (XDR) Description",
              RFC 5662, January 2010.

   [RFC7530]  Haynes, T. and D. Noveck, "Network File System (NFS)
              version 4 Protocol", RFC 7530, March 2015.

   [RFC7861]  Adamson, W. and N. Williams, "Remote Procedure Call (RPC)
              Security Version 3", RFC 7861, November 2016.

   [RFC7862]  Haynes, T., "NFS Version 4 Minor Version 2", RFC 7862,
              November 2016.

   [pNFSLayouts]
              Haynes, T., "Requirements for pNFS Layout Types", draft-
              ietf-nfsv4-layout-types-07 (Work In Progress), August
              2017.

17.2.  Informative References

   [RFC4519]  Sciberras, A., Ed., "Lightweight Directory Access Protocol
              (LDAP): Schema for User Applications", RFC 4519, DOI
              10.17487/RFC4519, June 2006,
              <http://www.rfc-editor.org/info/rfc4519>.

Appendix A.  Acknowledgments

   Those who provided miscellaneous comments to early drafts of this
   document include: Matt W. Benjamin, Adam Emerson, J. Bruce Fields,
   and Lev Solomonov.

   Those who provided miscellaneous comments to the final drafts of this
   document include: Anand Ganesh, Robert Wipfel, Gobikrishnan
   Sundharraj, Trond Myklebust, Rick Macklem, and Jim Sermersheim.

   Idan Kedar caught a nasty bug in the interaction of client-side
   mirroring and the minor versioning of devices.

   Dave Noveck provided comprehensive reviews of the document during the
   working group last calls.  He also rewrote Section 2.3.

   Olga Kornievskaia made a convincing case against the use of a
   credential versus a principal in the fencing approach.  Andy Adamson
   and Benjamin Kaduk helped to sharpen the focus.

   Benjamin Kaduk and Olga Kornievskaia also helped provide concrete
   scenarios for loosely coupled security mechanisms.  And in the end,
   Olga proved that as defined, the loosely coupled model would not work
   with RPCSEC_GSS.

   Tigran Mkrtchyan provided the use case for not allowing the client to
   proxy the I/O through the data server.

   Rick Macklem provided the use case for only writing to a single
   mirror.

Appendix B.  RFC Editor Notes

   [RFC Editor: please remove this section prior to publishing this
   document as an RFC]

   [RFC Editor: prior to publishing this document as an RFC, please
   replace all occurrences of RFCTBD10 with RFCxxxx where xxxx is the
   RFC number of this document]

Authors' Addresses

   Benny Halevy

   Email: bhalevy@gmail.com

   Thomas Haynes
   Primary Data, Inc.
   4300 El Camino Real Ste 100
   Los Altos, CA  94022
   USA

   Email: loghyr@gmail.com
