Network Working Group                                            N. Zong
Internet-Draft                                       Huawei Technologies
Intended status: Informational                             July 15, 2013
Expires: January 16, 2014


 Problem Statement for Reliable Virtualized Network Function (VNF) Pool
                draft-zong-vnfpool-problem-statement-00

Abstract

   Network Function Virtualization (NFV), conceptualized by the European
   Telecommunications Standards Institute (ETSI) NFV Industry
   Specification Group (ISG), is gaining significant momentum within the
   telecoms industry.  A key area currently being discussed
   within the ETSI NFV ISG is the reliability and availability of the
   network service implemented by a set of Virtualized Network Functions
   (VNFs) building on top of the virtualization infrastructure.

   This document focuses on problem statement and gap analysis.  It
   first provides an overview of the problem space related to NFV
   reliability, and then briefly reviews an applicable architecture to
   scope the potential solution space.  Finally, it identifies gaps in
   several existing approaches to NFV reliability with a view to their
   potential reuse and extension.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 16, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.





Zong                    Expires January 16, 2014                [Page 1]


Internet-Draft              Reliable VNF Pool                  July 2013


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Background  . . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   3
   3.  Problem Statement . . . . . . . . . . . . . . . . . . . . . .   3
     3.1.  Failure on Hypervisor . . . . . . . . . . . . . . . . . .   4
     3.2.  Reliable Data Connection  . . . . . . . . . . . . . . . .   4
     3.3.  Failure Separation  . . . . . . . . . . . . . . . . . . .   4
     3.4.  Reliability Class . . . . . . . . . . . . . . . . . . . .   5
   4.  Reliable Virtualized Network Function Pool  . . . . . . . . .   5
   5.  Gap Analysis and Related Works  . . . . . . . . . . . . . . .   7
     5.1.  Reliable Server Pool  . . . . . . . . . . . . . . . . . .   7
     5.2.  Multipath TCP . . . . . . . . . . . . . . . . . . . . . .   8
     5.3.  VNF Forwarding Graph  . . . . . . . . . . . . . . . . . .   9
   6.  Security Considerations . . . . . . . . . . . . . . . . . . .   9
   7.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  10
   8.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  10
   9.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  10
     9.1.  Normative References  . . . . . . . . . . . . . . . . . .  10
     9.2.  Informative References  . . . . . . . . . . . . . . . . .  10
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  10

1.  Background

   Network Function Virtualization (NFV) utilizes IT virtualization
   technology to consolidate application specific network equipment onto
   industry standard high volume servers and switches, which could be
   located within the Data Center (DC), network nodes and customer
   premises.  For a full overview from the principal operators involved
   in establishing NFV, see [NFV-WP].

   The European Telecommunications Standards Institute (ETSI)
   established an NFV Industry Specification Group (ISG) to study the
   operator use cases, high level requirements, and functional
   architecture of NFV.  A key group within the NFV ISG is the
   Reliability and Availability Working Group (RELAV WG) tasked with
   identifying and documenting the important aspects of NFV reliability
   and availability.  A fundamental objective is to manage Virtualized
   Network Functions (VNFs) built on top of the virtualization
   infrastructure so as to meet the various levels of reliability and
   availability required by network services [NFV-REL].

   In this document, we first give an overview of the problem space
   related to NFV reliability.  We then present an applicable
   architecture for reliable VNFs to scope the potential solution
   space.  This document is not
   intended as a catchall for all reliability issues identified by the
   RELAV working group.  We focus on providing the base for reliable VNF
   instances and reliable transport connections between VNF instances,
   which will support a wide range of network services.  Finally, we
   identify gaps in several existing approaches to NFV reliability, with
   a view to reusing and extending existing mechanisms.

2.  Terminology

   Network Service: a service (e.g. telephony, messaging, Internet
   connectivity) that is composed of a set of network functions (e.g.
   firewall, load balancer).

   Virtualized Network Function (VNF): a VNF provides the same
   functional behaviour and interfaces as the equivalent network
   function, but is deployed as software instances (e.g. VMs) inside the
   virtualization infrastructure (e.g. hypervisor) [NFV-TERM].

   VNF Pool: a set of software instances (e.g. VMs), each of which can
   be configured to implement a specific VNF, and which are connected
   with each other to support a network service.

   Pool Manager (PM): an entity that interacts with the VNF pool for
   pool management, and with the network service for service requests
   and responses.

3.  Problem Statement

   In the context of NFV, a network service essentially consists of a
   set of VNFs, each built on top of the virtualization infrastructure
   to implement a specific network function, and the data connections
   between the VNFs, as shown below.

                  Network Service (e.g. VOIP, Web)
     +----------+           +----------+           +----------+
     |   VNF    | data conn |   VNF    | data conn |   VNF    |
     |          |-----------|          |- ... ... -|          |
   | +----------+           +----------+           +----------+  |
   |_____________________________________________________________|
                                 ^
                                 | Virtualization
     +----------------------------------------------------------+
     |     Virtualization Infrastructure (e.g. hypervisors)     |
     +----------------------------------------------------------+

      Figure 1: Network Service and Virtualized Network Function.


   Although there are existing approaches to reliable network service,
   such as active-standby server mode, there are several reliability
   issues unique to the NFV environment.  These issues are described in
   the following subsections.

3.1.  Failure on Hypervisor

   It is likely that more than one VNF instance will run on top of a
   single hypervisor.  Thus the failure of a single hypervisor will
   affect multiple VNF instances which will potentially result in a
   wider failure of network services.  It is therefore critical to
   ensure that the hypervisor does not become a single point of failure.
   Failover approaches at the hypervisor layer, including hypervisor
   monitoring, fault detection, resource scaling, and migration of VNF
   instances between host hypervisors, also need consideration.
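
   As a minimal illustration of the point above, the following sketch
   computes which VNF instances and network services are affected when
   one hypervisor fails.  The placement table, the identifiers and the
   function name are assumptions made for illustration only.

```python
# Hypothetical placement: (instance ID, host hypervisor ID, network service).
PLACEMENT = [
    ("fw-1", "hv-a", "voip"),
    ("lb-1", "hv-a", "web"),
    ("nat-1", "hv-b", "web"),
]


def failure_impact(placement, failed_hypervisor):
    """Return the instances and services hit by one hypervisor failure."""
    instances = [i for i, hv, _ in placement if hv == failed_hypervisor]
    services = sorted({s for _, hv, s in placement if hv == failed_hypervisor})
    return instances, services
```

   With this placement, a failure of "hv-a" affects two instances that
   belong to two different network services.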

3.2.  Reliable Data Connection

   Establishing reliable VNF instances is important, as are reliable
   data connections between VNF instances for reliable communication
   within the network service.  It is possible to achieve a certain
   level of resiliency for data connections using hypervisor-layer
   features, e.g., virtual network interfaces.  However, factors such as
   congestion and link failures in the physical network layer can still
   affect the data connections between VNF instances.  Therefore,
   failover mechanisms, including link status monitoring and redirection
   of traffic from the fault-affected data path to another data path,
   are required.
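
   The failover behaviour described above can be sketched as follows.
   This is an illustrative model only: the Path class, its status fields
   and the select_path() function are assumptions, and a real mechanism
   would obtain link status from monitoring rather than from flags set
   by hand.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One candidate data path between two VNF instances (hypothetical)."""
    name: str
    link_up: bool = True
    congested: bool = False

    def usable(self):
        return self.link_up and not self.congested


def select_path(paths, current):
    """Keep the current path while it is usable, else fail over.

    Returns the name of the path to use, or None if no path is usable.
    """
    by_name = {p.name: p for p in paths}
    if current in by_name and by_name[current].usable():
        return current
    for p in paths:
        if p.usable():
            return p.name
    return None
```

   Traffic stays on the current path while it is healthy, which avoids
   needless redirection, and moves to the first usable alternative once
   a link failure or congestion is reported.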

3.3.  Failure Separation

   It is important to limit the impact of a failure on service
   performance.  Because multiple VNF instances in one or more network
   services are likely to be hosted by the same hypervisor, the failure
   of a single hypervisor should affect a minimum number of VNF
   instances, as well as a minimum number of concurrent network
   services.  Therefore, an application may need to define some affinity
   rules regarding the deployment of VNF instances, e.g., separate
   hypervisors, separate DC sites.
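
   One possible way to check a deployment against such a rule is
   sketched below.  The record layout ("hypervisor", "site") and the
   function name are assumptions chosen for illustration.

```python
from collections import Counter


def violates_separation(deployment, key, max_per_group=1):
    """Check a deployment against a separation rule.

    deployment: list of dicts describing the VNF instances of one service.
    key: the attribute to separate on, e.g. "hypervisor" or "site".
    Returns the groups hosting more than max_per_group instances.
    """
    counts = Counter(inst[key] for inst in deployment)
    return sorted(g for g, n in counts.items() if n > max_per_group)
```

   An empty result means the rule is satisfied; otherwise the result
   names the hypervisors or sites that host too many instances.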





3.4.  Reliability Class

   Network services require different levels of reliability.  For
   example, real-time applications require highly reliable VNF instances
   to avoid disruption to delay-sensitive services.  However, different
   VNF instances may offer different reliability and performance,
   depending on factors such as the state of the physical resources
   (e.g., server load, network bandwidth).  Therefore, a network
   service may need to request a certain class of reliability, which
   would be provided once admission control (policy) has established
   that the application or user has the right to the requested
   reliability level.
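
   A minimal sketch of such an admission check is given below.  The
   class names, their ordering and the policy table are hypothetical;
   this document does not define any particular set of reliability
   classes.

```python
from enum import IntEnum


class ReliabilityClass(IntEnum):
    """Hypothetical classes; a higher value means stricter requirements."""
    BEST_EFFORT = 1
    STANDARD = 2
    CRITICAL = 3


# Hypothetical policy: the highest class each service may request.
POLICY = {"voip": ReliabilityClass.CRITICAL, "web": ReliabilityClass.STANDARD}


def admit(service, requested, policy=POLICY):
    """Admission control: grant the request only if policy allows it."""
    allowed = policy.get(service, ReliabilityClass.BEST_EFFORT)
    return requested <= allowed
```

   Services that are not listed in the policy are limited to the lowest
   class in this sketch; a deployment could equally choose to reject
   them outright.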

4.  Reliable Virtualized Network Function Pool

   Some implementations of Cloud Management Systems (CMSs) may already
   provide APIs that cloud applications can use to improve reliability,
   such as status notifications from different Infrastructure as a
   Service (IaaS) products via plug-ins.  However, a certain degree of
   standardization is required in order to allow applications,
   orchestrators and virtualization infrastructure to be developed
   independently of each other.

   Reliability and availability form a wide-ranging problem space within
   the ETSI NFV ISG and the wider NFV environment.  We do not aim to
   solve all the reliability issues of NFV.  Instead, we focus on
   developing tools that improve the reliability of NFV by using a
   reliable VNF pool, comprising a set of reliable VNF instances and the
   reliable transport connections between them, which is applicable to a
   wide range of network services.

   We introduce an applicable architecture for the reliable VNF pool.
   Note that the main purpose of this section is to scope the potential
   solution space.  The specification of the components and interfaces
   of the reliable VNF pool should be addressed in separate drafts
   [RSNDP].  A high-level diagram of the reliable VNF pool is shown
   below.

                     +-----------------+
                     | Network Service |
                     +-----------------+
                             ^
                             | Service Request / Response
                             V
                      +--------------+
                      | Pool Manager |
                      +--------------+
                             ^
                             | Pool Management
                             | (e.g. VNF instance failover,
                             | transport conn failover)
                             V
      +--------------------------------------------------+
      | +----------+   +----------+         +----------+ |
      | |   VNF    |   |   VNF    |         |   VNF    | |
      | | Instance |   | Instance | ... ... | Instance | |
      | +----------+   +----------+         +----------+ |
      |                   VNF Pool                       |
      +--------------------------------------------------+

                  Figure 2: Reliable VNF Pool.


   There are two major parts in the reliable VNF pool.  The first part
   is the interface between the Pool Manager (PM) and the VNF pool,
   which may include the following functions:

      1) VNF instance registration to the PM.  Characteristics of a VNF
      instance include instance ID, VNF type, and host hypervisor ID.

      2) VNF instance status collection and fault management by the PM.
      The status of a VNF instance may include load and liveness.  Fault
      management of a VNF instance may include replacement of the VNF
      instance, as well as re-establishment of the associated transport
      connections with other VNF instances.

      3) Hypervisor status collection and fault management by the PM.
      Fault management of a hypervisor may include migration of VNF
      instances between host hypervisors, as well as re-establishment of
      the associated transport connections.

      4) Transport connection status collection and fault management by
      the PM.  The status of a transport connection may include
      congestion and link failure.  Fault management of a transport
      connection may include redirection of data traffic from one path
      to another.
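
   To make the first three functions more concrete, the following toy
   model shows registration, status collection and the choice of a
   replacement instance.  All class, method and field names are
   assumptions; the actual interface is left to separate
   specifications, and re-establishment of transport connections is not
   modelled here.

```python
from dataclasses import dataclass


@dataclass
class VNFInstance:
    instance_id: str
    vnf_type: str
    hypervisor_id: str
    load: float = 0.0
    alive: bool = True


class PoolManager:
    """Toy PM: registration, status collection and instance replacement."""

    def __init__(self):
        self.instances = {}

    def register(self, inst):
        self.instances[inst.instance_id] = inst

    def report_status(self, instance_id, load, alive):
        inst = self.instances[instance_id]
        inst.load, inst.alive = load, alive

    def hypervisor_failed(self, hypervisor_id):
        """Mark every instance hosted on the failed hypervisor as down."""
        for inst in self.instances.values():
            if inst.hypervisor_id == hypervisor_id:
                inst.alive = False

    def replacement_for(self, instance_id):
        """Pick the least-loaded live instance of the same VNF type."""
        failed = self.instances[instance_id]
        candidates = [
            i for i in self.instances.values()
            if i.alive and i.vnf_type == failed.vnf_type
            and i.instance_id != instance_id
        ]
        return min(candidates, key=lambda i: i.load, default=None)
```

   Choosing the least-loaded candidate is only one possible policy; the
   sketch returns None when no live instance of the same type remains.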

   The second part is the interface between PM and network service,
   which may include the following functions:

      1) Reliability class.  A network service may request from the PM
      the desired or required class of reliability for the service.  The
      PM will select VNF instances and establish transport connections
      accordingly to fulfill the specific reliability requirements; if
      the request cannot be met, the PM will notify the service of the
      resource availability.

      2) Failure separation.  A network service may request from the PM
      specific affinity rules regarding the deployment of VNF instances,
      e.g., separate hypervisors, separate DC sites.  The PM will select
      VNF instances based on the affinity criteria to fulfill the
      request; if the request cannot be met, the PM will notify the
      service of the resource availability.
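
   The request handling for failure separation might look like the
   sketch below: the PM either returns a selection that satisfies the
   rule or reports how many separated instances are available.  The
   record layout and the function name are hypothetical, and the
   preference for lightly loaded instances is an added assumption.

```python
def handle_request(pool, vnf_type, count, separate_on="hypervisor"):
    """Select `count` live instances of `vnf_type`, no two sharing the
    same value of `separate_on`.  Returns (selected, available): the
    selection is empty when the request cannot be met, and `available`
    is the number of separated instances that could be offered.
    """
    chosen = {}
    # Visit lightly loaded instances first; keep one per group.
    for inst in sorted(pool, key=lambda i: i["load"]):
        if inst["alive"] and inst["type"] == vnf_type:
            chosen.setdefault(inst[separate_on], inst["id"])
    available = len(chosen)
    if available < count:
        return [], available
    return list(chosen.values())[:count], available
```

   Returning the available count alongside an empty selection gives the
   service enough information to retry with a weaker request.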

5.  Gap Analysis and Related Works

   This section presents some prior work and discusses the suitability
   of existing solutions.  Where applicable, the document will also
   highlight work which may be extended to meet the requirements and
   objectives of reliable VNF pools.

5.1.  Reliable Server Pool

   Reliable Server Pool (RSerPool) supports high availability and
   scalability of applications through the use of pools of servers
   [RFC5351].  The basic functions of RSerPool are:

      1) Server pool management including server registration, server
      fault management, load balancing, etc.;

      2) Handling of client requests, including a way for the client to
      bind to a desired server.

   The main protocol developed by RSerPool is called Aggregate Server
   Access Protocol (ASAP), which is responsible for the abstraction of
   the transport layer protocols (e.g. TCP, SCTP), load balancing, fault
   management, as well as presentation to the applications via a unified
   primitive interface [RFC5352].  The architecture of RSerPool is shown
   below.

                      +--------------+
                      |  Application |
                      +--------------+
                             ^
                             | Service Request / Response
                             V
                      +--------------+
                      |     PR       |
                      +--------------+
                             ^
                             | Pool Management
                             | (e.g. PE failover)
                             V
      +--------------------------------------------------+
      | +----------+   +----------+         +----------+ |
      | |    PE    |   |    PE    | ... ... |    PE    | |
      | +----------+   +----------+         +----------+ |
      |                   Server Pool                    |
      +--------------------------------------------------+

                  Figure 3: Reliable Server Pool.


   The similarities between RSerPool and the reliable VNF pool, and
   hence the applicability of RSerPool, include:

      1) Pool Elements (PEs) can be regarded as VNF instances;

      2) The Pool Registrar (PR) in RSerPool has a role similar to that
      of the PM in the reliable VNF pool with respect to PE
      registration, PE fault management, PE selection, etc.

   Nevertheless, RSerPool has some gaps with respect to the reliable VNF
   pool, such as:

      1) No reliability class and failure separation support;

      2) No transport layer reliability between any pair of VNF
      instances;

      3) No hypervisor layer fault management.

5.2.  Multipath TCP

   Multipath TCP (MPTCP) is a modified version of TCP that implements a
   multipath transport and achieves enhanced reliability of the data
   connection by pooling multiple paths within a transport connection,
   transparently to the application [RFC6182].  MPTCP is primarily
   concerned with utilizing multiple paths end-to-end, where one or both
   of the end hosts are multi-homed.  The following diagram illustrates
   a typical usage scenario for MPTCP [RFC6182].

      +------+           __________           +------+
      |      |A1 ______ (          ) ______ B1|      |
      | Host |--/      (            )      \--| Host |
      |      |        (   Internet   )        |      |
      |  A   |--\______(            )______/--|   B  |
      |      |A2        (__________)        B2|      |
      +------+                                +------+

                Figure 4: Scenario of MPTCP.


   The applicability of MPTCP to the reliable VNF pool includes:

      1) Transport layer reliability based on multiple transport paths
      between VNF instances.

   There are some identified constraints on the deployment of MPTCP,
   such as potential shared bottlenecks and the interposition of
   middleboxes that terminate TCP sessions, which is common in the
   context of end-to-end virtualized network services.  Other gaps for
   MPTCP will be studied further.

5.3.  VNF Forwarding Graph

   A VNF forwarding graph (a.k.a. service chain in a wider sense)
   defines the sequence of VNF instances that the traffic of each user
   session must traverse [NFV-UC].  An example of a VNF forwarding graph
   is one where user packets traverse the following sequence of VNF
   instances:

      1) Intrusion Detection Device;

      2) Firewall;

      3) Network Address Translation (NAT);

      4) Server Load Balancer.

   Different network services have different VNF forwarding graphs,
   depending on the specific user and service logic.

   The VNF forwarding graph and the reliable VNF pool are independent
   of, but complementary to, each other in the following aspects:

      1) The VNF forwarding graph determines the sequential relation
      between VNF instances, while the reliable VNF pool selects
      reliable VNF instances and reliable transport connections between
      VNF instances;

      2) The reliable VNF pool focuses on the transport layer (e.g.
      SCTP, MPTCP) for reliable data connections, which is independent
      of the packet forwarding layer (e.g. L2/L3).
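
   The division of labour described above can be sketched as follows:
   the forwarding graph fixes the order of VNF types, and the pool
   supplies a live instance for each position.  The type labels, the
   pool layout and the function name are assumptions for illustration;
   taking the first live instance stands in for whatever selection
   policy the pool applies.

```python
# The example graph from this section, as an ordered list of VNF types.
FORWARDING_GRAPH = ["ids", "firewall", "nat", "load-balancer"]


def resolve_chain(graph, pool):
    """Map each VNF type in the graph to one live pool instance.

    pool: dict of VNF type -> list of (instance ID, alive) tuples.
    Returns the ordered instance IDs, or raises LookupError if a type
    has no live instance.
    """
    chain = []
    for vnf_type in graph:
        live = [iid for iid, alive in pool.get(vnf_type, []) if alive]
        if not live:
            raise LookupError(f"no live instance for {vnf_type}")
        chain.append(live[0])
    return chain
```

   If an instance fails, only the pool contents change; the forwarding
   graph itself stays the same and is simply resolved again.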

6.  Security Considerations

   TBD.

7.  IANA Considerations

   This document has no actions for IANA.

8.  Acknowledgements

   The author would like to thank Daniel King from Lancaster University,
   UK, for his valuable comments on this draft.

9.  References

9.1.  Normative References

   TBD.

9.2.  Informative References

   [NFV-WP] NFV Whitepaper: "Network Function Virtualization", issue 1,
   2012, http://portal.etsi.org/NFV/NFV_White_Paper.pdf.

   [NFV-REL] ETSI GS NFV REL 001: "Network Function Virtualization;
   Resiliency Requirements", Version 0.0.1, 2013.

   [NFV-TERM] ETSI GS NFV 003: "Terminology for Main Conceptional
   Entities in NFV", Version 0.0.4, 2013.

   [RSNDP] Q. Wu, "An Overview of Reliable Service Nodes discovery and
   provision Protocols", draft-wu-rsndp-overview-00, 2013.

   [RFC5351] P. Lei, L. Ong, M. Tuexen and T. Dreibholz, "An Overview of
   Reliable Server Pooling Protocols", RFC5351, September 2008.

   [RFC5352] R. Stewart, Q. Xie, M. Stillman and M. Tuexen, "Aggregate
   Server Access Protocol (ASAP)", RFC5352, September 2008.

   [RFC6182] A. Ford, C. Raiciu, M. Handley, S. Barre and J. Iyengar,
   "Architectural Guidelines for Multipath TCP Development", RFC6182,
   March 2011.

   [NFV-UC] ETSI GS NFV 001: "Network Function Virtualization; Use
   Cases", Version 0.0.2, 2013.

Author's Address

   Ning Zong
   Huawei Technologies

   Email: zongning@huawei.com