Network Working Group N. Zong
Internet-Draft Huawei Technologies
Intended status: Informational July 15, 2013
Expires: January 16, 2014
Problem Statement for Reliable Virtualized Network Function (VNF) Pool
draft-zong-vnfpool-problem-statement-00
Abstract
Network Function Virtualization (NFV), conceptualized by the European
Telecommunications Standards Institute (ETSI) NFV Industry
Specification Group (ISG), is gaining significant momentum within
the telecoms industry. A key area currently being discussed
within the ETSI NFV ISG is the reliability and availability of the
network service implemented by a set of Virtualized Network Functions
(VNFs) building on top of the virtualization infrastructure.
This document mainly focuses on the problem statement and gap
analysis. It provides an overview of the problem space related to NFV
reliability, then briefly reviews an applicable architecture to scope
the potential solution space. Finally, it identifies the gaps of
several existing approaches to NFV reliability for potential reuse and
extension.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 16, 2014.
Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved.
Zong Expires January 16, 2014 [Page 1]
Internet-Draft Reliable VNF Pool July 2013
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Background
2. Terminology
3. Problem Statement
  3.1. Failure on Hypervisor
  3.2. Reliable Data Connection
  3.3. Failure Separation
  3.4. Reliability Class
4. Reliable Virtualized Network Function Pool
5. Gap Analysis and Related Works
  5.1. Reliable Server Pool
  5.2. Multipath TCP
  5.3. VNF Forwarding Graph
6. Security Considerations
7. IANA Considerations
8. Acknowledgements
9. References
  9.1. Normative References
  9.2. Informative References
Author's Address
1. Background
Network Function Virtualization (NFV) utilizes IT virtualization
technology to consolidate application specific network equipment onto
industry standard high volume servers and switches, which could be
located within the Data Center (DC), network nodes and customer
premises. For a full overview from the principal operators involved
with establishing NFV see [NFV-WP].
The European Telecommunications Standards Institute (ETSI)
established an NFV Industry Specification Group (ISG) to study the
operator use cases, high level requirements, and functional
architecture of NFV. A key group within the NFV ISG is the
Reliability and Availability Working Group (RELAV WG) tasked with
identifying and documenting the important aspects of NFV reliability
and availability. A fundamental objective is to manage Virtualized
Network Functions (VNFs) built on top of the virtualization
infrastructure to meet various levels of reliability and
availability requirements of the network services [NFV-REL].
In this document, we first give an overview of the problem space
related to NFV reliability. We then present an applicable
architecture for reliable VNFs to scope the potential solution space.
This document is not intended as a catchall for all reliability
issues identified by the RELAV working group. We focus on providing
the base for reliable VNF instances and reliable transport
connections between VNF instances, which will support a wide range of
network services. Finally, we identify the gaps of several existing
approaches to NFV reliability for potential reuse and extension.
2. Terminology
Network Service: a service (e.g. telephony, messaging, Internet
connectivity) that is composed of a set of network functions (e.g.
firewall, load balancer).
Virtualized Network Function (VNF): a VNF provides the same
functional behaviour and interfaces as the equivalent network
function, but is deployed as software instances (e.g. VMs) inside the
virtualization infrastructure (e.g. hypervisor) [NFV-TERM].
VNF Pool: a set of software instances (e.g. VMs) where each instance
can be configured to implement a specific VNF, and connected with
each other to support network service.
Pool Manager (PM): an entity that interacts with the VNF pool for pool
management, and with the network service for service requests and
responses.
3. Problem Statement
In the context of NFV, a network service essentially consists of a
set of VNFs, each built on top of the virtualization infrastructure
to implement a specific network function, and the data connections
between VNFs, as shown below.
                 Network Service (e.g. VOIP, Web)
    +----------+           +----------+           +----------+
    |   VNF    | data conn |   VNF    | data conn |   VNF    |
    |          |-----------|          |- ... ... -|          |
  | +----------+           +----------+           +----------+ |
  |____________________________________________________________|
                                 ^
                                 | Virtualization
    +--------------------------------------------------------+
    |    Virtualization Infrastructure (e.g. hypervisors)    |
    +--------------------------------------------------------+
Figure 1: Network Service and Virtualized Network Function.
Although there are existing approaches to reliable network service,
such as active-standby server mode, there are several reliability
issues unique to the NFV environment. These issues are described in
the following subsections.
3.1. Failure on Hypervisor
It is likely that more than one VNF instance will run on top of a
single hypervisor. Thus the failure of a single hypervisor will
affect multiple VNF instances which will potentially result in a
wider failure of network services. Therefore, the need to ensure
that the hypervisor does not become a single point of failure is
critical. Failover approaches at the hypervisor layer, including
hypervisor monitoring, fault detection, resource scaling, and
migration of VNF instances between host hypervisors, also need
consideration.
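As an illustration only, the following minimal sketch shows one way
hypervisor monitoring and fault detection could be approximated with a
heartbeat timeout, and how the affected VNF instances could then be
listed. The Hypervisor record, the 3-second timeout, and the function
names are assumptions made for this example; they are not defined by
this document or by ETSI NFV.

```python
from dataclasses import dataclass, field


@dataclass
class Hypervisor:
    # Hypothetical record; field names are illustrative assumptions.
    hypervisor_id: str
    last_heartbeat: float                      # timestamp in seconds
    vnf_instances: list = field(default_factory=list)


def failed_hypervisors(hypervisors, now, timeout=3.0):
    """Return hypervisors whose last heartbeat is older than `timeout`."""
    return [h for h in hypervisors if now - h.last_heartbeat > timeout]


def affected_instances(hypervisors, now, timeout=3.0):
    """List every VNF instance hosted on a hypervisor considered failed."""
    return [
        vnf
        for h in failed_hypervisors(hypervisors, now, timeout)
        for vnf in h.vnf_instances
    ]


hosts = [
    Hypervisor("hv-1", last_heartbeat=10.0, vnf_instances=["fw-1", "nat-1"]),
    Hypervisor("hv-2", last_heartbeat=14.5, vnf_instances=["lb-1"]),
]
# At t=15.0 only hv-1 has been silent for more than 3 seconds.
print(affected_instances(hosts, now=15.0))
```

The list returned for a failed hypervisor is the set of instances that
a migration or replacement procedure would need to handle.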
3.2. Reliable Data Connection
Establishing reliable VNF instances is important, as are reliable data
connections between VNF instances for reliable communication within
the network service. It is possible to achieve a certain level of
resiliency for data connections utilizing hypervisor layers, e.g.,
virtual network interfaces. However, there will always be potential
factors such as congestion and link failures in the physical network
layer that will affect the data connections between VNF instances.
Therefore, failover mechanisms that include link status monitoring
and redirection of traffic from the fault-affected data path to
another data path are required.
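The redirection step can be sketched as follows. This is a simplified
illustration under stated assumptions: a data path is modelled as a
named list of links, link status is a boolean supplied by some
monitoring function, and the names select_path, paths and link_status
are hypothetical.

```python
def select_path(paths, link_status, current):
    """Keep `current` while all of its links are up; otherwise return the
    first configured path whose links are all up, or None if none is."""

    def healthy(name):
        # A link with unknown status is treated as down (assumption).
        return all(link_status.get(link, False) for link in paths[name])

    if current in paths and healthy(current):
        return current
    for name in paths:
        if healthy(name):
            return name
    return None


# Two candidate data paths between a pair of VNF instances.
paths = {"primary": ["l1", "l2"], "backup": ["l3", "l4"]}
link_status = {"l1": True, "l2": False, "l3": True, "l4": True}

# Link l2 is down, so traffic is redirected to the backup path.
print(select_path(paths, link_status, current="primary"))
```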
3.3. Failure Separation
It is important to limit the impact on the service performance in
case of potential failure. The failure of a single hypervisor should
affect a minimum number of VNF instances, as well as a minimum number
of concurrent network services, based on the fact that multiple VNF
instances in one or more network services are likely to be hosted by
the same hypervisor. Therefore, an application may need to define
some affinity rules regarding the deployment of VNF instances, e.g.,
separate hypervisors, separate DC sites.
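A minimal sketch of such a deployment rule is given below. It assumes
the simplest possible rule, "every VNF instance of a service on a
separate hypervisor", and measures the impact of a single hypervisor
failure as the largest number of instances sharing one host. The
function names and identifiers are illustrative assumptions.

```python
from collections import Counter


def place_separately(instances, hypervisors):
    """Assign each instance to a distinct hypervisor.

    Returns a mapping instance -> hypervisor, or None when there are
    not enough hypervisors to honour the separation rule.
    """
    if len(instances) > len(hypervisors):
        return None
    return dict(zip(instances, hypervisors))


def blast_radius(placement):
    """Largest number of instances lost if any single hypervisor fails."""
    counts = Counter(placement.values())
    return max(counts.values(), default=0)


shared = {"fw-1": "hv-1", "nat-1": "hv-1", "lb-1": "hv-2"}
separated = place_separately(["fw-1", "nat-1", "lb-1"], ["hv-1", "hv-2", "hv-3"])

print(blast_radius(shared), blast_radius(separated))
```

With the shared placement a failure of hv-1 removes two instances,
while the separated placement limits any single failure to one.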
3.4. Reliability Class
It is well-known that network services will require different levels
of reliability. For example, real-time applications will require
reliable VNF instances to avoid disruption to delay-sensitive
services. However, different VNF instances may have varying
reliability and performance due to factors such as physical resource
state (e.g., server load, network bandwidth). Therefore, a network
service may need to request a certain class of reliability, which
would be provided once admission control (policy) has established the
application or user rights for the requested reliability level.
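The admission control step can be illustrated with the sketch below.
This document does not define concrete reliability classes, so the
three class names, the policy table and the requester identifiers are
purely hypothetical.

```python
from enum import IntEnum


class ReliabilityClass(IntEnum):
    # Hypothetical classes; higher value means stricter reliability.
    BEST_EFFORT = 0
    STANDARD = 1
    HIGH = 2


def admit(policy, requester, requested):
    """Grant the request only if policy entitles the requester to a
    class at least as strict as the one requested."""
    allowed = policy.get(requester)
    return allowed is not None and requested <= allowed


# Assumed policy: the highest class each application may request.
policy = {
    "voip-app": ReliabilityClass.HIGH,
    "web-app": ReliabilityClass.STANDARD,
}

print(admit(policy, "voip-app", ReliabilityClass.HIGH))
print(admit(policy, "web-app", ReliabilityClass.HIGH))
```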
4. Reliable Virtualized Network Function Pool
Some implementations of Cloud Management System (CMS) may have
already provided APIs to the cloud applications to improve
reliability, such as status notifications from different
Infrastructure as a Service (IaaS) products via plug-ins. However, a
certain degree of standardization is required in order to allow the
application, orchestrator and virtualization infrastructure to be
developed independently from each other.
Reliability and availability form a wide-ranging problem space within
the ETSI NFV ISG and the wider NFV environment. We do not aim to
solve all the reliability issues of NFV. Instead, we focus on
developing some tools to improve the reliability of NFV by using a
reliable VNF pool, including the set of reliable VNF instances and
the reliable transport connections between VNF instances, which are
applicable to a wide range of network services.
We introduce an applicable architecture of reliable VNF pool. Note
that the main purpose of this section is to scope potential solution
space. The specification of the components and interfaces of the
reliable VNF pool should be addressed in separate drafts [RSNDP]. A
high level diagram of the reliable VNF pool is illustrated below.
                      +-----------------+
                      | Network Service |
                      +-----------------+
                               ^
                               | Service Request / Response
                               V
                       +--------------+
                       | Pool Manager |
                       +--------------+
                               ^
                               | Pool Management
                               | (e.g. VNF instance failover,
                               |  transport conn failover)
                               V
     +--------------------------------------------------+
     | +----------+   +----------+         +----------+ |
     | |   VNF    |   |   VNF    |         |   VNF    | |
     | | Instance |   | Instance | ... ... | Instance | |
     | +----------+   +----------+         +----------+ |
     |                     VNF Pool                     |
     +--------------------------------------------------+
Figure 2: Reliable VNF Pool.
There are two major parts in the reliable VNF pool. The first part
is the interface between Pool Manager (PM) and VNF pool, which may
include the following functions:
1) VNF instance registration to PM. Characteristics of a VNF
instance include instance ID, VNF type, host hypervisor ID.
2) VNF instance status collection and fault management by PM. The
status of VNF instance may include load, liveness. Fault
management of VNF instance may include replacement of VNF
instance, as well as re-establishment of the associated transport
connections with other VNF instances.
3) Hypervisor status collection and fault management by PM. Fault
management of hypervisor may include migration of VNF instances
between host hypervisors, as well as re-establishment of the
associated transport connections.
4) Transport connection status collection and fault management by
PM. The status of transport connection may include congestion,
link failure. Fault management of transport connection may
include redirection of data traffic from one path to another.
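The functions listed above can be pictured with a small in-memory
sketch of a Pool Manager. It is not a protocol specification: the
class and method names are assumptions, only the registration
attributes (instance ID, VNF type, host hypervisor ID) and the status
items (load, liveness) come from the text, and the replacement policy
(least-loaded live instance of the same type on a different
hypervisor) is one possible choice among many.

```python
from dataclasses import dataclass


@dataclass
class VnfInstance:
    instance_id: str
    vnf_type: str
    hypervisor_id: str
    alive: bool = True
    load: float = 0.0


class PoolManager:
    """Hypothetical PM keeping registered VNF instances in memory."""

    def __init__(self):
        self.instances = {}

    def register(self, inst):
        # Function 1: VNF instance registration to the PM.
        self.instances[inst.instance_id] = inst

    def report_status(self, instance_id, alive, load):
        # Function 2: status collection (liveness and load).
        inst = self.instances[instance_id]
        inst.alive, inst.load = alive, load

    def find_replacement(self, failed_id):
        # Fault management: pick a live instance of the same VNF type.
        # Avoiding the failed instance's hypervisor is an assumption.
        failed = self.instances[failed_id]
        candidates = [
            i for i in self.instances.values()
            if i.alive
            and i.instance_id != failed_id
            and i.vnf_type == failed.vnf_type
            and i.hypervisor_id != failed.hypervisor_id
        ]
        return min(candidates, key=lambda i: i.load, default=None)


pm = PoolManager()
pm.register(VnfInstance("fw-1", "firewall", "hv-1"))
pm.register(VnfInstance("fw-2", "firewall", "hv-2", load=0.7))
pm.register(VnfInstance("fw-3", "firewall", "hv-3", load=0.2))
pm.report_status("fw-1", alive=False, load=0.0)
print(pm.find_replacement("fw-1").instance_id)
```

Re-establishing the transport connections of the replaced instance
would follow the replacement and is outside this sketch.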
The second part is the interface between PM and network service,
which may include the following functions:
1) Reliability class. A network service may request from a PM the
desired or required class of reliability of the service. The PM
will accordingly select VNF instances and establish transport
connections to fulfill the specific reliability requirements; if
the request cannot be met, the PM will notify the service of the
resource availability.
2) Failure separation. A network service may request from a PM
specific affinity rules regarding the deployment of VNF instances,
e.g., separate hypervisors, separate DC sites. The PM will select
VNF instances based on the affinity criteria to fulfill the
request; in the event that the request cannot be met, the PM will
notify the service of the resource availability.
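The request and notification behaviour of this interface can be
sketched as below, assuming a service asks for one instance of each of
several distinct VNF types with a "separate hypervisors" rule. The
tuple layout of the pool, the function name and the two result tags
are assumptions for illustration. The selection is greedy, so it may
report a request as unavailable even when a different assignment
would have satisfied it.

```python
def handle_request(pool, wanted_types, separate_hypervisors=True):
    """Select one instance per requested VNF type.

    `pool` is a list of (instance_id, vnf_type, hypervisor_id) tuples.
    Returns ("ok", {vnf_type: instance_id}) on success, otherwise
    ("unavailable", [types that could not be served]).
    """
    selection, used_hypervisors, missing = {}, set(), []
    for vnf_type in wanted_types:
        for instance_id, inst_type, hypervisor_id in pool:
            if inst_type != vnf_type:
                continue
            if separate_hypervisors and hypervisor_id in used_hypervisors:
                continue
            selection[vnf_type] = instance_id
            used_hypervisors.add(hypervisor_id)
            break
        else:
            # No suitable instance found for this type.
            missing.append(vnf_type)
    if missing:
        return ("unavailable", missing)
    return ("ok", selection)


pool = [
    ("fw-1", "firewall", "hv-1"),
    ("nat-1", "nat", "hv-1"),
    ("nat-2", "nat", "hv-2"),
]
print(handle_request(pool, ["firewall", "nat"]))
print(handle_request(pool, ["firewall", "nat", "lb"]))
```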
5. Gap Analysis and Related Works
This section presents some prior work and discusses the suitability
of existing solutions. Where applicable, the document will also
highlight work which may be extended to meet the requirements and
objectives of reliable VNF pools.
5.1. Reliable Server Pool
Reliable Server Pool (RSerPool) supports high availability and
scalability of applications through the use of pools of servers
[RFC5351]. The basic functions of RSerPool are:
1) Server pool management including server registration, server
fault management, load balancing, etc.;
2) Handling of client requests, and a way for the client to bind to
a desired server.
The main protocol developed by RSerPool is called Aggregate Server
Access Protocol (ASAP), which is responsible for the abstraction of
the transport layer protocols (e.g. TCP, SCTP), load balancing, fault
management, as well as presentation to the applications via a unified
primitive interface [RFC5352]. The architecture of RSerPool is shown
below.
                       +--------------+
                       | Application  |
                       +--------------+
                               ^
                               | Service Request / Response
                               V
                       +--------------+
                       |      PR      |
                       +--------------+
                               ^
                               | Pool Management
                               | (e.g. PE failover)
                               V
     +--------------------------------------------------+
     | +----------+   +----------+         +----------+ |
     | |    PE    |   |    PE    | ... ... |    PE    | |
     | +----------+   +----------+         +----------+ |
     |                   Server Pool                    |
     +--------------------------------------------------+
Figure 3: Reliable Server Pool.
The similarities and applicability of RSerPool to the reliable VNF
pool include:
1) Pool Elements (PEs) can be regarded as VNF instances;
2) The Pool Registrar (PR) in RSerPool has a role similar to that of
the PM in the reliable VNF pool from the perspective of PE
registration, PE fault management, PE selection, etc.
Nevertheless, there are some gaps for RSerPool such as:
1) No reliability class and failure separation support;
2) No transport layer reliability between any pair of VNF
instances;
3) No hypervisor layer fault management.
5.2. Multipath TCP
Multipath TCP (MPTCP) is a modified version of TCP that implements a
multipath transport and achieves enhanced reliability of the data
connection by pooling multiple paths within a transport connection,
transparently to the application [RFC6182]. MPTCP is primarily
concerned with utilizing multiple paths end-to-end, where one or both
of the end hosts are multi-homed. The following diagram illustrates
a typical usage scenario for MPTCP [RFC6182].
          +------+           __________           +------+
          |      |A1 ______ (          ) ______ B1|      |
          | Host |--/      (            )      \--| Host |
          |      |        (   Internet   )        |      |
          |  A   |--\______(            )______/--|  B   |
          |      |A2        (__________)        B2|      |
          +------+                                +------+
Figure 4: Scenario of MPTCP.
The applicability of MPTCP to reliable VNF pool includes:
1) Transport layer reliability based on multiple transport paths
between VNF instances.
There are some identified constraints on the deployment of MPTCP,
such as potential shared bottlenecks and the interposition of
middleware that terminates TCP sessions, which is common in the
context of end-to-end virtualized network services. Other gaps for
MPTCP will be further studied.
5.3. VNF Forwarding Graph
VNF forwarding graph (a.k.a. service chain in a wider sense) defines
the sequence of VNF instances that each user session must traverse
[NFV-UC]. An example of a VNF forwarding graph is where user packets
traverse the following sequence of VNF instances:
1) Intrusion Detection Device;
2) Firewall;
3) Network Address Translation (NAT);
4) Server Load Balancer.
Different network services have different VNF forwarding graphs based
on the specific user and, therefore, the service logic.
VNF forwarding graph and reliable VNF pool are independent of, but
complementary to, each other in the following aspects:
1) VNF forwarding graph determines the sequential relation between
VNF instances, while reliable VNF pool selects reliable VNF
instances and reliable transport connections between VNF
instances;
2) Reliable VNF pool focuses on the transport layer (e.g. SCTP,
MPTCP) for reliable data connections, which is independent of the
packet forwarding layer (e.g. L2/L3).
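The division of labour described above can be sketched as follows: the
forwarding graph supplies the order of VNF types, the reliable VNF
pool supplies the chosen instance for each type, and combining the two
yields the transport connections that need to be made reliable. The
type labels, instance identifiers and function name are assumptions
for this example.

```python
def resolve_chain(forwarding_graph, selected):
    """Combine a forwarding graph with a pool selection.

    `forwarding_graph` is an ordered list of VNF types and `selected`
    maps each type to the instance chosen by the pool. Returns the
    ordered instance hops and the adjacent pairs that each need a
    reliable transport connection.
    """
    hops = [selected[vnf_type] for vnf_type in forwarding_graph]
    connections = list(zip(hops, hops[1:]))
    return hops, connections


# Order taken from the example graph in this section.
forwarding_graph = ["ids", "firewall", "nat", "load-balancer"]
# Hypothetical selection made by the reliable VNF pool.
selected = {
    "ids": "ids-2",
    "firewall": "fw-1",
    "nat": "nat-2",
    "load-balancer": "lb-1",
}

hops, connections = resolve_chain(forwarding_graph, selected)
print(hops)
print(connections)
```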
6. Security Considerations
TBD.
7. IANA Considerations
This document has no actions for IANA.
8. Acknowledgements
The authors would like to thank Daniel King from Lancaster University,
UK for the valuable comments to this draft.
9. References
9.1. Normative References
TBD.
9.2. Informative References
[NFV-WP] NFV Whitepaper: "Network Function Virtualization", issue 1,
2012, http://portal.etsi.org/NFV/NFV_White_Paper.pdf.
[NFV-REL] ETSI GS NFV REL 001: "Network Function Virtualization;
Resiliency Requirements", Version 0.0.1, 2013.
[NFV-TERM] ETSI GS NFV 003: "Terminology for Main Conceptional
Entities in NFV", Version 0.0.4, 2013.
[RSNDP] Q. Wu, "An Overview of Reliable Service Nodes discovery and
provision Protocols", draft-wu-rsndp-overview-00, 2013.
[RFC5351] P. Lei, L. Ong, M. Tuexen and T. Dreibholz, "An Overview of
Reliable Server Pooling Protocols", RFC5351, September 2008.
[RFC5352] R. Stewart, Q. Xie, M. Stillman and M. Tuexen, "Aggregate
Server Access Protocol (ASAP)", RFC5352, September 2008.
[RFC6182] A. Ford, C. Raiciu, M. Handley, S. Barre and J. Iyengar,
"Architectural Guidelines for Multipath TCP Development", RFC6182,
March 2011.
[NFV-UC] ETSI GS NFV 001: "Network Function Virtualization; Use
Cases", Version 0.0.2, 2013.
Author's Address
Ning Zong
Huawei Technologies
Email: zongning@huawei.com