Network Working Group S. Poretsky
Internet Draft Allot Communications
Expires: Jan 2011 Rajiv Papneja
Intended Status: Informational Isocore
J. Karthik
S. Vapiwala
Cisco Systems
July 2010
Benchmarking Terminology
for Protection Performance
<draft-ietf-bmwg-protection-term-09.txt>
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on 7 Jan, 2011.
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described
in Section 4.e of the Trust Legal Provisions and are provided
without warranty as described in the Simplified BSD License.
This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s)
controlling the copyright in such materials, this document may not
be modified outside the IETF Standards Process, and derivative
works of it may not be created outside the IETF Standards Process,
except to format it for publication as an RFC or to translate it
into languages other than English.
Poretsky, Papneja, Karthik, Vapiwala Expires Jan 2011 [Page 1]
Internet-Draft Benchmarking Terminology for July 2010
Protection Performance
Abstract
This document provides common terminology and metrics for benchmarking
the performance of sub-IP layer protection mechanisms. The performance
benchmarks are measured at the IP layer, while protection may be
provided at the sub-IP layer. The benchmarks and terminology can be
applied in methodology documents for different sub-IP layer protection
mechanisms such as Automatic Protection Switching (APS), Virtual Router
Redundancy Protocol (VRRP), Stateful High Availability (HA), and
Multi-Protocol Label Switching Fast Reroute (MPLS-FRR).
Table of Contents
1. Introduction
2. Existing definitions
3. Test Considerations
3.1. Paths
3.1.1. Path
3.1.2. Working Path
3.1.3. Primary Path
3.1.4. Protected Primary Path
3.1.5. Backup Path
3.1.6. Standby Backup Path
3.1.7. Dynamic Backup Path
3.1.8. Disjoint Paths
3.1.9. Point of Local Repair (PLR)
3.1.10. Shared Risk Link Group (SRLG)
3.2. Protection Mechanisms
3.2.1. Link Protection
3.2.2. Node Protection
3.2.3. Path Protection
3.2.4. Backup Span
3.2.5. Local Link Protection
3.2.6. Redundant Node Protection
3.2.7. State Control Interface
3.2.8. Protected Interface
3.3. Protection Switching
3.3.1. Protection Switching System
3.3.2. Failover Event
3.3.3. Failure Detection
3.3.4. Failover
3.3.5. Restoration
3.3.6. Reversion
3.4. Nodes
3.4.1. Protection-Switching Node
3.4.2. Non-Protection Switching Node
3.4.3. Headend Node
3.4.4. Backup Node
3.4.5. Merge Node
3.4.6. Primary Node
3.4.7. Standby Node
3.5. Benchmarks
3.5.1. Failover Packet Loss
3.5.2. Reversion Packet Loss
3.5.3. Failover Time
3.5.4. Reversion Time
3.5.5. Additive Backup Delay
3.6. Failover Time Calculation Methods
3.6.1. Time-Based Loss Method
3.6.2. Packet-Loss Based Method
3.6.3. Timestamp-Based Method
4. Acknowledgments
5. IANA Considerations
6. Security Considerations
7. References
8. Authors' Addresses
1. Introduction
The IP network layer provides route convergence to protect data
traffic against planned and unplanned failures in the Internet. Fast
convergence times are critical to maintain reliable network
connectivity and performance. Convergence Events [6] are recognized
at the IP layer, triggering Route Convergence [6]. Technologies
that function at sub-IP layers can be enabled to provide further
protection of IP traffic by recovering from the failure at the
sub-IP layer so that the outage is not observed at the IP layer.
Such sub-IP protection technologies include, but are not limited to,
High Availability (HA) stateful failover, Virtual Router Redundancy
Protocol (VRRP) [8], Automatic Protection Switching (APS) for SONET/SDH,
Resilient Packet Ring (RPR) for Ethernet, and Fast Reroute for
Multi-Protocol Label Switching (MPLS-FRR) [9].
1.1. Scope
Benchmarking terminology was defined for IP-layer convergence in
[6]. Different terminology and methodologies specific to
benchmarking sub-IP layer protection mechanisms are required. The
metrics for benchmarking the performance of sub-IP protection
mechanisms are measured at the IP layer, so that the results are
always measured in reference to IP and independent of the specific
protection mechanism being used. The purpose of this document is
to provide a single terminology for benchmarking sub-IP protection
mechanisms.
A common terminology for Sub-IP layer protection mechanism
benchmarking enables different implementations of a protection
mechanism to be benchmarked and evaluated. In addition,
implementations of different protection mechanisms can be
benchmarked and evaluated. It is intended that separate methodology
documents for each sub-IP protection mechanism be based upon this
common terminology document. The terminology can be applied to
methodologies that benchmark sub-IP protection mechanism performance
with a single stream of traffic or multiple streams of traffic. The
traffic flow may be unidirectional or bidirectional, as indicated in
the methodology.
1.2. General Model
The sequence of events to benchmark the performance of Sub-IP
Protection Mechanisms is as follows:
1. Failover Event - Primary Path fails
2. Failure Detection - Failover Event is detected
3. Failover - Backup Path becomes the Working Path due to Failover
Event
4. Restoration - Primary Path recovers from a Failover Event
5. Reversion (optional) - Primary Path becomes the Working Path
These terms are further defined in this document.
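The event sequence above can be read as a small state machine over
which path is the Working Path. The following Python sketch is a
hypothetical illustration only: the event names, the PathRole enum,
and the WorkingPathTracker class are assumptions made for this
example, and Reversion is accepted only when the system supports it,
matching step 5 being optional.

```python
from enum import Enum


class PathRole(Enum):
    PRIMARY = "Primary Path"
    BACKUP = "Backup Path"


class WorkingPathTracker:
    """Hypothetical tracker of the Working Path through the General Model.

    Events must arrive in the order listed in ORDER; "reversion" is
    accepted only if the system supports Reversion.
    """

    ORDER = ["failover_event", "failure_detection", "failover",
             "restoration", "reversion"]

    def __init__(self, reversion_supported: bool) -> None:
        self.reversion_supported = reversion_supported
        self.working = PathRole.PRIMARY  # Working Path before the Failover Event
        self.primary_up = True
        self.step = 0                    # index of the next expected event

    def handle(self, event: str) -> PathRole:
        expected = self.ORDER[self.step] if self.step < len(self.ORDER) else None
        if event != expected:
            raise ValueError(f"expected {expected!r}, got {event!r}")
        if event == "reversion" and not self.reversion_supported:
            raise ValueError("Reversion is not supported")
        self.step += 1
        if event == "failover_event":
            self.primary_up = False          # 1. Primary Path fails
        elif event == "failover":
            self.working = PathRole.BACKUP   # 3. Backup Path becomes the Working Path
        elif event == "restoration":
            self.primary_up = True           # 4. Primary Path recovers, Backup still working
        elif event == "reversion":
            self.working = PathRole.PRIMARY  # 5. Primary Path becomes the Working Path
        # "failure_detection" (step 2) does not change the Working Path.
        return self.working


tracker = WorkingPathTracker(reversion_supported=True)
history = [tracker.handle(e).value for e in WorkingPathTracker.ORDER]
print(history)
# ['Primary Path', 'Primary Path', 'Backup Path', 'Backup Path', 'Primary Path']
```

Note that Failure Detection and Restoration change no forwarding
decision on their own; only Failover and Reversion move traffic.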
Figures 1 through 5 show models that MAY be used when benchmarking
Sub-IP Protection mechanisms, which MUST use a Protection Switching
System that consists of a minimum of two Protection-Switching Nodes:
an Ingress Node known as the Headend Node and an Egress Node known
as the Merge Node. The Protection Switching System MUST include
either a Primary Path and Backup Path, as shown in Figures 1 through
4, or a Primary Node and Standby Node, as shown in Figure 5. A
Protection Switching System may provide link protection, node
protection, path protection, local link protection, and high
availability, as shown in Figures 1 through 5 respectively. A
Failover Event occurs along the Primary Path or at the Primary Node.
The Working Path is the Primary Path prior to the Failover Event and
the Backup Path after the Failover Event. The Tester is placed
outside the two paths or nodes and sends and receives IP traffic
along the Working Path. The Tester MUST record the IP packet sequence
numbers, departure time, and arrival time so that the metrics of
Failover Time, Additive Latency, Packet Reordering, Duplicate
Packets, and Reversion Time can be measured. The Tester may be a
single device or a test system. If Reversion is supported, then the
Working Path is
the Primary Path after Restoration (Failure Recovery) of the Primary
Path.
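A minimal sketch of how the Tester's per-packet records (sequence
number, departure time, arrival time) might be reduced to counts of
lost, duplicate, and out-of-order packets. The function name
summarize, the input layout, and the classification rules are
assumptions for illustration: a duplicate is any repeat arrival of a
sequence number, and a packet is counted as out of order when its
sequence number is lower than the highest one already received,
similar in spirit to [7]. The normative definitions remain those of
the referenced documents.

```python
from typing import Dict, List, Tuple


def summarize(sent: Dict[int, float],
              arrivals: List[Tuple[int, float]]) -> dict:
    """Reduce Tester records to loss, duplicate, and reordering counts.

    sent:     sequence number -> departure time (seconds)
    arrivals: (sequence number, arrival time) for every received copy
    """
    seen = set()
    duplicates = 0
    out_of_order = 0
    highest = None          # highest sequence number received so far
    delays = []             # one-way delay of each first-received copy
    for seq, t_arr in sorted(arrivals, key=lambda a: a[1]):
        if seq in seen:
            duplicates += 1  # repeat arrival of an already-received packet
            continue
        seen.add(seq)
        if highest is not None and seq < highest:
            out_of_order += 1  # arrived after a higher sequence number
        if highest is None or seq > highest:
            highest = seq
        delays.append(t_arr - sent[seq])
    return {
        "lost": len(set(sent) - seen),
        "duplicates": duplicates,
        "out_of_order": out_of_order,
        "max_delay": max(delays) if delays else None,
    }


# 1000 packets/s: packets 3 and 4 are lost, 6 arrives late, 7 is duplicated.
sent = {seq: seq * 0.001 for seq in range(10)}
arrivals = [(0, 0.0005), (1, 0.0015), (2, 0.0025), (5, 0.0060), (7, 0.0075),
            (6, 0.0078), (7, 0.0079), (8, 0.0085), (9, 0.0095)]
result = summarize(sent, arrivals)
print(result["lost"], result["duplicates"], result["out_of_order"])  # 2 1 1
```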
Link Protection, as shown in Figure 1, provides protection when a
Failover Event occurs on the link between two nodes along the Primary
Path. Node Protection, as shown in Figure 2, provides protection
when a Failover Event occurs at a Node along the Primary Path.
Path Protection, as shown in Figure 3, provides protection for link
or node failures for multiple hops along the Primary Path. Local
Link Protection, as shown in Figure 4, provides Sub-IP Protection of
a link between two nodes, without a Backup Node. An example of such
a Sub-IP Protection mechanism is SONET APS. High Availability
Protection, as shown in Figure 5, provides protection of a Primary
Node with a redundant Standby Node. State Control is provided
between the Primary and Standby Nodes. Failure of the Primary Node
is detected at the Sub-IP layer to force traffic to switch to the
Standby Node, which has state maintained for zero or minimal packet
loss.
+-----------+
+--------------| Tester |<-----------------------+
| +-----------+ |
| IP Traffic | Failover IP Traffic |
| | Event |
| ------------ | ---------- |
+--->| Ingress/ | V | Egress/ |---+
|Headend Node|------------------|Merge Node| Primary
------------ ---------- Path
| ^
| --------- | Backup
+--------| Backup |-------------+ Path
| Node |
---------
Figure 1. System Under Test (SUT) for Sub-IP Link Protection
+-----------+
+--------------------| Tester |<-----------------+
| +-----------+ |
| IP Traffic | Failover IP Traffic |
| | Event |
| V |
| ------------ -------- ---------- |
+--->| Ingress/ | |MidPoint| | Egress/ |---+
|Headend Node|----| Node |----|Merge Node| Primary
------------ -------- ---------- Path
| ^
| --------- | Backup
+--------| Backup |-------------+ Path
| Node |
---------
Figure 2. System Under Test (SUT) for Sub-IP Node Protection
+-----------+
+---------------------------| Tester |<----------------------+
| +-----------+ |
| IP Traffic | Failover IP Traffic |
| | Event |
| Primary Path | |
| ------------ -------- | -------- ---------- |
+--->| Ingress/ | |MidPoint| V |Midpoint| | Egress/ |---+
|Headend Node|----| Node |---| Node |---|Merge Node|
------------ -------- -------- ----------
| ^
| --------- -------- | Backup
+--------| Backup |----| Backup |--------+ Path
| Node | | Node |
--------- --------
Figure 3. System Under Test (SUT) for Sub-IP Path Protection
+-----------+
+--------------------| Tester |<-------------------+
| +-----------+ |
| IP Traffic | Failover IP Traffic |
| | Event |
| Primary | |
| +--------+ Path v +--------+ |
| | |------------------------>| | |
+--->| Ingress| | Egress |----+
| Node |- - - - - - - - - - - - >| Node |
+--------+ Backup Path +--------+
| |
| IP-Layer Forwarding |
+<----------------------------------------->+
Figure 4. System Under Test (SUT) for Sub-IP Local Link Protection
+-----------+
+-----------------| Tester |<--------------------+
| +-----------+ |
| IP Traffic | Failover IP Traffic |
| | Event |
| V |
| --------- -------- ---------- |
+--->| Ingress | |Primary | | Egress/ |------+
| Node |----| Node |----|Merge Node| Primary
--------- -------- ---------- Path
| State |Control ^
| Interface |(Optional) |
| --------- |
+---------| Standby |---------+
| Node |
---------
Figure 5. System Under Test (SUT) for Sub-IP Redundant Node Protection
Some protection switching technologies may use a series of
steps that differ from the general model. The specific differences
SHOULD be highlighted in each technology-specific methodology.
Note that some protection switching technologies have the ability
to re-optimize the Working Path after a node or link failure.
2. Existing definitions
This document uses existing terminology defined in other BMWG
work. Examples include, but are not limited to:
Latency [Ref.[2], section 3.8]
Frame Loss Rate [Ref.[2], section 3.6]
Throughput [Ref.[2], section 3.17]
Device Under Test (DUT) [Ref.[3], section 3.1.1]
System Under Test (SUT) [Ref.[3], section 3.1.2]
Offered Load [Ref.[3], section 3.5.2]
Out-of-order Packet [Ref.[4], section 3.3.2]
Duplicate Packet [Ref.[4], section 3.3.3]
Forwarding Delay [Ref.[4], section 3.2.4]
Jitter [Ref.[4], section 3.2.5]
Packet Loss [Ref.[6], section 3.5]
Packet Reordering [Ref.[7], section 3.3]
This document has the following frequently used acronyms:
DUT Device Under Test
SUT System Under Test
This document adopts the definition format in Section 2 of RFC 1242
[2]. Terms defined in this document are capitalized when used
within this document.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119 [5].
RFC 2119 defines the use of these key words to help make the
intent of standards track documents as clear as possible. While this
document uses these key words, this document is not a standards track
document.
3. Test Considerations
3.1. Paths
3.1.1. Path
Definition:
A unidirectional sequence of nodes, <R1, ..., Rn>, and links
<L12,... L(n-1)n> with the following properties:
a. R1 is the ingress node; it forwards IP packets that enter
the DUT/SUT to R2 as sub-IP frames over link L12.
b. Ri is a node that forwards data frames to R(i+1) over link
Li(i+1), for all i with 1 < i < n, based on information in
the sub-IP layer.
c. Rn is the egress node; it outputs sub-IP frames from the
DUT/SUT as IP packets. L(n-1)n is the link between R(n-1)
and Rn.
Discussion:
The path is defined in the sub-IP layer in this document, unlike
an IP path in RFC 2026 [1]. One path may be regarded as being
equivalent to one IP link between two IP nodes, i.e., R1 and Rn.
The two IP nodes may have multiple paths for protection. A given
packet travels on only one path between the nodes, although the
packets belonging to a microflow [10] may traverse one or more
paths, for example before and after a Failover Event.
The path is unidirectional. For example, the link between R1
and R2 in the direction from R1 to R2 is L12. For traffic
flowing in the reverse direction from R2 to R1, the link is L21.
Example paths are the SONET/SDH path and the label switched path
for MPLS.
Measurement units:
n/a
Issues:
"A bidirectional path", which transmits traffic in both
directions along the same nodes, consists of two unidirectional
paths. Therefore, the two unidirectional paths belonging to
"one bidirectional path" will be treated independently when
benchmarking for "a bidirectional path".
See Also:
Working Path
Primary Path
Backup Path
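To make the indexing in the definition concrete, the following Python
sketch derives the link names <L12, ..., L(n-1)n> from a node
sequence <R1, ..., Rn>. The helper links_of and the "R<k>" node
naming are assumptions for illustration. It also reflects the Issues
note: reversing the node sequence yields a separate unidirectional
path whose links (for example L21) are distinct from those of the
forward path.

```python
from typing import List


def links_of(nodes: List[str]) -> List[str]:
    """Return link names Li(i+1) for a unidirectional path R1..Rn.

    Node names are assumed to look like "R<k>"; the link from Ri to Rj
    is written "L" + i + j, following the notation of the definition
    (parentheses such as L(n-1)n are omitted, so this is unambiguous
    only for single-digit indices).
    """
    if len(nodes) < 2:
        raise ValueError("a path needs at least an ingress and an egress node")
    return [f"L{a[1:]}{b[1:]}" for a, b in zip(nodes, nodes[1:])]


forward = ["R1", "R2", "R3", "R4"]
print(links_of(forward))                  # ['L12', 'L23', 'L34']
print(links_of(list(reversed(forward))))  # ['L43', 'L32', 'L21']
```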
3.1.2. Working Path
Definition:
The path that the DUT/SUT is currently using to forward
packets.
Discussion:
A Primary Path is the Working Path before occurrence of a
Failover Event. A Backup Path shall become the Working Path
after a Failover Event.
Measurement units:
n/a
Issues:
See Also:
Path
Primary Path
Backup Path
3.1.3. Primary Path
Definition:
The preferred point-to-point path for forwarding traffic
between two or more nodes.
Discussion:
The Primary Path is the Path that traffic traverses
prior to a Failover Event.
Measurement units:
n/a
Issues:
None
See Also:
Path
Failover Event
3.1.4. Protected Primary Path
Definition:
A Primary Path that is protected with a Backup Path.
Discussion:
A Protected Primary Path must include at least one Protection
Switching Node.
Measurement units:
n/a
Issues: None
See Also:
Path
Primary Path
3.1.5. Backup Path
Definition:
A path that exists to carry data traffic only if a Failover
Event occurs on a Primary Path.
Discussion:
The Backup Path shall become the Working Path upon a Failover
Event. A Path may have one or more Backup Paths. A Backup
Path may protect one or more Primary Paths. There are various
types of Backup Paths:
a. dedicated recovery Backup Path (1+1) or (1:1), which has
100% redundancy for a specific Primary Path,
b. shared Backup Path (1:N), which protects more than one
specific Primary Path, and
c. associated shared Backup Path (M:N), for which a specific
set of Backup Paths protects a specific set of more than one
Primary Path.
A Backup Path may be signaled or unsignaled. The Backup Path
must be created prior to the Failover Event. The Backup Path
generally originates at the Point of Local Repair (PLR) and
terminates at a node along the Primary Path.
Measurement units:
n/a
Issues:
See Also:
Path
Working Path
Primary Path
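The three Backup Path types can be told apart by how many Backup Paths
(M) protect how many Primary Paths (N). The Python sketch below is a
hypothetical classifier written for the categories listed above; the
function name and the permanently_bridged flag are assumptions. The
distinction between 1+1 and 1:1 uses the common convention, assumed
here rather than stated in this document, that in 1+1 the traffic is
sent on both paths at all times, while in 1:1 the Backup Path carries
the protected traffic only after a Failover Event.

```python
def classify_backup(m_backup: int, n_primary: int,
                    permanently_bridged: bool = False) -> str:
    """Classify M Backup Paths protecting N Primary Paths.

    permanently_bridged (assumed flag): True when traffic is sent on
    both the Primary Path and the Backup Path at all times (1+1).
    """
    if m_backup < 1 or n_primary < 1:
        raise ValueError("need at least one Backup Path and one Primary Path")
    if m_backup == 1 and n_primary == 1:
        # (a) dedicated recovery: 100% redundancy for one Primary Path
        return "1+1" if permanently_bridged else "1:1"
    if m_backup == 1:
        # (b) shared: one Backup Path protects several Primary Paths
        return f"1:{n_primary}"
    if n_primary > 1:
        # (c) associated shared: a set of Backup Paths protects a set
        #     of more than one Primary Path
        return f"{m_backup}:{n_primary}"
    # Several Backup Paths for one Primary Path is allowed ("A Path may
    # have one or more Backup Paths") but is none of types (a)-(c).
    return "other"


print(classify_backup(1, 1, permanently_bridged=True))  # 1+1
print(classify_backup(1, 4))                            # 1:4
print(classify_backup(2, 5))                            # 2:5
```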
3.1.6. Standby Backup Path
Definition:
A Backup Path that is established prior to a Failover Event
to protect a Primary Path.
Discussion:
The Standby Backup Path and Dynamic Backup Path provide
protection, but are established at different times.
Measurement units: n/a
Issues: None
See Also:
Backup Path
Primary Path
Failover Event
3.1.7. Dynamic Backup Path
Definition:
A Backup Path that is established upon occurrence of a
Failover Event.
Discussion:
The Standby Backup Path and Dynamic Backup Path provide
protection, but are established at different times.
Measurement units: n/a
Issues: None
See Also:
Backup Path
Standby Backup Path
Failover Event
3.1.8. Disjoint Paths
Definition:
A pair of paths that do not share a common link or node, other
than the ingress and egress nodes.
Discussion:
Two paths are disjoint if they do not share a common node or link
other than the ingress and egress.
Measurement units: n/a
Issues: None
See Also:
Path
Primary Path
SRLG
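The disjointness test in the Discussion can be sketched directly. In
this hypothetical Python helper, both paths are given as node lists
that start at the same ingress and end at the same egress; interior
nodes and directed links are compared, and the ingress and egress
nodes themselves are excluded from the node comparison. The node
names in the usage lines are taken loosely from Figure 2 and are only
illustrative.

```python
from typing import List, Set, Tuple


def _links(nodes: List[str]) -> Set[Tuple[str, str]]:
    # Directed links (Ri, Ri+1) along a unidirectional path.
    return set(zip(nodes, nodes[1:]))


def are_disjoint(p: List[str], q: List[str]) -> bool:
    """True if p and q share no node or link other than ingress and egress."""
    if p[0] != q[0] or p[-1] != q[-1]:
        raise ValueError("paths must share the same ingress and egress")
    shared_nodes = set(p[1:-1]) & set(q[1:-1])
    shared_links = _links(p) & _links(q)
    return not shared_nodes and not shared_links


primary = ["Headend", "MidPoint", "Merge"]   # Figure 2 Primary Path
backup = ["Headend", "Backup", "Merge"]      # Figure 2 Backup Path
print(are_disjoint(primary, backup))                                     # True
print(are_disjoint(primary, ["Headend", "MidPoint", "Backup", "Merge"]))  # False
```

A direct ingress-to-egress link used by both paths counts as a shared
link, so such a pair is not reported as disjoint.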
3.1.9. Point of Local Repair (PLR)
Definition:
A node capable of Failover along the Primary Path that is
also the ingress node for the Backup Path to protect another
node or link.
Discussion:
Any node along the Primary Path from the ingress node to
the penultimate node may be a PLR; that is, the PLR can be
any node along the Primary Path except its egress node.
There can be multiple PLRs along a Primary Path, and a PLR
may use a single Backup Path for protecting one or more
Primary Paths. The PLR must be an ingress to a Backup Path.
The PLR may simultaneously be the Headend Node when it
serves as the ingress to both the Primary Path and the
Backup Path. If the PLR is also the Headend Node, then the
Backup Path is a Disjoint Path from the ingress to the
Merge Node.
Measurement units: n/a
Issues: None
See Also:
Primary Path
Backup Path
Failover
3.1.10. Shared Risk Link Group (SRLG)
Definition:
An SRLG is a set of links that share the same risk (physical
or logical) within a network.
Discussion:
An SRLG identifies a set of links to be avoided when the
Primary Path and Backup Path are required to be disjoint.
The links in an SRLG will fail as a group if the shared
resource (physical, or logical such as a common software
version) fails.
Measurement units: n/a
Issues: None
See Also:
Path
Primary Path
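Link-level disjointness can be extended to SRLGs: a Backup Path
avoids the Primary Path's shared risks only if none of its links
belongs to an SRLG that also contains a Primary Path link. The Python
sketch below assumes a hypothetical mapping from link names to sets
of integer SRLG identifiers; links absent from the mapping are
treated as belonging to no SRLG.

```python
from typing import Dict, Iterable, Set


def srlg_disjoint(primary_links: Iterable[str], backup_links: Iterable[str],
                  srlg_of: Dict[str, Set[int]]) -> bool:
    """True if no Backup Path link shares a link or an SRLG with the Primary Path."""
    primary = list(primary_links)
    backup = list(backup_links)
    if set(primary) & set(backup):
        return False  # a shared link is never disjoint
    primary_groups: Set[int] = set()
    for link in primary:
        primary_groups |= srlg_of.get(link, set())
    return all(not (srlg_of.get(link, set()) & primary_groups) for link in backup)


srlg_of = {"L12": {10}, "L14": {10}, "L15": set()}  # hypothetical SRLG IDs
print(srlg_disjoint(["L12", "L23"], ["L14", "L43"], srlg_of))  # False (SRLG 10)
print(srlg_disjoint(["L12", "L23"], ["L15", "L53"], srlg_of))  # True
```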
3.2. Protection Mechanisms
3.2.1. Link Protection
Definition:
A Backup Path that is signaled to at least one Backup Node
to protect against failure of interfaces and links along a
Primary Path.
Discussion:
Link Protection may or may not protect the entire Primary
Path. Link protection is shown in Figure 1.
Measurement units: n/a
Issues: None
See Also:
Primary Path
Backup Path
3.2.2. Node Protection
Definition:
A Backup Path that is signaled to at least one Backup Node
to protect against failure of interfaces, links, and nodes
along a Primary Path.
Discussion:
Node Protection may or may not protect the entire Primary
Path. Node Protection also provides Link Protection.
Node Protection is shown in Figure 2.
Measurement units: n/a
Issues: None
See Also:
Link Protection
3.2.3. Path Protection
Definition:
A Backup Path that is signaled to at least one Backup Node
to provide protection along the entire Primary Path.
Discussion:
Path Protection provides Node Protection and Link Protection
for every node and link along the Primary Path. A Backup
Path providing Path Protection may have the same ingress
node as the Primary Path. Path Protection is shown in
Figure 3.
Measurement units: n/a
Issues: None
See Also:
Primary Path
Backup Path
Node Protection
Link Protection
3.2.4. Backup Span
Definition:
The number of hops used by a Backup Path.
Discussion:
The Backup Span is an integer obtained by counting the
number of nodes along the Backup Path.
Measurement units:
number of nodes
Issues:
None
See Also:
Primary Path
Backup Path
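The Definition speaks of hops while the Discussion counts nodes, and
the two integers differ by one for the same path. The Python sketch
below follows the Discussion and counts every node listed on the
Backup Path; including the Backup Path's ingress (PLR) and egress
(Merge Node) in that count is an assumption made here. The hop (link)
count is shown alongside for comparison, using the Backup Path drawn
in Figure 3 with hypothetical node names.

```python
from typing import List


def backup_span(backup_path_nodes: List[str]) -> int:
    """Backup Span as the number of nodes along the Backup Path
    (assumed here to include its ingress and egress)."""
    return len(backup_path_nodes)


def backup_hops(backup_path_nodes: List[str]) -> int:
    """Number of links traversed, for comparison with the node count."""
    return max(len(backup_path_nodes) - 1, 0)


# Figure 3: Headend Node -> Backup Node -> Backup Node -> Merge Node
fig3_backup = ["Headend", "Backup1", "Backup2", "Merge"]
print(backup_span(fig3_backup), backup_hops(fig3_backup))  # 4 3
```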
3.2.5. Local Link Protection
Definition:
A Backup Path that is a redundant path between two nodes
which does not use a Backup Node.
Discussion:
Local Link Protection must be provided as a Backup Path
between two nodes along the Primary Path without the use
of a Backup Node. Local Link Protection is provided by
Protection Switching Systems such as SONET APS. Local
Link Protection is shown in Figure 4.
Measurement units: n/a
Issues: None
See Also:
Backup Path
Backup Node
3.2.6. Redundant Node Protection
Definition:
A Protection Switching System with a Primary Node
protected by a Standby Node along the Primary Path.
Discussion:
Redundant Node Protection is provided by Protection
Switching Systems such as VRRP and HA. The protection
mechanisms occur at Sub-IP layers to switch traffic from
a Primary Node to a Standby Node upon a Failover Event at
the Primary Node. Traffic continues to traverse the
Primary Path through the Standby Node. The failover may
be stateful, in which case state information may be
exchanged in-band or over an out-of-band State Control
Interface. The Standby Node may be active or passive.
Redundant Node Protection is shown in Figure 5.
Measurement units: n/a
Issues: None
See Also:
Primary Path
Primary Node
Standby Node
3.2.7. State Control Interface
Definition:
An out-of-band control interface used to exchange state
information between the Primary Node and Standby Node.
Discussion:
The State Control Interface may be used for Redundant Node
Protection. The State Control Interface should be out-of-band.
It is possible to have Redundant Node Protection in which
there is no state control or state control is provided
in-band. The State Control Interface between the Primary
and Standby Nodes may span one or more hops.
Measurement units: n/a
Issues: None
See Also:
Primary Node
Standby Node
3.2.8. Protected Interface
Definition:
An interface along the Primary Path that is protected by
a Backup Path.
Discussion:
A Protected Interface is an interface protected by a
Protection Switching System that provides Link
Protection, Node Protection, Path Protection, Local
Link Protection, or Redundant Node Protection.
Measurement units: n/a
Issues: None
See Also:
Primary Path
Backup Path
3.3. Protection Switching
3.3.1. Protection Switching System
Definition:
A DUT/SUT that is capable of Failure Detection and Failover
from a Primary Path to a Backup Path or Standby Node when a
Failover Event occurs.
Discussion:
The Protection Switching System must include either a
Primary Path and Backup Path, as shown in Figures 1 through
4, or a Primary Node and Standby Node, as shown in Figure
5. The Backup Path may be a Standby Backup Path or a
Dynamic Backup Path. The Protection Switching System
includes the mechanisms for both Failure Detection and
Failover.
Measurement units: n/a
Issues: None
See Also:
Primary Path
Backup Path
Failover
3.3.2. Failover Event
Definition:
The occurrence of a planned or unplanned action in the network
that results in a change in the Path that data traffic traverses.
Discussion:
Failover Events include, but are not limited to, link failure
and router failure. Routing changes are considered Convergence
Events [6] and are not Failover Events; this restricts Failover
Events to sub-IP layers. Failover Events are distinguished from
routing changes and Convergence Events [6] by the detection of
the failure and subsequent protection switching at a sub-IP
layer. Failover occurs at a Point of Local Repair (PLR), which
may be the ingress, or at a Primary Node. If Failover occurs at
the ingress, the Backup Path is generally a Disjoint Path from
the ingress to the egress. The change in path after Failover
may have a Backup Span of one or more nodes.
Measurement units:
n/a
Issues: None
See Also:
Path
Failure Detection
Disjoint Paths
3.3.3. Failure Detection
Definition:
The process to identify at a sub-IP layer a Failover Event
at a Primary Node or along the Primary Path.
Discussion:
Failure Detection occurs at the Primary Node or ingress node
of the Primary Path. Failure Detection occurs via a sub-IP
mechanism such as detection of a link down event or a timeout
for receipt of a control packet. A failure may be completely
isolated, may affect a set of links that share a single SRLG
(e.g., a port with many sub-interfaces), or may affect
multiple links that are not part of an SRLG.
Measurement units: n/a
Issues:
See Also:
Primary Path
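As a concrete instance of detection by timeout for receipt of a
control packet, VRRP version 2 (RFC 3768) has a Backup router declare
the Master down when no advertisement arrives within
Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time, where
Skew_Time = (256 - Priority) / 256 seconds. The Python sketch below
only evaluates those two formulas; the function name is hypothetical,
it is not a VRRP implementation, and later VRRP versions compute
these timers differently.

```python
def vrrpv2_master_down_interval(advertisement_interval_s: float,
                                priority: int) -> float:
    """Master_Down_Interval in seconds from the VRRPv2 timer formulas.

    priority is the Backup router's priority (1-254 for a Backup router).
    """
    if not 1 <= priority <= 254:
        raise ValueError("a Backup router's priority is 1-254")
    skew_time = (256 - priority) / 256
    return 3 * advertisement_interval_s + skew_time


# Default 1 s Advertisement_Interval, default priority 100:
print(vrrpv2_master_down_interval(1.0, 100))  # 3.609375
```

With default timers the detection timeout alone is therefore about
3.6 seconds, before any Failover processing begins.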
3.3.4. Failover
Definition:
The process to switch data traffic from the protected Primary
Path to the Backup Path upon Failure Detection of a Failover
Event.
Discussion:
Failover to a Backup Path provides Link Protection, Node
Protection, or Path Protection. Failover is complete when
Packet Loss [6], Out-of-order Packets [4], and Duplicate
Packets [4] are no longer observed. Forwarding Delay [4]
may continue to be observed.
Measurement units:
n/a
Issues:
See Also:
Primary Path
Backup Path
Failover Event
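Failover is complete once Packet Loss, Out-of-order Packets, and
Duplicate Packets stop being observed. One hypothetical way for a
Tester to locate that point is to scan received sequence numbers in
arrival order and remember the last arrival at which any of the three
impairments appeared. The Python sketch below assumes consecutive
integer sequence numbers and treats a jump past the next expected
number as loss; it is an illustration only, and the function name is
an assumption.

```python
from typing import List, Optional, Tuple


def last_impairment_time(arrivals: List[Tuple[int, float]]) -> Optional[float]:
    """Arrival time of the last packet revealing loss, reordering, or
    duplication, or None if no impairment was seen.

    arrivals: (sequence number, arrival time) pairs, in arrival order.
    """
    seen = set()
    highest = None
    last = None
    for seq, t in arrivals:
        impaired = False
        if seq in seen:
            impaired = True                  # Duplicate Packet
        elif highest is not None and seq < highest:
            impaired = True                  # Out-of-order Packet
        elif highest is not None and seq > highest + 1:
            impaired = True                  # gap: earlier packets lost or delayed
        seen.add(seq)
        if highest is None or seq > highest:
            highest = seq
        if impaired:
            last = t
    return last


# Packet 3 is lost and packet 2 arrives late; traffic is clean from 5 on.
arrivals = [(0, 0.0), (1, 1.0), (4, 4.5), (2, 4.6), (5, 5.5), (6, 6.5)]
print(last_impairment_time(arrivals))  # 4.6
```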
3.3.5. Restoration
Definition:
The state of failover recovery in which the Primary Path
has recovered from a Failover Event, but is not yet
forwarding packets because the Backup Path remains the
Working Path.
Discussion:
Restoration must occur while the Backup Path is the
Working Path, and the Backup Path is maintained as the
Working Path during Restoration. Restoration produces a
Primary Path that is recovered from failure but is not
yet forwarding traffic.
Measurement units:
n/a
Issues:
See Also:
Primary Path
Failover Event
Failure Recovery
Working Path
Backup Path
3.3.6. Reversion
Definition:
The state of failover recovery in which the Primary Path has
become the Working Path so that it is forwarding packets.
Discussion:
Protection Switching Systems may or may not support Reversion.
Reversion, if supported, must occur after Restoration.
Packet forwarding on the Primary Path resulting from Reversion
may occur either fully or partially over the Primary Path. A
potential problem with Reversion is the discontinuity in end-to-end
delay when the Forwarding Delays [4] along the Primary Path and
Backup Path are different, possibly causing Out-of-order Packets
[4], Duplicate Packets [4], and increased Jitter [4].
Measurement units: n/a
Issues: None
See Also:
Protection Switching System
Working Path
Primary Path
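The delay discontinuity can be seen with a little arithmetic. In the
hypothetical Python sketch below, packets leave the Tester every
1 ms; packets sent before Reversion take the Backup Path with an
assumed 5 ms Forwarding Delay, and packets sent after it take the
Primary Path with an assumed 2.5 ms delay. Because later packets
overtake earlier ones, they arrive out of order. The delay values and
the function name are illustrative assumptions.

```python
def arrival_order(n_packets: int, reversion_index: int, interval_ms: float,
                  backup_delay_ms: float, primary_delay_ms: float) -> list:
    """Sequence numbers in arrival order when packets 0..reversion_index-1
    use the Backup Path and later packets use the Primary Path."""
    arrivals = []
    for seq in range(n_packets):
        depart = seq * interval_ms  # constant-rate Tester transmission
        delay = backup_delay_ms if seq < reversion_index else primary_delay_ms
        arrivals.append((depart + delay, seq))
    arrivals.sort()  # order by arrival time (ties by sequence number)
    return [seq for _, seq in arrivals]


print(arrival_order(8, 4, 1.0, 5.0, 2.5))  # [0, 1, 4, 2, 5, 3, 6, 7]
```

If the Primary Path is the slower of the two, no reordering occurs in
this model, but the Tester would instead see a gap in inter-arrival
spacing at the Reversion point.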
3.4. Nodes
3.4.1. Protection-Switching Node
Definition:
A node that is capable of participating in a Protection
Switching System.
Discussion:
The Protection Switching Node may be an ingress or egress for
a Primary Path or Backup Path, such as used for MPLS Fast
Reroute configurations. The Protection Switching Node may
provide Redundant Node Protection as a Primary Node in a
redundant chassis configuration with a Standby Node, such as
used for VRRP and HA configurations.
Measurement units:
n/a
Issues:
See Also:
Protection Switching System
3.4.2. Non-Protection Switching Node
Definition:
A node that is not capable of participating in a Protection
Switching System, but may exist along the Primary Path or
Backup Path.
Discussion:
Measurement units:
n/a
Issues:
See Also:
Protection Switching System
Primary Path
Backup Path
3.4.3. Headend Node
Definition:
The ingress node of the Primary Path.
Discussion:
The Headend Node may also be a PLR when it serves in the
dual role of ingress to the Backup Path.
Measurement units: n/a
Issues:
See Also:
Primary Path
Point of Local Repair (PLR)
Failover
3.4.4. Backup Node
Definition:
A node along the Backup Path.
Discussion:
The Backup Node can be any node along the Backup Path.
There may be one or more Backup Nodes along the Backup Path.
A Backup Node may be the ingress, mid-point, or egress of
the Backup Path. If the Backup Path has only one Backup
Node, then that Backup Node is the ingress and egress of the
Backup Path.
Measurement units: n/a
Issues:
See Also:
Backup Path
3.4.5. Merge Node
Definition:
A node along the Primary Path where the Backup Path terminates.
Discussion:
The Merge Node can be any node along the Primary Path
except the ingress node of the Primary Path. There can be
multiple Merge Nodes along a Primary Path. A Merge Node
can be the egress node for one or more Backup
Paths. The Merge Node must be the egress of the Backup
Path. The Merge Node may also be the egress of the
Primary Path, or a Point of Local Repair (PLR).
Measurement units:
n/a
Issues:
See Also:
Primary Path
Backup Path
PLR
Failover
3.4.6. Primary Node
Definition:
A node along the Primary Path that is capable of Failover to a
redundant Standby Node.
Discussion:
The Primary Node may be used for Protection Switching Systems
that provide Redundant Node Protection, such as VRRP and HA.
Measurement units: n/a
Issues:
See Also:
Protection Switching System
Redundant Node Protection
Standby Node
3.4.7. Standby Node
Definition:
A redundant node to a Primary Node that forwards traffic along
the Primary Path upon Failure Detection of the Primary Node.
Discussion:
The Standby Node must be used for Protection Switching
Systems that provide Redundant Node Protection, such as VRRP
and HA. The Standby Node must provide protection along the
same Primary Path. If the failover is to a Disjoint Path, the
node providing protection is a Backup Node, not a Standby Node.
The Standby Node may be configured
for 1:1 or N:1 protection.
The communication between the Primary Node and Standby Node
may be in-band or across an out-of-band State Control
interface. The Standby Node may be geographically separated
from the Primary Node. In that case, the additional hops between
the two nodes may increase Failover Time.
The Standby Node may be passive or active. The Passive Standby
Node is not offered traffic and does not forward traffic until
Failure Detection of the Primary Node. Upon Failure Detection
of the Primary Node, traffic offered to the Primary Node is
instead offered to the Passive Standby Node. The Active
Standby Node is offered traffic and forwards traffic along the
Primary Path while the Primary Node is also active. Upon
Failure Detection of the Primary Node, traffic offered to the
Primary Node is switched to the Active Standby Node.
Measurement units: n/a
Issues:
See Also:
Primary Node
State Control Interface
3.5. Benchmarks
3.5.1. Failover Packet Loss
Definition:
The amount of packet loss produced by a Failover Event until
Failover completes, where the measurement begins when the last
unimpaired packet is received by the Tester on the Protected
Primary Path and ends when the first unimpaired packet is
received by the Tester on the Backup Path.
Discussion:
Packet loss can be observed as a reduction of forwarded
traffic from the maximum forwarding rate. Failover Packet
Loss includes packets that were lost, reordered, or delayed.
Failover Packet Loss may reach 100% of the offered load.
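A minimal sketch of counting Failover Packet Loss for a single
flow, assuming each packet carries a payload sequence number and
that the capture covers only the Failover Event under test. The
helper name is hypothetical. Sequence numbers skipped between the
last unimpaired packet and the first unimpaired packet after the
gap are counted, so a late (reordered) packet still contributes,
consistent with the Discussion above.

```python
from typing import Sequence


def failover_packet_loss(rx_seqs: Sequence[int]) -> int:
    """Count packets missing from the in-order stream (hypothetical helper).

    rx_seqs holds payload sequence numbers for one flow in arrival order.
    Every sequence number skipped by a forward jump is counted, so packets
    that were lost, reordered (arrived late), or delayed all contribute.
    Assumes the capture spans only the Failover Event under test.
    """
    next_expected = rx_seqs[0]
    loss = 0
    for seq in rx_seqs:
        if seq > next_expected:
            loss += seq - next_expected  # sequence gap
        if seq >= next_expected:
            next_expected = seq + 1
        # seq < next_expected: a late (reordered) or duplicate packet.
        # It was already counted when the gap was observed.
    return loss


# Packets 10..39 are not delivered in order; packet 12 shows up late.
rx = list(range(10)) + [40, 12] + list(range(41, 50))
print(failover_packet_loss(rx))  # 30
```

Applied to a capture spanning Reversion, the same count gives
Reversion Packet Loss, with the roles of the Backup Path and Primary
Path exchanged.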
Measurement units:
Number of Packets
Issues: None
See Also:
Failover Event
Failover
3.5.2. Reversion Packet Loss
Definition:
The amount of packet loss produced by Reversion, where the
measurement begins when the last unimpaired packet is received
by the Tester on the Backup Path and ends when the first
unimpaired packet is received by the Tester on the Protected
Primary Path.
Discussion:
Packet loss can be observed as a reduction of forwarded
traffic from the maximum forwarding rate. Reversion Packet
Loss includes packets that were lost, reordered, or delayed.
Reversion Packet Loss may reach 100% of the offered load.
Measurement units: Number of Packets
Issues: None
See Also:
Reversion
3.5.3. Failover Time
Definition:
The amount of time it takes for Failover to successfully
complete.
Discussion:
Failover Time can be calculated using the Time-Based Loss
Method (TBLM), Packet-Loss Based Method (PLBM), or
Timestamp-Based Method (TBM). It is RECOMMENDED that the
TBM be used.
Measurement units:
milliseconds
Issues: None
See Also:
Failover
Failover Packet Loss
Time-Based Loss Method (TBLM)
Packet-Loss Based Method (PLBM)
Timestamp-Based Method (TBM)
3.5.4. Reversion Time
Definition:
The amount of time it takes for Reversion to complete so
that the Primary Path is restored as the Working Path.
Discussion:
Reversion Time can be calculated using the Time-Based Loss
Method (TBLM), Packet-Loss Based Method (PLBM), or
Timestamp-Based Method (TBM). It is RECOMMENDED that the
TBM be used.
Measurement units:
milliseconds
Issues: None
See Also:
Reversion
Primary Path
Working Path
Reversion Packet Loss
Time-Based Loss Method (TBLM)
Packet-Loss Based Method (PLBM)
Timestamp-Based Method (TBM)
3.5.5. Additive Backup Delay
Definition:
The amount of increased Forwarding Delay [4] resulting
from data traffic traversing the Backup Path instead of
the Primary Path.
Discussion:
Additive Backup Delay is calculated using Equation 1 as
shown below:
(Equation 1)
Additive Backup Delay =
Forwarding Delay(Backup Path) -
Forwarding Delay(Primary Path).
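A worked example of Equation 1, sketched in Python. Using the mean
of per-packet Forwarding Delay samples on each path is an
assumption made here for illustration; the helper name and sample
values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def additive_backup_delay_ms(backup_delays_ms: Sequence[float],
                             primary_delays_ms: Sequence[float]) -> float:
    """Equation 1, using the mean of per-packet Forwarding Delay samples
    on each path (an assumption made for this sketch)."""
    return mean(backup_delays_ms) - mean(primary_delays_ms)


print(additive_backup_delay_ms([5.0, 5.5, 4.5], [2.0, 2.0, 2.0]))  # 3.0
print(additive_backup_delay_ms([1.5, 1.5], [2.0, 2.0]))            # -0.5
```

The second call shows a negative result, which occurs when the
Backup Path forwards faster than the Primary Path.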
Measurement units:
milliseconds
Issues:
Additive Backup Delay may be negative. This is theoretically
possible, but could be indicative of a sub-optimal network
configuration.
See Also:
Primary Path
Backup Path
Primary Path Latency
Backup Path Latency
3.6. Failover Time Calculation Methods
The following methods may be applied on a per-flow basis using
at least 16 flows spread over the routing table (more flows are
better); with fewer flows, the impact of a prefix dependency in the
implementation of a particular protection technology could be
missed. However, the test designer must be aware of the number
of packets per second sent to each prefix, as this rate sets the
sampling interval of the path and thus the time resolution for
measurement of Failover Time on a per-flow basis.
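The relationship between per-flow packet rate and time resolution
can be sketched as follows, assuming the offered load is divided
evenly across flows with one flow per prefix. The helper name and
rates are hypothetical.

```python
def per_flow_resolution_ms(aggregate_pps: float, num_flows: int) -> float:
    """Per-flow sampling interval in milliseconds (hypothetical helper).

    Assumes the offered load is split evenly across num_flows flows,
    one flow per destination prefix.
    """
    per_flow_pps = aggregate_pps / num_flows
    return 1000.0 / per_flow_pps


print(per_flow_resolution_ms(16_000, 16))   # 1.0
print(per_flow_resolution_ms(16_000, 160))  # 10.0
```

At a fixed aggregate rate, adding flows improves prefix coverage
but coarsens the per-flow time resolution.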
3.6.1. Time-Based Loss Method (TBLM)
Definition:
The method to calculate Failover Time (or Reversion Time) using a
time scale on the Tester to measure the interval of Failover
Packet Loss.
Discussion:
The Tester must provide statistics that show the duration of
failure on a time scale, based on the occurrence of packet loss.
The duration of failure is indicated by the duration of non-zero
packet loss. The TBLM includes failure detection time and
time for data traffic to begin traversing the Backup Path.
Failover Time and Reversion Time are calculated using the
TBLM as shown in Equation 2:
(Equation 2)
(Equation 2a)
TBLM Failover Time = Time(Failover) - Time(Failover Event)
(Equation 2b)
TBLM Reversion Time = Time(Reversion) - Time(Restoration)
where:
Time(Failover) = time on the Tester at receipt of the first
unimpaired packet at the egress node after the Backup Path
became the Working Path
Time(Failover Event) = time on the Tester at receipt of the last
unimpaired packet at the egress node on the Primary Path
before the failure
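A minimal Python sketch of Equation 2a for one flow, assuming the
Tester records the payload sequence number and its own receive time
for each packet, and that the capture contains a single loss episode
caused by the Failover Event. The helper name and the delay values
in the example are hypothetical.

```python
from typing import List, Optional, Tuple


def tblm_failover_time_ms(rx: List[Tuple[int, float]]) -> Optional[float]:
    """Equation 2a from Tester receive times (hypothetical helper).

    rx: (payload sequence number, Tester receive time in ms) for one
    flow, in arrival order. Assumes the capture holds a single loss
    episode caused by the Failover Event.
    """
    next_expected = rx[0][0]
    last_rx_time = rx[0][1]  # Tester time of last in-sequence packet
    for seq, rx_time in rx:
        if seq == next_expected:
            last_rx_time = rx_time
            next_expected = seq + 1
        elif seq > next_expected:
            # First packet after the gap gives Time(Failover);
            # last_rx_time is Time(Failover Event).
            return rx_time - last_rx_time
        # seq < next_expected: reordered or duplicate packet, ignored.
    return None  # no sequence gap, so no Failover observed


# 1 ms send interval; Primary Path delay 2.0 ms, Backup Path 5.5 ms;
# packets 10..39 are lost during Failover.
rx = [(s, s + 2.0) for s in range(10)] + [(s, s + 5.5) for s in range(40, 45)]
print(tblm_failover_time_ms(rx))  # 34.5
```

In the example, the result includes the 3.5 ms difference in
Forwarding Delay between the two paths, because both times come
from the Tester's receive clock.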
Measurement units:
milliseconds
Issues:
None
See Also:
Failover
Packet-Loss Based Method
3.6.2. Packet-Loss Based Method (PLBM)
Definition:
The method used to calculate Failover Time (or Reversion Time)
from the amount of Failover Packet Loss.
Discussion:
PLBM includes failure detection time and time for data traffic to
begin traversing the Backup Path. Failover Time can be
calculated using PLBM from the amount of Failover Packet Loss, as
shown below in Equation 3. Note: if traffic is sent to more than
one destination, PLBM gives the average over the measured
destinations.
(Equation 3)
(Equation 3a)
PLBM Failover Time =
(Number of packets lost / Offered Load rate) * 1000
(Equation 3b)
PLBM Reversion Time =
(Number of packets lost / Offered Load rate) * 1000
Units are packets / (packets/second) = seconds; multiplying
by 1000 gives milliseconds.
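Equation 3 can be checked numerically with the sketch below; the
helper name and rates are hypothetical. The second part illustrates
the note on multiple destinations: when destinations receive equal
shares of the Offered Load, the aggregate result equals the mean of
the per-destination results (with unequal shares it is a
rate-weighted mean).

```python
def plbm_time_ms(packets_lost: int, offered_load_pps: float) -> float:
    """Equation 3: packets / (packets/second) = seconds, times 1000 for ms."""
    return packets_lost * 1000.0 / offered_load_pps


# Single destination: 500 packets lost at 10,000 packets/s.
print(plbm_time_ms(500, 10_000))  # 50.0

# Two destinations at 5,000 packets/s each, losing 100 and 400 packets.
per_dest = [plbm_time_ms(100, 5_000), plbm_time_ms(400, 5_000)]
print(per_dest)                         # [20.0, 80.0]
print(plbm_time_ms(100 + 400, 10_000))  # 50.0, the mean of per_dest
```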
Measurement units:
milliseconds
Issues:
None
See Also:
Failover
Time-Based Loss Method
3.6.3. Timestamp-Based Method (TBM)
Definition:
The method to calculate Failover Time (or Reversion Time)
using a time scale to quantify the interval between
unimpaired packets arriving in the test stream.
Discussion:
The purpose of this method is to quantify the duration of
failure or reversion on a time scale based on the
observation of unimpaired packets. The TBM is calculated
from Equation 2 using values obtained from the timestamp
in the packet payload, rather than from the Tester clock
used by the TBLM.
Unimpaired packets are normal packets that are not lost,
reordered, or duplicated. A reordered packet is defined in
[7, Section 3.3]. A duplicate packet is defined in
[4, Section 3.3.3]. A lost packet is defined in
[7, Section 3.5]. Unimpaired packets may be detected by checking
a sequence number in the payload, where the sequence number equals
the next expected number for an unimpaired packet. A sequence gap
or sequence reversal indicates impaired packets.
For calculating Failover Time, the TBM includes failure
detection time and time for data traffic to begin traversing the
Backup Path. For calculating Reversion Time, the TBM includes
Reversion Time and time for data traffic to begin traversing the
Primary Path.
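A sketch of the TBM for one flow, assuming each payload carries a
sequence number and a transmit timestamp in milliseconds. Packets
are classified with the sequence-number check described above.
Treating the packet that ends a sequence gap as unimpaired, and
attributing the longest gap to the Failover Event, are assumptions
of this sketch; the class and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RxPacket:
    seq: int              # sequence number from the payload
    payload_ts_ms: float  # transmit timestamp from the payload (used by TBM)


def tbm_interval_ms(packets: List[RxPacket]) -> Optional[float]:
    """Equation 2 evaluated on payload timestamps (hypothetical helper).

    A packet whose sequence number is below the next expected number is
    a sequence reversal (reordered or duplicate) and is impaired. The
    packet that ends a sequence gap is treated as unimpaired; the skipped
    numbers are the impaired ones. Returns the largest payload-timestamp
    interval spanning a gap, or None if no gap was observed.
    """
    next_expected = packets[0].seq
    last_unimpaired: Optional[RxPacket] = None
    longest: Optional[float] = None
    for p in packets:
        if p.seq < next_expected:
            continue  # sequence reversal: impaired packet
        if p.seq > next_expected and last_unimpaired is not None:
            gap = p.payload_ts_ms - last_unimpaired.payload_ts_ms
            longest = gap if longest is None else max(longest, gap)
        last_unimpaired = p
        next_expected = p.seq + 1
    return longest


# 1 ms send interval; 10..39 missing, a duplicate 40 and a late 12 arrive.
pkts = ([RxPacket(s, float(s)) for s in range(10)]
        + [RxPacket(40, 40.0), RxPacket(40, 40.0), RxPacket(12, 12.0)]
        + [RxPacket(s, float(s)) for s in range(41, 45)])
print(tbm_interval_ms(pkts))  # 31.0
```

Because payload timestamps record transmit times, the result in this
sketch is not shifted by a difference in Forwarding Delay between
the Primary Path and Backup Path.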
Measurement units:
milliseconds
Issues: None
See Also:
Failover
Failover Time
Reversion
Reversion Time
4. Acknowledgements
We would like to thank the BMWG and particularly Al Morton and Curtis
Villamizar for their reviews, comments, and contributions to this
work.
5. IANA Considerations
This document requires no IANA considerations.
6. Security Considerations
Benchmarking activities as described in this memo are limited to
technology characterization using controlled stimuli in a laboratory
environment, with dedicated address space and the constraints
specified in the sections above.
The benchmarking network topology will be an independent test setup
and MUST NOT be connected to devices that may forward the test
traffic into a production network, or misroute traffic to the test
management network.
Further, benchmarking is performed on a "black-box" basis, relying
solely on measurements observable external to the DUT/SUT.
Special capabilities SHOULD NOT exist in the DUT/SUT specifically for
benchmarking purposes. Any implications for network security arising
from the DUT/SUT SHOULD be identical in the lab and in production
networks.
7. References
7.1. Normative References
[1] Bradner, S., "The Internet Standards Process -- Revision 3",
RFC 2026, October 1996.
[2] Bradner, S., Editor, "Benchmarking Terminology for
Network Interconnection Devices", RFC 1242, July 1991.
[3] Mandeville, R., "Benchmarking Terminology for LAN
Switching Devices", RFC 2285, February 1998.
[4] Poretsky, S., et al., "Terminology for Benchmarking
Network-layer Traffic Control Mechanisms", RFC 4689,
November 2006.
[5] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", RFC 2119, March 1997.
[6] Poretsky, S., Imhoff, B., "Benchmarking Terminology for IGP
Convergence", draft-ietf-bmwg-igp-dataplane-conv-term-21,
work in progress, May 2010.
[7] Morton, A., et al., "Packet Reordering Metrics", RFC 4737,
November 2006.
[8] Nadas, S., Ed., "Virtual Router Redundancy Protocol (VRRP)
Version 3 for IPv4 and IPv6", RFC 5798, March 2010.
7.2. Informative References
[9] Pan, P., et al., "Fast Reroute Extensions to RSVP-TE for LSP
Tunnels", RFC 4090, May 2005.
[10] Nichols, K., et al., "Definition of the Differentiated
Services Field (DS Field) in the IPv4 and IPv6 Headers",
RFC 2474, December 1998.
8. Authors' Addresses
Scott Poretsky
Allot Communications
67 South Bedford Street, Suite 400
Burlington, MA 01803
USA
Phone: +1 508 309 2179
Email: sporetsky@allot.com
Rajiv Papneja
Isocore
12359 Sunrise Valley Drive
Reston, VA 22102
USA
Phone: +1 703 860 9273
Email: rpapneja@isocore.com
Jay Karthik
Cisco Systems
300 Beaver Brook Road
Boxborough, MA 01719
USA
Phone: +1 978 936 0533
Email: jkarthik@cisco.com
Samir Vapiwala
Cisco Systems
300 Beaver Brook Road
Boxborough, MA 01719
USA
Phone: +1 978 936 1484
Email: svapiwal@cisco.com