Network Working Group H. Naderi
Internet-Draft B. Carpenter, Ed.
Intended status: Informational Univ. of Auckland
Expires: October 24, 2015 April 22, 2015
Experience with IPv6 path probing
draft-naderi-ipv6-probing-01
Abstract
This document reports on experience and simulations of dynamic
probing of alternate paths between two IPv6 hosts when network
failures occur. Two models for such probing were investigated: the
SHIM6 REAchability Protocol (REAP) and the Multipath Transmission
Control Protocol (MPTCP). The motivation for this document is to
identify some aspects of path probing at large or very large scale
that may be broadly relevant to future protocol design.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 24, 2015.
Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
Naderi & Carpenter Expires October 24, 2015 [Page 1]
Internet-Draft IPv6 Probing April 2015
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Results for SHIM6 and REAP
2.1. Experiments over the Internet
2.2. Lab Experiments
2.3. Large scale simulation
3. Results for MPTCP
4. Operational issues
5. Implications for future designs
6. Security Considerations
7. IANA Considerations
8. Acknowledgements
9. Change log [RFC Editor: Please remove]
10. Informative References
Authors' Addresses
1. Introduction
A common situation in the Internet today is that a host trying to
contact another host has a choice of IP addresses for one or both
ends of the communication. Multiple addresses are expected to be
quite common for IPv6 hosts [RFC2460]. Some approaches to this
situation envisage either switching paths during the course of the
communication or using multiple paths in parallel. Examples include
"Happy Eyeballs" [RFC6555] which tries alternative paths at the
start, SHIM6 [RFC5533] and Stream Control Transmission Protocol
(SCTP) [RFC4960] which change paths when there is a failure, and
Multipath TCP (MPTCP) [RFC6824] which shares the paths dynamically.
Some of these methods involve active path probing to choose the best
path. SHIM6 probes all available paths using the REAchability
Protocol (REAP) [RFC5534] when the current path fails, and MPTCP
effectively probes all paths continuously, and shifts load according
to the results. In this document we summarise results and
observations from SHIM6 and MPTCP operated or simulated at large
scale. These observations may be of help in designing future path
probing mechanisms. In particular, we are interested in minimising
both the time taken to recover to the maximum possible throughput
after a path failure, and the amount of overhead traffic caused by
the probing process.
In summary, we ran a series of SHIM6 experiments, each including 250
path failures, between Auckland and Dublin, measuring the time and
overhead traffic for each instance of path probing and recovery.
Then we repeated essentially the same experiment in the laboratory in
Auckland (i.e., with negligible RTT instead of round-the-world RTT).
Then we built a Stochastic Activity Network (SAN) simulation model of
the same scenarios, and validated it by comparison with the
experimental results. Finally we used this model to simulate path
failure and recovery using REAP at very large scale (10,000
simultaneous sessions on a single site experiencing path failure).
Both TCP and DCCP [RFC4340] were used for the transport layer, with a
simple application sending meaningless data in one direction only.
This was followed by roughly equivalent simulations of recovery from
path failure for MPTCP sessions. In this case we validated the SAN
model by comparison with a completely different MPTCP simulator
developed elsewhere [Wischik10].
One advantage of the SAN model is that there are SAN analysis
software tools which allow very large scale simulations. Another is
that it makes it relatively easy to experiment with variations of the
protocol itself, so we tested the impact of certain protocol
changes. However, unlike conventional network simulation tools, the
user has to program a complete protocol behaviour model. We used the
Moebius tool [Moebius].
Details of the experiments and results have been described in two
papers [Naderi10] [Naderi14b] and in H. Naderi's thesis [Naderi14a].
This document limits itself to outlining the results and their
implications for the design of path probing mechanisms in the
Internet.
2. Results for SHIM6 and REAP
2.1. Experiments over the Internet
We set up a test environment which enabled us to run a set of
experiments over the Internet with the LinShim6 implementation of
SHIM6 [Barre08]. We used two SHIM6-enabled multi-addressed hosts,
located at the University of Auckland (New Zealand) and Waterford
Institute of Technology (Dublin, Ireland). Each host was
equipped with two network interface cards and configured with two
prefixes from two different providers. The SHIM6 host in Auckland
was connected to a Linux machine configured as an IPv6 router. This
router simulated link failures for the experiments.
Source Address Dependent Routing (SADR) is necessary for effective
use of SHIM6. Hosts decide what source and destination address to
use when host-centric solutions, like SHIM6, are used. Without SADR,
or a similar routing mechanism, packets might be forwarded to the
wrong provider and dropped because of ingress filtering
according to BCP 38 [RFC2827] [RFC3704]. Unfortunately, we could not
convince the university network administrators to enable SADR on the
Auckland University edge router. To run the experiments, they agreed
to add static routes to the edge router's routing table, to forward
packets destined to the host in Dublin through different providers
according to their destination addresses. Therefore, only two of the
four possible address pairs could work. To work around this, we
changed LinShim6 to shuffle the list of address pairs before starting
the exploration process, so that the working address pair could
appear at any position in the list and thus create different recovery
cases.
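The effect of destination-based static routing on the address pair
list can be sketched as follows. This is an illustrative Python
sketch, not LinShim6 code: the addresses (taken from the 2001:db8::/32
documentation range), the provider names and all function names are
hypothetical. It assumes that each provider only accepts packets
whose source address lies in its own prefix, and that the edge router
chooses the exit provider from the destination address alone.

```python
import itertools
import random

# Hypothetical local (Auckland) source addresses, mapped to the provider
# that delegated each prefix. Documentation addresses, not the real ones.
LOCAL = {"2001:db8:a::1": "ISP-A", "2001:db8:b::1": "ISP-B"}

# Hypothetical static routes on the edge router: remote (Dublin)
# destination address -> exit provider. Only the destination is consulted.
STATIC_ROUTES = {"2001:db8:c::1": "ISP-A", "2001:db8:d::1": "ISP-B"}


def address_pairs():
    """All (source, destination) combinations: 2 x 2 = 4 pairs."""
    return list(itertools.product(LOCAL, STATIC_ROUTES))


def passes_ingress_filter(src, dst):
    """A pair works only if the exit provider for dst owns src's prefix."""
    return STATIC_ROUTES[dst] == LOCAL[src]


def shuffled_pairs(rng=random):
    """Return the pair list in random order, as done before exploration."""
    pairs = address_pairs()
    rng.shuffle(pairs)
    return pairs


pairs = address_pairs()
working = [p for p in pairs if passes_ingress_filter(*p)]
print(len(pairs), "pairs in total,", len(working), "usable")
print(shuffled_pairs())
```

Under these assumptions only the two pairs whose source prefix
matches the exit provider survive ingress filtering, and shuffling
changes how many probes are needed before a usable pair is reached.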
This configuration enabled us to run experiments with four address
pairs over the Internet. For each experiment, we artificially
created 250 failures and for each case measured the REAP exploration
time (EP), the number of sent probes (SP), the number of received
probes (RP), and the application recovery time (ART).
Comparing results from experiments with TCP and DCCP shows that when
DCCP is employed, EP, SP and RP are larger than when TCP is used.
The main reason for this is that DCCP employs delayed
acknowledgement. It sends ACKs every RTT (300 ms), while in the case
of TCP, they are sent more frequently (less than 100 ms apart).
Since the RTT is long, the communications look different from REAP's
viewpoint although the behaviour of the application is the same in
both experiments. Since TCP sends ACKs faster, REAP treats it more
like a bi-directional communication while DCCP communication is
treated more like uni-directional. As a result, in the DCCP
experiment, the sender always detects the failure first and then
reports it to the receiver, while in the TCP experiment both sides
detect failure and start exploration almost at the same time. In
other words, in the case of TCP, exploration is performed in parallel
on both sides and takes less time and generates less traffic. This
result also shows that the efficiency of solutions such as SHIM6,
which are implemented inside the protocol stack, may be affected by
the behaviour of the other layers of the protocol stack.
We also observed some signs of probe loss in the results. Probe
losses can affect EP, SP, RP and ART. When a probe is lost, it might
cause the exploration process to go to a second round, and then an
exponential backoff algorithm causes the exploration process to take
longer and generate more traffic.
2.2. Lab Experiments
We repeated similar experiments in the lab. The main difference was
the RTT, which was much smaller (0.3 ms) than in the Internet
experiments. We set up two SHIM6 hosts in the lab, each equipped
with four network
interfaces. Thus, in addition to experiments with four address pairs
(similar to the Internet experiments), we could run experiments with
9 and 16 address pairs as well.
In the lab, we got similar results from the TCP and DCCP experiments.
Since RTT is small, DCCP sends ACKs faster, and therefore there is no
difference from REAP's viewpoint.
Probe losses are observable in the lab experiments too. Probe loss
causes REAP to go to the second round for scanning the list of
address pairs, which leads to sending more probes and also longer
exploration time.
Experiments with 16 address pairs fail when the working address pair
is located at or close to the end of the list of address pairs. REAP
employs exponential backoff after sending its initial probes, to
avoid generating large bursts of traffic during exploration. For 16
address pairs, this delay sometimes causes the connection to time out
and stop the experiment. In some cases, SHIM6 removes the context
without finding the new address pair. In such cases it seems that
packet losses cause the exploration process to go to the second round
of exploration and the resulting longer delays cause SHIM6 to
actually stop exploration and remove the context.
2.3. Large scale simulation
To study the behaviour of REAP in a very large scale network (e.g.,
an enterprise network), we built a simulation model of REAP and
conducted some experiments which simulated a link failure event in a
network with 10,000 simultaneously active SHIM6-monitored
communications. The aim of the experiments was to see how REAP
reacts to path failures in a large SHIM6-enabled multihomed network.
In our practical tests, nine address pairs seem to be the limit but
we have included larger numbers in our simulations to obtain a
clearer view of REAP's behaviour.
We focused on REAP recovery time and probe traffic as two important
performance parameters. REAP recovery time is the time that REAP
takes to detect the failure and find a new working address pair.
REAP traffic is the traffic which is generated by REAP itself during
its exploration process.
We measured average and total REAP recovery time for different
numbers of address pairs for 10,000 instances of REAP. We define
total REAP recovery time as the recovery time for the whole site,
i.e., the time between failure occurrence and recovering the last
context. In other words, it shows the recovery time for the last
context that is recovered. The average recovery time is calculated
by dividing the sum of recovery times for REAP instances by the
number of REAP instances. It should be noted that recovery time
includes failure detection and address exploration times.
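The two metrics defined above can be written down directly. The
following Python sketch is only an illustration of the definitions:
the function name and the sample values are hypothetical and are not
taken from the simulation. Each input value is the recovery time of
one REAP instance, measured from the occurrence of the failure.

```python
def recovery_summary(recovery_times):
    """Return (average, total) recovery time in seconds.

    average: sum of per-instance recovery times / number of instances.
    total:   recovery time of the last context to recover, i.e. the
             largest per-instance value.
    """
    if not recovery_times:
        raise ValueError("need at least one REAP instance")
    average = sum(recovery_times) / len(recovery_times)
    total = max(recovery_times)
    return average, total


# Hypothetical per-context recovery times (seconds), for illustration only.
sample = [9.5, 10.0, 12.5, 16.0]
print(recovery_summary(sample))
```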
A typical average recovery time for 4 address pairs is 10 to 12
seconds. The results show that the average and maximum recovery
times increase when the number of address pairs is increased. The
relationship is not linear because REAP uses an exponential backoff
algorithm for increasing the time interval between probes. As a
result, REAP shows poor performance when the number of address pairs
exceeds 9, for example exceeding 100 seconds to recover with 16
address pairs.
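The non-linear growth can be illustrated with a simplified probe
schedule. The Python sketch below is not the REAP state machine and
not the SAN model; it assumes that one address pair is tried per
probe, that no probe is lost, and that the interval doubles after the
initial probes up to a cap. The 0.5 second initial interval is
inferred from the four initial probes being sent in two seconds; the
60 second cap, the exact doubling rule and the function name are
assumptions made for this illustration.

```python
def probe_send_times(n_probes, initial_timeout=0.5, n_initial=4,
                     max_timeout=60.0):
    """Send time, in seconds after exploration starts, of each probe.

    The first n_initial probes are spaced initial_timeout apart; after
    that the spacing doubles for every further probe, capped at
    max_timeout (assumed values, see the lead-in).
    """
    times = []
    t = 0.0
    for k in range(1, n_probes + 1):
        times.append(t)
        if k < n_initial:
            interval = initial_timeout
        else:
            backoff = initial_timeout * 2 ** (k - n_initial + 1)
            interval = min(backoff, max_timeout)
        t += interval
    return times


schedule = probe_send_times(16)
# Time at which the 9th and the 16th address pair would first be probed.
print(schedule[8], schedule[15])
```

With these assumed parameters the ninth address pair is reached after
about half a minute, while the sixteenth is reached only after several
minutes, so a working pair near the end of a long list is found very
late.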
We also measured the average and total number of probes sent during
the address exploration process in the experiments. The results show
that there is a linear correlation between number of address pairs
and number of sent probes. They also show that a large quantity of
probes is sent at the start of exploration. For example, in the case
of four address pairs, 93% of the probes, and in the case of 25
address pairs 34% of probes, are sent during the first 10 seconds.
The reason is that all contexts detect failure within 10 seconds and
start exploration by sending initial probes (the first four probes,
which are sent in two seconds). After that, there are some intervals
when very few probes are sent. This can be seen more clearly in the
experiments with more address pairs, e.g. 16 or 25 address pairs.
This means that for some SHIM6 contexts the time interval between
probes is large, because of the exponential backoff, so REAP
instances have to wait for a long time before probing the next
address pair. Some connections might be dropped by the transport or
application layer before REAP can recover them. For example, in the case
of 25 address pairs, 50% of contexts need more than five minutes to
recover.
Although the peak of the REAP traffic is generated in the first 10
seconds (before employing the exponential backoff algorithm), our
results show that this traffic is small compared to normal traffic
for a large network, and cannot cause a major problem. For example,
in the case of 25 address pairs, about 4800 probes per second are
sent during the first 10 seconds of the exploration process, which is
the peak of the traffic. Every probe in the first 10 seconds carries
at most seven address pairs; four initial address pairs and three
more after employing exponential backoff. Thus, the average probe
size in the first 10 seconds is 232 bytes; each probe needs 72 bytes
for the fixed part and 40 bytes for each address pair. As a result,
a load of 4800 probes per second occupies only about 1.1 MB/s
(4800 x 232 bytes) of the site's available link capacity. Large
sites usually have high
bandwidth links to the Internet and this amount of traffic does not
cause a significant problem for them. In any case this traffic will
occur at a time when normal traffic from the same sessions has been
interrupted.
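The arithmetic behind this estimate can be checked with a few lines of
Python. The byte counts and the probe rate are the figures quoted
above; the function names are hypothetical, and the sketch ignores any
lower-layer framing overhead.

```python
FIXED_BYTES = 72      # fixed part of a probe (figure from the text)
PER_PAIR_BYTES = 40   # bytes per address pair carried in a probe


def probe_size(n_pairs):
    """Size in bytes of a probe that carries n_pairs address pairs."""
    return FIXED_BYTES + PER_PAIR_BYTES * n_pairs


def probe_load(probes_per_second, n_pairs):
    """Probe traffic in bytes per second at the given probe rate."""
    return probes_per_second * probe_size(n_pairs)


print(probe_size(4))        # size of a probe carrying four address pairs
print(probe_load(4800, 4))  # peak load in bytes per second
```

The 232 byte average corresponds to a probe carrying four address
pairs, and 4800 such probes per second come to 1,113,600 bytes per
second, which is on the order of one MB/s.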
We also tried two changes to REAP to improve recovery time:
increasing the number of initial probes, and sending initial probes
in parallel. In both cases, we also measured the probe traffic. The
results showed that those modifications improved recovery time while
their effect on the traffic was small. For example, in the case of
nine address pairs, increasing the number of initial probes from four
to five caused about a 6.5% increase in traffic in the first 10
seconds of the recovery process, a 22% decrease in average recovery
time and a 34% decrease in maximum recovery time. Sending initial
probes in parallel, in the case of nine address pairs, caused an 11%
decrease in average recovery time, a 4.5% decrease in maximum
recovery time, and an 8.2% increase in traffic. In both cases, these
modifications increased traffic, but not beyond a level that a large
network can handle.
3. Results for MPTCP
MPTCP does not use any specific mechanism for probing paths. In
fact, every subflow runs as a TCP flow and it is the TCP congestion
control mechanism which monitors the used path. When congestion is
detected, the load from the congested path is transferred to other
available paths, if they present less congestion. The MPTCP
congestion control algorithm, known as SEMICOUPLED, reacts to
congestion reports from subflows and adjusts the load on the used
paths to achieve performance and fairness. TCP never sets the
congestion window for a subflow to less than 1. Therefore, even on a
highly congested path or a broken path, it performs the equivalent of
probing by setting the congestion window size to 1, so that any
improvements in the path can be detected. Expiration of the TCP
retransmission timer for the subflow on a broken path triggers
sending a segment once in a while, acting as a probe, to ensure a
recovery in the path can be detected. How fast this mechanism can
detect an improvement in a broken path depends on the value of the
time-out for this timer (RTO). The minimum value is usually set to 1
second, and each consecutive expiration, as happens on a broken path,
backs off the timer by doubling the RTO. The traffic generated
by this mechanism in this case is low and may be handled easily, even
in a large network.
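The retransmission behaviour described above can be sketched as
follows. This Python fragment is an illustration only, not MPTCP or
kernel code: it assumes that the RTO starts at the 1 second minimum,
that the first retransmission happens one RTO after the failure, and
that the RTO is capped at 60 seconds. The cap and the function name
are assumptions.

```python
def rto_probe_times(duration, min_rto=1.0, max_rto=60.0):
    """Times (seconds after the failure) at which a subflow on a broken
    path retransmits a segment, acting as a probe.

    The RTO doubles after every expiration, up to max_rto (assumed cap).
    """
    times = []
    rto = min_rto
    t = rto
    while t <= duration:
        times.append(t)
        rto = min(rto * 2, max_rto)
        t += rto
    return times


# Probe segments sent by one subflow in the first minute after a failure.
print(rto_probe_times(60))
```

Under these assumptions a broken subflow sends only five segments in
the first minute, which is consistent with the low probing load noted
above.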
We simulated MPTCP with up to 8 paths and with RTTs between 80 and
150 ms, observing the expected behaviour, with the load in the steady
state spread across the paths. When the loss rate of a path is
higher, the throughput of that path is lower. For a given loss rate,
a smaller RTT increases throughput on that path. However, total
throughput increases sublinearly with more paths, due to the way
SEMICOUPLED links the congestion windows of the various subflows.
For example, we simulated a scenario in which the steady state
throughput for 8 paths was only about 25% greater than for a single
path (Figure 5.10 in [Naderi14a]). This suggests that a scenario
with as many as 8 paths is of limited value in a reasonably reliable
network.
We simulated a permanent failure of a single path in a scenario with
four paths in operation. As may be deduced from the previous point,
the throughput recovered in the steady state to within a small
percentage of its previous value. This recovery took about 6 seconds
(Figure 5.15 in [Naderi14a]), which is significantly faster than
observed with SHIM6 due to MPTCP's effectively continuous probing.
Simulations of temporary path failures showed that returning to the
original steady state using all paths took a similar time.
Finally we simulated the effect of variable loss rates on MPTCP
performance with two paths operating. We observed that for loss
rates varying randomly in the range up to 1%, MPTCP effectively
maintains its steady state throughput.
4. Operational issues
Many if not most site border firewalls today drop packets containing
the SHIM6 extension header. In our Internet experiments we had to
bypass the site firewall at both ends. This issue is discussed in
[RFC7045].
Source Address Dependent Routing (SADR) is necessary for effective
use of multiple paths. Without it, packets may be sent to the wrong
exit router, or to an ISP that will immediately discard them due to
ingress filtering. With ingress filtering in place, packets with a
given source address may only be sent via an ISP that accepts packets
from that source address. If this is not taken correctly into
account by the source host and by the local routing configuration,
the host will waste resources trying to explore paths that are
certain to fail.
5. Implications for future designs
We suggest several conclusions from the above results that should be
relevant to the design of any probing mechanism for exploiting
alternative paths between two hosts:
o The interaction between round-trip time, the transport layer
acknowledgement mechanism, and the failure detection mechanism is
quite subtle and significantly affects the time taken to start
recovery after a failure.
o When probing is linked to congestion control, packet loss rates
may also affect recovery times.
o Probe traffic is unlikely to cause overload, especially since
normal traffic stops during recovery from failure.
o Exponential backoff leads to significantly slower recovery time,
and (due to the previous point) is probably unnecessary.
o Probing all alternative paths in parallel leads to significantly
faster recovery times with only a minor increase in the intensity
of probe traffic, although this does occur on the paths that are
still carrying normal traffic. However, full sized probe packets
(as used by MPTCP, because they are normal data packets) have more
impact than short probe packets (as used by SHIM6).
o The probe packets should resemble normal data packets as much as
possible, in order to avoid being treated specially or dropped by
middleboxes such as firewalls or load balancers.
o If Source Address Dependent Routing (SADR) is unavailable, it is
better to avoid probing address pairs that will fail as a result.
(Probing all paths in parallel would in fact mask this problem.)
o There is little to be gained by having more than two or three
alternative paths.
6. Security Considerations
Apart from the need for SHIM6 to bypass firewalls, no security issues
were identified during this work.
7. IANA Considerations
This document requests no action by IANA.
8. Acknowledgements
This document was produced using the xml2rfc tool [RFC2629].
Some text was adapted from [Naderi14a].
John Ronan from the Telecommunications Software and Systems Group,
Waterford Institute of Technology, and the University of Auckland
Information Technology Services (ITS) helped to run the SHIM6
experiments over the Internet between Auckland and Dublin.
9. Change log [RFC Editor: Please remove]
draft-naderi-ipv6-probing-01: editorial improvements, 2015-04-22.
draft-naderi-ipv6-probing-00: original version, 2014-10-21.
10. Informative References
[Barre08] Barre, S., "LinShim6 - implementation of the Shim6
protocol", Technical Report, Universite catholique de
Louvain, February 2008.
[Moebius] Deavours, D., Clark, G., Courtney, T., Daly, D., Derisavi,
S., Doyle, J., Sanders, W., and P. Webster, "The Moebius
framework and its implementation", IEEE Transactions on
Software Engineering 28(10):956-969, October 2002.
[Naderi10]
Naderi, H. and B. Carpenter, "A Performance Study on
REAchability Protocol in Large Scale IPv6 Networks",
Second International Conference on Computer and Network
Technology (ICCNT 2010), Bangkok 28-32, April 2010.
[Naderi14a]
Naderi, H., "Evaluating and Improving SHIM6 and MPTCP: Two
Solutions for IPv6 Multihoming", Ph.D. Thesis, The
University of Auckland, July 2014.
[Naderi14b]
Naderi, H. and B. Carpenter, "Putting SHIM6 into
Practice", Australasian Telecommunication Networks and
Applications Conference (ATNAC 2014), Melbourne, November
2014.
[RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6
(IPv6) Specification", RFC 2460, December 1998.
[RFC2629] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
June 1999.
[RFC2827] Ferguson, P. and D. Senie, "Network Ingress Filtering:
Defeating Denial of Service Attacks which employ IP Source
Address Spoofing", BCP 38, RFC 2827, May 2000.
[RFC3704] Baker, F. and P. Savola, "Ingress Filtering for Multihomed
Networks", BCP 84, RFC 3704, March 2004.
[RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram
Congestion Control Protocol (DCCP)", RFC 4340, March 2006.
[RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC
4960, September 2007.
[RFC5533] Nordmark, E. and M. Bagnulo, "Shim6: Level 3 Multihoming
Shim Protocol for IPv6", RFC 5533, June 2009.
[RFC5534] Arkko, J. and I. van Beijnum, "Failure Detection and
Locator Pair Exploration Protocol for IPv6 Multihoming",
RFC 5534, June 2009.
[RFC6555] Wing, D. and A. Yourtchenko, "Happy Eyeballs: Success with
Dual-Stack Hosts", RFC 6555, April 2012.
[RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
"TCP Extensions for Multipath Operation with Multiple
Addresses", RFC 6824, January 2013.
[RFC7045] Carpenter, B. and S. Jiang, "Transmission and Processing
of IPv6 Extension Headers", RFC 7045, December 2013.
[Wischik10]
Wischik, D., Raiciu, C., and M. Handley, "Balancing
resource pooling and equipoise in multipath transport",
8th USENIX Symposium on Networked Systems Design and
Implementation, San Jose, April 2010.
Authors' Addresses
Habib Naderi
Department of Computer Science
University of Auckland
PB 92019
Auckland 1142
New Zealand
Email: habib@cs.auckland.ac.nz
Brian Carpenter (editor)
Department of Computer Science
University of Auckland
PB 92019
Auckland 1142
New Zealand
Email: brian.e.carpenter@gmail.com