P2PSIP Working Group J. Maenpaa
Internet-Draft Ericsson
Intended status: Standards Track A. Swaminathan
Expires: January 7, 2010 S. Das
Qualcomm, Inc.
G. Camarillo
J. Hautakorpi
Ericsson
July 6, 2009
A Topology Plug-in for REsource LOcation And Discovery
draft-maenpaa-p2psip-topologyplugin-00
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on January 7, 2010.
Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of
publication of this document (http://trustee.ietf.org/license-info).
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document.
Maenpaa, et al. Expires January 7, 2010 [Page 1]
Internet-Draft Topology Plug-in for RELOAD July 2009
Abstract
REsource LOcation And Discovery (RELOAD) is a peer-to-peer signaling
protocol that can be used to maintain an overlay network, and to
store data in and retrieve data from the overlay. This document
defines a new topology plug-in for RELOAD that is more appropriate
for real world large scale overlays. This topology plug-in
implements three important functionalities that allow RELOAD to
operate under real world constraints. First, it includes a load
balancing algorithm that specifies efficient allocation of load to
different nodes in the network. Second, the document describes
robust techniques for stabilization of fingers and successors and
specifies self tuning mechanisms that allow dynamic and automatic
adjustment of parameters needed for these advanced techniques in the
topology plug-in. Finally, it specifies a locality aware finger
selection algorithm that reduces average lookup latency.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6
3. Need for Advanced Topology Plug-in . . . . . . . . . . . . . . 8
3.1. Need for Load Balancing . . . . . . . . . . . . . . . . . 8
3.2. Need for Robust Stabilization . . . . . . . . . . . . . . 9
3.3. Need for Locality Awareness . . . . . . . . . . . . . . . 9
3.4. Need for Self-Tuning of System Parameters . . . . . . . . 10
4. Chord . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5. Load Balancing in the Proposed Topology Plug-in for RELOAD . . 11
5.1. Basic Load Balancing scheme . . . . . . . . . . . . . . . 11
5.2. Recommendations on Virtual Servers . . . . . . . . . . . . 12
5.3. Routing . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.4. Obtaining Virtual Server Identities . . . . . . . . . . . 14
5.4.1. Without Enrollment Server . . . . . . . . . . . . . . 14
5.4.2. With Enrollment Server . . . . . . . . . . . . . . . . 15
5.5. Extensions to Overlays with Heterogeneous Nodes . . . . . 16
6. Stabilizing Fingers, Successors, and Predecessors in the
Topology Plug-in . . . . . . . . . . . . . . . . . . . . . . . 16
6.1. Choice of Approach to Stabilization . . . . . . . . . . . 16
6.2. Update Messages for Stabilization . . . . . . . . . . . . 18
6.3. Finger Stabilization . . . . . . . . . . . . . . . . . . . 21
6.3.1. Locality-aware Finger Selection . . . . . . . . . . . 22
6.4. Successor Stabilization . . . . . . . . . . . . . . . . . 23
6.5. Predecessor Stabilization . . . . . . . . . . . . . . . . 24
6.6. Joining the Overlay . . . . . . . . . . . . . . . . . . . 24
6.6.1. Contents of the Join Message . . . . . . . . . . . . . 25
6.7. Leaving the Overlay . . . . . . . . . . . . . . . . . . . 26
6.7.1. Contents of the Leave Message . . . . . . . . . . . . 26
6.8. Self Tuning System Parameters . . . . . . . . . . . . . . 27
6.8.1. Self Tuning Load Balancing Algorithm Parameters . . . 28
6.8.2. Self Tuning the Stabilization Interval . . . . . . . . 32
7. Security Considerations . . . . . . . . . . . . . . . . . . . 37
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37
9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 37
10. Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
10.1. Comparison of the Load Balancing Algorithm with Chord . . 38
10.2. Performance of the Load Balancing Algorithm as Network
Grows . . . . . . . . . . . . . . . . . . . . . . . . . . 38
11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 40
11.1. Normative References . . . . . . . . . . . . . . . . . . . 40
11.2. Informative References . . . . . . . . . . . . . . . . . . 40
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 42
1. Introduction
REsource LOcation And Discovery (RELOAD) [1] is a peer-to-peer
signaling protocol that can be used to maintain an overlay network,
and to store data in and retrieve data from the overlay. For
interoperability reasons, RELOAD specifies one overlay algorithm that
is mandatory to implement. Additionally, RELOAD supports a variety
of other overlay algorithms through the use of topology plug-ins. A
topology plug-in implements the topology defined by a specific
overlay algorithm.
This document defines a new topology plug-in for RELOAD that is more
appropriate for real world large scale overlays that have to deal
with object storage in a fair manner, provide good lookup performance
to a variety of applications, and self-organize to deal with churn.
This topology plug-in implements three important functionalities that
allow RELOAD to operate under these real world constraints. First,
it includes a load balancing algorithm that specifies efficient
allocation of load to different nodes in the network. Second, the
document describes robust techniques for stabilization of fingers and
successors and specifies self tuning mechanisms that allow dynamic
and automatic adjustment of parameters needed for these advanced
techniques in the topology plug-in; this avoids the need for
constants that may only work in specific scenarios. Finally, it
specifies a locality aware finger selection algorithm that reduces
average lookup latency.
Load balancing is essential to effectively manage data and provide
services on overlays. Load balancing, as an integral part of the
overlay, encourages participation by imposing collective fate sharing
on the nodes. Without such a scheme, overlay adoption may be
significantly affected. However, the mandatory-to-implement RELOAD
DHT protocol based on Chord does not support operating the overlay in
a load balanced manner. For instance, for a system with N nodes, it
can be shown that the imbalance factor, defined as the maximum load
on any node in the network divided by the average load, is of the
order of O(log2(N)) even if all objects are assumed to be
homogeneous. In the case of heterogeneous networks,
where the capabilities of nodes (storage and bandwidth) can differ by
multiple orders of magnitude, the problem of load balancing becomes
more important because the imbalance in load distribution could
potentially create a bottleneck in the system. Thus to enable
scalable, real world deployments of RELOAD, the topology plug-in in
this document specifies a scheme for load balancing.
Peer-to-peer overlay networks face a fundamental challenge with
churn, that is, the continuous joining and leaving of nodes. Churn
can affect the integrity of the routing structure, cause a node to
become disconnected from
the overlay, and cause overhead in maintaining consistency. Several
research studies have been performed on the type of stabilization
that can be used in DHT based peer-to-peer overlays such as RELOAD.
This document specifies a method for stabilization to deal with churn
to allow RELOAD to be scalable and reliable in real world conditions.
We specifically prescribe the use of a periodic stabilization routine
to counter the undesirable effects of churn on routing.
To enable load balancing and stabilization to deal with churn, some
parameters need to be set (described later). The use of fixed
constants for such parameters leads to deployments that may only
work in the specific scenarios for which the parameters were
evaluated. Thus,
this document specifies self tuning mechanisms for system parameters
to allow RELOAD to scale naturally as network dynamics dictate. For
example, as churn increases, the topology plug-in specified in this
document adapts by increasing the frequency of stabilization.
Similarly, as network size increases, load balancing parameters and
DHT parameters are modified to ensure quick finger and successor
updates and to keep the load balancing property over time. To enable
self tuning of system parameters, some characteristics such as churn
rate and network size are estimated. These characteristics are then
used to dynamically adjust the topology plug-in parameters such as
the size of the successor set, size of the routing table, and rate of
maintenance messages. The benefit of this approach is that it
avoids the problem of static parameters: with static parameters, it
is not possible to achieve both a low failure rate and a low
communication overhead. The topology plug-in specified in this
document allows the
system to take into account the evolution of network conditions and
adapt to them.
Even with up-to-date fingers and successors, making progress in the
identifier space can be expensive in terms of network latency because
small progress in identifier space could result in a significant leap
in physical distance. The successor and predecessor lists can be
used to optimize network latency by relaxing the requirement for
finger selection. Specifically, at each overlay hop, as progress is
made in the identifier space, small physical distance hops are used
so as to avoid high latencies in overall lookup. More details on the
proposed locality aware finger selection are described later in this
document.
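The hop-selection idea above can be sketched as follows. This is a
non-normative illustration; the candidate set, peer IDs, and RTT
values are hypothetical, and the actual candidate structures
(fingers, successors, predecessors) are defined later in this
document.

```python
M = 128  # identifier bits, as used elsewhere in this document

def clockwise_dist(a, b, m=M):
    """Clockwise distance from a to b on the 2^m identifier circle."""
    return (b - a) % (2 ** m)

def pick_next_hop(current_id, target_id, candidates):
    """candidates: list of (peer_id, rtt_ms) pairs. Among peers that
    reduce the remaining identifier-space distance to the target,
    choose the one with the lowest measured RTT."""
    remaining = clockwise_dist(current_id, target_id)
    progressing = [(pid, rtt) for pid, rtt in candidates
                   if clockwise_dist(pid, target_id) < remaining]
    if not progressing:
        return None  # no candidate makes progress; fall back elsewhere
    return min(progressing, key=lambda c: c[1])[0]
```

For example, with candidates at IDs 50 (200 ms), 90 (10 ms), and 120
(5 ms) and a target of 100, the node at 120 is excluded because it
overshoots the target, and 90 is preferred over 50 on RTT.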
In summary, the topology plug-in proposed in this document has the
following advantages:
o First, building on the RELOAD framework, this document introduces
a load balancing algorithm that provides an imbalance factor of
the order of O(1). The solution proposed in the document can also
be extended to the case of heterogeneous nodes in such a way that
the number of objects assigned to a node is proportional to the
amount of load it can handle;
o Second, this topology plug-in specifies stabilization techniques
that deal effectively with churn;
o Third, this document proposes self-tuning of parameters required
in these algorithms such that users no longer need to tune every
DHT parameter correctly for a given operating environment. By
periodically computing the network parameters, the system
automatically adapts to changing operating conditions; and
o Finally, the locality aware finger selection algorithm
incorporated with the DHT algorithm further optimizes the
selection of successors and predecessors to reduce network
latency.
This document is organized as follows. We first describe the
algorithms for load balancing, stabilization, and locality aware
finger selection. Then we describe the major components required to
operate the topology plug-in: joining, leaving, routing, dealing with
failures etc. Finally, we describe self tuning of the system
parameters to make the topology plug-in adjust itself to changing
network conditions.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [2].
This document uses the terminology and definitions from the Concepts
and Terminology for Peer-to-Peer SIP [1] draft.
Chord ring: The Chord DHT orders identifiers on an identifier circle
of size 2^m (m is the number of bits in peer identifiers). This
identifier circle is called the Chord ring.
DHT: Distributed Hash Tables (DHTs) are a class of decentralized
distributed systems that provide a lookup service similar to a
hash table. Given a key, any participating peer can retrieve the
value associated with that key. The responsibility for
maintaining the mapping from keys to values is distributed among
the peers.
Finger table: A data structure with up to m entries maintained by
each peer in a Chord-based overlay. The ith entry in the finger
table of peer n contains the identity of the first primary virtual
server that succeeds n by at least 2^(m-i) on the Chord ring.
This peer is called the ith finger of peer n. As an example, the
first entry in the finger table of peer n contains a peer half-way
around the Chord ring from peer n. The purpose of the finger
table is to accelerate lookups.
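As a non-normative illustration of the definition above (using a toy
value of m for readability rather than the 128 bits used elsewhere in
this document), the start of each finger interval can be computed as:

```python
def finger_targets(n, m):
    """Start of the ith finger interval of peer n, for i = 1..m:
    n + 2^(m-i) modulo 2^m. The ith finger is the first peer at or
    after this point on the ring."""
    return [(n + 2 ** (m - i)) % (2 ** m) for i in range(1, m + 1)]
```

For m = 3 and n = 0 this yields targets [4, 2, 1]: the first entry
points half-way around the ring, matching the example in the
definition.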
log2(N): Logarithm of N with base 2.
N: The number of nodes in the overlay unless otherwise specified.
Neighborhood set: Consists of successor and predecessor lists.
O(g(n)): Informally, saying that some equation f(n) = O(g(n)) means
that f(n) is at most a constant multiple of g(n) for sufficiently
large n.
Omega(g(n)): Informally, saying that some equation f(n) =
Omega(g(n)) means that f(n) is at least a constant multiple of
g(n) for sufficiently large n.
Predecessor list: A data structure containing the predecessors of a
peer.
Successor list: A data structure containing the successors of a
peer.
Virtual server: Each physical peer instantiates one or more virtual
servers with random IDs that act as peers in the DHT.
Primary virtual server: The primary ID assigned to the peer when it
joins the network. This location is chosen uniformly at random
among the available IDs in [0, 2^m-1].
Secondary virtual server: List of all identities owned by the
physical peer excluding the primary virtual server. The location
of the ith secondary virtual server is chosen uniformly at random
in [vp-(i+1)*delta*2^m, vp-i*delta*2^m], where vp is the primary
virtual server. These locations may change to adapt to network
dynamics.
alpha: number of virtual servers per physical ID. This includes one
primary virtual server and (alpha-1) secondary virtual servers.
delta: The value of delta defines the spacing between the virtual
servers. The location of the ith secondary virtual server is
chosen uniformly at random in [vp-(i+1)*delta*2^m,
vp-i*delta*2^m], where vp is the location of the primary virtual
server.
Routing table: The set of peers that a node can use to route overlay
messages. The routing table consists of the finger table,
successor list and predecessor list, all populated with primary
virtual server IDs.
3. Need for Advanced Topology Plug-in
This section details the need for specifying load balancing
mechanisms, robust stabilization and locality awareness for a RELOAD
topology plug-in to make it efficient and scalable.
3.1. Need for Load Balancing
Most DHTs, such as Chord, Pastry, CAN, Tapestry, and the RELOAD base
topology plug-in employ uniform hashing to generate object IDs so
that the object IDs are uniformly distributed over the ID space
(e.g., 0 to D = 2^128 - 1). Objects are then assigned to the nodes
based on their IDs and the exact way that this is done differs in
different DHTs. Let us assume that there are N nodes in the network.
Suppose the node IDs are generated uniformly at random. Then the
probability that the ith smallest node ID is equal to k is given by:

   P(nodeID(i) == k) = C(N-1, i-1) * (k/D)^(i-1) * (N/D)
                       * (1 - k/D)^(N-i)

For large N and D, this distribution can be approximated as follows:

   P(nodeID(i) == k) ~= (N/D) * e^(-k*N/D) * (k*N/D)^(i-1)
                        * (1/(i-1)!)

Therefore, the number of node IDs falling in a given interval
follows a Poisson distribution, and the inter-node distance follows
an exponential distribution with parameter N/D. As a consequence of
this property, the number of data items assigned to a node is
proportional to the inter-node distance and, thus, also follows an
exponential distribution (c denotes an arbitrary constant):

   Pr(#items in node == x) = c * (N/D) * e^(-(N/D)*x)
The goodness of Chord for load balancing can be measured in terms of
the imbalance metric defined as:
Maximum # of data items
imbalance factor = -------------------------------
Average # of data items
Here, the maximum value is taken as the value that is not exceeded
with probability O(1 - 1/N^l), where l is an arbitrary constant.
The imbalance factor measures the storage load of a server: the
longer the interval a server is responsible for, the more data it
has to store.
For Chord, this imbalance factor is of the order of O(log2(N)):
there are nodes in the overlay that manage log2(N) times the
average node's load. Further, this result implies that the
imbalance grows with the size of the network: as the number of
nodes increases, the maximum number of data items handled by a
single node grows as O(T*(log2(N)/N)), where T denotes the total
number of objects in the overlay. Ideally, we would want the
maximum value to be close to the average value of T/N, giving an
imbalance factor close to 1 for a good load balancing algorithm.
Therefore, the RELOAD base DHT is not very efficient at load
balancing, and load balancing mechanisms are needed in a topology
plug-in that is to be widely usable.
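The O(log2(N)) imbalance discussed above is easy to observe
empirically. The following non-normative sketch (the ID-space size
and seed are arbitrary) samples N uniformly random node IDs and
compares the largest arc between adjacent IDs, and hence the largest
per-node share of uniformly hashed objects, to the average arc:

```python
import random

def max_over_avg_arc(n_nodes, id_space=2 ** 32, seed=1):
    """Ratio of the largest inter-node arc to the average arc when
    n_nodes IDs are drawn uniformly at random on the circle."""
    rng = random.Random(seed)
    ids = sorted(rng.randrange(id_space) for _ in range(n_nodes))
    # circular arcs between consecutive IDs; they sum to id_space
    arcs = [(ids[(i + 1) % n_nodes] - ids[i]) % id_space
            for i in range(n_nodes)]
    return max(arcs) / (id_space / n_nodes)
```

For N = 1024 the ratio typically comes out in the vicinity of
log2(N) = 10, matching the analysis above.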
3.2. Need for Robust Stabilization
To ensure correct lookups in the presence of churn and to ensure
optimal routing of queries, a node needs to ensure that its fingers,
successor list, and predecessor list are up to date. However,
stabilization of these features comes at a cost: peers in the overlay
network need to consume network bandwidth to maintain routing state.
DHTs use stabilization routines to counter the undesirable effects of
churn on routing. The purpose of stabilization is to keep the
routing information of each peer in the overlay consistent with the
constantly changing overlay topology.
The current RELOAD base topology plug-in proposes reactive
stabilization. Research studies have shown that this approach may
not be the most robust to a wide variety of network conditions. In
this document, we revisit the choice of stabilization and recommend
techniques likely to work efficiently across deployment types.
3.3. Need for Locality Awareness
A major performance issue with structured peer-to-peer based topology
plug-ins is the need for multi-overlay-hop routing to lookup a piece
of data such as the SIP Address of Record (AoR) to IP mapping. Since
the identifier space is completely random, a lookup for a data item
stored 10ms RTT away can potentially take multiple intercontinental
hops before being answered, significantly affecting lookup latency.
Additionally, the fact that the routing nodes in such peer-to-peer
networks are expected to be constrained, non-infrastructure nodes
adds further delay. Locality aware routing aims to mitigate the
routing stretch of the topology plug-in so that a lookup is answered
by making progress in the identifier space while minimizing the
amount of distance traveled in physical space at each hop.
3.4. Need for Self-Tuning of System Parameters
Two main advantages of a self-tuning DHT are that users no longer
need to tune every DHT parameter correctly for a given operating
environment and that the system adapts to changing operating
conditions.
4. Chord
This document proposes a new topology plug-in for RELOAD that is
based on the Chord DHT algorithm. Topology plug-ins allow RELOAD to
support a variety of overlay algorithms. The proposed topology
plug-in uses a load balanced, self-tuning version of Chord. It can
be used as an alternative to the default DHT specified by RELOAD.
Chord [4] is a structured peer-to-peer algorithm that uses consistent
hashing to build a DHT out of several independent peers. Consistent
hashing assigns each peer and resource an m-bit identifier. Peers
MUST use SHA-1 as the base hash function to generate the identifiers.
The length of the identifiers MUST be m=128 bits. The identifiers
are ordered on an identifier circle of size 2^m. On the identifier
circle, key k MUST be assigned to the first peer whose identifier
equals or follows the identifier of k in the identifier space. The
identifier circle is called the Chord ring.
Different DHTs differ significantly in performance when bandwidth is
limited. It has been shown that when compared to other DHTs, the
advantages of Chord include that it uses bandwidth efficiently and
can achieve low lookup latencies at little cost [5].
A simple lookup mechanism could be implemented on a Chord ring by
requiring each peer to only know how to contact its current successor
on the identifier circle. Queries for a given identifier could then
be passed around the circle via the successor pointers until they
encounter the first peer whose identifier is equal to or larger than
the desired identifier. Such a lookup scheme uses a number of
messages that grows linearly with the number of peers. To reduce the
cost of lookups, Chord also maintains additional routing information;
each peer n MUST maintain a data structure with up to m entries,
called the finger table. The first entry in the finger table of peer
n contains the peer half-way around the ring from peer n. The second
entry contains the peer that is 1/4th of the way around, the third
entry the peer that is 1/8th of the way around, etc. In other words,
the ith entry in the finger table at peer n SHOULD contain the
identity of the first peer s that succeeds n by at least 2^(m-i) on
the Chord ring. This peer is called the ith finger of peer n. The
interval between two consecutive fingers is called a finger interval.
The ith finger interval of peer n covers the range [n.id + 2^(m-i),
n.id + 2^(m-i+1)) on the Chord ring. In an N-peer network, each peer
maintains information about O(log2(N)) other peers in its finger
table. As an example, if N=1000, it is sufficient to maintain 10
fingers.
To increase robustness in the event of peer failures, each Chord peer
MUST maintain a successor list containing the peer's immediate
successors on the Chord ring. The successor list will be further
described in Section 5.3. Peers MUST also maintain a predecessor
list containing the peer's immediate predecessors on the Chord ring.
The recommended value for the size of the predecessor list is 3, as is
also specified in [1].
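The consistent-hashing assignment rule described above can be
illustrated with a small, hypothetical ring (a toy m = 6 rather than
the mandated m = 128, and hand-picked peer IDs):

```python
M = 6  # toy identifier length; this document mandates m = 128

def successor(peers, k, m=M):
    """First peer whose identifier equals or follows k on the 2^m
    identifier circle; this peer is responsible for key k."""
    return min(peers, key=lambda p: (p - k) % (2 ** m))
```

With peers at IDs 1, 10, and 40, key 5 is assigned to peer 10, and
key 41 wraps around the circle to peer 1.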
5. Load Balancing in the Proposed Topology Plug-in for RELOAD
5.1. Basic Load Balancing scheme
The basic Chord DHT suffers from drawbacks. The uniform hashing
used in Chord to generate object IDs results in an imbalance factor
of the order of O(log2(N)); this implies that there are peers in
the Chord ring that have to manage O(log2(N)) times the load
handled by an average peer. In this section, we build upon the
basic
Chord DHT and present the virtual servers algorithm that results in
an imbalance factor close to 1.
Several research papers have proposed schemes to provide load
balanced DHTs. Some of the schemes are general techniques, while
others are targeted towards optimizing a specific DHT. The
solution proposed in this document takes into account results and
ideas from previous work on load balancing in [6], [7], [8], [4],
[9], [10], [11], [12], [13], [14], [15], [16].
The basic scheme for load balancing proposed in this document is an
extension of the virtual servers approach [4] over Chord with the
benefit of reducing the routing state. At a high level, the
virtual servers approach improves load balancing as follows.
work in [4] suggested that load balancing can be performed
efficiently if each peer simulates a logarithmic number of "virtual
servers". The method works by associating keys with virtual servers,
and mapping multiple virtual servers (with unrelated identifiers) to
each real node. The authors show that this will provide a more
uniform coverage of the identifier space. For example, if we
allocate O(log2(N)) randomly chosen virtual servers to each real
node, with high probability each of the N bins will contain
O(log2(N)) nodes. The downside of this approach is the additional
routing state and its maintenance cost.
This document proposes load balancing based on a proposal in [14],
and is an improvement over the basic virtual servers approach
introduced in [4]. In our approach, each physical node instantiates
one or more virtual servers that act as peers in the DHT. The
locations of these virtual servers are chosen close to each other in
the nodeID space allowing the node to share a single set of overlay
links among the virtual servers.
The main system parameters of the solution in this document are how
many virtual servers should be chosen and what the spacing between
the node identifiers of those virtual servers should be. The next
sections describe this document's recommendations on virtual
servers and routing for a load balanced DHT.
Since the choices of the system parameters depend on network
dynamics, this document further discusses dynamically adjusting the
topology plug-in with network dynamics and recommends a periodic
stabilization model to keep the system parameters up-to-date.
5.2. Recommendations on Virtual Servers
Selection of virtual servers requires two decisions to be made: (1)
how many virtual servers (denoted by alpha) should be chosen and
(2) what the spacing (represented by delta*2^m) between these
virtual servers should be. This section provides the present
document's recommendations on choosing alpha and delta.
Each physical peer maintains two sets of virtual server identities,
namely, primary virtual server identities and secondary virtual
server identities. The location of the primary virtual server
(denoted as vp) is chosen using existing techniques in RELOAD.
This location remains fixed throughout the entire lifetime of the
node. In addition to the primary virtual server, each node
maintains secondary virtual servers. The locations of these
secondary virtual servers are chosen such that the ith virtual
server, vs_i, is uniformly distributed in
[vp-(i+1)*delta*2^m, vp-i*delta*2^m].
The value of delta * 2^m defines the spacing between the virtual
servers. Nodes MUST choose the value of delta as 1/N, where N is
the number of nodes in the network. This value was found to
perform well in simulations and is consistent with the value
reported in [14].
Each node MUST maintain a total of alpha = 2*log2(N) virtual server
identities for load balancing where N is the number of nodes in the
overlay. This value of alpha includes one primary virtual server and
(alpha-1) secondary virtual server identities. The choice of
2*log2(N) for alpha has been found to work well in simulations [14].
While both these parameters, alpha and delta, depend on N (the size
of the network), Section 6.8.1 describes an algorithm used to
determine alpha and delta without performing explicit network size
estimation. Note that explicit network size estimation only needs to
be performed when self-tuning the DHT parameters, since in that case
a more accurate estimate of the network size is needed.
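The recommendations above can be sketched as follows, assuming a
network-size estimate N is available (the estimation itself is
covered in Section 6.8.1). Function and parameter names here are
illustrative, not normative.

```python
import math
import random

M = 128          # identifier bits mandated by this document
D = 2 ** M       # size of the identifier space

def virtual_server_ids(vp, n_estimate, seed=None):
    """Primary ID vp plus (alpha - 1) secondary IDs, with
    alpha = 2*log2(N) virtual servers and spacing delta = 1/N.
    The ith secondary ID is drawn uniformly from the interval
    [vp - (i+1)*delta*D, vp - i*delta*D]."""
    rng = random.Random(seed)
    alpha = max(1, round(2 * math.log2(n_estimate)))
    delta = 1.0 / n_estimate
    ids = [vp]  # the primary virtual server
    for i in range(1, alpha):
        lo = vp - (i + 1) * delta * D
        hi = vp - i * delta * D
        ids.append(int(rng.uniform(lo, hi)) % D)
    return ids
```

For N = 1024, this yields alpha = 20 virtual server identities per
node, all clustered within a small fraction of the identifier space
below vp.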
5.3. Routing
Assume an identifier space [0, 2^m-1]. As in the basic Chord
approach, the recommended topology plug-in maintains two types of
routing information: finger tables and neighbor tables. The
finger tables are constructed based on the location of the primary
virtual server, i.e., the ith finger at peer v contains the identity
of the first peer s that succeeds its primary virtual server, vp, by
at least 2^(m-i) on the Chord ring.
The neighbor table of peer v contains entries for all nodes that
own any part of the log2(N)/N fraction of the identifier space
that node v owns. As in the case of Chord, each virtual server
belonging to the node obtains the successor list from its
immediate successor. If the total length of this list is greater
than log2(N), the clockwise-farthest entries are dropped to
restrict the number of neighbor table entries to log2(N).
Further, it can be shown that the maximum number of node IDs that
fall within this space is O(log2(N)), resulting in that many
neighbor table entries. The added benefit of having log2(N)
successors is that if each peer fails independently with
probability p, the probability that all log2(N) successors fail
simultaneously is only p^log2(N); therefore, the system is much
more resilient to failures.
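As a quick numeric check of the resilience argument (with
illustrative values, not normative ones):

```python
import math

def prob_all_successors_fail(p, n_nodes):
    """Probability that all log2(N) successors fail simultaneously,
    assuming each peer fails independently with probability p."""
    return p ** math.log2(n_nodes)
```

With p = 0.5 and N = 1024 (so log2(N) = 10 successors), the
probability of losing the entire successor list at once is 0.5^10,
i.e., roughly 0.1%.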
To obtain the entries in the peer's neighbor table, the peer starts
with the alpha-th secondary virtual ID, denoted as vs_alpha. The
peer then proceeds clockwise from vs_alpha on the Chord ring and
identifies the virtual IDs of peers that fall in this region. When
it encounters a virtual server vs', it stores the location of both
the virtual server, vs', and the location of the corresponding
primary virtual server, vp'. On the other hand, when it sees its own
virtual server, it is dropped and no information is stored. This
process is followed until log2(N) distinct primary virtual servers
are obtained to populate the neighbor table. It is to be noted that
while the size of the successor list may be greater than log2(N)
because it includes both the primary and the secondary virtual
servers, the number of successor connections is limited to log2(N).
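The clockwise walk described above can be sketched as follows. The
ring contents here are a hand-made, hypothetical example in which
each entry pairs a virtual server ID with the primary virtual server
ID of the physical node that owns it.

```python
def populate_neighbor_table(ring, own_primary, limit):
    """ring: (virtual_id, primary_id) pairs already ordered
    clockwise starting from vs_alpha. Collects distinct primary
    IDs, skipping the node's own virtual servers, until `limit`
    distinct primary virtual servers are found."""
    primaries = []
    for _vs, vp in ring:
        if vp == own_primary:
            continue  # our own virtual server: drop, store nothing
        if vp not in primaries:
            primaries.append(vp)
        if len(primaries) == limit:
            break
    return primaries
```

Note that two virtual servers owned by the same physical node
contribute only one neighbor table entry, which is what limits the
number of successor connections to log2(N).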
To locate any object in the overlay, the nodes MUST first employ the
finger tables to reach the virtual server ID that is closest to the
query object ID. At the end of this step, the chosen physical node
may or may not have the object. If the physical node does not have
the object, then it forwards the object request to its neighbors
using the neighbor table entries. Therefore the total number of
message hops required for locating an object is of the order of
O(log2(N)) + 1.
Thus, the solution proposed in this document, based on [14],
requires maintaining fewer connections than a pure virtual servers
based approach for load balancing. This is
because, while the basic Chord scheme with O(log2(N)) virtual servers
per physical node requires maintaining O(log2(N)) sets of fingers
(and O(log2(N)) routing tables) for each physical node for efficient
routing, this document requires maintaining just one set of fingers
(and routing table) per physical node.
In Section 6, we discuss approaches to keeping the fingers and
successors up-to-date, which is required to ensure correct lookups.
5.4. Obtaining Virtual Server Identities
This section describes the procedure for obtaining the primary and
secondary virtual server identities.
5.4.1. Without Enrollment Server
When a new node, v, joins the system, the first primary node identity
vp MUST be computed by applying the digest specified in the self-
signed-permitted element of the overlay configuration document to the
DER representation of the node's public key. The subsequent (alpha -
1) virtual node IDs MUST be computed as follows: the ith virtual ID
is chosen as random(vp - i*delta*2^m, vp - (i+1)*delta*2^m).
This ensures that the chosen virtual servers are close to each
other and at most log2(N)/N apart.
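As a non-normative illustration, the selection rule above can be
sketched as follows (the function name and the small-ring
parameterization are ours):

```python
import random

def virtual_ids(vp, alpha, delta, m=128, rng=random):
    """Return [vs_1 .. vs_alpha]: vs_1 is the primary ID vp, and the
    ith subsequent ID is drawn uniformly from the open interval
    (vp - i*delta*2^m, vp - (i+1)*delta*2^m), wrapping modulo 2^m."""
    ring = 1 << m
    step = int(delta * ring)              # interval width delta*2^m
    ids = [vp]
    for i in range(1, alpha):
        offset = rng.randrange(1, step)   # strictly inside the interval
        ids.append((vp - i * step - offset) % ring)
    return ids
```

With delta on the order of 1/N, all of a node's virtual servers end
up clustered just below vp.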
A node's public key is tied to vp and can be verified by other
nodes. The other virtual server IDs, however, cannot be related to
this public key, so they need to be verified based on vp. There are
two options for this verification:
o a. Use a fixed offset for the ith virtual server, placing it at
(vp - i*delta*2^m) exactly. This removes the random component of
the virtual server ID selection. Since all virtual server IDs are
at fixed intervals from vp, a node can weakly verify the node ID.
However, node ID collisions may make this hard.
o b. Use the current technique described in Section 5.2 with random
selection in an interval. In this case a node can verify that the
virtual server ID is in a small range near vp.
OPEN ISSUE: At the moment we recommend option (b), but this issue
needs further analysis.
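Option (b) amounts to a range check against the window that
legitimate secondary IDs are drawn from; a non-normative sketch
(names ours):

```python
def verify_secondary_id(vs, vp, alpha, delta, m=128):
    """Weak verification per option (b): accept vs only if it lies in
    the window of width alpha*delta*2^m just below the (verifiable)
    primary ID vp, i.e. the region secondary IDs are drawn from."""
    ring = 1 << m
    window = int(alpha * delta * ring)
    return 0 < (vp - vs) % ring < window
```

A malicious node can still choose arbitrary positions inside this
small window, which is why the verification is only weak.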
5.4.2. With Enrollment Server
When an enrollment server is present, the added security benefit of
enrollment-server-certified node identities extends to the virtual
server identities. The enrollment server is provided with the values
of the system parameters alpha and delta, and based on them, the
enrollment server gives out a set of virtual server identities to a
node. As a node itself does in a deployment without an enrollment
server, the enrollment server MUST use a uniform hash function to
randomly choose the primary virtual server ID from the space
(0, 2^128 - 1); call this virtual server ID vp. The remaining
(alpha - 1) node IDs MUST then be chosen by the enrollment server
around vp in such a way that the ith virtual server ID of v is
uniformly distributed in (vp - i*delta*2^m, vp - (i+1)*delta*2^m).
These node
IDs are then passed on to the node.
When a node initially joins, it does not have an estimate of alpha
and delta since that is based on the number of fingers in the routing
table. Thus, the enrollment server MUST determine the number of
virtual servers, the spacing between them, and consequently the
node IDs handed out to the joining node, using either the existing
values of alpha and delta in requests it receives from nodes in the
current overlay or its own diagnostic messages sent into the
overlay. If the overlay has formed recently, the enrollment server
MAY bootstrap the values of alpha and delta to 20 and 1/1000,
respectively. The enrollment server may also use past overlay size
estimates it possesses to bootstrap alpha and delta to
2*log2(Nestimated) and 1/Nestimated. For example, an enrollment
server for a venue-based overlay that has been torn down can use
past size estimates.
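The bootstrap rules above can be summarized as follows (a
non-normative sketch; rounding alpha to an integer is our
assumption):

```python
import math

def bootstrap_params(n_estimated=None):
    """Initial (alpha, delta) for the enrollment server: the fixed
    defaults for a newly formed overlay, or 2*log2(N) and 1/N when a
    past size estimate N is available."""
    if n_estimated is None:
        return 20, 1 / 1000
    return round(2 * math.log2(n_estimated)), 1 / n_estimated
```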
Note that the enrollment procedure in RELOAD already defines
providing multiple node identities to the enrolling node. The change
needed is to include the values of alpha and delta in the simple
enrollment request.
5.5. Extensions to Overlays with Heterogeneous Nodes
This load balancing solution can be extended to overlays whose
nodes have heterogeneous capacities. Let the capacity of node v in
the overlay be represented by Cv.
In such an overlay, nodes MUST use alpha = 2*Cv*log2(N) virtual
servers per physical node. The locations of these (2*Cv*log2(N))
virtual nodes MUST be chosen near the primary node location vp such
that the location of the ith virtual server ID is uniformly
distributed in (vp - i*delta*2^m, vp - (i+1)*delta*2^m). Each
physical node then maintains a
total of O(Cv log2(N)) neighbor entries and O(Cv log2(N)) routing
fingers. The imbalance factor for this scenario, defined as,
Maximum # of data items
imbalance factor = ---------------------------------
Cv * Average # of data items
can be shown to be close to 1.
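As a non-normative illustration, the imbalance factor can be
evaluated as follows (here 'rel_capacity' is read as the node's
capacity relative to the average node, so that it equals 1
everywhere in a homogeneous overlay; this reading is ours):

```python
def imbalance_factor(items, rel_capacity):
    """items[v]: number of data items stored at node v;
    rel_capacity[v]: node v's capacity relative to the average node.
    Returns max over v of items[v] / (rel_capacity[v] * avg items)."""
    avg = sum(items.values()) / len(items)
    return max(items[v] / (rel_capacity[v] * avg) for v in items)
```

A heterogeneous overlay in which each node's load is proportional to
its capacity yields a factor of 1.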
Note that in addition to alpha and delta, the node's capacity MUST
be passed to the enrollment server in the simple enrollment
request. This capacity value is used by the enrollment server to
calculate Cv as
node_capacity(node n)
Cv(node n) = ----------------------------------------
sum_{k=1}^N node_capacity(node k)
In deployments with no enrollment servers, the node MUST estimate
Cv(node n) by obtaining its neighbors' node capacities and building
an estimate of average node capacity.
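The Cv computation is a plain normalization; a non-normative sketch
(names ours), usable both by an enrollment server with the full
capacity list and by a node normalizing against capacities learned
from its neighbors:

```python
def normalized_capacity(own_capacity, known_capacities):
    """Cv = node_capacity / sum of node capacities.  Without an
    enrollment server, 'known_capacities' would be an estimate built
    from the neighbors' reported capacities."""
    return own_capacity / sum(known_capacities)
```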
6. Stabilizing Fingers, Successors, and Predecessors in the Topology
Plug-in
6.1. Choice of Approach to Stabilization
There are two alternative approaches to stabilization: periodic and
reactive. Periodic stabilization can either use a fixed
stabilization rate or calculate the stabilization rate in an adaptive
fashion.
In reactive stabilization, a peer reacts to the loss of a peer in its
neighborhood set or to the appearance of a new peer that should be
added to its neighborhood set by sending a copy of its neighbor table
to all peers in the neighborhood set. Periodic recovery, in
contrast, takes place independently of changes in the neighborhood
set. In periodic recovery, a peer periodically shares its
neighborhood set with each of the members of that set.
The mandatory-to-implement Chord DHT algorithm in RELOAD [1] uses
reactive stabilization for the neighborhood set, unlike the original
Chord algorithm, which uses periodic stabilization. It has been
shown in [17] that reactive stabilization works well for small
neighborhood sets (i.e., small overlays) and moderate churn.
However, in large-scale (e.g., 1000 peers or more [17]) or high-churn
overlays, reactive stabilization runs the risk of creating a positive
feedback cycle, which can eventually result in congestion collapse.
In [17], it is shown that a 1000-peer overlay under churn uses
significantly less bandwidth and has lower latencies when periodic
stabilization is used than when reactive stabilization is used.
Although in the experiments carried out in [17], reactive recovery
performed well when there was no churn, its bandwidth use was
observed to jump dramatically under churn. At higher churn rates and
larger scale overlays, periodic stabilization uses less bandwidth and
the resulting lower contention for the network leads to lower
latencies. For this reason, most DHTs such as CAN, Chord, Pastry,
Bamboo, etc. use periodic stabilization. As an example, the first
version of Bamboo used reactive recovery, which caused Bamboo to
suffer from degradation in performance under churn. To fix this
problem, Bamboo was modified to use periodic stabilization.
In Chord, periodic stabilization is typically done both for
successors and fingers. An alternative strategy is analyzed in [18].
In this strategy, called the correction-on-change maintenance
strategy, a peer periodically stabilizes its successors but does not
do so for its fingers. Instead, finger pointers are stabilized in a
reactive fashion. The results obtained in [18] imply that although
the correction-on-change strategy works well when churn is low,
periodic stabilization outperforms the correction-on-change strategy
when churn is high.
In this document, we propose to use periodic stabilization for
fingers, successors, and predecessors based on these insights. Each
peer MUST maintain a stabilization timer. When the stabilization
timer fires, the peer MUST restart the timer and carry out the
stabilization operations. The stabilization routine is described
next and more details on computing the stabilization interval are
elaborated in Section 6.8.2.
6.2. Update Messages for Stabilization
The stabilization procedures are implemented using Update requests
and answers. To describe the contents of these messages, the syntax
defined in [1] is used. A Chord Update request is defined as:
enum { reserved (0), notify(1), succ_stab(2), pred_stab(3),
       full(4), virtualserver_stab_join(5),
       virtualserver_stab_leave(6), (255) }
     ChordUpdateType;

struct {
  ChordUpdateType type;
  NodeId sender_id;
  select(type) {
    case notify:
      uint32 uptime;
      NodeId sender_virtual_ids <0..2^16-1>;
    case pred_stab:               /* Empty */
      ;
    case succ_stab:               /* Empty */
      ;
    case virtualserver_stab_join:
      NodeId sender_virtual_ids <0..2^16-1>;
    case virtualserver_stab_leave:
      NodeId sender_virtual_ids <0..2^16-1>;
    case full:
      uint32 uptime;
      uint32 alpha;
      uint32 delta;
      NodeId sender_virtual_ids <0..2^16-1>;
      NodeId predecessors <0..2^16-1>;
      NodeId successors <0..2^16-1>;
      NodeId fingers <0..2^16-1>;
  };
} UpdateReq;
The "type" field MUST indicate the reason why the Update was sent:
notify: the sender of the Update wishes to notify the recipient of
the sender's existence. Upon receiving the Update, the recipient
SHOULD insert the sender into its routing table, if appropriate.
The 'notify' message MUST include the virtual server IDs of the
sender in the 'sender_virtual_ids' field.
succ_stab: the Update request is related to the successor
stabilization routine.
pred_stab: the Update request is related to the predecessor
stabilization routine.
virtualserver_stab_join and virtualserver_stab_leave: the Update
request is related to the DHT parameter stabilization routine.
full: the Update request contains the entire routing and neighbor
table of the sender.
The sender_id field contains the sender's primary virtual ID.
The sender_virtual_ids field contains the list of all secondary
virtual server IDs belonging to the sender.
If the type of the Update request is 'pred_stab' or 'succ_stab', the
request MUST NOT carry any additional information.
If the type of the Update request is 'notify', the request MUST
contain the sender's current uptime in seconds and the location of
the sender's current virtual server IDs.
If the type of the request is 'virtualserver_stab_join' or
'virtualserver_stab_leave', the contents of the message MUST include
the list of primary and the secondary virtual server IDs of the
sender.
If the type of the request is 'full', the contents of the message
MUST be:
o uptime: The sender's current uptime in seconds;
o alpha: The sender's current DHT parameter value - alpha;
o delta: The sender's current DHT parameter value - delta;
o sender_virtual_ids: The sender's list of current virtual server
IDs;
o predecessors: The sender's predecessor list;
o successors: The sender's successor list;
o fingers: The sender's finger table.
In the introduced topology plug-in, each peer decides independently
the appropriate size for the successor list, predecessor list, finger
table, and system parameters (alpha and delta). Thus, the
'predecessors', 'successors', and 'fingers' fields are of variable
length. The number of virtual IDs assigned to a node along with the
spacing between the virtual IDs are chosen based on the system
parameters and may change as the network changes. Accordingly, the
'sender_virtual_ids' field MUST also be of variable length and
includes the list of virtual server IDs assigned to the sender. As
specified in RELOAD [1], variable-length fields are preceded on the
wire by length bytes. In
the case of the successor list, predecessor list, sender_virtual_ids,
and finger table, there are two length bytes (allowing lengths up to
2^16-1). The number of NodeId structures included in each field can
be calculated based on the length bytes since the size of a single
NodeId structure is 16 bytes. If a peer receives more entries than
fit into its successor list, predecessor list or finger table, the
peer SHOULD ignore the extra entries. If the peer is assigned more
virtual IDs than fit into its ID list, it SHOULD reject the
assignment. If a peer receives fewer entries than it currently has
in its own data structure, the peer SHOULD NOT drop the extra entries
from its data structure.
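As a non-normative illustration, decoding one such variable-length
field can be sketched as follows (NodeIds are modeled as raw
16-byte strings):

```python
def parse_node_id_list(buf):
    """Read one variable-length NodeId field: two big-endian length
    bytes, then length/16 NodeIds of 16 bytes each.  Returns the list
    of NodeIds and the remainder of the buffer."""
    length = int.from_bytes(buf[:2], 'big')
    body = buf[2:2 + length]
    ids = [body[i:i + 16] for i in range(0, length, 16)]
    return ids, buf[2 + length:]
```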
If the Update request succeeds, the responding peer sends an
UpdateAns message, which is defined as:
enum { reserved (0), notify(1), succ_stab(2), pred_stab(3),
       full(4), virtualserver_stab_join(5),
       virtualserver_stab_leave(6), (255) }
     ChordUpdateType;

struct {
  ChordUpdateType type;
  select(type) {
    case full:                     /* Empty */
      ;
    case virtualserver_stab_join:  /* Empty */
      ;
    case virtualserver_stab_leave: /* Empty */
      ;
    case notify:
      uint32 uptime;
    case pred_stab:
      NodeId predecessors <0..2^16-1>;
    case succ_stab:
      NodeId predecessors <0..2^16-1>;
      NodeId successors <0..2^16-1>;
  };
} UpdateAns;
If the type of the Update answer is 'full',
'virtualserver_stab_join', or 'virtualserver_stab_leave', the answer
MUST NOT carry any additional information. If the type is 'notify',
the answer MUST contain the sender's current uptime in seconds. If
the type is 'pred_stab', the answer SHOULD carry the predecessor list
of the responding peer. If the type is 'succ_stab', the answer
SHOULD include the predecessor and successor lists of the responding
peer.
6.3. Finger Stabilization
The purpose of the finger stabilization procedure is to incorporate
new peers into the finger table. In the procedure, peer v MUST
maintain a counter 'next', which stores the index of the next finger
that should be stabilized. The counter MUST be initialized to one
and MUST be incremented by one after each finger stabilization
procedure. When the stabilization timer fires, peer v MUST choose
one finger interval i from the set of finger_table_size finger
intervals it maintains:
i = ((next - 1) % finger_table_size) + 1,
and send a Probe request addressed to the first identifier belonging
to the chosen finger interval i. The peer f responding to the Probe
request SHOULD become the ith finger of v. Peer v SHOULD send an
Attach request to peer f to initiate a new connection to it.
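The bookkeeping for the 'next' counter can be sketched as follows
(non-normative; this sketch cycles i through
1..finger_table_size):

```python
class FingerStabilizer:
    """Tracks which finger interval to probe on each timer firing."""

    def __init__(self, finger_table_size):
        self.size = finger_table_size
        self.next = 1                 # counter is initialized to one

    def interval_to_stabilize(self):
        """Return the 1-based index i of the finger interval to
        stabilize now, then advance the round-robin counter."""
        i = ((self.next - 1) % self.size) + 1
        self.next += 1
        return i
```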
This document defines a new ProbeInformationType value 'uptime'.
When this value is present in the requested_info field of a Probe
request, it indicates that the receiver MUST include in the Probe
response its current uptime in a ProbeInformation structure. A Probe
request that is sent as part of the finger stabilization procedure
MUST contain the 'uptime' ProbeInformationType in its requested_info
field. The extended ProbeInformation structure that is returned in
the Probe response is defined as:
enum { responsible_set(1), num_resources(2), uptime(3),
       (255) }
     ProbeInformationType;

struct {
  ProbeInformationType type;
  select (type) {
    case responsible_set:
      uint32 responsible_ppb;
    case num_resources:
      uint32 num_resources;
    case uptime:
      uint32 uptime;
  };
} ProbeInformation;
The types "responsible_set" and "num_resources" have been specified
in RELOAD [1]. The "uptime" type is new and contains the sender's
current uptime in seconds.
6.3.1. Locality-aware Finger Selection
Making progress in the identifier space can be expensive in terms of
network latency. The successor and predecessor lists can be used to
optimize network latency by relaxing the requirement for finger
selection. Specifically, for each finger table entry, a node (say v)
first determines a node n that matches the identifier of the finger.
It then retrieves the successors and predecessors of n from n. Node
v then PINGs the successors and predecessors of node n and chooses
the topologically closest node among these as the choice for the
finger table entry. The sizes of the successor and predecessor lists
have an impact on network latency; the greater the number of
successors and predecessors, the higher the probability of finding a
topologically close finger table entry. Our simulations of the
basic Chord protocol with just three successors and three
predecessors show a reduction of close to 41-46% in DHT lookup
delay. An independent simulation study reported in [19] confirms
our results, showing 31% to 40% lookup stretch reductions using
2^16 nodes and Euclidean and transit-stub models. We expect lookup
delay to improve further in the proposed topology plug-in with
log2(N) successors.
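As a non-normative illustration of the relaxed selection ('ping'
stands in for the RTT measurement obtained from the PINGs):

```python
def locality_aware_finger(n, successors, predecessors, ping):
    """Among the node n matching the finger identifier and its
    successors and predecessors, choose the candidate with the
    lowest measured round-trip time (ping: node -> RTT seconds)."""
    candidates = [n] + list(successors) + list(predecessors)
    return min(candidates, key=ping)

# Example: a successor of n is topologically closer than n itself.
rtts = {'n': 0.050, 's1': 0.020, 's2': 0.090, 'p1': 0.080}
best = locality_aware_finger('n', ['s1', 's2'], ['p1'], rtts.get)
```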
6.4. Successor Stabilization
Both the primary virtual IDs and secondary virtual IDs of the nodes
are stored as part of the successor and predecessor tables.
In the successor stabilization routine, a peer v asks the peer s that
is the first entry in its successor table for the virtual ID of the
successor's first predecessor p. If the successor's first
predecessor pointer does not point to v's virtual ID but instead to p
(for instance, because p has joined the overlay between v and s), the
peer with virtual ID p should become v's first successor instead of
s. Thus, v adds p to the front of its successor list and notifies p
of v's existence, so that p can change its predecessor to v.
Successor lists are also stabilized as part of the successor
stabilization routine. To do this, peer v copies the successor list
of its successor s, removing the last entry and prepending s to it.
If peer v, as a result of running the successor stabilization
routine, notices that its successor has failed, then it does the
following:
o Using the virtual ID, s, of its successor, it looks up its virtual
ID to physical ID mapping to identify which physical node, say S,
has failed. Here, the term 'physical ID' refers to the primary
virtual ID of the peer.
o The peer then uses the same virtual ID to physical ID mapping
table to identify the location of other virtual IDs in its
successor list that correspond to the physical node S. These IDs
are marked for replacement.
o The peer then replaces the successor with the first live entry in
its successor list, say n, and contacts this node for its
successor list. The peer synchronizes n's successor list with its
own, removing the IDs that are marked for replacement. This step
is repeated by contacting the other live entries, one after the
other, until all the IDs marked for replacement are updated.
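The first two bullets can be sketched as follows (non-normative;
'phys_of' is the peer's virtual-ID-to-physical-ID mapping, where
the physical ID is the primary virtual ID):

```python
def purge_failed_successor(successor_list, phys_of, failed_vid):
    """Resolve the failed virtual ID to its physical node, then split
    the successor list into live entries and entries marked for
    replacement (all IDs belonging to the failed physical node)."""
    failed_phys = phys_of[failed_vid]
    marked = [v for v in successor_list if phys_of[v] == failed_phys]
    live = [v for v in successor_list if phys_of[v] != failed_phys]
    return live, marked
```

The peer would then repopulate the marked slots from the successor
lists of the remaining live entries, as the third bullet describes.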
The successor stabilization routine is executed when the
stabilization timer fires. To begin the successor stabilization
routine, peer v MUST send an Update request to its first successor s.
The type of the Update request MUST be 'succ_stab'. Upon receiving
the Update request, peer s MUST send an Update answer to peer v. The
Update answer SHOULD include the successor and predecessor lists of
peer s. If v learns from the predecessor and successor lists
included in the answer that new peers should be included in its
neighborhood set, v MUST send Attach requests to the new peers. Once
a direct connection has been established with each new peer as a
result of the Attach procedure, peer v MUST send an Update request of
type 'notify' to each new peer. This allows the new peers to insert
v into their neighborhood sets.
6.5. Predecessor Stabilization
The predecessor stabilization routine is executed when the
stabilization timer fires. To begin the predecessor stabilization
routine, a peer v MUST send an Update request to its predecessor p.
The type of the Update request MUST be 'pred_stab'. Upon receiving
the Update request, peer p MUST send an Update answer to peer v. The
Update answer SHOULD include the predecessor list of peer p. Peer v
SHOULD use the predecessor list carried in the answer to update its
own predecessor list. If new peers are inserted into the predecessor
list, peer v MUST send Attach requests and Update requests of type
'notify' to the new peers in the same way as during the successor
stabilization routine.
6.6. Joining the Overlay
The process of joining an overlay is as follows:
1. The Joining Peer (JP) SHOULD connect to a bootstrap peer.
2. The JP SHOULD send an Attach request to the bootstrap peer,
which SHOULD route the request towards the Admitting Peer (AP).
Here, the AP is the node that is the successor of JP's primary
virtual server. Once the Attach procedure is finished, there is
a direct connection between the JP and the AP.
3. The JP SHOULD send a Join request to the AP. The AP returns a
Join answer.
4. The AP MUST send an Update request of type 'full' to the JP.
The Update request SHOULD contain the contents of AP's routing
table. The JP SHOULD use the contents of the Update request to
initialize its finger table and DHT parameters, i.e., alpha and
delta. The JP SHOULD set the sizes of its successor list,
predecessor list, and finger table to the same values that the AP
uses. The values of alpha and delta are also set to the ones
that AP uses. If an enrollment server is present in the
overlay, the information about the choice of DHT parameters,
alpha and delta, can be obtained from it directly.
5. The JP then chooses the virtual server locations, vs_i for i =
2, 3, ..., alpha. This step is not performed if there is an
enrollment server in the overlay. In this scenario, the
locations of the virtual servers are obtained from the
enrollment server a priori.
6. The JP SHOULD send an Attach request to the successor of
vs_alpha, call it n_alpha. The node returns an Attach answer.
7. The peer n_alpha MUST send an Update request of type 'full' to
the JP. The Update request SHOULD contain the node's successor
list. The JP SHOULD use the contents of the Update
request to initialize its successor and predecessor lists.
8. The JP MUST send Attach requests to initiate connections to each
of the peers in its predecessor list, successor list, and finger
table. Since the JP is already connected to the AP and n_alpha,
there is no need to send a new Attach request to these nodes.
9. The JP MUST send an Update request of type 'notify' to each of
the peers in its predecessor and successor lists (except for the
AP and n_alpha that are already aware of the JP).
10. The JP MUST send a Probe request carrying the 'uptime'
ProbeInformationType value in the requested_info field to each
of its fingers. This way the JP will learn the uptimes of its
fingers (the uptimes of predecessors and successors are learned
from Update responses in the previous step). The uptimes are
needed when estimating the join rate of peers in the overlay.
It should be noted that these Probe requests are not routed via
the overlay but are sent on a direct connection.
11. Now, the JP takes ownership of regions in the overlay based on
its virtual server locations, vs_i. In this step, the JP MUST
send one Update message to the successor of each vs_i for all i.
The type of the Update message MUST be set to
'virtualserver_stab_join'. The peer, say n_i, receiving this
message makes a note of this change and locally updates its
ownership locations. Peer n_i then sends an UpdateAns message
with type 'virtualserver_stab_join' to the JP acknowledging the
change. Peer n_i SHOULD then issue a series of Store requests
to JP to transfer ownership of the resources. This step is
repeated for all i = alpha, alpha-1, ..., 1. At the end of this
step all the new virtual servers corresponding to JP have joined
the overlay, and the JP stores the data for these virtual
locations.
6.6.1. Contents of the Join Message
This topology plug-in extends the Join request defined in RELOAD [1].
In addition to the joining_peer_id, a node MAY choose to send its
virtual server locations as part of the Join message if they were
obtained a priori from the enrollment server as part of the
enrollment process. The JoinReq message is defined as:
struct {
  NodeId joining_peer_id;
  NodeId joining_peer_virtual_server_ids <0..2^16-1>;
  opaque overlay_specific_data <0..2^16-1>;
} JoinReq;
The JoinReq contains the Node-ID which the sending peer wishes to
assume and MAY include other virtual server locations that the
joining peer wishes to occupy.
If the request succeeds, the responding peer responds with a JoinAns
message; this is defined as in the case of RELOAD.
6.7. Leaving the Overlay
The process of leaving the overlay is as follows:
1. If no replication is being performed in the overlay, the leaving
peer SHOULD issue a series of Store requests to the successor of
each of its virtual servers, vs_i, to transfer the ownership of
the resource records it is storing. Note that if replication is
being used, the successor of the peer is already storing replicas
and the amount of data transferred can be minimized.
2. The leaving peer MUST send a Leave request to the predecessor and
successor of each virtual server, vs_i. The Leave request that
is sent to the vs_i's successor SHOULD contain the predecessor
list of the leaving peer. The Leave request that is sent to the
vs_i's predecessor SHOULD contain the successor list of the
leaving peer. The first successor SHOULD use the predecessor
list carried in the Leave request to update its own predecessor
list. The first predecessor SHOULD use the successor list
carried in the Leave request to update its own successor list.
6.7.1. Contents of the Leave Message
This topology plug-in extends the Leave request defined in RELOAD
[1]. In addition to the leaving_peer_id, a node MUST send its
virtual server locations as part of the leave message as defined in:
struct {
  NodeId leaving_peer_id;
  NodeId leaving_peer_virtual_server_ids <0..2^16-1>;
  opaque overlay_specific_data <0..2^16-1>;
} LeaveReq;
The overlay_specific_data field of the Leave request MUST contain a
ChordLeaveData structure:
enum { reserved (0),
       from_succ(1), from_pred(2), (255) }
     ChordLeaveType;

struct {
  ChordLeaveType type;
  select(type) {
    case from_succ:
      NodeId successors <0..2^16-1>;
    case from_pred:
      NodeId predecessors <0..2^16-1>;
  };
} ChordLeaveData;
The 'type' field indicates whether the Leave request was sent by a
predecessor or a successor of the recipient:
from_succ
The Leave request was sent by a successor.
from_pred
The Leave request was sent by a predecessor.
If the type of the request is 'from_succ', the contents will include
the sender's successor list.
If the type of the request is 'from_pred', the contents will include
the sender's predecessor list.
6.8. Self Tuning System Parameters
This section describes how the system parameters for load balancing
and stabilization adapt to changing network conditions.
6.8.1. Self Tuning Load Balancing Algorithm Parameters
All DHTs require constant updating as the size of the network grows.
For example, in the case of Chord, each node starts out with m
entries in the finger table (if node-IDs take values from 0 to
2^m-1). When the number of nodes in the network is small, many of
these m entries collapse to the same node. As the number of nodes
in the network increases, the finger table entries resolve to
distinct nodes, so the number of distinct fingers each node has to
maintain grows as O(log2(N)).
A similar update step is required by the solution in this document
as well, because the system parameters, alpha and delta, also
depend upon the value of N. This section describes how nodes MUST
update the values of alpha, delta, and the location of virtual
servers as the network changes in size. This document proposes to
update the system values each time the network size doubles or
halves.
In addition to finger, successor, and predecessor stabilization, each
peer also needs to perform DHT parameter stabilization. Unlike the
other stabilization routines that are done periodically when the
stabilization timer fires, the parameter stabilization routine is
done whenever the number of fingers is observed to change.
Performing stabilization in this way is sufficient for the load
balancing parameters (i.e., alpha and delta) because updating them
is not the highest-priority task; the overlay would function
effectively even without this update. The proposed topology plug-in
therefore employs lazy updates for stabilizing alpha and delta,
using a change in the number of fingers as the trigger for a DHT
parameter update.
The parameter stabilization routine is executed when the number of
fingers in a node changes. We consider two possible scenarios that
can arise:
1. Number of connections decreases: Let f be the lost connection.
Peer v MUST first send a request addressed to the ID of the lost
connection to see if there is any response. Let n be the ID of
the node responding to the request. If n is the same as f, the
peer MUST re-connect to f. If n is not the same as f and does
not correspond to any node that is already in the finger table
of v, then v MUST send an Attach request to n. In these two
cases, the parameter stabilization routine is not performed,
since the number of fingers does not change. On the other hand,
if n is not the same as f and corresponds to some other node in
the finger table of v, then v concludes that the network size
has decreased and invokes
the parameter stabilization routine.
2. Number of connections increases: In this case, v invokes the
parameter stabilization routine.
When the parameter stabilization routine is invoked, the following
functions are performed:
o Step 1: The (alpha-1) virtual servers corresponding to v leave the
overlay.
o Step 2: The values of alpha and delta are updated. The
corresponding virtual server locations vs_i are modified
accordingly.
o Step 3: The node v joins the overlay at the new virtual server
locations, vs_i.
o Step 4: The successors and predecessors of v are updated.
During the parameter stabilization routine, the portion of the data
space owned by the corresponding physical node changes and so data
needs to be moved to match this change. The locations of successors
and predecessors also change to adjust to the updates in virtual
server locations. However, the fingers do not change because these
are associated with the primary virtual ID and the location of the
primary node-ID is fixed.
In Step 1 of the stabilization routine (see above), the peer v MUST
send one Update message to the successor of the virtual server, vs_i.
The type of the Update message MUST be set to
'virtualserver_stab_leave'. The peer n_i receiving this message
makes a note of this change and locally updates its ownership
locations. Peer n_i then sends an UpdateAns message with type
'virtualserver_stab_leave' to peer v, acknowledging the change. On
receiving the UpdateAns message, peer v SHOULD issue a series of
Store requests to n_i to transfer ownership of the resources. This
step is repeated for all i = 2, ..., alpha. At the end of this step,
all the virtual servers corresponding to v have left the overlay and
peer v no longer stores the data corresponding to these virtual
locations.
In Step 2, the peer v obtains its new virtual server locations, vs_i.
In the presence of an enrollment server, the peer v MUST request a
new set of identities from the enrollment server and stop
participating in the overlay network with the old node identities.
The enrollment
server MAY choose to implement diagnostics using mechanisms in [3] to
ascertain that the node requesting identities is not requesting more
or less than what it should based on the finger table sizes of a
random sampling of other nodes in the overlay.
In the absence of the enrollment server, the following protocol is
implemented to obtain the new values of alpha, delta, and vs_i.
UpdateAlphaDelta()
{
If (number_of_fingers increases by 1) {
// Can be detected as in the case of Chord when finger tables
// are updated. This implies that the network size has about
// doubled.
delta = delta/2; // update delta
// Update existing virtual server locations
// vs_1 = vp as the location of the primary virtual server
// does not change.
for i = 2 to alpha
vs_i = vp - (vp - vs_i)/2;
// Choose new virtual server location between
// (vp - alpha*delta*2^m, vp - (alpha+1)*delta*2^m);
vs_(alpha+1) = random (vp - alpha*delta*2^m,
vp - (alpha+1)*delta*2^m);
vs_(alpha+2) = random (vp - (alpha+1)*delta*2^m,
vp - (alpha+2)*delta*2^m);
alpha = alpha + 2;
}
If (number_of_fingers decreases by 1) {
// Can be detected as in the case of Chord when finger tables
// are updated. This implies that the network size has about
// halved.
// Leave network with those virtual ids
Remove virt_server_(alpha-1);
Remove virt_server_(alpha);
delta = delta * 2; // update delta
alpha = alpha - 2; // update alpha
// Update existing virtual server locations
// vs_1 = vp as the location of the primary virtual server
// does not change.
for i = 2 to alpha
vs_i = vp - (vp - vs_i) * 2;
}
}
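As a non-normative companion to the pseudocode above, the update
rules can be sketched in Python. The function and variable names are
illustrative, identifiers are treated as plain numbers, and
wraparound in the modular identifier space is ignored:

```python
import random

M_BITS = 128  # assumed size of the identifier space (2^M_BITS IDs)

def update_alpha_delta(vp, vs, alpha, delta, finger_change):
    """Mirror of UpdateAlphaDelta(): adjust alpha, delta, and the
    virtual server locations vs = [vs_1, ..., vs_alpha] when the
    finger table grows (+1) or shrinks (-1) by one entry.
    vs[0] is vp, the primary virtual server, and never moves."""
    span = 2 ** M_BITS
    if finger_change == +1:
        # Network size has about doubled: halve the spacing ...
        delta = delta / 2
        for i in range(1, alpha):
            vs[i] = vp - (vp - vs[i]) / 2
        # ... and add two new virtual servers in the next two slots.
        vs.append(random.uniform(vp - (alpha + 1) * delta * span,
                                 vp - alpha * delta * span))
        vs.append(random.uniform(vp - (alpha + 2) * delta * span,
                                 vp - (alpha + 1) * delta * span))
        alpha = alpha + 2
    elif finger_change == -1:
        # Network size has about halved: leave with the two most
        # recently added virtual IDs and double the spacing.
        vs = vs[:alpha - 2]
        delta = delta * 2
        alpha = alpha - 2
        for i in range(1, alpha):
            vs[i] = vp - (vp - vs[i]) * 2
    return vs, alpha, delta
```

Note that a grow step followed by a shrink step restores the original
alpha and delta, while the non-primary locations are stretched and
compressed around vp.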
Once the locations of the virtual servers are obtained, the peer
re-joins the overlay at these positions. In Step 3, the peer v MUST
send one Update message to the successor of each vs_i. The
type of the Update message MUST be set to 'virtualserver_stab_join'.
The peer n_i receiving this message makes a note of this change and
locally updates its ownership locations. Peer n_i then sends an
UpdateAns message with type 'virtualserver_stab_join' to v
acknowledging the change. Peer n_i SHOULD then issue a series of
Store requests to v to transfer ownership of the resources. In this
process, the successors n_i of the virtual servers are added to the
successor list of
v. This step is repeated for all i = alpha, alpha-1, ..., 2. At the
end of this step all the new virtual servers corresponding to v have
joined the overlay, and peer v stores the data for these virtual
locations.
If replication is performed on the overlay, some of the data may
already be present at node v, which reduces the number of Store
requests needed.
Successor and predecessor stabilization routines are invoked as
described in Section 6.4 and Section 6.5, respectively, in Step 4 and
this completes the parameter stabilization routine.
6.8.2. Self Tuning the Stabilization Interval
To ensure that lookups produce correct results as the set of
participating peers changes and to ensure that all peers'
connections are up to date, each peer MUST run a stabilization
protocol
periodically in the background. The stabilization protocol uses
three operations: finger stabilization, successor stabilization, and
predecessor stabilization as defined earlier. Each peer MUST
maintain a stabilization timer. When the stabilization timer fires,
the peer MUST restart the timer and carry out the stabilization
operations. In this section, we present methods to compute the
stabilization timer.
When periodic stabilization is used, one faces the problem of
selecting an appropriate execution rate for the stabilization
procedure. If the execution rate of periodic stabilization is high,
changes in the system can be quickly detected, but at the cost of
increased communication overhead. On the other hand,
if the stabilization rate is low and the churn rate is high, routing
tables become inaccurate and DHT performance deteriorates. Thus, the
problem is to set the parameters so that the overlay achieves the
desired reliability and performance even in challenging conditions,
such as under heavy churn. A fixed setting chosen this way naturally
results in high cost during periods when the churn level is lower
than expected, or alternatively, in poor performance or even network
partitioning in worse than expected conditions.
The current approach is to configure overlays statically. This works
in situations where perfect information about the future is
available. In situations where the operating conditions of the
network are known in advance and remain static throughout the
lifetime of the system, it is possible to choose fixed optimal values
for parameters such as stabilization rate, neighborhood set size, and
routing table size. However, if the operating conditions (e.g., the
size of the overlay and its churn rate) do not remain static but
evolve with time, it is not possible to achieve both a low lookup
failure rate and a low communication overhead by using fixed
parameters [20].
As an example, to configure the Chord DHT algorithm, one needs to
select appropriate values for the size of the successor list and the
stabilization interval. To select an appropriate value for the
stabilization interval, one needs to know the expected churn rate and
overlay size. According to [21], a Chord network in a ring-like
state remains in a ring-like state as long as peers send
Omega(log2^2(N)) messages before N new peers join or N/2 peers fail.
Thus, in a 500-peer overlay churning at a rate such that one peer
joins and one peer leaves the network every 30 seconds, an
appropriate stabilization interval would be on the order of 93s.
According to [4], the size of the successor list should be on the
order of log2(N). Having a successor list of size O(log2(N)) makes
it unlikely that a peer will lose all of its successors, which would
cause the Chord ring to become disconnected. Thus, in a 500-peer
network each peer should maintain on the order of nine successors.
However, if the churn rate doubles and the network size remains
unchanged, the stabilization rate should double as well. That is,
the appropriate maintenance interval would now be on the order of
46s. On the other hand, if the churn rate becomes, e.g., six-fold and
the size of the network grows to 2000 peers, on the order of eleven
successors should be maintained and the stabilization interval should
be on the order of 42s. If one continued using the old values, this
could result in inaccurate routing tables, network partitioning, and
deteriorating performance.
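The example figures above follow directly from the formulas given in
Section 6.8.2.2.2 (Tstab = Tf/log2^2(N), with Tf the time in which
N/2 peers fail) and can be checked numerically. The helper names in
this non-normative Python sketch are illustrative:

```python
import math

def stab_interval(n, leave_interval_s):
    """Stabilization interval for an n-peer Chord overlay in which
    one peer fails every leave_interval_s seconds: the time for n/2
    peers to fail, divided by the log2^2(n) messages each peer must
    send in that time."""
    time_for_half_to_fail = (n / 2) * leave_interval_s
    return time_for_half_to_fail / math.log2(n) ** 2

def successor_list_size(n):
    """O(log2(n)) successors, rounded up."""
    return math.ceil(math.log2(n))

# 500 peers, one failure every 30 s      -> on the order of 93 s
# churn doubled (one failure every 15 s) -> on the order of 46 s
# six-fold churn (every 5 s), 2000 peers -> on the order of 42 s,
#                                           eleven successors
```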
The proposed topology plug-in takes into consideration the continuous
evolution of network conditions and adapts to them. Each peer
collects statistical data about the network and adaptively adjusts
its stabilization rate and successor list size based on the analysis
of the data [20]. Reference [22] shows that by using a self-tuning
mechanism, it is possible to achieve high reliability and performance
even in adverse conditions with low maintenance cost. Adaptive
stabilization has been shown to outperform periodic stabilization in
terms of both lookup failure and communication overhead [20].
The following sub-sections specify methods to determine the
appropriate stabilization rate in an adaptive fashion. The proposed
mechanism is based on [22][21][20]. To calculate an appropriate
stabilization rate, the values of three parameters MUST be estimated:
overlay size N, failure rate U, and join rate L. Peers in the overlay
MUST re-calculate the values of the parameters to self-tune the
algorithm at the end of each stabilization period before re-starting
the stabilization timer.
6.8.2.1. Estimating Overlay Size
Techniques for estimating the size of an overlay network have been
proposed for instance in [22] [23] [24] [25] and [20]. In Chord, the
density of peer identifiers in the successor set can be used to
produce an estimate of the size of the overlay, N [22]. Since peer
identifiers are picked randomly with uniform probability from the
m-bit identifier space, the average distance between peer identifiers
in the successor set is (2^m)/N.
To estimate the overlay network size, a peer MUST compute the average
inter-peer distance d between the successive peers starting from the
most distant predecessor and ending at the most distant successor in
the successor list. The estimated network size MUST be calculated
as:
2^m
N = ---------
d
This estimate has been found to be accurate within 15% of the real
network size [20]. Of course, the size of the neighborhood set
affects the accuracy of the estimate.
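The size estimate can be sketched as follows. This is a
non-normative Python illustration with hypothetical names; modular
wraparound of the identifier space is handled only when computing the
successive gaps:

```python
def estimate_overlay_size(neighborhood_ids, m):
    """Estimate N = 2^m / d, where d is the average distance
    between successive peer identifiers in the neighborhood set
    (most distant predecessor through most distant successor)."""
    ids = sorted(neighborhood_ids)
    # Gaps between successive identifiers, modulo the ID space.
    gaps = [(b - a) % 2**m for a, b in zip(ids, ids[1:])]
    d = sum(gaps) / len(gaps)
    return (2**m) / d
```

For example, ten identifiers spaced exactly 2^m/N apart yield the
true network size N.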
When a peer joins the network, the admitting peer sends the joining
peer a copy of its neighborhood set. Thus, a joining peer
immediately has enough information at its disposal to calculate an
estimate of the network size.
6.8.2.2. Estimating Failure Rate
A typical approach is to assume that peers join the overlay according
to a Poisson process with rate L and leave according to a Poisson
process with rate parameter U [22]. The value of U can be estimated
using peer failures in the finger table and neighborhood set [22].
If peers fail with rate U, a peer with M unique peer identifiers in
its routing table should observe K failures in time K/(M*U). Every
peer in the overlay MUST maintain a history of the last K failures.
The current time MUST be inserted into the history when the peer
joins the overlay. The estimate of U MUST be calculated as:
k
U = ----------,
M * Tk
where M is the number of unique peer identifiers in the routing
table, Tk is the time between the first and the last failure in the
history, and k is the number of failures in the history. If k is
smaller than K, the estimate is computed as if there was a failure at
the current time. It has been shown that an estimate calculated in a
similar manner is accurate within 17% of the real value of U [20].
The size of the failure history K affects the accuracy of the
estimate of U. One can increase the accuracy by increasing K.
However, this has the side effect of decreasing responsiveness to
changes in the failure rate. On the other hand, a small history size
may cause a peer to overreact each time a new failure occurs. In
[20], K is set to 25% of the routing table size.
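One possible realization of this estimator is sketched below in
non-normative Python. Counting the virtual failure inserted at the
current time in k is one reading of the text; the names are
illustrative:

```python
def estimate_failure_rate(failure_history, K, M, now):
    """U = k / (M * Tk): k failures observed over time Tk by a
    peer with M unique identifiers in its routing table.
    failure_history holds timestamps of observed failures, oldest
    first (the peer's join time seeds the history); only the last
    K entries are kept. If fewer than K failures have occurred,
    the estimate is computed as if a failure happened now."""
    history = list(failure_history[-K:])
    if len(history) < K:
        history.append(now)
    k = len(history)
    tk = history[-1] - history[0]
    return k / (M * tk)
```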
6.8.2.3. Estimating Join Rate
Reference [20] proposes that a peer can estimate the peer join rate
based on the uptime of the peers in its routing table. An increase
in peer join rate will be reflected by a decrease in the average age
of peers in the routing table. Thus, each peer MUST maintain an
array of the ages of the peers in its routing table sorted in
increasing order. Using this information, an estimate of the global
peer join rate L MUST be calculated as:
N 1
L = --- * ---------------,
4 Ages[rsize/4]
where Ages is an array containing the ages of the peers in the
routing table sorted in increasing order and rsize is the size of the
routing table. Only the ages of the 25% of the youngest peers in the
routing table SHOULD be used to reduce the bias that a small number
of peers with very old ages can cause [20]. It has been shown that
the estimate obtained by using this method is accurate within 22% of
the real join rate [20]. Of course, the size of the routing table
affects the accuracy.
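The join-rate estimator can be sketched as follows (non-normative
Python; names are illustrative):

```python
def estimate_join_rate(ages, n_est):
    """L = (N/4) / Ages[rsize/4]: only the youngest quarter of
    the routing table is considered, reducing the bias that a
    few very old peers would otherwise introduce."""
    ages_sorted = sorted(ages)  # increasing order
    rsize = len(ages_sorted)
    return (n_est / 4) / ages_sorted[rsize // 4]
```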
In order for this mechanism to work, peers need to exchange
information about the time they have been present in the overlay.
Peers learn the uptimes of their successors and predecessors when
adding the successors and predecessors to their routing tables since
Update requests and answers that are of type 'notify' carry uptime
values. Peers learn the uptimes of their fingers because the Probe
responses sent as part of the finger stabilization routine carry
uptime values. A joining peer learns the admitting peer's uptime
since an Update request of type 'full' contains uptime information.
6.8.2.4. Calculating the Stabilization Interval
According to [21], a Chord network in a ring-like state remains in a
ring-like state as long as peers send Omega(log2^2(N)) messages
before N new peers join or N/2 peers fail. We can use the estimate
of peer failure rate, U, to calculate the time Tf in which N/2 peers
fail:
1
Tf = --------
2*U
Based on this estimate, a stabilization interval Tstab-1 is
calculated as:
Tf
Tstab-1 = -----------
log2^2(N)
Further, the estimated join rate L can be used to calculate the time
in which N new peers join the overlay. Based on the estimate of L, a
stabilization interval Tstab-2 is calculated as:
N
Tstab-2 = ---------------
L * log2^2(N)
Finally, the actual stabilization interval Tstab that SHOULD be used
can be obtained by taking the minimum of Tstab-1 and Tstab-2.
The results obtained in [26] indicate that making the stabilization
interval too small has the effect of making the overlay less stable
(e.g., in terms of detected loops and path failures). Thus, a lower
limit should be used for the stabilization period. Based on the
results in [26], a lower limit of 15s is proposed, since using a
stabilization period smaller than this will with a high probability
cause too much traffic in the overlay.
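Putting the formulas of this section together, the interval
computation can be sketched in non-normative Python (illustrative
names; the 15 s floor is the lower limit proposed above):

```python
import math

def stabilization_interval(n_est, u_est, l_est, floor_s=15.0):
    """Tstab = min(Tstab-1, Tstab-2), where
    Tstab-1 = Tf / log2^2(N) with Tf = 1/(2*U), and
    Tstab-2 = N / (L * log2^2(N)),
    clamped to the proposed 15 s lower limit."""
    log2_sq = math.log2(n_est) ** 2
    tstab1 = (1.0 / (2.0 * u_est)) / log2_sq
    tstab2 = n_est / (l_est * log2_sq)
    return max(min(tstab1, tstab2), floor_s)
```

With N = 500, a per-peer failure rate of one failure per peer every
30*500 seconds, and a global join rate of one join per 30 seconds,
this reproduces the roughly 93 s interval of the earlier example.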
7. Security Considerations
One important concern for the virtual servers approach is that it is
not easily enforceable. Given a set of virtual serverIDs for a
physical node, it must be ensured that the physical node actually
takes ownership of all the virtual serverIDs assigned to it. In the
presence of a centralized enrollment server, this server can ensure
that each physical node gets its share of the number of virtual
servers. In the absence of the enrollment server, the verification
can be performed during the join process. For instance, in the case
of Chord, when a new node joins the overlay, it contacts its
neighbors to obtain neighbor relations and finger table entries. A
similar approach can be adopted for the virtual servers. In this
scenario, each physical node provides its neighbors with a list of
2 log2(N) virtual server IDs. The neighbors first verify that the
number of virtual server locations received is close to the number
of virtual server locations that they themselves own before sending
the finger table entries to the joining node. In this way,
it can be ensured that each node has O(log2(N)) virtual locations on
the overlay.
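The neighbor-side check can be sketched as follows (non-normative
Python; the tolerance is an assumption of this sketch, not a value
defined by this document):

```python
def plausible_virtual_server_count(claimed, own_count, tolerance=0.5):
    """A neighbor accepts a joining node's list of virtual server
    IDs only if its length is close to the neighbor's own count
    (both should be about 2*log2(N)). tolerance is the allowed
    relative deviation."""
    return abs(claimed - own_count) <= tolerance * own_count
```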
8. IANA Considerations
(a) A new overlay algorithm type should be defined for the proposed
new topology plug-in.
(b) This document defines one new Probe Information Type value:
+-----------------+------+---------------+
| Probe Option | Code | Specification |
+-----------------+------+---------------+
| uptime | 3 | RFC-AAAA |
+-----------------+------+---------------+
(c) Other IANA considerations are TBD.
9. Acknowledgments
This document benefited from design discussions with Vidya Narayanan
from Qualcomm Inc.
10. Appendix
This appendix lists a few performance results of the load balancing
solution proposed in this document.
10.1. Comparison of the Load Balancing Algorithm with Chord
Compared to the virtual markers approach in [7], the solution
proposed in this document does not require O(log2(N)) nodes to change
their location upon each node arrival. For instance, in the case of
the virtual markers approach, if O(log2(N)) nodes change location,
then O(log2(N)/N) data objects need to be reassigned and at least O(2
log2(N)) nodes are involved in this step. This is in addition to the
messages required for updating routing tables, which are
O(log2^3(N)).
In contrast to the virtual markers approach, the proposed solution
requires O(1/N) data objects to be reassigned and around O(log2(N))
nodes send data to the joining node. The number of routing messages
required is still O(log2^2(N)) similar to the case of Chord with no
virtual servers; this is because the number of connections is still
O(log2(N)) links per node.
We performed some simulation studies to compare the performance of
the solution in this document with Chord. In order to study the
percentage of nodes with significant load imbalance, we look at the
q-percentile load imbalance factor defined as the ratio of the
q-percentile load and the average load. For example, a 99-percentile
imbalance factor of 5 implies that less than 1 percent of the nodes
in the system have a load greater than 5 times the average load.
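The imbalance metric used in these simulations can be computed as
follows (non-normative Python; the nearest-rank percentile
definition is an assumption of this sketch):

```python
import math

def imbalance_factor(loads, q):
    """q-percentile load imbalance factor: the q-percentile load
    divided by the average load (nearest-rank percentile)."""
    s = sorted(loads)
    idx = max(0, math.ceil(q / 100 * len(s)) - 1)
    return s[idx] / (sum(s) / len(s))
```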
We tested the solution in this document under N = 2^(15) and compared
the results with Chord. We found that the 99-percentile imbalance
factor for the solution in this document is around 2. In comparison,
our results with Chord suggest that the 100-percentile imbalance
factor is around 9, the 99-percentile imbalance factor is around 4.5
and so on. This result suggests that in Chord around 1% of the
nodes have an imbalance between 4.5 and 9, 2% have an imbalance
greater than 4, and so on. In contrast, with the solution in this
document, less than 1% of the nodes have an imbalance greater than
2.
With regard to routing, our simulations compared the average route
length of Chord (with one virtual server) with that of this
document's solution (with 2 log2(N) virtual servers); the results
showed that both DHT implementations require around the same number
of hops.
10.2. Performance of the Load Balancing Algorithm as Network Grows
In this section, we take a closer look at the performance of the
solution in this document as network size grows in terms of load
imbalance factor and routing state maintenance.
(1) Load Imbalance: In our simulations, we fix the value of the
estimated network size to be Nest = 2^9, and choose alpha = 18
and delta = 2^(-9). The actual network size is then increased
from 2^2 to 2^(14). When N < Nest, the spacing between virtual
servers (chosen as 1/Nest) is very small and the solution in
this document becomes similar to Chord. In this case, the load
imbalance factor is of the order of O(log2(N)). However, since
the numerical value of N is small, the imbalance factor is still
low and around 3 for most nodes as demonstrated by our
simulations. As the network size, N, increases, the
spacing, delta, comes into effect and the virtual servers help
balance the load. Our simulations indicate that the imbalance
is around its lowest value of 2 when N = Nest, and increases
slowly as N becomes larger than Nest.
(2) Routing State: Here, we examine the performance of this solution
with regard to the amount of routing state that it needs to
maintain as the network grows. We analyze the performance of
the solution in this document analytically to determine the
number of hops required to reach a destination as a function of
N and Nest. We perform the analysis in two steps:
* [Step 1] Routing within a distance of 1/N from destination:
If nodes in this solution employ only their primary virtual
servers (and the corresponding O(log2(N)) connections) for
routing as in the case of Chord, it can be shown that any
discretization built upon the solution in this document would
be able to get within a distance of 1/N from the destination
node in O(log2(N)) hops with O(log2(N)) fingers per node.
This result is unaffected by the relative values of N and
Nest.
* [Step 2] Last-mile: The number of hops required to route
messages from within a distance of 1/N to the exact
destination (referred to as the last-mile problem) would
depend upon the exact realization of DHT and its construction
of neighbor tables. In this document, each node has alpha
virtual servers chosen uniformly at random separated by a
distance O(delta*2^m). Therefore, the number of nodes within
a 1/N distance from a chosen location would be of the order
of O(log2(N) + N*log2(1 + s*alpha*delta)), where 's' is a
constant independent of N. If alpha and delta are chosen
using Nest, then the number of nodes within a 1/N distance
can be approximated to be of the order of O(log2(N) + N*
log2(Nest)/Nest) for large Nest. Therefore, a random walk
would lead to the final destination within O(log2(N) + N*
log2(Nest)/Nest) hops.
This result indicates that when N < Nest, the solution in this
document performs similar to Chord and messages can be routed to
any node in O(log2(N)) hops by maintaining O(log2(N)) fingers
and O(log2(N)) successors. On the other hand, when N is much
larger compared to Nest, O(N) hops might be required to reach
the destination if the number of fingers per node is O(log2(N)).
However, this situation arises only when the network size
estimate is much lower than N and the values of alpha and
delta are not updated as the network grows.
11. References
11.1. Normative References
[1] Jennings, C., Lowekamp, B., Rescorla, E., Baset, S., and H.
Schulzrinne, "REsource LOcation And Discovery (RELOAD) Base
Protocol", draft-ietf-p2psip-base-02 (work in progress),
March 2009.
[2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[3] Yongchao, S., Jiang, X., Even, R., and D. Bryan, "P2PSIP
Overlay Diagnostics", draft-ietf-p2psip-diagnostics-01 (work in
progress), June 2009.
11.2. Informative References
[4] Stoica, I., Morris, R., Karger, D., Kaashoek, M., and H.
Balakrishnan, "Chord: A scalable peer-to-peer lookup service
for internet applications", In Proc. of the ACM SIGCOMM, 2001.
[5] Li, J., Stribling, J., Gil, T., and M. Kaashoek, "Comparing
the performance of distributed hash tables under churn", In
Proc. of the 3rd International Workshop on Peer-to-Peer
Systems, 2004.
[6] Bienkowski, M., Korzeniowski, M., and F. auf der Heide,
"Dynamic load balancing in distributed hash tables", In Proc.
of IPTPS, 2005.
[7] Karger, D. and M. Ruhl, "Simple efficient load balancing
algorithms for peer-to-peer systems", In 3rd International
Workshop on Peer-to-Peer Systems (IPTPS), 2004.
[8] Awerbuch, B. and C. Scheideler, "Group spreading: A protocol
for provably secure distributed name service", In Proc. of the
31st Int. Colloquium on Automata, Languages, and Programming
(ICALP), 2004.
[9] Fraigniaud, P. and C. Gavoille, "The content-addressable
network d2b", Technical Report 1349, LRI, Univ. Paris-Sud,
France, 2003.
[10] Kaashoek, F. and D. Karger, "Koorde: A simple degree-optimal
hash table", In Proc. 2nd International Workshop on Peer-to-
Peer Systems (IPTPS), 2003.
[11] Naor, M. and U. Wieder, "Novel architectures for peer to peer
applications: the continuous-discrete approach.", In Proc. of
the 15th ACM Symp. on Parallel Algorithms and Architectures
(SPAA), 2003.
[12] Byers, J., Considine, J., and M. Mitzenmacher, "Simple load
balancing for distributed hash tables", In 2nd International
Workshop on Peer-to-Peer Systems (IPTPS), 2003.
[13] Manku, G., "Routing Networks for Distributed Hash Tables", In
Proc. of the Principles of Distributed Computing (PODC), 2003.
[14] Godfrey, B. and I. Stoica, "Heterogenity and load balance in
Distributed Hash Tables", IEEE INFOCOM, 2005.
[15] Godfrey, B., Lakshminarayanan, K., Surana, S., Karp, R., and I.
Stoica, "Load balancing in dynamic structured peer to peer
systems", In 23rd Conference of the IEEE Communications
Society (INFOCOM), 2004.
[16] Rao, A., Lakshminarayanan, K., Surana, S., Karp, R., and I.
Stoica, "Load balancing in structured peer to peer systems",
In 2nd International Workshop on Peer-to-Peer Systems (IPTPS),
2004.
[17] Rhea, S., Geels, D., Roscoe, T., and J. Kubiatowicz, "Handling
churn in a DHT", In Proc. of the USENIX Annual Technical
Conference, 2004.
[18] Krishnamurthy, S., El-Ansary, S., Aurell, E., and S. Haridi,
"Comparing maintenance strategies for overlays", In Proc. of
16th Euromicro Conference on Parallel, Distributed and Network-
Based Processing, 2008.
[19] Stoica, I., Morris, R., Liben-Nowell, D., Karger, D., Kaashoek,
M., Dabek, F., and H. Balakrishnan, "Chord: A scalable peer-to-
peer lookup protocol for internet applications", IEEE/ACM
Transactions on Networking, 2003.
[20] Ghinita, G. and Y. Teo, "An adaptive stabilization framework
for distributed hash tables", 20th International Parallel and
Distributed Processing Symposium, 2006.
[21] Liben-Nowell, D., Balakrishnan, H., and D. Karger,
"Observations on the dynamic evolution of peer-to-peer
networks", In Proc. of the First International Workshop on
Peer-to-Peer Systems, 2002.
[22] Mahajan, R., Castro, M., and A. Rowstron, "Controlling the cost
of reliability in peer-to-peer overlays", In Proceedings of
the 2nd International Workshop on Peer-to-Peer Systems, 2003.
[23] Horowitz, K. and D. Malkhi, "Estimating Network Size from Local
Information", Information Processing Letters, Volume 88, Issue
5, pp. 237-243, 2003.
[24] Kostoulas, D., Psaltoulis, D., Gupta, I., Birman, K., and A.
Demers, "Decentralized schemes for size estimation in large and
dynamic groups", Fourth IEEE International Symposium on
Network Computing and Applications, pp. 41-48, 2005.
[25] Binzenhofer, A., Kunzmann, G., and R. Henjes, "A scalable
algorithm to monitor chord-based peer to peer systems at
runtime", 20th International Parallel and Distributed
Processing Symposium, 2006.
[26] Maenpaa, J. and G. Camarillo, "A study on maintenance
operations in a Chord-based Peer-to-Peer Session Initiation
Protocol overlay network", Accepted to Sixth International
Workshop on Hot Topics in P2P Systems (HotP2P 2009), 2009.
[27] Ktari, S., Zoubert, M., Hecker, A., and H. Labiod, "Performance
evaluation of replication strategies in DHTs under churn", In
Proc. of the 6th International Conference on Mobile and
Ubiquitous Multimedia, 2007.
Authors' Addresses
Jouni Maenpaa
Ericsson
Hirsalantie 11
Jorvas 02420
Finland
Email: Jouni.Maenpaa@ericsson.com
Ashwin Swaminathan
Qualcomm, Inc.
5775 Morehouse Dr
San Diego, CA
USA
Phone: +1 858-845-8775
Email: sashwin@qualcomm.com
Saumitra M. Das
Qualcomm, Inc.
3195 Kifer Road
Santa Clara, CA
USA
Phone: +1 408-533-9529
Email: saumitra@qualcomm.com
Gonzalo Camarillo
Ericsson
Hirsalantie 11
Jorvas 02420
Finland
Email: Gonzalo.Camarillo@ericsson.com
Jani Hautakorpi
Ericsson
Hirsalantie 11
Jorvas 02420
Finland
Email: Jani.Hautakorpi@ericsson.com