Internet Engineering Task Force C-Y Lee
INTERNET DRAFT L. Andersson
Expires April 2000 Nortel Networks
Ken Carlberg
SAIC
Bora Akyol
Pluris
October 1999
Engineering Paths for Multicast Traffic
<draft-leecy-multicast-te-01.txt>
Status of this memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet- Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This document describes a solution to engineer paths for IP multicast
traffic in a network, by directing the control messages to setup
multicast trees on engineered paths. This enables the network
operator to have control over the topology of multicast trees.
This proposal partitions the multicast traffic engineering problem
such that multicast routing protocols do not have to be modified to
allocate resources for multicast traffic nor do resource allocation
protocols such as RSVP or CR-LDP have to be able to setup forwarding
states (in this case labels) like multicast routing protocols.
Resources are allocated on the same trip that paths are selected and
setup. This prevent the problem of data being forwarded on branches
of the tree where resources have not being allocated yet. An
important aspect of this proposal is that it enables multicast paths
Expires April 2000 [Page 1]
Internet Draft Engineering Paths for Multicast Traffic April 2000
to be engineered in an aggregatable manner, allowing this solution to
scale in the backbone.
1. Overview
In general, traffic is engineered to traverse certain paths so as to
utilize resources in a network in a more optimal manner, while at the
same time improving the level of service that can be offered.
In conventional IP routing, traffic may be engineered to use a path
by configuring preferred links towards a destination with a lower
metric. This method only allows traffic to be engineered based on the
destination address. Since the forwarding is based on the
destination address only, traffic cannot be engineered based on other
attributes (which maybe useful for traffic engineering purposes) of
the packet such as the source address of a packet or the requested
service level. In contrast, MPLS abstracts the forwarding paradigm
and allows traffic to be forwarded based on attributes (known as
forwarding equivalence class (FEC) in MPLS) in addition to the
destination address. This provides a versatile and convenient syntax
for traffic engineering purposes.
This document describes a way to provide a basic traffic engineering
mechanism for multicast. Traffic Engineering (TE) functionalities (in
the MPLS entity) are used to decide where to forward the join control
messages of multicast protocols, based on different traffic
engineering requirements and to allocate resources. (Note that
multicast data packets however are forwarded based on Layer 3 (L3)
address information and are not label switched. )
Using this basic multicast traffic engineering mechanism, ISPs can
define particular FECs for their network, resources required to
receive traffic from certain root prefix, decrease fanouts at a node
by limiting the number paths towards the node(prefix), allowing only
certain paths to carry multicast traffic, experiment with heuristics
to better engineer multicast trees, use a function to dynamically
compute suitable paths based on current or predicted network
resources. All these additional network or content provider specific
functions to engineer traffic can be developed independently of the
basic multicast traffic engineering scheme.
2.0 Motivation
The fundamental problem with doing multicast Traffic Engineering (TE)
is the difficulty in doing it in a scalable manner. Multicast routes
are very difficult (and some claim impossible) to aggregate. One can
associate a label with a unicast route(prefix) and packets sent to
that destination can be aggregated by associating them with the
Expires April 2000 [Page 2]
Internet Draft Engineering Paths for Multicast Traffic April 2000
label.
Since multicast routes are not aggregatable in general, associating a
label with a multicast route implies per flow/group resource
allocation. In essence, this kind of association will result in RSVP
(or ATM) style resource allocation and is more applicable to per flow
QOS than traffic engineering.
In contrast the approach taken in this proposal decouples traffic
engineering from multicast route setup, thereby allowing the
resources and paths for multicast data delivery to be independently
allocated. What this implies is, resources and paths can be
aggregated and engineered; and traffic can be statistically
multiplexed, enabling network operators to provide differentiated
services for multicast traffic in a scalable manner.
3.0 Scope
This draft described mechanisms which is applicable to multicast
routing protocols such as PIM-SM, CBT, BGMP, Express or Simple
Multicast, which will be called 'control driven' in this draft. The
mechanisms to handle 'Data driven' or flood and prune protocols (eg
DVMRP and PIM-DM) is FFS. This proposal assumes a multicast
group/tree has a common service level requirement. It is envisaged
that heterogeneous receivers requirement can be met by layer encoding
data in different multicast groups or other variation of layer
encoding.
It should be noted that the MPLS concepts of interest here are the
FEC, ERO and resource allocation and path selection. Although the
proposed scheme do not use label switching the solution is described
in MPLS terms since the concepts of interest here have already been
defined in MPLS.
4.0 Approach
A control driven multicast routing protocol sends a 'join' message to
graft a node to a multicast distribution tree, creating multicast
routes in the process. Since the join messages are forwarded based on
unicast routes, if the conventional routing table is used, the
multicast routes setup will be based on conventional routes. To
constrain multicast paths, the join message can be sent via paths,
computed or statically configured.
This draft describes a scheme where multicast routing control
messages (including join messages) are forwarded by the TE entity in
a router on the constraint path.
Expires April 2000 [Page 3]
Internet Draft Engineering Paths for Multicast Traffic April 2000
To allow a router to process control messages, the control messages
should contain the router alert option. The control message is
identified at the egress router by its FEC. Based on the FEC, the
MPLS entity can derive the path the control message should take and
allocate resources as specified. A multicast routing protocol would
setup the forwarding state on the ports/interface where the join is
received. To enable the establishment of multicast forwarding state
based on constraint (unicast) routes, multicast routing protocols
which verify the Reverse Path Forwarding (RPF) must turn off this
check or be able to obtain the 'constraint' RPF via a Constraint
Based Routing (CBR) API. To prevent redundant data and loops, a loop
avoidance scheme based on the concepts described in [MPLS-LOOP-AVOID]
or [SM] can be used in the routing protocol. If there is a loop, the
routing protocol should not create forwarding states for the group on
the port where the join is received.
Other alternatives to send the join on the engineered path such as -
extending CR-LDP/TE-RSVP to send and merge joins for the multicast
tree associated with a label - changing the multicast routing
protocol to send the join along the explicit route, either require
multicast routing protocol functionalities to be present in MPLS or
MPLS functionalities to be incorporated into multicast routing
protocols. This proposal uses MPLS (label and explicit route object)
to cause engineered paths to be selected but forward data using
multicast routing. It does not require MPLS or multicast routing
protocols to be merged, an exercise which tend to - result in
redundant or the reinventing, of functionalities at L2/L3; increase
the complexity of multicast traffic engineering while not providing
any means of aggregating multicast traffic engineering.
The alternative approaches listed above require traffic to be
engineered for each group/tree since multicast labels/routes are most
likely to be not aggregatable. Each group must be assigned a
different label as well. In contrast this proposal allows a network
provider to aggregate the engineered path towards a root or root
prefix (since resource allocation and path selection can be
independent of the setup of forwarding states/routes). The root
prefix could be a subnet or domain. Multicast traffic in the backbone
network can then be, provisioned in a more scalable manner and
statistically multiplexed on the (aggregated) engineered paths.
5.0 Procedure
5.1 At the Egress Router
At any egress router (a router where multicast data exits the
network) the IP fields of interest in the control message (referred
to as FEC here, for lack of a better term), the associated path
Expires April 2000 [Page 4]
Internet Draft Engineering Paths for Multicast Traffic April 2000
selection mechanisms are defined in a Traffic Configuration table.
These FECs correlate to the control messages of routing protocols.
(eg, destination = root prefix/target-node address, ToS=codepoint).
Note that the message carrying this information traverses the network
from egress to ingress. The path selection mechanisms can be based
on, a static table or a Constraint Based Routing (CBR) table, or a
path selection algorithm (dynamic). The resources required for the
FEC can be statically configured at the egress router or obtain via
other means as described in [MC_DS_PROV].
Figure 1 shows the passage of control messages in an egress router
(dotted lines) and the interface between the various entities in the
router (+++ lines)
When a join message arrives at the egress router the packet is
processed by the appropriate multicast routing protocol, to setup
multicast forwarding states. If there are already forwarding states,
a join message is discarded, otherwise, the multicast routing
protocol calls an API provided by the Multicast Traffic Engineering
(MCTE) entity to get the next hop to the root.
The form of the API is represented in terms of the following:
get_MCTE_next_hop(Target-Node, Group);
Target-Node is a mandatory value. The value of Target-Node is in the
form of an IP address. Group is not required for (a)-(c) and
optional for (d) below. The return value is the next hop to the
Target-Node.
The MCTE entity : a) obtains the route from conventional routing if
no path or path selection mechanism is specified in the Traffic
Configuration table, or b) obtains the manually configured explicit
route in the Traffic Configuration table or c) obtains the explicit
routes via a CBR process (Refer to [MPLS-TE] and [ISIS-TE]/[OSPF-TE]
for details) or d) invokes the path selection algorithm, specified in
the Traffic Configuration table. (Note: the routes in (a)-(c) are
based on the network topology, whereas (d) may take into account the
tree topology in the computation of routes)
The MCTE entity stores the route(s) obtained or computed for this
FEC, and used these routes when it prepends a MCTE header in the
control message later.
The form of the API provided by the path selection algorithm in (d)
above is represented in terms of the following:
get_MCTE_route(Target-Node, Group, Type-of-Metric)
Expires April 2000 [Page 5]
Internet Draft Engineering Paths for Multicast Traffic April 2000
Target-Node is a mandatory value, and the rest are optional in their
usage or applicability. The value of Target-Node is in the form of
an IP address. The return value is a list of explicit route(s).
(Note: currently, the above API assumes IPv4. A different API will
be used for IPv6)
The other parameters of the API are optional. The Group represent an
added level of granularity by which network administrators can base
their traffic engineering decisions (e.g this allows per group/flow
traffic engineering). (Note: currently, port values are not included
due to the common practice of correlating session to group address).
Finally, the Type-of-Metric value correlates to different types of
metrics used to distinguish one path from another. The default value
is (1), which correlates to hop count. Other defined values consist
of: (2) bandwidth, (4) delay, and (8) fan-out. In cases where the
underlying algorithm (of get_MCTE_route) does not support metrics
other than hop count, this field is ignored. The Type-of-Metric is
specified with the path selection algorithm in the Traffic
Configuration table.
----------
| MCTE API |
----------
+
+
-------------------------
| Multicast Routing |
-------------------------
^ |
| |
| v
____________ | ------------ ______________________
| IP|Ctl Msg| ---->| | MCTE | ----> | IP | MCTE | Ctl Msg
|
_____________ -----------
_______________________
+
+
+
---------------
| FEC,Path and |
| Resource |
| Specification |
----------------
Fig. 1 At the egress (wrt data flow) router
Expires April 2000 [Page 6]
Internet Draft Engineering Paths for Multicast Traffic April 2000
After the multicast forwarding states are setup, the control message
is forwarded towards the root. If the control message matches a
defined FEC, it is diverted to the MCTE entity. How the outgoing
control message is diverted to the MCTE entity is implementation
dependent. The MCTE entity calls an API provided by the MRP
(Multicast Routing protocol)to find out whether the control message
is a path setup (join), path teardown (leave) message or other
maintenance message. If it is a path setup, resources specified in
the Traffic Configuration table is allocated, if it is a path
teardown message the resources are deallocated. If it is a
maintenance control message, the control message is forwarded as is
without any MCTE header and will be forwarded by the multicast
routing protocol in intermediate routers as per normal.
If it is either a path setup or path teardown message, the MCTE
entity prepends a MCTE header - containing the FEC, explicit routes
(provided by the path selection mechanism) resources required (e.g
Traffic Parameter, service level) and the protocol id of the control
message. The IP protocol id is set to IPPROTO_MCTE.
The MCTE header is placed between the IP header and the control
message. Resources as specified in the Traffic Configuration table
are allocated/deallocated before the MCTE message is forwarded to the
next hop returned by the path selection mechanism specified. To
allow other routers to process this MCTE message (which includes the
control message), the packet will be labeled as Router Alert.
5.2 At the Intermediate Routers
Figure 2 shows the passage of control messages in an intermediate
router (dotted lines) and the interface between the various entities
in the router (+++ lines)
----------
| MCTE API|
----------
+
+
-------------------------
| Multicast Routing |
-------------------------
^ |
__________ | | __________
|IP|Ctl Msg| | | |IP|Ctl Msg|
____________ | | ____________
| v
_______________ ---------- ------------ ________________
Expires April 2000 [Page 7]
Internet Draft Engineering Paths for Multicast Traffic April 2000
|IP|MCTE|CtlMsg|---> | MCTE | | MCTE | ----> |IP|MCTE|Ctl Msg|
| | | Entity | | Entity | | |
________________ ---------- ------------ _________________
+ +
+ +
+ +
----------------
| MCTE |
| State |
----------------
Fig. 2 At an intermediate router
When the next hop (or other intermediate nodes) receives the packet
with Router Alert, it will be taken out of the forwarding path and
directed to the MCTE entity since the IP protocol id is IPPROTO_MCTE.
The MCTE entity allocates/deallocates the resources requested by the
MCTE message, creates a transient state for the MCTE message, called
the MCTE state, for short. The appropriate mutlicast routing protocol
(MRP), depending on the value of protocol id in the MCTE message, is
then invoked. The exact mechanisms used in the router to accomplish
this is implementation dependent.
The MRP creates the forwarding state for the group and forwards the
join message towards the root. As in the egress router, the next hop
towards the root is obtained from an MCTE API. Since the FEC for this
control message matches the MCTE state created earlier, the join
message is diverted to the MCTE entity. The MCTE entity placed the
corresponding MCTE header on the control message and forwards the
message to the next hop. The transient MCTE state is removed at this
point.
Note that the FEC is only configured at the egress router (wrt to
multicast data), intermediate routers are informed of the FEC
information by previous hops. Similarly, the explicit or constraint
route is only configured or computed at the egress router; the next
hop and other intermediate nodes learn of the explicit routes via the
explcit route list propagated from the egress router.
5.3 Loops
If the MPLS control message specifies looping explicit routes :
* then if the tree is uni-directional, only the join message will
loop. Data will not loop since data flow is only in one direction
Expires April 2000 [Page 8]
Internet Draft Engineering Paths for Multicast Traffic April 2000
from root to members.
* then if the tree is bi-directional, the join message will loop, but
because permanent states would not be established in this case, data
will not be forwarded on the looping path.
However if there is a change in next hop towards the root at a node
where there is already an existing forwarding state, then multicast
routing protocols which uses bi-directional trees or a hybrid of
uni-directional and bi-directional branches could invoke a loop
avoidance procedure. One way to avoid loops in this case is (using
splice message) described in [SM] and [MPLS-LOOP-AVOID].
6.0 Path Selection
This proposal allows different path selection algorithms to be used,
depending on the FEC and path selection mechanism association. Paths
can be configured, computed, discovered or obtain through other
means.
A path selection mechanism will return the constraint routes given
for e.g the group address, root of multicast tree and possibly other
criteria. How the paths are selected are independent of this
proposal, but a generic interface (API) between path selection
algorithms and this multicast traffic engineering scheme is required
and is specified in Section 5.1.
7.0 Applications
This section list some possible applications of this proposal.
a) A network operator may define an explicit route [Rx, Ry, Rz]
towards a domain with prefix 10.0.0.0 for multicast traffic. Any
member joining a group where the root address has the prefix 10.0.0.0
will have data delivered to it via the explicit route [Rz, Ry, Rx]
(data is in the reverse direction of the join control message).
This explicit route may be a Loose Source Route, or a route
calculated by an algorithm eg an Internal Gateway Protocol (IGP)
which can provide constraint based routes.
It is worth noting that the explicit route can be the desired path
from a root towards a member instead of the reverse path (from member
towards the root).
b) Another variation of the above may define an additional field of
interest in the FEC, the TOS. This will allow a network operator, to
engineer paths or/and provision resources for traffic requiring
Expires April 2000 [Page 9]
Internet Draft Engineering Paths for Multicast Traffic April 2000
Expedited Forwarding [EF] or Assured Forwarding [AF]. (Refer to
[MCPROV]).
c) To decrease fanout, egress routers (where multicast data traffic
exits) can obtain the contraint routes towards the root of the tree
and construct the tree along these paths instead. These routes can
be statically configured or provided by an algorithm which takes into
account fanout in route computation and this can be developed
independently of the basic TE scheme described in this proposal.
d) Load Balancing - a load balancing algorithm can provide an
alternative path that a control message can take depending on the
service level requirement of the group and the current utilization of
the equal cost paths.
e) Policy routing - Different paths may be defined for different
groups.
8.0 Acknowledgments
The authors are grateful to Dirk Ooms and Yunzhou Li for reviewing
this draft and their helpful suggestions to improve this proposal,
Jamal Hadi-Salim for his technical advice and Jon Crowcroft for
providing insightful comments.
References
[ARCH] E. Rosen, A. Viswanathan, R. Callon, "Multiprotocol Label
Switching Architecture", Work in Progress, July 1998.
[MPLS-TE] Awduche, D. et al., "Requirements for Traffic Engineering over
MPLS", Internet Draft, draft-ietf-mpls-traffic-eng-00.txt, October 1998.
[CRLDP] L. Andersson, A. Fredette, B. Jamoussi, R. Callon, P. Doolan,
N. Feldman, E. Gray, J. Halpern, J. Heinanen T. E. Kilty, A. G.
Malis, M. Girish, K. Sundell, P. Vaananen, T. Worster, L. Wu, R.
Dantu, "Constraint-Based LSP Setup using LDP", Work in Progress,
January, 1999.
[ISIS_TE] Smit, H. and T. Li, "ISIS Extensions for Traffic Engineering,"
draft-ietf-isis-traffic-00.txt, work in progress.
[OSPF-TE], D Katz, D Yeung, "Traffic Engineering Extensions to OSPF",
draft-katz-yeung-ospf-traffic-00.txt
[TE-RSVP] D. Awduche, L. Berger, D-H. Gan, T. Li, G. Swallow,
Vijay Srinivasan,
Expires April 2000 [Page 10]
Internet Draft Engineering Paths for Multicast Traffic April 2000
Internet Draft, draft-ietf-mpls-rsvp-lsp-tunnel-02.txt, September 1999
Multicast Routing with resource reservation,
Journal of High Speed Networks 7 (1998) 113-139,
B. Rajagopalan, R. Nair
CBT, Core Based Tree Multicast Routing,
Internet-Draft, March 1998, Ballardie, Cain, Zhang
PIM-SM, Protocol independent multicast-sparse mode Specification,
RFC-2117, June 1997
Estrin, Farinacci, Helmy, Thaler, Deering, Handley,
Jacobson, Liu, Sharma, and Wei.
BGMP, Border Gateway Multicast Protocol Specification,
Internet-Draft, March 1998, Thaler, Estrin, Meyers
Express, H. Holbrook, D. Cheriton
Sigcomm Paper
SM, Simple Multicast, Internet-Draft, March 1999,
draft-perlman-simple-multicast-02.txt, Perlman et al
YAM, K. Carlberg, J. Crowcroft
Hipparch 1998
[MPLS-LOOP-AVOID] "Avoiding Loops in MPLS", Internet Draft,
draft-leecy-mpls-loop-avoid-00.txt, June 1999
C-Y Lee, L. Andersson, Y. Ohba,
[CLARK] D. Clark and J. Wroclawski, "An Approach to Service
Allocation in the Internet", Internet Draft
[DSHEAD] K. Nichols and S. Blake, "Definition of the
Differentiated Services Field (DS Byte) in the IPv4 and IPv6
Headers", Internet Draft, May 1998.
[AF] J.Heinanen, F.Baker, W. Weiss, J. Wroclawski
Assured Forwarding PHB Group RFC2597, June 1999
[EF] V.Jacobson, K. Nichos, K. Poduri,
Expedited Forwarding Per Hop Behavior, RFC2598, June 1999
[MCPROV] C-Y Lee,
Provisioning Resources for Multicast Traffic in a
Differentiated Services Network, Internet Draft October 1999
Expires April 2000 [Page 11]
Internet Draft Engineering Paths for Multicast Traffic April 2000
Authors' Information
Cheng-Yin Lee
Nortel Networks
PO Box 3511, Station C
Ottawa, ON K1Y 4H7, Canada
leecy@nortelnetworks.com
Loa Andersson
Nortel Networks Inc
Kungsgatan 34, PO Box 1788
111 97 Stockholm
Sweden
Phone: +46 8 441 78 34
obile: +46 70 522 78 34
email: loa_andersson@nortelnetworks.com
Ken Carlberg
SAIC
S 1-2-8
1710 Goodridge Drive
McLean, VA. 22102
carlberg@time.saic.com
Bora Akyol
Pluris Terabit Network Systems
10445 Bandley Drive
Cupertino, CA 95014
USA
akyol@pluris.com
Phone: (408) 861-3302
Fax: (408) 863-0271
email: akyol@pluris.com
Expires April 2000 [Page 12]