Internet Engineering Task Force                              Rohit Dube
Internet Draft                           Bell Labs, Lucent Technologies
Expiration Date: May 1999                               John G. Scudder
                                        Internet Engineering Group, LLC

                    Route Reflection Considered Harmful

                draft-dube-route-reflection-harmful-00.txt


1. Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To view the entire list of current Internet-Drafts, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
   Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
   Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).


2. Abstract

   Route reflection as defined by [2] is a popular way of reducing the
   full-mesh IBGP peering required by routers running the Border Gateway
   Protocol [1]. There are cases where a topology built using route
   reflectors produces persistent loops or does not produce the same
   results as what one would expect with a full IBGP mesh. This document
   describes these problems.



3. Introduction

   Route reflectors by design are selective as to which routes they
   forward to their peers (i.e. reflect). Specifically, if many routes
   to the same NLRI are available, a route reflector will reflect only
   the route it has selected for its own use. Typically this reduces the
   number of routes each peer in the AS must store in its RIB as well as
   the volume of BGP update traffic.  By the very nature of route
   reflection, not every peer in the network has a full view of all
   the routes to a prefix from which to choose. This, coupled with the
   specifics of BGP, causes the problems we now describe.




Dube, Scudder                                                 [Page 1]


Internet Draft                                             November 1998


4. Persistent Loops

   Consider the topology in Figure 1.

                      +----------------------+
                      | +------------+       |
                      | |            |       |
               E1=====RR1=====R3=====R4=====RR2=====E2
                 <--->         |             | <--->
                               +-------------+

                            Figure 1
                            --------

   RR1, RR2, R3 and R4 are BGP routers in the same AS. E1 and E2 are BGP
   routers in some other AS peering with RR1 and RR2 respectively via
   EBGP. RR1 is configured as a route reflector with R4 as a client and
   RR2 is configured as a reflector in a different cluster with R3 as a
   client. The IBGP sessions are denoted in the diagram above by +---+
   and the EBGP sessions by <--->.  For simplicity, assume that all the
   physical links (denoted by ===) have the same IGP cost.

   Now if both E1 and E2 advertise the same prefix to RR1 and RR2
   respectively, all other things being equal, RR1 picks the route
   through E1 for this prefix on account of lower IGP cost. RR1 then
   reflects this route to R4, which now routes to the prefix in question
   through R3 and RR1. Similarly, RR2 picks the route through E2 and
   reflects it to R3, which now routes to the prefix in question through
   R4 and RR2. Clearly a data packet for this prefix will loop between
   R3 and R4.

   Note that the problem would disappear if the topology were reverted
   to full-mesh IBGP: R3 would pick the route through RR1 and R4 would
   pick the route through RR2, both on account of lower IGP cost.
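
   The loop can be reproduced with a small simulation. The following is
   a minimal sketch, not part of any BGP implementation: it assumes the
   physical chain of Figure 1 with unit link costs, and the names CHAIN,
   EXITS, igp_cost, igp_next_hop and trace are hypothetical helpers
   introduced only for illustration. Each router forwards toward the
   exit it selected, one IGP hop at a time.

```python
# Physical chain from Figure 1; every link has the same IGP cost (1 here).
CHAIN = ["E1", "RR1", "R3", "R4", "RR2", "E2"]
EXITS = ("E1", "E2")  # external peers that advertise the prefix


def igp_cost(src, dst):
    """IGP cost between two routers on the chain (hop count)."""
    return abs(CHAIN.index(src) - CHAIN.index(dst))


def igp_next_hop(src, dst):
    """Neighbor of src on the shortest path toward dst."""
    i, j = CHAIN.index(src), CHAIN.index(dst)
    return CHAIN[i + 1] if j > i else CHAIN[i - 1]


def trace(start, exit_of):
    """Follow a packet hop by hop; return (path, looped)."""
    path, node = [start], start
    while node not in EXITS:
        node = igp_next_hop(node, exit_of[node])
        looped = node in path
        path.append(node)
        if looped:
            return path, True
    return path, False


# With reflection: each client only hears the exit chosen by its reflector.
reflected = {"RR1": "E1", "R4": "E1", "RR2": "E2", "R3": "E2"}

# With a full mesh: every router sees both routes and picks the closer exit.
full_mesh = {r: min(EXITS, key=lambda e: igp_cost(r, e))
             for r in ("RR1", "R3", "R4", "RR2")}

print(trace("R3", reflected))   # (['R3', 'R4', 'R3'], True)
print(trace("R3", full_mesh))   # (['R3', 'RR1', 'E1'], False)
```

   With the reflected choices the packet bounces between R3 and R4; with
   the full-mesh choices it leaves the AS through the nearer exit.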


5. Incorrect Routing Decision

   Consider the topology in Figure 2.

                    [RR1]------------------[RR2]
                     /\                      |
                    /  \                     |
                   /    \                    |
                 [R1]   [R2]                [R3]
                  |      |                   |
                  |      |                   |
                  |      |                   |
                 [E1]   [F1]                [E2]

                            Figure 2
                            --------


   RR1, RR2, R1, R2 and R3 are BGP routers in the same AS R. RR1 is a
   route reflector with clients R1 and R2, and RR2 is a route reflector
   in a different cluster with client R3. E1 and E2 are BGP routers in
   AS E and EBGP peer with R1 and R3 respectively. F1 is a BGP router in
   AS F which EBGP peers with R2. Assume that E1, F1 and E2 advertise
   the same prefix to R1, R2 and R3 respectively, in accordance with the
   following table:

   Router    AS    Router-id    MED
   --------------------------------
   E1        E     3.3.3.3      50
   F1        F     2.2.2.2      -
   E2        E     1.1.1.1      100

   All other attributes of the prefix in question are the same.

   Further assume that RR1's IGP cost to R1 (and E1) is the same as its
   IGP cost to R2 (and F1), and that RR2's IGP cost to R3 (and E2) is
   the same as its IGP cost to R1 (and E1) and to R2 (and F1). (The ---
   lines in Figure 2 denote both physical and BGP connectivity.)

   Now, RR1 chooses the route through F1 on account of lower router-id
   as compared to the route through E1 (which wins over the route from
   E2 on account of MEDs). RR2, on the other hand, chooses the route
   through E2 on account of lower router-id as compared to F1. Note that
   RR1 sends only the route through F1 to RR2 and not the route through
   E1.

   If instead we had a full mesh, RR2 would see all three routes and
   pick the one through F1: the route through E1 wins over the route
   through E2 on MEDs, and the route through F1 wins over the route
   through E1 on account of lower router-id.

   A network operator shifting from a topology without reflectors to
   the one above with reflectors would have a problem. Packets destined
   for the prefix in question would flow from RR2 through E2 instead of
   the original F1.
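
   The difference between the two outcomes can be checked with a short
   sketch of the relevant part of the decision process. This is an
   illustration under stated assumptions, not the full BGP-4 decision
   process: it models only the MED step (compared solely between routes
   from the same neighbor AS) and the lowest-router-id tie-break, and
   takes all other attributes and IGP costs as equal, as the text does.
   The names Route, rid and best are hypothetical.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    via: str              # external peer the route was learned from
    neighbor_as: str      # AS of that peer
    router_id: tuple      # router-id as a tuple of ints, for ordering
    med: Optional[int]    # None when no MED was sent


def rid(dotted):
    """Turn '1.1.1.1' into (1, 1, 1, 1) so router-ids compare numerically."""
    return tuple(int(part) for part in dotted.split("."))


def best(routes):
    """Pick a best route using only the MED step and the router-id step."""
    # MED step: drop a route if another route from the SAME neighbor AS
    # carries a strictly lower MED.
    survivors = [
        r for r in routes
        if not any(
            o.neighbor_as == r.neighbor_as
            and o.med is not None and r.med is not None
            and o.med < r.med
            for o in routes
        )
    ]
    # Tie-break: lowest router-id among the survivors.
    return min(survivors, key=lambda r: r.router_id)


E1 = Route("E1", "E", rid("3.3.3.3"), 50)
F1 = Route("F1", "F", rid("2.2.2.2"), None)
E2 = Route("E2", "E", rid("1.1.1.1"), 100)

rr1_choice = best([E1, F1, E2])       # RR1 hears E1, F1 and (via RR2) E2
rr2_reflected = best([rr1_choice, E2])  # RR2 hears only RR1's best plus E2
rr2_full_mesh = best([E1, F1, E2])    # RR2 hears everything

print(rr1_choice.via, rr2_reflected.via, rr2_full_mesh.via)  # F1 E2 F1
```

   Because RR1 withholds the route through E1, nothing at RR2 eliminates
   the route through E2 on MED, and E2 then wins on router-id.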


6. Characterization

   Problem 1 (Section 4) has two ingredients: a) the selective nature
   of route reflectors, which prevents some routes from getting to some
   clients, and b) the fact that some of the BGP decision process
   (specifically the "prefer lowest IGP cost" rule) depends on the
   router's location in the network.  Thus the route reflector's
   decision can never perfectly mirror the decision its client would
   have made.  Note that b) implies that reflector topologies can be
   out of sync with the physical topologies, but bad things happen only
   when they get far enough out of sync that clients would make
   decisions (in this case based on IGP cost) different from those of
   their reflectors if reflection were replaced by full mesh.




   Problem 2 (Section 5) has two components too: a) the selective
   nature of route reflectors as above and b) the partial order that
   MEDs impose upon competing routes (this is because MEDs can be
   compared only between routes from the same AS). If all decision
   criteria used by BGP imposed a total order on the routes (i.e., all
   BGP routes for a prefix could be arranged in strict order of
   precedence), then b) would not be an issue and, in spite of a), this
   problem would not happen.
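
   The lack of a total order can be seen by comparing the three routes
   of Section 5 two at a time. The sketch below is illustrative only;
   ROUTES, prefers and cycles are hypothetical names, and the comparison
   covers just the MED and router-id steps, with everything else assumed
   equal. It searches for triples in which each route beats the next and
   the last beats the first.

```python
from itertools import permutations

# (neighbor AS, router-id, MED) from the table in Section 5; None = no MED.
ROUTES = {
    "E1": ("E", (3, 3, 3, 3), 50),
    "F1": ("F", (2, 2, 2, 2), None),
    "E2": ("E", (1, 1, 1, 1), 100),
}


def prefers(a, b):
    """True if route a beats route b when only these two are compared."""
    as_a, rid_a, med_a = ROUTES[a]
    as_b, rid_b, med_b = ROUTES[b]
    # MEDs are comparable only between routes from the same neighbor AS.
    if (as_a == as_b and med_a is not None and med_b is not None
            and med_a != med_b):
        return med_a < med_b
    # Otherwise fall through to the lowest-router-id tie-break.
    return rid_a < rid_b


# Triples (a, b, c) where a beats b, b beats c, and c beats a.
cycles = [
    (a, b, c)
    for a, b, c in permutations(ROUTES, 3)
    if prefers(a, b) and prefers(b, c) and prefers(c, a)
]

print(cycles)
```

   E1 beats E2 on MED, E2 beats F1 on router-id, and F1 beats E1 on
   router-id, so no strict ranking of the three routes exists; which
   route wins depends on which subset a router happens to see.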

   For both examples discussed, it is possible to come up with several
   other topologies which suffer from the problems described above.


7. Avoidance Guidelines

   Since there are no protocol mechanisms currently available to detect
   the problems mentioned above, we provide guidelines to avoid
   situations where these problems could surface.

   As noted in section 6, problem 1 happens because the IBGP reflector
   topology doesn't follow the physical topology. A simple way of
   avoiding this problem would be to ensure that reflector clusters are
   constrained to follow the physical connectivity between the routers.
   It is always safe (at least with respect to this problem) to deploy
   route reflection such that no IBGP session between a pair of route
   reflectors will ever physically transit a reflector client. One
   common mode of deployment is to fully mesh all the routers in a
   "backbone" region, and to do route reflection to/from/between the
   routers in a POP, using one or more of the backbone routers as the
   reflector(s).
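
   The guideline that no IBGP session between a pair of route
   reflectors should physically transit a reflector client lends itself
   to a simple audit. The following is a rough sketch under several
   assumptions: all links have equal cost, every pair of reflectors has
   an IBGP session, and only one shortest path per pair is examined
   (breadth-first search returns the first one found). The function
   names shortest_path and sessions_transiting_clients are hypothetical.

```python
from collections import deque


def shortest_path(links, src, dst):
    """One hop-count shortest path found by BFS; None if unreachable."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def sessions_transiting_clients(links, reflectors, clients):
    """List (rr_a, rr_b, transit_clients) for each offending RR pair."""
    bad = []
    rrs = sorted(reflectors)
    for i, a in enumerate(rrs):
        for b in rrs[i + 1:]:
            path = shortest_path(links, a, b) or []
            transit = [n for n in path[1:-1] if n in clients]
            if transit:
                bad.append((a, b, transit))
    return bad


# Intra-AS part of Figure 1: RR1 === R3 === R4 === RR2.
figure1 = [("RR1", "R3"), ("R3", "R4"), ("R4", "RR2")]
print(sessions_transiting_clients(figure1, {"RR1", "RR2"}, {"R3", "R4"}))
# [('RR1', 'RR2', ['R3', 'R4'])]

# Hypothetical repair: add a direct physical link between the reflectors.
repaired = figure1 + [("RR1", "RR2")]
print(sessions_transiting_clients(repaired, {"RR1", "RR2"}, {"R3", "R4"}))
# []
```

   A real audit would use actual IGP metrics and consider all
   equal-cost paths; the sketch only flags the condition that makes
   Figure 1 unsafe.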

   Problem 2 can be avoided by always making sure that reflectors are
   never forced to decide on the best BGP route based on MEDs. This can
   be achieved either by setting the local preference of a route at the
   border router to reflect the MED values or by configuring
   community-based policies that let the reflector decide on the best
   route.


8. Acknowledgments

   The first author would like to thank Harry Mantakos, James Da
   Silva and Arvind Srivaths (all at Torrent Networking Technologies
   Corp.), Rob Coltun (Fore Systems) and Tony Przgyienda (Bell Labs,
   Lucent Technologies) for discussions on this topic. The second
   author would like to thank Ravi Chandra and Tony Bates (both at
   Cisco Systems) for similar discussions.







9. References

   [1] Rekhter, Y., and Li, T., "A Border Gateway Protocol 4 (BGP-4)",
       RFC 1771, March 1995.

   [2] Bates, T., and Chandra, R., "BGP Route Reflection An
       alternative to full mesh IBGP", RFC 1966, June 1996.


10. Author Information

   Rohit Dube
   Bell Labs, Lucent Technologies Inc.
   4C-508, 101 Crawfords Corner Road
   Holmdel, NJ 07724
   e-mail: rohitd@dnrc.bell-labs.com

   John G. Scudder
   Internet Engineering Group, LLC
   122 S. Main, Suite 280
   Ann Arbor, MI 48104
   e-mail: jgs@ieng.com































