PIM Working Group                                         Deborah Estrin
Internet Draft                                                   ISI/USC
Expiration Date: May, 2000                                  Mark Handley
                                                                   ACIRI
                                                         Isidor Kouvelas
                                                           cisco Systems
                                                        Lorenzo Vicisano
                                                           cisco Systems
                                                        October 19, 1999

                 A New Proposal for Bi-directional PIM
                 <draft-kouvelas-pim-bidir-new-00.txt>

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document discusses a new form of Bi-directional PIM, a variant
   of PIM Sparse Mode [1] that builds bi-directional shared trees con-
   necting multicast sources and receivers.

   The ideas presented in this document are similar to those described
   in [2]. The main difference between the two proposals is in the
   method used to forward packets traveling upstream from a source to
   the RP. In particular [2] uses an IP option (UMP option) on data
   packets to assist with upstream forwarding. The UMP option identifies
   the next hop router responsible for forwarding the packet upstream.

Estrin, Handley, Kouvelas, Vicisano                             [Page 1]


Internet Draft               New bidir PIM                  October 1999

   In contrast, this proposal does not alter data packets to embed con-
   trol information. Instead the identification of the next hop upstream
   forwarder is performed at RP discovery time using a fail-safe elec-
   tion mechanism. This significantly simplifies forwarding procedures
   and eliminates forwarding loops and packet duplication problems that
   exist in [2].

1 Introduction

   This document discusses Bi-directional PIM, a variant of PIM Sparse
   Mode [1] that builds bi-directional shared trees connecting multicast
   sources and receivers.

   PIM Sparse-Mode (PIM-SM) version 1 and version 2 construct uni-
   directional shared trees that are used to forward data from senders
   to receivers of a multicast group.  PIM-SM also allows the construc-
   tion of source specific trees, but this capability is not related to
   the proposal described in this document.

   The shared tree for each multicast group is rooted at a multicast
   router called the Rendezvous Point (RP). Different multicast group
   ranges can use separate RPs within a PIM domain.

   In unidirectional PIM-SM, there are two possible methods for distri-
   buting data packets on the shared tree, which differ in the way pack-
   ets are forwarded from a source to the RP:

   o Initially when a source starts transmitting, it's first hop router
     encapsulates data packets in special control messages (Registers)
     which are unicast to the RP. After reaching the RP the packets are
     decapsulated and distributed on the shared tree.

   o A transition from the above distribution mode can be made at a
     later stage.  This is achieved by building source specific state on
     all routers along the path between the source and the RP.  This
     state is then used to natively forward packets from that source.

   Both these mechanisms suffer from problems. Encapsulation results in
   significant processing, bandwidth and delay overheads. Source state
   has additional protocol and memory requirements.

   Bi-directional PIM dispenses with both encapsulation and source state
   by allowing packets to be natively forwarded from a source to the RP
   using shared tree state. For a complete discussion of the pros and
   cons of Bi-directional PIM consult [2].

   The ideas presented in this document are similar to those described

Estrin, Handley, Kouvelas, Vicisano                             [Page 2]


Internet Draft               New bidir PIM                  October 1999

   in [2]. The main difference between the two proposals is in the
   method used to forward packets traveling upstream from a source to
   the RP. In particular [2] uses an IP option (UMP option) on data
   packets to assist with upstream forwarding. The UMP option identifies
   the next hop router responsible for forwarding the packet upstream.

   In contrast, this proposal does not alter data packets to embed con-
   trol information. Instead the identification of the next hop upstream
   forwarder is performed at RP discovery time using a fail-safe elec-
   tion mechanism. This significantly simplifies forwarding procedures
   and eliminates forwarding loops and packet duplication problems that
   exist in [2].  Section 8 presents a comparison between the proposal
   in this document and [2].

   The rest of this document is structured as follows. Section 2 defines
   basic terms. Section 3 describes bidirectional tree formation and
   forwarding.  The new forwarding rules rely heavily on an election
   mechanism described in section 4.

2 Definitions

   In the discussion below, the terms upstream, downstream and RPF
   interface are always referring to the shared tree rooted at the Ren-
   dezvous Point. Downstream indicates the direction on which packets
   travel from the RP to receivers along the shared tree. Upstream indi-
   cates the opposite direction used by packets traveling from sources
   to the RP. The RPF interface for a group is the interface unicast
   routing uses to reach the RP.

   We assume that the reader is familiar with the unidirectional PIM-SM
   protocol [1] as much of the functionality is common to the version of
   bidir PIM described below. In particular in the rest of this document
   we will use the concepts of (*,G), (S,G) and (*,*,RP) state and their
   component fields (olist, iif, ...). We will also reference Join and
   Prune messages whose semantics and packet formats are defined in [1].
   In the context of this document Join and Prune messages always have
   the RP and WC bits set.  Also, for consistency with [1], (*,G)
   entries always have the RP and WC bits set. Finally, default timer
   values are the ones given in [1].

   The protocol presented in this document is largely based on the con-
   cept of a Designated Forwarder (DF). A single DF exists for each RP
   on every link within a PIM domain (this includes both multi-access
   and point-to-point links). The DF is the router on the link with the
   best unicast route to the RP.  A DF for a given RP is in charge of
   forwarding downstream traffic onto the link, and forwarding upstream
   traffic from the link towards the RP.  It does this for all the bi-
   directional groups served by the RP. The DF election procedures are

Estrin, Handley, Kouvelas, Vicisano                             [Page 3]


Internet Draft               New bidir PIM                  October 1999

   described in section 4.

3 Tree Building and Forwarding

   This section describes how bi-directional tree building procedures
   and forwarding rules vary from normal PIM-SM operation.

   A router learns which multicast addresses will be used for regular
   PIM and which will be for bidirectional groups along with the candi-
   date RP information through PIM-SM bootstrap messages.  Thus uni-
   directional and bidirectional groups can coexist in the same domain.

   Throughout the section it is assumed that on each link, all the
   routers have a consistent view on which router has the best path to
   the RP. This router is called the DF for that RP on the link. This
   assumption rests on the DF election procedures described in section
   4.

   In the procedures described in the rest of this section, if DF infor-
   mation is required but not available (election is incomplete), then
   no tree building or forwarding action is taken.

3.1 Tree Building

3.1.1 Joining the (Shared) Tree

   The procedures for joining the (*,G) shared tree, are almost identi-
   cal to those used in PIM-SM with the difference that the tasks of the
   DR are handled by the DF.

   When a router receives a membership indication from IGMP for a
   bidirectional group G with rendezvous point RP, and it is the DF for
   the group on the link on which the report was received, the following
   steps are taken:

   o If (*,G) state exists but the interface on which the report was
     received is not in the olist of the entry, then the interface is
     added to the olist.

   o If no (*,G) state exists for the group, then a (*,G) entry is
     created and populated with the RP DF information with the interface
     in the olist.

   o If (*,G) state existed or was created, we also follow standard
     PIM-SM procedures [1] for updating timers and originating a Join
     message for the group directed upstream. The Join is directed to
     the DF for the (*,G) incoming interface.

Estrin, Handley, Kouvelas, Vicisano                             [Page 4]


Internet Draft               New bidir PIM                  October 1999

   When a router receives a Join message addressed to it for a bidir
   group G with rendezvous point RP, it must determine if it is the DF
   on the link for this group.  To do this it consults (*,G) state or
   the RP election information if no (*,G) state exists.  If the router
   is not the DF, it must ignore the Join message.  If it is the DF,
   then the following steps are taken:

   o If no (*,G) state exists, then it is created and populated with the
     RP DF information, and the olist contains only the interface on
     which the join was received.

   o If (*,G) state exists but the interface on which the join was
     received is not included in the (*,G) olist, then it is added.

   o Standard PIM-SM procedures for updating timers and originating a
     Join message for the group directed upstream are followed. The Join
     is directed to the DF for the (*,G) incoming interface.

3.1.2 Leaving the (shared) tree

   When the DF for a link receives notification that an interface is no
   longer required in the olist of a group (either through IGMP or by
   receiving a Prune), it follows standard PIM-SM procedures except any
   originated prunes are addressed to the DF on the (*,G) iif.

3.1.3 Designated Forwarder Change

   When the DF for a RP on a link changes to a different router, tree
   maintenance has to take place to ensure that traffic is still
   delivered for all affected groups.

3.1.3.1 Old DF Actions

   On losing its status, the old DF has to take the following actions
   for existing groups that are affected.

   o If there were downstream receivers (discovered through IGMP or
     downstream Joins), the router has to delete the interface to the
     link from its olist.

   o If the interface deletion results in a null olist for the (*,G)
     then the usual actions are taken to propagate a Prune upstream.

3.1.3.2 New DF Actions

   On assuming the role of the DF a router has to take the following
   actions for each existing group that is affected. If the router has
   IGMP information from local receivers for a group, the interface to

Estrin, Handley, Kouvelas, Vicisano                             [Page 5]


Internet Draft               New bidir PIM                  October 1999

   the link must be added to the olist for the (*,G). If the (*,G) entry
   did not exist then it must be created and populated with the RP DF
   information.  If the (*,G) olist was previously null then the usual
   actions are taken to propagate a Join upstream.

3.1.3.3 Downstream Router Actions

   When learning about a switch to a new DF on the RPF interface, down-
   stream routers must take the following actions for all affected
   groups.

   o If the router has a (*,G) entry with a non-null olist, it must send
     a Join for the group towards the new DF.

   o The router may also send a Prune for the group towards the old DF.

3.2 Forwarding Data

   The following responsibilities are uniquely assigned to the DF of a
   link:

   o The DF is the only router that forwards packets traveling down-
     stream onto the link.

   o The DF is the only router that picks-up upstream traveling packets
     off the link to forward towards the RP.

   Non-DF routers on a link that use that link to reach the RP, may per-
   form the following forwarding actions for bidirectional groups:

   o Forward packets from the link towards downstream receivers.

   o Forward packets from downstream sources onto the link (provided
     they are the DF for the downstream link from which the packet was
     picked-up).

   When a router receives a multicast packet sent to a bidir group G, it
   first looks for a (*,G) matching entry. If this is not found, then
   the matching (*,*,RP) state may be used. Alternatively (*,G) state
   may be created with a null olist and populated with the RP DF infor-
   mation.

   The router must forward the packet if either:

   o it was received on the incoming interface (iif) of the entry
     (always forward downstream traveling packets)

Estrin, Handley, Kouvelas, Vicisano                             [Page 6]


Internet Draft               New bidir PIM                  October 1999

   o the router is the Designated Forwarder (DF) for G for the interface
     on which the packet was received (only the DF forwards upstream).

   If a decision to forward the packet is made, then it is forwarded on
   all the interfaces in the olist as well as the entry's incoming
   interface with the exception of the interface the packet was received
   on.  Otherwise the packet is discarded.

   Note: A major advantage of using a Designated Forwarder in bi-
   directional PIM is that special treatment is no longer required for
   sources that are directly connected to a router. Data from such
   sources does not need to be differentiated from other multicast
   traffic and will automatically be picked up by the DF. This removes
   the need for performing a directly connected check for data to groups
   that do not have existing (*,G) state.

3.2.1 Source-only Branches

   Source-only branches of the distribution tree for a group are
   branches which do not lead to any receivers, but which are used to
   forward packets traveling upstream from sources towards the RP.
   Routers along source-only branches do not have an olist for the group
   and hence do not need to maintain (*,G) state. Upstream forwarding
   can be performed using (*,*,RP) state.  An implementation may decide
   to maintain (*,G) state for accounting or performance reasons.

4 Designated Forwarder

   Many of the most complicated aspects in the operation of the PIM pro-
   tocol suite are in place to enable operation on multi-access links.
   The most notable example is the bi-directional PIM proposal [2] where
   an UMP option is required to nominate the upstream router responsible
   for forwarding packets towards the RP. A similar problem exists both
   in bidir [2] and Sparse-Mode PIM [1] where a Designated Router (DR)
   has to be elected to handle directly connected sources.

   In both of the above cases, the choice of the router on the link to
   perform the desired operation is critical. In bidir PIM [2], if the
   router elected as the DR is different from that chosen by downstream
   neighbors for joining the tree, loops in the topology can occur.  The
   main shortcoming of the DR is that its election does not take into
   consideration the location of the RP. Similarly loops can occur if
   two different downstream routers on a multi-access link direct joins
   and UMP data packets to separate upstream neighbors (see section 8
   for a detailed explanation of these problems).

   This section presents a fail-safe mechanism for electing a per-RP

Estrin, Handley, Kouvelas, Vicisano                             [Page 7]


Internet Draft               New bidir PIM                  October 1999

   designated router on each link in a PIM domain. We call this router
   the Designated Forwarder (DF).

4.1 DF Requirements

   The DF election chooses the best router on a link to assume the
   responsibility of forwarding traffic between the RP and the link for
   a given bi-directional group. Different multicast groups that share a
   common RP must use the same bi-directional tree for data forwarding.
   Hence, the election of an upstream forwarder on each link does not
   have to be a group specific decision but instead can be RP-specific.
   As the number of RPs is typically small, the number of elections that
   have to be performed is significantly reduced by this observation.

   To optimise tree creation, it is desirable that the winner of the
   election process should be the router on the link with the "best"
   unicast routing metric to the RP. When comparing metrics from dif-
   ferent unicast routing protocols, we use the same comparison rules
   used in the PIM assert process [1].

   The election process needs to take place when information on a new RP
   initially becomes available, and can be re-used as new bidir groups
   for the same RP are encountered. There are however some conditions
   where an update to the election is required:

   o There is a change in unicast metric to reach the RP for any of the
     routers on the link.

   o The interface on which the RP is reachable changes to an interface
     for which the router was previously the DF.

   o A new PIM neighbor starts up on a link.

   o The elected DF dies.

   The election process has to be robust enough to ensure with very high
   probability that all routers on the link have a consistent view of
   the DF. This is because with the forwarding rules described in sec-
   tion 3.2, if multiple routers end-up thinking that they should be
   responsible for forwarding, loops may result.

4.2 DF Election Description

   To perform the election of the DF for a particular RP, routers on a
   link need to exchange their unicast routing metric information for
   reaching the RP.

Estrin, Handley, Kouvelas, Vicisano                             [Page 8]


Internet Draft               New bidir PIM                  October 1999

   In the election protocol described below, many message exchanges are
   repeated 3 times for reliability. In all those cases the message
   retransmissions are spaced in time by a small random interval.

   For the purposes of the election, interface specific counters and
   timers need to be maintained for each RP. When (*,G) entries are
   created, they inherit information on the elected DF from the
   corresponding RP database entry. Subsequent changes in the winner of
   the DF election for a RP are propagated to all dependent (*,G)
   entries.

4.2.1 Bootstrap Election

   Initially when no DF has been elected, routers finding out about a
   new RP start participating in the election by sending Offer messages.
   Offer messages include the router's metric to reach the RP. Offers
   are periodically retransmitted with a period randomly chosen in the
   interval [0.5 * Offer-Interval, Offer-Interval].

   If a router hears a better offer from a neighbor, it stops partici-
   pating in the election for a period of [3 * Offer-Interval]. If dur-
   ing this period no winner is elected, then it restarts the election
   from the beginning.  If a router receives an offer with worse
   metrics, then it restarts the election from the beginning.

   The result should be that all routers except the best candidate stop
   advertising.

   A router assumes the role of the DF after having advertised its
   metrics 3 times without receiving any offer from any other neighbor.
   At that point it transmits a Winner message which declares to every
   other router on the link the identity of the winner and the metrics
   it is using.

   Routers hearing a winner message stop participating in the election
   and record the identity and metrics of the winner. If the local
   metrics are better than those of the winner then the router records
   the identity of the winner but reinitiates the election.

4.2.2 Loser Metric Changes

   Whenever the unicast metric to a RP changes for a non-DF router to a
   value that is better than that previously advertised by the DF, the
   router with the new metric should take action to eventually assume
   forwarding responsibility. After the metric change is detected, the
   new candidate restarts participating in the election. If no response
   is received after 3 retransmissions, the router assumes the role of
   the DF following the usual Winner announcement procedure.

Estrin, Handley, Kouvelas, Vicisano                             [Page 9]


Internet Draft               New bidir PIM                  October 1999

   Upon receipt of an offer that is worse than its current metric, the
   DF will respond with a Winner message declaring its status and
   advertising its metric. Upon receiving this message, the originator
   of the Offer records the identity of the DF and aborts the election.

   Upon receipt of an offer that is better the its current metric, the
   DF records the identity and metrics of the offering router and
   responds with a Backoff message. This instructs the offering router
   to hold off for a short period of time while the unicast routing sta-
   bilises. The Backoff message includes the offering router's new
   metric and address.  All routers on the link who have pending offers
   with metrics equal or worse than those in the backoff message
   (including the original offering router) will hold further offers for
   a period of time defined in the Backoff message.

   If during the backoff period, a third router sends a new better
   offer, the Backoff message is repeated for the new offer and the
   backoff period restarted.

   Before the backoff period expires, the acting DF nominates the router
   having made the best offer as the new DF using a Pass message. This
   message includes the IDs and metrics of both the old and new DFs.
   The old DF stops performing its tasks as soon as the transmission is
   made.  The new DF assumes the role of the DF as soon as it receives
   the Pass message. All other routers on the link take note of the new
   DF and its metric.

4.2.3 Winner Metric Changes

   If the DF's routing metric to reach the RP changes to a worse value,
   it sends a set of 3 randomly spaced Offer messages on the link,
   advertising the new metric. Routers who receive this announcement but
   have a better metric may respond with an Offer message which results
   in the same handoff procedure described above.  All routers assume
   the DF has not changed until they see a Pass or Winner message indi-
   cating the change.

   There is no pressure to make this handoff quickly if the acting DF
   still has a path to the RP. The old path may now be suboptimal but it
   will still work while the re-election is in progress.

   If the routing metric at the DF changes to a better value, a single
   Winner message is sent advertising the new metric.

4.2.4 Winner Loses Path

   If the DF's path to the RP switches to be through the link for which
   it is the DF, then it can no longer provide forwarding services.  It

Estrin, Handley, Kouvelas, Vicisano                            [Page 10]


Internet Draft               New bidir PIM                  October 1999

   therefore immediately stops being the DF and restarts the election.
   As its path to the RP is through the link, an infinite metric is used
   in the Offer message it sends.

   [At this stage the old DF will have a new RPF neighbor on the link
   (indicated by unicast routing) which it could use in a Pass message
   but this adds unnecessary complication to the election process.]

4.2.5 Late Router Starting Up

   A late router starting up will have no knowledge of a previous elec-
   tion outcome. As a result it will start advertising its metric in
   Offer messages. As soon as this happens, the Winner will respond
   either with a Winner or with a Backoff message.

4.2.6 Winner Dies

   Whenever the DF dies, a new DF has to be elected. The speed at which
   this can be achieved depends on whether there are any downstream
   routers on the link.

   If there are downstream routers, then typically their RPF neighbor
   reported by the unicast routing protocol will be the DF. They will
   therefore notice a change in RPF neighbor away from the DF.  They
   will then restart the election by transmitting Offer messages.  If
   the RP is now reachable through the link via another upstream router,
   then they will use an infinite metric in the Offer.

   If no downstream routers are present then the only way for other
   upstream routers to detect a DF failure is by the timeout of the PIM
   neighbor information, which will take significantly longer.

4.3 Election Protocol Specification

4.3.1 Protocol State

   The operation of the election protocol makes use of the variables and
   timers described below. These are maintained per RP for each multi-
   cast enabled link on the router.

        Offer-Count (O-count)
            Used to maintain the number of times an Offer or Winner mes-
            sage has been transmitted.

        Best-Offer
            Used by the DF to record who has made the last offer for
            sending the Pass message.

Estrin, Handley, Kouvelas, Vicisano                            [Page 11]


Internet Draft               New bidir PIM                  October 1999

        Offer-Timer (O-timer)
            Used to schedule transmission of Offer and Winner messages.

        Pass-Timer (P-timer)
            Used on the DF to schedule transmission of a Pass message.

4.3.2 Message Summary

   The election uses the following control messages:

        Offer (OfferingID, Metric)
            Sent by routers that believe they have a better metric to
            the RP than the metric that has been on offer so far.

        Winner (DF-ID, DF-Metric)
            Sent by a router when assuming the role of the DF or when
            re-asserting in response to worse offers.

        Backoff (DF-ID, DF-Metric, OfferingID, OfferMetric,
                 BackoffInterval)
            Used by the DF to acknowledge better offers. It instructs
            other routers with equal or worse offers to wait till the DF
            passes responsibility to the sender of the offer.

        Pass (Old-DF-ID, Old-DF-Metric, New-DF-ID, New-DF-Metric)
            Used by the old DF to pass forwarding responsibility to a
            router that has previously made an offer.  The Old-DF-Metric
            is the current metric of the DF at the time the pass is
            sent.

4.3.3 Protocol Events

   During protocol operation, in addition to the expiration of the two
   timers and reception of the four messages, the following events can
   take place:

   o Discovery of new RP

   o Metric change

   o DF loses path

   o Detection of DF failure (unicast routing changed for downstream or
     Hello expired)

4.3.4 Protocol Operation

Estrin, Handley, Kouvelas, Vicisano                            [Page 12]


Internet Draft               New bidir PIM                  October 1999

   In the two tables below the following rules and notation apply:

   o Whenever the notation "?=" is used to assign a value to a timer,
     the value is assigned only if the timer is not running or the time
     left running is longer than the new value.

   o Whenever the DF is set, the associated metrics are also recorded.

   o Timers in square brackets are randomly chosen between 0.5 and 1
     times the supplied value.

   o When a router has a path to the RP through the link on which the
     election is taking place, then an infinite metric is used in Offer
     messages.

Estrin, Handley, Kouvelas, Vicisano                            [Page 13]


Internet Draft               New bidir PIM                  October 1999

   Event    Condition       Non-DF action           DF action
   =======================================================================
   Offer   |Local metric   |O-count = 0            |Send Backoff
   rcvd    |worse          |O-timer = [Offer-Int]  |Best-Offer = sender
           |               |       + 3 * Offer-Int |P-timer = Backoff-Int
           |--------------------------------------------------------------
           |Local metric   |O-count = 0            |Send Winner
           |better         |O-timer ?= [Offer-Int] |
   -----------------------------------------------------------------------
   Winner  |Local metric   |O-count = 0
   rcvd    |worse          |Stop O-timer
           |               |DF = sender
           |--------------------------------------------------------------
           |Local metric   |O-count = 0
           |better         |O-timer ?= [Offer-Int]
           |               |DF = sender
   -----------------------------------------------------------------------
   Backoff |Local metric   |O-count = 0
   rcvd    |worse or to us |O-timer = Backoff-Int + [Offer-Int]
           |               |DF = sender
           |--------------------------------------------------------------
           |Local metric   |O-count = 0
           |better         |O-timer ?= [Offer-Int]
           |               |DF = sender
   -----------------------------------------------------------------------
   Pass    |Local metric   |O-count = 0
   rcvd    |worse or to us |Stop O-timer
           |               |DF = destination
           |--------------------------------------------------------------
           |Local metric   |O-count = 0
           |better         |0-timer ?= [Offer-Int]
           |               |DF = destination
   -----------------------------------------------------------------------

Estrin, Handley, Kouvelas, Vicisano                            [Page 14]


Internet Draft               New bidir PIM                  October 1999

   Event    Condition       Non-DF action           DF action
   =======================================================================
   New RP  |               |O-count = 0            |N/A
           |               |O-timer ?= [Offer-Int] |
   -----------------------------------------------------------------------
   Metric  |DF metric      |nop                    |O-count = 0
   change  |better (*)     |                       |O-timer ?= [Offer-Int]
           |--------------------------------------------------------------
           |DF metric      |O-count = 0            |Send Winner
           |worse (*)      |O-timer ?= [Offer-Int] |
   -----------------------------------------------------------------------
   No path |               |nop                    |Send Offer (**)
   to RP   |               |                       |O-count = 1
           |               |                       |O-timer ?= [Offer-Int]
           |               |                       |DF = unknown
   -----------------------------------------------------------------------
   DF      |               |O-count = 0            |N/A
   failure |               |O-timer ?= [Offer-Int] |
   -----------------------------------------------------------------------
   O-timer |O-count <= 3   |Send Offer
   expires |               |O-count++
           |               |O-timer ?= [Offer-Int]
           |---------------|----------------------------------------------
           |else           |Send Winner
           |               |O-count = 0
           |               |DF = us
   -----------------------------------------------------------------------
   P-timer |               |DF = Best-Offer
   expires |               |Send Pass
   -----------------------------------------------------------------------

   (*) These comparisons are made against the previously stored DF
   metrics.  In the case of the DF, the old local metrics are used to
   compare against. So "DF metric better" means that the metric has
   actually become worse.

   (**) As the path to the RP is now through the link an infinite metric
   is used in the offer.

4.4 Election Message Formats

   All election messages are sent with a TTL of 1 and are multicast to
   the ALL-PIM-ROUTERS group. The structure of Encoded-Unicast addresses
   is described in [1].

4.4.1 Common Header

Estrin, Handley, Kouvelas, Vicisano                            [Page 15]


Internet Draft               New bidir PIM                  October 1999

   The header below is common to all election messages.

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |PIM Ver| Type  |Subtype| Rsvd  |           Checksum            |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                  Encoded-Unicast-RP-Address                   |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                   Sender Metric Preference                    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                        Sender Metric                          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Type
            TBD

        Subtype
            Used to distinguish between different election messages and
            is set according to the table below:

                    Message         Subtype
                    -----------------------
                    Offer           1
                    Winner          2
                    Backoff         3
                    Pass            4

        Checksum
            Calculated as specified in [1].

        RP-Address
            The address of the bidir RP for which the election is taking
            place.

        Sender Metric Preference
            Preference value assigned to the unicast routing protocol
            that the message sender used to obtain the route to the RP-
            address.

        Sender Metric
            The unicast routing table metric used by the message sender
            to reach the RP. The metric is in units applicable to the
            unicast routing protocol used.

   The Backoff and Pass messages have the additional fields described
   below.

Estrin, Handley, Kouvelas, Vicisano                            [Page 16]


Internet Draft               New bidir PIM                  October 1999

4.4.2 Backoff Message

   The Backoff message uses the following fields in addition to the com-
   mon ones described above.

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |               Encoded-Unicast-Offering-Address                |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                  Offering Metric Preference                   |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                       Offering Metric                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            Interval           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Offering Address
            The address of the router that made the last (best) Offer.

        Offering Metric Preference
            Preference value assigned to the unicast routing protocol
            that the offering router used to obtain the route to RP-
            address.

        Offering Metric
            The unicast routing table metric used by the offering router
            to reach the RP. The metric is in units applicable to the
            unicast routing protocol used.

        Interval
            The backoff interval in milliseconds to be used by routers
            with worse metrics than the offering router.

4.4.3 Pass Message

   The Pass message uses the following fields in addition to the common
   ones described above.

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Encoded-Unicast-New-Winner-Address               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                 New Winner Metric Preference                  |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      New Winner Metric                        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        New Winner Address
            The address of the router that made the last (best) Offer.

Estrin, Handley, Kouvelas, Vicisano                            [Page 17]


Internet Draft               New bidir PIM                  October 1999

        New Winner Metric Preference
            Preference value assigned to the unicast routing protocol
            that the offering router used to obtain the route to RP-
            address.

        New Winner Metric
            The unicast routing table metric used by the offering router
            to reach the RP. The metric is in units applicable to the
            unicast routing protocol used.

4.5 Timer Values

   The Offer-Interval is 100 ms.

   The default Backoff-Interval used in Backoff messages is 1 sec.

5 Advertising Bi-directional Groups

   Routers discover that a group operates in bi-directional mode from
   the Encoded-Group Address fields in PIM Bootstrap and Candidate-RP
   Advertisement messages. The Encoded-Group Address field is modified
   to include the Bidir-bit (B bit) as specified below:

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Addr Family   | Encoding Type |B|   Reserved  |  Mask Len     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                   Group Multicast Address                     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   When the Bidir-bit is set, all upgraded bi-directional PIM routers
   will follow the forwarding rules described in this specification.

6 Interoperability with legacy code

   The rules provided in [2] for interoperating between legacy PIM-SM
   routers and new bi-directional capable routers change only slightly
   to support this new proposal. The only difference is in the defini-
   tion of a boundary between a bi-directional capable area and a legacy
   area of the network.  In [2], a bidir capable router forwarding
   upstream, register encapsulates the data packet to the RP if its RPF
   neighbor is not bidir capable.

   In our proposal all the routers on a link need to co-operate to elect
   the Designated Forwarder, if even one of the routers on the link is a
   legacy router, the election cannot take place. As a result register
   encapsulation is necessary if one or more routers on the RPF

Estrin, Handley, Kouvelas, Vicisano                            [Page 18]


Internet Draft               New bidir PIM                  October 1999

   interface are not bi-directional capable.

   As in [2], a Hello option must be used to differentiate between bi-
   directional capable and legacy routers, and (S,G) state must be
   created on the router doing the register encapsulation to prevent
   loops.

7 Comparison with PIM-SM

   This section needs work...

   o Bidir only uses a shared tree for data distribution.

   o No assert at all, but DF election (DF forwards upstream also).  The
     only special requirement when new receiver branches are added, is
     that Join/Prune messages on links are always sent to the election
     winner (DF).

   o The DF election problem is easier than the assert problem because
     there is a small number of RPs and you can do the per RP DF elec-
     tion a priori.  With the assert mechanism, in addition to each RP,
     a forwarder has to be elected for each possible source to a group.
     This can not be done before data is avaialble.

   o Sender-only branches do not need to keep per group state.

   o A router that has (*,G) and gets a packet on the iif, does not need
     to check if the packet comes from a directly connected source. This
     case does not need special handling.

8 Comparison with UMP based bidirectional PIM

   Using an UMP option for upstream forwarding has the following dissad-
   vantages:

   o Using the DF election, only routers willing to be forwarders can be
     elected. In contrast in [2], the downstream router designates the
     upstream neighbor responsible for forwarding (using Joins and UMP
     packets).

   o Using the UMP option, regular data packets are overloaded with con-
     trol information for the routing protocol.

   o Inserting the extra option in multicast packets transmitted from a
     source may result in a packet size exceeding the MTU which will
     result in fragmentation.

   o The use of an option complicates the router forwarding mechanism.

Estrin, Handley, Kouvelas, Vicisano                            [Page 19]


Internet Draft               New bidir PIM                  October 1999

     Additional code to process the new special packet type needs to be
     written.

   o The contents of the UMP option have to be rewritten and the packet
     checksum adjusted on each hop towards the RP at data forwarding
     time.  This introduces additional per packet processing overhead.

   In bidir PIM [2], if the router elected as the DR is different from
   that chosen by downstream neighbors for joining the tree, loops can
   occur.  The main shortcoming of the DR is that its election does not
   take into consideration the location of the RP. To resolve this prob-
   lem the DR priority draft [3] provides a method for manually confi-
   guring the DR election winner. Although this provides a solution it
   has two drawbacks:

   o It requires a case by case manual configuration.

   o It cannot solve the problem if there are different RPs in a domain
     serving separate multicast group ranges. In this scenario the
     requirements of each RP for the DR positioning on a particular link
     may differ.

9 Security Considerations

   All PIM control messages MAY use IPsec to address security concerns.

10 References

   [1] Estrin, et al., "Protocol Independent Multicast-Sparse Mode (PIM-
       SM): Protocol Specification", RFC 2362, June 1998.

   [2] D. Estrin, D. Farinacci, "Bi-directional Shared Trees in PIM-SM",
       Work In Progress, <draft-farinacci-bidir-pim-01.txt>, May 1999.

   [3] Wei, L., Farinacci, D., "PIM Version 2 DR Election Priority Option",
       INTERNET-DRAFT, March 1998.

11 Appendix A: Election Reliability Enhancements

   For the correct operation of bi-directional PIM it is very important
   to avoid situations where two routers consider themselves to be
   Designated Forwarders for the same link. The two precautions below
   are not required for correct operation but can help diagnose
   anomalies and correct them.

11.1 Missing Pass

Estrin, Handley, Kouvelas, Vicisano                            [Page 20]


Internet Draft               New bidir PIM                  October 1999

   After a DF has been elected, a router whose metrics change to become
   better than the forwarder will attempt to take over. If during the
   re-election the acting DF has a condition that causes it to lose all
   of the election messages (like a CPU overload), the new candidate
   will transmit three offers and assume the role of the forwarder. This
   situation is pathological and should be corrected by fixing the over-
   loaded router. However, it is desirable that such an event can be
   detected.

   When a router becomes the DF for a link without receiving a Pass mes-
   sage from the known old DF, the PIM neighbor information for the old
   DF can be marked to this effect. Upon receiving the next PIM Hello
   message from the old DF, the router can retransmit Winner messages
   for all the RPs for which it acting as the DF.

11.2 Periodic Winner Announcement

   An additional degree of safety can be achieved by having the DF for
   each RP periodically announce its status in a Winner message. Having
   this periodic control traffic in areas of the network without senders
   or receivers for a particular RP can be avoided.  Transmission of the
   periodic Winner message can be restricted to occur only for RPs which
   have active groups.

12 Acknowledgments

   The bidir proposal in this draft is heavily based on the ideas and
   text presented by Estrin and Farinacci in [2]. The main difference
   between the two proposals is in the method chosen for upstream for-
   warding.

   We would also like to thank Nidhi Bhaskar, Yiqun Cai, Tony Speakman,
   Rajitha Sumanasakera and Chris White at cisco for their contributions
   and comments to this draft.

13 Author Information

   Deborah Estrin
   ISI/USC
   estrin@usc.edu

   Mark Handley
   mjh@aciri.org
   AT&T Center for Internet Research at ICSI

   Isidor Kouvelas
   kouvelas@cisco.com
   cisco Systems

Estrin, Handley, Kouvelas, Vicisano                            [Page 21]


Internet Draft               New bidir PIM                  October 1999

   Lorenzo Vicisano
   lorenzo@cisco.com
   cisco Systems

Estrin, Handley, Kouvelas, Vicisano                            [Page 22]