
Multicast in MPLS/BGP IP VPNs
draft-ietf-l3vpn-2547bis-mcast-10

The information below is for an old version of the document that is already published as an RFC.
Document Type
This is an older version of an Internet-Draft that was ultimately published as RFC 6513.
Authors Rahul Aggarwal, Eric C. Rosen
Last updated 2015-10-14 (Latest revision 2010-01-28)
RFC stream Internet Engineering Task Force (IETF)
Intended RFC status Proposed Standard
Stream WG state Submitted to IESG for Publication
Document shepherd (None)
IESG IESG state Became RFC 6513 (Proposed Standard)
Action Holders
(None)
Consensus boilerplate Unknown
Telechat date (None)
Responsible AD Stewart Bryant
Send notices to (None)
   described in section 5.1.

   If BGP (rather than PIM) is used to distribute the C-multicast
   routing information, and if option b of section 10 of [RFC4364] is in
   use, then the C-multicast routes will be installed in the ASBRs along
   the path from each multicast source in the MVPN to each multicast
   receiver in the MVPN.  If option b is not in use, the C-multicast
   routes are not installed in the ASBRs.  The handling of the
   C-multicast routes in either case is thus exactly analogous to the
   handling of unicast VPN-IP routes in the corresponding case.

8.1.3. Inter-AS P-Tunnels

   The procedures described earlier in this document can be used to
   instantiate either an I-PMSI or an S-PMSI with inter-AS P-tunnels.
   Specific tunneling techniques require some explanation.

   If ingress replication is used, the inter-AS PE-PE P-tunnels will use
   the inter-AS tunneling procedures for the tunneling technology used.

   Procedures in [RSVP-P2MP] are used for inter-AS RSVP-TE P2MP
   P-Tunnels.

   Procedures for using PIM to set up the P-tunnels are discussed in
   the next section.

8.1.3.1. PIM-Based Inter-AS P-Multicast Trees

   When PIM is used to set up an inter-AS P-multicast tree, the PIM
   Join/Prune messages used to join the tree contain the IP address of
   the upstream PE.  However, there are two special considerations that
   must be taken into account:

     - It is possible that the P routers within one or more of the ASes
       will not have routes to the upstream PE.  For example, if an AS
       has a "BGP-free core", the P routers in an AS will not have
       routes to addresses outside the AS.

     - If the PIM Join/Prune message must travel through several ASes,
       it is possible that the ASBRs will not have routes to the PE
       routers.  For example, in an inter-AS VPN constructed according
       to "option b" of section 10 of [RFC4364], the ASBRs do not
       necessarily have routes to the PE routers.

Rosen & Raggarwa                                               [Page 58]



Internet Draft    draft-ietf-l3vpn-2547bis-mcast-10.txt     January 2010

   If either of these two conditions obtains, then "ordinary" PIM
   Join/Prune messages cannot be routed to the upstream PE.  Therefore,
   in that case the PIM Join/Prune messages MUST contain the "PIM MVPN
   Join Attribute".  This allows the multicast distribution tree to be
   properly constructed even if routes to PEs in other ASes do not exist
   in the given AS's IGP, and even if the routes to those PEs do not
   exist in BGP.  The use of a PIM MVPN Join Attribute in the PIM
   messages allows the inter-AS trees to be built.

   The use of the PIM MVPN Join Attribute allows the following
   information to be added to the PIM Join/Prune messages: a "Proxy
   Address", which contains the address of the next ASBR on the path to
   the upstream PE.  When the PIM Join/Prune arrives at the ASBR that is
   identified by the "proxy address", that ASBR must change the proxy
   address to identify the next hop ASBR.

   This information allows the PIM Join/Prune to be routed through an AS
   even if the P routers of that AS do not have routes to the upstream
   PE.  However, this information is not sufficient to enable the ASBRs
   to route the Join/Prune if the ASBRs themselves do not have routes to
   the upstream PE.

   However, even if the ASBRs do not have routes to the upstream PE, the
   procedures of this draft ensure that they will have Inter-AS I-PMSI
   A-D routes that lead to the upstream PE.  If non-segmented inter-AS
   P-tunnels are being used, the ASBRs (and PEs) will have Intra-AS
   I-PMSI A-D routes that have been distributed inter-AS.

   So rather than having the PIM Join/Prune messages routed by the ASBRs
   along a route to the upstream PE, the PIM Join/Prune messages MUST be
   routed along the path determined by the Intra-AS I-PMSI A-D routes.

   If the only Intra-AS A-D route for a given MVPN is the "Intra-AS
   I-PMSI Route", the PIM Join/Prunes will be routed along that.
   However, if the PIM Join/Prune message is for a particular P-group
   address, and there is an "Intra-AS S-PMSI Route" specifying that
   particular P-group address as the P-tunnel for a particular S-PMSI,
   then the PIM Join/Prunes MUST be routed along the path determined by
   those Intra-AS A-D routes.

   The basic format of a PIM Join Attribute is specified in
   [PIM-ATTRIB].  The details of the PIM MVPN Join Attribute are
   specified in the next section.

8.1.3.2. The PIM MVPN Join Attribute

8.1.3.2.1. Definition

   In [PIM-ATTRIB], the notion of a "join attribute" is defined, and a
   format for including join attributes in PIM Join/Prune messages is
   specified.  We now define a new join attribute, which we call the
   "MVPN Join Attribute".

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |F|E| Attr_Type | Length        |     Proxy IP address
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                    |      RD
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-.......

    The Attr_Type field of the MVPN Join Attribute is set to 1.

    The F bit is set to 0.

    Two information fields are carried in the MVPN Join attribute:

      - Proxy: The IP address of the node towards which the PIM
        Join/Prune message is to be forwarded.  This will either be an
        IPv4 or an IPv6 address, depending on whether the PIM Join/Prune
        message itself is IPv4 or IPv6.

      - RD: An eight-byte RD.  This immediately follows the proxy IP
        address.

    The PIM message also carries the address of the upstream PE.

    In the case of an intra-AS MVPN, the proxy and the upstream PE are
    the same.  In the case of an inter-AS MVPN, the proxy will be the
    ASBR that is the exit point from the local AS on the path to the
    upstream PE.
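
    As an illustration only, the layout above can be sketched in
    Python.  The sketch assumes, following the general join attribute
    format of [PIM-ATTRIB], that the E bit marks the last attribute in
    the list and that Length counts only the octets that follow the
    Length field.  The function names are hypothetical and are not part
    of this specification.

```python
import ipaddress
import struct

ATTR_TYPE_MVPN = 1  # Attr_Type value given in the text above


def encode_mvpn_join_attr(proxy: str, rd: bytes, last: bool = True) -> bytes:
    """Pack the F/E bits, Attr_Type, Length, proxy address, and RD."""
    if len(rd) != 8:
        raise ValueError("RD must be exactly eight bytes")
    proxy_bytes = ipaddress.ip_address(proxy).packed  # 4 (IPv4) or 16 (IPv6)
    f_bit = 0                 # the text sets the F bit to 0
    e_bit = 1 if last else 0  # assumption: E flags the last attribute
    first_octet = (f_bit << 7) | (e_bit << 6) | ATTR_TYPE_MVPN
    value = proxy_bytes + rd  # the RD immediately follows the proxy
    # Assumption: Length covers only the value that follows it.
    return struct.pack("!BB", first_octet, len(value)) + value


def decode_mvpn_join_attr(data: bytes):
    """Return (proxy address as text, eight-byte RD)."""
    first_octet, length = struct.unpack("!BB", data[:2])
    if first_octet & 0x3F != ATTR_TYPE_MVPN:
        raise ValueError("not an MVPN Join Attribute")
    value = data[2:2 + length]
    if len(value) != length or length not in (12, 24):
        raise ValueError("unexpected attribute length")
    return str(ipaddress.ip_address(value[:-8])), value[-8:]
```

    The proxy address family is inferred from the value length (12
    octets for IPv4, 24 for IPv6), which mirrors the rule that the proxy
    matches the address family of the PIM Join/Prune message.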

8.1.3.2.2. Usage

   When a PE router creates a PIM Join/Prune message in order to set up
   an inter-AS I-PMSI, it does so as a result of having received a
   particular Intra-AS A-D route. It includes an MVPN Join attribute
   whose fields are set as follows:

     - If the upstream PE is in the same AS as the local PE, then the
       proxy field contains the address of the upstream PE.  Otherwise,
       it contains the address of the BGP next hop on the route to the
       upstream PE.

     - The RD field contains the RD from the NLRI of the Intra-AS A-D
       route.

     - The upstream PE field contains the address of the PE that
       originated the Intra-AS A-D route (obtained from the NLRI of that
       route).

   When a PIM router processes a PIM Join/Prune message with an MVPN
   Join Attribute, it first checks to see if the proxy field contains
   one of its own addresses.

   If not, the router uses the proxy IP address in order to determine
   the RPF interface and neighbor.  The MVPN Join Attribute must be
   passed upstream, unchanged.

   If the proxy address is one of the router's own IP addresses, then
   the router looks in its BGP routing table for an Intra-AS A-D route
   whose NLRI consists of the upstream PE address prepended with the RD
   from the Join attribute.  If there is no match, the PIM message is
   discarded.  If there is a match the IP address from the BGP next hop
   field of the matching route is used in order to determine the RPF
   interface and neighbor. When the PIM Join/Prune is forwarded
   upstream, the proxy field is replaced with the address of the BGP
   next hop, and the RD and upstream PE fields are left unchanged.
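
   The origination and per-hop processing rules above can be sketched
   as follows.  This is a simplified model, not a definitive
   implementation: the JoinPrune class, the ADRoutes table (standing in
   for the BGP routing table), and both function names are
   hypothetical.  The sketch also assumes that the "BGP next hop on the
   route to the upstream PE" is the next hop of the Intra-AS A-D route
   that triggered the Join/Prune.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class JoinPrune:
    upstream_pe: str  # carried in the PIM message itself
    proxy: str        # MVPN Join Attribute proxy field
    rd: bytes         # MVPN Join Attribute RD field


# Hypothetical stand-in for the BGP routing table:
# (RD, originating PE) of an Intra-AS A-D route -> its BGP next hop.
ADRoutes = Dict[Tuple[bytes, str], str]


def originate(rd: bytes, originator: str, bgp_next_hop: str,
              same_as: bool) -> JoinPrune:
    """Build the fields set by the PE that creates the Join/Prune."""
    proxy = originator if same_as else bgp_next_hop
    return JoinPrune(upstream_pe=originator, proxy=proxy, rd=rd)


def process(msg: JoinPrune, own_addrs: Set[str],
            ad_routes: ADRoutes) -> Optional[Tuple[str, JoinPrune]]:
    """Return (address used for RPF, message to send upstream).

    None means the message is discarded.
    """
    if msg.proxy not in own_addrs:
        # Not addressed to this router: RPF on the proxy, pass unchanged.
        return msg.proxy, msg
    next_hop = ad_routes.get((msg.rd, msg.upstream_pe))
    if next_hop is None:
        return None  # no matching Intra-AS A-D route
    # Rewrite only the proxy; RD and upstream PE stay as received.
    return next_hop, replace(msg, proxy=next_hop)
```

   The sketch does not model what happens when the message reaches the
   upstream PE itself, since that case is not covered by the text
   above.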

   The use of non-segmented inter-AS trees constructed via BIDIR-PIM is
   outside the scope of this document.

8.2. Segmented Inter-AS P-Tunnels

   The procedures for setting up and maintaining Segmented Inter-AS
   Inclusive and Selective P-Tunnels may be found in [MVPN-BGP].

9. Preventing Duplication of Multicast Data Packets

   Consider the case of an egress PE that receives packets of a
   particular C-flow, (C-S,C-G), over a non-aggregated S-PMSI.  The
   procedures described so far will never cause the PE to receive
   duplicate copies of any packet in that stream.  It is possible that
   the (C-S,C-G) stream is carried in more than one S-PMSI; this may
   happen when the site that contains C-S is multihomed to more than one
   PE.  However, a PE that needs to receive (C-S,C-G) packets only joins
   one of these S-PMSIs, and so only receives one copy of each packet.

   However, if the data packets of stream (C-S,C-G) are carried in
   either an I-PMSI or in an aggregated S-PMSI, then the procedures
   specified so far make it possible for an egress PE to receive more
   than one copy of each data packet.  Additional procedures are needed
   to either make this impossible, or to ensure that the egress PE does
   not forward duplicates to the CE routers.

   This section covers only the situation where the C-trees are
   unidirectional, in either the ASM or SSM service models.  The case
   where the C-trees are bidirectional is considered separately in
   section 11.

   There are two cases where the procedures specified so far make it
   possible for an egress PE to receive duplicate copies of a multicast
   data packet.  These are:

      1. The first case occurs when both of the following conditions
         hold:

            a. an MVPN site that contains C-S or C-RP is multihomed to
               more than one PE, and

            b. either an I-PMSI or an aggregated S-PMSI is used for
               carrying the packets originated by C-S.

         In this case, an egress PE may receive one copy of the packet
         from each PE to which the site is homed.  This case is
         discussed further in section 9.2.

      2. The second case occurs when all of the following conditions
         hold:

            a. the IP destination address of the customer packet, C-G,
               identifies a multicast group that is operating in ASM
               mode, and whose C-multicast tree is set up using PIM-SM,

            b. an MI-PMSI is used for carrying the data packets, and

            c. a router or a CE in a site connected to the egress PE
               switches from the C-RP tree to C-S tree.

         In this case, it is possible to get one copy of a given packet
         from the ingress PE attached to the C-RP's site, and one from
         the ingress PE attached to the C-S's site.  This case is
         discussed further in section 9.3.

   Additional procedures are therefore needed to ensure that no MVPN
   customer sees steady state multicast data packet duplication.  There
   are three procedures that may be used:

      1. Discarding data packets received from the "wrong" PE

      2. Single Forwarder Selection

      3. Native PIM methods

   These methods are described in section 9.1.  Their applicability to
   the two scenarios where duplication is possible is discussed in
   sections 9.2 and 9.3.

9.1. Methods for Ensuring Non-Duplication

   Every MVPN MUST use at least one of the three methods for ensuring
   non-duplication.

9.1.1. Discarding Packets from Wrong PE

   Per section 5.1.3, an egress PE, say PE1, chooses a specific upstream
   PE for a given (C-S,C-G).  When PE1 receives a (C-S,C-G) packet from a
   PMSI, it may be able to identify the PE that transmitted the packet
   onto the PMSI.  If that transmitter is other than the PE selected by
   PE1 as the upstream PE, then PE1 can drop the packet.  This means
   that the PE will see a duplicate, but the duplicate will not get
   forwarded.
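
   A minimal sketch of this check is shown below.  It applies only when
   the egress PE can identify the transmitting PE, and it does not
   model the exception case of section 10.  The function name and the
   selected_upstream table are hypothetical; the table is assumed to
   hold the result of the upstream PE selection of section 5.1.3.

```python
from typing import Dict, Tuple

Flow = Tuple[str, str]  # (C-S, C-G)


def accept_from_pmsi(flow: Flow, transmitter: str,
                     selected_upstream: Dict[Flow, str]) -> bool:
    """Return True if the packet may be forwarded toward the CEs.

    A packet is kept only when the PE that transmitted it onto the PMSI
    is the upstream PE this egress PE selected for the flow.
    """
    return selected_upstream.get(flow) == transmitter
```

   Dropping packets for flows with no selected upstream PE is a choice
   made by this sketch rather than something stated in the text.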

   The method used by an egress PE to determine the ingress PE for a
   particular packet, received over a particular PMSI, depends on the
   P-tunnel technology that is used to instantiate the PMSI.  If the
   P-tunnel is a P2MP LSP, a PIM-SM or PIM-SSM tree, or a unicast
   P-tunnel that uses IP encapsulation, then the tunnel encapsulation
   contains information that can be used (possibly along with other
   state information in the PE) to determine the ingress PE, as long as
   the P-tunnel is instantiating an intra-AS PMSI, or an inter-AS PMSI
   which is supported by a non-segmented inter-AS tunnel.

   Even when inter-AS segmented P-tunnels are used, if an aggregated
   S-PMSI is used for carrying the packets, the tunnel encapsulation
   must have some information that can be used to identify the PMSI, and
   that in turn implicitly identifies the ingress PE.

   Consider the case of an I-PMSI that spans multiple ASes and that is
   instantiated by segmented Inter-AS P-tunnels.  Suppose it is carrying
   data that is traveling along a particular C-tree.  Suppose also that
   the C-root of that C-tree is multi-homed to two or more PEs, and that
   each such PE is in a different AS than the others.  Then if there is
   any duplicate traffic, the duplicates will arrive on a different
   P-tunnel.  Specifically, if the PE was expecting the traffic on a
   particular inter-AS P-tunnel, duplicate traffic will arrive either on
   an intra-AS P-tunnel (not an intra-AS segment of an inter-AS
   P-tunnel), or on some other inter-AS P-tunnel.  To detect duplicates
   the PE has to keep track of which inter-AS A-D route the PE uses for
   sending MVPN multicast routing information towards C-S/C-RP. The PE
   MUST process received (multicast) traffic originated by C-S/C-RP only
   from the Inter-AS P-tunnel that was carried in the best Inter-AS A-D
   route for the MVPN and that was originated by the AS that contains
   C-S/C-RP (where "the best" is determined by the PE). The PE MUST
   discard, as duplicates, all other multicast traffic originated by
   C-S/C-RP, but received on any other P-tunnel.

   If, for a given MVPN, (a) an MI-PMSI is used for carrying multicast
   data packets, (b) the MI-PMSI is instantiated by a segmented Inter-AS
   P-tunnel, (c) C-S or C-RP is multi-homed to different PEs, and (d) at
   least two of such PEs are in the same AS, then depending on the
   tunneling technology used to instantiate the MI-PMSI, it may not
   always be possible for the egress PE to determine the upstream PE.
   In that case the procedure of section 9.1.2 or 9.1.3 must be used.

   N.B.: Section 10 describes an exception case where PE1 has to accept
   a packet even if it is not from the selected upstream PE.

9.1.2. Single Forwarder Selection

   Section 5.1 specifies a procedure for choosing a "default upstream PE
   selection", such that (except during routing transients) all PEs will
   choose the same default upstream PE.  To ensure that duplicate
   packets are not sent through the backbone (except during routing
   transients), an ingress PE does not forward to the backbone any
   (C-S,C-G) multicast data packet it receives from a CE, unless the PE
   is the default upstream PE selection.

   One difference in effect between this procedure and the procedure of
   section 9.1.1 is that this procedure sends only one copy of each
   packet to each egress PE, rather than sending multiple copies and
   forcing the egress PE to discard all but one.
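
   The ingress-side rule can be sketched as follows.  The default
   selection procedure itself is specified in section 5.1, which is not
   reproduced here; the sketch ASSUMES, purely for illustration, that
   the default is the candidate PE with the numerically highest IP
   address.  Both function names are hypothetical.

```python
import ipaddress
from typing import Iterable


def default_upstream_pe(candidates: Iterable[str]) -> str:
    """Pick the default upstream PE among the PEs homed to the source.

    Assumption: "numerically highest address" stands in for the actual
    selection rule of section 5.1.
    """
    return max(candidates, key=lambda a: int(ipaddress.ip_address(a)))


def should_forward_to_backbone(local_pe: str,
                               candidates: Iterable[str]) -> bool:
    """An ingress PE forwards (C-S,C-G) data only if it is the default."""
    return default_upstream_pe(candidates) == local_pe
```

   Because every PE evaluates the same deterministic rule over the same
   candidate set, all of them agree on one forwarder outside of routing
   transients.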

9.1.3. Native PIM Methods

   If PE-PE multicast routing information for a given MVPN is being
   disseminated by running PIM over an MI-PMSI, then native PIM methods
   will prevent steady state data packet duplication.  The PIM Assert
   mechanism prevents steady state duplication in the scenario of
   section 9.2, even if Single Forwarder Selection is not done.  The PIM
   Prune(S,G,rpt) mechanism addresses the scenario of section 9.3.

9.2. Multihomed C-S or C-RP

   Any of the three methods of section 9.1 will prevent steady state
   duplicates in the case of a multihomed C-S or C-RP.

9.3. Switching from the C-RP tree to C-S tree

9.3.1. How Duplicates Can Occur

   If some PEs are on the C-S tree and some on the C-RP tree then a PE
   may also receive duplicate data traffic after a (C-*,C-G) to
   (C-S,C-G) switch.

   If PIM is being used on an MI-PMSI to disseminate multicast routing
   information, native PIM methods (in particular, the use of the
   Prune(S,G,rpt) message) prevent steady state data duplication in this
   case.

   If BGP C-multicast routing is being used, then the procedure of
   section 9.1.1, if applicable, can be used to prevent duplication.
   However, if that procedure is not applicable, then the procedure of
   section 9.1.2 is not sufficient to prevent steady state data
   duplication in all scenarios.

   In the scenario where (a) BGP C-multicast routing is being used, (b)
   there are inter-site shared C-trees, and (c) there are inter-site
   source C-trees, then additional procedures are needed.  To see this,
   consider the following topology:

                        CE1---C-RP
                         |
                         |
                  CE2---PE1-- ... --PE2---CE5---C-S
                              ...
           C-R1---CE3---PE3-- ... --PE4---CE4---C-R2

   Suppose that C-R1 and C-R2 use PIM to join the (C-*,C-G) tree, where
   C-RP is the RP corresponding to C-G.  As a result, CE3 and CE4 will
   send PIM Join(*,G) messages to PE3 and PE4 respectively.  This will
   cause PE3 and PE4 to originate C-multicast Shared Tree Join Routes,
   specifying (C-*,C-G).  These routes will identify PE1 as the
   upstream PE.

   Now suppose that C-S is a transmitter for multicast group C-G, and
   that C-S sends its multicast data packets to C-RP in PIM register
   messages.  Then PE1 will receive (C-S,C-G) data packets from CE1, and
   will forward them over an I-PMSI to PE3 and PE4, who will forward
   them in turn to CE3 and CE4 respectively.

   When C-R1 receives (C-S,C-G) data packets, it may decide to join the
   (C-S,C-G) source tree, by sending a PIM Join(S,G) to CE3.  This will
   in turn cause CE3 to send a PIM Join(S,G) to PE3, which will in turn
   cause PE3 to originate a C-multicast Source Tree Join Route,
   specifying (C-S,C-G), and identifying PE2 as the upstream PE.  As a
   result, when PE2 receives (C-S,C-G) data packets from CE5, it will
   forward them on a PMSI to PE3.

   At this point, the following situation obtains:

     - If PE1 receives (C-S,C-G) packets from CE1, PE1 must forward
       them on the I-PMSI, because PE4 is still expecting to receive
       the (C-S,C-G) packets from PE1.

     - PE3 must continue to receive packets from the I-PMSI, since
       there may be other sources transmitting C-G traffic, and PE3
       currently has no other way to receive that traffic.

     - PE3 must also receive (C-S,C-G) traffic from PE2.

   As a result, PE3 may receive two copies of each (C-S,C-G) packet.
   The procedure of section 9.1.2 (single forwarder selection) does not
   prevent PE3 from receiving two copies, because it does not prevent
   one PE from forwarding (C-S,C-G) traffic along the shared C-tree
   while another forwards (C-S,C-G) traffic along a source-specific
   C-tree.

   So if PE3 cannot apply the method of section 9.1.1 (discard packet
   from wrong PE), perhaps because the tunneling technology does not
   allow the egress PE to identify the ingress PE, then additional
   procedures are needed.
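
   The duplication in this example can be reproduced with a toy model.
   The sketch below hard-codes the example topology (PE1 feeding the
   shared C-tree over the I-PMSI, PE2 feeding the source C-tree) and
   counts the copies of one (C-S,C-G) packet that reach each egress PE.
   All names are hypothetical, and no real forwarding behavior is
   modeled.

```python
from collections import Counter
from typing import Dict, Set

# Egress PEs holding (C-*,C-G) state; they receive PE1's I-PMSI traffic.
SHARED_TREE_RECEIVERS = {"PE3", "PE4"}


def copies_per_packet(source_tree_receivers: Set[str],
                      pe1_forwards_shared: bool = True) -> Dict[str, int]:
    """Count copies of one (C-S,C-G) packet arriving at each egress PE."""
    copies = Counter()
    if pe1_forwards_shared:
        for pe in SHARED_TREE_RECEIVERS:   # via PE1, shared C-tree
            copies[pe] += 1
    for pe in source_tree_receivers:       # via PE2, source C-tree
        copies[pe] += 1
    return dict(copies)
```

   Before C-R1 switches trees, copies_per_packet(set()) gives one copy
   each for PE3 and PE4.  After PE3 originates the Source Tree Join
   Route, copies_per_packet({"PE3"}) gives two copies at PE3 while PE4
   still sees one.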

9.3.2. Solution using Source Active A-D Routes

   The issue described in section 9.3.1 is resolved through the use of
   Source Active A-D Routes.  In the remainder of this section, we
   provide an example of how this works, along with an informal
   description of the procedures.

   A full and precise specification of the relevant procedures can be
   found in section 13 of [MVPN-BGP].  In the event of any conflicts or
   other discrepancies between the description below and the description
   in [MVPN-BGP], [MVPN-BGP] is to be considered to be the authoritative
   document.

   Please note that the material in this section only applies when
   inter-site shared trees are being used.

   Whenever a PE creates a (C-S,C-G) state as a result of receiving a
   C-multicast route for (C-S,C-G) from some other PE, and the C-G group
   is an ASM group, the PE that creates the state MUST originate a
   Source Active A-D route (see [MVPN-BGP] section 4.5).  The NLRI of
   the route includes C-S and C-G. By default, the route carries the
   same set of Route Targets as the Intra-AS I-PMSI A-D route of the
   MVPN originated by the PE.  Using the normal BGP procedures, the
   route is propagated to all the PEs of the MVPN. For more details see
   Section 13.1 ("Source Within a Site - Source Active Advertisement")
   of [MVPN-BGP].

   When as a result of receiving a new Source Active A-D route a PE
   updates its VRF with the route, the PE MUST check if the newly
   received route matches any (C-*,C-G) entries. If (a) there is a
   matching entry, (b) the PE does not have (C-S,C-G) state in its
   MVPN-TIB for (C-S,C-G) carried in the route, and (c) the received
   route is selected as the best (using the BGP route selection
   procedures), then the PE takes the following action:

     - If the PE's (C-*,C-G) state has a PMSI as a downstream interface,
       the PE acts as if all the other PEs had pruned C-S off the
       (C-*,C-G) tree.  That is,

         * If the PE receives (C-S,C-G) traffic from a CE, it does not
           transmit it to other PEs.

         * Depending on the PIM state of the PE's PE-CE interfaces, the
           PE may or may not need to invoke PIM procedures to prune C-S
           off the (C-*,C-G) tree by sending a PIM Prune(S,G,rpt) to one
           or more of the CEs.  This is determined by ordinary PIM
           procedures. If this does need to be done, the PE SHOULD delay
           sending the Prune until it first runs a timer; this helps
           ensure that the source is not pruned from the shared tree
           until all PEs have had time to receive the Source Active A-D
           route.

     - If the PE's (C-*,C-G) state does not have a PMSI as a downstream
       interface, the PE sets up its forwarding path to receive
       (C-S,C-G) traffic from the originator of the selected Source
       Active A-D route.

   Whenever a PE deletes the (C-S,C-G) state that was previously created
   as a result of receiving a C-multicast route for (C-S,C-G) from some
   other PE, the PE that deletes the state also withdraws the Source
   Active A-D route (if there is one) that was advertised when the state
   was created.
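
   The receive-side decision described above can be sketched as
   follows.  This is an informal model under stated assumptions: the
   PeState class and its fields are hypothetical, BGP best-route
   selection is reduced to a boolean supplied by the caller, and the
   delayed PIM Prune(S,G,rpt) toward the CEs is not modeled.
   [MVPN-BGP] remains the authoritative description.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Flow = Tuple[str, str]  # (C-S, C-G)


@dataclass
class PeState:
    star_g: Set[str] = field(default_factory=set)   # groups with (C-*,C-G)
    s_g: Set[Flow] = field(default_factory=set)     # (C-S,C-G) in MVPN-TIB
    # Groups whose (C-*,C-G) state has a PMSI as a downstream interface.
    pmsi_downstream: Set[str] = field(default_factory=set)
    suppressed: Set[Flow] = field(default_factory=set)  # not sent to PEs
    receive_from: Dict[Flow, str] = field(default_factory=dict)


def on_source_active(pe: PeState, source: str, group: str,
                     originator: str, is_best: bool) -> str:
    """Apply conditions (a), (b), (c) and take the matching action."""
    flow = (source, group)
    if group not in pe.star_g or flow in pe.s_g or not is_best:
        return "no-action"
    if group in pe.pmsi_downstream:
        # Act as if all other PEs had pruned C-S off the shared tree.
        pe.suppressed.add(flow)
        return "suppress"
    # Otherwise expect the flow from the originator of the route.
    pe.receive_from[flow] = originator
    return "receive-from-originator"
```

   Applied to the topology of section 9.3.1 with PE2 as the originator,
   the sketch returns "suppress" at PE1, "receive-from-originator" at
   PE4, and "no-action" at PE3, which already holds (C-S,C-G) state.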

   In the example topology of section 9.3.1, this procedure will cause
   PE2 to generate a Source Active A-D route for (C-S,C-G).  When this
   route is received, PE4 will set up its forwarding state to expect
   (C-S,C-G) packets from PE2.  PE1 will change its forwarding state so
   that (C-S,C-G) packets that it receives from CE1 are not forwarded to
   any other PEs.  (Note though that PE1 may still forward (C-S,C-G)
   packets received from CE1 to CE2, if CE2 has receivers for C-G and
   those receivers did not switch from the (C-*,C-G) tree to the
   (C-S,C-G) tree.)  As a result, PE3 and PE4 do not receive duplicate
   packets of the (C-S,C-G) C-flow.

   With this procedure in place, there is no need to have any kind of
   C-multicast route that has the semantics of a PIM Prune(S,G,rpt)
   message.

   It is worth noting that if, as a result of this procedure, a PE sets
   up its forwarding state to receive (C-S,C-G) traffic from the source
   tree, the UMH is not necessarily the same as it would be if the PE
   had joined the source tree as a result of receiving a PIM Join for
   the same source tree from a directly attached CE.

   Note that the mechanism described in section 7.4.1 can be leveraged
   to advertise an S-PMSI binding along with the source active messages.
   This is accomplished by using the same BGP Update message to carry
   both the NLRI of the S-PMSI A-D route and the NLRI of the Source
   Active A-D route.  (Though an implementation processing the received
   routes cannot assume that this will always be the case.)

10. Eliminating PE-PE Distribution of (C-*,C-G) State

   In the ASM service model, a node that wants to become a receiver for
   a particular multicast group G first joins a shared tree, rooted at a
   rendezvous point.  When the receiver detects traffic from a
   particular source it has the option of joining a source tree, rooted
   at that source.  If it does so, it has to prune that source from the
   shared tree, to ensure that it receives packets from that source on
   only one tree.

   Maintaining the shared tree can require considerable state, as it is
   necessary not only to know who the upstream and downstream nodes are,
   but to know which sources have been pruned off which branches of the
   shared tree.

   The BGP-based signaling procedures defined in this document and in
   [MVPN-BGP] eliminate the need for PEs to distribute to each other any
   state having to do with which sources have been pruned off a shared
   C-tree.  Those procedures do still allow multicast data traffic to
   travel on a shared C-tree, but they do not allow a situation in which
   some CEs receive (S,G) traffic on a shared tree and some on a source
   tree.  This results in a considerable simplification of the PE-PE
   procedures with minimal change to the multicast service seen within
   the VPN.  However, shared C-trees are still supported across the VPN
   backbone.  That is, (C-*,C-G) state is distributed PE-PE, but (C-*,
   C-G, RPT-bit) state is not.

   In this section, we specify a number of optional procedures which go
   further, and which completely eliminate the support for shared
   C-trees across the VPN backbone.  In these procedures, the PEs keep
   track of the active sources for each C-G.  As soon as a CE tries to
   join the (*,G) tree, the PEs instead join the (S,G) trees for all the
   active sources.  Thus all distribution of (C-*,C-G) state is
   eliminated.  These procedures are optional because they require some
   additional support on the part of the VPN customer, and because they
   are not always appropriate.  (E.g., a VPN customer may have his own
   policy of always using shared trees for certain multicast groups.)
   There are several different options, described in the following
   sub-sections.

10.1. Co-locating C-RPs on a PE

   [MVPN-REQ] describes C-RP engineering as an issue when PIM-SM (or
   BIDIR-PIM) is used in "Any Source Multicast (ASM) mode" [RFC4607] on
   the VPN customer site. To quote from [MVPN-REQ]:

   "In some cases this engineering problem is not trivial: for instance,
   if sources and receivers are located in VPN sites that are different
   than that of the RP, then traffic may flow twice through the SP
   network and the CE-PE link of the RP (from source to RP, and then
   from RP to receivers) ; this is obviously not ideal.  A multicast VPN
   solution SHOULD propose a way to help on solving this RP engineering
   issue."

   One of the C-RP deployment models is for the customer to outsource
   the RP to the provider. In this case the provider may co-locate the
   RP on the PE that is connected to the customer site [MVPN-REQ]. This
   section describes how anycast-RP can be used to achieve this.

10.1.1. Initial Configuration

   For a particular MVPN, one or more PEs that have sites in that MVPN
   act as an RP for the sites of that MVPN connected to these PEs.
   Within each MVPN all these RPs use the same (anycast) address.  All
   these RPs use the Anycast RP technique.

10.1.2. Anycast RP Based on Propagating Active Sources

   This mechanism is based on propagating active sources between RPs.

10.1.2.1. Receiver(s) Within a Site

   A PE that receives a C-Join for (*,G) does not send the information
   that it has receiver(s) for G until it receives information about
   active sources for G from an upstream PE.

   On receiving this (described in the next section), the downstream PE
   will respond with a Join for (C-S,C-G).  Sending this information
   could be done using any of the procedures described in section 5.
   Only the upstream PE will process this information.

10.1.2.2. Source Within a Site

   When a PE receives a PIM-Register from a site that belongs to a
   given VPN, the PE follows the normal PIM anycast RP procedures.  It
   then advertises the source and group of the multicast data packet
   carried in the PIM-Register message to other PEs in BGP using the
   following information elements:

     - Active source address

     - Active group address

     - Route target of the MVPN.

   This advertisement goes to all the PEs that belong to that MVPN. When
   a PE receives this advertisement, it checks whether there are any
   receivers in the sites attached to the PE for the group carried in
   the source active advertisement. If yes, then it generates an
   advertisement for (C-S,C-G) as specified in the previous section.
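
   The two halves of this exchange can be sketched as follows.  The
   SourceActiveAdvert class and both function names are hypothetical;
   the BGP encoding of the advertisement is not modeled, and the Route
   Target check stands in for the ordinary BGP import procedure.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class SourceActiveAdvert:
    source: str        # active source address
    group: str         # active group address
    route_target: str  # Route Target of the MVPN


def on_pim_register(source: str, group: str,
                    mvpn_rt: str) -> SourceActiveAdvert:
    """Build the advertisement sent after a PIM-Register is received."""
    return SourceActiveAdvert(source, group, mvpn_rt)


def on_advert(advert: SourceActiveAdvert, local_receiver_groups: Set[str],
              imported_rts: Set[str]) -> Optional[Tuple[str, str]]:
    """Return the (C-S,C-G) to join, or None if nothing is to be done."""
    if advert.route_target not in imported_rts:
        return None  # advertisement belongs to some other MVPN
    if advert.group not in local_receiver_groups:
        return None  # no receivers for this group in attached sites
    return (advert.source, advert.group)
```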

10.1.2.3. Receiver Switching from Shared to Source Tree

   No additional procedures are required when multicast receivers in a
   customer's site shift from the shared tree to the source tree.

10.2. Using MSDP between a PE and a Local C-RP

   Section 10.1 describes the case where each PE is a C-RP.  This
   enables the PEs to know the active multicast sources for each MVPN,
   and they can then use BGP to distribute this information to each
   other.  As a result, the PEs do not have to join any shared C-trees,
   and this results in a simplification of the PE operation.

   In another deployment scenario, the PEs are not themselves C-RPs, but
   use MSDP [RFC3618] to talk to the C-RPs.  In particular, a PE that
   attaches to a site that contains a C-RP becomes an MSDP peer of that
   C-RP.  That PE then uses BGP to distribute the information about the
   active sources to the other PEs.  When the PE determines, by MSDP,
   that a particular source is no longer active, then it withdraws the
   corresponding BGP update.  Then the PEs do not have to join any
   shared C-trees, but they do not have to be C-RPs either.

   MSDP provides the capability for a Source Active (SA) message to
   carry an encapsulated data packet.  This capability can be used to
   allow an MSDP speaker to receive the first (or first several)
   packet(s) of an (S,G) flow, even though the MSDP speaker hasn't yet
   joined the (S,G) tree.  (Presumably it will join that tree as a
   result of receiving the SA message that carries the encapsulated data
   packet.)  If this capability is not used, the first several data
   packets of an (S,G) stream may be lost.

   A PE that is talking MSDP to an RP may receive such an encapsulated
   data packet from the RP.  The data packet should be decapsulated and
   transmitted to the other PEs in the MVPN.  If the packet belongs to a
   particular (S,G) flow, and if the PE is a transmitter for some S-PMSI
   to which (S,G) has already been bound, the decapsulated data packet
   should be transmitted on that S-PMSI.  Otherwise, if an I-PMSI exists
   for that MVPN, the decapsulated data packet should be transmitted on
   it.  (If an MI-PMSI exists, this would typically be used.)  If
   neither of these conditions holds, the decapsulated data packet is
   not transmitted to the other PEs in the MVPN.  The decision as to
   whether and how to transmit the decapsulated data packet does not
   affect the processing of the SA control message itself.
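
   The choice of PMSI for a decapsulated packet can be sketched as
   below.  The function and parameter names are hypothetical; PMSIs are
   represented by arbitrary Python objects.

```python
def select_pmsi(flow, spmsi_bindings, ipmsi):
    """Pick the PMSI for a decapsulated MSDP data packet, or None.

    flow           -- (S, G) tuple of the decapsulated packet
    spmsi_bindings -- dict mapping (S, G) to an S-PMSI on which this PE
                      is a transmitter and to which (S, G) is bound
    ipmsi          -- the I-PMSI of the MVPN, or None if none exists
    """
    if flow in spmsi_bindings:
        return spmsi_bindings[flow]  # (S,G) already bound to an S-PMSI
    return ipmsi                     # may be None: packet is not sent
```

   A result of None corresponds to the case in which the packet is not
   transmitted to the other PEs.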

   Suppose that PE1 transmits a multicast data packet on a PMSI, where
   that data packet is part of an (S,G) flow, and PE2 receives that
   packet from that PMSI.  According to section 9, if PE1 is not the PE
   that PE2 expects to be transmitting (S,G) packets, then PE2 must
   discard the packet.  If an MSDP-encapsulated data packet is
   transmitted on a PMSI as specified above, this rule from section 9
   would likely result in the packet being discarded.  Therefore, if
   MSDP-encapsulated data packets are being decapsulated and transmitted
   on a PMSI, we need to modify the rules of section 9 as follows:

      1. If the receiving PE, PE2, has already joined the (S,G) tree,
         and has chosen PE1 as the upstream PE for the (S,G) tree, but
         this packet does not come from PE1, PE2 must discard the
         packet.

      2. If the receiving PE, PE2, has not already joined the (S,G)
         tree, but has a PIM adjacency to a CE that is downstream on the
         (*,G) tree, the packet should be forwarded to the CE.
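
   The two modified rules can be read as a small decision function.
   The sketch below is illustrative only; the argument names and the
   returned action strings are hypothetical, and the case that neither
   rule addresses is returned as None rather than guessed.

```python
def msdp_receive_action(joined_sg, chosen_upstream, ingress_pe,
                        has_star_g_downstream_ce):
    """Action taken by the receiving PE (PE2) for a packet from a PMSI.

    joined_sg                -- True if PE2 has joined the (S,G) tree
    chosen_upstream          -- upstream PE that PE2 chose for (S,G)
    ingress_pe               -- PE that transmitted the packet
    has_star_g_downstream_ce -- True if a PIM-adjacent CE is downstream
                                on the (*,G) tree
    """
    if joined_sg:
        if ingress_pe != chosen_upstream:
            return "discard"        # rule 1
        return "normal"             # expected upstream PE
    if has_star_g_downstream_ce:
        return "forward_to_ce"      # rule 2
    return None                     # not covered by rules 1 and 2
```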

11. Support for PIM-BIDIR C-Groups

   In BIDIR-PIM, each multicast group is associated with an RPA
   (Rendezvous Point Address).  The Rendezvous Point Link (RPL) is the
   link that attaches to the RPA.  Usually it's a LAN where the RPA is
   in the IP subnet assigned to the LAN.  The root node of a BIDIR-PIM
   tree is a node that has an interface on the RPL.

   On any LAN (other than the RPL) that is a link in a PIM-bidir tree,
   there must be a single node that has been chosen to be the DF.  (More
   precisely, for each RPA there is a single node that is the DF for
   that RPA.)  A node that receives traffic from an upstream interface
   may forward it on a particular downstream interface only if the node
   is the DF for that downstream interface.  A node that receives
   traffic from a downstream interface may forward it on an upstream
   interface only if that node is the DF for the downstream interface.

   If, for any period of time, there is a link on which each of two
   different nodes believes itself to be the DF, data forwarding loops
   can form. Loops in a bidirectional multicast tree can be very
   harmful.  However, any election procedure will have a convergence
   period.  The BIDIR-PIM DF election procedure is very complicated,
   because it goes to great pains to ensure that if convergence is not
   extremely fast, then there is no forwarding at all until convergence
   has taken place.

   Other variants of PIM also have a DF election procedure for LANs.
   However, as long as the multicast tree is unidirectional,
   disagreement about who the DF is can result only in duplication of
   packets, not in loops.  Therefore the time taken to converge on a
   single DF is of much less concern for unidirectional trees than it is
   for bidirectional trees.

   In the MVPN environment, if PIM signaling is used among the PEs, then
   the standard LAN-based DF election procedure can be used.  However,
   election procedures that are optimized for a LAN may not work as well
   in the MVPN environment.  So an alternative to DF election would be
   desirable.

   If BGP signaling is used among the PEs, an alternative to DF election
   is necessary.  One might think that the "single forwarder selection"
   procedures described in sections 5 and 9 could be used to choose a
   single PE "DF" for the backbone (for a given RPA in a given MVPN).
   However, that is still likely to leave a convergence period of at
   least several seconds during which loops could form, and there could
   be a much longer convergence period if there is anything disrupting
   the smooth flow of BGP updates.  So a simple procedure like that is
   not sufficient.

   The remainder of this section describes two different methods that
   can be used to support BIDIR-PIM while eliminating the DF election.

11.1. The VPN Backbone Becomes the RPL

   On a per MVPN basis, this method treats the whole service provider(s)
   infrastructure as a single RPL (RP Link). We refer to such an RPL as
   an "MVPN-RPL".  This eliminates the need for the PEs to engage in any
   "DF election" procedure, because PIM-bidir does not have a DF on the
   RPL.

   However, this method can only be used if the customer is
   "outsourcing" the RPL/RPA functionality to the SP.

   An MVPN-RPL could be realized either via an I-PMSI (this I-PMSI is on
   a per MVPN basis and spans all the PEs that have sites of a given
   MVPN), or via a collection of S-PMSIs, or even via a combination of
   an I-PMSI and one or more S-PMSIs.

11.1.1. Control Plane

   Associated with each MVPN-RPL is an address prefix that is
   unambiguous within the context of the MVPN associated with the
   MVPN-RPL.

   For a given MVPN, each VRF connected to an MVPN-RPL of that MVPN is
   configured to advertise to all of its connected CEs the address
   prefix of the MVPN-RPL.

   Since in PIM Bidir there is no Designated Forwarder on an RPL, in the
   context of MVPN-RPL there is no need to perform the Designated
   Forwarder election among the PEs (note that it is still necessary to
   perform the Designated Forwarder election between a PE and its
   directly attached CEs, but that is done using plain PIM Bidir
   procedures).

   For a given MVPN a PE connected to an MVPN-RPL of that MVPN should
   send multicast data (C-S,C-G) on the MVPN-RPL only if at least one
   other PE connected to the MVPN-RPL has a downstream multicast state
   for C-G. In the context of MVPN this is accomplished by requiring a
   PE that has a downstream state for a particular C-G of a particular
   VRF present on the PE to originate a C-multicast route for (C-*,
   C-G).  The RD of this route should be the same as the RD associated
   with the VRF. The RTs carried by the route should be such as to
   ensure that the route gets distributed to all the PEs of the MVPN.

11.1.2. Data Plane

   A PE that receives (C-S,C-G) multicast data from a CE should forward
   this data on the MVPN-RPL of the MVPN the CE belongs to only if the
   PE receives at least one C-multicast route for (C-*, C-G).
   Otherwise, the PE should not forward the data on the RPL/I-PMSI.

   When a PE receives a multicast packet with (C-S,C-G) on an MVPN-RPL
   associated with a given MVPN, the PE forwards this packet to every
   directly connected CE of that MVPN that has sent a Join (C-*,C-G) to
   the PE (that is, for which the PE has downstream (C-*,C-G) state).
   The PE does not forward this packet back on the MVPN-RPL.  If a PE
   has no downstream (C-*,C-G) state, the PE does not forward the
   packet.
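
   The two data plane rules above can be sketched as a pair of
   functions.  The names and data structures are hypothetical: groups
   are strings, and downstream state is a mapping from a group to the
   set of CEs that have joined it.

```python
def ingress_should_forward(group, star_g_routes_received):
    """CE -> MVPN-RPL direction.

    Forward only if at least one C-multicast route for (C-*,C-G) has
    been received; `star_g_routes_received` is the set of such groups.
    """
    return group in star_g_routes_received


def egress_targets(group, downstream_state):
    """MVPN-RPL -> CE direction.

    Return the directly connected CEs with downstream (C-*,C-G) state.
    The MVPN-RPL itself is never a target, so nothing is sent back.
    """
    return sorted(downstream_state.get(group, set()))
```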

11.2. Partitioned Sets of PEs

   This method does not require the use of the MVPN-RPL, and does not
   require the customer to outsource the RPA/RPL functionality to the
   SP.

11.2.1. Partitions

   Consider a particular C-RPA, call it C-R, in a particular MVPN.
   Consider the set of PEs that attach to sites that have senders or
   receivers for a BIDIR-PIM group C-G, where C-R is the RPA for C-G.
   (As always we use the "C-" prefix to indicate that we are referring
   to an address in the VPN's address space rather than in the
   provider's address space.)

   Following the procedures of section 5.1, each PE in the set
   independently chooses some other PE in the set to be its "upstream
   PE" for those BIDIR-PIM groups with RPA C-R.  Optionally, they can
   all choose the "default selection" (described in section 5.1), to
   ensure that each PE chooses the same upstream PE.  Note that if a
   PE has a route to C-R via a VRF interface, then the PE may choose
   itself as the upstream PE.

   The set of PEs can now be partitioned into a number of subsets.
   We'll say that PE1 and PE2 are in the same partition if and only if
   there is some PE3 such that PE1 and PE2 have each chosen PE3 as the
   upstream PE for C-R.  Note that each partition has exactly one
   upstream PE.  So it is possible to identify the partition by
   identifying its upstream PE.
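
   As an illustration of the definition above, the partitions for a
   given C-R can be computed by grouping the PEs according to the
   upstream PE that each has chosen.  The function name and the use of
   PE names as strings are assumptions made for this sketch.

```python
from collections import defaultdict


def partitions(upstream_choice):
    """Group PEs by the upstream PE each one selected for C-R.

    upstream_choice -- dict mapping each PE to its chosen upstream PE
    Returns a dict keyed by upstream PE, which identifies the partition.
    """
    parts = defaultdict(set)
    for pe, upstream in upstream_choice.items():
        parts[upstream].add(pe)
    return dict(parts)
```

   For example, if PE1, PE2, and PE3 all choose PE3, while PE4 and PE5
   choose PE5, there are two partitions, identified by PE3 and PE5.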

   Consider packet P, and let PE1 be its ingress PE.  PE1 will send the
   packet on a PMSI so that it reaches the other PEs that need to
   receive it.  This is done by encapsulating the packet and sending it
   on a P-tunnel.  If the original packet is part of a PIM-BIDIR group
   (its ingress PE determines this from the packet's destination address
   C-G), and if the VPN backbone is not the RPL, then the encapsulation
   MUST carry information that can be used to identify the partition to
   which the ingress PE belongs.

   When PE2 receives a packet from the PMSI, PE2 must determine, by
   examining the encapsulation, whether the packet's ingress PE belongs
   to the same partition (relative to the C-RPA of the packet's C-G)
   that PE2 itself belongs to.  If not, PE2 discards the packet.
   Otherwise PE2 performs the normal BIDIR-PIM data packet processing.
   With this rule in place, harmful loops cannot be introduced by the
   PEs into the customer's bidirectional tree.

   Note that if there is more than one partition, the VPN backbone will
   not carry a packet from one partition to another.  The only way for a
   packet to get from one partition to another is for it to go up
   towards the RPA and then to go down another path to the backbone.  If
   this is not considered desirable, then all PEs should choose the same
   upstream PE for a given C-RPA.  Then multiple partitions will only
   exist during routing transients.

11.2.2. Using PE Distinguisher Labels

   If a given P-tunnel is to be used to carry packets traveling along a
   bidirectional C-tree, then, EXCEPT for the cases described in sections
   11.1 and 11.2.3, the packets that travel on that P-tunnel MUST carry
   a PE Distinguisher Label (defined in section 4), using the
   encapsulation discussed in section 12.3.

   When a given PE transmits a given packet of a bidirectional C-group
   to the P-tunnel, the packet will carry the PE Distinguisher Label
   corresponding to the partition, for the C-group's C-RPA, that
   contains the transmitting PE.  This is the PE Distinguisher Label
   that has been bound to the upstream PE of that partition; it is not
   necessarily the label that has been bound to the transmitting PE.

   Recall that the PE Distinguisher Labels are upstream-assigned labels
   that are assigned and advertised by the node that is at the root of
   the P-tunnel.  The information about PE Distinguisher labels is
   distributed with Intra-AS I-PMSI A-D routes and/or S-PMSI A-D routes
   by encoding it into the PE Distinguisher Label attribute carried by
   these routes.

   When a PE receives a packet with a PE label that does not identify
   the partition of the receiving PE, then the receiving PE discards the
   packet.
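
   The transmit and receive behavior can be sketched as follows.  The
   function names, the dictionaries, and the label values are
   hypothetical; `pe_labels` stands for the PE Distinguisher Labels
   advertised by the root of the P-tunnel.

```python
def tx_label(pe, upstream_choice, pe_labels):
    """Label carried by packets that `pe` sends on the P-tunnel.

    It is the label bound to the upstream PE of the partition of `pe`,
    not necessarily the label bound to `pe` itself.
    """
    return pe_labels[upstream_choice[pe]]


def rx_accept(pe, label, upstream_choice, pe_labels):
    """True if `label` identifies the partition of the receiving PE."""
    return label == pe_labels[upstream_choice[pe]]
```

   A packet for which rx_accept returns False is discarded.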

   Note that this procedure does not necessarily require the root of a
   P-tunnel to assign a PE Distinguisher Label for every PE that belongs
   to the tunnel.  If the root of the P-tunnel is the only PE that can
   transmit packets to the P-tunnel, then the root needs to assign PE
   Distinguisher Labels only for those PEs that the root has selected to
   be the UMHs for the particular C-RPAs known to the root.

11.2.3. Partial Mesh of MP2MP P-Tunnels

   There is one case in which support for BIDIR-PIM C-groups does not
   require the use of a PE Distinguisher Label.  For a given C-RPA,
   suppose that, for each partition, a distinct MP2MP LSP is used as the
   P-tunnel serving that partition.  Then for a given packet, a PE
   receiving the packet from a P-tunnel can infer the partition from the
   tunnel.  So PE Distinguisher Labels are not needed in this case.

12. Encapsulations

   The BGP-based auto-discovery procedures will ensure that the PEs in a
   single MVPN only use tunnels that they can all support, and for a
   given kind of tunnel, that they only use encapsulations that they can
   all support.

12.1. Encapsulations for Single PMSI per P-Tunnel

12.1.1. Encapsulation in GRE

   GRE encapsulation can be used for any PMSI that is instantiated by a
   mesh of unicast P-tunnels, as well as for any PMSI that is
   instantiated by one or more PIM P-tunnels of any sort.

   Packets received        Packets in transit      Packets forwarded
   at ingress PE           in the service          by egress PEs
                           provider network

                           +---------------+
                           |  P-IP Header  |
                           +---------------+
                           |      GRE      |
   ++=============++       ++=============++       ++=============++
   || C-IP Header ||       || C-IP Header ||       || C-IP Header ||
   ++=============++ >>>>> ++=============++ >>>>> ++=============++
   || C-Payload   ||       || C-Payload   ||       || C-Payload   ||
   ++=============++       ++=============++       ++=============++

   The IP Protocol Number field in the P-IP Header MUST be set to 47.
   The Protocol Type field of the GRE Header is set to either 0x800 or
   0x86dd, depending on whether the C-IP Header is IPv4 or IPv6,
   respectively.

   When an encapsulated packet is transmitted by a particular PE, the
   source IP address in the P-IP header must be the same address that
   the PE uses to identify itself in the VRF Route Import Extended
   Communities that it attaches to any of the VPN-IP routes eligible for
   UMH determination that it advertises via BGP (see section 5.1).

   If the PMSI is instantiated by a PIM tree, the destination IP address
   in the P-IP header is the group P-address associated with that tree.
   The GRE key field value is omitted.

   If the PMSI is instantiated by unicast P-tunnels, the destination IP
   address is the address of the destination PE, and the optional GRE
   Key field is used to identify a particular MVPN.  In this case, each
   PE would have to advertise a key field value for each MVPN; each PE
   would assign the key field value that it expects to receive.

   [RFC2784] specifies an optional GRE checksum, and [RFC2890] specifies
   an optional GRE sequence number field.

   The GRE sequence number field is not needed because the transport
   layer services for the original application will be provided by the
   C-IP Header.

   The use of the GRE checksum field must follow [RFC2784].

   To facilitate high speed implementation, this document recommends
   that the ingress PE routers encapsulate VPN packets without setting
   the checksum or sequence number fields.
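
   A minimal sketch of the GRE header construction described in this
   section is given below.  It assumes the header layout of [RFC2784]
   with the Key extension of [RFC2890]; the function and constant names
   are hypothetical, and the outer P-IP header is not built.

```python
import struct

GRE_PROTO_IPV4 = 0x0800
GRE_PROTO_IPV6 = 0x86DD
GRE_FLAG_KEY = 0x2000  # K bit: Key field present


def gre_header(c_ip_version, key=None):
    """Build a GRE header with no checksum and no sequence number.

    c_ip_version -- 4 or 6, the version of the C-IP header that follows
    key          -- optional 32-bit Key (unicast P-tunnel case), or None
                    when the Key field is omitted (PIM tree case)
    """
    proto = {4: GRE_PROTO_IPV4, 6: GRE_PROTO_IPV6}[c_ip_version]
    if key is None:
        return struct.pack("!HH", 0, proto)
    return struct.pack("!HHI", GRE_FLAG_KEY, proto, key)
```

   The header is 4 octets without the Key field and 8 octets with it.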

12.1.2. Encapsulation in IP

   IP-in-IP [RFC2003] is also a viable option.  The following diagram
   shows the progression of the packet as it enters and leaves the
   service provider network.

   Packets received        Packets in transit      Packets forwarded
   at ingress PE           in the service          by egress PEs
                           provider network

                           +---------------+
                           |  P-IP Header  |
   ++=============++       ++=============++       ++=============++
   || C-IP Header ||       || C-IP Header ||       || C-IP Header ||
   ++=============++ >>>>> ++=============++ >>>>> ++=============++
   || C-Payload   ||       || C-Payload   ||       || C-Payload   ||
   ++=============++       ++=============++       ++=============++

   When the P-IP Header is an IPv4 header, its Protocol Number field is
   set to either 4 or 41, depending on whether the C-IP header is an
   IPv4 header or an IPv6 header, respectively.

   When the P-IP Header is an IPv6 header, its Next Header field is set
   to either 4 or 41, depending on whether the C-IP header is an IPv4
   header or an IPv6 header, respectively.

   When an encapsulated packet is transmitted by a particular PE, the
   source IP address in the P-IP header must be the same address that
   the PE uses to identify itself in the VRF Route Import Extended
   Communities that it attaches to any of the VPN-IP routes eligible for
   UMH determination that it advertises via BGP (see section 5.1).

12.1.3. Encapsulation in MPLS

   If the PMSI is instantiated as a P2MP MPLS LSP or an MP2MP LSP, MPLS
   encapsulation is used. Penultimate-hop-popping MUST be disabled for
   the LSP.

   If other methods of assigning MPLS labels to multicast distribution
   trees are in use, these multicast distribution trees may be used as
   appropriate to instantiate PMSIs, and appropriate additional MPLS
   encapsulation procedures may be used.

   Packets received        Packets in transit      Packets forwarded
   at ingress PE           in the service          by egress PEs
                           provider network

                           +---------------+
                           | P-MPLS Header |
   ++=============++       ++=============++       ++=============++
   || C-IP Header ||       || C-IP Header ||       || C-IP Header ||
   ++=============++ >>>>> ++=============++ >>>>> ++=============++
   || C-Payload   ||       || C-Payload   ||       || C-Payload   ||
   ++=============++       ++=============++       ++=============++

12.2. Encapsulations for Multiple PMSIs per P-Tunnel

   The encapsulations for transmitting multicast data messages when
   there are multiple PMSIs per P-tunnel are based on the encapsulation
   for a single PMSI per P-tunnel, but with an MPLS label used for
   demultiplexing.

   The label is upstream-assigned and distributed via BGP as specified
   in section 4.  The label must enable the receiver to select the
   proper VRF, and may enable the receiver to select a particular
   multicast routing entry within that VRF.

12.2.1. Encapsulation in GRE

   Rather than the IP-in-GRE encapsulation discussed in section 12.1.1,
   we use the MPLS-in-GRE encapsulation.  This is specified in
   [MPLS-IP].  The GRE protocol type MUST be set to 0x8847.  (The reason
   for using the unicast rather than the multicast value is specified in
   [MPLS-MCAST-ENCAPS].)

12.2.2. Encapsulation in IP

   Rather than the IP-in-IP encapsulation discussed in section 12.1.2,
   we use the MPLS-in-IP encapsulation.  This is specified in [MPLS-IP].
   The IP protocol number MUST be set to the value identifying the
   payload as an MPLS unicast packet. (There is no "MPLS multicast
   packet" protocol number.)
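
   As an illustration of the MPLS-in-GRE case, the sketch below
   prepends a GRE header and a single demultiplexing label to a
   C-packet.  The function names are hypothetical, the label stack
   entry layout is the standard 32-bit MPLS encoding, and the value 137
   for MPLS-in-IP is taken from [MPLS-IP] rather than from this
   document.

```python
import struct

GRE_PROTO_MPLS_UNICAST = 0x8847
IPPROTO_MPLS_IN_IP = 137  # outer IP protocol number for MPLS-in-IP


def mpls_entry(label, bottom, ttl=64, tc=0):
    """Encode one label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    if not 0 <= label < (1 << 20):
        raise ValueError("label must fit in 20 bits")
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def mpls_in_gre(label, c_packet, ttl=64):
    """GRE header (no optional fields) + one label + the C-packet."""
    gre = struct.pack("!HH", 0, GRE_PROTO_MPLS_UNICAST)
    return gre + mpls_entry(label, True, ttl) + c_packet
```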

12.3. Encapsulations Identifying a Distinguished PE

12.3.1. For MP2MP LSP P-tunnels

   As discussed in section 9, if a multicast data packet is traveling on
   a unidirectional C-tree, it is highly desirable for the PE that
   receives the packet from a PMSI to be able to determine the identity
   of the PE that transmitted the data packet onto the PMSI.  The
   encapsulations of the previous sections all provide this information,
   except in one case.  If a PMSI is being instantiated by a MP2MP LSP,
   then the encapsulations discussed so far do not allow one to
   determine the identity of the PE that transmitted the packet onto the
   PMSI.

   Therefore, when a packet traveling on a unidirectional C-tree is
   traveling on a MP2MP LSP P-tunnel, it MUST carry, as its second
   label, a label that has been bound to the packet's ingress PE.  This
   label is an upstream-assigned label that the LSP's root node has
   bound to the ingress PE and has distributed via the PE Distinguisher
   Labels attribute of a PMSI A-D Route (see section 4).  This label
   will appear immediately beneath the labels that are discussed in
   sections 12.1.3 and 12.2.
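
   The resulting label stack can be sketched as below, with the label
   bound to the ingress PE placed beneath the other label(s).  The
   function name and the label values used in the example are
   hypothetical; the entry layout is the standard 32-bit MPLS encoding.

```python
import struct


def label_stack(labels, ttl=64, tc=0):
    """Encode MPLS label stack entries, top of stack first.

    The last label in `labels` gets the bottom-of-stack (S) bit.  For
    the case described above, the last label is the one bound to the
    ingress PE.
    """
    out = b""
    for i, label in enumerate(labels):
        if not 0 <= label < (1 << 20):
            raise ValueError("label must fit in 20 bits")
        bottom = 1 if i == len(labels) - 1 else 0
        out += struct.pack("!I", (label << 12) | (tc << 9) | (bottom << 8) | ttl)
    return out
```

   For instance, label_stack([16, 17]) encodes a P-tunnel label of 16
   followed by an ingress PE label of 17.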

   A full specification of the procedures for advertising and for using
   the PE Distinguisher Labels in this case is outside the scope of this
   document.

12.3.2. For Support of PIM-BIDIR C-Groups

   As was discussed in section 11, when a packet belongs to a PIM-BIDIR
   multicast group, the set of PEs of that packet's VPN can be
   partitioned into a number of subsets, where exactly one PE in each
   partition is the upstream PE for that partition.  When such packets
   are transmitted on a PMSI, then unless the procedures of section
   11.2.3 are being used, it is necessary for the packet to carry
   information identifying a particular partition. This is done by
   having the packet carry the PE Distinguisher Label corresponding to
   the upstream PE of one partition.  For a particular P-tunnel, this
   label will have been advertised by the node that is the root of that
   P-tunnel. (A full specification of the procedures for advertising PE
   Distinguisher Labels is out of the scope of this document.)

   This label needs to be used whenever a packet belongs to a PIM-BIDIR
   C-group, no matter what encapsulation is used by the P-tunnel.  Hence
   the encapsulations of section 12.2 MUST be used.  If the P-tunnel
   contains only one PMSI, the PE label replaces the label discussed in
   section 12.2.  If the P-tunnel contains multiple PMSIs, the PE label
   follows the label discussed in section 12.2.

   In general, PE Distinguisher Labels can be carried if the
   encapsulation is MPLS or MPLS-in-IP or MPLS-in-GRE.  However,
   procedures for advertising and using PE Distinguisher Labels when the
   encapsulation is LDP-based MP2P MPLS are outside the scope of this
   specification.

12.4. General Considerations for IP and GRE Encaps

   These apply also to the MPLS-in-IP and MPLS-in-GRE encapsulations.

12.4.1. MTU (Maximum Transmission Unit)

   It is the responsibility of the originator of a C-packet to ensure
   that the packet is small enough to reach all of its destinations,
   even when it is encapsulated within IP or GRE.

   When a packet is encapsulated in IP or GRE, the router that does the
   encapsulation MUST set the DF bit in the outer header.  This ensures
   that the decapsulating router will not need to reassemble the
   encapsulating packets before performing decapsulation.

   In some cases the encapsulating router may know that a particular
   C-packet is too large to reach its destinations.  Procedures by which
   it may know this are outside the scope of the current document.
   However, if this is known, then:

     - If the DF bit is set in the IP header of a C-packet that is known
       to be too large, the router will discard the C-packet as being
       "too large", and follow normal IP procedures (which may require
       the return of an ICMP message to the source).

     - If the DF bit is not set in the IP header of a C-packet that is
       known to be too large, the router MAY fragment the packet before
       encapsulating it, and then encapsulate each fragment separately.
       Alternatively, the router MAY discard the packet.

   If the router discards a packet as too large, it should maintain OAM
   information related to this behavior, allowing the operator to
   properly troubleshoot the issue.
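
   The handling of a C-packet that is known to be too large can be
   sketched as follows.  The class, the action strings, and the counter
   names are hypothetical; how the router learns that a packet is too
   large is outside the scope of this sketch, as it is of the document.

```python
from collections import Counter


class Encapsulator:
    """Hypothetical model of the encapsulating router's MTU policy."""

    def __init__(self, fragment_when_allowed=True):
        self.fragment_when_allowed = fragment_when_allowed
        self.drops = Counter()  # OAM counters for discarded packets

    def on_too_large(self, c_df_bit):
        """Return the action for a C-packet known to be too large."""
        if c_df_bit:
            # Normal IP procedures apply; these may require an ICMP
            # message to be returned to the source.
            self.drops["df_set"] += 1
            return "discard"
        if self.fragment_when_allowed:
            return "fragment_then_encapsulate"
        self.drops["df_clear"] += 1
        return "discard"
```

   The counters give the operator a starting point for troubleshooting
   when packets are discarded as too large.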

   Note that if the entire path of the P-tunnel does not support an MTU
   that is large enough to carry a particular encapsulated C-packet,
   and if the encapsulating router does not do fragmentation, then the
   customer will not receive the expected connectivity.

12.4.2. TTL (Time to Live)

   The ingress PE should not copy the TTL field from the payload IP
   header received from a CE router to the delivery IP or MPLS header.
   The setting of the TTL of the delivery header is determined by the
   local policy of the ingress PE router.

12.4.3. Avoiding Conflict with Internet Multicast

   If the SP is providing Internet multicast, distinct from its VPN
   multicast services, and using PIM based P-multicast trees, it must
   ensure that the group P-addresses that it uses in support of MVPN
   services are distinct from any of the group addresses of the Internet
   multicasts it supports.  This is best done by using administratively
   scoped addresses [ADMIN-ADDR].

   The group C-addresses need not be distinct from either the group
   P-addresses or the Internet multicast addresses.

12.5. Differentiated Services

   The setting of the DS (Differentiated Services) field in the delivery
   IP header should follow the guidelines outlined in [RFC2983].
   Setting the EXP field in the delivery MPLS header should follow the
   guidelines in [RFC3270].  An SP may also choose to deploy any of the
   additional Differentiated Services mechanisms that the PE routers
   support for the encapsulation in use.  Note that the type of
   encapsulation determines the set of Differentiated Services
   mechanisms that may be deployed.

13. Security Considerations

   This document describes an extension to the procedures of [RFC4364],
   and hence shares the security considerations described in [RFC4364]
   and [RFC4365].

   When GRE encapsulation is used, the security considerations of
   [MPLS-IP] are also relevant.  The security considerations of
   [RFC4797] are also relevant as it discusses implications on packet
   spoofing in the context of BGP/MPLS IP VPNs.

   The security considerations of [MPLS-HDR] apply when MPLS
   encapsulation is used.

   This document makes use of a number of control protocols: PIM
   [PIM-SM], BGP [MVPN-BGP], mLDP [MLDP], and RSVP-TE [RSVP-P2MP].
   Security considerations relevant to each protocol are discussed in
   the respective protocol specifications.

   If one uses the UDP-based protocol for switching to S-PMSI (as
   specified in Section 7.4.2), then an S-PMSI Join message (i.e., a UDP
   packet with destination port 3232 and destination address
   ALL-PIM-ROUTERS) that is not received over a PMSI (e.g., one received
   directly from a CE router) is an illegal packet and MUST be dropped.
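
   The drop rule for S-PMSI Join messages can be sketched as a simple
   filter.  The function name and its parameters are hypothetical; the
   sketch assumes IPv4, where ALL-PIM-ROUTERS is 224.0.0.13.

```python
import ipaddress

SPMSI_JOIN_UDP_PORT = 3232
ALL_PIM_ROUTERS = ipaddress.ip_address("224.0.0.13")


def must_drop(dst_addr, udp_dst_port, received_over_pmsi):
    """True if the packet is an S-PMSI Join not received over a PMSI."""
    is_spmsi_join = (
        udp_dst_port == SPMSI_JOIN_UDP_PORT
        and ipaddress.ip_address(dst_addr) == ALL_PIM_ROUTERS
    )
    return is_spmsi_join and not received_over_pmsi
```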

   The various procedures for P-tunnel construction have security issues
   that are specific to the way that the P-tunnels are used in this
   document.  When P-tunnels are constructed via such techniques as PIM,
   mLDP, or RSVP-TE, each P or PE router receiving a control message
   MUST ensure that the control message comes from another P or PE
   router, not from a CE router.  (Interpreting an mLDP or PIM or
   RSVP-TE control message from a CE router as referring to a P-tunnel
   would be a bug.)

   A PE MUST NOT accept BGP routes of the MCAST-VPN address family from
   a CE.

   If BGP is used as a CE-PE routing protocol, then when a PE receives
   an IP route from a CE, if this route carries the VRF Route Import
   extended community, the PE MUST remove this community from the route
   before turning it into a VPN-IP route. Routes that a PE advertises to
   a CE MUST NOT carry the VRF Route Import extended community.
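
   The community handling on the CE-PE boundary can be sketched as
   below.  Communities are modeled as hypothetical (type, value) tuples;
   a real implementation would operate on encoded BGP extended
   communities instead.

```python
VRF_ROUTE_IMPORT = "vrf-route-import"  # hypothetical type tag


def strip_vrf_route_import(communities):
    """Remove every VRF Route Import extended community from a route.

    The same filter serves both directions: routes received from a CE
    (before conversion to VPN-IP routes) and routes advertised to a CE.
    """
    return [c for c in communities if c[0] != VRF_ROUTE_IMPORT]
```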

   An ASBR may receive, from one SP's domain, an mLDP, PIM, or RSVP-TE
   control message that attempts to extend a P-tunnel from one SP's
   domain into another SP's domain.  This is perfectly valid if there is
   an agreement between the SPs to jointly provide an MVPN service.  In
   the absence of such an agreement, however, this could be an
   illegitimate attempt to intercept data packets.  By default, an ASBR
   MUST NOT allow P-tunnels to extend beyond AS boundaries.  However, it
   MUST be possible to configure an ASBR to allow this on a specified
   set of interfaces.

   Many of the procedures in this document cause the SP network to
   create and maintain an amount of state which is proportional to
   customer multicast activity.  If the amount of customer multicast
   activity exceeds expectations, this can potentially cause P and PE
   routers to maintain an unexpectedly large amount of state, which may
   cause control and/or data plane overload.  To protect against this
   situation an implementation should provide ways for the SP to bound
   the amount of state it devotes to the handling of customer multicast
   activity.

   In particular, an implementation SHOULD provide mechanisms that allow
   an SP to place limitations on the following:

     - total number of (C-*,C-G) and/or (C-S,C-G) states per VRF

     - total number of P-tunnels per VRF used for S-PMSIs

     - total number of P-tunnels traversing a given P router
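
   The first two limits in the list above could be enforced with
   per-VRF counters, as in the following sketch.  The class and field
   names are hypothetical, and the third limit (P-tunnels traversing a
   given P router) is not covered because it would be enforced on the
   P router rather than in a VRF.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class VrfLimits:
    max_c_states: int       # bound on (C-*,C-G) plus (C-S,C-G) entries
    max_spmsi_tunnels: int  # bound on P-tunnels used for S-PMSIs


@dataclass
class Vrf:
    limits: VrfLimits
    c_states: Set[Tuple[str, str]] = field(default_factory=set)
    spmsi_tunnels: Set[str] = field(default_factory=set)

    def add_c_state(self, source: str, group: str) -> bool:
        """Install a C-multicast state; use source "*" for (C-*,C-G)."""
        key = (source, group)
        if key in self.c_states:
            return True  # already installed, nothing new to count
        if len(self.c_states) >= self.limits.max_c_states:
            return False  # refuse: the configured bound is reached
        self.c_states.add(key)
        return True

    def add_spmsi_tunnel(self, tunnel_id: str) -> bool:
        """Bind a new S-PMSI P-tunnel unless the per-VRF bound is hit."""
        if tunnel_id in self.spmsi_tunnels:
            return True
        if len(self.spmsi_tunnels) >= self.limits.max_spmsi_tunnels:
            return False
        self.spmsi_tunnels.add(tunnel_id)
        return True
```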

   A PE implementation MAY also provide mechanisms that allow an SP to
   limit the rate of change of various MVPN-related states on PEs, as
   well as the rate at which MVPN-related control messages may be
   received by a PE from the CEs and/or sent from the PE to other PEs.
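
   One common way to limit a message rate is a token bucket.  The
   sketch below is illustrative only: this document does not mandate
   any particular algorithm, and the TokenBucket class and its
   parameters are hypothetical.  Time is passed in explicitly so that
   the behavior is deterministic.

```python
class TokenBucket:
    """Admit at most `rate` messages per second, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float, now: float = 0.0) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True if one control message may be processed at `now`."""
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

   A PE might keep one such bucket per CE for received messages and
   another for messages it sends to other PEs.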

   An implementation that provides the procedures specified in Section
   10.1 or 10.2 MUST provide the capability to impose an upper bound on
   the number of Source Active A-D routes generated, and on how
   frequently they may be originated.  This capability MUST be provided
   at per-PE, per-MVPN granularity.
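
   A limiter of this kind, as it might run on a single PE, is sketched
   below.  The SourceActiveLimiter class is hypothetical, and
   expressing the frequency bound as a minimum interval between
   originations is an assumption of this sketch rather than something
   this document specifies.

```python
from typing import Dict, Set, Tuple


class SourceActiveLimiter:
    """Per-MVPN bound on Source Active A-D routes originated by one PE."""

    def __init__(self, max_routes: int, min_interval: float) -> None:
        self.max_routes = max_routes
        self.min_interval = min_interval  # seconds between originations
        self.routes: Dict[str, Set[Tuple[str, str]]] = {}
        self.last_origination: Dict[str, float] = {}

    def try_originate(self, mvpn: str, source: str, group: str,
                      now: float) -> bool:
        """Return True if the route is, or already was, originated."""
        active = self.routes.setdefault(mvpn, set())
        key = (source, group)
        if key in active:
            return True
        if len(active) >= self.max_routes:
            return False  # upper bound on the number of routes
        last = self.last_origination.get(mvpn)
        if last is not None and now - last < self.min_interval:
            return False  # originating too frequently
        active.add(key)
        self.last_origination[mvpn] = now
        return True

    def withdraw(self, mvpn: str, source: str, group: str) -> None:
        """Remove a route so that it no longer counts against the bound."""
        self.routes.get(mvpn, set()).discard((source, group))
```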

   Without mechanisms that allow an SP to limit the rate of change of
   various MVPN-related states on PEs, and the rate at which
   MVPN-related control messages may be received by a PE from the CEs
   and/or sent from the PE to other PEs, the control plane of the PE
   may become overloaded.  Such overload would in turn adversely affect
   all the customers connected to that PE, as well as customers
   connected to other PEs.

   See also the security considerations of [MVPN-BGP].

14. IANA Considerations

   Section 7.4.2 defines the "S-PMSI Join Message", which is carried in
   a UDP datagram whose port number is 3232.  This port number is
   already assigned by IANA to "MDT port".  IANA should now have that
   assignment reference this document.

   IANA should create a registry for the "S-PMSI Join Message Type
   Field".  Assignments are to be made according to the policy "IETF
   Review" as defined in [RFC5226].  The value 1 should be registered
   with a reference to this document.  The description should read "PIM
   IPv4 S-PMSI (unaggregated)".

   [PIM-ATTRIB] establishes a registry for "PIM Join Attribute Types".
   IANA should assign the value 1 to the "MVPN Join Attribute", and
   should reference this document.

15. Other Authors

   Sarveshwar Bandi, Yiqun Cai, Thomas Morin, Yakov Rekhter, IJsbrand
   Wijnands, Seisho Yasukawa

16. Other Contributors

   Significant contributions were made by Arjen Boers, Toerless Eckert,
   Adrian Farrel, Luyuan Fang, Dino Farinacci, Lenny Giuliano, Shankar
   Karuna, Anil Lohiya, Tom Pusateri, Ted Qian, Robert Raszuk, Tony
   Speakman, Dan Tappan.

17. Authors' Addresses

   Rahul Aggarwal (Editor)
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   Email: rahul@juniper.net

   Sarveshwar Bandi
   Motorola
   Vanenburg IT park, Madhapur,
   Hyderabad, India
   Email: sarvesh@motorola.com

   Yiqun Cai
   Cisco Systems, Inc.
   170 Tasman Drive
   San Jose, CA, 95134
   E-mail: ycai@cisco.com

   Thomas Morin
   France Telecom R & D
   2, avenue Pierre-Marzin
   22307 Lannion Cedex
   France
   Email: thomas.morin@francetelecom.com

   Yakov Rekhter
   Juniper Networks
   1194 North Mathilda Ave.
   Sunnyvale, CA 94089
   Email: yakov@juniper.net

   Eric C. Rosen (Editor)
   Cisco Systems, Inc.
   1414 Massachusetts Avenue
   Boxborough, MA, 01719
   E-mail: erosen@cisco.com

   IJsbrand Wijnands
   Cisco Systems, Inc.
   170 Tasman Drive
   San Jose, CA, 95134
   E-mail: ice@cisco.com

   Seisho Yasukawa
   NTT Corporation
   9-11, Midori-Cho 3-Chome
   Musashino-Shi, Tokyo 180-8585,
   Japan
   Phone: +81 422 59 4769
   Email: yasukawa.seisho@lab.ntt.co.jp

18. Normative References

   [MLDP] I. Minei, K. Kompella, I. Wijnands, B. Thomas, "Label
   Distribution Protocol Extensions for Point-to-Multipoint and
   Multipoint-to-Multipoint Label Switched Paths",
   draft-ietf-mpls-ldp-p2mp-08.txt, October 2009

   [MPLS-HDR] E. Rosen, et al., "MPLS Label Stack Encoding", RFC 3032,
   January 2001

   [MPLS-IP] T. Worster, Y. Rekhter, E. Rosen, "Encapsulating MPLS in IP
   or Generic Routing Encapsulation (GRE)", RFC 4023, March 2005

   [MPLS-MCAST-ENCAPS] T. Eckert, E. Rosen, R. Aggarwal, Y. Rekhter,
   "MPLS Multicast Encapsulations", RFC 5332, August 2008

   [MPLS-UPSTREAM-LABEL] R. Aggarwal, Y. Rekhter, E. Rosen, "MPLS
   Upstream Label Assignment and Context-Specific Label Space", RFC
   5331, August 2008

   [MVPN-BGP] R. Aggarwal, E. Rosen, T. Morin, Y. Rekhter, C.
   Kodeboniya, "BGP Encodings for Multicast in MPLS/BGP IP VPNs",
   draft-ietf-l3vpn-2547bis-mcast-bgp-08.txt, September 2009

   [OSPF] J. Moy, "OSPF Version 2", RFC 2328, April 1998

   [OSPF-MT] P. Psenak, S. Mirtorabi, A. Roy, L. Nguyen, P.
   Pillay-Esnault, "Multi-Topology (MT) Routing in OSPF", RFC 4915, June
   2007

   [PIM-ATTRIB] A. Boers, IJ. Wijnands, E. Rosen, "The PIM Join
   Attribute Format", RFC 5384, November 2008

   [PIM-SM] "Protocol Independent Multicast - Sparse Mode (PIM-SM)",
   Fenner, Handley, Holbrook, Kouvelas, August 2006, RFC 4601

   [RFC2119] "Key words for use in RFCs to Indicate Requirement
   Levels", Bradner, RFC 2119, March 1997

   [RFC4364] "BGP/MPLS IP VPNs", Rosen, Rekhter, et al., RFC 4364,
   February 2006

   [RFC4659] "BGP-MPLS IP Virtual Private Network (VPN) Extension for
   IPv6 VPN", De Clercq, et al., RFC 4659, September 2006

   [RSVP-OOB] Z. Ali, G. Swallow, R. Aggarwal, "Non PHP behavior and
   Out-of-Band Mapping for RSVP-TE LSPs",
   draft-ietf-mpls-rsvp-te-no-php-oob-mapping-03.txt, October 2009

   [RSVP-P2MP] R. Aggarwal, D. Papadimitriou, S. Yasukawa, et al.,
   "Extensions to RSVP-TE for Point-to-Multipoint TE LSPs", RFC 4875,
   May 2007

19. Informative References

   [ADMIN-ADDR] D. Meyer, "Administratively Scoped IP Multicast", RFC
   2365, July 1998

   [BIDIR-PIM] "Bidirectional Protocol Independent Multicast
   (BIDIR-PIM)", M. Handley, I. Kouvelas, T. Speakman, L. Vicisano, RFC
   5015, October 2007

   [BSR] "Bootstrap Router (BSR) Mechanism for PIM", N. Bhaskar, et al.,
   RFC 5059, January 2008

   [MVPN-REQ] T. Morin, Ed., "Requirements for Multicast in L3
   Provider-Provisioned VPNs", RFC 4834, April 2007

   [RFC2003] C. Perkins, "IP Encapsulation within IP", RFC 2003, October
   1996

   [RFC2784] D. Farinacci, et al., "Generic Routing Encapsulation", RFC
   2784, March 2000

   [RFC2890] G. Dommety, "Key and Sequence Number Extensions to GRE",
   RFC 2890, September 2000

   [RFC2983] D. Black, "Differentiated Services and Tunnels", RFC 2983,
   October 2000

   [RFC3270] F. Le Faucheur, et al., "MPLS Support of Differentiated
   Services", RFC 3270, May 2002

   [RFC3618] B. Fenner, D. Meyer, "Multicast Source Discovery Protocol",
   RFC 3618, October 2003

   [RFC4365] E. Rosen, "Applicability Statement for BGP/MPLS IP Virtual
   Private Networks (VPNs)", RFC 4365, February 2006

   [RFC4607] H. Holbrook, B. Cain, "Source-Specific Multicast for IP",
   RFC 4607, August 2006

   [RFC4797] Y. Rekhter, R. Bonica, E. Rosen, "Use of Provider Edge to
   Provider Edge (PE-PE) Generic Routing Encapsulation (GRE) or IP in
   BGP/MPLS IP Virtual Private Networks", RFC 4797, January 2007

   [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an
   IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.
