RIFT                                                            Z. Zhang
Internet-Draft                                          Juniper Networks
Intended status: Standards Track                              P. Thubert
Expires: January 9, 2020                                           Cisco
                                                            July 8, 2019


                     Multicast Routing In Fat Trees
                     draft-zzhang-rift-multicast-00

Abstract

   This document specifies multicast procedures with RIFT.  Multicast in
   RIFT is similar to Bidirectional Protocol Independent Multicast (PIM-
   Bidir), with the Rendezvous Point Link (RP-Link) simulated by a
   spanning tree of some Top of Fabric (ToF) nodes and sub-ToF nodes.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 9, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of



Zhang & Thubert          Expires January 9, 2020                [Page 1]


Internet-Draft                    mrift                        July 2019


   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Specifications
   3.  Security Considerations
   4.  Acknowledgements
   5.  References
     5.1.  Normative References
     5.2.  Informative References
   Authors' Addresses

1.  Introduction

   Because Fat Tree networks have a simple and regular north-south
   topology, the PIM-Bidir [RFC5015] solution is extended for multicast
   in RIFT [I-D.ietf-rift-rift] (referred to as MRIFT in this document).
   The following is a summary of the changes and adaptations compared to
   PIM-Bidir.

   With PIM-Bidir, PIM joins are sent towards a Rendezvous Point
   Address (RPA), which need not belong to any router.  The RPA does
   belong to an RP Link (RPL), which could be attached to a single
   router or to multiple routers (e.g., when the RPL is a LAN).  With
   MRIFT, there is no RPA; joins are simply sent northbound.  The joins
   terminate on some sub-ToF nodes, and the RPL is simulated by a
   spanning tree among some ToF and sub-ToF nodes.

   Instead of the (*,G) trees of PIM-Bidir, MRIFT uses (*,G-Prefix)
   trees, where the G-Prefix could be *, G, or anything in between
   (e.g., 225.1.1.0/24).  Light flows could simply follow the (*,*)
   tree, heavy flows could each have an individual (*,G) tree, and
   medium flows could share some (*,G-Prefix) trees.  All the First Hop
   Routers (FHRs, connecting to sources) and the Last Hop Routers (LHRs,
   connecting to receivers) of a particular (*,G) flow must agree on
   whether a (*,*), (*,G), or (*,G-Prefix) tree is used for the flow so
   that they all join the same tree.  This agreement is reached through
   out-of-band control that is outside the scope of this document.
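   As an illustration only, the sketch below shows how a router might
   map a group address to a tree once all FHRs and LHRs share the same
   set of tree prefixes.  It assumes, beyond what this document
   specifies, that the agreed trees are a list of prefixes, that the
   most specific covering prefix wins (longest-prefix match), and that
   the (*,*) tree is written as 224.0.0.0/4.  The name select_tree is
   hypothetical.

```python
import ipaddress
from typing import Iterable


def select_tree(group: str, tree_prefixes: Iterable[str]):
    """Return the most specific agreed (*,G-Prefix) tree covering `group`.

    Hypothetical helper: assumes longest-prefix match over an agreed list
    of prefixes, with the (*,*) tree written as 224.0.0.0/4.
    Returns None if no agreed tree covers the group.
    """
    addr = ipaddress.ip_address(group)
    covering = [net for net in map(ipaddress.ip_network, tree_prefixes)
                if addr in net]
    return max(covering, key=lambda net: net.prefixlen, default=None)


trees = ["224.0.0.0/4", "225.1.1.0/24", "225.1.1.5/32"]
print(select_tree("225.1.1.5", trees))  # -> 225.1.1.5/32 (own (*,G) tree)
print(select_tree("225.1.1.9", trees))  # -> 225.1.1.0/24 (shared tree)
print(select_tree("239.1.1.1", trees))  # -> 224.0.0.0/4 (the (*,*) tree)
```

   Because every FHR and LHR applies the same deterministic rule to the
   same agreed list, they all end up joining the same tree for a flow.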

   Because Fat Trees are richly connected, a router has to choose one of
   its many north neighbors to send a join to.  This is done through
   hashing.  The hashing algorithm should lead to several, but not too
   many, routers choosing the same north neighbor, so that fewer
   routers are involved in multicast traffic forwarding, yet none of
   those routers are overburdened by replicating to too many downstream
   neighbors.
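   This document does not define the hash.  One possible choice,
   sketched below, is rendezvous (highest-random-weight) hashing of a
   tree identifier over the candidate north neighbors.  Routers that
   see the same set of north neighbors then pick the same neighbor for
   a given tree, which keeps the number of forwarding routers small,
   while different trees spread across the neighbors.  The use of
   SHA-256, the tree identifier string, and the name
   pick_north_neighbor are assumptions for illustration.

```python
import hashlib
from typing import Iterable


def pick_north_neighbor(tree_id: str, north_neighbors: Iterable[str]) -> str:
    """Choose the north neighbor to send the join for `tree_id` to.

    Sketch of rendezvous (highest-random-weight) hashing; this is an
    assumed algorithm, not one specified by MRIFT.
    """
    def weight(neighbor: str) -> int:
        # Deterministic pseudo-random weight for the (tree, neighbor) pair.
        digest = hashlib.sha256(f"{tree_id}|{neighbor}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(north_neighbors, key=weight)


# Every leaf in a pod sees the same spines, so all pick the same one.
print(pick_north_neighbor("225.1.1.0/24", ["Spine111", "Spine112"]))
```

   As written, the sketch does not cap how many routers choose the same
   neighbor; mixing part of the choosing router's own identity into the
   hash input would be one way to spread replication load further, at
   the cost of involving more routers.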

   Instead of PIM messages, RIFT's own TIEs are used, similar to the
   concept in [I-D.zzhang-pim-pds].  Specifically, RIFT Policy-Guided
   Prefixes (PGP) [draft-atlas-rift-pgp] are used.  The TIEs are
   consumed and processed at each hop, and then regenerated for the
   next hop.
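   The per-hop handling can be pictured with the simplified model
   below: a router consumes the join state received from its south
   neighbors, derives a downstream replication list per tree, and
   regenerates one northbound join per tree towards the chosen north
   neighbor.  Trees are opaque strings here, the PGP encoding of the
   TIEs is not modelled, and the function names and the choose_parent
   callback are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


def consume_south_joins(south_joins: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each tree to the south neighbors that joined it (its oif list)."""
    oifs: Dict[str, Set[str]] = defaultdict(set)
    for neighbor, trees in south_joins.items():
        for tree in trees:
            oifs[tree].add(neighbor)
    return {tree: sorted(nbrs) for tree, nbrs in oifs.items()}


def regenerate_north_joins(oifs: Dict[str, List[str]],
                           choose_parent: Callable[[str], str]) -> Dict[str, Set[str]]:
    """Build one northbound join per tree, grouped by chosen north neighbor.

    A sub-ToF node would skip this step, since the join process stops there.
    """
    north: Dict[str, Set[str]] = defaultdict(set)
    for tree in oifs:
        north[choose_parent(tree)].add(tree)
    return dict(north)


oifs = consume_south_joins({"Leaf111": {"*", "225.1.1.0/24"},
                            "Leaf112": {"*"}})
north = regenerate_north_joins(oifs, lambda tree: "Spine111")
```

   Here both leaves' joins for the (*,*) tree collapse into a single
   regenerated join, while the replication list keeps both leaves.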

   When a join reaches a sub-ToF node, the normal join process stops,
   forming a sub-tree rooted at that sub-ToF node.  Multiple sub-trees
   of the same tree may be joined by a single ToF node, or they may have
   to be connected by a spanning tree serving as the RPL.  For example,
   in the following topology, suppose the two sub-tree roots of the two
   pods are Spine111 and Spine121.  Normally they may be joined by
   ToF21.  If the ToF21-Spine121 link is down, ToF22 may be used
   instead.  If the ToF22-Spine111 link is also down, Spine111 and
   Spine121 have to be joined via
   Spine111-ToF21-Spine112-ToF22-Spine121.

      .                +--------+          +--------+          ^ N
      .                |ToF   21|          |ToF   22|          |
      .Level 2         ++-+--+-++          ++-+--+-++        <-*-> E/W
      .                 | |  | |            | |  | |           |
      .             P111/2|  |P121          | |  | |         S v
      .                 ^ ^  ^ ^            | |  | |
      .                 | |  | |            | |  | |
      .  +--------------+ |  +-----------+  | |  | +---------------+
      .  |                |    |         |  | |  |                 |
      . South +-----------------------------+ |  |                 ^
      .  |    |           |    |         |    |  |              All TIEs
      .  0/0  0/0        0/0   +-----------------------------+     |
      .  v    v           v              |    |  |           |     |
      .  |    |           +-+    +<-0/0----------+           |     |
      .  |    |             |    |       |    |              |     |
      .+-+----++ optional +-+----++     ++----+-+           ++-----++
      .|       | E/W link |       |     |       |           |       |
      .|Spin111+----------+Spin112|     |Spin121|           |Spin122|
      .+-+---+-+          ++----+-+     +-+---+-+           ++---+--+
      .  |   |             |   South      |   |              |   |
      .  |   +---0/0--->-----+ 0/0        |   +----------------+ |
      . 0/0                | |  |         |                  | | |
      .  |   +---<-0/0-----+ |  v         |   +--------------+ | |
      .  v   |               |  |         |   |                | |
      .+-+---+-+          +--+--+-+     +-+---+-+          +---+-+-+
      .|       |  (L2L)   |       |     |       |  Level 0 |       |
      .|Leaf111~~~~~~~~~~~~Leaf112|     |Leaf121|          |Leaf122|
      .+-+-----+          +-+---+-+     +--+--+-+          +-+-----+
      .  +                  +    \        /   +              +
      .  Prefix111   Prefix112    \      /   Prefix121    Prefix122
      .                          multi-homed
      .                            Prefix
      .+---------- Pod 1 ---------+     +---------- Pod 2 ---------+
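   The three situations described for this topology can be checked
   with the small sketch below, which asks whether any single ToF node
   is adjacent to every sub-tree root of a tree; when none is, the roots
   must be connected by a spanning tree.  The adjacency table is a
   hypothetical encoding of the figure's ToF/sub-ToF links, and only
   the link failures named in the text are modelled.

```python
from typing import Dict, Iterable, Set, Tuple


def common_tofs(tof_nbrs: Dict[str, Set[str]], roots: Iterable[str]) -> Set[str]:
    """ToF nodes adjacent to every (non-empty list of) sub-tree roots.

    An empty result means no single ToF node can join the sub-trees.
    """
    return set.intersection(*(tof_nbrs[root] for root in roots))


def links(down: Set[Tuple[str, str]] = frozenset()) -> Dict[str, Set[str]]:
    """Full ToF/sub-ToF mesh of the example minus the (ToF, Spine) links in `down`."""
    tofs = {"ToF21", "ToF22"}
    spines = ["Spine111", "Spine112", "Spine121", "Spine122"]
    return {s: {t for t in tofs if (t, s) not in down} for s in spines}


roots = ["Spine111", "Spine121"]
print(sorted(common_tofs(links(), roots)))  # -> ['ToF21', 'ToF22']
print(sorted(common_tofs(links({("ToF21", "Spine121")}), roots)))  # -> ['ToF22']
print(sorted(common_tofs(links({("ToF21", "Spine121"),
                                ("ToF22", "Spine111")}), roots)))  # -> []
```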


   The following algorithm is used to form the spanning tree.

   1.  Each sub-tree root (a sub-ToF node) hashes to one of its ToF
       neighbors as its parent and advertises the parent's SystemID in
       an N-TIE for the tree.  This allows different trees to have
       different RPLs for load-balancing.  In the above example, suppose
       Spine111 advertises its choice of ToF21, and Spine121 advertises
       its choice of ToF22.

   2.  Each ToF node advertises in its S-TIE for a tree the highest
       SystemID among all the ToF nodes chosen and advertised by sub-ToF
       nodes for that tree.  The S-TIE also includes the SystemIDs of the
       sub-ToF nodes that made the choice.  A ToF node knows the choices
       either because it is the neighbor of a sub-ToF node that made a
       choice (e.g., ToF21 knows that Spine121 chose ToF22 from
       Spine121's N-TIE), or because it received another ToF node's
       S-TIE reflected by a common south neighbor (e.g., if the
       ToF21-Spine121 link is down, ToF21 still learns that Spine121
       chose ToF22 from ToF22's S-TIE for the tree, as reflected by
       Spine122).

   3.  If a sub-ToF node sees ToF nodes advertised for the tree with
       SystemIDs higher than that of its chosen parent, it reparents to
       the one among them that is its neighbor and has the highest
       SystemID, and re-advertises the new parent.  In the above example,
       assuming ToF22 has a higher SystemID than ToF21, Spine111
       reparents to ToF22.

   4.  A ToF parent with remaining sub-ToF children that could not
       reparent joins towards the highest-SystemID ToF parent (as
       determined in step 2) via a south neighbor, by including the
       identity of that south neighbor in its S-TIE for the tree.  The
       south neighbor used is one that either advertised its own choice
       of the highest-SystemID ToF parent, or reflected another ToF
       node's S-TIE reporting a sub-ToF node's choice of that parent.  In
       the above example, if the ToF22-Spine111 link is down, ToF21 joins
       ToF22 via either Spine112 or Spine122.

   The above steps may repeat several times before the spanning tree
   settles.  Unless the connectivity among ToF and sub-ToF nodes is
   badly broken, the process should remain fairly simple.
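   The steps above can be sketched for a single tree as follows.  The
   sketch is not a normative procedure: it assumes that every ToF node
   eventually learns every sub-ToF choice (directly or via reflection),
   so step 2 yields one highest-SystemID parent everywhere; it treats
   SystemIDs as integers; and it does not model TIE flooding.  The name
   form_rpl and its arguments are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def form_rpl(tof_nbrs: Dict[str, Set[str]],
             system_id: Dict[str, int],
             parent: Dict[str, str]) -> Tuple[Dict[str, str], str, Dict[str, List[str]]]:
    """Run steps 2-4 for one tree, starting from the step 1 choices in `parent`.

    tof_nbrs:  sub-ToF node -> ToF nodes it still has a live link to.
    system_id: ToF node -> SystemID (assumed to be an integer).
    parent:    sub-tree root -> ToF parent it hashed to in step 1.
    Returns the final parents, the highest-SystemID ToF parent, and, for
    every other ToF parent, the south neighbors it could join it through.
    """
    parent = dict(parent)
    while True:
        # Step 2 (simplified): all ToF nodes advertise the same
        # highest-SystemID parent, since each is assumed to know every choice.
        best = max(parent.values(), key=lambda tof: system_id[tof])
        changed = False
        # Step 3: reparent to that ToF node if it is a direct neighbor.
        for root, tof in list(parent.items()):
            if tof != best and best in tof_nbrs[root]:
                parent[root] = best
                changed = True
        if not changed:
            break
    # Step 4: a ToF parent whose children could not reparent joins `best`
    # via a south neighbor that is also adjacent to `best` (such a
    # neighbor either chose `best` or can reflect its S-TIE).
    joins = {
        tof: sorted(s for s, nbrs in tof_nbrs.items() if tof in nbrs and best in nbrs)
        for tof in set(parent.values()) - {best}
    }
    return parent, best, joins


# Example topology with the ToF21-Spine121 and ToF22-Spine111 links down.
tof_nbrs = {
    "Spine111": {"ToF21"},
    "Spine112": {"ToF21", "ToF22"},
    "Spine121": {"ToF22"},
    "Spine122": {"ToF21", "ToF22"},
}
ids = {"ToF21": 21, "ToF22": 22}  # assumed SystemIDs; ToF22 is highest
print(form_rpl(tof_nbrs, ids, {"Spine111": "ToF21", "Spine121": "ToF22"}))
# -> ({'Spine111': 'ToF21', 'Spine121': 'ToF22'}, 'ToF22',
#     {'ToF21': ['Spine112', 'Spine122']})
```

   With the ToF22-Spine111 link restored, the same call reparents
   Spine111 to ToF22 in step 3 and no step 4 join is needed, matching
   the example given for step 3.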

2.  Specifications

   More details will be specified in future revisions.

3.  Security Considerations

   To be provided.

4.  Acknowledgements

   The authors thank Bruno Rijsman and Antoni Przygenda for their review
   and suggestions.

5.  References

5.1.  Normative References

   [I-D.ietf-rift-rift]
              Team, T., "RIFT: Routing in Fat Trees", draft-ietf-rift-
              rift-06 (work in progress), June 2019.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

5.2.  Informative References

   [I-D.zzhang-pim-pds]
              Zhang, J. and K. Patel, "Protocol Dependent Multicast
              Signaling", draft-zzhang-pim-pds-00 (work in progress),
              October 2015.

   [RFC5015]  Handley, M., Kouvelas, I., Speakman, T., and L. Vicisano,
              "Bidirectional Protocol Independent Multicast (BIDIR-
              PIM)", RFC 5015, DOI 10.17487/RFC5015, October 2007,
              <https://www.rfc-editor.org/info/rfc5015>.

Authors' Addresses

   Zhaohui Zhang
   Juniper Networks

   EMail: zzhang@juniper.net


   Pascal Thubert
   Cisco Systems, Inc

   EMail: pthubert@cisco.com
