Network Working Group                                          M. Bhatia
Internet-Draft                                            Alcatel-Lucent
Intended status: Standards Track                                 M. Chen
Expires: June 18, 2012                                           Z. Wang
                                            Huawei Technologies Co., Ltd
                                                                  L. Guo
                                                           China Telecom
                                                         M. Binderberger
                                                       December 16, 2011


Bidirectional Forwarding Detection (BFD) on Link Aggregation Group (LAG)
                               Interfaces
                        draft-mmm-bfd-on-lags-01

Abstract

   This document proposes a mechanism to run BFD on Link Aggregation
   Group (LAG) interfaces.  It does so by running an independent BFD
   session on every LAG member link.

   A dedicated well-known multicast IP address for both IPv4 and IPv6 is
   introduced as the destination IP address of the BFD packets when
   running BFD on the member links of the LAG.

   There is currently also no standard that describes how BFD runs on a
   LAG interface as a whole.  This draft proposes a definition for this
   problem too while taking into consideration existing implementations.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference



Bhatia, et al.            Expires June 18, 2012                 [Page 1]


Internet-Draft           BFD for LAG Interfaces            December 2011


   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 18, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

































Bhatia, et al.            Expires June 18, 2012                 [Page 2]


Internet-Draft           BFD for LAG Interfaces            December 2011


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
   2.  BFD over LAG with a single session . . . . . . . . . . . . . .  4
     2.1.  BFD over Big Pipe  . . . . . . . . . . . . . . . . . . . .  5
     2.2.  Examples of existing implementations . . . . . . . . . . .  5
   3.  BFD on LAG member links  . . . . . . . . . . . . . . . . . . .  6
     3.1.  BFD BLM session  . . . . . . . . . . . . . . . . . . . . .  6
     3.2.  BFD micro session  . . . . . . . . . . . . . . . . . . . .  7
     3.3.  Concluded BFD state  . . . . . . . . . . . . . . . . . . .  7
   4.  BFD on LAG members and layer-3 applications  . . . . . . . . .  8
   5.  Application example: LMM using BLM . . . . . . . . . . . . . .  9
   6.  Security Consideration . . . . . . . . . . . . . . . . . . . . 10
   7.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 10
   8.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10
   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 10
     9.1.  Normative References . . . . . . . . . . . . . . . . . . . 10
     9.2.  Informative References . . . . . . . . . . . . . . . . . . 11
   Appendix A.  IETF discussion status  . . . . . . . . . . . . . . . 11
     A.1.  Unicast vs. Multicast IP address . . . . . . . . . . . . . 11
     A.2.  BFD packet details (Ethernet encaps) . . . . . . . . . . . 12
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 13





























Bhatia, et al.            Expires June 18, 2012                 [Page 3]


Internet-Draft           BFD for LAG Interfaces            December 2011


1.  Introduction

   The Bidirectional Forwarding Detection (BFD) protocol [RFC5880]
   provides a mechanism to detect faults in the bidirectional path
   between two forwarding engines, including interfaces, data link(s),
   and to the extent possible the forwarding engines themselves, with
   potentially very low latency.

   BFD can be used for detecting failures of the path between two
   network devices.  Typically the application clients are not aware of
   any inner structure of the underlying interface, being layer 3
   applications themselves like Open Shortest Path First (OSPF)
   [RFC2328] or Border Gateway Protocol (BGP)[RFC4271].  While this
   works for interfaces like Ethernet and Packet Over SONET (POS), it
   causes problems for bundled interfaces like LAG.

   A LAG is used to bind together several physical ports between two
   adjacent nodes so they appear to higher-layer protocols as a single,
   higher bandwidth "virtual" pipe.  A LAG interface thereby allows
   aggregation of multiple network interfaces as one virtual interface
   for the purpose of providing fault-tolerance and higher bandwidth.

   The problem with running BFD over a LAG is that with a single BFD
   session and without internal knowledge of the LAG structure it is
   impossible for BFD to guarantee a detection of anything but a full
   LAG shutdown within the BFD timeout period.  The LAG shutdown is
   typically initiated by some LAG module, which we will refer to as the
   LAG Management Module (LMM) in the rest of the document.  LAG timers
   are typically multiple times slower than the BFD detection timers
   (multiple 100msec of LMM vs. multiple 10msec of BFD).  There is thus
   a need to bring some sort of determinism in how BFD runs over a LAG.
   There is also a need to detect member link failures much faster than
   what Link Aggregation Control Protocol (LACP) allows.

   The document proposes establishing a BFD session over every member
   link the LAG is built upon.  BFD can combine these information to
   provide fast detection for layer-3 applications.

   While there are native Ethernet mechanisms to detect failures
   (802.1ax, .3ah) that could be used for LAG, the solution proposed in
   this document enables operators who have already deployed BFD over
   different technologies (e.g.  IP, MPLS) to use a common failure
   detection mechanism.


2.  BFD over LAG with a single session





Bhatia, et al.            Expires June 18, 2012                 [Page 4]


Internet-Draft           BFD for LAG Interfaces            December 2011


2.1.  BFD over Big Pipe

   The simplest approach to run BFD on a LAG interface is to ignore the
   internal structure and treat the LAG as one "big pipe".  To
   differentiate this mode of operation we call it "BFD over Big Pipe"
   or "BBP" for short.  It corresponds to section 7.1 in RFC 5882
   [RFC5882].

   We need to standardize this approach.  The following requirements
   define what it means to treat a LAG interface as a single interface
   with no additional structure:

   o  BFD can send packets on any member link

   o  BFD must accept packets from every member link

   o  the Rx/Tx link can change any time and/or regularly with every
      change pattern without causing BFD to fail

   The BFD session on the LAG interface then follows RFC 5880 and
   RFC 5881 in all details.

2.2.  Examples of existing implementations

   Because there is no standard, vendors have implemented their own
   proprietary mechanisms to run BFD over LAG interfaces.  Two examples
   are shown here.  Both satisfy the requirements in Section 2.1

   Some implementations send BFD packets only over one member link .
   Others spray BFD packets over all member links of the LAG.  There are
   issues with each of these approaches.

   In the first approach, BFD sends packets onto the LAG and the LAG
   load balance algorithm will select a member port, which may be the
   same port for all the packets of this BFD session.  BFD will remain
   up as long as this "primary" port is alive.  It will go down once the
   primary port goes down till another port is selected as the primary.
   Problems arise with this design as BFD is oblivious to the presence
   of other member links in the LAG.  If a non-primary link goes down,
   the BFD session remains unaffected as it can still send and receive
   BFD packets over the primary link.  This results in all traffic sent
   over the failed member link getting dropped, till the LMM removes the
   failed link from the LAG.

   In the second approach, BFD packets are sprayed over all the member
   links of a LAG.  This is done naively via round-robin, where each BFD
   packet is sent using the subsequent member link, in a round-robin
   fashion.  It solves the problem of BFD going down because of the



Bhatia, et al.            Expires June 18, 2012                 [Page 5]


Internet-Draft           BFD for LAG Interfaces            December 2011


   primary port going down, but it still does not solve the problem of
   traffic getting lost when one of the member link goes down.  This is
   because, when a member link goes down, BFD remains up and traffic
   continues to go over the link that has failed till a higher layer
   protocol detects this and removes the offending link from the LAG.

   It is RECOMMENDED to implement this second approach due to it's
   deterministic behaviour.


3.  BFD on LAG member links

   The mechanism proposed for a fast detection of LAG member link
   failure is running BFD sessions on every LAG member link.  We name
   this mode of BFD operations "BFD on LAG members" or "BLM" for short.
   It corresponds to section 7.3 in RFC 5882 [RFC5882].

3.1.  BFD BLM session

   The overall BLM session consists of the LAG interface, i.e. the
   aggregated link, a set of BFD sessions running on the member links
   and a new BFD state for the LAG; this state is explained in more
   detail in Section 3.3.  We name the member-link sessions also micro
   BFD sessions; their details are discussed in Section 3.2.

   The set of micro sessions is such that we have one micro session per
   member link.  This set can change over the lifetime of a BLM session.
   E.g.  BFD receives updates for the micro session set when links are
   physically added or removed from the LAG and will accordingly create
   or delete micro BFD sessions.

   The details how the update happens are implementation specific and
   outside the document's scope.  For example the client requesting the
   BLM session could provide these updates.

   Per BLM session request only one address family can be used.  I.e.
   the set of micro sessions belonging to the BLM session MUST either
   all use IPv4 or all use IPv6.

   Multiple BLM session requests for the same LAG interface result in a
   shared BLM session.  The set of micro sessions finally used is the
   superset of the individual micro session sets.  If conflicting
   session parameters are requested then it is a local issue as to how
   to resolve the parameter conflicts, as explained in RFC 5882, Section 
   2.






Bhatia, et al.            Expires June 18, 2012                 [Page 6]


Internet-Draft           BFD for LAG Interfaces            December 2011


3.2.  BFD micro session

   A single micro BFD session runs on every member link of the LAG.
   These micro BFD sessions follow RFC 5880 [RFC5880].

   Only asynchronous mode is considered in this document.  The echo
   function is outside the document's scope.  At least one system MUST
   take the Active role (possibly both).  The micro BFD sessions on the
   member links are independent BFD sessions.  They use their own
   unique, local discriminator values, maintain their own set of state
   variables and have their own independent state machine.  Timer values
   MAY be different, even among the micro sessions belonging to the same
   LAG, although it is expected that micro sessions belonging to the
   same LAG use the same timer values.

   The demultiplexing of a received packet is solely based on the Your
   Discriminator field, if this field is nonzero.  For the initial Down
   packet of a micro session this value may be zero.  In this case
   demultiplexing MUST be based on some combination of other fields
   which MUST include the interface information of the member link.

   When receiving a BFD packet for a micro session with a valid, non-
   zero Your Discriminator then a check MUST be done if the packet was
   received on the correct member link interface.  If the check fails
   then the packet MUST be discarded.  This test needs to be done before
   state variables for the micro sessions are updated by the received
   packet.

   The BFD packets for the micro session are IP/UDP encapsulated as
   defined in RFC 5881 [RFC5881].  Control packets for each micro BFD
   session use a well-known link-local multicast IP address (224.0.0.X
   for IPv4, FF02::X for IPv6, to be assigned by IANA).

   On Ethernet-based LAG member links the corresponding destination
   multicast MACs will be 01:00:5e:00:00:XX for IPv4 and
   33:33:00:00:00:XX for IPv6.  Each member link uses its own MAC
   address as the source MAC address.

3.3.  Concluded BFD state

   An additional state variable is introduced for BFD on LAG members:
   the concluded state.  The state values are Down, Up and AdminDown.
   This state is not part of the micro session state machine.  Instead
   it describes the overall state of the LAG.  It is a local state and
   does not appear (directly) in any BFD packet on any link.

   The concluded state may be set to AminDown for administrative
   purpose, to keep the BLM and the micro sessions indefinitely down.



Bhatia, et al.            Expires June 18, 2012                 [Page 7]


Internet-Draft           BFD for LAG Interfaces            December 2011


   When the concluded state is entering AdminDown then all micro
   sessions belonging to the BLM MUST enter the AdminDown state as well.

   A function must be defined, which evaluates all the states of the
   micro sessions that belong to the BLM.  This function has two output
   values Down and Up and the concluded state is updated with the last
   evaluation result, unless it is already in AdminDown state.  The
   evaluation takes place whenever a micro session is added, removed or
   is changing state.

   The details of the evaluation function are outside the scope of the
   document.  The function could for example test for a minimum number
   of micro sessions in Up state.  The function could even be
   "outsourced" and e.g. the decision logic of the LMM module could be
   used.

   The concluded state is important for layer-3 clients requesting BFD
   sessions over the LAG or over Vlans on the LAG.  Details will be
   discussed in Section 4.


4.  BFD on LAG members and layer-3 applications

   Layer 3 protocols like e.g.  OSPF may use BFD on LAG members in one
   of the following ways:

   o  The session request from the client creates a virtual session.
      This virtual session is not sending actual BFD packets.  Instead
      the state, which is reported to the layer-3 client, is based on
      the concluded state.

      Implementations compliant to this standard MUST support this mode.
      This is the default mode in which BFD over LAG works.

   o  The session request from the client creates a BBP session, as
      described in Section 2.1.  BFD SHOULD update the state of the BBP
      session with the concluded state of the corresponding LAG in the
      following way:

      *  when the concluded state is Down then the BBP session state is
         transitioning to Down as well

      *  for a concluded state of Up or AdminDown the BBP session state
         is unaffected

      This state update allows BBP session to run with more relaxed
      timer values as the more intense liveliness detection is done by
      the micro BFD sessions.



Bhatia, et al.            Expires June 18, 2012                 [Page 8]


Internet-Draft           BFD for LAG Interfaces            December 2011


      Compliant implementations MUST support this mode.

   An implementation MUST provide a configuration knob which lets the
   user select the mode if both modes are supported.


5.  Application example: LMM using BLM

   There are certainly many ways to use BLM.  Here is one example
   envisioned by the authors.

   The LAG Management Module (LMM) could be envisaged as a client of
   BFD, requesting micro BFD sessions for all member links of the LAG.
   I.e.  LMM is requesting a BLM session from BFD and takes
   responsibility to update the set of micro sessions when interfaces
   are administratively added or removed.

   LMM uses BFD, instead of or in parallel with LACP, to monitor the
   health of the individual members links of the LAG.  BFD takes a
   precedence over LACP in deciding the fate of individual member links
   when both are run in parallel.

   A member link of the LAG is not used anymore for data forwarding when
   the associated micro BFD session running over that link goes down.
   The member link MUST be removed from the LAG.  The BFD session for
   the link remains, i.e. it is not deleted.

   To add a member link to the LAG, LMM MAY wait for the BFD session on
   the link to come Up.  There may be a deadlock situation since the
   link interface not being active (e.g., layer 3 protocol down) may
   prevent BFD packets, including other control protocols packets (e.g.
   ARP) that are tightly coupled with the status of the interface, to be
   transmitted between the pair of interfaces, thus failing to bring up
   the interfaces.

   To avoid the deadlock, BFD packets SHOULD NOT be blocked by the layer
   N protocol status of the interface when the application depends on
   the BFD status to enable layer N of the interface.  If this cannot be
   achieved then the BFD status MUST be ignored by the application when
   bringing up an interface.  The BFD status can then be used afterwards
   to bring the interface down.

   The behaviour of the LMM MUST be configurable if waiting for BFD
   status of Up to add a member link is supported, to allow an
   alternative mode of adding the member link irrespective of the BFD
   state for interoperability purpose.





Bhatia, et al.            Expires June 18, 2012                 [Page 9]


Internet-Draft           BFD for LAG Interfaces            December 2011


6.  Security Consideration

   This document does not introduce any additional security issues and
   the security mechanisms defined in [RFC5880] apply in this document.

   Routers compliant to this standard will now need to process packets
   addressed to a new multicast address.  This however, should not open
   any new attack vector as it is a link local multicast and the
   attacker would have to be on the same link as the router to launch
   such packets.


7.  IANA Considerations

   The IANA is requested to assign a well-known link-local multicast IP
   address: "224.0.0.XXX" for IPv4 and FF02::X for IPv6.


8.  Acknowledgements

   Most of the text for this document came originally from
   draft-chen-bfd-interface-00.

   We would like to thank Dave Katz, Alexander Vainshtein, Greg Mirsky
   and Jeff Tantsura for their comments on this draft.

   We would also like to thank the members of the BFD WG who expressed
   strong support about the need to run BFD on all the member links of a
   LAG.


9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, June 2010.

   [RFC5881]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881,
              June 2010.

   [RFC5882]  Katz, D. and D. Ward, "Generic Application of
              Bidirectional Forwarding Detection (BFD)", RFC 5882,
              June 2010.



Bhatia, et al.            Expires June 18, 2012                [Page 10]


Internet-Draft           BFD for LAG Interfaces            December 2011


9.2.  Informative References

   [RFC1042]  Postel, J. and J. Reynolds, "Standard for the transmission
              of IP datagrams over IEEE 802 networks", STD 43, RFC 1042,
              February 1988.

   [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.


Appendix A.  IETF discussion status

   [this section will finally go away.  It documents some of the
   discussions and decisions made recently on the BFD group list]

A.1.  Unicast vs. Multicast IP address

   The destination IP address for the BFD control packets for the micro
   BFD sessions can be Unicast or Multicast.  Each has its set of
   advantages and disadvantages.

   Advantages with using a Unicast IP destination address:

   o  Minimal code changes to support micro BFD sessions per member
      link.  A new UDP port number can be used to differentiate BFD
      control packets associated with the micro BFD sessions and the
      regular BFD sessions.

   Disadvantages with using a Unicast IP destination address:

   o  It is configuration intensive.  Each LAG needs to be configured
      with the remote end's IP address for BFD to boot strap.
      Similarly, a change in the IP address of the interface would need
      all LAGs to be reconfigured.  While one could minimize the amount
      of human intervention required, it cannot be completely
      eliminated.

   o  ARP needs to be resolved for sending Unicast packets.  This means
      that ARP must be resolved even before the first control packet is
      sent to bring up the micro BFD session.  There are multiple ways
      to achieve this.  The most logical approach is to mandate LACP on
      this LAG.  This way., LACP will bring up the links so that ARP
      resolution can begin.  However, this necessitates the need to run
      LACP along with BFD on all member links.  The other option is to
      allow ARP processing even when the port state is down.  This means
      that implementations would have to allow all packets with



Bhatia, et al.            Expires June 18, 2012                [Page 11]


Internet-Draft           BFD for LAG Interfaces            December 2011


      broadcast MAC and port MAC to be sent to CPU for processing.  This
      violates the basic tenets of IP layering and opens a hole for a
      DoS attack.  This also requires a huge change to the IP stack to
      allow packet Rx and Tx on ports that are down.

   o  Not possible to support unnumbered IP interfaces.

   Advantages with using a Multicast IP destination address:

   o  No additional configuration is required, and the micro BFD
      sessions are set up automatically.  It remains independent of the
      LAG IP addressing scheme.  The member links get added to the LAG
      as soon as the micro BFD sessions come up.

   o  This involves minimal modifications to data plane and L3 stack.
      Currently, ports that are down do process packets coming with
      certain well known L2 MAC addresses.  This solution requires such
      ports to process packets addressed to another well known L2 MAC
      address (derived from the multicast IP address assigned by IANA).

   o  Can support unnumbered IP interfaces.

   Disadvantages with using a Multicast IP destination address:

   o  Unfamiliar.  Some bit of the data plane and the source code would
      need to be modified to accept BFD control packets that are
      multicast.

   o  Need to allocate a new link local multicast address from IANA.

   Based on the above analysis, we decided to go with multicast IP
   addressing scheme for the micro BFD sessions.

A.2.  BFD packet details (Ethernet encaps)

   [Either this section or the corresponding parts of Section 3.2 should
   remain in the final document.  There is no intention to support both
   encapsulations]

   The BFD packet is directly encapsulated into the Ethernet frame.  The
   frame has the following format: Ethernet 802.3 header, then either:

   o  Type/Length field set to 46 octets, then LLC/SNAP according to
      RFC 1042 [RFC1042] (can we reuse it for BFD?) with an Ethertype
      field in the SNAP header for BFD encapsulated in Ethernet [TBD],
      then the BFD packet itself.





Bhatia, et al.            Expires June 18, 2012                [Page 12]


Internet-Draft           BFD for LAG Interfaces            December 2011


   o  Type/Length field set to the type of "BFD encapsulated in
      Ethernet", followed by the BFD packet

   In both cases the Ethernet payload must be padded with zeros to reach
   46 bytes if the size is not already larger.

   When receiving an Ethernet frame the payload, without any potential
   LLC/SNAP header, is used for further BFD processing.  Additional
   padding data MUST be ignored if it was required to reach the minimum
   payload length of 46 bytes.

   IANA needs to assign a L2 multicast address 01-80-C2-XX-XX-XX that
   would be used as the destination MAC for all control packets in the
   micro BFD sessions.

   A new Ethertype must be assigned by the IEEE Registration Authority
   to the BFD over Ethernet protocol that will be used for all micro BFD
   sessions.

   [Needs more detailed work TBD.]


Authors' Addresses

   Manav Bhatia
   Alcatel-Lucent
   Bangalore,   560045
   India

   Email: manav.bhatia@alcatel-lucent.com


   Mach(Guoyi) Chen
   Huawei Technologies Co., Ltd
   Q14 Huawei Campus, No. 156 Beiqing Road, Hai-dian District
   Beijing  100095
   China

   Email: mach@huawei.com












Bhatia, et al.            Expires June 18, 2012                [Page 13]


Internet-Draft           BFD for LAG Interfaces            December 2011


   Zuliang Wang
   Huawei Technologies Co., Ltd
   Q15 Huawei Campus, No. 156 Beiqing Road, Hai-dian District
   Beijing  100095
   China

   Email: liang_tsing@huawei.com


   Liang Guo
   China Telecom
   Guangzhou
   China

   Email: guoliang@gsta.com


   Marc Binderberger
   Lausanne,
   Switzerland

   Email: marc@sniff.de





























Bhatia, et al.            Expires June 18, 2012                [Page 14]