Network Working Group                                    John G. Scudder
Internet Draft                                           Chandra Appanna
Expiration Date: May 2004                                  Cisco Systems
File name: draft-scudder-bgp-multisession-00.txt           November 2003

                            Multisession BGP
                 draft-scudder-bgp-multisession-00.txt

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet- Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

   This specification augments "Multiprotocol Extensions for BGP-4" [MP-
   BGP] by proposing a mechanism to allow multiple sessions to be used
   between a given pair of BGP speakers.  Each session is used to
   transport routes for one or more AFI/SAFI.  This provides an
   alternative to the current [MP-BGP] approach of multiplexing routes
   for all AFI/SAFI onto a single connection.

   Use of this approach is expected to increase the robustness of the
   BGP protocol as it is used to support more and more diverse AFI/SAFI.











Expires May 2004                                                [Page 1]


INTERNET DRAFT              Multisession BGP               November 2003


1. Introduction

   Most BGP [BGP, BGP-DRAFT] implementations only permit a single
   ESTABLISHED connection to exist with each peer.  More precisely, they
   only permit a single ESTABLISHED connection for any given pair of IP
   endpoints.

   Multiprotocol BGP [MP-BGP] extends BGP to allow information for
   multiple NLRI families and sub-families to be transported in BGP.
   Routes for different families are distinguished by AFI and SAFI.
   Routes for different families are commonly multiplexed onto a single
   BGP session.

   A common criticism of BGP is the fact that most malformed messages
   cause the session to be terminated.  While this behavior is necessary
   for protocol correctness, one may observe that the protocol machinery
   of a given implementation may only be defective with respect to a
   given AFI/SAFI.  Thus, it would be desirable to allow the session
   related to that family to be terminated while leaving other AFI/SAFI
   unaffected.  As BGP is commonly deployed, this is not possible.

   In this specification, we propose a mechanism by which multiple
   transport sessions may be established between a pair of peers.  Each
   transport session can be used for one or more AFI/SAFI.  Each session
   is distinct from a BGP protocol point of view; an error or other
   event on one session has no implications for any other session.  All
   protocol modifications proposed by this specification take place
   during the OPEN exchange phase of the session, there are no
   modifications to the operation of the protocol once a session reaches
   ESTABLISHED state.

   Routers implementing this specification MUST also implement [MP-BGP].


2. Definitions

   "MP-BGP capability" refers to the capability [BGP-CAP] with code 1,
   specified in [MP-BGP] section 10.

   A BGP speaker is said to "support" some feature or functionality (for
   example, to support this specification, or to support a particular
   AFI/SAFI) when the BGP implementation supports the feature AND the
   feature has not been disabled by configuration.

   A pair of AFI/SAFI groups is said to "conflict" when considering the
   two groups as two sets, there is an intersection between the groups
   but neither group is a subset of the other.




Expires May 2004                                                [Page 2]


INTERNET DRAFT              Multisession BGP               November 2003


3. Use of BGP Capability Advertisement

   This specification defines the Multisession capability [BGP-CAP]:

      Capability code (1 octet): TBD

      Capability length (1 octet): 1

      Capability value (1 octet): Flags as below

            0 1 2 3 4 5 6 7
           +-+-+-+-+-+-+-+-+
           |G|  Reserved   |
           +-+-+-+-+-+-+-+-+

   The most significant bit is defined as the Grouping Support (G) bit.
   It can be used to indicate support for the ability to group multiple
   AFI/SAFI into one session.  When set (value 1) this bit indicates
   that the BGP speaker supports grouping.

   The remaining bits are reserved, and should be set to zero by the
   sender and ignored by the receiver.


4. New NOTIFICATION Subcodes

   [BGP, BGP-DRAFT] Section 4.5 provides a number of subcodes to the
   NOTIFICATION message, and Section 6.2 elaborates on the use of those
   subcodes.

   This specification introduces two new subcodes:

      OPEN Message Error subcodes:

         7 - No Supported AFI/SAFI.

         8 - Grouping Conflict

         9 - Grouping Required

   The No Supported AFI/SAFI code MAY be used when an OPEN message
   contains one or more MP-BGP capabilities, none of which list an
   AFI/SAFI supported by the local BGP speaker.  It is observed that
   this subcode may be useful for MP-BGP speakers in general, even if
   they do not (otherwise) implement this specification.

   The Grouping Conflict code MAY be used when an OPEN message contains
   several MP-BGP capabilities whose AFI/SAFI conflict with one or more



Expires May 2004                                                [Page 3]


INTERNET DRAFT              Multisession BGP               November 2003


   AFI/SAFI groups configured on the local BGP speaker.  The Data field
   SHOULD indicate one of the conflicting locally-configured AFI/SAFI
   groups, encoded as MP-BGP capabilities.

   The Grouping Required code MAY be used when a BGP speaker which is
   configured to require grouping attempts to establish a connection
   with a BGP speaker which does not support grouping.  (While it is
   true that it might be possible to communicate much the same
   information using the Unsupported Capability NOTIFICATION message,
   this more explicit method is felt to be more transparent.)

   The use of these subcodes is further elaborated below.


5. Overview of Operation

   Until a BGP speaker has initiated or accepted one connection from a
   given peer, it is unknown whether the peer supports this
   specification or not.  Two strategies can be considered for making
   this initial determination -- either the BGP speaker can initially
   assume that the peer does not support this specification, and switch
   modes if it is discovered that it does, or vice-versa.  Either
   approach is acceptable.

   The "Using Multisession" sections below discuss the BGP speaker's
   behavior when the peer does support this specification or is assumed
   to.  The "Backward Compatibility" section discusses the BGP speaker's
   behavior when the peer does not support this specification, or is
   assumed not to.  Both sections discuss how to switch to the other
   mode.

   A BGP speaker which supports this specification SHOULD always
   advertise the Multisession capability, regardless of its peer's known
   or presumed capability set.


5.1. Using Multisession:

   The following subsections discuss a BGP speaker's behavior towards a
   peer which is known or assumed to support this specification.

   Note that if a BGP speaker only wishes to support a single AFI/SAFI
   in its communications with a given peer only one session is needed in
   any case, and so the "multisession" feature is moot.  In such a case
   the behavior required would be indistinguishable from that given in
   the "backward compatibility" section below.  In the following
   sections, it is generally assumed that a BGP speaker does wish to
   support multiple AFI/SAFI in its communications with a given peer.



Expires May 2004                                                [Page 4]


INTERNET DRAFT              Multisession BGP               November 2003


5.1.1. Initiating Connections:

   When a BGP speaker attempts BGP communication with its peer, it
   initiates one connection per group of AFI/SAFI it wishes to support.
   (This implies that a new local TCP port will be allocated for each
   new connection.)  The OPEN sent on each connection MUST include the
   Multisession capability and one or more MP-BGP capabilities
   indicating the AFI/SAFI to be supported on that session.  If a non-
   trivial group of AFI/SAFI (i.e., a group of two or more) is proposed,
   the BGP speaker MUST also set the G bit of the Multisession
   capability.  Even if a trivial group of AFI/SAFI is proposed, the G
   bit SHOULD be set if grouping is supported.

   Note that any "group of AFI/SAFI" may be a singleton group, i.e. the
   speaker may wish to use a separate BGP connection for each AFI/SAFI.

   If the peer also supports this specification and also wishes to
   support the AFI/SAFI in question, it will respond with an OPEN which
   includes the Multisession capability and the AFI/SAFI included in the
   active speaker's OPEN.  If the active speaker's OPEN included a non-
   trivial group of AFI/SAFI which the peer supports, then the peer's
   Multisession capability will have the G bit set.

   If the peer also supports this specification and wishes to support
   some but not all of the AFI/SAFI in question, it will respond with an
   OPEN which includes the Multisession capability and a subset of
   AFI/SAFI included in the active speaker's OPEN.  The reason for
   listing only a subset may be because some of the AFI/SAFI are simply
   not supported, or because the peer does not wish to support the
   AFI/SAFI as a group (i.e. it may be configured to use a smaller
   group).  In this case, the BGP speaker MAY consider the set of
   AFI/SAFI which were not included in the peer's OPEN to form a new
   group, and MAY try to initiate a new session using that group.

   If the peer also supports this specification but does not support
   grouping, and a non-trivial group of AFI/SAFI has been proposed, then
   it will respond as given in the previous paragraph but with the
   additional proviso that the G bit will be clear.  In this case, the
   BGP speaker MAY accept the connection as given in the previous
   paragraph, or it MAY reply with a NOTIFICATION message with ERROR
   Code OPEN Message Error and Error Subcode Grouping Required, and the
   connection will be closed.

   If the peer does not wish to support the AFI/SAFI in question, it
   will reply with a NOTIFICATION message with Error Code OPEN Message
   Error, and Error Subcode No Supported AFI/SAFI, and the connection
   will be closed.




Expires May 2004                                                [Page 5]


INTERNET DRAFT              Multisession BGP               November 2003


   A BGP speaker SHOULD NOT attempt to initiate connections for any
   AFI/SAFI for which a connection already exists.

   If the peer does not support this specification, it will respond with
   an OPEN which does not include the Multisession capability.  In this
   case the connection SHOULD be terminated, and future connections to
   the peer should be attempted in the "backward compatibility" mode
   discussed below.


5.1.2. Accepting Connections:

   When processing a connection attempt, the BGP speaker MUST wait until
   the peer's OPEN message has been received before proceeding.  This is
   at variance with the behavior specified in the finite state machine
   (FSM) of [BGP-DRAFT], but is interoperable with that FSM.  The FSM
   changes are specified in a later section.

   Once the peer's OPEN message has been received, if it includes the
   Multisession capability and one or more MP-BGP capabilities
   indicating a group of AFI/SAFI which the BGP speaker wishes to
   support, then the BGP speaker responds with an OPEN message which
   includes the Multisession capability and one or more MP-BGP
   capabilities indicating the same AFI/SAFI.

   If the OPEN includes the Multisession capability and one or more MP-
   BGP capabilities indicating a group of AFI/SAFI which conflicts with
   an AFI/SAFI grouping that has been configured on the BGP speaker then
   the BGP speaker MAY reply with an OPEN listing a set of AFI/SAFI
   which intersect with those proposed by the peer (in effect overriding
   the locally configured set) or it MAY close the connection with a
   NOTIFICATION message with Error Code OPEN Message Error and Error
   Subcode Grouping Conflict.  The former behavior is suggested as the
   default if grouping is supported.

   If the BGP speaker does not support AFI/SAFI grouping it MAY reply
   with an OPEN listing one of the AFI/SAFI out of those proposed by the
   peer.  It SHOULD also set the G bit in the Multisession capability to
   zero.

   If the received OPEN message does not include any MP-BGP capability
   indicating an AFI/SAFI the BGP speaker wishes to support, it should
   close the connection with a NOTIFICATION message with Error Code OPEN
   Message Error and Error Subcode No Supported AFI/SAFI.

   If the received OPEN message does not include the Multisession
   capability, then the peer does not support this specification.  The
   connection MAY be continued in the "backward compatibility" mode



Expires May 2004                                                [Page 6]


INTERNET DRAFT              Multisession BGP               November 2003


   discussed below, or it MAY be terminated and future connections to
   the peer attempted in the "backward compatibility" mode.


5.1.3. Collision Detection, Graceful Restart:

   [BGP, BGP-DRAFT] Section 6.8 (BGP connection collision detection)
   considers a pair of connections to have collided if the source and
   destination IP addresses of both connections match.  With respect to
   peers which support this specification, the AFI/SAFI groups
   associated with the connections must also intersect for them to be
   considered to have collided.

   This consideration also applies to Section 6.2 of [BGP-GR], when
   determining whether a new connection should be considered equivalent
   to a reset of a previous TCP session.


5.2. Backward Compatibility:

   This subsection discusses a BGP speaker's behavior towards a peer
   which is known or assumed not to support this specification.  In
   short, the BGP speaker's behavior towards such a peer should be as
   otherwise defined for the BGP protocol, according to [BGP, BGP-DRAFT]
   and any other extension supported by the BGP speaker.

   As previously mentioned, the BGP speaker SHOULD always advertise the
   Multisession capability in its OPEN message, even towards "backward
   compatibility" peers.

   If, in opening a BGP connection with such a peer, an OPEN which
   includes the Multisession capability is received from the peer, then
   the peer SHOULD be changed to "multisession" mode.  How this is done
   depends on whether the BGP speaker has already sent an OPEN or not --

   If the BGP speaker has not yet sent an OPEN to the peer, then the
   connection MAY be continued in the "multisession" mode discussed
   above, or it MAY be terminated and future connections to the peer
   attempted in "multisession" mode.

   If the BGP speaker has sent an OPEN to the peer, then the current
   session SHOULD be terminated and future connections to the peer
   attempted in "multisession" mode.

   Use of techniques such as [BGP-DYN-CAP] for on-the-fly switching of
   session modes are beyond the scope of this document.





Expires May 2004                                                [Page 7]


INTERNET DRAFT              Multisession BGP               November 2003


6. State Machine

   As mentioned under "accepting connections" above, this specification
   modifies the BGP finite state machine, albeit in a backward-
   compatible fashion.

   In addition, note that one state machine is considered to exist for
   each of the connections which may exist to a given peer.  This
   implies that, for example, any session flap dampening that may exist
   is performed per AFI/SAFI.

   The specific state machine modifications to [BGP-DRAFT] Section 8.2.2
   are as follows.


6.1. Modifications to Connect State and Active State

   In the actions in response to the events Open Delay timer expires
   [Event 12] and TCP connection succeeds [Event 16 or Event 17], an
   OPEN is not sent and the state changes to WaitForOpen and not to
   OpenSent.


6.2. Addition of WaitForOpen State, Deletion of OpenSent State

   The WaitForOpen state is the same in all respects to OpenSent, except
   for the action in response to reception of a valid OPEN message
   [Event 19].  In that event, the local system sends an OPEN message
   prior to sending a KEEPALIVE message.

   The OpenSent state is deleted.  All references to OpenSent are
   replaced by references to WaitForOpen.


7. Discussion

   Note that many BGP implementations already permit multiple sessions
   to be used between a given pair of routers, typically by configuring
   multiple IP addresses on each router and configuring each session to
   be bound to a different IP address.  The principal contribution of
   this specification is to allow multiple sessions to be created
   automatically, without additional configuration overhead or address
   consumption.

   In addition to the simple mode of supporting one AFI/SAFI per
   connection, the procedures described here also permit arbitrary
   grouping of AFI/SAFI onto BGP connections.  For such grouping to
   function pleasingly, both peers participating in a connection need to



Expires May 2004                                                [Page 8]


INTERNET DRAFT              Multisession BGP               November 2003


   agree on what AFI/SAFI groupings will be used.  If conflicting
   groupings are configured, the connections may not establish, or more
   connections may be established than were expected (in the degenerate
   case, one connection per AFI/SAFI could be established despite
   configured groupings).  We observe that the potential for misbehavior
   in the presence of conflicting configuration is not unusual in BGP,
   and that support for, and configuration of grouping is purely
   optional.


8. Acknowledgements

   To be supplied.


9. References

[BGP4]
     Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 (BGP-4)," RFC
     1771, March 1995.

[BGP-DRAFT]
     Rekhter, Y., T. Li and S. Hares, "A Border Gateway Protocol 4
     (BGP-4)," Work in Progress (draft-ietf-idr-bgp4-20), April 2003.

[MP-BGP]
     Bates, T., R. Chandra, D. Katz, Y. Rekhter, "Multiprotocol Exten-
     sions for BGP-4," Work in Progress (draft-ietf-idr-rfc2858bis-03),
     July 2003.

[BGP-GR]
     Sangli, S., Y. Rekhter, R. Fernando, J. Scudder, E. Chen, "Graceful
     Restart Mechanism for BGP," Work in Progress (draft-ietf-idr-
     restart-06), January 2003.

[BGP-CAP]
     Chandra, R., J. Scudder, "Capabilities Advertisement with BGP-4,"
     RFC 2842, May 2000.

[BGP-DYN-CAP]
     Chen, E. and S. Sangli, "Dynamic Capability for BGP-4," Work in
     Progress (draft-ietf-idr-dynamic-cap-03), December 2002.









Expires May 2004                                                [Page 9]


INTERNET DRAFT              Multisession BGP               November 2003


10. Security Considerations

   This document introduces no new security vulnerabilities to BGP or
   other specifications referenced in this document.


11. IANA Considerations

   TBD


12. Authors' Addresses

   John G. Scudder
   Cisco Systems, Inc.
   100 S. Main Suite 200
   Ann Arbor, MI 48104
   Email: jgs@cisco.com

   Chandra Appanna
   Cisco Systems, Inc.
   170 West Tasman Drive
   San Jose, CA 95134
   e-mail: achandra@cisco.com


13. Full Copyright Statement

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this doc-
   ument itself may not be modified in any way, such as by removing the
   copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of develop-
   ing Internet standards in which case the procedures for copyrights
   defined in the Internet Standards process must be followed, or as
   required to translate it into languages other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING



Expires May 2004                                               [Page 10]


INTERNET DRAFT              Multisession BGP               November 2003


   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MER-
   CHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.















































Expires May 2004                                               [Page 11]