CCAMP Working Group                                              Yi Lin
Internet Draft                                      Huawei Technologies
Intended status: Standards Track                       November 3, 2019
Expires: May 2020




           RSVP-TE Extensions in Support of Proactive Protection
             draft-lin-ccamp-gmpls-proactive-protection-00.txt


Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on May 3, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in




Yi Lin                  Expires May 3, 2020                   [Page 1]


Internet-Draft          GMPLS Proactive Protection       November 2019


   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   This document describes protocol-specific procedures and extensions
   for Generalized Multi-Protocol Label Switching (GMPLS) Resource
   ReSerVation Protocol - Traffic Engineering (RSVP-TE) signaling to
   support Label Switched Path (LSP) Proactive Protection, which creates
   the protection LSP after a failure is predicted and before it
   becomes a real failure.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Overview of Predicted Failure and Related Recovery Methods
      3.1. Predicted Failure
      3.2. Proactive Protection
   4. Modified PROTECTION Object Format
   5. Extension to ERROR_SPEC Object
      5.1. New Error Code / Sub-code
      5.2. New TLV in ERROR_SPEC Object
   6. End-to-end Proactive Protection
      6.1. Creation of the Protected LSP
      6.2. Notification of Predicted Failure Event
      6.3. Tearing Down of the Protection LSP
   7. Proactive Segment Protection
      7.1. Creation of the Protected LSP
      7.2. Notification of Predicted Failure Event
      7.3. Tearing Down of the Segment Recovery LSP
      7.4. Priority and Resource Pre-emption
   8. Consideration of Backward Compatibility
   9. Security Considerations
   10. IANA Considerations
   11. References
      11.1. Normative References
      11.2. Informative References
   12. Authors' Addresses



1. Introduction

   [RFC4872] and [RFC4873] describe protocol-specific procedures and
   extensions for GMPLS RSVP-TE signaling to support end-to-end LSP
   recovery (including protection and restoration) and segment LSP
   recovery, respectively.

   Traditional protection solutions (e.g., 1+1 or 1:1 protection)
   provide very fast protection switching after a failure happens, but
   occupy twice the network resources during the whole lifetime of the
   LSP. Traditional restoration solutions, on the other hand, use
   network resources much more efficiently, but recover the LSP much
   more slowly, because additional signaling time is needed to create
   the restoration LSP after the failure.

   To reduce the resources reserved for recovery while keeping very
   fast protection switching, one approach is to use failure prediction
   technologies and to create the 1+1 or 1:1 protection only when a
   potential failure is predicted. This approach is referred to as
   "Proactive Protection" in this document.

   This document extends the RSVP-TE protocol to support the control of
   Proactive Protection.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3. Overview of Predicted Failure and Related Recovery Methods

3.1. Predicted Failure

   In most cases, there are some indications before a physical failure
   happens in a network. Examples include abnormal fluctuation of the
   noise of a lightpath, a rising BER (Bit Error Rate) before error
   correction, and a rising temperature of a transponder.

   Therefore, by monitoring certain physical parameters and analyzing
   their trend using, for example, Machine Learning (ML) or other
   technologies, a node can predict whether a failure will happen in an
   upcoming period of time.

   Note that a predicted failure is different from a Signal Degrade in
   that:

   -  When Signal Degrade happens to a connection, the connection is
      still available but the quality of the signal carried by this
      connection has declined and is lower than the predetermined
      threshold. For example, the BER of a connection rises and is out
      of tolerance.

   -  When a failure of a connection is predicted, neither failure nor
      degradation has happened yet, but there is a trend indicating
      that a failure will probably happen after a period of time and
      will then cause Signal Fail or Signal Degrade.

   The methods to predict failures are outside the scope of this
   document.

3.2. Proactive Protection

   "Proactive Protection" refers to an LSP protection approach that
   creates the protection LSP after a failure is predicted and before
   it becomes a real failure. Both end-to-end protection (defined in
   [RFC4872]) and segment protection (defined in [RFC4873]) are
   applicable to Proactive Protection.

   The main procedure of Proactive Protection is shown in Figure 1:


         |-> Predicted failure notification received
         |   |-> Proactive Protection path created
         |   |               |-> Real failure happens
         |   |               | |-> Protection switch finished
         |   |               | |
         |   |               | |     Protection path deleted <-|
         |   |               | |     if no failure happened    |
         |   |               | |                               |
         |   |        t3     | |                          t6   |
      ---+---+--------+======x=+==========================+----+---> t
         t1  t2       |     t4 t5                         |    t7
                      |                                   |
                      |<--Predicted failure time period-->|

                Figure 1: Overview of Proactive Protection

   -  t1: The protection source node of an LSP is notified that a
      failure will probably happen during t3~t6, so it starts to create
      1+1 or 1:1 protection of the connection. Here the protection
      source node can be the source node of the LSP (for end-to-end
      protection case), or a branch node located between the source node
      and the predicted failure point of the LSP (for segment protection
      case).

   -  t2: The 1+1 or 1:1 protecting path is created between the
      protection source node and the protection destination node. Here
      the protection destination node can be the destination node of the
      LSP (for end-to-end protection case), or a merge node located
      between the predicted failure point and the destination node of
      the LSP (for segment protection case).

   -  t4: If real failure happens as predicted, the 1+1 or 1:1
      protection switch will be triggered.

   -  t5: Protection switch finished and the service in the connection
      is recovered.

   -  t7: If the predicted failure did not in fact happen, and no
      further predicted failure notification is received, the protection
      source node MAY tear down the protecting path after t6, in order
      to save network resources.

4. Modified PROTECTION Object Format

   This document modifies the PROTECTION object (C-Type=2) by adding
   two new bits T and A in reserved fields, as shown in Figure 2 below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             | Class-Num(37) |  C-Type (2)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|P|N|O|T|  Res.   | LSP Flags |     Reserved      | Link Flags|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |I|R|A|  Reserved   | Seg.Flags |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 2: The modified PROTECTION object (C-Type=2)

   -  T (Triggered End-to-end Proactive Protection): 1 bit. When set
      (1), it indicates that end-to-end Proactive Protection is
      required.

     Note that if the T bit is set (1), the LSP Flags SHOULD be one of:
        0x04    1:N Protection with Extra-Traffic
        0x08    1+1 Unidirectional Protection
        0x10    1+1 Bidirectional Protection

   -  A (proActive Segment Protection): 1 bit. When set (1), it
      indicates that Proactive Segment Protection is required.

     Note that if the A bit is set (1), the Seg. Flags SHOULD be one of:
        0x04    1:N Protection with Extra-Traffic
        0x08    1+1 Unidirectional Protection
        0x10    1+1 Bidirectional Protection

   See [RFC4872] and [RFC4873] for the definition of other fields.
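
   As an illustration only, the bit layout of Figure 2 can be sketched
   in Python as follows. The function and constant names are
   hypothetical and are not defined by this document. The bit positions
   are read off Figure 2, and the object length of 12 octets assumes
   the 4-octet object header followed by the two 32-bit words shown.

```python
import struct

CLASS_NUM_PROTECTION = 37   # Class-Num shown in Figure 2
C_TYPE_PROTECTION = 2       # C-Type shown in Figure 2
OBJECT_LENGTH = 12          # assumed: 4-octet header + two 32-bit words

# Flag values the text recommends (SHOULD) when the T or A bit is set.
PROACTIVE_FLAG_VALUES = {0x04, 0x08, 0x10}


def pack_protection(*, s=False, p=False, n=False, o=False, t=False,
                    lsp_flags=0, link_flags=0,
                    i=False, r=False, a=False, seg_flags=0):
    """Encode the PROTECTION object of Figure 2 in network byte order."""
    word1 = ((int(s) << 31) | (int(p) << 30) | (int(n) << 29)
             | (int(o) << 28)
             | (int(t) << 27)                 # T: fifth bit of word 1
             | ((lsp_flags & 0x3F) << 16)     # 6-bit LSP Flags
             | (link_flags & 0x3F))           # 6-bit Link Flags
    word2 = ((int(i) << 31) | (int(r) << 30)
             | (int(a) << 29)                 # A: third bit of word 2
             | ((seg_flags & 0x3F) << 16))    # 6-bit Seg. Flags
    return struct.pack("!HBBII", OBJECT_LENGTH, CLASS_NUM_PROTECTION,
                       C_TYPE_PROTECTION, word1, word2)


def unpack_protection(data):
    """Decode the T and A bits and the flag fields they qualify."""
    if len(data) < OBJECT_LENGTH:
        raise ValueError("PROTECTION object is truncated")
    length, class_num, c_type, word1, word2 = struct.unpack(
        "!HBBII", data[:OBJECT_LENGTH])
    expected = (OBJECT_LENGTH, CLASS_NUM_PROTECTION, C_TYPE_PROTECTION)
    if (length, class_num, c_type) != expected:
        raise ValueError("not a PROTECTION object with C-Type 2")
    return {
        "t_bit": bool((word1 >> 27) & 1),
        "lsp_flags": (word1 >> 16) & 0x3F,
        "a_bit": bool((word2 >> 29) & 1),
        "seg_flags": (word2 >> 16) & 0x3F,
    }


def follows_flag_recommendation(fields):
    """True if the SHOULD-level flag guidance for T and A is respected."""
    t_ok = (not fields["t_bit"]
            or fields["lsp_flags"] in PROACTIVE_FLAG_VALUES)
    a_ok = (not fields["a_bit"]
            or fields["seg_flags"] in PROACTIVE_FLAG_VALUES)
    return t_ok and a_ok
```

   For example, pack_protection(t=True, lsp_flags=0x10) describes an
   LSP that requests end-to-end Proactive Protection with 1+1
   Bidirectional Protection.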

5. Extension to ERROR_SPEC Object

5.1. New Error Code / Sub-code

   A new Error Sub-code under Error Code "25 - Notify Error" is defined
   in this document, which is used to notify the event of a predicted
   failure:

   Error Code = 25: "Notify Error" (see [RFC3209])

   Error Sub-code = TBA: "Notify Error/LSP Local Predicted Failure"

5.2. New TLV in ERROR_SPEC Object

   When a failure is predicted, the time by which the failure is
   expected to happen may also be predicted. This time information
   lets the source node know how long it should wait for the predicted
   failure to become a real failure, and decide when it is safe to tear
   down the protection LSP if the predicted failure did not happen.

   A new TLV in the IPv4/IPv6 IF_ID ERROR_SPEC object is defined in
   this document. It indicates the time before which the predicted
   failure will probably become a real failure. The format of this new
   TLV is shown in Figure 3 below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type = TBA           |          Length = 8           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              Time                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 3: New TLV (type=TBA) in ERROR_SPEC Object

   -  Type: TBA

   -  Length: 8

   -  Time: A relative time measured in seconds, which indicates within
      how many seconds (from the current time) the predicted failure
      will probably become a real failure.
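
   A minimal Python sketch of encoding and decoding the TLV of Figure 3
   is given here for illustration. Two points are assumptions rather
   than statements of this document: the type value 0xFFFF is only a
   placeholder for the value still to be assigned (TBA), and the Time
   field is treated as an unsigned 32-bit integer in network byte
   order. All names are hypothetical.

```python
import struct

TLV_TYPE_PREDICTED_TIME = 0xFFFF   # placeholder only: real value is TBA
TLV_LENGTH = 8                     # Length field value given in Figure 3


def pack_time_tlv(seconds):
    """Encode the TLV of Figure 3: Type, Length = 8, Time in seconds."""
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("Time does not fit in the 32-bit field")
    return struct.pack("!HHI", TLV_TYPE_PREDICTED_TIME, TLV_LENGTH, seconds)


def unpack_time_tlv(data):
    """Return the Time value in seconds, or raise ValueError."""
    if len(data) < TLV_LENGTH:
        raise ValueError("TLV is truncated")
    tlv_type, length, seconds = struct.unpack("!HHI", data[:TLV_LENGTH])
    if tlv_type != TLV_TYPE_PREDICTED_TIME or length != TLV_LENGTH:
        raise ValueError("not a predicted failure time TLV")
    return seconds
```

   For example, pack_time_tlv(3600) signals that the predicted failure
   will probably become a real failure within one hour.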

6. End-to-end Proactive Protection

6.1. Creation of the Protected LSP

   To create an LSP with recovery type of "End-to-end Proactive
   Protection", the source node of the LSP generates a Path message
   with a PROTECTION object included. The T bit in the PROTECTION
   object MUST be set to 1 (End-to-end Proactive Protection), so that
   all other nodes along the LSP can start the failure prediction
   function on related links/nodes.

   Note that the N bit in the PROTECTION object is used to indicate
   whether the control plane message exchange is only used for
   notification or for protection-switching purposes after a real
   failure happens, see [RFC4872]. In other words, the N bit has
   nothing to do with the notification of a predicted failure before a
   real failure happens.

   To allow the notification of predicted failure event to the source
   node by the Notify message, the NOTIFY REQUEST object MUST also be
   included in the Path message (see [RFC3473]), where the "Notify Node
   Address" SHOULD be the address of the source node of the LSP.

6.2. Notification of Predicted Failure Event

   When an intermediate node on an LSP infers that a failure will
   happen and will affect the LSP, it sends a Notify message to the
   source node of the LSP to report the predicted failure event. The
   new error code/sub-code "Notify Error/LSP Local Predicted Failure"
   is used in the ERROR_SPEC object or IF_ID_ERROR_SPEC object in the
   Notify message.

   The Notify message MAY also include a TLV (type = TBA) in the IPv4
   or IPv6 IF_ID_ERROR_SPEC object, to indicate the time before which
   the predicted failure will probably become a real failure.

   On receiving the Notify message with error code/sub-code "Notify
   Error/LSP Local Predicted Failure", the source node of the LSP
   SHOULD trigger the procedure to create the protection LSP, according
   to the protection type indicated in the "LSP Flags" field of the
   PROTECTION object in the Path message for the protected LSP. The
   procedures for creating the protection LSP and for protection
   switching after a real failure happens are described in [RFC4872].
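
   The source node behavior described in this section might be
   sketched as follows. This is only an illustration under stated
   assumptions: the sub-code value 0xFFFF is a placeholder for the
   value still to be assigned (TBA), the class and function names are
   hypothetical, and the actual signaling of the protection LSP per
   [RFC4872] is reduced to setting a flag.

```python
from dataclasses import dataclass
from typing import Optional

ERROR_CODE_NOTIFY = 25               # "Notify Error"
SUBCODE_PREDICTED_FAILURE = 0xFFFF   # placeholder only: real value is TBA

# LSP Flags values listed for use with the T bit.
PROTECTION_TYPES = {
    0x04: "1:N Protection with Extra-Traffic",
    0x08: "1+1 Unidirectional Protection",
    0x10: "1+1 Bidirectional Protection",
}


@dataclass
class ProtectedLsp:
    t_bit: bool                      # T bit of the PROTECTION object
    lsp_flags: int                   # LSP Flags of the PROTECTION object
    protection_lsp_up: bool = False  # True once the protection LSP exists


def on_notify(lsp: ProtectedLsp, error_code: int,
              sub_code: int) -> Optional[str]:
    """Return the protection type to set up, or None if nothing to do."""
    if (error_code, sub_code) != (ERROR_CODE_NOTIFY,
                                  SUBCODE_PREDICTED_FAILURE):
        return None                  # not a predicted failure event
    if not lsp.t_bit or lsp.protection_lsp_up:
        return None                  # not requested, or already protected
    protection_type = PROTECTION_TYPES.get(lsp.lsp_flags)
    if protection_type is not None:
        # Stands in for creating the protection LSP per RFC 4872.
        lsp.protection_lsp_up = True
    return protection_type
```

   A repeated notification for an LSP whose protection LSP already
   exists returns None in this sketch; a real implementation would
   probably use it to refresh the teardown timer instead.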


6.3. Tearing Down of the Protection LSP

   After the protection LSP is created, the source node MAY start a
   timer T_wait and wait for the predicted failure to become a real
   failure. If no real failure happens and no further notification of
   a predicted failure is received before T_wait expires, the source
   node MAY trigger the procedure to tear down the protection LSP,
   according to local policy. See [RFC4872] for the process of tearing
   down a protection LSP.

   Implementations SHOULD allow this policy to be configured to provide
   a default across all LSPs on a node, but SHOULD also allow it to be
   configured per LSP.

   Note that T_wait MUST be longer than the time indicated in the TLV
   (type=TBA) in the ERROR_SPEC object in the Notify message, if the
   TLV exists.

   Note also that the value of T_wait is a local matter of the source
   node, and is outside the scope of this document.
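
   Since the value of T_wait is a local matter, the following Python
   sketch shows only one possible local policy, not a required one. It
   assumes that T_wait is the TLV time plus a positive margin, that a
   configured default is used when the TLV is absent, and that a
   further notification can only extend the deadline. The class name,
   its parameters and the explicit "now" argument are hypothetical.

```python
from typing import Optional


class TeardownTimer:
    """Tracks when the protection LSP may be torn down (times in seconds)."""

    def __init__(self, default_wait: float, margin: float):
        if margin <= 0:
            # A positive margin keeps T_wait longer than the TLV time.
            raise ValueError("margin must be positive")
        self.default_wait = default_wait
        self.margin = margin
        self.deadline: Optional[float] = None

    def arm(self, now: float, tlv_time: Optional[int] = None) -> float:
        """Start or extend the timer; return the T_wait that was used.

        Called when the protection LSP is created and again for each
        further predicted failure notification.
        """
        if tlv_time is None:
            t_wait = self.default_wait
        else:
            t_wait = tlv_time + self.margin
        new_deadline = now + t_wait
        if self.deadline is None or new_deadline > self.deadline:
            self.deadline = new_deadline
        return t_wait

    def on_real_failure(self) -> None:
        """The protection LSP is now in use, so stop the timer."""
        self.deadline = None

    def may_tear_down(self, now: float) -> bool:
        """True once T_wait has expired without a real failure."""
        return self.deadline is not None and now >= self.deadline
```

   With a margin of 60 seconds and a TLV time of 300 seconds, the
   protection LSP becomes eligible for teardown 360 seconds after the
   timer is armed, unless a real failure or a further notification
   arrives first.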

7. Proactive Segment Protection

7.1. Creation of the Protected LSP

   To create an LSP with recovery type of "Proactive Segment
   Protection", the source node of the LSP generates a Path message,
   where:

   -  A PROTECTION object is included, where the A bit MUST be set to 1
      (Proactive Segment Protection), so that all nodes along the
      protected LSP can start the failure prediction function on related
      links/nodes if supported. The "Seg. Flags" are used to indicate
      the protection type of the Proactive Segment Protection.

   -  One or more SERO objects MAY be included (i.e., explicit Proactive
      Segment Protection), indicating the branch node and the merge node
      of each segment recovery LSP. If no SERO object is included, the
      dynamic Proactive Segment Protection method is used.

   -  A NOTIFY REQUEST object is included, where the "Notify Node
      Address" SHOULD be the address of the source node of the LSP.

   For explicit Proactive Segment Protection, when a branch node
   receives a Path message with the A bit set to 1 in the PROTECTION
   object, the branch node follows [RFC4873] to process the Path
   message, except that the Path message for the recovery LSP is not
   generated or sent at this stage. Also, one more NOTIFY REQUEST
   object SHOULD be added to the Path message of the protected LSP,
   carrying the address of this branch node.

   For dynamic Proactive Segment Protection, when an intermediate node
   receives a Path message with A bit set to 1 in the PROTECTION
   object, the node will determine if it has the ability to be a branch
   node, as described in Section 6.2 of [RFC4873]. If yes, it follows
   the same procedure as what a branch node does in the case of
   explicit Proactive Segment Protection, as described above. If not,
   the node only follows the standard procedure to create the protected
   LSP.
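
   The branch node decision for the explicit and dynamic cases can be
   summarized by the following Python sketch. It is a simplification
   for illustration: nodes are identified by plain strings, the SERO
   objects are reduced to a list of branch node identifiers, the
   capability check of Section 6.2 of [RFC4873] is reduced to a
   boolean, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def process_path(node: str, a_bit: bool,
                 sero_branch_nodes: Sequence[str], can_branch: bool,
                 notify_nodes: Sequence[str]) -> Tuple[bool, List[str]]:
    """Decide whether this node acts as a branch node for the LSP.

    Returns (is_branch, notify_nodes_to_forward). A branch node defers
    the Path message of the recovery LSP and adds its own address as
    one more Notify Node Address.
    """
    forwarded = list(notify_nodes)
    if not a_bit:
        return False, forwarded      # no Proactive Segment Protection
    if sero_branch_nodes:
        # Explicit case: the SERO objects name the branch nodes.
        is_branch = node in sero_branch_nodes
    else:
        # Dynamic case: the node decides from its own capability.
        is_branch = can_branch
    if is_branch and node not in forwarded:
        forwarded.append(node)
    return is_branch, forwarded
```

   For example, process_path("B", True, ["B"], False, ["S"]) returns
   (True, ["S", "B"]), so that nodes downstream of B can later send
   their Notify messages to B.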

7.2. Notification of Predicted Failure Event

   When an intermediate node between a pair of branch and merge nodes
   on an LSP infers that a failure will happen and will affect the LSP,
   it sends a Notify message to the nearest branch node in the upstream
   direction of the LSP to report the predicted failure event. The
   error code/sub-code "Notify Error/LSP Local Predicted Failure" is
   used in the ERROR_SPEC object or IF_ID_ERROR_SPEC object in the
   Notify message.

   Similar to End-to-end Proactive Protection, the time before which
   the predicted failure may occur MAY also be included in the Notify
   message.

   On receiving the Notify message with error code/sub-code "Notify
   Error/LSP Local Predicted Failure", the branch node on the protected
   LSP SHOULD generate a new Path message, and send this new Path
   message along the recovery LSP between the branch and the merge
   nodes. The procedures for generating the new Path message and
   creating the recovery LSP are the same as those described in
   [RFC4873], except that the A bit in the PROTECTION object of this
   new Path message MUST be set to 1.
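
   How a reporting node picks the target of its Notify message can be
   illustrated with a short Python sketch. It assumes that the node
   knows the ordered hop list of the protected LSP and the set of
   branch nodes, for example from the Notify Node Addresses it has
   seen; both inputs and the function name are hypothetical.

```python
from typing import Collection, Optional, Sequence


def nearest_upstream_branch(path: Sequence[str],
                            branch_nodes: Collection[str],
                            reporter: str) -> Optional[str]:
    """Return the closest branch node upstream of the reporting node.

    `path` lists the hops of the protected LSP from source to
    destination. Raises ValueError if `reporter` is not on the path.
    """
    position = path.index(reporter)
    # Walk from the hop just before the reporter back to the source.
    for node in reversed(path[:position]):
        if node in branch_nodes:
            return node
    return None
```

   With the topology of Figure 4, the path is S, B, N-4, M, D and B is
   the only branch node, so a failure predicted by N-4 is reported to
   B.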

7.3. Tearing Down of the Segment Recovery LSP

   After the segment recovery LSP is created, the branch node MAY start
   a timer T_wait and wait for the predicted failure to become a real
   failure. If no real failure happens and no further notification of
   a predicted failure is received before T_wait expires, the branch
   node MAY trigger the procedure to tear down the segment recovery
   LSP, according to local policy. See [RFC4873] for the process of
   tearing down a segment recovery LSP.

   Implementations SHOULD allow this policy to be configured to provide
   a default across all LSPs on a node, but SHOULD also allow it to be
   configured per LSP.

   Note that T_wait MUST be longer than the time indicated in the TLV
   (type=TBA) in the ERROR_SPEC object in the Notify message, if the
   TLV exists.

   Note also that the value of T_wait is a local matter of the branch
   node, and is outside the scope of this document.

7.4. Priority and Resource Pre-emption

   It is possible that, after the recovery LSP is created and before
   the predicted failure becomes a real failure, another real failure
   happens on the LSP outside the protected segment. In this case, the
   source node (or an intermediate node in the upstream direction of
   the real failure) may start a restoration procedure to recover the
   LSP. For the same protected LSP, since recovering from a real
   failure always has higher priority than protecting against a
   predicted failure which still has not happened, the restoration LSP
   can pre-empt the resources of the segment recovery LSP.

   As shown in Figure 4, assume that node B (branch node) was notified
   of a predicted failure event between N-4 and M (merge node), and has
   created the segment recovery LSP along B, N-1, N-2, N-3 and M. If
   another failure between S (source node) and B happens before the
   predicted failure becomes a real failure, node S will try to create
   the restoration LSP. Since resources are limited, the restoration
   LSP can pre-empt the resources of the segment recovery LSP between
   N-1 and N-3.

   The nodes along the segment recovery LSP have enough information to
   determine whether pre-emption is allowed. This is because these
   nodes know that:

   -  The current segment recovery LSP is used for Proactive Segment
      Protection through the A bit in the PROTECTION object;

   -  The segment recovery LSP and the restoration LSP are protecting
      the same LSP through the association relationship.

                      |<------ Pre-emption ------>|
                      |                           |
     ***************************************************************
     *+---+         +---+         +---+         +---+         +---+*
     *|   +---------+N-1+---------+N-2+---------+N-3+---------+   |*
     *+-+-+         +-+-+         +---+         +-+-+         +-+-+*
     *  |             |###########################|             |  *
     *  |             |#                         #|             |  *
     *  |             |#                         #|             |  *
     *+-+-+         +-+-+         +---+         +-+-+         +-+-+*
   ***| S +----X----+ B +---------+N-4+----?----+ M +---------+ D |***
      +---+         +---+         +---+         +---+         +---+
   ===================================================================

     S: Source node     D: Destination node
     B: Branch node     M: Merge node
     X: Real failure    ?: Predicted failure (has not happened yet)

     =====: Protected LSP
     #####: Segment Recovery LSP
     *****: Restoration LSP

             Figure 4: Resource pre-emption by restoration LSP
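
   The pre-emption check that a node on the segment recovery LSP could
   apply is sketched below in Python. The sketch assumes that the
   association relationship can be compared through a single
   identifier and that a node can tell that the requesting LSP is a
   restoration LSP; the class, its fields and the function name are
   hypothetical and are not defined by this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryLsp:
    association_id: int           # identifies the protected LSP
    a_bit: bool = False           # created for Proactive Segment Protection
    is_restoration: bool = False  # created to recover from a real failure


def may_preempt(requester: RecoveryLsp, holder: RecoveryLsp) -> bool:
    """True if `requester` may take over resources held by `holder`.

    Both conditions listed in this section must hold: the holder is a
    proactive segment recovery LSP, and both LSPs protect the same LSP.
    """
    return (requester.is_restoration
            and holder.a_bit
            and requester.association_id == holder.association_id)
```

   In the scenario of Figure 4, the restoration LSP created by S and
   the segment recovery LSP created by B are associated with the same
   protected LSP, so the check succeeds on the nodes N-1 to N-3.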

8. Consideration of Backward Compatibility

   TBD.

   [Editor's note]: will add some description about interwork with
   legacy nodes which do not support the function of failure prediction
   and reporting.

9. Security Considerations

   TBD.

10. IANA Considerations

   IANA assigns values to RSVP protocol parameters. Within the current
   document, a new Error code/sub-code value is defined:

   Error Code = 25: "Notify Error" (see [RFC3209])

      o  "Notify Error/LSP Local Predicted Failure"   (TBA)

11. References

11.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, DOI
             10.17487/RFC2119, March 1997.

   [RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
             and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
             Tunnels", RFC 3209, December 2001.

   [RFC3473] Berger, L., Ed., "Generalized Multi-Protocol Label
             Switching (GMPLS) Signaling Resource ReserVation Protocol-
             Traffic Engineering (RSVP-TE) Extensions", RFC 3473,
             January 2003.

   [RFC4872] Lang, J., Ed., Rekhter, Y., Ed., and D. Papadimitriou,
             Ed., "RSVP-TE Extensions in Support of End-to-End
             Generalized Multi-Protocol Label Switching (GMPLS)
             Recovery", RFC 4872, May 2007.

   [RFC4873] Berger, L., Bryskin, I., Papadimitriou, D., and A. Farrel,
             "GMPLS Segment Recovery", RFC 4873, May 2007.

   [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
             2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
             May 2017.

11.2. Informative References

   [RFC4426] Lang, J., Ed., Rajagopalan, B., Ed., and D. Papadimitriou,
             Ed., "Generalized Multi-Protocol Label Switching (GMPLS)
             Recovery Functional Specification", RFC 4426, March 2006.

12. Authors' Addresses

   Yi Lin
   Huawei Technologies
   F3 R&D Center, Huawei Industrial Base,
   Bantian, Longgang District,
   Shenzhen 518129 P.R.China
   Email: yi.lin@huawei.com