Last Call Review of draft-ietf-pce-stateful-pce-18
review-ietf-pce-stateful-pce-18-opsdir-lc-morand-2017-03-16-00

Request Review of draft-ietf-pce-stateful-pce
Requested rev. no specific revision (document currently at 21)
Type Last Call Review
Team Ops Directorate (opsdir)
Deadline 2017-02-28
Requested 2017-02-14
Authors Edward Crabbe, Ina Minei, Jan Medved, Robert Varga
Draft last updated 2017-03-16
Completed reviews Opsdir Last Call review of -18 by Lionel Morand (diff)
Genart Last Call review of -18 by Joel Halpern (diff)
Assignment Reviewer Lionel Morand
State Completed
Review review-ietf-pce-stateful-pce-18-opsdir-lc-morand-2017-03-16
Reviewed rev. 18 (document currently at 21)
Review result Has Issues
Review completed: 2017-03-16

Review
review-ietf-pce-stateful-pce-18-opsdir-lc-morand-2017-03-16

I have reviewed this document as part of the Operational directorate's ongoing effort to review all IETF documents being processed by the IESG.  These comments were written with the intent of improving the operational aspects of the IETF drafts. Comments that are not addressed in last call may be included in AD reviews during the IESG review.  Document editors and WG chairs should treat these comments just like any other last call comments. 

I've pointed out some "issues" that might not be real issues for people involved in PCE but that could be at least clarified for readers of this document. Detailed comments are provided below.

Regards,

Lionel

********************
2.  Terminology

   This document uses the following terms defined in [RFC5440]: PCC,
   PCE, PCEP Peer, PCEP Speaker.

   This document uses the following terms defined in [RFC4655]: TED.

   This document uses the following terms defined in [RFC3031]: LSP.

   This document uses the following terms defined in
   [I-D.ietf-pce-stateful-pce-app]: Stateful PCE, Passive Stateful PCE,
   Active Stateful PCE, Delegation, LSP State Database.

[LM] I didn't find a clear definition of the LSP state, except through the definition of the LSP State database in ietf-pce-stateful-pce-app, i.e.

   The second is the LSP
   State Database (LSP-DB), in which a PCE stores attributes of all
   active LSPs in the network, such as their paths through the network,
   bandwidth/resource usage, switching types and LSP constraints.

As this document heavily relies on this LSP state concept, it could be useful to (re-)define it in this document or to find a 
correct reference for it.


3.2.  Objectives

   The objectives for the protocol extensions to support stateful PCE
   described in this document are as follows:

[...]

   o  Allow a PCC to delegate control of its LSPs to an active stateful
      PCE such that a given LSP is under the control of a single PCE at
      any given time.  A PCC may revoke this delegation at any time
      during the lifetime of the LSP.  If LSP delegation is revoked
      while the PCEP session is up, the PCC MUST notify the PCE about
      the revocation.  A PCE may return an LSP delegation at any point
      during the lifetime of the PCEP session.

[LM] I'm assuming that the PCE returning the delegation has also to notify the PCC. If so, maybe:
     
     "the If LSP delegation is returned by the PCE while the PCEP 
      session is up, the PCE MUST notify the PCE about the revocation."

[LM] the bullet point could be then splitted into two bullets, one for PCC, one for PCE.

5.4.  Capability Advertisement

   During PCEP Initialization Phase, PCEP Speakers (PCE or PCC)
   advertise their support of stateful PCEP extensions.  A PCEP Speaker
   includes the "Stateful PCE Capability" TLV, described in
   Section 7.1.1, in the OPEN Object to advertise its support for PCEP
   stateful extensions.  The Stateful Capability TLV includes the 'LSP
   Update' Flag that indicates whether the PCEP Speaker supports LSP
   parameter updates.

   The presence of the Stateful PCE Capability TLV in PCC's OPEN Object
   indicates that the PCC is willing to send LSP State Reports whenever
   LSP parameters or operational status changes.

   The presence of the Stateful PCE Capability TLV in PCE's OPEN message
   indicates that the PCE is interested in receiving LSP State Reports
   whenever LSP parameters or operational status changes.

[LM] is it "willing/interested for" or just a capability indication? It is not the same thing from a procedure point of view.

   The PCEP extensions for stateful PCEs MUST NOT be used if one or both
   PCEP Speakers have not included the Stateful PCE Capability TLV in
   their respective OPEN message.  If the PCEP Speaker on the PCC
   supports the extensions of this draft but did not advertise this
   capability, then upon receipt of PCUpd message from the PCE, it MUST
   generate a PCErr with error-type 19 (Invalid Operation), error-value
   2 (Attempted LSP Update Request if the stateful PCE capability was
   not advertised)(see Section 8.5) and it SHOULD terminate the PCEP
   session.  If the PCEP Speaker on the PCE supports the extensions of
   this draft but did not advertise this capability, then upon receipt
   of a PCRpt message from the PCC, it MUST generate a PCErr with error-
   type 19 (Invalid Operation), error-value 5 (Attempted LSP State
   Report if active stateful PCE capability was not advertised) (see
   Section 8.5) and it SHOULD terminate the PCEP session. 

[LM] why the recommendation for closing the session if an explicit error is sent back? The session could remain open i.e. except stateful PCEP extensions,everything else would be fine. If the session termination is the right thing to do, as we are in a clear error case (as the PCEP speaker should not send the report or the update), the SHOULD should probably be a MUST hee. 

[LM] Is there any reason to use a specific error in such a case? The PCE could just behave as a PCE not supporting the feature. Why is it important for the supporting PCEP speaker to make the difference? The generic behavior described in RFC5440 is not enough?

[LM] active/passive mode are not  advertized in PCEP.
s/if active stateful PCE capability was not advertised/if stateful PCE capability was not advertised

From RFC5440: "6.9.  Reception of Unknown Messages

   A PCEP implementation that receives an unrecognized PCEP message MUST
   send a PCErr message with Error-value=2 (capability not supported).

   If a PCC/PCE receives unrecognized messages at a rate equal or
   greater than MAX-UNKNOWN-MESSAGES unknown message requests per
   minute, the PCC/PCE MUST send a PCEP CLOSE message with close
   value="Reception of an unacceptable number of unknown PCEP message".
   A RECOMMENDED value for MAX-UNKNOWN-MESSAGES is 5.  The PCC/PCE MUST
   close the TCP session and MUST NOT send any further PCEP messages on
   the PCEP session."

[LM] If it is the case, the error handling would be the same between a PCEP speaker supporting the feature but not advertizing it and a PCEP speaker not supporting the feature. And it would not be possible possible to use the specific error responses to infer the capability supported by the other node.

   Note that even if the update
   capability has not been advertised, a PCE can still accept LSP Status
   Reports from a PCC and build and maintain an up to date view of the
   state of the PCC's LSPs.

[LM] I don't undersand. Is it not in contradiction with
  
  "If the PCEP Speaker on the PCE supports the extensions of
   this draft but did not advertise this capability, then upon receipt
   of a PCRpt message from the PCC, it MUST generate a PCErr with error-
   type 19 (Invalid Operation), error-value 5 (Attempted LSP State
   Report if active stateful PCE capability was not advertised) (see
   Section 8.5) and it SHOULD terminate the PCEP session."

Or does it mean that there is another way than PCRpt message for the PCC to send LSP status reports to the PCE?

[LM] In this section, it is not described how non-supporting PCEP speaker will hanlded the new TLV. It could be said that unrecognized TLVs will be ignored (as per RFC5440) and OPEN messages will be handled as per RFC5440 (if the comment above is accepted).

[LM] Would it be useful to discover (using another TLV) whether the PCE is an active/passive stateful PCE, as in IGP-based capabilities discovery mechanism?

5.5.  IGP Extensions for Stateful PCE Capabilities Advertisement

   When PCCs are LSRs participating in the IGP (OSPF or IS-IS), and PCEs
   are either LSRs or servers also participating in the IGP, an
   effective mechanism for PCE discovery within an IGP routing domain
   consists of utilizing IGP advertisements.  

[...]

   Note that while active and passive stateful PCE capabilities may be
   advertised during discovery, PCEP Speakers that wish to use stateful
   PCEP MUST negotiate stateful PCEP capabilities during PCEP session
   setup, as specified in the current document.  A PCC MAY initiate
   stateful PCEP capability negotiation at PCEP session setup even if it
   did not receive any IGP PCE capability advertisements.
 
[LM] As said above, an as stated in section 5.4:
  "The PCEP extensions for stateful PCEs MUST NOT be used if one or both
   PCEP Speakers have not included the Stateful PCE Capability TLV in
   their respective OPEN message."

What is the real added-value of this IGP-based mechanism? Only to be able to identify active PCE/passive PCE mode?

5.6.  State Synchronization

[LM] "State" could be misleading. "LSP State Sync" would be more approriate/accurate I think.

   The purpose of State Synchronization is to provide a checkpoint-in-
   time state replica of a PCC's LSP state in a PCE.  State
   Synchronization is performed immediately after the Initialization
   phase ([RFC5440]).

   During State Synchronization, a PCC first takes a snapshot of the
   state of its LSPs state, then sends the snapshot to a PCE in a
   sequence of LSP State Reports.  Each LSP State Report sent during
   State Synchronization has the SYNC Flag in the LSP Object set to 1.
   The set of LSPs for which state is synchronized with a PCE is
   determined by advertised stateful PCEP capabilities and PCC's local
   configuration (see more details in Section 9.1).

[LM] It seems that :

   "The set of LSPs for which state is synchronized"

should be:

   "The set of LSPs for which state is to be synchronized"

[LM] I don't see how the "advertised stateful PCEP capabilities" have an effect on the set of LSP to synchronize. The capabilities adv is only used to discover stateful PCE capabilities and see if the related PCEP extensions can be used. The identification of the LSPs to synchronize seems to be only based on policies configured in the PCC.

   The end of synchronization marker is a PCRpt message with the SYNC
   Flag set to 0 for an LSP Object with PLSP-ID equal to the reserved
   value 0 (see Section 7.3).  In this case, the LSP Object SHOULD NOT
   include the SYMBOLIC-PATH-NAME TLV and SHOULD include the LSP-
   IDENTIFIERS TLV with the special value of all zeroes.  The PCRpt
   message MUST include an empty ERO as its intended path and SHOULD NOT
   include the optional RRO object for its actual path.

[LM] What is the reason for the use of "SHOULD" here, instead of "MUST"?

   If the PCC has
   no state to synchronize, it SHOULD only send the end of
   synchronization marker.

[LM] Is it when there is no active LSP? If it is, it could be useful to clarify it. And it is maybe not so related to the text just before. It could be clarified in a separate paragraph.

   A PCE SHOULD NOT send PCUpd messages to a PCC before State
   Synchronization is complete.  A PCC SHOULD NOT send PCReq messages to
   a PCE before State Synchronization is complete.  This is to allow the
   PCE to get the best possible view of the network before it starts
   computing new paths.

[LM] it is obviously assumed that this requirement is true for PCC/PCE explicitly advertizing support of stateful PCE capability.

   If the PCC encounters a problem which prevents it from completing the
   state transfer, it MUST send a PCErr message with error-type 20 (LSP
   State Synchronization Error) and error-value 5 (indicating an
   internal PCC error) to the PCE and terminate the session.

[LM] s/state transfer/LSP state synchornization

5.7.1.  Delegating an LSP

   A PCC delegates an LSP to a PCE by setting the Delegate flag in LSP
   State Report to 1.  If the PCE does not accept the LSP Delegation, it
   MUST immediately respond with an empty LSP Update Request which has
   the Delegate flag set to 0.  If the PCE accepts the LSP Delegation,
   it MUST set the Delegate flag to 1 when it sends an LSP Update
   Request for the delegated LSP (note that this may occur at a later
   time).  The PCE MAY also immediately acknowledge a delegation by
   sending an empty LSP Update Request which has the Delegate flag set
   to 1.

[LM] if the delegation is not aceepted by the PCE, I assume that the PCC should not set the Delegate flag in subsequent LSP state report.
If it is, this could be clarified somewhere in this section.

[..]

   Note that for an LSP to remain delegated to a PCE, the PCC MUST set
   the Delegate flag to 1 on each LSP Status Report sent to the PCE.

[LM] s/each LSP Status Report/each LSP State Report for this LSP.

5.7.2.1.  Explicit Revocation

   When a PCC decides that a PCE is no longer permitted to modify an
   LSP, it revokes that LSP's delegation to the PCE.  A PCC may revoke
   an LSP delegation at any time during the LSP's life time.  A PCC
   revoking an LSP delegation MAY immediately clear the LSP state
   provided by the PCE, but to avoid traffic loss, it SHOULD do so in a
   make-before-break fashion.

[LM] After PCE delegation revocation, is there really a requirement for LSP state clean-up? The revocation is used to remove the authorization of LSP state update to the PCE but I don't see why the PCC would clear LSP state after revocation. The LSP state can be updated using operator-defined default parameters, right?

   If the PCC has received but not yet acted
   on PCUpd messages from the PCE for the LSP whose delegation is being
   revoked, then it SHOULD ignore these PCUpd messages when processing
   the message queue.  All effects of all messages for which processing
   started before the revocation took place MUST be allowed to complete
   and the result MUST be given the same treatment as any LSP that had
   been previously delegated to the PCE (e.g. the state MAY be
   immediately cleared).  

[LM] If I correctly understood the text above, after revocation of the PCE delegation,
- any PCUpd should be rejected and not ignored
- the LSP state(s) that was delegated to the PCE cannot be changed until all the messages in the message queue have been processed. 

If I'm correct, the text above could be clarified.  

   Any further PCUpd messages from the PCE are
   handled according to the PCUpd procedures described in this document.

[LM] Not understood. Further PCUpd "will result in the PCC sending a PCErr message" as said after the figure.

5.7.2.2.  Revocation on Redelegation Timeout

   When a PCC's PCEP session with a PCE terminates unexpectedly, the PCC
   MUST wait the time interval specified in Redelegation Timeout
   Interval before revoking LSP delegations to that PCE and attempting
   to redelegate LSPs to an alternate PCE.  If a PCEP session with the
   original PCE can be reestablished before the Redelegation Timeout
   Interval timer expires, LSP delegations to the PCE remain intact.

[LM] In case of PCEP session interruption, is there a requirement for the PCE to update delegated LSP states in order to ensure the synchronization between states in the PCE and in the PCC? 

   Likewise, when a PCC's PCEP session with a PCE terminates
   unexpectedly, the PCC MUST wait for the State Timeout Interval before
   flushing any LSP state associated with that PCE.

[LM] it should be clarified that the "flushing" applies only if there is no other PCE. Otherwise, see section 5.7.4

   Note that the State
   Timeout Interval timer may expire before the PCC has redelegated the
   LSPs to another PCE, for example if a PCC is not connected to any
   active stateful PCE or if no connected active stateful PCE accepts
   the delegation.  

[LM] Not sure to understand. Is it "if a PCC is not connected to any OTHER active stateful PCE or if no OTHER connected active stateful PCE accepts the delegation?

   In this case, the PCC MUST flush any LSP state set
   by the PCE upon expiration of the State Timeout Interval and revert
   to operator-defined default parameters or behaviors.  This operation
   SHOULD be done in a make-before-break fashion.

[LM] I think it is important to make the distinction between PCC with only one PCE and PCC with N PCE. The "Note that" could be put in a separate section.

   The State Timeout Interval MUST be greater than or equal to the
   Redelegation Timeout Interval and MAY be set to infinity (meaning
   that until the PCC specifically takes action to change the parameters
   set by the PCE, they will remain intact).

[LM] reading "the State Timeout Interval timer may expire before the PCC has redelegated the LSPs to another PCE" could be understood as STI timer < RTI timer. And here, it is stated STI timer >= RTI timer. Depending on the clarification on the previous comment, this text could be also clarified (or not)

5.7.3.  Returning a Delegation

   In order to keep a delegation, a PCE MUST set the Delegate flag to 1
   on each LSP Update Request sent to the PCC.  A PCE that no longer
   wishes to update an LSP's parameters SHOULD return the LSP delegation
   back to the PCC by sending an empty LSP Update Request which has the
   Delegate flag set to 0.  If a PCC receives an LSP Update Request with
   the Delegate flag set to 0 (whether the LSP Update Request is empty
   or not), it MUST treat this as a delegation return.

                     +-+-+                    +-+-+
                     |PCC|                    |PCE|
                     +-+-+                    +-+-+
                       |                        |
                       |---PCRpt, Delegate=1--->| LSP delegated
                       |            .           |
                       |            .           |
                       |            .           |
                       |<--PCUpd, Delegate=0----| Delegation returned
                       |                        |
                       |---PCRpt, Delegate=0--->| No delegation for LSP
                       |                        |

                     Figure 6: Returning a Delegation

   If a PCC cannot delegate an LSP to a PCE (for example, if a PCC is
   not connected to any active stateful PCE or if no connected active
   stateful PCE accepts the delegation), the LSP delegation on the PCC
   will time out within a configurable Redelegation Timeout Interval and
   the PCC MUST flush any LSP state set by a PCE at the expiration of
   the State Timeout Interval.

[LM] same comment: is it "for example, if a PCC is not connected to any OTHER active stateful PCE or if no OTHER connected active stateful PCE accepts the delegation"?

[LM] "the PCC MUST flush any LSP state set" should be completed by "and revert to operator-defined default parameters or behaviors", right?

5.7.5.  Redelegation on PCE Failure

[...]

   If the PCE crashes but recovers within the Redelegation Timeout, both
   the delegation state and the LSP state are kept intact.

[LM] "Recover" here seems to be associated with the PCEP session (as linked to the Redelegation Timeout), not with the LSP states maintained by the PCE.

[LM] How can the PCC be ensured that the PCE has not been impacted by the stop/restart and that LSP states are intact? Is there any need for a sanity check?

   If the PCE crashes but does not recover within the Redelegation
   Timeout, the delegation state is returned to the PCC.  If the PCC can
   redelegate the LSPs to another PCE, and that PCE accepts the
   delegations, there will be no change in LSP state.  If the PCC cannot
   redelegate the LSPs to another PCE, then upon expiration of the State
   Timeout Interval, the state set by the PCE is flushed, which may
   cause change in the LSP state.

[LM] the last sentence is difficult to understand. How to understand "flush the state MAY cause change in the LSP state"? It would like talking about two different states in the PCC.
[LM] about "flushed", should we add "and revert to operator-defined default parameters or behaviors"?

[...]

   If there is a standby PCE, the Redelegation Timeout may be set to 0
   through policy on the PCC, causing the LSPs to be redelegated
   immediately to the PCC, which can delegate them immediately to the
   standby PCE.  Assuming the State Timeout Interval is greater than the
   Redelegation Timeout, and assuming the standby PCE takes similar
   decisions as the failed PCE, the LSP state will be kept intact.

[LM] the first "assuming" is not an assumption but a requirement, according to "The State Timeout Interval MUST be greater than or equal to the Redelegation Timeout Interval"

5.8.1.  Passive Stateful PCE Path Computation Request/Response

                     +-+-+                    +-+-+
                     |PCC|                    |PCE|
                     +-+-+                    +-+-+
                       |                        |
   1) Path computation |----- PCReq message --->|
      request sent to  |                        |2) Path computation
      PCE              |                        |   request received,
                       |                        |   path computed
                       |                        |
                       |<---- PCRep message ----|3) Computed paths
                       |     (Positive reply)   |   sent to the PCC
                       |     (Negative reply)   |
   4) LSP Status change|                        |
      event            |                        |
                       |                        |
   5) LSP Status Report|----- PCRpt message --->|
      sent to all      |            .           |
      stateful PCEs    |            .           |
                       |            .           |
   6) Repeat for each  |----- PCRpt message --->|
      LSP status change|                        |
                       |                        |

     Figure 7: Passive Stateful PCE Path Computation Request/Response

[LM] in the figure, please use "state" instead of "status"


7.1.  OPEN Object

   This document defines one new optional TLVs for use in the OPEN
   Object.

s/one new optional TLVs/one new optional TLV

7.2.  SRP Object

   The SRP (Stateful PCE Request Parameters) object MUST be carried
   within PCUpd messages and MAY be carried within PCRpt and PCErr
   messages.  The SRP object is used to correlate between update
   requests sent by the PCE and the error reports and state reports sent
   by the PCC.

[LM] Should we add the following text at the end of the section?

  "Optional TLVs may be included within the SRP object body. The 
   specification of such TLVs is outside the scope of this document."

7.3.  LSP Object

[...]

   TLVs that may be included in the LSP Object are described in the
   following sections.

[LM] I think that it is possible to add extra optional TLV, not defined in this document. This should be clarified if I'm correct.