Early Review of draft-ietf-pals-endpoint-fast-protection-00
review-ietf-pals-endpoint-fast-protection-00-rtgdir-early-chen-2015-08-25-00

Request Review of draft-ietf-pals-endpoint-fast-protection
Requested rev. no specific revision (document currently at 05)
Type Early Review
Team Routing Area Directorate (rtgdir)
Deadline 2015-08-25
Requested 2015-08-10
Authors Yimin Shen, Rahul Aggarwal, Wim Henderickx, Yuanlong Jiang
Draft last updated 2015-08-25
Completed reviews Genart Last Call review of -04 by Dale Worley
Secdir Last Call review of -04 by Chris Lonvick
Opsdir Last Call review of -04 by Susan Hares
Rtgdir Early review of -00 by Mach Chen
Rtgdir Early review of -00 by John Drake
Tsvart Last Call review of -04 by David Black
Assignment Reviewer Mach Chen 
State Completed
Review review-ietf-pals-endpoint-fast-protection-00-rtgdir-early-chen-2015-08-25
Reviewed rev. 00 (document currently at 05)
Review result Not Ready
Review completed: 2015-08-25

Review

Hello, 

I have been selected as the Routing Directorate reviewer for this draft. The Routing Directorate seeks to review all routing or routing-related drafts as they pass through IETF last call and IESG review, and sometimes on special request. The purpose of the review is to provide assistance to the Routing ADs. For more information about the Routing Directorate, please see 

http://trac.tools.ietf.org/area/rtg/trac/wiki/RtgDir

 
Although these comments are primarily for the use of the Routing ADs, it would be helpful if you could consider them along with any other IETF Last Call comments that you receive, and strive to resolve them through discussion or by updating the draft.

Document: draft-ietf-pals-endpoint-fast-protection-00.txt

Reviewer: Mach Chen
Review Date: 20 August 2015
IETF LC End Date: 
Intended Status: Standards Track

Summary: 
This draft defines a fast mechanism for protecting pseudowires against egress attachment circuit failure, egress PE failure, multi-segment PW terminating PE failure, and multi-segment PW switching PE failure. The solution works, but it is based on many assumptions that are not explicitly discussed or detailed in the document, including PLR determination, context identifier advertisement, PSN tunnel protocol extensions, etc. I personally feel that the solution proposed in the document is somewhat complicated; given that many protection mechanisms already exist, it is arguable whether the use case and requirements justify such a complicated solution.

Comments:
Overall, I think the draft should at least resolve the following comments and questions before moving forward.

Major Issues:
Several places in the document state "it's outside the scope of this document", but IMHO these items are critical to the solution and to interoperability, and they should be described in detail either in this document or in a parallel document (cited as a reference).

Minor Issues:

1. 
The idnits tool reports a number of nits that need to be fixed.

2.
Abstract

"This document specifies a fast mechanism for protecting pseudowires
   against egress endpoint failures,..."

Since S-PE and egress AC failures are also covered, "egress endpoint failures" may not be the right description.

3. 
Introduction:
"In each direction between the PEs, PW packets are transported by a PSN tunnel, which is also called a transport tunnel." 

"PSN tunnel" is a well-known and commonly used term in the PW context, so there seems to be no need to introduce a new term (transport tunnel) here.

4.
Section 1, last sentence of the 4th paragraph (similar to comment 2):

s/following egress endpoint failures/following failures/

5.
Section 4.1

s/If transport tunnels are LDP/If transport tunnels are LDP-based tunnels/

6. 
Section 4.1:
"The mechanism is also assumed to be used in conjunction with global
   repair and control plane repair, in such a manner that the mechanism
   temporarily repairs traffic by using a bypass tunnel, and global
   repair and control plane repair eventually move traffic to a fully
   functional path."

Did you consider the situation where protection is also employed for the PSN tunnel? For example, the PSN tunnel may be protected by some FRR mechanism (this should be very typical). When there is a failure between the PLR and the primary PE, which protection will take priority?


7. 
Section 4.2:
"
A PLR can realize its role based on configuration or the signaling of
   transport tunnel.  For example, in the case where the transport
   tunnel is signaled by RSVP, the penultimate hop router can realize
   that it is the PLR for egress (T-)PE or S-PE failure based on the RRO
   in Resv message, which should indicate that the router is one hop
   away from the PE.  The detail of how this could be achieved on a per-
   protocol basis is out of the scope of this document."

PLR determination is critical to this "endpoint protection" mechanism, and I am not sure it is legitimate to state that it is out of scope here. Too many situations need to be considered. For example: How does an LSR know that it should enable the PLR function? Is the PLR function enabled by default, or by some other means? Since a potential PLR (a P router) has no information about the PW, it cannot know which PSN tunnel a PW will be bound to, so how can a PLR realize its role for a PW? And when ECMP is employed, or when the topology changes over time, PLR determination becomes more difficult (especially for LDP-based tunnels), whether it is done through static configuration or some form of signaling. This needs more clarification and discussion. Presumably, IMHO, some extensions to the PSN tunnel protocols may also be needed.
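To make the quoted RRO-based approach concrete, the following is a minimal sketch of the check a transit LSR might run on the RRO in a received Resv message. It is not taken from the draft: the `RroSubobject` type and `is_penultimate_hop` function are hypothetical, and the sketch assumes each downstream node records exactly one address subobject and that label subobjects can be ignored. Real RROs may record interface addresses rather than node IDs, which this simplification glosses over.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RroSubobject:
    """Hypothetical, simplified RRO subobject: an address or a label."""
    kind: str   # assumed encoding: "ipv4", "ipv6", or "label"
    value: str


def is_penultimate_hop(resv_rro: List[RroSubobject],
                       egress: Optional[str] = None) -> bool:
    """Return True if the Resv RRO received from downstream lists one hop.

    Assumes every downstream node records exactly one address subobject,
    so a single recorded address means this router is one hop from the
    egress. If `egress` is given, that address must also match it.
    """
    # Keep only address subobjects; label subobjects are not hops.
    hops = [s.value for s in resv_rro if s.kind in ("ipv4", "ipv6")]
    if len(hops) != 1:
        return False
    return egress is None or hops[0] == egress


# Example: the Resv arriving at the penultimate hop carries only the egress.
rro = [RroSubobject("ipv4", "192.0.2.1"), RroSubobject("label", "3")]
print(is_penultimate_hop(rro, egress="192.0.2.1"))
```

Even when this check succeeds, the router still learns nothing about which PWs are carried over the tunnel, which is exactly the gap raised in this comment.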

8.
Section 4.3.2:

This section describes how to advertise the context identifier, but it gives only a high-level introduction and states that the details are out of scope. Moving the advertisement into another document is fine, but there should be a valid reference, because many things depend on this context identifier. For example, how does an ingress PE know that a PW should be bound to a tunnel whose destination is the context identifier rather than the address of the primary PE? How does a PLR know that an IP address is a context identifier, so that it sets up a bypass tunnel to it? And how can one make sure that the ingress PE will set up or resolve a tunnel to the primary PE rather than to the protector (considering that the topology and the metrics of other links may change at any time)?

9. 
Section 4.6

How does a PLR know whether an MPLS or IP tunnel should be established?


10.
Section 6.3 (PW Label Distribution from Backup PE to Protector):

" ...This Protection FEC Element
   MUST be identical to the Protection FEC Element TLV that the primary
   PE advertises to the protector (Section 6.2).  The context identifier
   SHOULD NOT be encoded in Interface_ID TLV in this message."

How does the backup PE know the Protection FEC information of the primary PW? Is it configured on the backup PE? Why not simply use the context identifier to correlate the protected PWs?


Best regards,
Mach