Last Call Review of draft-ietf-pals-endpoint-fast-protection-04
review-ietf-pals-endpoint-fast-protection-04-tsvart-lc-black-2016-12-05-00

Request Review of draft-ietf-pals-endpoint-fast-protection
Requested rev. no specific revision (document currently at 05)
Type Last Call Review
Team Transport Area Review Team (tsvart)
Deadline 2016-12-06
Requested 2016-11-22
Authors Yimin Shen, Rahul Aggarwal, Wim Henderickx, Yuanlong Jiang
Draft last updated 2016-12-05
Completed reviews Genart Last Call review of -04 by Dale Worley
Secdir Last Call review of -04 by Chris Lonvick
Opsdir Last Call review of -04 by Susan Hares
Rtgdir Early review of -00 by Mach Chen
Rtgdir Early review of -00 by John Drake
Tsvart Last Call review of -04 by David Black
Assignment Reviewer David Black 
State Completed
Review review-ietf-pals-endpoint-fast-protection-04-tsvart-lc-black-2016-12-05
Reviewed rev. 04 (document currently at 05)
Review result Ready with Issues
Review completed: 2016-12-05

Review

I've reviewed this document as part of TSV-ART's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors for their information and to allow them to address any issues raised. When done at the time of IETF Last Call, the authors should consider this review together with any other last-call comments they receive. Please always CC tsv-art@ietf.org if you reply to or forward this review.

This draft specifies local pseudowire (PW) repair mechanisms that react quickly to PW egress failures by rerouting traffic around the failure until slower repair mechanisms of larger scope can effect longer-term repairs, e.g., via network topology changes.

-- TSV-ART review comments:

I found a couple of minor transport-related issues, both of which should be resolvable with modest amounts of additional explanation:

* ECMP: The ECMP discussion in Section 4.1 (Applicability) takes a conservative approach to avoiding packet reordering by recommending (SHOULD) that the entire ECMP set be rerouted as part of local repair. It's not clear what sort of ECMP is involved, as that acronym is used without a reference (or even an expansion), so I'd suggest citing a reference. If the ECMP used is flow-aware, so that reordering across ECMP branches within an ECMP set does not cause reordering within any of the flows involved, then it ought to be safe from a reordering perspective to reroute a subset of the branches rather than the full ECMP set, although such partial rerouting could cause potentially undesirable forwarding latency differences within the ECMP set. This ought to be discussed, as situations in which rerouting the entire ECMP set is overly conservative seem likely to arise in practice.

* Traffic Engineering: Considering the intended speed of local repair ("order of tens of milliseconds" in the abstract), the bandwidth used by the repair paths has to be provisioned in advance of any failure that causes repair path usage; traffic engineering is a likely means of provisioning that bandwidth. I see "TE domain," "TE metric" and "TE path," which I assume refer to Traffic Engineering, but the TE acronym is not expanded, and I did not find text requiring traffic engineering and/or advance (bandwidth) provisioning of repair paths. I assume that advance bandwidth provisioning of repair paths is intended as part of local repair, as omitting it invites immediate repair path failure due to lack of forwarding resources, which is definitely not desired. A sentence or two ought to be added to point out this bandwidth provisioning requirement, possibly in Section 4.1 (Applicability). Adding that text would also reinforce the conclusion in the Security Considerations section that local repair reroutes are not a security threat, as it would add the rationale that local repair reroutes are anticipated and planned for by the network operator's traffic engineering.
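
To make the ECMP point above concrete, here is a minimal Python sketch of flow-aware branch selection with partial local repair. It is not taken from the draft: the function names (ecmp_branch, forward), the branch and repair-path labels, and the choice of a 5-tuple hash are all assumptions made for illustration. The sketch only shows that when branch selection depends solely on the flow identity, rerouting one failed branch moves whole flows and leaves flows on the other branches untouched.

```python
import hashlib

def ecmp_branch(flow, branches):
    """Select a branch from the flow's 5-tuple (hypothetical flow-aware ECMP)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(branches)
    return branches[index]

def forward(flow, branches, failed, repair_path):
    """Path taken by every packet of `flow` when only failed branches are repaired."""
    branch = ecmp_branch(flow, branches)
    return repair_path if branch in failed else branch

# Hypothetical ECMP set and flows: (src, dst, protocol, src port, dst port).
branches = ["branch-a", "branch-b", "branch-c"]
flows = [("10.0.0.1", "10.0.1.1", 6, 1000 + i, 80) for i in range(50)]

before = {f: ecmp_branch(f, branches) for f in flows}
after = {f: forward(f, branches, {"branch-b"}, "repair-b") for f in flows}

# Flows that did not use the failed branch keep their original path.
unchanged = [f for f in flows if before[f] != "branch-b"]
print(all(after[f] == before[f] for f in unchanged))
```

The sketch says nothing about the latency difference between the repair path and the surviving branches, which is the residual concern noted in the ECMP comment.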

--  Other comments:

* Having found two acronyms that were not expanded, I'd suggest a general check for other such acronyms. On the other hand, this is an area of network technology where many acronyms are in common use, so expanding every acronym on first use may be excessive. One way to avoid that would be to cite, at the start of Section 3, a reference in which commonly used PW terms and acronyms are defined.