Last Call Review of draft-ietf-rtgwg-backoff-algo-07
review-ietf-rtgwg-backoff-algo-07-genart-lc-davies-2018-02-15-00

Request Review of draft-ietf-rtgwg-backoff-algo
Requested rev. no specific revision (document currently at 10)
Type Last Call Review
Team General Area Review Team (Gen-ART) (genart)
Deadline 2018-02-14
Requested 2018-01-31
Other Reviews Rtgdir Early review of -04 by Sasha Vainshtein (diff)
Secdir Last Call review of -07 by Benjamin Kaduk (diff)
Opsdir Last Call review of -07 by Mehmet Ersue (diff)
Review State Completed
Reviewer Elwyn Davies
Review review-ietf-rtgwg-backoff-algo-07-genart-lc-davies-2018-02-15
Posted at https://mailarchive.ietf.org/arch/msg/gen-art/V-0UuWxEomsXVgejrmzBtFe0fv8
Reviewed rev. 07 (document currently at 10)
Review result Ready with Nits
Draft last updated 2018-02-15
Review completed: 2018-02-15

Review
review-ietf-rtgwg-backoff-algo-07-genart-lc-davies-2018-02-15

I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair.  Please treat these comments just
like any other last call comments.

For more information, please see the FAQ at

<https://trac.ietf.org/trac/gen/wiki/GenArtfaq>.

Document: draft-ietf-rtgwg-backoff-algo-07.txt
Reviewer: Elwyn Davies
Review Date: 2018/02/15
IETF LC End Date: 2018/02/14
IESG Telechat date: 2016/02/22

Summary: Ready with nits. The draft does not refer to OSPFv3 - i am not sure if this is an oversight or because ODSPFv3 already has this mechanism - either way it should be mentioned.  One question that occurred to me is whether the draft could be considered as updating the OSPFv2/v3 and ISIS standards (not that IETF has any control over ISIS).

Major issues:
None

Minor issues:
(Non-)Relation between HOLDDOWN_INTERVAL and *_SPF_DELAY values:  I notced that Benjamin Kaduk's SECDIR review of this document (https://datatracker.ietf.org/doc/review-ietf-rtgwg-backoff-algo-07-secdir-lc-kaduk-2018-02-14/) was concerned that certain state transitions would never occur.  I loooked at this and realized that his assumption that LONG_SPF_DELAY < HOLDDOWN_INTERVAL is not required by the document and s6 explicitly resiles from offering suggested default values.  Without this assumption, the state machine appears to be correct. Not being familiar with the consequences of setting the HOLDDOWN_INTERVAL relative to the *_SPF_DELAY, I am not sure if anything could be said about such consequences, but I think it would avoid other people making the same assumption as the SECDIR reviewer if it was explicitly stated that HOLDDOWN_INTERVAL is not necessarily bigger than any of the *_SPF_DELAY values and adding any advice from experience about how to choose appropriate values.  This might also avoid naive implementers shortcutting the state machine implementation if they made the same assumption.

Requirements Language: Suggest s/RFC2119/RFC8174/ as there are uses of lower case versions of the reserved words.

Default values for parameters:  There is a possible conflict between s3, where example values for the various interval parameters are given and s6 which states that no default values are specified in the document.  The difference in termnology maybe too subtle for some implementers.  

Aborting or otherwise of SPF calculation if an IGP event occurs while an SPF calculation is in progress.  A note about whether this should happen (if it is possible) would be desirable.

OSPFv3: Does this (not) equally apply to OSPF v3 for IPv6?  If so it should be mentioned and RFC 5340 included in the references.

s12:  I suspect (although it could be arguable) that the ISIS definition, RFC 2328 (OSPFv2) and (if added) RFC5340 are normative as you need to understand how they work.  This work could even be considered to update these documents.

Nits/editorial comments:
General: The term 'back-off' may not be familiar to non-Emglish mother tongue speakers and on first occurrence needs a little explanation for naive readers to indicate what it means and to what the back-off is being applied.  I have suggested some additional text to this end for the abstract and s1.

Abstract: 
OLD:
   This document defines a standard algorithm to back-off link-state IGP
   Shortest Path First (SPF) computations.
NEW:
   This document defines a standard algorithm to temporararily postpone or 'back-off' link-state IGP
   Shortest Path First (SPF) computations to reduce the computational load on IGP nodes if 
   network events occurring at closely spaced times would otherwise lead to multiple, essentially
   redundant recalculations of the routing tables.
ENDS

s1, para 1: s/at the same time/essentially at the same time/

s1, para 2: s/new Shortest Path First (SPF)/new Shortest Path First (SPF) routing table/ 

s1, para 2:
OLD:
   experiencing multiple temporally close failures over a short
   period of time
NEW:
   experiencing multiple temporally close failures (that is, eventuating over a short
   period of time)
ENDS

s1, para 2: There is a right bracket missing in the following and starting a clause with 'such as' and ending it with an ellipsis ('...') is redundant.  
>    such as LDP [RFC5036], RSVP-TE [RFC3209],
>    BGP [RFC4271], Fast ReRoute computations (e.g.  Loop Free Alternates
>    (LFA) [RFC5286], FIB updates...
It is unclear to me where the bracket should go: maybe after [RFC5286] or at the end.
Please clarify.

s1, para 2: the phrase 
> This also reduces the churn on
>    routers and in the network and.
is useless, vague jargon.  The previous sentence expresses what I suspect is meant by 'churn'. so this is redundant and can be omitted.

s1, para 3:
OLD:
To allow for this, IGPs implement an SPF back-off algorithm.
NEW:
To allow for this, IGPs usually implement an SPF back-off algorithm that postpones or backs-off 
the running of the SPF calculation when the algorithm predicts that a run would be essentially 
redundant or even counter-productive because it appears that multiple closely timed routing-affecting events can be expected.
ENDS

s1, para 3: s/choosen/chosen/

s2, last bullet: SPF_DELAY is not defined at this point: 
s/SPF_DELAY timers values/values for any timers used to back-off SPF calculations/

s2, last bullet:  s/Even though/This is important even though/

s3, para 1: Undesirable ellipsis: 
s/a metric change on a link or prefix.../and a metric change on a link or prefix./

s3:Need to expand SRLG on first use - it isn't deemed to be well-known.

s3, INITIAL_SPF_DELAY bullet: s/A very small delay to quickly handle link failure/A very small delay to quickly handle a single isolated link failure/

s3, SHORT_SPF_DELAY bullet:
OLD:
    SHORT_SPF_DELAY: A small delay to have a fast convergence in case of
    a single failure (node, SRLG..), e.g., 50-100 milliseconds.
NEW:
    SHORT_SPF_DELAY: A small delay to provide fast convergence in the case of
    a single component failure (node, SRLG..) that leads to multiple IGP events, 
    e.g., 50-100 milliseconds.
ENDS

s5/s5.1: There is currently no text in s5: this is generally considered inappropriate.  Suggest removing the first sentence in s5.1 ("This section describes the state machine.") and adding to s5:
NEW:
This section describes the abstract finite state machine (FSM) intended to control the timing of the running of SPF calculations in response to IGP events.  

s5.1, QUIET bullet: s/occured/occurred/

s5.2:  There is no need for 3 expansions of FSM - the expansion can be moved to s5 as suggested above.

s5.3 title: s/States/State/

s6, next to last para: s/it's RECOMMENDED to play it safe/it is recommended that timer intervals should be chosen conservatively/ (this is an operational recommendation).

s6, last para: s/RECOMMENDED/recommended/ (ditto).

s7, para 1: s/is based on/is dependent on/, s/RECOMMENDED/recommended/ (operational again)

s8: Other documents (e.g., from vendors) have used the terms SPF wait time and SPF hold time.  It might be useful to mention that this document essentially provides ways to implement these settings.