Restart Signaling for IS-IS
draft-ietf-lsr-isis-rfc5306bis-09

Note: This ballot was opened for revision 05 and is now closed.

Alvaro Retana Yes

(Ignas Bagdonas) No Objection

Deborah Brungard No Objection

Alissa Cooper No Objection

Roman Danyliw No Objection

Comment (2019-09-19 for -07)
Thank you for addressing my COMMENTs.

Benjamin Kaduk (was Discuss) No Objection

Comment (2019-09-19 for -08)
Thanks for addressing my Discuss points and comments!

(Suresh Krishnan) No Objection

Warren Kumari No Objection

Comment (2019-09-16 for -05)
Thank you for writing / updating this, it's really useful.


I do have some comments though:
1: "2.  It sets SRMflags on its own LSP database on the adjacency concerned."
There are a number of instances throughout this document where acronyms / terms are used without expansion - this being just one instance. I think that the right reference here is RFC 1142, but it sure would make reading the doc easier if these were cited / expanded. I get that this is a -bis, so perhaps just add a "Familiarity with IS-IS and RFC 1142, ... ... is assumed"?
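For readers who, like me, had to look it up: in ISO/IEC 10589 IS-IS an SRMflag ("Send Routing Message" flag) is kept per LSP per circuit and marks that LSP as pending transmission on that circuit. A minimal sketch of the quoted step, assuming a toy in-memory LSP database; the names `Lsp` and `set_srm_on_own_lsps` are hypothetical and not from the draft:

```python
from dataclasses import dataclass, field


@dataclass
class Lsp:
    lsp_id: str
    own: bool  # True if this router originated the LSP
    # Circuits on which this LSP is flagged for (re)transmission (SRMflags).
    srm: set = field(default_factory=set)


def set_srm_on_own_lsps(lsdb, circuit):
    """Set SRMflags for every self-originated LSP on the given circuit."""
    marked = []
    for lsp in lsdb:
        if lsp.own:
            lsp.srm.add(circuit)
            marked.append(lsp.lsp_id)
    return marked
```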

2: Section 3.1.  Timers
"A typical value is 3 seconds." -- for this, and other timers, you list a "typical" value - typical implies "not universal / fixed", so *who* should be twiddling it? Is it defined to be 3? Should vendors choose the right number or should this be user configurable (I assume the latter, but...)

3: I'm assuming I'm missing something really obvious here, but:
"The RA bit is sent by the neighbor of a (re)starting router to acknowledge the receipt of a restart TLV with the RR bit set." -- and then what / why? Let's say I'm a restarting router, and I *don't* get an RA from a neighbor; does that mean I hold off on restarting? Do I resend until I *do* get an RA response?

4: "... the first time the router has started, copies of LSPs generated by this router in its previous incarnation may exist in the LSP databases of other routers in the network."
No action needed, I just wanted to mention that I really liked the use of "incarnation" here. I'm not sure if this came from the original or -bis, but it was good...

5: "NOTE: Receipt of an IIH with PA bit set indicates to the router planning a restart that the neighbor is aware of the planned restart and - in the absence of topology changes as described below - will maintain the adjacency for the "remaining time" included in the IIH with PA set." -- Sounds good, but as I understand it, Remaining Time can be up to 65535 seconds -- this is roughly 18 hours, which feels way too long to be reasonable. I realize this gets into bikeshedding on what *reasonable* is, but should this be defined, or configurable?
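For reference, the arithmetic behind that figure, assuming the 65535-second maximum mentioned above:

```latex
\frac{65535\ \text{s}}{3600\ \text{s/h}} \approx 18.2\ \text{h}
```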

6: "On a LAN circuit, if the router in planned restart state is the DIS at any supported level, the adjacency(ies)" -- another unexpanded acronym.

(Mirja Kühlewind) No Objection

Comment (2019-09-14 for -05)
For the record, I only reviewed the new/changed text.

Barry Leiba No Objection

Comment (2019-09-13 for -05)
Thanks for this well written document, which I’ve found easy to read and mostly clear.  I have some editorial comments below, a few related to clarity.  I realize that some of these apply to text that was in RFC 5306, and I ask you to please consider them, but I understand if you want to minimize changes from 5306.

— Abstract —

This is entirely an editorial style comment, and no response is needed; just do what you think best, and if that is to leave it as it is, then that’s fine.  I find the “This document…  This document additionally…  This document additionally…” to be awkward, and suggest this instead:

NEW
   This document obsoletes RFC 5306 and describes a set of mechanisms
   that can improve neighbor reconfiguration when a router restarts.
   Using these mechanisms:

   1. A restarting router can signal to its neighbors that it is
   restarting, allowing them to reestablish their adjacencies without
   cycling through the down state, while still correctly initiating
   database synchronization.

   2. A router can signal its neighbors that it is preparing to initiate
   a restart while maintaining forwarding plane state.  This allows the
   neighbors to maintain their adjacencies until the router has
   restarted, but also allows the neighbors to bring the adjacencies down
   in the event of other topology changes.

   3. A restarting router can determine when it has achieved Link State
   Protocol Data Unit (LSP) database synchronization with its neighbors
   and can optimize LSP database synchronization, while minimizing
   transient routing disruption when a router starts.
END

— Section 1 —

   This document describes a mechanism for a restarting router to signal
   that it is restarting to its neighbors, and allow them to reestablish
   their adjacencies without cycling through the down state, while still
   correctly initiating database synchronization.

As this is written, (1) “to its neighbors” is misplaced (it is not “restarting to its neighbors”) and (2) it sounds like the restarting router is allowing them to do the reestablishment, but it’s the signal that is.  I suggest this:

NEW
   This document describes a mechanism for a restarting router to signal
   to its neighbors that it is restarting.  The signal allows them to
   reestablish their adjacencies without cycling through the down state,
   while still correctly initiating database synchronization.
END

— Section 3.1 —

   An instance of the timer T2 is maintained for each LSP database
   (LSPDB) present in the system, i.e., for a Level 1/2 system, there
   will be an instance of the timer T2 for Level 1 and an instance for
   Level 2.

Do you really mean “i.e.” here?  Is this the only possible situation, or is it an example (for which you would want “e.g.”)?  I think it’s the latter, in which case I would avoid the Latin, use English, and start a new sentence:

NEW
   An instance of the timer T2 is maintained for each LSP database
   (LSPDB) present in the system.  For example, for a Level 1/2 system,
   there will be one instance of the timer T2 for Level 1 and another
   instance for Level 2.
END
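To illustrate the "one instance per LSPDB" point, a minimal sketch keyed by level; the class name, the explicit `now` argument, and `duration_s` are hypothetical, and the duration is a parameter here, not a recommended value for T2:

```python
class T2Timers:
    """One T2 instance per LSP database (i.e., per level) present in the system."""

    def __init__(self, levels, duration_s):
        # For a Level 1/2 system, levels would be (1, 2): two independent instances.
        self.deadline = {lvl: None for lvl in levels}
        self.duration_s = duration_s

    def start(self, now):
        for lvl in self.deadline:
            self.deadline[lvl] = now + self.duration_s

    def cancel(self, level):
        # Each instance is stopped independently, e.g. once that level's LSPDB is in sync.
        self.deadline[level] = None

    def expired(self, now):
        return [lvl for lvl, d in self.deadline.items() if d is not None and now >= d]
```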

   This is initialized to 65535
   seconds, but is set to the minimum of the remaining times of received
   IIHs containing a restart TLV with the Restart Acknowledgement (RA)
   set and an indication that the neighbor has an adjacency in the "UP"
   state to the restarting router.

I found that quite confusing, because the long clause after “minimum of” is hard to follow (maybe it’s not an issue for readers who are versed in IS-IS).  I don’t understand what it’s set to (and when it’s set to it, after the initial value of 65535), and I can’t suggest a rephrasing because I don’t understand.  Can you try re-wording this (and maybe splitting it into two sentences)?
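One possible reading of the quoted text, as a sketch only: the timer starts at 65535 seconds, and each qualifying IIH can only lower it to that IIH's remaining time. The `ReceivedIih` fields are hypothetical abstractions; in particular, how a neighbor indicates that its adjacency is "UP" is reduced here to a boolean:

```python
from dataclasses import dataclass
from typing import Optional

INITIAL_TIMER_S = 65535  # initial value given in the quoted text


@dataclass
class ReceivedIih:
    has_restart_tlv: bool
    ra_set: bool                   # Restart Acknowledgement flag
    neighbor_adj_up: bool          # neighbor reports its adjacency to us as "UP"
    remaining_time: Optional[int]  # seconds, carried in the restart TLV


def update_timer(current_s, iih):
    """Lower the timer to the IIH's remaining time if the IIH qualifies."""
    if (iih.has_restart_tlv and iih.ra_set and iih.neighbor_adj_up
            and iih.remaining_time is not None):
        return min(current_s, iih.remaining_time)
    return current_s  # non-qualifying IIHs leave the timer unchanged
```

Usage: start with `timer = INITIAL_TIMER_S` and apply `timer = update_timer(timer, iih)` to each received IIH, so the timer ends up at the smallest remaining time among qualifying IIHs.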

— Section 3.2 —

   A new TLV is defined to be included in IIH PDUs.  The presence of
   this TLV indicates that the sender supports the functionality defined
   in this document and it carries flags that are used to convey
   information during a (re)start.

The antecedent of “it” isn’t formally clear from the wording.  I suggest this:

NEW
   A new TLV is defined to be included in IIH PDUs, which carries flags
   that are used to convey information during a (re)start.  The presence
   of this TLV indicates that the sender supports the functionality
   defined in this document.
END

   The functionality associated with each of the defined flags (as
   described in the following sections) is mutually exclusive with any
   of the other flags.  Therefore, it is expected that at most one flag
   will be set in a TLV.  Received TLVs which have multiple flags set
   MUST be ignored.

Is there a reason not to say, “Therefore senders MUST NOT set more than one flag in a Restart TLV.”?  Why aren’t we forbidding it, if the TLV will be ignored (MUST be ignored) on receipt otherwise?
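To make the receive-side rule concrete, a minimal sketch. The bit values below are an assumption for illustration (check the draft's Restart TLV figure for the authoritative layout), and masking off reserved bits before the check is also an assumption. Note that a TLV with no flags set is still meaningful, since its presence signals support:

```python
# Assumed flag bit values, for illustration only.
RR, RA, SA, PR, PA = 0x01, 0x02, 0x04, 0x08, 0x10
DEFINED = RR | RA | SA | PR | PA


def classify_flags(flags):
    """Return the single defined flag that is set (0 if none),
    or None if more than one is set and the TLV MUST be ignored."""
    defined = flags & DEFINED  # reserved bits masked off (assumption)
    if bin(defined).count("1") > 1:
        return None
    return defined
```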

— Section 3.2.1 —

   b.  immediately (i.e., without waiting for any currently running
       timer interval to expire, but with a small random delay of a few
       tens of milliseconds on LANs to avoid "storms")

Then it’s not “immediately”, right?  Might “promptly” be an appropriate characterization?  Or is “immediately but with a small random delay” a common meaning of “immediately” in this context?

(Similar comment for Section 3.2.3.)
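For concreteness, the quoted behavior amounts to something like the sketch below: no deliberate delay on point-to-point circuits, and a small random jitter on LANs. The function name and the 10 to 50 ms range are assumptions standing in for "a few tens of milliseconds":

```python
import random


def iih_send_delay_ms(is_lan, rng=random):
    """Delay before sending the triggered IIH, in milliseconds."""
    if not is_lan:
        return 0.0  # point-to-point: send right away
    # LAN: small random jitter so neighbors do not all transmit at once.
    return rng.uniform(10.0, 50.0)  # assumed range for "a few tens of milliseconds"
```

Passing a seeded `random.Random` as `rng` makes the jitter reproducible in tests.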

(Alexey Melnikov) No Objection

(Adam Roach) No Objection

Martin Vigoureux No Objection

Comment (2019-09-17 for -05)
Hello,

thank you for this document.
What is the expected behaviour (if it needs to be described) when the neighbor of a router planning to restart decides to also plan a restart?

Thank you

Éric Vyncke No Objection

Magnus Westerlund No Objection