Last Call Review of draft-ietf-ccamp-alarm-module-06
review-ietf-ccamp-alarm-module-06-rtgdir-lc-halpern-2019-01-10-00

Request Review of draft-ietf-ccamp-alarm-module
Requested rev. no specific revision (document currently at 09)
Type Last Call Review
Team Routing Area Directorate (rtgdir)
Deadline 2019-01-25
Requested 2019-01-09
Requested by Deborah Brungard
Authors Stefan Vallin, Martin Björklund
Draft last updated 2019-01-10
Completed reviews Yangdoctors Last Call review of -06 by Carl Moberg (diff)
Rtgdir Last Call review of -06 by Joel Halpern (diff)
Secdir Last Call review of -07 by Shawn Emery (diff)
Genart Last Call review of -07 by Dan Romascanu (diff)
Opsdir Last Call review of -07 by Joe Clarke (diff)
Genart Telechat review of -09 by Dan Romascanu
Comments
Prep for Last Call.
Assignment Reviewer Joel Halpern
State Completed
Review review-ietf-ccamp-alarm-module-06-rtgdir-lc-halpern-2019-01-10
Reviewed rev. 06 (document currently at 09)
Review result Has Issues
Review completed: 2019-01-10

Review
review-ietf-ccamp-alarm-module-06-rtgdir-lc-halpern-2019-01-10

I have been selected as the Routing Directorate reviewer for this draft. The Routing Directorate seeks to review all routing or routing-related drafts as they pass through IETF last call and IESG review, and sometimes on special request. The purpose of the review is to provide assistance to the Routing ADs. For more information about the Routing Directorate, please see http://trac.tools.ietf.org/area/rtg/trac/wiki/RtgDir

Although these comments are primarily for the use of the Routing ADs, it would be helpful if you could consider them along with any other IETF Last Call comments that you receive, and strive to resolve them through discussion or by updating the draft.

Document: draft-ietf-ccamp-alarm-module-06
Reviewer: Joel Halpern
Review Date: 10-January-2019
IETF LC End Date: N/A
Intended Status: Proposed Standard

Summary: 
This document is basically ready for publication, but has minor issues that should be considered prior to publication.

Comments:
    The document is quite readable, and starts with a clear and helpful description of what it is trying to do.

Major Issues:
    No major issues found.

Minor Issues:
    The first paragraph of section 3.6 (Root Cause, Impacted Resources and Related Alarms) has a confused "not", a missing preposition, and a typoed conjunction, making it very hard to be sure what is intended.  I believe the first part of the sentence should read:
    "The recommendation is to have a single alarm for the underlying problem and list ..."

    There is a larger issue about system behavior and root cause analysis that I think should be discussed in this section.  Root cause analysis and side-effect analysis are not simple tasks.  It is common for them to be performed outside of network elements.  When the analysis is performed outside of a network element, it is unclear what the implications are for this module.  Is it the intent that network elements that cannot perform root cause analysis and impacted resource determination should NOT support this YANG module?  Or can / should / may they support it even though they cannot perform this analysis?  There is a paragraph that seems to be trying to talk about this, but I was left confused about what was expected.  Part of my confusion is that the text treats this inability as rare, whereas in my experience such inability is common for network elements.

    It took me a while to realize what the text in 3.7 (and 4.1.1) about not generating notifications is referring to.  The problem is that, with all the effort to make clear that alarms are not notifications, I missed the fact that an alarm being raised (or re-raised) does itself cause a notification, and that it is this re-raise notification (and the other severity-change, clearing, etc. notifications) that is suppressed by the shelving.  It seems to me that there needs to be a better explanation of this in or before 3.7.
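
    To make the reading above concrete, here is a minimal Python sketch of the behavior as I understand it: every alarm state change (raise, severity change, clear, re-raise) produces a notification unless the alarm is shelved.  All names here (AlarmState, AlarmManager, update, demo_notifications) are hypothetical and are not taken from the draft's YANG module; this illustrates the interpretation, not the draft's definition.

```python
from dataclasses import dataclass, field


@dataclass
class AlarmState:
    """Hypothetical alarm record; field names are illustrative only."""
    resource: str
    alarm_type: str
    severity: str = "cleared"
    shelved: bool = False


@dataclass
class AlarmManager:
    """Tracks alarm state changes and records the notifications sent."""
    sent: list = field(default_factory=list)

    def update(self, alarm, new_severity):
        # The alarm state is always updated, shelved or not.
        if new_severity == alarm.severity:
            return
        alarm.severity = new_severity
        # Assumption: shelving suppresses only the notification,
        # not the underlying state change.
        if not alarm.shelved:
            self.sent.append((alarm.resource, alarm.alarm_type, new_severity))


def demo_notifications():
    mgr = AlarmManager()
    alarm = AlarmState("eth0", "link-down")
    mgr.update(alarm, "major")       # raise: notification sent
    alarm.shelved = True
    mgr.update(alarm, "critical")    # severity change: suppressed
    mgr.update(alarm, "cleared")     # clear: suppressed
    alarm.shelved = False
    mgr.update(alarm, "major")       # re-raise: notification sent
    return mgr.sent
```

    In this sketch only the first raise and the re-raise after unshelving reach the notification list; the severity change and the clear that happen while the alarm is shelved update its state silently.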

    Reading the YANG for shelving alarms, it looks to me as though, while it can do what is described earlier in the document, the conceptual structure is VERY different.  From the YANG, to shelve a specific alarm one has to create a named shelf whose conditions identify the specific alarm.  To shelve several alarms that are related (for example, when the operator looks at a list and selects several items to shelve) the system will likely have to create multiple shelves, give each a unique name, and put the different alarm identifiers in each one.  To unshelve alarms, one has to find the named shelf which has caused the shelving.  This seems very awkward.  It seems to have been designed to enable one to store the shelving reason separately from the alarm itself.  It introduces the odd effect that if the shelves are used with conditions that can match more than one thing, then one could have several shelves shelving the same alarm, and an effort to unshelve might well not produce the desired result.
    Assuming that this complexity is desired by the working group, I would ask that it be explicitly called out in the descriptive portions of the document.
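
    The overlapping-shelf effect described above can be sketched in a few lines of Python.  This is an illustration of the conceptual structure only, under the assumption that a shelf is a named set of match conditions; the class and field names (Alarm, Shelf, ShelfTable, resources, alarm_types) are hypothetical and are not the leaf names used in the draft's YANG module.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Alarm:
    """Hypothetical alarm identity used only for matching."""
    resource: str
    alarm_type: str


@dataclass
class Shelf:
    """A named shelf; an empty condition set is assumed to match anything."""
    name: str
    resources: set = field(default_factory=set)
    alarm_types: set = field(default_factory=set)

    def matches(self, alarm):
        resource_ok = not self.resources or alarm.resource in self.resources
        type_ok = not self.alarm_types or alarm.alarm_type in self.alarm_types
        return resource_ok and type_ok


class ShelfTable:
    """Holds the configured shelves, keyed by their unique names."""

    def __init__(self):
        self.shelves = {}

    def add(self, shelf):
        self.shelves[shelf.name] = shelf

    def remove(self, name):
        del self.shelves[name]

    def shelving(self, alarm):
        # Names of every shelf whose conditions match this alarm.
        return sorted(n for n, s in self.shelves.items() if s.matches(alarm))

    def is_shelved(self, alarm):
        return bool(self.shelving(alarm))


def demo_shelves():
    table = ShelfTable()
    alarm = Alarm("eth0", "link-down")
    table.add(Shelf("maint-eth0", resources={"eth0"}))
    table.add(Shelf("all-link-down", alarm_types={"link-down"}))
    matched_before = table.shelving(alarm)
    # The operator removes the shelf they believe is responsible...
    table.remove("maint-eth0")
    # ...but the alarm is still shelved by the other matching shelf.
    return matched_before, table.is_shelved(alarm)
```

    Here the alarm on eth0 is matched by two shelves, so deleting "maint-eth0" does not unshelve it.  To unshelve reliably, a client would first have to compute every shelf that matches the alarm, which is the awkwardness noted above.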

Nits:
        In section 4.4 (overview of The Alarm List), in the tree showing the components of the purge-alarm operation, is there any way to make clear that the enumeration called alarm-status is the enumeration of filter choices related to whether the alarm is cleared?  Maybe rename it alarm-cleared-filter?