Last Call Review of draft-ietf-ccamp-alarm-module-07
review-ietf-ccamp-alarm-module-07-genart-lc-romascanu-2019-03-18-00

Request Review of draft-ietf-ccamp-alarm-module
Requested rev. no specific revision (document currently at 09)
Type Last Call Review
Team General Area Review Team (Gen-ART) (genart)
Deadline 2019-03-20
Requested 2019-03-06
Draft last updated 2019-03-18
Completed reviews Yangdoctors Last Call review of -06 by Carl Moberg (diff)
Rtgdir Last Call review of -06 by Joel Halpern (diff)
Secdir Last Call review of -07 by Shawn Emery (diff)
Genart Last Call review of -07 by Dan Romascanu (diff)
Opsdir Last Call review of -07 by Joe Clarke (diff)
Genart Telechat review of -09 by Dan Romascanu
Assignment Reviewer Dan Romascanu
State Completed
Review review-ietf-ccamp-alarm-module-07-genart-lc-romascanu-2019-03-18
Reviewed rev. 07 (document currently at 09)
Review result Almost Ready
Review completed: 2019-03-18

Review
review-ietf-ccamp-alarm-module-07-genart-lc-romascanu-2019-03-18

I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair.  Please treat these comments just
like any other last call comments.

For more information, please see the FAQ at

<https://trac.ietf.org/trac/gen/wiki/GenArtfaq>.

Document: draft-ietf-ccamp-alarm-module-07
Reviewer: Dan Romascanu
Review Date: 2019-03-18
IETF LC End Date: 2019-03-20
IESG Telechat date: Not scheduled for a telechat

Summary:

The document defines a YANG module for Alarm management. While it's clear and comprehensive, it ignores previous work done in the IETF and does not provide a good motivation about the reasons of ignoring it. I would suggest to revisit that part and provide a clear motivation for the need of this new design. 


Major issues:

1. The definition of Alarm is key for the whole model. It reads like this: 

> Alarm (the general concept): An alarm signifies an undesirable state in a resource that requires corrective action.

However, RFC 3877 already defined a number of concepts including: 

>  Error
      A deviation of a system from normal operation.

   Fault
      Lasting error or warning condition.

   ....

   Alarm
      Persistent indication of a fault.

I believe that there is a need to show why the model defined by RFC 3877 needs to be changed, and why the difference that RFC 3877 was making between a Fault and an Alarm is no longer needed. Also, RFC 3877 defined in Section 3 a Framework and an Architecture that was consistent with X.733. This document has no such section, and while acknowledging the need for a mapping to X.733 it states as a goal: 

> Mapping to X.733, which is a requirement for some alarm systems. Still, keep some of the X.733 concepts out of the core model in order to make the model small and easy to understand

More details about what is left out and why these are not needed would help. 

Minor issues:

1. Section 2 makes a statement that includes 

> ... While IETF has not really addressed alarm management

This is is actually not accurate. RFC 3877 addressed Alarm Management. Maybe there is a need to revise that approach, but this should be done explicitly, not by stating that it did not exist. 

2. Section 3.5: 

> Closing an alarm implies that the operator considers the corrective action performed.

Is this always true? The undesirable state may have been cancelled by some other event than corrective action, for example the resource is no longer used, or the time elapsed mat have made the undesirable state irrelevant. 

3. In section 3.5.1: 

> Alarms are not cleared by operators, only the underlying instrumentation can clear an alarm.  Operators can close alarms.

So, the document makes a distinction between clearing an alarm and closing an alarm. It may be good to define two two concepts to make the distinction clear. 

4. Appendix F.1:

> The alarm MIB is state oriented rather than notification oriented, an alarm is a "lasting condition", not a discrete notification reporting about a condition state change.

I am not sure that I understand this comment. Alarm states are defined also in this document, and Alarms as defined here are also different than ' a discrete notification reporting about a condition state change'. So, what does this comment really try to say? 


Nits/editorial comments: