Last Call Review of draft-ietf-pce-monitoring-
review-ietf-pce-monitoring-secdir-lc-tschofenig-2010-01-09-00

Request Review of draft-ietf-pce-monitoring
Requested rev. no specific revision (document currently at 09)
Type Last Call Review
Team Security Area Directorate (secdir)
Deadline 2010-01-05
Requested 2009-04-16
Draft last updated 2010-01-09
Completed reviews Secdir Last Call review of -?? by Hannes Tschofenig
Assignment Reviewer Hannes Tschofenig
State Completed
Review review-ietf-pce-monitoring-secdir-lc-tschofenig-2010-01-09
Review completed: 2010-01-09

Review
review-ietf-pce-monitoring-secdir-lc-tschofenig-2010-01-09

I have reviewed this document as part of the security directorate's
ongoing effort to review all IETF documents being processed by the
IESG. Â These comments were written primarily for the benefit of the
security area directors. Â Document editors and WG chairs should treat
these comments just like any other last call comments.

What does the document do? 

My understanding is that the document defines some payloads for obtaining
information about control plane elements (namely about Path Computational
Elements). The obtained information includes 'Liveness', 'Processing Time',
and 'Overload'. The document does not describe mechanisms for obtaining
information about the data plane elements, as far as I can tell. 

Collecting information about control plane elements is not uncommon
(although other approaches are typically used, such as SNMP MIBs). 

There is often a different philosophy when it comes to writing a
specification. Some people believe that it is sufficient to just define the
packet on the wire (and a minimum of message processing rules) and then
there are folks who are more interested in defining the behavior and
algorithms. Here the former approach was selected and there might be some
problems with doing that (see comments below). 

(Minor note: A figure or an example would have really helped me to
understand this document.)

-- Technical Issues 

The Monitoring-id-number is described as: 

"
   Monitoring-id-number (32 bits): The monitoring-id-number value
   combined with the PCEP-ID of the PCC identifies the monitoring
   request context.  The monitoring-id-number MUST start at a non-zero
   value and MUST be incremented each time a new monitoring request is
   sent to a PCE.  Each increment SHOULD have a value of 1 and may cause
   a wrap back to zero.  If no reply to a monitoring request is received
   from the PCE, and the PCC wishes to resend its path computation
   monitoring request, the same monitoring-id-number MUST be used.
   Conversely, a different monitoring-id-number MUST be used for
   different requests sent to a PCE.  The path computation monitoring
   reply is unambiguously identified by the monitoring-id-number and the
   PCEP-ID of the replying PCE.  A PCEP implementation SHOULD checkpoint
   the Monitoring-id-number of pending monitoring requests in case of
   restart thus avoiding the re-use of a Monitoring-id-number of an in-
   process monitoring request.
"

I understand that a sender may want to match requests against responses even
if the responses alone should provide already sufficient information so that
the matching to the request becomes irrelevant. 

But what is the reason to demand a specific way to increment the
Monitoring-id-number? The sender only needs to avoid sending a request with
the same monitor-id-# already used in an previously sent request where the
response is still pending. Why would someone want to re-send the request
with the same number again? Wouldn't this be largely irrelevant? 

From the PROTO writeup I can read that there are three independent
implementations. 
When I read through the Section 4.1 about the MONITORING object and the
flags that are defined I have some doubts how they can interwork. Just to
give you an example. I (PCE element of domain A) send a request for
monitoring and, for example, set the flag "General". This flag is defined in
the document as: 

"
   G (General) - 1 bit: when set, this indicates that the monitoring
   request is a general monitoring request.  When the requested
   performance metric is specific, the G bit MUST be cleared.  The G bit
   MUST always be ignored in a PCMonRep or PCRep message.
"

So, another PCE element in domain B is now receiving this request and is
supposed to include something in the reply message a little later. So, what
would this be? Where does an implementer find a precise description of what
has to be done to fullfill this request? 

The same is true for the other flags as well. Another example: "Processing
Time" ... is set to indicate that the
   processing times is a metric of interest. Section 4.3 provides a bit more
details but stops more or less when it comes to the real meat with something
like " out of the scope of this document". 
   
I am not sure how to get meaningful measurement results at different PCE
elements (potentially provided by different vendors). What conclusions could
I then draw from the information that comes back with the response message?
 
The same is true for the OVERLOAD Object. 

So, while the authors (and the group) may believe that they have indeed
developed a standard it will not lead to interoperability without a number
of other things happening. I would at least appreciate a discussion in the
document (maybe related to operational considerations) that the values are
only meaningful if the entities operating the PCEs have a way to agree on
the same mechanism to perform the value calculation. Typically, this will
not easily work in a multi-vendor fashion. So, I believe that the authors
might have had another document in mind in the future that actually provides
the algorithms. This would then raise the question of the usefulness of the
flags if the calculation of the values are done in many differents. Example.
I could imagine that the overload being calculated in, for example, 5
different ways. Vendor A supports only 2 and vendor B supports 4 different
algorithms but there is only one flag to indicate "overload". 


-- Specification Writing Style

Below I list two examples where I had hoped for a stricter style:

"
4.1.  MONITORING Object

   The MONITORING object MUST be present within PCMonReq and PCMonRep
   messages ("out of band" monitoring requests) and MAY be carried
   within PCRep and PCReq messages ("in band" monitoring requests).
   There SHOULD NOT be more than one instance of the MONITORING object
   in a PCMonReq or PCMonRep message: if more than one instance of the
   MONITORING object is present, the recipient MUST process the first
   instance and MUST ignore other instances.
"

Why couldn't you say that there MUST only be one instance of the MONITORING 
object in the message. Everything else is an error (and you already have an
error message for that).
Done. 
   
"
  I (Incomplete) - 1 bit: If a PCE supports a received PCMonReq message
   and that message does not trigger any policy violation, but the PCE
   cannot provide any of the set of requested performance metrics for
   unspecified reasons, the PCE MUST set the I bit.  The I bit has no
   meaning in a request and SHOULD be ignored on receipt.
"

If you say "SHOULD" in the last sentence then you have to say what happens
otherwise. 
In this specific case I believe you would just put a MUST in there. 

Additionally, the document defines multiple ways to accomplish the same
effect on how monitoring takes place (In-band vs. out-of-band requests). Is
there some text that could be referenced to give some guides on when to use
what? 

For some reason you decided to re-use the same object for the request and
the response. This typically leads to a more complex description because the
semantic of the object content is different in the two directions. 

-- Operations & Management 

How does the information being collected here related to other mechanisms to
potentially collect similar information? For example, network management
systems might collect information already and two systems collect lead to
conflicting conclusions. 

Wouldn't it be good to discuss how the monitoring requests are treated
differently from any other request. Particularly for the overload handling
request this would be interesting. You don't want to the monitoring starve
because the request is stuck in the queues somewhere. 

The document does not discuss what is actually being done with the collected
information. Obviously, there is a lot of potential for messing up the
network. 

-- Security 

The description focuses on securing communication. With the scope of the
document this seems to be fine. However, the real security problems will
show up when an adversary is able to control one of the PCE elements and is
therefore able to impact larger parts of the network. On the other hand,
adversaries could have a huge impact already without this specific extension
being defined. A PCE represents a single point of failure and a nice target.


-- IANA Considerations

To help IANA it is always useful to state whether you enter new values into
an existing registry (and where that registry has initially been defined) or
whether you create a new registry (and what the policy is). 

* The new PCEP messages are being added to an existing registry (one that
was created with [RFC5440]). 
* The new PCEP objects are also added to an existing registry. Most likely
one that was also created by [RFC5440].
* The same is true for the new Error-Values. Sentences like "A registry was
created for the Error-type and Error-value of the PCEP Error Object." are
misleading. 
* Then, you create a new registry for a few other items. You may want to
reference 

http://tools.ietf.org/html/rfc5226

. Are you sure that you really
want to put the bar so high for adding new entries to the registry, namely
"IETF review"? I would have said that "Specification Required" is more than
enough. You might also want to indicate whether bits can be re-assigned, the
semantic updated, etc. For example, assume that you revise the RFC to
strengthen the description. Under what conditions would it be OK to just
update the existing registry entries and under what conditions would you
need new entries? Because of the encoding you have chosen you have some
space constraints. You also say that "Each bit should be tracked with the
following qualities:". Why "should"? Just indicate what has to be included
in the registry. Additionally, I believe the second column isn't really a
description but rather a label. 

-- Editorial

s/section Section 4.1/Section 4.1

In general, always write "Section X" instead of "section X". 
Capitalize headings. Example: s/Path Computation Monitoring messages/Path
Computation Monitoring Messages

Ciao
Hannes