Early Review of draft-ietf-homenet-dncp-05
I have been selected as the Routing Directorate reviewer for this draft. The Routing Directorate seeks to review all routing or routing-related drafts as they pass through IETF last call and IESG review, and sometimes on special request. The purpose of the review is to provide assistance to the Routing ADs. For more information about the Routing Directorate, please see
Although these comments are primarily for the use of the Routing ADs, it would be helpful if you could consider them along with any other IETF Last Call comments that you receive, and strive to resolve them through discussion or by updating the draft.
Reviewer: Thomas Heide Clausen
Review Date: June 16, 2015
IETF LC End Date: <Reviewed during (just after - apologies) WGLC>
Intended Status: Standards Track
o I have significant concerns about this document and recommend that the
Routing ADs discuss these issues further with the authors.
o Is there any good reason why the authors have no listed affiliation?
o It is somewhat contradictory that the abstract talks about
"...describes a protocol" and then later "...leaves some details
to be specified in profiles, which define actual implementable DNCP
Does that not mean, then, that this document specifies an algorithm,
a framework, and not a protocol?
o On that, I see "DNCP protocol" several places. Expanded, that becomes
"Dynamic Network Configuration Protocol Protocol" ...
o In general, and despite actually knowing some of the core algorithms
somewhat before this review, I found the document really tough to
read, with convoluted sentences, inconsistent requirements-language,
and a lack of introductory "here's the 1000ft view of the protocol,
what it does, how it works, and under which conditions it works".
o On that, I do not find the chosen structure of the document to be
optimal for conveying an unambiguous protocol specification. For one,
the same concepts are occasionally described slightly differently.
For another, it is often hard to find the information needed to
parse a specific mandated processing (for example). I provide an
example of what I would suggest a better structure in the below.
The goal is to provide first concepts and an overview, followed by a
single, easy to identify place for "precise and unambiguous definitions
of concepts", and then use those in the detailed expression of the
protocol. Note that this is just an example, of course:
The Network State Hash is a hash value which represents the
current state of the network, as known by a node.
Section "Protocol Overview and Functioning"
When receiving a FOO TLV, the DNCP node compares the received
Network State Hash with its own Network State Hash. This
represents the consistency check rom RFC6206. If same,
then...if not, then ....
Section "Protocol Information Bases"
For the purpose of this specification, the Protocol
Information Bases are orgnaized as sets of tuples ... any
implementation can chose whatever representation it wants.
The Network State Information Base in a DNCP node is a set
(x, y, z, w)
where x is ..., y is ..., z is ..., and w is ...
Section "How to calculate the Network State Hash":
The network State Hash is calculated using the information
from the Network State Information Base, as follows:
1. First, the tuples in that information base are sorted
in ascending order based on ....
2. Second, .... (concatenation)
3. Third, the hash function from <profile> is used
4. Fourth, the first n bits of the resulting hash value,
are retained, witn n being from <profile>.
And then, in remaining sections simply reference the Network
State Hash, which is now ubiquitously defined in a single place.
I am taking this example, since when reading section 5.3 I found
myself chasing through the document, finding multiple slightly
different definitions of "Network State Hash" -- but beyond this
example, it generally does apply to the document as a whole, and
certainly to all of the processing and generation considerations in
o As a general comment, the document would do well with a good editorial
overhaul to bring consistency in language usage, consistency in 2119
terminology, coherence in defined terms and their definition, document
o The introduction does not read well; it contains parts of something that
could be considered as part of an applicability statement (without it
being called out as such, and without forming a complete applicability
statement), and does not actually introduce the protocol. Reading just
the introduction and the abstract, it is very obscure if
this is a framework, a protocol, a building block, an architecture, an
algorithm -- and, if either of those, what it is actually accomplishing,
and why one would chose to use DNCP. It does, however, transpire that
"whatever it is", it has two "modes" and that it requires something
(presumably a routing protocol) to provide each "node" with a topology
Suggest that a proper introduction consisting of three parts would be
beneficial: (i) what this document is, (ii) what doing DNCP actually
gets you, and (iii) the operating conditions under which the
DNCP is applicable.
On the latter point, given that you state that DNCP requires profiles
to provide "actual implementable DNCP based protocols", it appears
important to understand what the limits for "what a profile can give
I am calling this out as a major issue, since I believe that it is
not just editorial, but is a matter of scoping this document correctly,
and in particular not falling into the trap of "claiming applicability
where it's not".
o The document, in my understanding, defines an exchange format with
limited ability to evolve, as simply "a steam of TLVs".
As long as there's never a need to evolve the TLV format itself, and
as long as you do not run out of TLV types, that's not going to be
a problem. The doc sets aside a 16bit TLV type space, that's reasonable
enough, but I worry if eventually a DNCPv2 will need to evolve the
format. One purely hypothetical example could be if a "sequence number"
would be needed in each DNCP message to detect "link success rates", or
something of that sort.
I do not have an actual example in mind -- and that's exactly the point:
to be evolutive for the unknown future and (at the very least) be able
to discriminate between "old" and "new".
A discussion could be had if a "version number" in each TLV would do,
or if a concept of "protocol message with a version number" is
preferential. I do not believe, however, that "no version number" is
o Noting that the "overhearing n reduncant transmissions" is a key
retransmission suppression mechanism in Trickle, and that this
seems to assume broad/multicast, using unicast seems to contradict
the statement of "consists of Trickle", at least in the way the
algorithm is defined in RFC6206. Note: it's fine to use an algorithm
outside of its initial scope, but it should be with the caveat of
"which of the characteristics still hold, and which do not"
o DNCP claims to be trickle based, yet supports unicast. It also
(apparently) is a request/reply protocol. It doesn't have messages.
This document needs a good, and pedagogical, "protocol overview and
functioning" section somewhere: one needs to get through the end of
Section 5 before having even a vague idea of how DNCP works.
o The use of normative language is not as tight as could be desired.
For example, a number of SHOULDs seem to really ought to be "MAYs" since
not following the SHOULD won't break the algorithm. It would be good
to walk through the document and take a careful look at these to
either MUST/MAY the SHOULDs, or to qualify the SHOULDs remaining.
o I am going to go out on a limb here, and say that "the protocol is
underspecified". That's a deliberately provocative statement, but it
was honestly how I felt upon having completed the review.
The document does not help the reader get an intuitive understanding
of the protocol functioning, but jumps right into minute details --
requiring the reader to "build up her or his own model of how DNCP
works". On having read the document a few times, I think that I
understand it -- but there's nothing permitting me to verify my
understanding, and thereby I'd not feel confident to be able to
provide an interoperable and independent implementation. I've given
some comments in the "Comments" section as to what I think would be
viable ways to improve this point.
o Section 5.3, penultimate paragraph:
"If keep-alives specified in Section 6.1 are NOT sent by the peer
(either the DNCP profile does not specify the use of keep-alives or
the particular peer chooses not to send keep-alives), some other
means MUST be employed to ensure its presence. When the peer is no
longer present, the Neighbor TLV and the local DNCP peer state MUST
"...some other means MUST be employed to ensure its presence." --
followed by more MUST verage when a peer disappears...I am not sure that
that's conductive to interoperable implementations.
Two implementatons may chose different "means" and then turn off keep-
alives - and be non-interoperable.
For interoperability, we need:
o A mandatory to implement mechanism, that always is
present, but can be complemented by another "means", or
o A mandatory to implement mechanism, which by way of a
specified negotiation mechanism can be turned off between
two peers, to allow them to use another "means".
If you argument is "...this will be specified in the profile", then
you still should provide the two above in this document, with the note
that "...and a profile may specify which from among these MUST be
used in a given deployment"
o Section 8:
Interesting; I am not a security expert, but I am very curious to
see the SEC-DIR review of this document. That said, section 8.3.1
contains normative verbage:
"A node MUST be trusted for participating in the DNCP network if and
Which I think needs a qualifier of the "If the certificate based
trust model is used, then a node must be trusted for ...."
Same goes for the subsequent SHOULD - it really reads as-if this
certificate based mechanism initially was intended as MTI, but then
was backed away from subsequently without a complete cleanup of the
I do actually question the value of having a laundry-list of trust
management methods, and for one of those (certs) a laundry-list
of all sorts of trust relationship establishment methods, in this
document; this in no small part as the lists are explicitly indicated
as "non-exhaustive" and that none are listed as "mandatory to
implement". Was any thought given to factoring this into a seperate
document, and focusing in this document on one, mandatory-to-implement,
o 1st paragraph: "reachable nodes"; two things:
- I always have a problem with the term "node"; it is often
used as a shorthand for "routers and hosts, both". I was
given to understand that homenet specifically did not want
to consider host changes?
- "Reachable" - does that mean something as in "radio range",
does it mean "on the same link", does it mean within a
specific (DNCP?) domain, or does it mean simply "on the
o 2nd paragraph: "nodes that are currently accounted for":
- What does that mean?
- Also, the conclusion "Therefore unlike Time-To-Live (TTL)
based solutions, it does not require periodic re-publishing
of the data by the nodes" does actually not follow from
the previous sentence in that paragraph.
- I actually do not think that the introduction describes
what DNCP does, and so the comparison to TTL-based
solutions is rather hard to get here.
"On the other hand, it does require the topology
to be visible to every node that wants to be able to
identify unreachable nodes and therefore remove old,
This reads a lot more like an applicability statement than
an introduction; the take-away when reading this is:
"Each node must have something that maintains
a topology map of the entire network, such as
a (LS) routing protocol, for DNCP to function"
Is that actually the intent here?
- "DNCP is most suitable for data that changes only gradually"
How is the reader to interpret "gradually"? Do you mean
"infrequently", or do you really mean "gradualy"?
o Last paragraph:
"DNCP has relatively few requirements for the underlying
transport; it requires some way of transmitting either unicast
datagram or stream data to a peer"
This is a bit of a forward comment, but we now have "nodes
that are accounted for" and "peers". I see neither defined in
the terminology section.
"and, if used in multicast mode, a way of
sending multicast datagrams."
This is the first mention of two "modes" of this protocol. This
loops back to an earlier comment, that the introduction actually
does not introduce the protocol, but rather is an incomplete
"If security is desired and one of the
built-in security methods is to be used, support for some
TLS-derived transport scheme - such as TLS [RFC5246] on top of
TCP or DTLS [RFC6347] on top of UDP - is also required."
I am not pretending to be a security expert, but "some
TLS-derived...such as ... on top of TCP or DTLS..." (i) does not
sound like it could lead to interoperable implementations, and (ii)
does not sound sufficiently tight as a MTI security mechanism to
pass security reviews. Again, I am no security expert, but perhaps
getting one looped in early would be advicable?
o Suggest adding "In this document ..." somewhere to this text:
"For readability, any DNCP profile specific
parameters with a profile-specific fixed value are
prefixed with DNCP_."
o DNCP network: I read this twice, and came away with two different
understandings, perhaps you can clarify which it is:
o A set of nodes running DNCP, within the same domain, and
for which a path betwen any two DNCP nodes includes only
other DNCP nodes; i.e., a DNCP network forms a connected
component with only other DNCP nodes.
o A set of nodes running DNCP. They may be anywhere on the
Internet, they are part of the same DNCP network as long
as they (through other means) have learned of each others
In the former, that'd be (for example) a deployment within my
home -- in the latter, it could be a node in my home and a node in
your home forming a DNCP network.
The text is not quite clear on this point.
o Link: a point of clarification here. In "DNCP network", there was
talk about "unidirectional links" and "bidirectional links"; in
"Link" the definition is somewhat vague "directly connected" and
"can communicate". Could something like "without decrementing TTL/
hop-count" be added, and could a statement on bidirectionality
(IOW, that this is just an IP link) be added?
o "Interface" is overloading the term "port" (IP port) which can be
o "Endpoint" - The definition "locally configured use of DNCP" is not
clear -- are you really not talking about a DNCP process?
I am not sure that it is clear how a DNCP process can be "attached
to ... a specific remote unicast address, or to a range of unicast
addresses that are allowed to contact"
I can see how a DNCP process can be configured to allow connections
from a specific range of addresses, or can be configured to connect
to a specific remote unicast address. Is that what you mean instead?
o "Peer" - states that two peers "communicate directly". For link,
the definition is "directly connected nodes can communicate".
Would it then not be easier to say "a DNCP node on the same link
as ..." ?
o "Node state"
"The hash function and the number of bits used are defined
in the DNCP profile."
"The hash function and the length of the hash value are defined
in the DNCP profile."
o "Network state hash" - same comment as for node state (above)
o "Latest update sequence number"
This may just be my personal taste, but does it hurt to mandate
a specific way of doing the looping comparison? The reason I
suggest this is, that it's one of those things where creativity
in an implementation seems to simply be an invitation for bugs,
and for little gain
o "Relative time delta"
Document talks about "a 32 bit number on the wire" -- does that
mean that wireless links are excluded?
o Related to terminology, there seems to be some fuzzyness around
node and endpoint. For example, in data model one of the things that
a DNCP node may have is:
"Unicast address: the DNCP node it should connect with"
Does that mean *any* DNCP process (i.e., *any* endpoint) at that
address, or a *specific* DNCP process at that address?
The same, but inverse, for "Range of addresses: the DNCP nodes that
are allowed to connect" - is this "any DHCP process (i.e., *any*
endpoint) on any of these addresses?
Following, the same section reads:
"For each remote (peer, endpoint) pair detected on a local
endpoint, a DNCP node has..."
the following text indicating that there's some sort of distinction
between which endpoint.
This whole thing needs some clarification.
o First a generic comment that Trickle itself has some operating
conditions which scopes its applicability, and it would behove
this document to, in its own applicability statement, call out
o On the same token, while the use of Trickle in an unicast fashion
is possible, I wonder if (in general) unicast use is advicable. I
appreciate that some links are point-to-point and so a broadcast
across it becomes an unicast -- but, does that necessitate being
IF the reason for this "because we can use TCP", then be explicit
about this - but, also, that you're then not exactly using Trickle
where and how it was intended. I wonder if you could be explicit
as to what consequences this "alternate use of Trickle" have? It
seems that the use of unicast is directly contradicting the main
operating consideration of Trickle?
o 2nd paragraph states:
"the multicast transport does not have to be particularly
What is the definition of "not have to be particularly secure"?
Is cleartext OK? Authentication? Encryption?
Should I do something more?
5.1 Trickle-driven status updates
o First paragraph:
"Multicast MUST be employed on a multicast-capable interface;
otherwise, unicast can be used as well"
If the interface is not multicast-capable, then unicast can be
used as well as what? Certainly not multicast, since the interface
is not multicast capable...?
"If possible, most recent,"
What would make it "not possible"?
"recently changed, or best of all, all known Node State TLVs"
OK, so assuming that for some reason (MTU limitation) it is not
possible, does the above represent an order that I MUST respect,
or is it "take a pick from among these, according to your whim of
"(Section 7.2.3) SHOULD be also included,"
SHOULD is a strong statement, especially when prefixed by
"if possible". That, essentially, renders it a MAY.
"unless it is defined as undesirable for some reason
by the DNCP profile
Now it DEFINITELY is a MAY since apparently a profile can state
that these TLVs MUST NOT be included -- and, I assume, since the
document permits it to do so, it is possible without breaking the
o And, continuing again:
DNCP profile supports dense broadcast link optimization
(Section 6.2), and if a node does not have the highest node
identifier on a link, the endpoint may be in a unicast mode in
which multicast traffic is only listened to. In that mode,
multicast updates MUST NOT be sent."
Really hard to parse. Is that not equivalent to saying:
"If a DNCP endpoint is not configured to be in multicast
mode, then it MUST NOT send multicast updates"
If it is, then say that -- if it is not, then a rewrite is needed,
as that's what I manage to extract from the text.
5.2. Processing of Received TLVs
o First paragraph reads:
"The DNCP profile may specify criteria based on which particular
TLVs are ignored."
Criteria for what? Do you perhaps mean:
"The DNCP profile may specify which TLVs to process, and
which to ignore"?
Auxiliary question, then, and related to my penultimate comment
to 5.1, are there any constraints on that, any risks from ignoring
(or not) specific TLVs to the operation of the network?
o I am also confused by the 3rd sentence in the first paragraph:
"Any ’reply’ mentioned in the steps below denotes sending of
the specified TLV(s) via unicast to the originator of the TLV
This confusion is likely due to the lack of a "protocol overview
and functioning" description [either as its own section, or as part
of the introduction].
I know how trickle works. Trickle is a distributed consistency
algorithm. When an inconsistency is detected, then an action is
triggered that rectifies that inconsistency. DNCP claims to be trickle based, but apparently also a sort of request/reply
mechanism. Combined with trickle-over-unicast-links, I am not sure
what the protocol logic actually is. Reading through to the end of
Section 5, I think that I understand the idea, but I am not sure.
And the old "when in doubt, look at the state machines" didn't help
either, there aren't any.
The point to this comment is, that the document immediately jumps
into the details -- but forgets to give the "10000ft view" of the
o First paragraph states two SHOULD. Would those not be MUST? What
breaks if not respecting those criteria?
o 2nd paragraph, a "valid address", that definition is rather unclear.
I understand that that's something specified in "the profile", but
what is the relationship to the different addresses discussed in
the data model section?
It is not clear what the parenthesis to this paragraph means,
but that is probably again a case of the "use case" and "protocol
overview" not being documented - the document so far has nowhere
described interaction with outside processes.
o First bullet, but generally through these, and other, bullets:
I had a really hard time deciphering this. First:
"The receiver MUST reply
with a Network State TLV (Section 7.2.2) and a Node State TLV
(Section 7.2.3) for each node data used to calculate the
network state hash"
Alright, off to find "network state hash".
The terminology tells me that it is:
"a hash value which represents the current state of
the network. The hash function and the number of
bits used are defined in the DNCP profile.
Whenever a node is added, removed or updates its
published node data this hash value changes as
well. It is calculated over each reachable nodes'
update number concatenated with the hash value of
its node data. For calculation these tuples are
sorted in ascending order of the respective node's
Searching further, I find Section 5.1, but that simply states:
"The Trickle state for all endpoints is
considered inconsistent and reset if and only if the locally
calculated network state hash changes."
Next occurence is in these bullets, and then just before Section 6,
the grace period, the nodes that were not marked reachable
in the most recent graph traversal MUST NOT be used for
calculation of the network state hash, be provided to any
applications that need to use the whole TLV graph, or be
provided to remote nodes."
Alright, now I know what I can't use for calculating it.
A few occurences later, in section 7.2.2, in what looks like a
section laying out the packet -- sorry, TLV -- format, I see for
"Network State TLV":
"This TLV contains the current locally calculated network state
hash. It is calculated over each reachable nodes' update number
concatenated with the hash value of its node data in ascending
order of the respective node identifiers"
Phew. Now, it does seem a little at odds with the terminology. The
terminology states something about tuples that are ordered. While
those tuples are not defined (they should be), at least what is
described is clear and possibly can be implemented. What is in 7.2.2
is not ant cannot.
This is an instance of a general issue that I have with this
document: that it doesn't take a step back, and properly define
things in a proper order, but dives into (and repeats) details.
o Also to section 5.2, for each of the cases that are described, could
a conceptual description of "what this corresponds to" be added? For
Upon reciept of a Node State TLV:
If the node identifier matches the local node identifier and
the TLV has a higher update sequence number than its current
local value, or the same update sequence number and a
different hash, the node SHOULD re-publish its own node data
with an update sequence number 1000 higher than the received
It's not clear why it is a "SHOULD re-publish" (not MUST, nor what
happens if SHOULD is not followed). And it is not clear why 1000 ...
[I just pick this example, but it applies to all processing bullets]
o In the same cases, it is a lot more readable (IMO) to do nested bullets:
o If FOO; and either of:
Then do all of the following:
o Otherwise, if not-FOO, ...
That's a personal preference, though, so feel free to disregard this
o Section 5.3 and elsewhere, suggest replacing:
"If it comes via..."
"If received over ..."
o Last paragraph in 5.3:
Same comment as 3rd comment to 5.1 made above.
o Section 5.4, first sentence:
"DNCP validates the set of data within it ..."
Should that not be:
"A DNCP instance validates the data within its data sets ..."
Also, "nodes that are currently accounted for; what's the definition
of "accounted for"?
o Section 5.4, first paragraph
unlike Time-To-Live (TTL) based solutions, it does not require
periodic re-publishing of the data by the nodes. On the other
hand, it does require the topology to be visible to every node that
wants to be able to identify unreachable nodes and therefore remove
old, stale data."
which also appeared in the introduction, is copied verbatimly. Once
more, the statement is a claim which is not supported, and that which
follows "therefore" is not a consequence of that which comes before
o Section 5.4, first paragraph
"When a Neighbor TLV or a whole node is added or removed, the
neighbor graph SHOULD be traversed, starting from the local node.
The edges to be traversed are identified by looking for Neighbor
TLVs on both nodes, that have the other node’s identifier in the
neighbor node identifier, and local and neighbor endpoint
identifiers swapped. Each node reached should be marked currently
First comment, why SHOULD and not MUST?
Second comment, and now you made me go look...."neighbor" sounds like
"someone on the same link as me" so the "neighbor graph" is really just
a set relating "this node" and "another node which is on the same link
as this node".
Yet, looking in the terminology, I see "Neighbor graph" defined as:
"the undirected graph of DNCP nodes produced by
retaining only bidirectional peer relationships
Which doesn't sound as much like a "neighbor graph" as it does a
"topology graph" for the whole network.
So, is the terminology wrong, or is the definition wrong?
o Section 5.4, 3rd paragraph
Is it actually important that the content of that graph be "purged"?
That sounds like an implementation detail -- rather, it sounds like
the elements of the graph should "not be used for calculations and
MAY be removed". Or, is there a specific requirement that I am
o Section 6.1, I do not understand the parenthesis in this sentence:
Trickle-driven status updates (Section 5.1) provide a mechanism for
handling of new peer detection (if applicable) on an endpoint
Under what conditions is that applicable, and under which is it not?
o Section 6.2:
"An upper bound for the number of neighbors that are allowed for a
(particular type of) link that an endpoint runs on SHOULD be
provided by a DNCP profile, user configuration, or some hardcoded
default in the implementation."
A couple of things to that:
1) Can you explain the parenthesis? What type of link?
2) How does "an endpoint runs on" a link?
3) Why SHOULD?
4) Is this specification seriously suggesting "some hardcoded
default in the implementation" as a SHOULD?
[I am tempted to upgrade this to a "Major issue" simply because of 4) ]
Also to 6.2, this particular optimization, do you have any
quantification of its actual benefit? What should I look for to
determine if this "optimization" yields a benefit or not? What are
you trying to optimize? Over what link types does this function?
I am dubious that it "optimizes" much, if anything, across an Ethernet, for example ...
o Section 7
As indicated previously, having to search through the frame format
diagrams for "how to calculate the value" isn't ideal.
o Section 7.2.3, I worry when I see something like this:
"The whole network should have roughly the same idea about the time
since origination of any particular published state."
What is the definition of "roughly"?
Is the "should" intentionally in non-caps?
What're the consequences if not?
[Note that trickle almost mechanically makes information propagate
with non-trivial jitter across a network, so how do you ensure this?]
o Section 7.2.4, CUSTOM-DATA TLV.
Given the description:
"This TLV can be used to contain anything; the URI used should be
under control of the author of that specification."
It seems that (i) the description is self-contradictory: it cannot
contain *anything* but can contain an URI?
Secondly, how is this supposed to work, what does it mean [for DNCP]
that "the URI is under control of the author"?
Thirdly, what does "that specification" refer to?
Fourthly, why lower-case should? Indeed, why is the "control" of the
URI of any importance to DNCP?
o Section 9, the bullet:
"When receiving messages, what sort of messages are dropped, as
specified in Section 5.2"
Seems at odds with Section 5.2, which discusses TLV processing.
o Please reflect Errata 499 for RFC2119 in the boilerplate
o The RFC2119 boilerplate could conveniently be in the terminology
section, given that it is terminology.