Telechat Review of draft-ietf-anima-reference-model-07
review-ietf-anima-reference-model-07-rtgdir-telechat-hopps-2018-08-26-00

Request Review of draft-ietf-anima-reference-model
Requested rev. no specific revision (document currently at 08)
Type Telechat Review
Team Routing Area Directorate (rtgdir)
Deadline 2018-08-24
Requested 2018-08-09
Requested by Alvaro Retana
Other Reviews Opsdir Last Call review of -06 by Tianran Zhou
Genart Last Call review of -06 by Joel Halpern
Secdir Last Call review of -06 by Radia Perlman
Genart Telechat review of -08 by Joel Halpern
Review State Completed
Reviewer Christian Hopps
Review review-ietf-anima-reference-model-07-rtgdir-telechat-hopps-2018-08-26
Posted at https://mailarchive.ietf.org/arch/msg/rtg-dir/r68B6XnG-qjHkvNTbvkJtpsZ76s
Reviewed rev. 07 (document currently at 08)
Review result Has Issues
Draft last updated 2018-08-26
Review completed: 2018-08-26

Review

Subject: RtgDir review: draft-ietf-anima-reference-model-07.txt

Hello,

I have been selected as the Routing Directorate reviewer for this draft. The Routing Directorate seeks to review all routing or routing-related drafts as they pass through IETF last call and IESG review, and sometimes on special request. The purpose of the review is to provide assistance to the Routing ADs. For more information about the Routing Directorate, please see http://trac.tools.ietf.org/area/rtg/trac/wiki/RtgDir

Although these comments are primarily for the use of the Routing ADs, it would be helpful if you could consider them along with any other IETF Last Call comments that you receive, and strive to resolve them through discussion or by updating the draft.

Document: draft-ietf-anima-reference-model-07.txt
Reviewer: Christian E. Hopps
Review Date: 2018-08-26
Intended Status: Informational

Summary:

I have a couple of minor major concerns and a couple of minor concerns about this document that I think should be resolved before publication.

I found the document very well written.

Minor Major Issues:

- Virtualization is mentioned only once, in section 4.2 (Addressing). To quote:

  TEXT: "Support for virtualization: Autonomic Nodes may support Autonomic
  Service Agents in different virtual machines or containers. The addressing
  scheme should support this architecture."

  The special-casing of VMs/containers here seems to indicate that virtual
  devices are not "first-class citizens" in an autonomic network. In particular,
  I could easily imagine virtual machines being full-blown autonomic nodes
  themselves. Assuming the intent is not to restrict virtual devices in this
  manner, something needs to be said (somewhere) to make that clear.

- Robust programming techniques. I think the intention here is to say that the
  design of ASAs must have robustness as a top design principle. In doing that,
  the text should talk about what being robust means; however, it should not
  talk about how to accomplish that, as there are multiple ways to achieve this
  goal.

  In particular, I feel that saying restarting is the *last* thing an ASA should
  do overreaches into engineering the solution rather than specifying the
  requirement. Indeed, plenty of people think that overly complex recovery
  mechanisms that try everything under the sun to *not* restart often have more
  bugs and are less robust than KISS solutions that "fail" simply but recover
  quickly with minimal or no disruption.

  I feel this section reads a bit more like someone's idea of how to design a
  robust system than a description of what robustness means, which I believe is
  the intent.

  Perhaps better is just to focus on robust design ideas (some are already
  stated in the text):

  - must deal with discovery and negotiation failure as routine.
  - recovering from failures should be minimally disruptive.
  - must not leak resources.
  - must monitor for and deal with hung code.
  - must include a security analysis.


- 7.4: When the text talks about the feedback loop, it mentions "allow the
  intervention" of a human administrator or control system; however, it then
  describes the feedback loop as presenting default actions and allowing for
  override. This is fine, but it seems to leave out the common case where
  something is misbehaving and is not presenting any choices to the
  administrator (via the feedback loop), so the administrator must forcefully
  intervene.

Minor Issues:

- 6.1 TEXT: "It must be possible to run ASAs as non-privileged (user space)
  processes except for those (such as the infrastructure ASAs) that necessarily
  require kernel privilege. Also, it is highly desirable that ASAs can be
  dynamically loaded on a running node."

  ISSUE: Discussing implementation details like user-space, kernel privilege and
  dynamic loading seems unnecessary and outside the scope of this document. Does
  this document care if I implement my ASA on a real-time architecture with no
  "user space", etc.?

- 4.6 Why call out global routing and overlay networks in particular? Is the
  real intention to just say that the ACP implementation is not restricted to any
  specific type of networking?

- TEXT: 6.3.1.2 "on a given LAN"

  NIT: Everyone knows what a LAN is; however, I wonder if the text should be
  more generic and describe what it actually requires here, which is a
  broadcast or multicast network?

Questions/Comments:

- QUESTION: IoT and node requirements. There are a couple of node ASA
  requirements. I found myself wondering whether very simple IoT devices such as
  thermostats might ever be autonomic nodes, and if so, whether they would all
  really need to have join assistant ASAs. It could be that the answer is "Yes,
  they do, or they can't be nodes". I was just curious.

- COMMENT: For the types of ASAs: simple (run anywhere), complex (resource
  restricted), and infra (run everywhere), I was reminded of Kubernetes/cloud
  orchestration, and the concept of DaemonSets (pods that run everywhere) and
  Deployments (pods that can run anywhere, can possibly be scaled or replicated,
  and may also have requirements that restrict where they can run). I imagine
  that folks in ANIMA have also looked at this, but if not, it would be good to
  do so, as they seem to be solving very similar problems.

Nits:

- TEXT: 3.2 "However, the information is tracked independently of the status of
  the peer nodes; specifically, it contains information about non-enrolled
  nodes, nodes of the same and other domains."

  QUESTION: What are peer nodes? Is this another name for adjacent nodes? If so
  "s/peer/adjacent/".

- TEXT: 3.3.1 "enrols"
  CHANGE: "enrolls"

- TEXT: 3.3.3 "In this state, the autonomic node has at least one ACP channel to
  another device. It can participate in further autonomic transactions, such as
  starting autonomic service agents. For example it must now enable the join
  assistant ASA, to help other devices to join the domain."

  NIT: "For example foo" is not a sentence on its own; also, "It" is not a good
  subject, as there are multiple nouns in the previous sentence that could serve
  as antecedents.

  SUGGEST: 3.3.3 "In this state, the autonomic node has at least one ACP channel
  to another device. The node can now participate in further autonomic
  transactions, such as starting autonomic service agents (e.g., it must now
  enable the join assistant ASA, to help other devices to join the domain)."

- TEXT: 4.1 "Names are typically assigned by a Registrar at bootstrap time and
  persistent over the lifetime of the device."

  NIT: s/persistent/and persist/

- TEXT: "Out of scope are addressing approaches for the data plane of the
  network, which may be configured and managed in the traditional way, or
  negotiated as a service of an ASA. One use case for such an autonomic function
  is described in [I-D.ietf-anima-prefix-management]."

  NIT: Sounds sort of Yoda-like, and the compounding makes things less clear.

  SUGGEST: "Addressing approaches for the data plane of the network are outside
  the scope of this document. These addressing approaches may be configured and
  managed in the traditional way, or negotiated as a service of an ASA. One use
  case for such an autonomic function is described in
  [I-D.ietf-anima-prefix-management]."

- TEXT: 6.1: "Following an initial discovery phase, the device properties and
  those of its neighbors are the foundation of the behavior of a specific
  device. A device and its ASAs have no pre-configuration for the particular
  network in which they are installed."

  NIT: Why suddenly lose the "node" abstraction and start talking about devices
  here? I think it continues to work well to say "node" (e.g., "node
  properties", "specific node" and "A node and its ASAs...").

- TEXT: 6.2 "install ASA: copy the ASA code onto the host and start it,"
  NIT: "s/host/node/"