
An Autonomic Control Plane (ACP)
draft-ietf-anima-autonomic-control-plane-30

Discuss


Yes

(Terry Manderson)

No Objection

(Adam Roach)
(Spencer Dawkins)
(Suresh Krishnan)

Recuse


Note: This ballot was opened for revision 13 and is now closed.

Erik Kline
(was Discuss) Yes
Comment (2020-10-07 for -29) Sent
Thanks for addressing things!
Éric Vyncke
Yes
Comment (2020-07-02 for -27) Sent
Thanks to the authors and the ANIMA WG and the numerous reviewers for this document.

After my own review late 2019, I think that the document is ready to be published.
Roman Danyliw
(was Discuss) No Objection
Comment (2020-08-12 for -29) Sent for earlier
The style of explaining the design choice after describing an element of the protocol was informative and helpful.  Thanks.

This document has undergone a significant amount of security review.  Thank you for incorporating all of this feedback.

Thanks for resolving my previous DISCUSS and COMMENTs.

** Section 6.8.2.1. Editorial.
OLD
The compromised ACP node would
   simply announce the objective as well, potentially filter the
   original objective in GRASP when it is a MITM and act as an
   application level proxy.

NEW
The compromised ACP node would simply announce the objective as well, potentially filter the original objective in GRASP when it is in-path and acting as an application level proxy.

** 10.2.2.  Editorial.
OLD
This minimizes man in the middle
   attacks by compromised ACP group members

NEW
This minimizes attacks by compromised ACP group members who are on-path.

** In reviewing the ballot position of my predecessor (ekr)

> Section 6.10.2.
>    o  When creating a new routing-subdomain for an existing autonomic
>       network, it MUST be ensured, that rsub is selected so the
>       resulting hash of the routing-subdomain does not collide with the
>       hash of any pre-existing routing-subdomains of the autonomic
>       network.  This ensures that ACP addresses created by registrars
>       for different routing subdomains do not collide with each others.

[ekr] You need to lay out the security assumptions here. It's not difficult
to create a new domain with the same 40bit hash. If you have a private
CA, this probably isn't an issue, but if you are sharing a public CA,
it would allow me to produce a domain with other people's addresses.

[Roman] If the domain uses a "public CA" as a trust anchor, is there a risk that it might also be used by some other autonomic domain?
Warren Kumari
(was Discuss) No Objection
Comment (2018-06-11 for -16) Unknown
Thank you for addressing my DISCUSS concerns so quickly and well.

I've cleared.

 (Actually wrote this a few days back, but forgot to hit the confirm button :- ( )


-- Original DISCUSS for hysterical raisins -- 
I'm balloting DISCUSS, but I think that this should be relatively simple to address:
The document says things like: "Today, the management and control plane of networks typically runs in
the global routing table, which is dependent on correct configuration
and routing." and "Context separation improves security, because the ACP is not
   reachable from the global routing table. "

The term "global routing table" is widely used and understood to mean the global BGP routing table, or Internet global routing table. I understand that you are using it in the "default VRF" meaning, but I think that it is really important to clarify / disambiguate this the first time you use it.

----------






Thank you very much for writing this document -- it is comprehensive...

A rich text version of my review is here: https://mozphab-ietf.devsvcdev.mozaws.net/D3801#inline-4146 , and pasted below for tooling, email, etc.

--- 
draft-ietf-anima-autonomic-control-plane.txt:212
   network nodes that is not the ACP, and therefore considered to be
   dependent on (mis-)configuration.  This data-plen includes both the
   traditional forwarding-plane, as well as any pre-existing control-
Nit: data-plane


draft-ietf-anima-autonomic-control-plane.txt:508
   certificate.  This does not require any configuration on intermediate
   nodes, because they can communicate zero-touch and securely through
   the ACP.
I understand what you are trying to say, but "zero-touch" is not an adverb. 


draft-ietf-anima-autonomic-control-plane.txt:518
   the data-plane is operational, will the other planes work as
   expected.
This is *sometimes* an undesirable dependency, but is usually viewed as a feature (by operational people) -- having the control plane share fate with the dataplane is something that is usually a feature - this drives at least part of the reason that many organizations run OSPF and OSPFv3 - having V4 OSPF relying on v4 dataplane avoids blackholes if the v4 dataplane stops working.
(This is sometimes, but less often used as an argument against ISIS).

I understand why it is useful in this context, but it would be useful to clarify/make it clear that you understand the subtleties.

Also, nit: "is operational, will the" -- the comma feels weird here.


draft-ietf-anima-autonomic-control-plane.txt:526
   management session is running can lock an admin irreversibly out of
   the device.  Traditionally only console access can help recover from
   such issues.
"only console access or OOB".

You may be using "console access" to mean OOB, but much (most?) OOB is now not console based.


draft-ietf-anima-autonomic-control-plane.txt:531
   Operations Center") such as SDN controller applications: Certain
   network changes are today hard to operate, because the change itself
   may affect reachability of the devices.  Examples are address or mask
I think that this was an editing issue -- you don't "operate" changes. Perhaps "implement"?


draft-ietf-anima-autonomic-control-plane.txt:858
   o  If the node certificates indicate a CDP (or OCSP) then the peer's
      certificate must be valid according to those criteria. e.g.: OCSP
You expand CDP further in the document, but this is the first time it is used.


draft-ietf-anima-autonomic-control-plane.txt:994
   ACP neighbors.  Native interfaces (e.g.: physical interfaces on
   physical nodes) SHOULD be brought up automatically enough so that ACP
   discovery can be performed and any native interfaces with ACP
I don't have a suggestion, but "automatically enough" doesn't sound right - "automatically configured enough" ?


draft-ietf-anima-autonomic-control-plane.txt:1067
   In the above (recommended) example the period of sending of the
   objective could be 60 seconds the indicated ttl of 180000 msec means
   that the objective would be cached by ACP nodes even when two out of
Editing fail -- missing some punctuation or words.


draft-ietf-anima-autonomic-control-plane.txt:1933
   for reachability.  The use of the autonomic control plane specific
   context eliminates the probable clash with the global routing table
   and also secures the ACP from interference from the configuration
IMPORTANT: The term "global routing table" has a well known meaning in operations -- it is the global BGP table. I strongly suggest using a different term, or having a very clear statement in the terminology section, AND the first time you use it in the document. This will help minimize confusion.

draft-ietf-anima-autonomic-control-plane.txt:3081
10.2.  ACP (and BRSKI) Diagnostics
Just a note that I like / appreciate this section - having guidance on how to troubleshoot is very helpful.


draft-ietf-anima-autonomic-control-plane.txt:3416
10.3.2.2.  Fast state propagation and Diagnostics

   "Physical down" state propagates on many interface types (e.g.:
When I saw the "physically brought down" I started composing a long soapbox rant on the fact that this will slow down state propagation -- I like that the document anticipates and addresses this. It might be useful to have a pointer in the previous section (like "(see below)" or similar).


draft-ietf-anima-autonomic-control-plane.txt:3482
   for 5 seconds to probe if there is an ACP neighbor on the remote end
   every 500 seconds = 1% power consumption.
I believe that this is sufficiently incorrect that you should remove the 1% result (or, better yet the whole last sentence).

Various interfaces (especially long reach) take a significant amount of time (and additional power) when bringing up interfaces -- things like DWDM optics and amplifiers sometimes need significant power for heating elements to lock the frequency / wavelength, and so the power consumption is not linear with interface uptime.
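
As a rough illustration of the point above: the 1% figure follows from a pure duty-cycle calculation (5 seconds up in every 500 seconds), which only holds if power draw is proportional to uptime. The sketch below adds a warm-up phase to that calculation. The warm-up duration and relative warm-up power are made-up assumptions for illustration, not measurements of any real optic.

```python
def average_power_fraction(probe_s, period_s, warmup_s=0.0, warmup_power_ratio=1.0):
    """Average power as a fraction of always-on power.

    probe_s: seconds the interface is up for probing in each cycle
    period_s: length of one probe cycle in seconds
    warmup_s: hypothetical extra seconds needed to bring the interface up
    warmup_power_ratio: hypothetical power during warm-up, relative to steady state
    """
    # Energy per cycle, in units of "steady-state power x seconds".
    energy = probe_s * 1.0 + warmup_s * warmup_power_ratio
    return energy / period_s

# Pure duty cycle, as in the quoted text: 5 s every 500 s.
naive = average_power_fraction(5, 500)

# Same probing, plus an assumed 30 s warm-up at 1.5x steady-state power.
with_warmup = average_power_fraction(5, 500, warmup_s=30, warmup_power_ratio=1.5)

print(f"naive: {naive:.1%}, with assumed warm-up: {with_warmup:.1%}")
```

With any non-zero warm-up cost the result moves away from the duty-cycle figure, which is why a fixed percentage is hard to justify in the text.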
Eric Rescorla Former IESG member
Discuss
Discuss [Treat as non-blocking comment] (2018-08-01 for -16) Unknown
Rich version of this review at:
https://mozphab-ietf.devsvcdev.mozaws.net/D9959


I found this document extremely hard to follow due to a large number
of grammar errors. It really needs a very thorough copy-edit pass,
which I believe is beyond the RFC-editor's usual process. Ideally, the
WG would do this.


DETAIL
S 6.1.1.
>      each other.  See Section 6.1.2.  Acp-domain-name SHOULD be the FQDN
>      of a DNS domain owned by the operator assigning the certificate.
>      This is a simple method to ensure that the domain is globally unique
>      and collision of ACP addresses would therefore only happen due to ULA
>      hash collisions.  If the operator does not own any FQDN, it should
>      choose a string (in FQDN format) that intends to be equally unique.

These rules do not seem to be strong enough. Unless you have disjoint
trust anchors, there is a potential for cross-domain attack.


S 6.1.2.
>      See section 4.2.1.6 of [RFC5280] for details on the subjectAltName
>      field.
>   
>   6.1.2.  ACP domain membership check
>   
>      The following points constitute the ACP domain membership check of a

What is the relationship of these rules to the existing 5280 rules?


S 6.1.2.
>   
>      o  The peer has proved ownership of the private key associated with
>         the certifictes public key.
>   
>      o  The peer's certificate is signed by one of the trust anchors
>         associated with the ACP domain certificate.

So you don't allow chaining? It seems later that you say you do, but
this language prohibits it.


S 6.1.3.1.
>      The objective value "SRV.est" indicates that the objective is an
>      [RFC7030] compliant EST server because "est" is an [RFC6335]
>      registered service name for [RFC7030].  Future backward compatible
>      extensions/alternatives to [RFC7030] may be indicated through
>      objective-value.  Future non-backward compatible certificate renewal
>      options must use a different objective-name.

EST runs over HTTPS. What is the certificate that the server presents?


S 6.4.
>      information in the ACP Adjacency table.
>   
>      The ACP is by default established exclusively between nodes in the
>      same domain.  This includes all routing subdomains.  Appendix A.7
>      explains how ACP connections across multiple routing subdomains are
>      special.

I must be missing something, but how do you know what the routing
domain is of an ACP node? I don't see it in the message above. Is it
in some common header?


S 6.5.
>   
>      o  Once the first secure channel protocol succeeds, the two peers
>         know each other's certificates because they must be used by all
>         secure channel protocols for mutual authentication.  The node with
>         the lower Node-ID in the ACP address becomes Bob, the one with the
>         higher Node-ID in the certificate Alice.

A ladder diagram would really help me here, because I'm confused about
the order of events.

As I understand it, Alice and Bob are both flooding their AN_ACP
objectives. So, Alice sees Bob's and starts trying to connect to Bob.
But Bob may not have Alice's objective, right? So, in the case you
describe below, she just has to wait for it before she can try the
remaining security protocols?

I note that you have no downgrade defense on the meta-negotiation
between the protocols, so an attacker could potentially force you down
to the weakest joint protocol. Why did you not provide a defense here?


S 6.7.1.1.
>      To run ACP via IPsec natively, no further IANA assignments/
>      definitions are required.  An ACP node that is supporting native
>      IPsec MUST use IPsec security setup via IKEv2, tunnel mode, local and
>      peer link-local IPv6 addresses used for encapsulation.  It MUST then
>      support ESP with AES256 for encryption and SHA256 hash and MUST NOT
>      permit weaker crypto options.

This is not sufficient to guarantee interop. Also, this is an odd
cipher suite choice.

    Why are you requiring AES-256 rather than AES-128?
    Why aren't you requiring AES-GCM?
    Why aren't you requiring specific key establishment methods (e.g.,
ECDHE with P-256...)



S 6.7.2.
>   
>      To run ACP via UDP and DTLS v1.2 [RFC6347] a locally assigned UDP
>      port is used that is announced as a parameter in the GRASP AN_ACP
>      objective to candidate neighbors.  All ACP nodes supporting DTLS as a
>      secure channel protocol MUST support AES256 encryption and MUST NOT
>      permit weaker crypto options.

This is not sufficiently specific to guarantee interoperability. Which
cipher suites? Also, why are you requiring AES-256 and not AES-128?


S 6.7.3.
>   
>      A baseline ACP node MUST support IPsec natively and MAY support IPsec
>      via GRE.  A constrained ACP node that can not support IPsec MUST
>      support DTLS.  An ACP node connecting an area of constrained ACP
>      nodes with an area of baseline ACP nodes MUST therefore support IPsec
>      and DTLS and supports threefore the baseline and constrained profile.

These MTIs do not provide interop between constrained and baseline
nodes, because a baseline node might do IPsec and the constrained node
DTLS.
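
The interop gap can be sketched by intersecting the mandatory-to-implement sets from the quoted text. The profile labels (in particular "border" for a node connecting both areas) are names chosen here for illustration, and optional (MAY) protocols are deliberately left out, since interop cannot rely on them.

```python
# Secure channel protocols each profile MUST support, per the quoted text.
MANDATORY = {
    "baseline": {"IPsec"},
    "constrained": {"DTLS"},
    "border": {"IPsec", "DTLS"},  # hypothetical label: node joining both areas
}

def common_protocols(a, b):
    """Protocols that both profiles are guaranteed to implement."""
    return MANDATORY[a] & MANDATORY[b]

pairs = [
    ("baseline", "constrained"),
    ("baseline", "border"),
    ("constrained", "border"),
]
for a, b in pairs:
    shared = sorted(common_protocols(a, b))
    print(a, "<->", b, ":", shared or "no guaranteed common protocol")
```

Only pairs that include the dual-stack node are guaranteed a common protocol; a baseline node adjacent to a constrained node has none.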


S 6.10.2.
>         hash of the routing subdomain SHOULD NOT be assumed by any ACP
>         node during normal operations.  The hash function is only executed
>         during the creation of the certificate.  If BRSKI is used then the
>         BRSKI registrar will create the domain information field in
>         response to the EST Certificate Signing Request (CSR) Attribute
>         Request message by the pledge.

You need to lay out the security assumptions here. It's not difficult
to create a new domain with the same 40bit hash. If you have a private
CA, this probably isn't an issue, but if you are sharing a public CA,
it would allow me to produce a domain with other people's addresses.
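
To give a feel for the cost, the sketch below searches for a second name whose truncated SHA-256 hash matches that of a given name. It runs at 16 bits so that it finishes quickly; the expected work roughly doubles with every extra bit, so a 40-bit match needs on the order of 2^40 (about 10^12) hash evaluations. The candidate name template is hypothetical, and hashing the bare ASCII string is an assumption about the input encoding.

```python
import hashlib
from itertools import count

def truncated_hash(name, bits=40):
    """First `bits` bits of SHA-256 over the ASCII name, as an integer."""
    digest = hashlib.sha256(name.encode("ascii")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_matching_name(victim, bits, template="rsub{}.attacker.example"):
    """Brute-force a different name with the same truncated hash."""
    target = truncated_hash(victim, bits)
    for i in count():
        candidate = template.format(i)
        if truncated_hash(candidate, bits) == target:
            return candidate, i + 1  # name found and number of tries

victim = "area51.research.acp.example.com"
name, tries = find_matching_name(victim, bits=16)
print(f"{name} matches {victim} on 16 bits after {tries} tries")
```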


S 8.1.1.
>      configured to be put into the ACP VRF.  The ACP is then accessible to
>      other (NOC) systems on such an interface without those systems having
>      to support any ACP discovery or ACP channel setup.  This is also
>      called "native" access to the ACP because to those (NOC) systems the
>      interface looks like a normal network interface (without any
>      encryption/novel-signaling).

This seems pretty unclear. Is the idea that you connect natively to
the ACP Connect node and then it forwards your packets over the ACP?
Does that mean they need to be GRASP or whatever? I think that's what
you are saying below.


S 8.1.5.
>      interface is physically protected from attacks and that the connected
>      Software or NMS Hosts are equally trusted as that on other ACP nodes.
>      ACP edge nodes SHOULD have options to filter GRASP messages in and
>      out of ACP connect interfaces (permit/deny) and MAY have more fine-
>      grained filtering (e.g., based on IPv6 address of originator or
>      objective).

Given that this is an important security requirement, it seems like it
should be a normative requirement that it be filtered.


S 9.1.
>      same trust anchor, a re-merge will be smooth.
>   
>      Merging two networks with different trust anchors requires the trust
>      anchors to mutually trust each other (for example, by cross-signing).
>      As long as the domain names are different, the addressing will not
>      overlap (see Section 6.10).

Why does it require the *trust anchors* to trust each other? Can't the
endpoints just have the union of the trust anchors?

This is way underspecified for actual implementation.


S 10.2.1.
>      registrar can rely on the ACP and use Proxies to reach the candidate
>      ACP node, therefore allowing minimum pre-existing (auto-)configured
>      network services on the candidate ACP node.  BRSKI defines the BRSKI
>      proxy, a design that can be adopted for various protocols that
>      Pledges/candidate ACP nodes could want to use, for example BRSKI over
>      CoAP (Constrained Application Protocol), or proxying of Netconf.

I am finding it very difficult to work out the security properties of
this mechanism and the security considerations do not help. What can a
malicious registrar do? For that matter, you say "uncoordinated", so
does that mean anyone in the ACP can just decide to be a registrar?


S 11.
>   
>   11.  Security Considerations
>   
>      An ACP is self-protecting and there is no need to apply configuration
>      to make it secure.  Its security therefore does not depend on
>      configuration.

This is not true. You need to configure the trust anchor and the
domain name.


S 11.
>         all products.
>   
>      There is no prevention of source-address spoofing inside the ACP.
>      This implies that if an attacker gains access to the ACP, it can
>      spoof all addresses inside the ACP and fake messages from any other
>      node.

You need to be clear that the security is just group security and that
any compromised ACP node compromises the entire system.
Benjamin Kaduk Former IESG member
(was Discuss, No Record, Discuss) Yes
Yes (2020-10-01 for -29) Sent
We're down to largely editorial stuff at this point, and I'm happy with the overall
state of things.

A couple of the new bits in the -29 might benefit from targeted review (noted
inline), e.g., for CDDL, TSV, or INT-specific aspects.

Section 6.1

   TLS MUST offer TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 and
   TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 and MUST NOT offer options
   with less than 256 bit symmetric key strength or hash strength of
   less than SHA384.  When TLS 1.3 is supported, TLS_AES_256_GCM_SHA384

One could potentially say "hash strength of less than 384 bits" instead
of anchoring the reference point at SHA384, but I'm not terribly
concerned about it.

   TLS MUST also include the "Supported Elliptic Curves" extension, it
   MUST support the NIST P-256 (secp256r1(22)) and P-384 (secp384r1(24))
   curves [RFC4492].  In addition, TLS clients SHOULD send an
   ec_point_formats extension with a single element, "uncompressed".

We can say "TLS 1.2 clients" for the ec_point_format extension (nit: no 's').

Section 6.2.1

Thank you for clarifying the "serialNumber" attribute; I think that will
be helpful for a lot of people.

   ACP nodes MUST NOT support certificates with RSA public keys whose
   modulus is less than 2048 bits, or certificates whose ECC public keys
   are in groups whose order is less than 256 bits.  RSA signing
   certificates with 2048-bit public keys MUST be supported, and such

I think I mentioned this previously (and sorry for the repetition if I
did), but just in case I didn't: this 256-bit group order requirement
excludes Ed25519 and friends.  If you're fine with that, that's okay; I
just want to make sure it's an informed choice.
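
A quick check of the group orders involved, reading "order is less than 256 bits" as a limit on the bit length of the group order (that reading is an assumption). The constants are the published orders of the P-256 base point and of the prime-order subgroup used by Ed25519.

```python
# Order of the P-256 (secp256r1) base point.
P256_ORDER = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551

# Order of the prime-order subgroup used by Ed25519 (RFC 8032).
ED25519_ORDER = 2**252 + 27742317777372353535851937790883648493

def meets_requirement(order, min_bits=256):
    """True if the group order is at least `min_bits` bits long."""
    return order.bit_length() >= min_bits

for label, order in [("P-256", P256_ORDER), ("Ed25519", ED25519_ORDER)]:
    print(label, order.bit_length(), "bits, allowed:", meets_requirement(order))
```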

   ACP nodes MUST support RSA certificates that are signed by RSA
   signatures over the SHA-256 digest of the contents, and SHOULD
   additionally support SHA-384 and SHA-512 digests in such signatures.
   The same requirements for certificate signatures apply to ECDSA
   certificates, and additionally, ACP nodes MUST support ECDSA
   signatures on ECDSA certificates.

I think "same requirements for digest usage in certificate signatures"
is more accurate.

   In support of ECDH key establishment, ACP certificates with ECC keys
   MUST indicate to be Elliptic Curve Diffie-Hellman capable (ECDH): If
   the X.509v3 keyUsage extension is present, the keyAgreement bit MUST
   be set.

I think I may have failed to think about and comment on this previously,
but doing direct ECDH with the (static) key in the certificate is pretty
uncommon -- as I understand it you don't need this bit set in order to
use the TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 ciphersuite, for
example.  To be clear, I'm not saying it's inherently wrong to make this
requirement, just that I don't think it's needed for the use-cases
presented in this document.  (It may also make it harder to get such
certificates issued in the future, though it's hard to predict what
path CA policies will take in the future.)

Section 6.2.3

It might be nice to say something about in what cases the transport
address used to reach the peer can/cannot be validated against the
acp-address in the peer's ACP certificate.  I think there are some
classes of interactions for which that check can be done and would add
value, even though there are definitely some classes of interaction for
which it is not viable.

Section 6.2.5.3

   When using a private PKI for ACP certificates, the CRL may be need-
   to-know, for example to prohibit insight into the operational
   practices of the domain by tracking the growth of the CRL.  In this
   case, HTTPS may be chosen to provide confidentiality, especially when
   making the CRL available via the Data-Plane.  Authentication and
   authorization SHOULD use ACP certificates and ACP domain membership
   check.  [...]

(I assume that the SHOULD here is still only in the case where the CRL
is need-to-know; no text changes needed if that's correct.)

Section 6.2.5.5

   To prohibit attacks that attempt to force the ACP node to forget its
   prior (expired) certificate and TA, the ACP node should alternate
   between attempting to re-enroll using its old keying material and
   attempting to re-enroll with its IDevID and requesting a voucher.

I think that as written, this doesn't fully "prohibit" such attacks (but
does make them harder and is good advice).  I suppose some nodes might
continue trying with the old key material for some time even after
obtaining a new voucher, but the state-keeping requirements for that are
big enough that we shouldn't require it.  So I'd suggest just a small
change like s/To prohibit/As a countermeasure against/.

   Maintaining existing TA information is especially important when
   enrollment mechanisms are used that unlike BRSKI do not leverage a
   mechanism (such as the voucher in BRSKI) to authenticate the ACP
   registrar and where therefore the injection of certificate failures
   could otherwise make the ACP node easily attackable remotely by
   returning the ACP node to a "duckling" state in which it accepts to
   be enrolled by any network it connects to.  The (expired) ACP
   certificate and ACP TA SHOULD therefore be maintained and used for
   re-enrollment until new keying material is enrolled.

(editorial) the wording here should probably be checked; right now it
seems to be saying "use X for re-enrollment until [re-enrollment
succeeds]" which makes me wonder how the new keying material would come
into play.

Section 6.4

Figure 7 has some nits, now -- '|' is not a CDDL keyword.
Also, we should probably get a CDDL expert to look at it, since both
method-params and extensions are 1*any, and I'm not 100% sure what the
order of binding is (Appendix A of RFC 8610 suggests that it is a
prioritized choice in CDDL, so things after the first comma would always
be parameters, not extensions).

   Attackers on a subnet may be able to inject malicious DULL GRASP
   messages that are indistinguishable from non-malicious DULL GRASP
   messages to create Denial-of-Service (DoS) attacks that force ACP
   nodes to attempt many unsuccessful ACP secure channel connections.
   When an ACP node sees multiple AN_ACP objectives for the same secure
   channel protocol on different transport addresses, it SHOULD prefer
   connecting via the well-known transport address if the secure channel
   method has one (such as UDP port 500 for IKEv2).

This new text should probably be run by some TSV-area reviewer (e.g.,
AD).

Section 6.8.3.1

   The IKEv2 Diffie-Hellman key exchange group 19 (256-bit random ECP),
   MUST be support.  Reason: ECC provides a similar security level to

nit: "supported".

Section 6.8.3.2

   If IKEv2 initiator and responder support IPsec over GRE, it will be
   preferred over native IPsec because of the way how IKEv2 negotiates
   transport mode as used by this IPsec over GRE profile) versus tunnel
   mode as used by native IPsec (see [RFC7296], section 1.3.1).  The ACP

nit: missing open paren?

Section 9.2.2

   *  Policies if candidate ACP nodes should receive a domain
      certificate or not, for example based on the devices IDevID
      certificate as in BRSKI.  The ACP registrar may have a whitelist
      or blacklist of devices [X.520] "serialNumbers" attribute in the
      subjects field distinguished name encoding from their IDevID
      certificate.

I note we had that long ietf@ thread about terminology, that included
"whitelist" and "blacklist", and trust you to make an informed choice
about terminology usage.

Section 9.3.5.2

   When a greenfield node enables multiple enrollment/botstrap
   protocols/mechanisms in parallel, care must be taken not to terminate
   any protocol/mechanism before another one has progressed to a point
   where greenfield state is defined to end.

(editorial) Do we give a clear definition of when greenfield state is
defined to end, that would apply to arbitrary such mechanisms?  We might
want to reword a little bit.

Section 11

Wow, so much new good stuff here; thanks!

   *  For IDevIDs to securely identify the node to which it IDevID is
      assigned, the node it needs to (1) utilize hardware support such
      as a Trusted Platform Module (TPM) to protect against extraction/
      cloning of the private key of the IDevID and (2) a hardware/
      software infrastructure to prohibit execution of non authenticated
      software to protect against malicious use of the IDevID.

nit: s/node it needs to/node needs to/

   *  A malicious ACP node could declare itself to be an EST server via
      GRASP across the ACP if malicious software could be executed on
      it.  CA should therefore authenticate only known trustworthy EST
      servers, such as nodes with hardware protections against malicious
      software.  Without the ability to talk to the CA, a malicious EST
      server can still attract ACP nodes attempting to renew their
      keying material, but they will fail to perform successful renewal
      of a valid ACP certificate.  The ACP node attempting to use the
      malicious EST server can then continue to use a different EST
      server, and log a failure against a malicious EST server.

We have two copies of basically this text.  The second one (that I
quoted) does not have a note about id-kp-cmcRA, and is the one that
should be removed.

   If public CA are to be used, ACP registrars would need to prove
   ownership of the domain-name of AcpNodeNames to the public CA.
   However, maintaining the ULA based address allocation when using a
   public CA might be considered to be a violation of the private
   allocation expectation of ULA prefixes.  To avoid this issue, further
   changes to registrar address allocation procedures might be needed,
   for example using global IPv6 address prefixes owned by the public CA
   instead of ULA.

I don't expect any problems here, but it might be good to get some
INT-area (e.g., AD) eyes on this text.

Section 12

   0: ACP Zone Addressing Sub-Scheme (ACP RFC 1: ACP Vlong Addressing
   Sub-Scheme (ACP RFC Figure 12) / ACP Manual Addressing Sub-Scheme
   (ACP RFC Section 6.11.4) Section 6.11.5)

Something went awry here (maybe just formatting?), as the '1' is
supposed to be the initial value allocated by this document, not a
reference to RFC 1.

Section 16

RFC 4492 is obsoleted by RFC 8422.

Section A.6

   When the two peers successfully establish the GRASP/TSL session, they
   will negotiate the channel mechanism to use using objectives such as

nit: s/TSL/TLS/

Section A.10.9

Thank you for adding this discussion; it is a good treatment of the
issues and considerations in play.
Terry Manderson Former IESG member
Yes
Yes (for -13) Unknown

                            
Adam Roach Former IESG member
No Objection
No Objection (for -16) Unknown

                            
Alexey Melnikov Former IESG member
No Objection
No Objection (2018-08-02 for -16) Unknown
I haven't finished reading the whole document. I agree with Benjamin and Ekr that some security aspects are underspecified.

A few extra comments/questions of my own:

1) Where is locator-option formally defined?

2) 
6.10.2.  The ACP Addressing Base Scheme

   o  The 40 bits ULA "global ID" (term from [RFC4193]) for ACP
      addresses carried in the domain information field of domain
      certificates are the first 40 bits of the SHA256 hash of the
      routing subdomain from the same domain information field.

I think you need to make clear that one needs to canonicalize (e.g. to lowercase) the routing subdomain before applying the hash.
You don't want some nodes using "example.com" and others "EXAMPLE.com".

      In the
      example of Section 6.1.1, the routing subdomain is
      "area51.research.acp.example.com" and the 40 bits ULA "global ID"
      89b714f3db.
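
A minimal sketch of the canonicalization concern, assuming lowercase folding as the canonical form and hashing of the bare ASCII string; the draft's exact input encoding is not restated here, so the printed values are not claimed to match the example in the draft.

```python
import hashlib

def ula_global_id(routing_subdomain):
    """First 40 bits of SHA-256 over the lowercased routing subdomain, as hex."""
    canonical = routing_subdomain.lower()  # assumed canonical form
    digest = hashlib.sha256(canonical.encode("ascii")).digest()
    return digest[:5].hex()  # 5 bytes = 40 bits

def raw_global_id(routing_subdomain):
    """Same hash without canonicalization, for comparison."""
    return hashlib.sha256(routing_subdomain.encode("ascii")).digest()[:5].hex()

# With canonicalization both spellings map to the same global ID.
print(ula_global_id("example.com"), ula_global_id("EXAMPLE.com"))

# Without it, the two spellings produce different global IDs.
print(raw_global_id("example.com"), raw_global_id("EXAMPLE.com"))
```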

3) A.6:

   When Alice and Bob successfully establish the GRASP/TSL session, they

typo: TSL --> TLS

   will negotiate the channel mechanism to use using objectives such as
   performance and perceived quality of the security.  After agreeing on
   a channel mechanism, Alice and Bob start the selected Channel
   protocol.  Once the secure channel protocol is successfully running,
   the GRASP/TLS connection can be kept alive or timed out as long as
   the selected channel protocol has a secure association between Alice
   and Bob.  When it terminates, it needs to be re-negotiated via GRASP/
   TLS.
Alissa Cooper Former IESG member
(was Discuss) No Objection
No Objection (2019-08-01 for -20) Sent
Thanks for addressing my DISCUSS. Original COMMENT is left below.

General:

Please address the Gen-ART reviewer's latest round of comments.

There are a bunch of places in this document where it seems like there is a tension between specifying a limited set of functionality here and being able to support a wider variety of deployment scenarios. This is noted in Section 1 but I think in general it would be clearer if uses of the term "future" throughout the document could be more surgical as well as more specific about whether they mean "people might deploy this differently in the future" or "standards would need to be developed in the future." I've made a few suggestions about some of these turns of phrase below but would suggest someone do a full edit pass with this in mind because there are a large number of mentions of "future work." Of course there is always more work to do, but every bit of "future work" need not be mentioned in this document, and in cases where it is mentioned I think there should be a specific reason for doing so that bears on people implementing this specification. I don't think this fits in the DISCUSS criteria but for a document that intends to be published on the standards track I would expect it to be crisper about the dividing line between the normative behavior being specified here versus changes or extensions that may or may not be made in the future.

"Intent" is used both capitalized and in lower case throughout the document and I'm unclear if this is meant to signify a distinction or not.

Section 2: 

Please remove the -->"..."() notation.

Please use the exact boilerplate from RFC 8174, not a variation.

It seems like RFC citations should appear for IKEv2 and DTLS upon first use in this section. Otherwise, it seems they are first cited at different future points in the document (Section 6.3 and 6.7, respectively).

Section 3.3:

"The ACP provides reachability that is independent of the Data-Plane
   (except for the dependency discussed in Section 6.12.2 which can be
   removed through future work),"

Isn't this kind of a big exception, given that there is meant to be a secure channel between pairs of nodes in the ACP and that developing future encapsulations is non-trivial? It seems like phrasing this the other way around (the ACP is dependent on the Data-Plane for <XYZ> but is otherwise independent of it) would be more accurate.

Section 6:

"Indestructible" seems like an overstatement. Maybe "resilient" would be more accurate?

Section 6.1.1:

s/Such methods are subject to future work though./No such methods have been defined at the time of publication of this document./

s/to build ACP channel/to build ACP channels/

s/that intends to be equally unique/that it intends to be equally unique/ 

""rsub" is optional; its syntax is defined in this document,
   but its semantics are for further study.  Understanding the benefits
   of using rsub may depend on the results of future work on enhancing
   routing for the ACP."

What is the point of defining this now when it is unclear if or how it will be used? There are already means for nodes to do error handling, so it seems like defining a new field in the future if/when it is needed would work fine and be cleaner. Appendix A.7 seems to assume some semantics for this field, which makes the way it is specified here even more confusing IMO. 

"In this specification, the "acp-address" field is REQUIRED, but
   future variations (see Appendix A.8) may use local information to
   derive the ACP address.  In this case, "acp-address" could be empty.
   Such a variation would be indicated by an appropriate "extension".
   If "acp-address" is empty, and "rsub" is empty too, the "local-part"
   will have the format "rfcSELF + + extension(s)".  The two plus
   characters are necessary so the node can unambiguously parse that
   both "acp-address" and "rsub" are empty."

This seems contradictory. Either "acp-address" is REQUIRED in which case there are no exceptions, or it's not; if it's not, then the expected syntax for cases when it's not present should be specified.

Section 6.1.2:

s/If the node certificates indicates/If the node certificate indicates/

Section 6.3:

It seems odd to provide a citation/discussion for IKEv2 here but not for DTLS.

Section 6.4:

This is a good example of a section where the blurring between the specified behavior and expectations for the future is unhelpful IMO. Why specify the current default and then spend a lot of words (including Appendix A.7) talking about how it will be different in the future?

Section 6.10.3.1:

s/We do not think this is required at this point/This is not currently required/

Section 6.12.2:

s/may specify additional layer 2 or layer encapsulations/may specify additional layer 2 or layer 3 encapsulations/ (I think?)

Section 8.2.1:

This seems extraneous: "Future work could transform this into a YANG ([RFC7950]) data
   model."
   
Appendix A.8:

"Secure channels may
   even be replaced by simple neighbor authentication to create
   simplified ACP variations for environments where no real security is
   required but just protection against non-malicious misconfiguration."
   
I think experience has shown that even environments where it is assumed that security is not required prove to need it. I would suggest removing this text or changing this implication.
Alvaro Retana Former IESG member
No Objection
No Objection (2020-08-12 for -28) Sent
(1) §6: "An ACP node... Initially, it MUST have...an (empty) ACP Adjacency Table..."  Is "empty" a requirement?  I'm wondering because §6.2 says that the adjacency table can also contain configured information, which I assume would be present before neighbor discovery starts.


(2) As far as I understand, events happen in this order:  The ACP Adjacency Table (§6.2) is populated with information from DULL GRASP (§6.3).  Based on that information, a candidate set of neighbors is selected (§6.4).  Is that correct?

§6.4 (Candidate ACP Neighbor Selection) says that the "ACP is established exclusively between nodes in the same domain".  However, the domain membership check is not performed until later (§6.6).  

How are the candidate nodes selected in §6.4?  Some nodes may not be chosen, right?  How can it be verified that the candidate nodes are in the same domain without performing the domain membership check first?  

§6.6 says that "the connection attempt is aborted" if the domain membership check fails -- but there is no mention about considering other candidates.  It seems to me as if it may be possible for some selected candidates to not pass the domain membership check...  What am I missing?


(3) §6.11.1.8 (Multicast): s/Not used yet but possible because of the selected mode of operations./Not used but possible if the selected MOP is 3.


(4) [nits]

s/explanation how ACP acts/explanation of how ACP acts

s/nodes ACP certificate/node's ACP certificate/g

s/nodes ACP address/node's ACP address
Barry Leiba Former IESG member
(was Discuss) No Objection
No Objection (2020-09-11 for -29) Sent
Thanks for addressing my DISCUSS issues and other comments in version -29.

Special thanks for the changes in the "Channel Selection" section; I find it *much* easier to follow now, with "the Decider" and "the Follower" making the roles clear.  Good work, folks!
Ben Campbell Former IESG member
No Objection
No Objection (2018-08-01 for -16) Unknown
Substantive Comments:

- I agree with Alissa's comment about "future" things.

§4: What do the normative keywords in this section apply to? If this document fulfills the requirements, it seems odd to continue to state them normatively. Normally such keywords are intended for implementors and sometimes administrators; protocol requirements don't really fit the RFC 2119/8174 definitions. If you need to use them in a non 2119/8174 sense, please mention that somewhere.

§4, ACP5: SHOULD is SHOULD. If it needs to be stronger than a normal SHOULD, consider a MUST. (But see my previous comment.)

§6.1.1: I'm a bit surprised to see the syntax burn 7 characters on the literal RFC name.

In §6.1.1, the statement "If the operator does not own any FQDN, it should
   choose a string (in FQDN format) that intends to be equally unique." seems problematic without further guidance about how to actually make them "equally unique". For example, how does one ensure this does not collide with real FQDNs?

§6.1.2: Please describe how one actually checks for cert validity (e.g., explicit field comparisons). In the second bullet, how does one check for private key ownership? (If the answer is "PKI", then how does that requirement differ from the following one?)

§6.1.3, first paragraph: The 2nd MUST seems like a statement of fact in light of the first MUST.

§6.10.1, first bullet: Does this mean the address spaces can overlap?
-- last bullet: "not expected to be an end-user device" and "stay within a domain (of trust)" are both tricky assumptions. Is there a mechanism to ensure the assumptions are not violated?

Editorial Comments:

- IDNits reports several outdated or unused references--please check.
- General: Some sections are marked as "Normative", but there are unmarked sections with normative keywords in them. Please be consistent in such labeling. (Personally, I suggest not labeling sections this way unless you think they are more likely to be misunderstood than normal.)
§1.1: Please expand "RPL" on first mention.
§2, definition of MIC: Why include a definition to say you don't use it in the doc? Also, please use the boilerplate from 8174 rather than rolling your own.
§3 and §4: Are these sections useful to the average reader not involved in the standards process? It seems like they might be better off in an appendix or even a wg wiki, especially considering the document length.
§4: This section contains a lot of sentence fragments, which I suspect were intentional. Please use complete sentences when writing in paragraph form.
§6.1: Paragraphs 2 and 3 contain comma splices.

§6.1.3.4: "Certificate lifetime may be set to shorter lifetimes than customary
   (1 year) because certificate renewal is fully automated via ACP and
   EST. "
Are you proposing setting it to one year, or are you suggesting one year is customary?

§6.1.3.4, 2nd paragraph: "allowing to simplify" is grammatically incorrect. Consider "allowing [something] to simplify" or "allowing the simplification"

§6.1.3.6: The first paragraph is hard to parse. Please do not use "/" as a shortcut for a conjunction.
Deborah Brungard Former IESG member
No Objection
No Objection (2018-05-21 for -13) Unknown
I noted in the different versions the content of section 10 floated between an
appendix and as part of the document (current version). Considering Section 10's
intro, I agree with Mirja, this content seems more suitable (and
will ease the readability) as an appendix. Section 10.5 still says "This appendix..".
Martin Duke Former IESG member
No Objection
No Objection (2020-08-12 for -28) Sent
I found significant parts of this document tough to follow, particularly because there are many deployment variations for almost every element of the architecture. But I trust that the Security ADs will catch any remaining security issues.

I appreciate that this effort appears, refreshingly, to have security baked in from the start.

Sec 6.1.1 
"it is beneficial to
   copy the device identifying fields of the node's IDevID certificate
   into the ACP certificate,... and
   the "serialNumber" contains usually device type information that may
   help to faster determine working exploits/attacks against the device."

I am not certain the 'beneficial' assertion is supportable, if the benefit is some diagnostic help but the drawback is a security vulnerability.

sec 6.5. If both nodes have empty ACP address fields, they are both Bob. What happens then?

sec 6.11.1.14. "As this requirement raises additional Data-Plane,..."
I am not sure what this clause means to say.
Mirja Kühlewind Former IESG member
No Objection
No Objection (2018-05-18 for -13) Unknown
1) I would like to see a slightly stronger statement here in section 6.1.3:
"The M_FLOOD message MUST be sent periodically.  The default SHOULD be
   60 seconds, the value SHOULD be operator configurable."
Maybe the following instead:
"The M_FLOOD message MUST be sent periodically.  The default MUST be
   60 seconds, the value SHOULD be operator configurable but SHOULD be
   not smaller than 60 seconds."
Or even a MUST for the minimum value, if that is acceptable for the desired use cases.
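
A minimal sketch of what the stricter wording could mean for an implementation, assuming a hypothetical helper that applies the 60-second default and floor to an operator-configured M_FLOOD interval. The function and constant names are illustrative and do not come from the draft; clamping a too-small value (rather than rejecting it or logging a warning) is also an assumption:

```python
from typing import Optional

# Values taken from the suggested text: default 60 s, and not smaller than 60 s.
DEFAULT_M_FLOOD_INTERVAL_S = 60
MIN_M_FLOOD_INTERVAL_S = 60


def effective_flood_interval(configured_s: Optional[int] = None) -> int:
    """Return the M_FLOOD period in seconds that a node would actually use.

    Hypothetical helper: falls back to the default when the operator has not
    configured a value, and clamps configured values below the floor.
    """
    if configured_s is None:
        return DEFAULT_M_FLOOD_INTERVAL_S
    return max(configured_s, MIN_M_FLOOD_INTERVAL_S)
```

With this sketch, an operator setting of 10 seconds would be raised to 60, while a setting of 300 seconds would be used unchanged.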

2) Also in section 6.5, I would like to see some rate limiting/pacing:
"An ACP node may choose to attempt initiate the different feasible ACP
   secure channel protocols it supports according to its local policies
   sequentially or in parallel,..."
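
One possible form of such pacing, sketched under stated assumptions: capped exponential backoff between successive secure channel attempts. Neither the comment nor the quoted text prescribes a mechanism, so the function name, the 1-second base, the doubling factor, and the 60-second cap are all hypothetical:

```python
from typing import List


def paced_delays(attempts: int, base_s: float = 1.0,
                 factor: float = 2.0, cap_s: float = 60.0) -> List[float]:
    """Return the wait in seconds before each successive connection attempt.

    Hypothetical pacing policy: the wait grows geometrically from base_s
    and is never allowed to exceed cap_s.
    """
    delays = []
    delay = base_s
    for _ in range(attempts):
        delays.append(min(delay, cap_s))
        delay *= factor
    return delays
```

For example, `paced_delays(4)` yields waits of 1, 2, 4, and 8 seconds, and longer sequences level off at the cap.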

3) Sec 6.7.3: How are baseline ACP and constrained ACP nodes defined?

4) sec 6.10.6:
"With the current allocations, only 2 more schemes are
   possible, so the last addressing scheme should consider to be
   extensible in itself (e.g.: by reserving bits from it for further
   extensions."
Maybe use a normative MUST here:
"With the current allocations, only 2 more schemes are
   possible, so the last addressing scheme MUST be
   extensible in itself (e.g.: by reserving bits from it for further
   extensions."

5) I guess section 10 could be moved to the appendix.
Robert Wilton Former IESG member
No Objection
No Objection (2020-08-10 for -28) Sent
Hi,

I appreciate that this document has already gone through quite a lot of reviews.  Just a few minor nits (for the version of the doc that was originally on the telechat before IETF 108):

    6.1.  ACP Domain, Certificate and Network

       This document uses the term ACP in many places where the Autonomic
       Networking reference documents [RFC7575] and
       [I-D.ietf-anima-reference-model] use the word autonomic.  This is
       done because those reference documents consider (only) fully
       autonomic networks and nodes, but support of ACP does not require
       support for other components of autonomic networks except for relying
       on GRASP and providing security and transport for GRASP.  Therefore
       the word autonomic might be misleading to operators interested in
       only the ACP.
   
Should this paragraph be somewhere earlier in the document?


    6.1.2.  ACP Certificate AcpNodeName

    The acp-node-name is not
    intended for end user consumption, and there is no protection against
    someone not owning a domain name to simpy choose it.

The latter part of this sentence doesn't seem to scan particularly well.


    6.7.3.1.1.  RFC8221 (IPsec/ESP)

    AH MUST NOT be used (because it does not provide confidentiality).

Do you need AH in the terminology or define what it means?

    6.7.4.  ACP via DTLS

       We define the use of ACP via DTLS in the assumption that it is likely
       the first transport encryption supported in some classes of
       constrained devices because DTLS is already used in those devices but
       IPsec is not, and code-space may be limited.
 
DTLS in the assumption => DTLS, on the assumption
This paragraph could possibly do with a little more wordsmithing.


    6.10.1.  Fundamental Concepts of Autonomic Addressing
      
For a PE device or NID, how does it know which interfaces to run ACP over?

       o  OAM protocols do not require IPv4: The ACP may carry OAM
          protocols.  All relevant protocols (SNMP, TFTP, SSH, SCP, Radius,
          Diameter, ...) are available in IPv6.  See also [RFC8368] for how
          ACP could be made to interoperate with IPv4 only OAM.
      
Should this include a YANG management protocol like NETCONF?
Radius => RADIUS (in a few places)

    6.11.1.14.  Unknown Destinations

       As this requirement raises additional Data-Plane, it does not apply
       to nodes where the administrative parameter to become root
       (Section 6.11.1.12) can always only be 0b001, e.g.: the node does not
       support explicit configuration to be root, or to be ACP registrar or
       to have ACP-connect functionality.
   
The first sentence doesn't quite scan.

Nits:
retrieved bei neighboring nodes =>  retrieved by neighboring nodes
"serialNumber" contains usually => "serialNumber" usually contains
remotely sent IPv6 link-local => remotely send IPv6 link-local
Spencer Dawkins Former IESG member
No Objection
No Objection (for -13) Unknown

                            
Suresh Krishnan Former IESG member
No Objection
No Objection (for -16) Unknown

                            
Ignas Bagdonas Former IESG member
Recuse
Recuse (2018-08-02 for -16) Unknown
I was involved in this for a while.