GeneRic Autonomic Signaling Protocol Application Program Interface (GRASP API)
draft-ietf-anima-grasp-api-10

Yes

(Robert Wilton)

No Objection

(Alvaro Retana)
(Barry Leiba)
(Deborah Brungard)

Note: This ballot was opened for revision 07 and is now closed.

Erik Kline
No Objection
Comment (2020-12-01 for -08) Sent
[[ questions ]]

[ section 2.3.2.4 ]

* It looks like the URI may contain an IP address or FQDN as well as a
  port number?  If so, is there a validation requirement about the presence
  or value of the port field in the ASA_locator in relation to the port
  number in the URI?

[ section 2.3.3. ]

* For deregister_asa(), if the ASA name is redundant, does that mean that
  a call like deregister_asa(asa_nonce=valid_nonce, name="") should succeed?

  I suppose one ASA can deregister other ASAs by cycling through the 32-bit
  numberspace?

* For register_objective(), what happens if overlap=False for an objective
  already registered with overlap=True?  And what about the inverse?

  I guess, what is the trust model of multiple ASAs sharing a GRASP core
  (i.e. on the same node)?

[ section 2.3.4 ]

* For objectives that other ASAs on the same node might be trying to
  discover(), is the cache kept separate per-ASA or shared?

  If shared, it seems like the TTL<minTTL entries should be ignored and not
  deleted, maybe? (I haven't read any text describing cache implementation
  requirements or guidance yet.)

* For asynchronous mechanisms, is the callback (if used) called multiple
  times as locators are discovered, or are they accumulated until the timeout
  is reached and returned in one callback invocation?

  If the former, is there one final callback with, if necessary, an empty list
  to indicate the timeout was reached (as a convenience)?
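
  As an illustration only, the first alternative could look like the sketch
  below: the callback fires once per batch of discovered locators, then once
  more with an empty list when the timeout is reached. The names
  `discover_async` and `on_locators` are hypothetical and do not come from
  the draft, and the iterable stands in for network arrivals and a timer.

```python
from typing import Callable, Iterable, List


def discover_async(batches: Iterable[List[str]],
                   on_locators: Callable[[List[str]], None]) -> None:
    """Hypothetical sketch: deliver locators incrementally, then mark timeout.

    `batches` simulates locators arriving over time. A real API would be
    driven by received discovery responses and a timer instead.
    """
    for batch in batches:
        if batch:                      # skip empty arrivals so [] stays unambiguous
            on_locators(list(batch))   # one callback per batch of new locators
    on_locators([])                    # final callback: empty list means "timeout reached"


# Usage: collect every callback invocation in order.
received: List[List[str]] = []
discover_async([["locator-a"], ["locator-b", "locator-c"]], received.append)
```

  With this shape the caller can act on the first usable locator and still
  learn when discovery has finished.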


[[ nits ]]

[ abstract ]

* "adapted to the support for" -> "adapted to add support for", perhaps

[ section 2.3.2.4 ]

* Perhaps replace "ifi...probably no use to a normal ASA" with something like
  "probably only of use to an ASA on a node with multiple active interfaces"?

[ section 2.3.6 ]

* s/caches all flooded objectives that it receive/... that it receives/

Murray Kucherawy
No Objection
Comment (2020-11-30 for -08) Sent
This might be an implementation detail, but I feel like bringing attention to it to clarify:

Looking at this as a guide to API implementers, I'm confused by one aspect of this document.  There are portions of the API specification where some of the returned items are conditional.  For example, in Section 2.3.3, the response to "register_asa()" always contains an "errorcode" but it will also contain an "asa_nonce" if registration was successful.  What does it mean for a response to be sometimes missing a piece of information?  I'm thinking, for instance, about Python, where your response might be a single value or a tuple of values depending on success or failure, and I as the consumer will have to handle each case separately.  Wouldn't it be simpler for "asa_nonce" to have a possible sentinel value for use in failures?  (Maybe 0, maybe -1, maybe MAXINT; the use of "integer" in the document generally doesn't specify whether it's signed or unsigned or what limits might exist.  Or maybe "None".) That way, responses always have the same number of elements and possibly types irrespective of the function's outcome.

For a more extreme example, the response to "request_negotiate()" could have anywhere between one and four elements in it too, and of varying types.

It's possible this doesn't matter though; you're doing the API implementation, you get to decide and document it and then deal with user response.  But as someone who produces and documents APIs a lot, this stuck out to me.
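
A minimal sketch of the fixed-shape alternative described above, assuming a Python binding in which "register_asa()" always returns the same two fields and uses None as the failure sentinel. The `RegisterResult` type, the in-memory registry and the success code 0 are assumptions made for illustration; only the value 5 for "Duplicate ASA name" is taken from the error list in the draft's Appendix A.

```python
import secrets
from typing import Dict, NamedTuple, Optional

OK = 0        # assumed success code, not confirmed by the draft text quoted here
DUP_ASA = 5   # "Duplicate ASA name" in the draft's Appendix A list


class RegisterResult(NamedTuple):
    """Hypothetical fixed-shape return value: both fields are always present."""
    errorcode: int
    asa_nonce: Optional[int]   # None is the sentinel used on failure


_registry: Dict[str, int] = {}   # stand-in for state held by the GRASP core


def register_asa(name: str) -> RegisterResult:
    """Register an ASA name and return (errorcode, asa_nonce) in every case."""
    if name in _registry:
        return RegisterResult(DUP_ASA, None)
    nonce = secrets.randbits(32)   # random 32-bit value, chosen for the sketch
    _registry[name] = nonce
    return RegisterResult(OK, nonce)
```

A caller can then unpack `errorcode, asa_nonce = register_asa("my-asa")` without first checking how many values came back.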

Roman Danyliw
No Objection
Comment (2020-12-01 for -08) Sent
Thank you for responding to the SECDIR reviewer and thank you to Joseph Salowey for performing the review.

** Since this is an API spec a few more example pseudo code snippets showing common ASA “tasks” invoking this API from both sides of the connection (like Figure 2) would be very helpful.

** More precise references to draft-ietf-anima-grasp might be helpful to implementers (e.g., in Section 2.3.2.3, “… default GRASP_DEF_LOOPCT, see [I-D.ietf-anima-grasp]” ==> “... see Section 2.6 of [I-D.ietf-anima-grasp]”)

** Section 1.  Per “An ASA runs in an ACP node and therefore inherits all its security properties, i.e., message integrity, message confidentiality and the fact that unauthorized nodes cannot join the ACP.”, in the spirit of precision, things like message integrity and message confidentiality are not properties of the ASA or of the ACP _node_ but instead properties of the protocol used on the control plane.

** Section 2.1.  Recommend using consistent terminology.  In this section an ASA calls a “GRASP module”.  However, Section 1 lays out an architecture of GRASP Core + API.

** Section 2.2.  I found the placement of this section confusing.  There is a discussion of the calling conventions for an API that hasn’t been discussed yet.  IMO, this should be after Section 2.3.  That said, thanks for describing these different calling conventions.  Showing these in examples would be very helpful. 

** Section 2.2.2.2.  Per the definition of TTL, is it worth clarifying here and in the subsequent descriptions that this is an unsigned integer of a particular size (unsigned 32-bit at least) per Section 5 of draft-ietf-anima-grasp?

** Section 2.3.2.3.  Is it worth clarifying that loop_count should be between 0 and 255 per Section 5 of the draft-ietf-anima-grasp?  

** Section 2.3.2.3.  Provide a normative reference to which version of C and Python will be used.

** Section 2.3.2.3.  If an older C is used, is “char *name” the right way to handle a UTF-8 string?

** Section 2.3.2.3. Per the C data structure of an objective, should loop_count and value_size be unsigned integers of some kind?

** Section 2.3.2.3.  Why does the Python implementation set a default value of loop_count but C does not?

** Section 2.3.2.3.  Please provide a reference to libcbor

** The examples in C and Python found in Section 2.3.2.3 were helpful.  I was hoping to find them in the other sections.  A C-style .h file with function prototypes and constants would also be nice (e.g., GRASP_DEF_TIMEOUT, IPPROTO_*, all the error types)

** Section 2.3.4.  Typo. s/tiemout/timeout/

** Section 2.3.2.4.  The constants IPPROTO_TCP and IPPROTO_UDP aren’t defined here.  Recommend a reference to the grasp draft.

** Section 2.3.7.  Double checking -- per the info input parameter, is the ASA supposed to provide this content or is this something from GRASP Core?

** Appendix A.  This list doesn’t appear to be a complete crosswalk of function to error codes to possible APIs.  For example, “NotObj” is listed as a general error code, but would that get returned by register_asa()?

** Per the GENART Review, IMO, Paul makes a number of good points, in particular:
-- a reference or further explanation of the flow for dry run and how this would be used in other API calls

-- additional clarifying language on request_negotiate

-- Renaming the “session nonce” to “session handle” (or something like it) might improve clarity so the API doesn’t have to deal with multiple “nonce”

Éric Vyncke
No Objection
Comment (2020-12-01 for -08) Sent
Thank you for the work put into this document.

Please find below some non-blocking COMMENT points and one nit. I have also requested IoT directorate and INT directorate reviews, so you may expect more reviews.

I hope that this helps to improve the document,

Regards,

-éric

== COMMENTS ==

-- Section 1 --
In figure 1, is the "GRASP API Library" identical to the "basic GRASP library" mentioned later in the text?

-- Section 2.1 --
May I assume that the bulleted list is not exhaustive? Probably worth stating "For example, ..." if this is the case.

-- Section 2.2.1 --
This whole section looks more like a tutorial than something useful in an IETF document ;-) but no problem to leave it. Same applies for section 2.2.2 and even to 2.2.3.

-- Section 2.3.2.2 --
Should it be specified that the timeout is an *unsigned* integer? Same applies for "loop_count" in section 2.3.2.3

-- Section 2.3 --
Several occurrences of "returned parameters"... should it better be "returned values" ?

-- Section 2.3.3 --
"All ASAs must use this call." should it be followed by "before issuing any other API calls" ?

"automatically if an ASA crashes" but what about "graceful termination" ?

== NITS ==

-- Section 1 --
Suggestion: move figure 1 earlier in the text to improve readability.

Robert Wilton Former IESG member
Yes
Yes (for -07) Unknown

                            
Alissa Cooper Former IESG member
No Objection
No Objection (2020-12-03 for -08) Sent
There are a few outstanding unresolved comments from the Gen-ART review that it would be useful to resolve, particularly clarifications in Section 2.3.5.

In general the Gen-ART review made me wonder if it might be useful to get some more implementation experience and interop testing going before trying to extend or build out much more functionality on top of GRASP, since there are many implementation-specific decisions left unspecified.

Alvaro Retana Former IESG member
No Objection
No Objection (for -08) Not sent

                            
Barry Leiba Former IESG member
No Objection
No Objection (for -08) Not sent

                            
Benjamin Kaduk Former IESG member
No Objection
No Objection (2020-12-02 for -08) Sent
I have two comments in particular that I would like to call your
attention to: my comment on cache flushing in Section 2.3.4, and my
comment on the CBOR data model used for validation in Appendix A.

Section 1

   An ASA runs in an ACP node and therefore inherits all its security
   properties, i.e., message integrity, message confidentiality and the
   fact that unauthorized nodes cannot join the ACP.  All ASAs within a

I agree with Roman's comment that the "it" whose security properties are
inherited is the ACP *node*, not the ACP itself, and thus that some
rewording is appropriate.

   The GRASP API library would need to communicate with the GRASP core
   via an inter-process communication (IPC) mechanism.  The details of

Hmm, if the GRASP core is in kernel-space and the API library in
userspace, wouldn't we normally refer to that exchange as a system call
rather than IPC?  (Figure 1 also labels this interaction "IPC".)

Section 2.1

   *  Authorization of ASAs is not defined as part of GRASP and is not
      supported.

Any chance I could interest you in s/not supported/a subject for future
work/?  It is looking somewhat likely since such a statement is already
present in the security considerations...

   *  User-supplied explicit locators for an objective are not
      supported.  The GRASP core will supply the locator, using the ACP
      address of the node concerned.

This would seem to prevent any non-ACP use of GRASP; I suggest adding
some language with a caveat about "for example" or similar, unless the
intent is to limit the API usage to ACP (or DULL) scenarios.

Section 2.2.1

I think that the possibility for a single outbound message to get a
sequence of incoming replies (at different times) further complicates
the design of an asynchronous mechanism, and we would do well to discuss
how such scenarios (e.g., broadcast discovery messages) would be handled
by the implementation and API.  (I see that we do end up using a timeout
in practice to resolve this topic, but would probably still mention it
as an issue that has been resolved, here.)

Section 2.2.2

   ports rather than a separate port per session.  Hence the GRASP
   design includes a session identifier.  Thus, when necessary, a
   'session_nonce' parameter is used in the API to distinguish
   simultaneous GRASP sessions from each other, so that any number of
   sessions may proceed asynchronously in parallel.

I do see that there was previous discussion on the 'nonce' terminology
here, and I am unsure why there is need to move away from the "session
ID" terminology used in GRASP itself.  In particular, the
"session_nonce" is not a number used *once*, rather, it is used only for
one session (but potentially multiple times within that session).  That,
to me, makes it a (short-lived) identifier, not a nonce.  Roman's
proposal of 'handle' would resolve this apparent disparity.

Section 2.2.3

   On the first call in a new GRASP session, the API returns a
   'session_nonce' value based on the GRASP session identifier.  This

What does "based on" mean?  Does there need to be a one-to-one
correspondence?  Or just in one direction?  Are we going to be
constrained by the (IMO, too limited) 32 bits of randomness limit of the
GRASP Session ID?

Section 2.3.2.3

   -  Note 3: In a language such as C the preferred implementation
      may be to represent the Boolean flags as bits in a single byte,

Which aspect(s) of C are relevant for the "such as"?

   An essential requirement for all language mappings and all
   implementations is that, regardless of what other options exist
   for a language-specific representation of the value, there is
   always an option to use a raw CBOR data item as the value.  The
   API will then wrap this with CBOR Tag 24 as an encoded CBOR data
   item [RFC7049] for transmission via GRASP, and unwrap it after
   reception.

I'm not sure I understand why the bstr wrapping is mandatory -- I would
have thought that the attraction of using a raw encoded CBOR data item
would be that it could be used directly, without additional wrapping.
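
To make concrete what the wrapping adds on the wire, here is a minimal
sketch of CBOR Tag 24 encoding written without a CBOR library. The helper
names `cbor_bstr` and `wrap_tag24` are hypothetical; the byte values follow
the CBOR encoding rules (tag 24 is encoded as 0xd8 0x18, byte strings are
major type 2).

```python
def cbor_bstr(payload: bytes) -> bytes:
    """Encode `payload` as a definite-length CBOR byte string (major type 2)."""
    n = len(payload)
    if n < 24:
        head = bytes([0x40 | n])                   # length fits in the initial byte
    elif n < 0x100:
        head = bytes([0x58, n])                    # 1-byte length follows
    elif n < 0x10000:
        head = b"\x59" + n.to_bytes(2, "big")      # 2-byte length follows
    elif n < 0x100000000:
        head = b"\x5a" + n.to_bytes(4, "big")      # 4-byte length follows
    else:
        head = b"\x5b" + n.to_bytes(8, "big")      # 8-byte length follows
    return head + payload


def wrap_tag24(encoded_item: bytes) -> bytes:
    """Wrap an already-encoded CBOR data item as Tag 24 (encoded CBOR data item)."""
    return b"\xd8\x18" + cbor_bstr(encoded_item)


# Usage: the single byte 0x01 is the CBOR encoding of the unsigned integer 1.
wrapped = wrap_tag24(b"\x01")
```

The overhead is the two tag bytes plus the byte string header, so the
wrapped form of a small item is three bytes longer than the raw item.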

    int loop_count;
    int value_size;           // size of value in bytes

Some people might argue for using unsigned types for at least sizes
(e.g., size_t), and often for things like loop counts that cannot be
negative (though the argument for an unsigned type there is somewhat
weaker).

        self.value = 0      # Place holder; any valid Python object

Wouldn't None be a more conventional placeholder in Python?
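
A minimal sketch of what the two remarks above could look like in a Python
mapping: None as the placeholder value, and a range check standing in for
an unsigned type. The class shape and the default of 6 for
GRASP_DEF_LOOPCT are assumptions for illustration; the 0..255 range is the
one raised in Roman's comment on loop_count.

```python
from dataclasses import dataclass
from typing import Any

GRASP_DEF_LOOPCT = 6   # assumed default value; check the GRASP specification


@dataclass
class Objective:
    """Hypothetical objective holder, not the class defined in the draft."""
    name: str
    loop_count: int = GRASP_DEF_LOOPCT
    value: Any = None   # None marks "no value set yet"

    def __post_init__(self) -> None:
        # Python has no unsigned integer type, so enforce the range explicitly.
        if not 0 <= self.loop_count <= 255:
            raise ValueError("loop_count must be in the range 0..255")
```

Using None rather than 0 lets a consumer distinguish "no value" from a
legitimate integer value of zero.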

Section 2.3.2.4

   *  The following cover all locator types currently supported by
      GRASP:

      -  is_ipaddress (Boolean) - True if the locator is an IP address

      -  is_fqdn (Boolean) - True if the locator is an FQDN

      -  is_uri (Boolean) - True if the locator is a URI

Are these mutually exclusive?
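
If the answer is yes, one way for a language binding to make that explicit
is a single enumerated field from which the three Booleans are derived, as
in the sketch below. The names `LocatorKind` and `AsaLocator` and the
example URI are hypothetical and are not taken from the draft.

```python
from dataclasses import dataclass
from enum import Enum


class LocatorKind(Enum):
    IPADDRESS = "ipaddress"
    FQDN = "fqdn"
    URI = "uri"


@dataclass
class AsaLocator:
    """Hypothetical locator: exactly one kind, so the flags cannot conflict."""
    kind: LocatorKind
    locator: str

    @property
    def is_ipaddress(self) -> bool:
        return self.kind is LocatorKind.IPADDRESS

    @property
    def is_fqdn(self) -> bool:
        return self.kind is LocatorKind.FQDN

    @property
    def is_uri(self) -> bool:
        return self.kind is LocatorKind.URI


loc = AsaLocator(LocatorKind.URI, "https://example.net:8443/")
```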

Section 2.3.2.6

As for the GRASP session ID, I think that a 32-bit cap is too
restrictive.  I think we should be in the habit of using 128-bit nonces
and needing to justify anything smaller.  (64 bits would *probably* be
fine here, FWIW, and might make it easier to represent in common
language bindings.)

   Section 2.3.2.7).  Another possible implementation is to hash the
   name of the ASA with a locally defined secret key.

I recognize that this is a throwaway line, but the naive keyed hash
construction is subject to length-extension attacks (for certain hash
constructions such as the Merkle-Damgård family that includes SHA-2);
HMAC is more robust for this type of usage and can be phrased in a
similarly concise manner ("compute an HMAC of the name of the ASA under
a locally defined secret key").
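
A minimal sketch of the HMAC variant using Python's standard `hmac` and
`hashlib` modules. The function name, the choice of SHA-256 and the
truncation to 32 bits (to match the nonce size discussed above) are
assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def asa_nonce_from_name(name: str, key: bytes) -> int:
    """Derive a 32-bit value as HMAC-SHA-256(key, name), truncated to 4 bytes."""
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


# Usage: the key is generated locally and never leaves the node.
local_key = secrets.token_bytes(32)
nonce = asa_nonce_from_name("example-asa", local_key)
```

The same name and key always yield the same value, so the core can
recompute it instead of storing it.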

Section 2.3.3

   *  deregister_asa()
      [...]
      -  Note - the ASA name is strictly speaking redundant in this
         call, but is present for clarity.

So what happens if the wrong name is passed?

         transmit to other ASAs.  It is not necessary to register an
         objective that is only received by GRASP synchronization or
         [...]
         Registration is not needed for "read-only" operations, i.e.,
         the ASA only wants to receive synchronization or flooded data
         for the objective concerned.

These seem to have high overlap and thus be candidates for
deduplication.

      -  The 'ttl' parameter is the valid lifetime (time to live) in
         milliseconds of any discovery response for this objective.  The

(nit?) I'd suggest to add "generated", since it would not apply to any
hypothetical received discovery response for the objective in question.

      -  If the parameter 'overlap' is True, more than one ASA may
         register this objective in the same GRASP instance.

Do all ASAs registering this objective have to set it to True, or just
the first one, in order for the subsequent registrations to succeed?

Section 2.3.4

      -  If the parameter 'minimum_TTL' is greater than zero, any
         locally cached locators for the objective whose remaining time
         to live in milliseconds is less than or equal to 'minimum_TTL'
         are deleted first.  Thus 'minimum_TTL' = 0 will flush all
         entries.

Why does one ASA's request flush entries from the cache shared with
other ASAs?  I am forced to infer the motivation for including the
minimum_TTL parameter in the first place, but it seems like it is useful
if the requesting ASA needs to find something that will remain active
for a given period of time, but different ASAs may have different needs
for the peer's stability, and so flushing the cache in this way could
hamper the operation of peer ASAs.
If the intent is only to not return those cached locators *for this
discovery operation*, then say that, not that they are flushed from the
cache entirely.
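
The second reading can be sketched as a filter applied at lookup time that
leaves the shared cache untouched. The `CachedLocator` type, the
millisecond clock argument and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachedLocator:
    locator: str
    expires_at_ms: int   # absolute expiry time of this cache entry


def usable_locators(cache: List[CachedLocator],
                    minimum_ttl_ms: int,
                    now_ms: int) -> List[CachedLocator]:
    """Return entries whose remaining lifetime exceeds `minimum_ttl_ms`.

    The cache list itself is not modified, so other ASAs sharing it still
    see the shorter-lived entries.
    """
    return [e for e in cache if e.expires_at_ms - now_ms > minimum_ttl_ms]


cache = [CachedLocator("a", 1000), CachedLocator("b", 5000)]
long_lived = usable_locators(cache, minimum_ttl_ms=2000, now_ms=0)
```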

Section 2.3.5

Thanks for the figure (I probably should have put one into RFC 7546,
which is basically this section but for the GSS-API).

I suggest noting in the first paragraph that the negotiation occurs in
lockstep, with the initiator starting the negotiation and preparing a
message, the responder processing that message and generating a new
negotiation message in turn, with at most one negotiation message in
flight at any given time.  It seems particularly important to note
whether this also applies to negotiate_wait() calls/messages, or if
those can be made at any time by either entity.  (This probably relates
to some of the genart reviewer's comments.)

I note that the prospect of the loop count going up (and, thus, risk of
infinite looping) was pointed out by the genart review.  I share such
concerns and am happy to see that improved discussion of this topic (and
the related 'lifetime' extension) is planned.

         For this and any other error code, an exponential backoff is
         recommended before any retry.

Any guidance about whether this should be by doubling vs a different
exponent base?  I guess the security considerations do say that it's
dependent on the semantics of the objective in question, which may be
enough (though a pointer or mention here would be appreciated).
(Also, any reason to not use the 2119 RECOMMENDED?)
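
For reference, a minimal sketch of the doubling variant with a cap. The
base delay, factor, cap and retry count are illustrative assumptions, not
values taken from the draft, and a deployment would likely add random
jitter on top.

```python
from typing import Iterator


def backoff_delays(base_s: float = 1.0,
                   factor: float = 2.0,
                   cap_s: float = 60.0,
                   retries: int = 5) -> Iterator[float]:
    """Yield the wait time before each retry, multiplied by `factor` each time."""
    delay = base_s
    for _ in range(retries):
        yield min(delay, cap_s)   # never wait longer than the cap
        delay *= factor


delays = list(backoff_delays(retries=8))
```

A different exponent base only changes `factor`; the structure of the retry
loop stays the same.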

      -  This function must be followed by calls to 'negotiate_step'
         and/or 'negotiate_wait' and/or 'end_negotiate' until the
         negotiation ends. 'listen_negotiate' may then be called again
         to await a new negotiation.

We just recommended a few paragraphs previously that listen_negotiate()
should be called again *immediately* after the first listen_negotiate()
returns; I don't see why it's useful to also say that it might be called
again after a given negotiation ends.

      -  Executes the next negotation step with the peer.  The
         'objective' parameter contains the next value being proffered
         by the ASA in this step.  It must also contain the latest
         'loop_count' value received from request_negotiate() or
         negotiate_step().

This is interesting; negotiate_step() must preserve the loop count from
the previous call, so only the initial negotiation response (the
request_negotiate() 'proffered_objective' output) can increase the loop
count, not any arbitrary negotiation step?  That seems to limit concerns
about infinite looping (as raised by the genart reviewer and apparently
acknowledged in the response to the genart review).

         o  Threaded implementation: Called in the same thread as the
            preceding 'request_negotiate' or 'listen_negotiate', with
            the same value of 'session_nonce'.

IIUC it is *expected* to be called in the same thread as the previous
call, but is not strictly speaking *required* to do so, since the
session_nonce tracks the library state for the negotiation in question.
Or am I mistaken?

         'result' = True for accept (successful negotiation), False for
         decline (failed negotiation).

         'reason' = optional string describing reason for decline.

What happens if I pass a reason string with result of True?

Section 2.3.6

      -  If the 'peer' parameter is null, and the objective is already
         available in the local cache, the flooded objective is returned
         immediately in the 'result' parameter.  In this case, the
         'timeout' is ignored.

      -  Otherwise, synchronization with a discovered ASA is performed.
         If successful, the retrieved objective is returned in the
         'result' parameter.

From context this 'otherwise' seems to be the "'peer' parameter is null
but the objective is not available in the local cache" case (as opposed
to also covering the "'peer' parameter is not null" case).  It might be
possible to clarify this with formatting and/or rewording.

   *  synchronize()
      [...]
      -  Since this is essentially a read operation, any ASA can do it,
         unless an authorization model is added to GRASP in future.
         Therefore the API checks that the ASA is registered, but the
         objective does not need to be registered by the calling ASA.
      [...]
      -  Since this is essentially a read operation, any ASA can use it.
         Therefore GRASP checks that the calling ASA is registered but
         the objective doesn't need to be registered by the calling ASA.

These seem redundant and candidates for de-duplication.

      -  In the case of failure, an exponential backoff is recommended
         before retrying.

[same remark as previously]

Section 2.3.7

         'info' = optional diagnostic data.  May be raw bytes from the
         invalid message.

This means it does not have to be well-formed CBOR, and will be wrapped
in a bstr by the library?  (The GRASP spec suggests that a different
CBOR structure would be permitted, though of course the API need not be
required to expose such flexibility.)

Section 4

If we're going to keep the 32-bit nonce/handle/etc, it's probably worth
a mention of collision/guessing probability.
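
As a rough guide to the numbers involved, the sketch below computes the
chance of guessing one fixed 32-bit value and the birthday-bound chance of
a collision among randomly chosen values. It assumes uniformly random
values, which the draft does not require, and the function names are
hypothetical.

```python
import math

HANDLE_BITS = 32


def guess_probability(attempts: int, bits: int = HANDLE_BITS) -> float:
    """Chance that one of `attempts` distinct guesses equals one fixed value."""
    return min(1.0, attempts / 2**bits)


def collision_probability(handles: int, bits: int = HANDLE_BITS) -> float:
    """Birthday approximation: chance that `handles` random values collide."""
    return -math.expm1(-handles * (handles - 1) / (2 * 2**bits))
```

Under these assumptions, a collision becomes more likely than not once
roughly 77,000 values are in use at the same time.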

It might be worth a reference to the RFC 3986 security considerations
since we do allow URI locators.  This is not really any different than
for GRASP itself, but the URI is exposed to the API consumer and so
reminding them about it seems worthwhile.

The session_nonce is nominally opaque to (non-ACP, at least) ASAs, but
is likely to be implemented in a way that does preserve some state.  Is
there a risk if an ASA attempts to "peek through the abstraction
barrier"?  (I am not sure I see one, but you're the expert!)

   GRASP objective concerned.  These precautions are intended to assist
   the detection of malicious denial of service attacks.

I suggest to drop the word "malicious"; such denial of service
conditions need not be malicious and can occur by accident.

   As a general precaution, all ASAs able to handle multiple negotiation
   or synchronization requests in parallel may protect themselves
   against a denial of service attack by limiting the number of requests
   they can handle simultaneously and silently discarding excess
   requests.

I think that best practices would also include some limit on the number
of objectives registered by a given ASA and possibly the number of ASAs
registered, to protect the core library/kernel resources.
(nit?) I suggest dropping 'can'.
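
The precaution quoted above, limiting simultaneous requests and silently
discarding the excess, can be sketched with a non-blocking semaphore. The
class and method names are hypothetical; the same pattern could bound the
number of registered objectives or ASAs.

```python
import threading


class RequestLimiter:
    """Hypothetical guard that admits at most `max_parallel` requests at once."""

    def __init__(self, max_parallel: int) -> None:
        self._slots = threading.BoundedSemaphore(max_parallel)

    def try_begin(self) -> bool:
        """Return True if a slot was taken, False if the request should be dropped."""
        return self._slots.acquire(blocking=False)

    def end(self) -> None:
        """Release the slot once the request has been handled."""
        self._slots.release()


limiter = RequestLimiter(max_parallel=2)
```

A handler would call `try_begin()` on each incoming request, return without
replying when it gets False, and call `end()` in a `finally` clause
otherwise.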

Appendix A

There was some discussion with the genart reviewer about the CBORfail
error code as being particularly useful.  I note that
draft-ietf-cbor-7049bis is in AUTH48 and introduces a hierarchy of
"levels of validation" (in the form of different data models).  CBOR
that is valid in the generic data model might not be valid in the
extended data model or a data model specific to a given application.  I
strongly encourage this document to update to referencing 7049bis and
giving an indication of what data model is in use for processing both
information received from the peer and any CBOR-encoded data received
from the ASA.

   'noSecurity' error will be returned to most calls if GRASP is running
   in an insecure mode (no ACP), except for the specific DULL usage mode

My understanding of the text in the GRASP spec itself was that non-ACP
security services were allowed.  Is the API intended to be limited to
only ACP usage?

   ASAfull          4 "ASA registry full"  (register_asa)
   dupASA           5 "Duplicate ASA name" (register_asa)
   noASA            6 "ASA not registered"
   notYourASA       7 "ASA registered but not by you"

Giving this much detail is making things much easier for malicious ASAs
... but given that the deployment model basically assumes that such
things don't exist (even if we do give some small consideration to the
possibility in some places), I will not complain about retaining this
level of detail in the error messages.

   noDiscReply     17 "No reply to discovery"
                                 (req_negotiate)

There is perhaps some explanation to give about the distinction between
noReply and noDiscReply, i.e., in the body text.  Maybe it is
self-explanatory, though, provided that the author of the code notices
that noDiscReply exists at all.
Likewise for noNegReply, noSynchReply, noValidSynch, and, possibly,
noValidStep.

Deborah Brungard Former IESG member
No Objection
No Objection (for -08) Not sent

                            
Magnus Westerlund Former IESG member
No Objection
No Objection (2020-12-03 for -08) Sent
I didn't have time to read your document in detail, so I may easily have missed something.  Hopefully a bit of clarification on what I might have missed will resolve this issue.

I do wonder about one aspect of this API surface. How does it handle the case where the GRASP layer is unable to send messages in a timely fashion in response to API calls? Looking at GRASP, I understand that it uses either UDP or TCP. The rate limiting of UDP does not appear to be specified beyond following the RFC 8085 recommendations. So my concern is that there is some risk that the upper layer using this API becomes a bit too active and tries to do everything at once, with the result that TCP congestion control and flow control, or for UDP the rate limiter and congestion control, block timely transmission. What happens in this API when this occurs?