HIP Diet EXchange (DEX)

Summary: Has 2 DISCUSSes. Needs one more YES or NO OBJECTION position to pass.

Roman Danyliw Discuss

Discuss (2021-03-24)
** I’d like to discuss the maturity of the proposal through the lens of publishing as PS vs. experimental.  With the former, one would expect the current best practice of forward secrecy (FS) to be used unless there was a demonstrated need to do otherwise.  With experimental or even informational status, the bar would be lower.  In response to earlier ballots and WG/IETF last call discussion [1][2] this document has clearly documented the design differences between BEX and DEX (this document), the impact of removing FS, and added additional text on the motivating constrained environment.

After reviewing all of this new text, I was looking for and couldn’t find a clear narrative on the proposed DEX operational environment that would motivate the loss of FS.  What I found was the following:

(a) Section 1.2 cites execution times of 8-bit 8051-based ZWAVE ZW0500 microprocessor doing ECDH and DEX-specific operations

(b) Section 1.2.1 enumerates the crypto operations that BEX and DEX need to perform

(c) Section 1.2.1 also points to [EfficientECC] that documents the clock cycles needed for relevant crypto operations on a class 0 platform

All are helpful to show discrete analyses, but I need help connecting them into a simple, tangible narrative of roughly the following form -- “current BEX protocol (or a FS preserving approach) is too expensive, per some measure, on a particular constrained hardware given the real-world use cases that want to use it.”  For example:

-- (a) seemed to describe wall-clock time execution of a class-1 system using DEX.  What would have been the equivalent execution time using BEX or an approach with FS?  Would those execution numbers have been unacceptable?

-- (c) provides concrete clock cycle numbers.  Like with (a), a bit more context would be helpful.  What’s the equivalent on BEX or with a scheme that uses FS?

I didn’t follow all of the WG discussion that produced the document.  If this analysis was done somewhere, can it please be shared?

My concern is that if the analysis hasn’t been done to show that BEX or a FS-preserving approach is inadequate, it seems difficult to publish a PS with intentionally weakened security properties as an alternative.

** Section 5.2.1.  The DH_GROUP_LIST has been significantly pruned (removal of NIST P-256, P-384, P-521, SECP160R1) in sequential revisions.  However, even after these reconsiderations, Curve448 remains despite “[i]t is not [being] known if Curve448 Diffie Hellman can meet the performance requirements on 8-bit CPUs.”  If the whole point of HIP-DEX is to operate on constrained devices of a particular class, why would a proposed standard document be published with a recommendation which is acknowledged to have unvalidated suitability for the problem domain?

[1] https://mailarchive.ietf.org/arch/msg/hipsec/AO3C6Ol2S-i5obJ4msbCp1KDVmQ/

[2] https://mailarchive.ietf.org/arch/msg/last-call/ATU2rfUkX6aPWSC4ZwhSedSJIds/
Comment (2021-03-24)
Thank you for addressing my preliminary DISCUSS points from the earlier telechat.

I support Lars's DISCUSS position.

** Section 1.  Per “This is based on actual implementation efforts on 8-bit CPU sensors with 16KB memory and 64KB flash for code”, is there an informational pointer to this effort?  Is this the same work as the ZW0500 noted in Section 1.2?

** Section 1.2.  Please clarify the intended application.  Currently the text reads – “Due to the substantially reduced security guarantees of HIP DEX compared to HIP BEX, HIP DEX MUST only be used when at least one of the two endpoints is a class 0 or 1 constrained device defined in Section 3 of [RFC7228]).”  8-bit CPUs come up in Sections 1.2, 1.2.1 and 5.2.1 as the framing of the device.  If that’s an important characteristic, please reflect that in the text.

** Section 1.2.1.  Thanks for the pointers to [EfficientECC] and [ATmega328P].  It would be useful to clarify that the ATmega328P is a class 0 device since that’s the framework used in Section 1.2.

Benjamin Kaduk Discuss

Discuss (2021-03-25)
I support Roman's Discuss.

I don't understand how the responder's HOST_ID is supposed to be
authenticated in the handshake.  In HIP-BEX, the HOST_ID is in R1 and
covered by the HIP_SIGNATURE_2, and it is *also* used as input to the
calculation of the HIP_MAC_2 in R2.  In HIP-DEX as currently specified,
the responder's HOST_ID is present in R1 (which has no cryptographic
protection applied) but not present in R2 at all (R2 being the only
message from the responder that is authenticated).  Since we are already
replicating most of R1 in R2 in order to allow the HIP_MAC on R2 to
substitute for the HIP_SIGNATURE_2 in the HIP-BEX version of R1, it
seems like it would be most straightforward to just include a copy of
the responder HOST_ID in R2 as well (thus covered by the main HIP_MAC),
but other options including HIP_MAC_2 are available.

Furthermore, that the lack of authentication for the responder's HOST_ID
could remain in the document for so long, even after multiple rounds of
review, causes me to question whether the cryptographic mechanisms of
this document have really seen an adequate level of review for the
Proposed Standard maturity level.

I also have concerns about the cryptographic analysis of the particular
CKDF construction that is given.  While the previous rounds of review
and response have convinced me that a CMAC-based analogue to HKDF is
safe and well-grounded, the current construction is not fully analogous
to HKDF and also uses a non-injective mapping for converting the
exchanged protocol parameters into CKDF inputs:

- My understanding of the principles behind HKDF is that there is no
  need for the "info" argument to the CKDF-Extract stage, and that using
  that data only in the CKDF-Expand stage is both safe and the expected
  usage.  (The Random #I provided by the responder matches to the HKDF
  salt as a random but non-secret value, and helps to churn the
  extraction of entropy from the IKM.  Adding the I_NONCE along with the
  Diffie-Hellman output allows for an additional source of contributory
  behavior for the initiator, but the Diffie-Hellman exchange itself is
  also supposed to give contributory behavior and the I_NONCE does not
  protect against attacks where the initiator might choose a key share
  that produces a DH output with particular properties, since the
  I_NONCE and initiator key share are produced at the same time.  I
  think we need to be more clear about what the I_NONCE actually does,
  which is to ensure that we get a new key if we have to repeat the
  static-static DH exchange due to (e.g.) state loss, etc.)

- Since we are using Kij | I_NONCE for both IKMm and IKMp, we need to
  ensure that the produced IKM<x> values are distinct by construction.
  The requirement that the encrypted values be at least 64 bits provides
  this property, however, we do not have injectivity because a given
  IKMp could be produced by dividing the "concatenated random values"
  between initiator and responder in different ways.  This introduces a
  risk of attack when the encrypted value of one party is chosen
  maliciously (the attack is easiest when it can be chosen after the
  other party's value is known, but this is not a strict requirement for
  enabling attacks).  So, I think we should either introduce length
  prefixes into the IKMp encoding or require a fixed length (i.e.,
  exactly 64 bits) for the random values.

- The description of the PRK input to CKDF-Expand() includes mention of
  a "case of no extract".  When does this case occur?  I think we need
  to have a clear procedure for when it is (and is not) used, or ideally
  to just always use extract.

- The intermediates T(n) used to generate the CKDF OKM appear to be an
  attempt to use the SP 800-108 "KDF in feedback mode" with optional
  counter, but the NIST version puts the counter directly after the
  previous iteration's output, i.e., before the additional data.  So in
  that sense we are not in a state that "follows the CMAC usage
  guidance" provided by the NIST references.

- The additional information passed to CKDF-Expand() does not provide
  for key separation of the output keys used for the pair-wise key SA
  based on what transport format the keys will be used for.
  (Including the selected transport format in the 'info' should be
  straightforward and resolves this issue.)
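
To make the injectivity concern about IKMp concrete, here is a minimal
Python sketch.  It assumes the "concatenated random values" are joined
by plain byte concatenation.  The helper names (ikm_plain,
ikm_length_prefixed) and the 2-byte length prefix are illustrative
choices and are not taken from the draft.

```python
import struct


def ikm_plain(initiator_value: bytes, responder_value: bytes) -> bytes:
    # Plain concatenation: the boundary between the two values is lost.
    return initiator_value + responder_value


def ikm_length_prefixed(initiator_value: bytes, responder_value: bytes) -> bytes:
    # Hypothetical fix: a 2-byte big-endian length precedes each value,
    # so the encoding can be parsed back in exactly one way.
    out = b""
    for value in (initiator_value, responder_value):
        out += struct.pack("!H", len(value)) + value
    return out


# Two different splits of the same 20 bytes; every part is at least 64 bits.
stream = bytes(range(20))
split_a = (stream[:8], stream[8:])    # 8 + 12 bytes
split_b = (stream[:12], stream[12:])  # 12 + 8 bytes

collides_plain = ikm_plain(*split_a) == ikm_plain(*split_b)
collides_prefixed = ikm_length_prefixed(*split_a) == ikm_length_prefixed(*split_b)
```

With plain concatenation the two splits yield identical input, while
the length-prefixed form keeps them distinct.  Requiring a fixed
64-bit length for both values would achieve the same effect without
any prefix.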

I do not see any justification for deviating from the existing RFC 7401
semantics of ECHO_RESPONSE_UNSIGNED in the I2 packet (Appendix B
suggests to use content other than the "unmodified opaque data copied
from the corresponding echo request").  If a two-factor authentication
method is desired, it seems like defining a new TLV pair to convey it is
straightforward and does not confuse the semantics of existing protocol

There is some text in Section 5.3 that indicates that the "UPDATE,
NOTIFY, CLOSE, and CLOSE_ACK packets are not covered by a signature and
purely rely on the HIP_MAC parameter for packet authentication".
However, the RFC 7401 NOTIFY packet contains only a HIP_SIGNATURE and
not a HIP_MAC.  I think we need to specify a complete NOTIFY message
structure that includes HIP_MAC, rather than attempting to rely on a
delta from RFC 7401 that just removes the HIP_SIGNATURE, most notably so
that we can clearly state what the MAC covers.

Section 6.3 suggests that the CKDF-Expand phase can be skipped for the
Pair-wise Key SA when the needed key is less than or equal to 128 bits,
but I don't see anything in [NIST.SP.800-56C] to suggest that such a
procedure follows the referenced guidance.  In particular, it removes
the opportunity to use the label/context data (known as the "info" in
the RFC 5869 terminology).

We have text in 5.3.2 that I managed to read as saying that the
initiator's (DH_GROUP_LIST) preference takes priority, but there is text
in Section 6.6 that I read as saying that the responder's preference
takes priority.  (See COMMENT for specific locations.)  It can only be
one of those, and we should be clear about which one it is, across the

Section 9.3 discusses the risk of key extraction attack and the need to
validate the peer's public key.  But we say to enforce this in
processing I2 and R2 packets, when the responder's HOST_ID is present in
R1 (and not R2) and is used in the preparation of I2.  If we only
validate the peer's key when processing R2, it is too late and the
damage has already been done.

I think we need greater clarity on whether we are using X25519, or doing
ECDH on Curve25519.  Section 9.3 suggests that we are using X25519, but
only by implicit reference to "the corresponding functions defined in
[RFC7748]"; the rest of the document only discusses Curve25519.
ECDH-on-Curve25519 (or the related curve Wei25519) and X25519 are not
compatible operations; we must pick one.

Section 5.2.2

   The counter for AES-128-CTR MUST have a length of 128 bits.  The
   puzzle value #I and the puzzle solution #J (see Section 4.1.2 in
   [RFC7401]) are used to construct the initialization vector (IV) as
   FOLD(I | J, 112) which are the high-order bits of the CTR counter.  A
   16 bit value as a block counter, which is initialized to zero on
   first use, is appended to the IV in order to guarantee that a non-
   repeating nonce is fed to the AES-CTR encryption algorithm.

   This counter is incremented as it is used for all encrypted HIP
   parameters.  That is a single AES-129-CTR counter associated with the
   Master Key SA.

Is the FOLD output just the initial value of the counter (so that we can
use the full 128-bit space) or do we only get the 16 bits of usable

Relatedly, I still don't have much clarity on how the counter is
incremented/managed for the master key SA.
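
One possible reading of the quoted text, sketched in Python.  The FOLD
behavior used here (split the input into K-bit segments, zero-pad the
last one, XOR the segments together) is an assumption about the
draft's function, and the names fold and counter_block are
hypothetical.  Under this reading only the low 16 bits ever change,
which would cap a Master Key SA at 2^16 AES blocks.

```python
def fold(data: bytes, k_bits: int) -> int:
    # Assumed FOLD: XOR of consecutive k_bits-wide segments of the input,
    # with the final segment right-padded with zero bits.
    pad = (-len(data) * 8) % k_bits
    value = int.from_bytes(data, "big") << pad
    mask = (1 << k_bits) - 1
    out = 0
    while value:
        out ^= value & mask
        value >>= k_bits
    return out


def counter_block(iv112: int, block_counter: int) -> bytes:
    # 112-bit IV in the high-order bits, 16-bit block counter appended.
    if not 0 <= block_counter < (1 << 16):
        raise OverflowError("16-bit block counter exhausted")
    return ((iv112 << 16) | block_counter).to_bytes(16, "big")


# Placeholder 128-bit puzzle values, not real protocol data.
puzzle_i = bytes(range(16))
puzzle_j = bytes(range(16, 32))

iv = fold(puzzle_i + puzzle_j, 112)
first_block = counter_block(iv, 0)
last_block = counter_block(iv, 0xFFFF)
```

If the FOLD output is instead meant to be only the initial value of a
full 128-bit counter, the overflow check would go away and the
increment would carry into the upper bits.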

Section 5.2.3

   HIP DEX HIs are serialized equally to the ECC-based HIs in HIPv2 (see
   Section 5.2.9. of [RFC7401]).  The Group ID of the HIP DEX HI is
   encoded in the "ECC curve" field of the HOST_ID parameter.  The
   supported DH Group IDs are defined in Section 5.2.1.

I don't think RFC 7401 actually specifies the serialization for the ECC
public keys (whether ECDSA or ECDH); that is deferred to the
corresponding references (and, furthermore, RFC 4754 seems to be
covering random groups, not the specific NIST groups).  We need an
actual reference for the serialization of the public key in order for
this to be implementable.  (If we're using X25519, this is very easy and
RFC 7748 does the hard work for us.)

Section 5.3.1

   Regarding the Responder's HIT, the Initiator may receive this HIT
   either from a DNS lookup of the Responder's FQDN (see [RFC8005]),
   from some other repository, or from a local table.  The Responder's
   HIT also MUST be of a HIP DEX type.  If the Initiator does not know
   the Responder's HIT, it may attempt to use opportunistic mode by
   using NULL (all zeros) as the Responder's HIT.  [...]

The "may attempt" seems in conflict with "MUST be of a HIP DEX type".
Comment (2021-03-25)
RFC 7401 says of secp160r1 (in 2015) that "Today, these groups
should be used only when the host is not powerful enough (e.g., some
embedded devices) and when security requirements are low (e.g.,
long-term confidentiality is not required)."  We might mirror the
"security requirements are low" portion ourselves as a requirement for
the use of DEX at all.

The use of the random #I from the puzzle as the CMAC key both for
solving the puzzle and for CKDF-Extract() is perhaps a bit
unconventional.  I don't know of a specific attack against it, though
(and the HKDF design allows an all-zeros key to be used for
HKDF-Extract() when no salt is available).

Section 1

   4.  The forfeiture of the use of digital signatures leaves the R1
       packet open to a MITM attack.  Such an attack is managed in the

We can't use the acronym MITM without expanding it (it's used five times
throughout), and "active on-path attack" is probably more useful a
description anyway.

Section 1.1

   An existing HIP association can be updated with the update mechanism
   defined in [RFC7401].  Likewise, the association can be torn down
   with the defined closing mechanism for HIPv2 if it is no longer
   needed.  Standard HIPv2 uses a HIP_SIGNATURE to authenticate the
   association close operation, but since DEX does not provide for
   signatures, the usual per-message MAC suffices.

Thank you for calling out the divergence from RFC 7401 HIPv2.
However, the conclusion here ("the per-message MAC suffices") is not
supported by the rest of the sentence.

Section 1.2.1

Is it useful to present the overall summary of operations from the
Responder's perspective as well?  I recognize that it is in some sense
similar and may not be worth the partial redundancy.

   Papers like [EfficientECC] show on the ATmega328P [ATmega328P] an
   EdDSA25519 signature generation of 19M cycles and verification of 31M
   cycles.  Thus the SIGMA Public Key operations come at a cost of 81M

It's probably worth noting that the [EfficientECC] implementation has
the additional constraint of targeting side-channel resistance.  That
said, the proposed deployment scenarios for HIP-DEX include those where
the same motivations presented in the paper for wanting
side-channel-resistance apply, so we cannot reasonably remove that
constraint and achieve lighter-weight implementation.

Section 2.3

   HI (Host Identity):  The static ECDH public key that represents the
      identity of the host.  In HIP DEX, a host proves ownership of the
      private key belonging to its HI by creating a HIP_MAC with the
      derived ECDH key (see Section 3) in the appropriate I2 or R2

This definition is rather divergent from the RFC 7401 definition of Host
Identity.  Necessarily so, to some extent, since DEX doesn't have
signature keys, but I think we can do better at acknowledging the
divergence.  Perhaps something like "[RFC7401] defined this as the
public key of the signature algorithm that represents the identity of
the host.  Since DEX removes the signature operation, the static ECDH
public key is used to play the role of the identity of the host.  In HIP
DEX, a host [...]"?

My comment from the -13 about the HIP_MAC not directly proving ownership
of the private key also still applies, IMO.

   KEYMAT:  Keying material.  That is, the bit string(s) used as
      cryptographic keys.

This is also pretty divergent from RFC 7401's definition.  Do we want to
say something about "symmetric keys used for encryption and integrity
protection of HIP packets and encrypted user data packets"?

   RHASH (Responder's HIT Hash Algorithm):  In HIP DEX, RHASH is
      redefined as CMAC.  Still, note that CMAC is a message

We might also highlight the "from" part of the redefinition; something
like "Since HIP DEX does not use hash functions, an alternative
mechanism is needed for many of the places where RHASH is used.  To
match up with the HIP DEX design goals, CMAC is repurposed to perform
many of the functions where HIP-BEX uses RHASH.  Still, note that [...]"
might work.

   Security Association (SA):  An SA is a simplex "connection" that

I don't think I understand how an SA is "simplex", and RFC 7401 isn't
really enlightening me.  Help?

Section 3

   *  The HIT suite ID MUST only be a DEX HIT ID (see Section 5.2.4).

I don't think I understand where this restriction applies and what
exactly it's saying.  Section 5.2.4 covers the HIT_SUITE_LIST in R1, but
the reference seems to be made just for the specific ECDH/FOLD HIT Suite
ID (TBD2).  My current guess is that this is just writing down the
(near-)tautology that DEX HITs incorporate the ECDH/FOLD suite ID, in
which case there may not even be a need for a specific normative "MUST".

   Due to the latter property, an attacker may be able to find a
   collision with a HIT that is in use.  Hence, policy decisions such as

I think we should say that "it is assumed that an attacker can
find a collision with a HIT that is in use" rather than the current "may
be able to find" formulation.

Section 3.2.1

Is there anything useful to say about what mitigations are available if
an accidental collision occurs?  (Is just the full HOST_ID in the handshake
enough?  Would there be value in re-keying one of the nodes to not

   Even without collision-resistance, it is not trivial to create
   duplicate FOLD generated HITs, as FOLD is starting out with a random
   input (the HI).  Although there is a set, {N}, of HIs that will have
   duplicate FOLD HITs, even randomly generating duplicate HITs is
   unlikely.  [...]

I don't think describing a single set of HIs is particularly useful; the
situation might be better described as there being a set of equivalence
classes under FOLD, or many sets where each set has the same FOLDed HIT.
(The note a couple sentences later about "size of set above" would be
adjusted accordingly.)
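
As a small illustration of the equivalence-class view, the Python
sketch below assumes FOLD is an XOR of K-bit segments (a reading of
the draft, not something stated in the text quoted here) and uses a
96-bit output.  Flipping the same bit position in two different
segments leaves the FOLD output unchanged, so every input shares its
folded value with many other bit strings.  The helper names are
hypothetical, and the sketch ignores whether the modified string is a
valid public key for the group.

```python
import os


def fold(data: bytes, k_bits: int) -> int:
    # Assumed FOLD: XOR of consecutive k_bits-wide segments of the input,
    # with the final segment right-padded with zero bits.
    pad = (-len(data) * 8) % k_bits
    value = int.from_bytes(data, "big") << pad
    mask = (1 << k_bits) - 1
    out = 0
    while value:
        out ^= value & mask
        value >>= k_bits
    return out


def flip_bit(data: bytes, bit_index: int) -> bytes:
    # bit_index counts from the most significant bit of the whole string.
    out = bytearray(data)
    out[bit_index // 8] ^= 0x80 >> (bit_index % 8)
    return bytes(out)


K = 96
hi = os.urandom(32)  # random stand-in for a 256-bit HI

# Flip bit 5 of the first segment and bit 5 of the second segment.
other = flip_bit(flip_bit(hi, 5), 5 + K)
same_fold = fold(hi, K) == fold(other, K)
```

Each choice of bit position and segment pair gives another member of
the same class, which is why "many sets, each with the same FOLDed
HIT" describes the situation better than a single set {N}.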


                                                        If the data
   transform does not specify its own KDF, the key derivation function
   defined in Section 6.3 is used.  Even though the concatenated input
   is randomly distributed, a KDF Extract phase may be needed to get the
   proper length for the input to the KDF Expand phase.

I'm reluctant to say "the concatenated input is randomly distributed"
since the constrained devices in question may not have particularly good
RNGs.  "Even if" might be safer.

Section 4.1.4

   The User Data Considerations in Section 4.5. of [RFC7401] also apply
   to HIP DEX.  There is only one difference between HIPv2 and HIP DEX.
   Loss of state due to system reboot may be a critical performance
   issue for resource-constrained devices.  Thus, implementors MAY
   choose to use non-volatile, secure storage for HIP states in order
   for them to survive a system reboot as discussed in Section 6.11.
   Using non-volatile storage will limit state loss during reboots to
   only those situations with an SA timeout.

IIUC this includes saving (e.g.) the pair-wise key SA state to
nonvolatile storage, which could affect the safety of user data
exchanged over the negotiated transport format.  That seems important to
note (though it should not be much of a surprise given the discussion
earlier in the document about lack of forward secrecy)!

Section 5.1

I think it would be appropriate to reiterate that indications of
Anonymity in the HIP Controls field are meaningless when DEX is used.

Section 5.2

   HIP DEX reuses the HIP parameters of HIPv2 defined in Section 5.2. of
   [RFC7401] where possible.  Still, HIP DEX further restricts and/or
   extends the following existing parameter types:

As a formal matter, how do we know that DEX is "in use" for a given
exchange and thus that these further restrictions are going to apply?
Is it just that it's the suite ID of the source HIT in the packet?

   *  PUZZLE, SOLUTION, and HIP_MAC parameter processing is altered to
      support CMAC in RHASH and RHASH_len (see Section 6.1 and
      Section 6.2).

I don't really follow how the processing needed to be altered for

Section 5.2.1, 5.2.x, etc.

   HIP DEX, the DH Group IDs are restricted to:

Similarly to the previous comment, at a formal level, how do we know
that DEX is in use and these further restrictions apply?

Section 5.2.4

In RFC 7401 we note that HIT_SUITE_LIST is in the signed part of R1.
I think it would be appropriate to reiterate that for DEX there is no
authenticity protection on R1 (including the HIT_SUITE_LIST), so the
contents of R1 can only be used provisionally until verified by
comparing against the contents of the validated R2.

Section 5.2.5

   The ENCRYPTED_KEY parameter encapsulates a random value that is later

This is a cryptographic random value, right?  We should probably say so
(or that it's from a CSPRNG, etc.).

   used in the session key creation process (see Section 6.3).  This
   random value MUST have a length of at least 64 bits.  The HIP_CIPHER
   is used for the encryption.

The only defined HIP_CIPHER for DEX is AES-128-CTR.  Where does the
counter value get taken from for performing the encryption?

Section 5.3

   In the future, an optional upper-layer payload MAY follow the HIP
   header.  The Next Header field in the header indicates if there is
   additional data following the HIP header.

(This is unchanged from the situation for RFC 7401, right?  Maybe we
should preface it as such, e.g., "As is the case for HIP-BEX, ...")

Section 5.3.1

   first list element.  With HIP DEX, the DH_GROUP_LIST parameter MUST
   only include ECDH groups defined in Section 5.2.1.

As written, this could be interpreted as limiting DEX to the specific
groups enumerated in §5.2.1, as opposed to all ECDH groups (with ECDH
group as defined in §5.2.1).  Limiting to a hardcoded list is bad for
cryptographic algorithm agility, see BCP 201.

Section 5.3.2

I see that the TLVs in R1 are ordered differently than in RFC 7401 (when
they appear in both documents), and interestingly it is the RFC 7401
case that is not in numeric TLV type order!  Is that an erratum against
RFC 7401?

The prose paragraphs cover HIP_CIPHER and DH_GROUP_LIST in the opposite
order from the one in which they appear in the figure, though.

   The Initiator's HIT MUST match the one received in the I1 packet if
   the R1 is a response to an I1.  If the Responder has multiple HIs,
   the Responder's HIT MUST match the Initiator's request.  If the
   Initiator used opportunistic mode, the Responder may select among its
   HIs as described below.  See Section 4.1.8 of [RFC7401] for detailed
   information about the "HIP Opportunistic Mode".

The first two sentences don't seem very consistent with opportunistic
mode (but I recognize this is a preexisting situation with the RFC 7401
description as well).

   the current handshake.  Based on the received HIT_SUITE_LIST, the
   Initiator MAY decide to abort the current handshake and initiate a
   new handshake with a different mutually supported HIT suite.  This

Do we want to recommend this version-changing dance before the signal is
authenticated?  What is the harm for waiting for the R2 and only acting
on the authenticated list of versions?

   The HOST_ID parameter depends on the received DH_GROUP_LIST parameter
   and the Responder HIT in the I1 packet.  Specifically, if the I1
   the R1 packet accordingly.  If the Responder however does not support
   the DH group required by the Initiator or if the Responder HIT in the
   I1 packet does not match the required DH group, the Responder selects

I suggest adding some introductory material that sets the stage here,
noting that because DEX keys are static DH keys and not signature keys,
we have to come up with a procedure (with no analogue in BEX) to find
HIs that are in the same group, so that DH key-exchange is possible at
all.  In order to do this in a manner where tampering/downgrade can be
detected, we make the (essentially arbitrary, since HIP is basically a
symmetric protocol) choice to use initiator preference, and for a given
handshake, deem the first entry in the initiator's DH_GROUP_LIST to be
the "required" group for that handshake.  (Note that we define what the
"required group" is, which the current text does not.)  If the
initiator-selected responder HIT (if present) is useful and is the
required group, we use it, otherwise we provide rules for the responder
behavior that allow the initiator to detect the failed negotiation and
what steps are needed for the next attempt to succeed.  (The responder
HOST_ID includes the correct HIT and group, and the mismatch between
that group and the source HIT group, or the mismatch between HOST_ID and
HIT, indicates the negotiation failure.)

It's a little unfortunate that we have to act on an unauthenticated
signal here, but in case of group mismatch there is no way to
achieve authentication without signatures.

   payload protection.  The different format types are DEFAULT, ESP
   (Mandatory to Implement) and ESP-TCP (Experimental, as explained in
   Section 3.1 in [RFC6261]).

I see that RFC 6261 is an experimental document, but not how
specifically section 3.1 thereof explains that ESP-TCP is experimental.

Section 5.3.4

   The Responder repeats the DH_GROUP_LIST, HIP_CIPHER, HIT_SUITE_LIST,
   and TRANSPORT_FORMAT_LIST parameters in the R2 packet.  These
   parameters MUST be the same as included in the R1 packet.  The
   parameter are re-included here because the R2 packet is MACed and
   thus cannot be altered by an attacker.  For verification purposes,
   the Initiator re-evaluates the selected suites and compares the
   results against the chosen ones.  If the re-evaluated suites do not
   match the chosen ones, the Initiator acts based on its local policy.

I strongly suggest saving the TLV payloads from R1 and doing a literal
memcmp() of the R1 and R2 versions.  This is incredibly simple to
implement and hard to mess up; redoing the evaluation/negotiation seems
much more prone to error.
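
A minimal Python sketch of the "remember and compare" approach,
assuming the implementation keeps each received parameter as its raw
TLV bytes keyed by parameter name.  The function names and the byte
values are placeholders, not real encodings.

```python
NEGOTIATION_PARAMS = (
    "DH_GROUP_LIST",
    "HIP_CIPHER",
    "HIT_SUITE_LIST",
    "TRANSPORT_FORMAT_LIST",
)


def snapshot(r1_params: dict[str, bytes]) -> dict[str, bytes]:
    # Keep a private copy of the raw TLV bytes seen in the unauthenticated R1.
    return {name: bytes(r1_params[name]) for name in NEGOTIATION_PARAMS}


def r2_matches_r1(saved_r1: dict[str, bytes], r2_params: dict[str, bytes]) -> bool:
    # Byte-for-byte comparison, the Python analogue of memcmp() == 0.
    return all(r2_params.get(name) == saved_r1[name] for name in NEGOTIATION_PARAMS)


# Placeholder TLV payloads for illustration only.
r1 = {name: bytes([i, 0x01, 0x02]) for i, name in enumerate(NEGOTIATION_PARAMS)}
saved = snapshot(r1)

r2_good = dict(r1)
r2_tampered = dict(r1)
r2_tampered["HIP_CIPHER"] = b"\xff\xff\xff"

ok = r2_matches_r1(saved, r2_good)
tampered_ok = r2_matches_r1(saved, r2_tampered)
```

The comparison is only meaningful after the HIP_MAC on R2 has been
validated, since that is what authenticates the R2 copies.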

   The ENCRYPTED_KEY parameter contains an Responder generated random
   value that MUST be uniformly distributed.  This random value is
   encrypted with the Master Key SA using the HIP_CIPHER encryption

(Same comment about cryptographic strength as for the other initiator's

   The I_NONCE parameter contains the nonce, supplied by the Initiator
   for the Master Key generation as shown in Section 6.3.  The Responder
   is echoing the value back to the Initiator to show it used the
   Initiator provided nonce.

This stated justification seems weak; if the Responder had used a
different value for the nonce, the derived key would not agree and the
MAC would fail to validate.  It seems to me, on first look, that the
role of repeating the nonce here is more the typical return-routability
check.  If you think that conveying it in the R2 payload itself plays a
different or additional role, please go into more detail about what and

   The MAC is calculated over the whole HIP envelope, excluding any
   parameters after the HIP_MAC, as described in Section 6.2.  The
   Initiator MUST validate the HIP_MAC parameter.

Should I be reading any particular meaning into the distinction between
"HIP envelope" (as used here) and "HIP packet" (as used in RFC 7401)?

Section 6.2

   5.  Set Checksum and Header Length fields in the HIP header to
       original values.  Note that the Checksum and Length fields
       contain incorrect values after this step.

I recognize that this is just mirroring the RFC 7401 discussion, but I
don't actually understand why these values are incorrect.  The process
of verifying the MAC doesn't remove the MAC from the packet, so AFAICT
the length and checksum could still be valid (provided there are no
parameters after HIP_MAC or they are restored "if they will be needed

Section 6.5

   4.  If the implementation chooses to respond to the I1 packet with an
       R1 packet, it creates a new R1 according to the format described
       in Section 5.3.2.  It chooses the HI based on the destination HIT
       and the DH_GROUP_LIST in the I1 packet.  If the implementation

What is "the HI" that it chooses?

       does not support the DH group required by the Initiator or if the
       destination HIT in the I1 packet does not match the required DH
       group, it selects the mutually preferred and supported DH group

In line with my earlier comments, I suggest being more clear that it is
the initiator's preference that is respected (there is no well-defined
notion of "mutual preference"), assuming that my understanding is

       based on the DH_GROUP_LIST parameter in the I1 packet.  The
       implementation includes the corresponding ECDH public key in the
       HOST_ID parameter.  If no suitable DH Group ID was contained in
       the DH_GROUP_LIST in the I1 packet, it sends an R1 packet with
       any suitable ECDH public key.

What defines "suitable" here?

   Note that only steps 4 and 5 have been changed with regard to the
   processing rules of HIPv2.  The considerations about R1 management

Pedantically, step 1's directive changed from a "must" to a "MUST",
which may or may not be noteworthy.

Section 6.6

   6.   The system MUST check that the DH Group ID in the HOST_ID
        parameter in the R1 matches the first DH Group ID in the
        Responder's DH_GROUP_LIST in the R1 packet, and also that this
        Group ID corresponds to a value that was included in the
        Initiator's DH_GROUP_LIST in the I1 packet.  If the DH Group ID

This looks like it's describing a system where the responder's
preference takes priority.  The earlier discussion (I thought) indicated
that the initiator's preference took priority.  There can only be one...
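
The two readings give different results whenever the peers order
their lists differently.  A short Python sketch, with placeholder
Group ID values and a hypothetical select_group helper:

```python
from typing import Optional


def select_group(preferring: list[int], other: list[int]) -> Optional[int]:
    # Return the first group in the preferring party's list that the
    # other party also supports, or None if there is no overlap.
    for group in preferring:
        if group in other:
            return group
    return None


# Placeholder Group IDs, most preferred first.
initiator_list = [1, 2]
responder_list = [2, 1]

by_initiator = select_group(initiator_list, responder_list)
by_responder = select_group(responder_list, initiator_list)
```

Here initiator preference selects group 1 and responder preference
selects group 2, so an initiator following one rule would reject an R1
built under the other.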

Section 6.7

   The processing of I2 packets follows similar rules as HIPv2 (see
   Section 6.9 of [RFC7401]).  The main differences to HIPv2 are that
   HIP DEX introduces a new session key exchange via the ENCRYPTED_KEY
   parameter as well as an I2 reception acknowledgement for
   retransmission purposes.  [...]

So the lack of anonymity support and DH key generation are not "main"
differences? :)

   5.   If the system's state machine is in the R2-SENT state, the
        system MUST check to see if the newly received I2 packet is
        similar to the one that triggered moving to R2-SENT.  If so, it

How is "similar to" determined?

   6.   If the system's state machine is in the I2-SENT state, the
        system MUST make a comparison between its local and sender's
        HITs (similarly as in Section 6.3).  If the local HIT is smaller
        than the sender's HIT, it should drop the I2 packet, use the
        peer Diffie-Hellman key, ENCRYPTED_KEY keying material and nonce
        #I from the R1 packet received earlier, and get the local
        Diffie-Hellman key, ENCRYPTED_KEY keying material, and nonce #J
        from the I2 packet sent to the peer earlier.  Otherwise, the
        system processes the received I2 packet and drops any previously
        derived Diffie-Hellman keying material Kij and ENCRYPTED_KEY
        keying material it might have generated upon sending the I2
        packet previously.  The peer Diffie-Hellman key, ENCRYPTED_KEY,
        and the nonce #J are taken from the just arrived I2 packet.  The
        local Diffie-Hellman key, ENCRYPTED_KEY keying material, and the
        nonce #I are the ones that were sent earlier in the R1 packet.

We list the two ways to get Kij, nonce #I, and nonce #J here ... but we
don't say what to do with them once you get them.
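
A minimal sketch of the HIT comparison that drives this branch, assuming
(as I understand the sort() convention of RFC 7401) that HITs are compared
as unsigned 128-bit integers in network byte order.  The helper name is
hypothetical.

```python
def local_hit_is_smaller(local_hit: bytes, peer_hit: bytes) -> bool:
    """Compare two 16-octet HITs as unsigned big-endian integers (assumption)."""
    if len(local_hit) != 16 or len(peer_hit) != 16:
        raise ValueError("a HIT is expected to be 16 octets")
    return int.from_bytes(local_hit, "big") < int.from_bytes(peer_hit, "big")
```

When this returns True the quoted text has the system drop the received
I2 and keep the values from its own earlier I2; otherwise it processes
the received I2.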

   8.   If the system's state machine is in any state other than
        R2-SENT, the system SHOULD check that the echoed R1 generation
        counter in the I2 packet is within the acceptable range if the
        counter is included.  [...]

If the system is in R2-SENT, do we just re-send the same R2, or do we
have to continue with the rest of the calculations (and the risk of a
loophole that bypasses the R1 generation counter checks)?

   11.  The system MUST derive Diffie-Hellman keying material Kij based
        on the public value and Group ID in the HOST_ID parameter.  This
        keying material is used to derive the keys of the Master Key SA

Do we need to validate that this group is the same group as the HOST_ID
we sent in the R1?

Section 6.8

   4.  The system MUST re-evaluate the DH_GROUP_LIST, HIP_CIPHER,
       HIT_SUITE_LIST, and TRANSPORT_FORMAT_LIST parameters in the R2
       packet and compare the results against the chosen suites.

As mentioned previously, the "remember and memcmp()" option is probably
safer.  (Also, resolving a discuss point might require adding HOST_ID to
this list.)
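
As a sketch of what I mean by "remember and memcmp()": keep the raw
parameter octets seen in the R1 and compare them byte-for-byte with what
arrives in the R2, rather than re-running the negotiation.  The class and
the dictionary layout are hypothetical.

```python
class NegotiationRecord:
    """Remember raw R1 parameter contents for a later byte-wise comparison."""

    def __init__(self, r1_params):
        # Copy the octets so later changes by the caller do not affect us.
        self._saved = {name: bytes(value) for name, value in r1_params.items()}

    def matches(self, r2_params):
        """Return True only if the R2 carries exactly the remembered octets."""
        if set(r2_params) != set(self._saved):
            return False
        return all(bytes(r2_params[name]) == saved
                   for name, saved in self._saved.items())
```

Here the keys would be parameter names such as DH_GROUP_LIST and the
values the parameter contents as received.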

   Note that step 4 (signature verification) from the original
   processing rules of HIPv2 has been replaced with a negotiation re-
   evaluation in the above processing rules for HIP DEX.  Moreover, step
   6 has been added to the processing rules.

I think that steps 5 and 7 have been added, not step 6.

Section 6.11

   Storing of the R1 generation counter values and ENCRYPTED_KEY counter
   (Section 5.2.5) MUST be configured by explicit HITs.

What is the ENCRYPTED_KEY counter?  The word "counter" does not appear
in Section 5.2.5.

Section 7

   If a Responder is not under high load, K SHOULD be 0.

I believe this SHOULD duplicates normative guidance already given

Section 7.1

   ACL processing is applied to all HIP packets.  A HIP peer MAY reject
   any packet where the Receiver's HIT is not in the ACL.  The HI (in

The *Receiver's* HIT?  Not the sender's?

   the R1, I2, and optionally NOTIFY packets) MUST be validated as well,
   when present in the ACL.  This is the defense against collision and
   second-image attacks on the HIT generation.

I think "when present in the ACL" needs to be stricken, since we now
mandate the HIT,HI pairing (or just the HI) to be in the ACL.

Section 8

Why do we give guidance to wait for the retransmission timeout before
acting on I1 but not before acting on R1?

Section 9

   HIP DEX closely resembles HIPv2.  As such, the security
   considerations discussed in Section 8 of [RFC7401] similarly apply to
   HIP DEX.  HIP DEX, however, replaces the SIGMA-based authenticated
   Diffie-Hellman key exchange of HIPv2 with an exchange of random
   keying material that is encrypted with a Diffie-Hellman derived key.

IIUC the ENCRYPTED_KEY material is used only for the pair-wise SA, not
the master key SA.  So some further detail would be helpful here.

   Both the Initiator and Responder contribute to this keying material.
   As a result, the following additional security considerations apply
   to HIP DEX:

We do want to ensure that both parties contribute to the master key SA
as well (which I think they do, with I_NONCE and the puzzle's #i that is
used in CKDF), so we should say that more clearly.

   *  The strength of the keys for both the Master and Pair-wise Key SAs
      is based on the quality of the random keying material generated by
      the Initiator and the Responder.  As either peer may be a sensor
      or an actuator device, there is a natural concern about the
      quality of its random number generator.  Thus at least a CSPRNG
      SHOULD be used.

What is the "at least" intending to indicate here?  What would be
"better" than a CSPRNG?

   *  The R1 packet is unauthenticated and offers an adversary a new
      attack vector against the Initiator.  This is mitigated by only
      processing a received R1 packet when the Initiator has previously
      sent a corresponding I1 packet.  Moreover, the Responder repeats
      TRANSPORT_FORMAT_LIST parameters in the R2 packet in order to
      enable the Initiator to verify that these parameters have not been
      modified by an attacker in the unprotected R1 packet as explained
      in Section 6.8.

[depending on the discuss resolution, HOST_ID might be needed here as

   *  It is critical to properly manage the ENCRYPTED_KEY counter
      (Section 5.2.5).  If non-volatile store is used to maintain HIP
      state across system resets, then this counter MUST be part of the
      state store.

[the unexplained "ENCRYPTED_KEY counter" again]

Section 9.2

   generate a single keystream.  The integration of AES-CTR into IPsec
   ESP (RFC 3686) used by HIP (and, thus, by HIP-DEX) improves on the

AFAICT this integration is used for the pair-wise SA but the master key
SA messages are using the ENCRYPTED parameter which behaves differently.

   situation by partitioning the 128-bit counter space into a 32-bit
   nonce, 64-bit IV, and 32-bits of counter.  The counter is incremented
   to provide a keystream for protecting a given packet, the IV is
   chosen by the encryptor in a "manner that ensures uniqueness", and
   the nonce persists for the lifetime of a given SA.  In particular, in
   this usage the nonce must be unpredictable, not just single-use.  In
   HIP-DEX, the properties of nonce uniqueness/unpredictability and per-
   packet IV uniqueness are defined in Section 5.2.2.

I don't see such descriptions in Section 5.2.2.
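
For readers unfamiliar with the RFC 3686 layout, the partitioning quoted
above can be sketched as follows.  The function name is hypothetical; the
field order (nonce, IV, block counter) is the one RFC 3686 uses.

```python
import struct

def ctr_block(nonce: bytes, iv: bytes, counter: int) -> bytes:
    """Build a 128-bit AES-CTR counter block: 32-bit nonce | 64-bit IV | 32-bit counter."""
    if len(nonce) != 4 or len(iv) != 8:
        raise ValueError("nonce must be 4 octets and IV must be 8 octets")
    if not 0 <= counter < 2**32:
        raise ValueError("block counter must fit in 32 bits")
    # The block counter occupies the last four octets, in network byte order.
    return nonce + iv + struct.pack("!I", counter)
```

In RFC 3686 the block counter starts at one for each packet, so the first
keystream block for a packet would use ctr_block(nonce, iv, 1).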

Section 9.3

   With the curves specified here, there is a straightforward key
   extraction attack, which is a very serious problem with the use of
   static keys by HIP-DEX.  Thus it is MANDATORY to validate the peer's
   Public Key.

Please provide more details and/or references so that readers not
already skilled in the art can figure out what is being referenced.

Section 10

   ECC Curve Label  This document specifies a new algorithm-specific
      subregistry named "ECDH Curve Label".  The values for this
      subregistry are defined in Section 5.2.1.  The complete list of

The values analogous to the existing "ECDSA Curve Label" registry seem
to appear in Section 5.2.3, not Section 5.2.1.

Section 13.2

Following [NIST.SP.800-56C] gets a notice that it has been replaced by

The way in which we reference RFC 7228 and have MUST-level requirements
on the class of device that uses HIP DEX, could be considered to make
RFC 7228 a normative reference.

If we are actually using X25519, RFC 7748 needs to be normative.
Arguably it does even if we're using ECDH-on-{Curve25519,Wei25519}.

Appendix C

The content of this appendix seems stale (there are no SHOULDs in
Section 6.3 anymore, etc.)



   The HIP DEX protocol is primarily designed for computation or memory-
   constrained sensor/actuator devices.  Like HIPv2, it is expected to
   be used together with a suitable security protocol such as the
   Encapsulated Security Payload (ESP) for the protection of upper layer

Per RFC 4303, ESP is the "Encapsulating" Security Payload.

Section 1.1

   HIP DEX does not have the option to encrypt the Host Identity of the
   Initiator in the I2 packet.  The Responder's Host Identity also is
   not protected.  Thus, contrary to HIPv2, HIP DEX does not provide for
   end-point anonymity and any signaling (i.e., HOST_ID parameter
   contained with an ENCRYPTED parameter) that indicates such anonymity
   should be ignored.

I think s/should/must/ -- attempting to rely on such signaling has no

   Finally, HIP DEX is designed as an end-to-end authentication and key
   establishment protocol.  As such, it can be used in combination with
   Encapsulated Security Payload (ESP) [RFC7402] as well as with other

(ESP again)

Section 1.2

   to be a recurring part of the protocol.  Further, for devices
   constrained in this manner, a FS-enabled protocol's cost will likely
   provide little gain.  Since the resulting "FS" key, likely produced
   during device deployment, would typically end up being used for the
   remainder of the device's lifetime.  Since this key (or the
   information needed to regenerate it) persists for the device's
   lifetime, the key step of 'throw away old keys' in achieving forward
   secrecy does not occur, thus the forward secrecy would not be
   obtained in practice.

I think the last two sentences are redundant, and are editing remnants
where one (the latter?) is supposed to replace the other.

   try a DEX HIT.  Note that such a downgrade (from BEX to DEX) offer
   approach is open to attack, requiring additional mitigation (e.g.
   ACL controls).

I'd suggest s/open to attack/open to attack by interfering with the
initial BEX offer/

Section 1.2.1

   b.  Key generation

          1 Diffie-Hellman ephemeral keypair generation, and

          1 Diffie-Hellman shared secret generation.

I think I often see DH shared-secret computation classified as a "public
key operation", so perhaps the division between bullets should be
signature scheme vs key agreement.

Section 2.2

   FOLD (X, K)  denotes the partitioning of X into n K-bit segments and
      the iterative folding of these segments via XOR.  I.e., X = x_1,
      x_2, ..., x_n, where x_i is of length K and the last segment x_n
      is padded to length K by appending 0 bits.  FOLD then is computed

I suggest s/0 bits/bits with value 0/
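
A small sketch of FOLD as I read the quoted definition, assuming the
result is simply the XOR of all padded segments (the quoted text is cut
off before the formula) and, for simplicity, that K is a multiple of 8.
The function name is mine.

```python
def fold(x: bytes, k_bits: int) -> bytes:
    """XOR-fold x into k_bits bits, zero-padding the last segment (sketch)."""
    if k_bits <= 0 or k_bits % 8 != 0:
        raise ValueError("this sketch only handles K as a positive multiple of 8")
    k = k_bits // 8
    if len(x) % k:
        # Pad the last segment to length K by appending bits with value 0.
        x = x + bytes(k - len(x) % k)
    acc = bytes(k)
    for i in range(0, len(x), k):
        segment = x[i:i + k]
        acc = bytes(a ^ b for a, b in zip(acc, segment))
    return acc
```

For example, folding the three octets 0f f0 ff to 16 bits pads the input
to 0ff0 ff00, and the XOR of the two segments is f0f0.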

Section 2.3

   CMAC:  The Cipher-based Message Authentication Code with the 128-bit
      Advanced Encryption Standard (AES) defined in [NIST.SP.800-38B].

I suggest
   CMAC:  The Cipher-based Message Authentication Code.  In this
      document, CMAC is instantiated using the 128-bit
      Advanced Encryption Standard (AES) defined in [NIST.SP.800-38B].

Section 3

   A compressed encoding of the HI, the Host Identity Tag (HIT), is used

To me a bare "compressed" suggests "reversibly compressed", but the HIT
generation procedure is lossy.  Maybe "reduced encoding"?

   *  The DEX HIT is not generated via a cryptographic hash.  Rather, it
      is a compression of the HI.

Likewise in §3.1, etc.

Section 4.1

   a MAC.  The R2 repeats the lists from R1 for signed validation to
   defend them against a MITM attack.

DEX has no signatures, so maybe "authenticated validation"?

We should probably expand "Trans" to "transport format" in the legend of
Figure 1, since it's not otherwise covered until Section 5.3.3 or so.

Section 4.1.1

               After a successful puzzle verification, the Responder can
   securely create session-specific state and perform CPU-intensive
   operations such as a Diffie-Hellman key generation.  [...]

In DEX, neither party does DH keypair generation in band, since only
static ECDH shares are used.  Maybe talking about DH shared-secret
computation is better?


                                 To this end, the Responder MAY notify
   the Initiator about the anticipated delay once the puzzle solution
   was successfully verified that the remaining I2 packet processing
   will incur a high processing delay.  [...]

Pick one of "about the anticipated delay" and "that the remaining I2
packet processing will incur a high processing delay".

                                        The NOTIFICATION parameter
   contains the anticipated remaining processing time for the I2 packet
   in milliseconds as two-octet Notification Data.  [...]

Should we say "network byte order"?
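
To illustrate why the byte order matters, here is a sketch of the
two-octet encoding, assuming network byte order is what is intended.
The helper names are hypothetical.

```python
import struct

def encode_delay_ms(delay_ms: int) -> bytes:
    """Encode a processing delay in milliseconds as two octets, big-endian."""
    if not 0 <= delay_ms <= 0xFFFF:
        raise ValueError("delay does not fit in two octets")
    return struct.pack("!H", delay_ms)

def decode_delay_ms(data: bytes) -> int:
    """Decode the two-octet Notification Data back into milliseconds."""
    if len(data) != 2:
        raise ValueError("expected exactly two octets")
    return struct.unpack("!H", data)[0]
```

A delay of 1500 ms is encoded as 05 dc; a receiver that read the same
octets as little-endian would get 56325 ms.  Two octets also cap the
reportable delay at 65535 ms.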


   The Master Key SA is used to authenticate HIP packets and to encrypt
   selected HIP parameters in the HIP DEX packet exchanges.  Since only
   a small amount of data is protected by this SA, it can be long-lived
   with no need for rekeying.  [...]

I suggest "and in many cases will have no need to rekey", since we go on
to talk about the need for rekeying if the ESP sequence counter would

Section 5.2

   *  DH_GROUP_LIST and HOST_ID are restricted to ECC-based suites.

Is "suites" or "algorithms" more appropriate here?

Section 5.2.2


Section 5.3.2

   The DH_GROUP_LIST parameter contains the Responder's order of
   preference based on the Responder's choice the ECDH key contained in
   the HOST_ID parameter (see below).  [...]

The grammar is not right in this sentence, maybe around "choice the ECDH

Section 5.3.3

   If present in the R1 packet, the Initiator MUST include an unmodified
   copy of the R1_COUNTER parameter into the I2 packet.

It seems that RFC 7401 had the (nonsensical) "if present in the I1
packet", which probably merits an errata report.  (I did not submit one
myself so as to preserve credit for whomever actually noticed it first;
I just looked at the diff and it stuck out.)

   The Solution contains the Random #I from the R1 packet and the
   computed #J value.  The low-order #K bits of the RHASH(I | ... | J)
   MUST be zero.

It seems that to be consistent with the rest of the document and RFC
7401 we should capitalize as "SOLUTION".
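
The "low-order #K bits ... MUST be zero" condition can be sketched as
below.  I deliberately take the RHASH output as an input rather than
computing it, and assume the digest is interpreted as a big-endian
integer so that "low-order" means the trailing bits.  The function name
is hypothetical.

```python
def low_order_bits_zero(digest: bytes, k: int) -> bool:
    """Return True if the k lowest-order bits of digest are all zero."""
    if not 0 <= k <= len(digest) * 8:
        raise ValueError("k must be between 0 and the digest length in bits")
    value = int.from_bytes(digest, "big")
    return value & ((1 << k) - 1) == 0
```

With K = 0 every digest passes, so no puzzle work is required.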

   The TRANSPORT_FORMAT_LIST parameter contains the single transport
   format type selected by the Initiator.  The chosen type MUST
   correspond to one of the types offered by the Responder in the R1
   packet.  The different format types are DEFAULT, ESP and ESP-TCP as
   explained in Section 3.1 in [RFC6261].

It seems like we could use consistent phrasing for the format types in
§5.3.2 and §5.3.3.

Section 5.4

   of the packet, it MAY respond with an ICMP packet.  Any such reply

I think the "any such replies" formulation in RFC 7401 is correct.

Section 6.1

The RFC 7401 procedure is not particularly clear that the responder
rejects if the received #I is not a saved one (which is needed in order
for the puzzle mechanism to revert to a "cookie-based DoS protection
mechanism" as claimed in §4.1.1).  We might consider rectifying that

Section 6.2

   4.  Compute the CMAC using either HIP-gl or HIP-lg integrity key as
       defined in Section 6.3 and verify it against the received CMAC.

Just writing "either" reads as if it's an open choice; we might rather
say "the appropriate choice of" to indicate that the choice is

Section 6.6

   3.   If the HIP association state is I1-SENT or I2-SENT, the received
        Initiator's HIT MUST correspond to the HIT used in the original
        I1 packet.  Also, the Responder's HIT MUST correspond to the one
        used in the I1 packet, unless this packet contained a NULL HIT.

I think s/unless this packet/unless that packet/ makes more sense, as
"this packet" would be the R1 packet we're responding to.

   10.  The system attempts to solve the puzzle in the R1 packet.  The
        system MUST terminate the search after exceeding the remaining
        lifetime of the puzzle.  If the puzzle is not successfully
        solved, the implementation MAY either resend the I1 packet
        within the retry bounds or abandon the HIP base exchange.


   11.  The system computes standard Diffie-Hellman keying material
        according to the public value and Group ID provided in the
        HOST_ID parameter.  The Diffie-Hellman keying material Kij is
        used for key extraction as specified in Section 6.3.

In my experience "shared secret" is a more common term than "keying
material" in this context.

Section 6.8

   2.  The system MUST verify that the HITs in use correspond to the
       HITs that were received in the R1 packet that caused the
       transition to the I2-SENT state.

It looks like RFC 7401 had "transition to the I1-SENT state", which
seems worthy of an errata report.

Éric Vyncke Yes

Comment (2020-07-08 for -21)
This document was deferred by Terry Manderson in May 2018. The authors have taken into account all COMMENTs from the 2018 ballot, changing several parts of the document based on those COMMENTs.

The document went successfully through a new IETF last call and the authors have addressed all points raised during this Last Call (including the SECDIR review by Don Eastlake). The Security ADs currently have some DISCUSSes based on the May 2020 telechat (which was cancelled pending fixes for those DISCUSSes). The authors have addressed all DISCUSS (and some COMMENT) points raised during the IESG review in revision -21.

So I am balloting the approval again in front of the 2019 IESG members.


Martin Duke No Objection

Comment (2021-03-15)
Thanks for providing detailed data on why this sort of security compromise is necessary. I am not a crypto expert, but wonder if there is some way to leverage the potentially asymmetric capabilities of two HIP nodes to offload more of the computation onto one of them.

- This is optional, but some suggest replacing the term "man in the middle" with "on-path attacker". Please do it if you are amenable.

- Does HIP DEX always put version 2 in the version field, or is DEX somehow orthogonal to HIP version?

- Thanks for the discussion of the I2 retransmission timeout in Sec 9. It addressed my concerns in reading Section 4. The "reported delay + 1/2 maximum RTT" formula seems like an odd heuristic for what ought to be "the reported delay plus a little more", but it *does* limit a spoofed NOTIFY to no more than doubling the retransmission rate.

Lars Eggert (was Discuss) No Objection

Comment (2021-03-24)
All comments below are very minor change suggestions that you may choose to
incorporate in some way (or ignore), as you see fit. There is no need to let me
know what you did with these suggestions.

Section 5.2.6, paragraph 4, nit:
-    the Responder in I2 which echos it back to the Initiator in R2.
+    the Responder in I2 which echoes it back to the Initiator in R2.
+                                  +

Erik Kline No Objection

(Suresh Krishnan) (was Discuss) No Objection

Comment (2020-03-13 for -15)
Thanks for addressing my DISCUSS.

Warren Kumari No Objection

Comment (2021-03-22)
Carrying my earlier (-06) ballot position forward... and then my -13 position forward again. 

I only reviewed the differences, and do not see any operational concerns.

... and carrying forward to -24. 

(Mirja Kühlewind) No Objection

Comment (2020-02-26 for -13)
I only re-reviewed the changes, however, I don't see any transport issues there.

Francesca Palombini No Objection

Comment (2021-03-24)
Thank you for this document. Please find some minor comments below.


1. -----

   Responder, if it is known.  Moreover, the I1 packet initialises the
   negotiation of the Diffie-Hellman group that is used for generating

FP: as this is the first time it appears in the text, I would have appreciated a reference to its definition in HIP BEX, or for it to be mentioned in the terminology section.

2. -----

   When the Initiator receives the NOTIFY packet, it sets the I2
   retransmission timeout to the I2 processing time indicated in the
   NOTIFICATION parameter plus half the RTT-based timeout value.  In
   doing so, the Initiator MUST NOT set the retransmission timeout to a
   higher value than allowed by a local policy.  This is to prevent
   unauthenticated NOTIFY packets from maliciously delaying the
   handshake beyond a well-defined upper bound in case of a lost R2
   packet.  At the same time, this extended retransmission timeout
   enables the Initiator to defer I2 retransmissions until the point in
   time when the Responder should have completed its I2 packet
   processing and the network should have delivered the R2 packet
   according to the employed worst-case estimates.

FP: This paragraph (or Section 6.9, also talking about NOTIFY packets) does not mention the case where the Initiator receives 2 NOTIFY packets in sequence. Doing so would short-circuit the local policy and allow the handshake to be delayed indefinitely. I could not see this mentioned anywhere; is this covered in RFC 7401?
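
A minimal sketch of the timeout rule in the quoted paragraph, assuming all
values are in milliseconds and that the local policy is a single upper
bound.  The function and parameter names are hypothetical.

```python
def i2_retransmit_timeout(notified_ms, rtt_timeout_ms, policy_max_ms):
    """I2 processing time from NOTIFY plus half the RTT-based timeout, capped by policy."""
    proposed = notified_ms + rtt_timeout_ms / 2
    # The Initiator MUST NOT exceed the value allowed by local policy.
    return min(proposed, policy_max_ms)
```

Note that the cap here bounds a single timer value; whether repeated
NOTIFY packets can restart the timer is a separate question that this
sketch does not address.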

3. -----

   In HIP packets, HIP parameters are ordered according to their numeric
   type number and encoded in TLV format.

FP: Please add a reference to section 5.2.1 of RFC 7401.
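
For illustration, a sketch of the ordering and TLV encoding, assuming the
RFC 7401 layout as I understand it: a 16-bit type, a 16-bit length
covering the contents only, and zero padding up to a multiple of 8
octets.  The function names and the type numbers in the usage note are
hypothetical.

```python
import struct

def encode_param(ptype: int, contents: bytes) -> bytes:
    """Encode one parameter as Type | Length | Contents | Padding."""
    header = struct.pack("!HH", ptype, len(contents))
    # Pad so that the whole TLV is a multiple of 8 octets.
    pad = (-(len(header) + len(contents))) % 8
    return header + contents + bytes(pad)

def encode_params(params) -> bytes:
    """Encode (type, contents) pairs in ascending order of type number."""
    ordered = sorted(params, key=lambda p: p[0])
    return b"".join(encode_param(t, c) for t, c in ordered)
```

Calling encode_params([(300, b""), (257, b"abc")]) emits the type-257
parameter first, regardless of the order in which the caller supplied
them.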

4. -----

   Group                        KDF           Value

   Curve25519 [RFC7748]         CKDF          TBD7 (suggested value 12)
   Curve448   [RFC7748]         CKDF          TBD8 (suggested value 13)

FP: I think it would be good to add a reference to CKDF next to it, in the KDF column, analogous to what RFC 7401 does with RFC 5869 for HKDF.

5. -----

   results against the chosen ones.  If the re-evaluated suites do not
   match the chosen ones, the Initiator acts based on its local policy.

FP: I don't know if this is an addition to RFC 7401 policy or if this is defined there. If it is an addition, it would have been good to specify that, otherwise add a reference.

6. -----

     CKDF-Extract(I, IKM, info) -> PRK

FP: Although quite clear, since all other conventions are described in the terminology, it might be good to add "->" to it.

7. -----

   The key derivation for the Master Key SA employs always both the
   Extract and Expand phases.  The Pair-wise Key SA needs only the
   Extract phase when the key is smaller or equal to 128 bits, but
   otherwise requires also the Expand phase.

                 (either the output from the extract step or the
                 concatenation of the random values of the
                 ENCRYPTED_KEY parameters in the same order as the
                 HITs with sort(HIT-I | HIT-R) in case of no extract)

FP: "in case of no extract" - from the paragraph above it seemed as if the Extract phase is always executed, is that not so?

8. -----

       N       =  ceil(L/(RHASH_len/8))

FP: Same as above, it might be good to mention or add to the terminology.
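
As a worked example of the formula, assuming L is a length in octets and
RHASH_len is a length in bits (which the division by 8 suggests).  The
function name is hypothetical.

```python
def expand_blocks(l_octets: int, rhash_len_bits: int) -> int:
    """Compute N = ceil(L / (RHASH_len / 8)) using integer arithmetic."""
    octets_per_block = rhash_len_bits // 8
    if l_octets < 0 or octets_per_block <= 0:
        raise ValueError("lengths must be non-negative and RHASH_len at least 8")
    # Ceiling division without going through floating point.
    return -(-l_octets // octets_per_block)
```

With the 128-bit CMAC output, each block yields 16 octets, so 32 octets
of keying material need N = 2 and 33 octets need N = 3.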

9. -----

   3.   If the HIP association state is I1-SENT or I2-SENT, the received
        Initiator's HIT MUST correspond to the HIT used in the original
        I1 packet.  Also, the Responder's HIT MUST correspond to the one
        used in the I1 packet, unless this packet contained a NULL HIT.

FP: What if it doesn't correspond? Is this specified in RFC 7401 (or anywhere else I might have missed)?

10. -----

Section 7. HIP Policies

FP: It would have been good here to highlight additional policies or differences with RFC 7401, if any.

11. -----

Appendix A.

FP: I would have appreciated some explanation or references for this formula.

Alvaro Retana No Objection

(Adam Roach) No Objection

Comment (2020-02-24 for -13)
Trusting the sponsoring AD. Skimmed for ART problems, none found.

Martin Vigoureux No Objection

Robert Wilton No Objection

Comment (2021-03-25)

Security and constrained devices are both outside my knowledge area, but I note that this document has been in development for a long time, and I wonder whether the axioms under which it was originally specified are still valid, and whether they will continue to hold in the short/medium term.  For example, this document references the ZWAVE 500 chip, but it looks like that technology is already being replaced by the ZWAVE 700 chip, which is smaller, faster, uses less power, and plausibly has more hardware support for crypto.  Hence, I just want to check that this specification still has value in being published now, but I have no formal objection.


Murray Kucherawy No Record

Zaheduzzaman Sarker No Record

John Scudder No Record