Encrypted Key Transport for DTLS and Secure RTP

Note: This ballot was opened for revision 09 and is now closed.

(Ben Campbell) (was Yes) Discuss

Discuss (2019-02-21 for -09)
I'm adding a process discuss to hold things until we get clarity around the IANA expert reviews. 

I know Benjamin mentioned this in his DISCUSS; I am duplicating it here in case we clear up the rest of Benjamin's discuss points prior to the IANA questions.

Murray Kucherawy (was No Record, Yes) Yes

Comment (2020-06-23)
Adam Roach's comments were addressed in:

(Alexey Melnikov) (was Discuss, Yes, Discuss) Yes

Comment (2020-02-06 for -11)
Note to self: make sure editors look/respond to Adam’s and Ben’s comments.

Old comments preserved for posterity. I didn't check if they still apply:

I share Benjamin's concern about extensibility.

In 4.4.1:

   The default EKT Cipher is the Advanced Encryption Standard (AES) Key
   Wrap with Padding [RFC5649] algorithm.  It requires a plaintext
   length M that is at least one octet, and it returns a ciphertext with
   a length of N = M + (M mod 8) + 8 octets.

I started looking at RFC 5649. Maybe I was tired and my math was wrong, but I couldn't figure out how you came up with the N value above.
In particular, where is the "+ 8" coming from?
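
One way to check the arithmetic, offered as a sketch rather than a statement of what the draft intends: RFC 5649 pads the plaintext up to a multiple of 8 octets and prepends a 64-bit Alternative Initial Value before wrapping, so the output is always one 8-octet block longer than the padded input. That extra block would account for the "+ 8". The function names below are illustrative only.

```python
def rfc5649_ciphertext_len(m: int) -> int:
    """Ciphertext length in octets for an m-octet plaintext under RFC 5649."""
    if m < 1:
        raise ValueError("plaintext must be at least one octet")
    padded = 8 * ((m + 7) // 8)   # pad up to a multiple of 8 octets
    return padded + 8             # plus the 64-bit AIV block


def draft_formula_len(m: int) -> int:
    """The expression quoted from Section 4.4.1: N = M + (M mod 8) + 8."""
    return m + (m % 8) + 8


# Compare the two for a few plaintext lengths.
for m in (1, 4, 8, 16, 17, 25):
    print(m, rfc5649_ciphertext_len(m), draft_formula_len(m))
```

The two expressions agree when M mod 8 is 0 or 4 and differ otherwise; for M = 25, the padded length gives 40 octets where the quoted expression gives 34.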

In 6:

   An attacker who tampers with the bits in FullEKTField can prevent the
   intended receiver of that packet from being able to decrypt it.  This
   is a minor denial of service vulnerability.  Similarly the attacker
   could take an old FullEKTField from the same session and attach it to
   the packet.  The FullEKTField would correctly decode and pass
   integrity checks.  However, the key extracted from the FullEKTField ,
   when used to decrypt the SRTP payload, would be wrong and the SRTP
   integrity check would fail.  Note that the FullEKTField only changes
   the decryption key and does not change the encryption key.  None of
   these are considered significant attacks as any attacker that can
   modify the packets in transit and cause the integrity check to fail.

The last sentence seems to be incomplete. Did you mean "can" instead of the last "and"?

(Adam Roach) Yes

Comment (2020-02-03 for -11)
Re-sending my initial comments, as only one of the 17 was addressed
in subsequent revisions. While no single comment rises to the level of
a DISCUSS-worthy issue, several of these are moderately severe. I would
appreciate either a response to each comment, or a corresponding
change in the document.


Thanks to the work that everyone has put in on getting an EKT mechanism
specified and finalized. I have a handful of comments that I would like to see
considered prior to publication of the document.



>  EKT provides a way for an SRTP session participant, to securely
>  transport its SRTP master key and current SRTP rollover counter to
>  the other participants in the session.

Nit: "...participant to securely..."



>   EKTMsgTypeExtension = %x03-FF

Shouldn't this be "%x01 / %x03-ff" ?

>   SRTPMasterKeyLength = BYTE
>   SRTPMasterKey = 1*256BYTE

I think this either needs to be "1*255BYTE", or we need text that explicitly
indicates that an SRTPMasterKeyLength value of 0x00 means "256 bytes." Probably
the former.

I think this is even further constrained by the fact that EKTCiphertext is
limited to 256 bytes, and contains the SRTPMasterKeyLength, SRTPMasterKey,
SSRC, and ROC (and is not compressed) -- which means the SRTPMasterKeyLength
can't be more than (256 - 1 - 4 - 4 =) 247 bytes. So perhaps "1*247BYTE" is
more appropriate?
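
The bound above can be sketched as follows, assuming (as the comment does) that the 256-byte limit applies directly to the listed fields, with no allowance for key-wrap expansion, and using the field sizes given there: one length octet, a 4-octet SSRC, and a 4-octet ROC. The field order follows the list in the comment; the constant and function names are illustrative, not the draft's.

```python
import struct

EKT_CIPHERTEXT_MAX = 256  # limit cited in the comment above
LEN_OCTETS, SSRC_OCTETS, ROC_OCTETS = 1, 4, 4

MAX_MASTER_KEY_LEN = EKT_CIPHERTEXT_MAX - LEN_OCTETS - SSRC_OCTETS - ROC_OCTETS


def pack_ekt_plaintext(master_key: bytes, ssrc: int, roc: int) -> bytes:
    """Lay out length | key | SSRC | ROC in network byte order (assumed order)."""
    if not 1 <= len(master_key) <= MAX_MASTER_KEY_LEN:
        raise ValueError("master key length out of range")
    header = struct.pack("!B", len(master_key))
    trailer = struct.pack("!II", ssrc, roc)
    return header + master_key + trailer


print(MAX_MASTER_KEY_LEN)
print(len(pack_ekt_plaintext(bytes(16), 0x11223344, 0)))
```

With a 247-octet key the packed fields come to exactly 256 octets, which is why a one-octet length field never needs the value 0x00 to stand for 256.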



>  The creation of the EKTField MUST precede the normal SRTP
>  packet processing.

Why? This seems unnecessary and unnecessarily complicated. If the order of
operations has an impact on the bits on the wire (I don't see how it does?),
then please include some explanatory text here that clarifies the reason for
this constraint.



>  When a packet is sent with the ShortEKTField, the ShortEKFField is
>  simply appended to the packet.

Nit: s/ShortEKFField/ShortEKTField/



>  5.  If the SSRC in the EKTPlaintext does not match the SSRC of the
>      SRTP packet received, then all the information from this
>      EKTPlaintext MUST be discarded and the following steps in this
>      list are skipped.

I can see implementors easily interpreting this as requiring them to discard
the RTP payload as well. If that's not the intention (I don't think it is),
consider adding text like "The FullEKTField is removed from the packet then
normal SRTP or SRTCP processing occurs."
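
The reading suggested here can be sketched as below: on an SSRC mismatch the EKT information is dropped, yet the packet with the field stripped still goes on to normal SRTP processing. The function, its parameters, and the return shape are hypothetical and are not taken from the draft.

```python
from typing import Optional, Tuple


def handle_full_ekt(packet_ssrc: int, ekt_ssrc: int, ekt_key: bytes,
                    srtp_part: bytes) -> Tuple[Optional[bytes], bytes]:
    """Return (key to install or None, SRTP bytes to process normally)."""
    if ekt_ssrc != packet_ssrc:
        # Discard only the EKT information; the media is still processed.
        return (None, srtp_part)
    return (ekt_key, srtp_part)
```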



>  Section 4.2.1 recommends that SRTP senders continue using an old key
>  for some time after sending a new key in an EKT tag.

This is the first appearance of the phrase "EKT tag," which never seems to be
properly defined. I presume this is meant to be the combination of the EKT
Ciphertext and the SPI?

In any case, please clearly define this term somewhere, preferably before using
it the first time.



>  cannot be used and they also need to create a counter that keeps
>  track of how many times the key has been used to encrypt data to
>  ensure it does not exceed the T value for that cipher (see ).

The parenthetical phrase appears to be missing something here.

>  If
>  either of these limits are exceeded, the key can no longer be used

Nit: "...either... is exceeded..."

>  for encryption.  At this point implementation need to either use the

Nit: "...implementations need..."



>  If a source has its EKTKey changed by the key management, it MUST
>  also change its SRTP master key

I suppose it's not terribly important for interop, but the implication that this
change takes place immediately seems to contradict the 250 ms period specified
in §4.2.1. Perhaps a few words here about how these two normative statements
are intended to interact would save implementors a bit of grief.



>  This document defines the use of EKT with SRTP.  Its use with SRTCP
>  would be similar, but is reserved for a future specification.

After reading this far, I was quite surprised to find this qualification. If
this is the intention for this document, please adjust the rest of the text to
match. Some examples follow.

>  The following shows the syntax of the EKTField expressed in ABNF
>  [RFC5234].  The EKTField is added to the end of an SRTP or SRTCP
>  packet.
>  Rollover Counter (ROC): On the sender side, this is set to the
>  current value of the SRTP rollover counter in the SRTP/SRTCP context
>  associated with the SSRC in the SRTP or SRTCP packet.
>  1.  The final byte is checked to determine which EKT format is in
>      use.  When an SRTP or SRTCP packet contains a ShortEKTField, the
>      ShortEKTField is removed from the packet then normal SRTP or
>      SRTCP processing occurs.
>      The reason for
>      using the last byte of the packet to indicate the type is that
>      the length of the SRTP or SRTCP part is not known until the
>      decryption has occurred.
>  7.  At this point, EKT processing has successfully completed, and the
>      normal SRTP or SRTCP processing takes place.
>  This allows
>  those peers to process EKT keying material in SRTP (or SRTCP) and
>  retrieve the embedded SRTP keying material.



>     To accommodate packet loss, it is
>     RECOMMENDED that three consecutive packets contain the
>     FullEKTField be transmitted.

Nit: "...containing..." (alternately, remove "be transmitted" -- both make a
grammatically correct sentence)

More substantially -- under "New sender:", I'm a little surprised that there
isn't any mention of other senders re-keying in response to a new sender
joining. In the vast majority of conferences, when a sender joins, that same
entity generally will also be a receiver. It seems this should trigger other
senders to include the key in their next packet.



>  Rekey:
>     By sending EKT tag over SRTP, the rekeying event shares fate with
>     the SRTP packets protected with that new SRTP master key.

Is this actually true? Going back to the 250 ms period specified in §4.2.1, it
seems that the master key is sent out in packets pretty far removed from those
it actually protects.

Between this and the inconsistency I mention in §4.5 above, this increasingly
feels like maybe there were two different ways of reasoning about the timing
of sending a master key versus the timing of actually using it. Does the text
in §4.2.1 perhaps represent an outdated notion of how this is intended to



>     If sending audio and video, the RECOMMENDED
>     frequency is the same as the rate of intra coded video frames.  If
>     only sending audio, the RECOMMENDED frequency is every 100ms.

Is this "100ms" correct?  Assuming, say, the use of Opus at voice quality with
20 ms packets, this is taking packets on the order of 40 bytes in length and
tacking on something like 20 to 30 bytes to every fifth packet. That's an
increase in overall stream size on the order of roughly 15% to 20%.

At the same time, when using real-time video, intra frames are going to happen
roughly every 500 ms to 1500 ms. If a cadence on that order is okay for
audiovisual streams, I have to imagine it's okay for audio streams.

So, to clarify: is this "100ms" a typo for "1000 ms"?



>                  +----------+-------+---------------+
>                  | Name     | Value | Specification |
>                  +----------+-------+---------------+
>                  | AESKW128 |     1 | RFCAAAA       |
>                  | AESKW256 |     2 | RFCAAAA       |
>                  | Reserved |   255 | RFCAAAA       |
>                  +----------+-------+---------------+
>                        Table 3: EKT Cipher Types

Section 5.2.1 reserves "0" as well. I suspect we want to replicate that
reservation in this table.

Deborah Brungard No Objection

Alissa Cooper No Objection

Comment (2019-02-20 for -09)
I think I-D.ietf-tls-dtls13 needs to be a normative reference.

Roman Danyliw No Objection

Comment (2020-02-04 for -11)
The document appears to have already gotten significant review with iterative updates.

Section 5.2.
   If an EKTKey message is received that cannot be processed, then the
   recipient MUST respond with an appropriate DTLS alert.

Is there any more specificity that can be provided on which DTLS alert might be appropriate?

(Spencer Dawkins) No Objection

Benjamin Kaduk (was Discuss) No Objection

Comment (2020-02-05 for -11)
We say that EKT can work well in scenarios such as the PERC private media framework,
and in the security considerations we give some information about concerns/caveats with
respect to EKT usage in terms of the low-level cryptographic properties.  Do we want to
give some high-level advice about deployment scenarios in which EKT does not make

I also wonder if the "ekt" name is a little generic for the TLS codepoints being requested,
as opposed to something involving "srtp_ekt", but that's basically cosmetic.

Updating to include my previous comments, since (as for Adam) they largely seem to have
not been acted upon:

This document is written under the assumption that the EKT content will
be the only content after the encrypted SRTP payload (and authentication
tag, if present).  That's true at present, of course, but I would still
like to see a little discussion of how it might coexist with other SRTP
extensions that place content as a trailer (both would need to be
parseable from the tail of the content and have a length field; and they
would either need to share a message-type namespace or have a profile
specification to indicate what order they appear in), though the
discussion that already occurred suffices to make this not a
Discuss-level point.

Section 5 has:

  the DTLS-SRTP peer in the server role to the client.  This allows
  those peers to process EKT keying material in SRTP (or SRTCP) and
  retrieve the embedded SRTP keying material.  This combination of

but in Section 4 we say that "use with SRTCP would be similar, but is
reserved for a future specification".  (There may be one or two other
places that have text placing SRTCP on the same footing as SRTP even
though they are not, at present.)

Also section 5

  In cases where the DTLS termination point is more trusted than the
  media relay, the protection that DTLS affords to EKT key material can
  allow EKT keys to be tunneled through an untrusted relay such as a
  centralized conference bridge.  For more details, see

I did not chase the reference, but it seems like this sentence might
apply equally for "EKT keys to be tunneled" and "SRTP master keys to be
tunneled".  I trust the authors to say what they mean :)

Section 5.2.2

What do I do when I receive an EKTKey containing an ekt_spi value for
which I already have stored parameters?

  When an EKTKey is received and processed successfully, the recipient
  MUST respond with an Ack handshake message as described in Section 7
  of [I-D.ietf-tls-dtls13].  The EKTKey message and Ack MUST be

Ack is a content type, not a handshake type.  (Per DISCUSS point)

  When an EKTKey is received and processed successfully, the recipient
  MUST respond with an Ack handshake message as described in Section 7
  of [I-D.ietf-tls-dtls13].  The EKTKey message and Ack MUST be
  retransmitted following the rules in Section 4.2.4 of [RFC6347].

It's a little weird to cite DTLS 1.3 for the Ack message but then revert
to DTLS 1.2 for the retransmission schedule...

  EKT MAY be used with versions of DTLS prior to 1.3.  In such cases,
  the Ack message is still used to provide reliability.  Thus, DTLS
  implementations supporting EKT with DTLS pre-1.3 will need to have
  explicit affordances for sending the Ack message in response to an
  EKTKey message, and for verifying that an Ack message was received.
  The retransmission rules for both sides are the same as in DTLS 1.3.

...but here we say that the DTLS 1.3 retransmission rules are
authoritative.  (per DISCUSS)

Section 6

  With EKT, each SRTP sender and receiver MUST generate distinct SRTP
  master keys.  This property avoids any security concern over the re-

Er, does an SRTP receiver have a master key ("what does it encrypt if
it's not sending anything")?

  In some systems, when a member of a conference leaves the
  conferences, the conferences is rekeyed so that member no longer has
  the key.  When changing to a new EKTKey, it is possible that the
  attacker could block the EKTKey message getting to a particular
  endpoint and that endpoint would keep sending media encrypted using
  the old key.  To mitigate that risk, the lifetime of the EKTKey MUST
  be limited using the ekt_ttl.

Do we want to give any concrete guidance about ekt_ttl values?

(Suresh Krishnan) No Objection

Warren Kumari No Objection

(Mirja Kühlewind) No Objection

Comment (2019-02-19 for -09)
Just a quick clarification question:
Sec 4.2.1: "   Outbound packets SHOULD continue to use the old SRTP Master Key for
   250 ms after sending any new key.  This gives all the receivers in
   the system time to get the new key before they start receiving media
   encrypted with the new key."
I assume that 250ms is selected under the assumption that longer RTTs are a problem for interactive communication anyway? Or where does this value come from?
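
The behaviour quoted above can be sketched as a sender that records when it announced a new key and keeps encrypting with the old one until 250 ms have elapsed. The class and method names are hypothetical, and timestamps are passed in explicitly (in seconds) to keep the example deterministic.

```python
from typing import Optional

SWITCH_DELAY_S = 0.250  # value quoted from Section 4.2.1


class SenderKeys:
    def __init__(self, key: bytes) -> None:
        self.current: bytes = key
        self.pending: Optional[bytes] = None
        self.announced_at: float = 0.0

    def announce(self, new_key: bytes, now: float) -> None:
        """Record that new_key was just sent to the receivers."""
        self.pending = new_key
        self.announced_at = now

    def key_for_sending(self, now: float) -> bytes:
        """Keep the old key until the delay has passed, then switch."""
        if self.pending is not None and now - self.announced_at >= SWITCH_DELAY_S:
            self.current, self.pending = self.pending, None
        return self.current
```

A receiver whose one-way delay exceeds the chosen constant would see media under the new key before the key itself arrives, which is the concern behind the question.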

Barry Leiba No Objection

Comment (2020-02-04 for -11)
I agree that Adam’s comments need to be addressed.

(Eric Rescorla) No Objection

Comment (2019-02-16 for -09)
Rich version of this review at:

S 4.4.1.
>      FullEKTField is retransmitted 3 times, that only counts as 1
>      encryption.
>      Security requirements for EKT ciphers are discussed in Section 6.
>   4.4.1.  Ciphers

How do I know which cipher is in use? Is it attached to EKTKey?

S 5.2.2.
>      Note: To be clear, EKT can be used with versions of DTLS prior to
>      1.3.  The only difference is that in a pre-1.3 TLS stacks will not
>      have built-in support for generating and processing Ack messages.
>      If an EKTKey message is received that cannot be processed, then the
>      recipient MUST respond with an appropriate DTLS alert.

How important is it that you (a) be able to change EKTKeys and (b) be
able to work with DTLS < 1.3? Because if the answer to these is "no",
then you can just send EKTKeys in EncryptedExtensions.

S 6.
>      With EKT, each SRTP sender and receiver MUST generate distinct SRTP
>      master keys.  This property avoids any security concern over the re-
>      use of keys, by empowering the SRTP layer to create keys on demand.
>      Note that the inputs of EKT are the same as for SRTP with key-
>      sharing: a single key is provided to protect an entire SRTP session.
>      However, EKT remains secure even when SSRC values collide.

How am I supposed to decrypt in case I don't have a FullEKTField? Am I
supposed to use the IP address?

S 6.
>      context, e.g., from a different sender.  When the underlying SRTP
>      transform provides integrity protection, this attack will just result
>      in packet loss.  If it does not, then it will result in random data
>      being fed to RTP payload processing.  An attacker that is in a
>      position to mount these attacks, however, could achieve the same
>      effects more easily without attacking EKT.

Why don't you add an epoch so that you can't roll back?

S 4.1.
>        :                                                               :
>        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>        |   Security Parameter Index    | Length                        |
>        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>        |0 0 0 0 0 0 1 0|
>        +-+-+-+-+-+-+-+-+

This encoding seems suboptimal, in that you burn an extra byte for
every FullEKTField. Given that:

1. You are only defining two types
2. It seems unlikely that there will ever be an EKTCiphertext longer
than 128 bits.

I would suggest the following encoding:

- The first bit of the last byte indicates whether this is
FullEKTField or <Something else.>. If it's FullEKTField, the rest is
used for length. Otherwise, the rest is used for type.
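
A rough sketch of how a receiver might read the last byte under the suggested encoding. The polarity of the flag bit (set means FullEKTField), the reading of "first bit" as the most significant bit, and the names used here are all assumptions made for illustration; the suggestion above does not fix them.

```python
from typing import Tuple

FULL_FLAG = 0x80  # assumed: most significant bit set means FullEKTField


def decode_last_byte(last: int) -> Tuple[str, int]:
    """Return ("full", length) or ("other", message_type) for the trailing byte."""
    if not 0 <= last <= 0xFF:
        raise ValueError("expected a single octet")
    if last & FULL_FLAG:
        return ("full", last & 0x7F)   # remaining 7 bits carry the length
    return ("other", last & 0x7F)      # remaining 7 bits carry the type


print(decode_last_byte(0x90))
print(decode_last_byte(0x00))
```

Under these assumptions the 7-bit length tops out at 127, so the unit of that length would have to be chosen to cover the largest EKTCiphertext in use.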

Alvaro Retana No Objection

Martin Vigoureux No Objection

Éric Vyncke No Objection

Comment (2020-02-06 for -11)
Thank you for the work put into this document.  Please find below two non-blocking questions.

NB: the document shepherd write-up should be updated with the responsible AD ;-)



About section 4.3.1

What is a "packet" in the context of "appended to the packet"? Is it the UDP payload? Should the UDP length be increased? Is it the layer-2 frame?

I also wonder whether 250 msec is enough in all cases... Unsure whether SRTP is only used in real-time communication (for info, just reviewed 2 I-D from Delay Tolerant Network... so I may be biased)

Magnus Westerlund (was Discuss) No Objection

Comment (2020-06-22 for -12)
Thanks for addressing the issue.