Last Call Review of draft-ietf-avtcore-ports-for-ucast-mcast-rtp-
I have reviewed this document as part of the security directorate's ongoing effort to review all IETF documents being processed by the IESG. These comments were written primarily for the benefit of the security area directors. Document editors and WG chairs should treat these comments just like any other last call comments.
This protocol deals with requesting and managing multicast and unicast connection streams, in particular when those streams pass through a NAT. The interesting security threat in this case is DoS through amplification, where an attacker, using a small number of packets, can ask a source to direct a large number of packets at the victim. Because these streams lack the kind of acknowledgements that would force a TCP stream to back off when the target is unresponsive, this sort of attack is particularly dangerous.
The RTP and RTCP protocols have no authentication infrastructure that can be leveraged. There is a presumption that all information is public and can be requested by any destination. (One could implement encryption at a higher layer so that the data received would be useless to anyone who doesn't know the key, but any such encryption is beyond the scope of this protocol.) This means that certain attacks are unavoidable. In particular, an attacker could potentially exhaust the source or the network by requesting the data from large numbers of locations simultaneously. And an attacker that can act as a man-in-the-middle between the source and the victim can initiate a flood of data and then get out of the way.
This protocol attempts to ensure that an attacker who can forge the victim's IP address, but cannot receive packets addressed to the victim, cannot mount an amplification DoS attack. It does so by adding a "token" to various messages in the protocol. The token is generated by the source when a context is set up and must be supplied by the receiver in requests to modify the data streams. So that the source need not keep per-client state, the authors recommend generating the token as an HMAC-SHA1 over the concatenation of the client IP address, a client nonce (chosen for the session and supplied with each request), and a server timestamp (also chosen per session and supplied to prevent long-delayed replay). This closely matches the design of "cookies" in other protocols.
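As a rough illustration of the stateless cookie pattern described above, here is a minimal Python sketch. The field encodings (packed IP address, an assumed fixed 8-byte nonce, a 64-bit big-endian timestamp), the server key, the one-hour freshness window, and the function names are all hypothetical choices for illustration, not values taken from the draft.

```python
import hashlib
import hmac
import ipaddress
import os
import struct
import time
from typing import Optional

# Hypothetical server secret; the only state the source keeps.
SERVER_KEY = os.urandom(20)
# Assumed freshness window in seconds (not taken from the draft).
MAX_TOKEN_AGE = 3600
# Assumed fixed nonce length, which keeps the concatenation unambiguous.
NONCE_LEN = 8


def make_token(key: bytes, client_ip: str, client_nonce: bytes, timestamp: int) -> bytes:
    """HMAC-SHA1 over client IP || client nonce || server timestamp."""
    if len(client_nonce) != NONCE_LEN:
        raise ValueError("unexpected nonce length")
    msg = (
        ipaddress.ip_address(client_ip).packed  # 4 bytes (IPv4) or 16 bytes (IPv6)
        + client_nonce
        + struct.pack("!Q", timestamp)  # 64-bit big-endian seconds
    )
    return hmac.new(key, msg, hashlib.sha1).digest()


def verify_token(key: bytes, token: bytes, client_ip: str, client_nonce: bytes,
                 timestamp: int, now: Optional[int] = None) -> bool:
    """Recompute the token from the request's fields and compare in constant time."""
    now = int(time.time()) if now is None else now
    if timestamp > now or now - timestamp > MAX_TOKEN_AGE:
        return False  # future or stale timestamp: reject as a possible replay
    if len(client_nonce) != NONCE_LEN:
        return False
    expected = make_token(key, client_ip, client_nonce, timestamp)
    return hmac.compare_digest(expected, token)


ts = int(time.time())
nonce = os.urandom(NONCE_LEN)
tok = make_token(SERVER_KEY, "192.0.2.10", nonce, ts)
print(verify_token(SERVER_KEY, tok, "192.0.2.10", nonce, ts))    # True
print(verify_token(SERVER_KEY, tok, "198.51.100.7", nonce, ts))  # False
```

In this sketch the server stores only its secret key; everything else needed to recheck a token arrives with the request, which is what makes the scheme stateless. An off-path attacker forging the victim's address never sees the token, so it cannot produce one that verifies.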
The recommended use of HMAC-SHA1 may be slightly controversial, but its security is more than adequate for this application, the choice is a local decision on the part of the server, and the I-D has excellent text on how and when implementations should migrate to a different algorithm (in particular, referencing HMAC-SHA-256).
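Because only the server computes and checks the token (the receiver simply supplies it back), the server can change algorithms without client involvement. The following minimal Python sketch shows one way to do that. It assumes a hypothetical one-byte algorithm identifier prefixed to the token, which the draft does not define, so that HMAC-SHA1 tokens issued before a switch to HMAC-SHA-256 keep validating until they expire; `msg` stands for the concatenated token fields.

```python
import hashlib
import hmac

# Hypothetical algorithm identifiers; the draft does not define this prefix.
ALGORITHMS = {
    b"\x01": hashlib.sha1,    # legacy: HMAC-SHA1
    b"\x02": hashlib.sha256,  # current: HMAC-SHA-256
}
CURRENT_ALG_ID = b"\x02"


def mint(key: bytes, msg: bytes) -> bytes:
    """Issue a new token with the current algorithm, tagged with its identifier."""
    digest = hmac.new(key, msg, ALGORITHMS[CURRENT_ALG_ID]).digest()
    return CURRENT_ALG_ID + digest


def check(key: bytes, msg: bytes, token: bytes) -> bool:
    """Accept tokens made with any algorithm still listed in ALGORITHMS."""
    alg = ALGORITHMS.get(token[:1])
    if alg is None:
        return False  # unknown or retired algorithm
    expected = token[:1] + hmac.new(key, msg, alg).digest()
    return hmac.compare_digest(expected, token)
```

In this sketch, retiring HMAC-SHA1 amounts to deleting its entry once the longest-lived SHA1 token has expired.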
There is one aspect of the design that I could not figure out from reading the spec, and it appears to relate to a security vulnerability that the spec therefore does not close. The spec recommends computing the token as a hash over several fields, including the client's source IP address but not its source port. This appears to be because RTP and RTCP use different ports and the authors want the same token to work for both. However, this introduces a threat: if a node requests a token, that token will be valid for all nodes NATted behind the same IP address. An attacker inside the NATted network could therefore acquire a token and hand it off to a co-conspirator outside the network, who could then use it to direct traffic at a different node inside the network. It's not obvious what could or should be done to mitigate this threat. The spec allows (but recommends against) using many different ports on the client side for different purposes, but I couldn't figure out how the client knows how all of those ports will be translated by the NAT. A separate token per port would solve this problem, but I suspect there is some reason why that is impractical.
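The NAT problem can be made concrete with a small Python sketch. It compares the token binding described in this review (client IP, nonce, timestamp) with a hypothetical per-port variant that also covers the source port the server observes after NAT translation. The key, addresses, ports, function names, and field encodings are illustrative assumptions rather than anything taken from the draft, and the timestamp freshness check is omitted for brevity.

```python
import hashlib
import hmac
import ipaddress
import struct

KEY = bytes(20)  # hypothetical server key, fixed here only for illustration


def token_ip_only(ip: str, nonce: bytes, ts: int) -> bytes:
    """Binding as described in the review: client IP, nonce, timestamp (no port)."""
    msg = ipaddress.ip_address(ip).packed + nonce + struct.pack("!Q", ts)
    return hmac.new(KEY, msg, hashlib.sha1).digest()


def token_ip_port(ip: str, port: int, nonce: bytes, ts: int) -> bytes:
    """Hypothetical variant that also binds the source port the server observes."""
    msg = (ipaddress.ip_address(ip).packed + struct.pack("!H", port)
           + nonce + struct.pack("!Q", ts))
    return hmac.new(KEY, msg, hashlib.sha1).digest()


def server_accepts(token: bytes, src_ip: str, src_port: int, nonce: bytes,
                   ts: int, bind_port: bool) -> bool:
    """Recompute the expected token from the (possibly forged) request fields."""
    if bind_port:
        expected = token_ip_port(src_ip, src_port, nonce, ts)
    else:
        expected = token_ip_only(src_ip, nonce, ts)  # src_port plays no part
    return hmac.compare_digest(expected, token)


NAT_IP = "203.0.113.5"  # public address shared by every host behind the NAT
NONCE, TS = b"\x07" * 8, 1_700_000_000
INSIDER_PORT, VICTIM_PORT = 40000, 40002  # NAT mappings of two inside hosts

# The insider obtains a token over its own mapping and passes it, with the
# nonce and timestamp, to an outside co-conspirator, who then sends a forged
# request claiming to come from NAT_IP:VICTIM_PORT.
stolen = token_ip_only(NAT_IP, NONCE, TS)
print(server_accepts(stolen, NAT_IP, VICTIM_PORT, NONCE, TS, bind_port=False))  # True

stolen_p = token_ip_port(NAT_IP, INSIDER_PORT, NONCE, TS)
print(server_accepts(stolen_p, NAT_IP, VICTIM_PORT, NONCE, TS, bind_port=True))  # False
```

In this sketch the per-port variant blocks the hand-off, but at the cost of a distinct token for every port the server sees (for example, separate RTP and RTCP mappings), which may be the practical obstacle alluded to above.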