Last Call Review of draft-ietf-mboned-driad-amt-discovery-09
review-ietf-mboned-driad-amt-discovery-09-tsvart-lc-aboba-2019-11-30-00

Request Review of draft-ietf-mboned-driad-amt-discovery
Requested rev. no specific revision (document currently at 13)
Type Last Call Review
Team Transport Area Review Team (tsvart)
Deadline 2019-12-02
Requested 2019-11-18
Authors Jake Holland
Draft last updated 2019-11-30
Completed reviews Rtgdir Last Call review of -09 by Henning Rogge
Rtgdir Last Call review of -09 by Carlos Pignataro
Opsdir Last Call review of -09 by Niclas Comstedt
Genart Last Call review of -09 by Dan Romascanu
Secdir Last Call review of -11 by Daniel Franke
Tsvart Last Call review of -09 by Bernard Aboba
Assignment Reviewer Bernard Aboba
State Completed
Review review-ietf-mboned-driad-amt-discovery-09-tsvart-lc-aboba-2019-11-30
Posted at https://mailarchive.ietf.org/arch/msg/tsv-art/H3vxh9AUBnt5uxiZHB_DtUDxuvA
Reviewed rev. 09 (document currently at 13)
Review result Ready with Issues
Review completed: 2019-11-30

Review
review-ietf-mboned-driad-amt-discovery-09-tsvart-lc-aboba-2019-11-30

Document: draft-ietf-mboned-driad-amt-discovery
Reviewer: Bernard Aboba
Review result: Ready with Nits

This document has been reviewed as part of the transport area review team's
ongoing effort to review key IETF documents. These comments were written
primarily for the transport area directors, but are copied to the document's
authors and WG to allow them to address any issues raised and also to the IETF
discussion list for information.

When done at the time of IETF Last Call, the authors should consider this
review as part of the last-call comments they receive. Please always CC
tsv-art@ietf.org if you reply to or forward this review.

This draft is ready for publication from a transport point of view, with
the exception of a few (relatively minor) issues: 

Section 2.5.4.1

"   The RECOMMENDED timeout is a random value in the range
   [initial_timeout, MIN(initial_timeout * 2^retry_count,
   maximum_timeout)], with a RECOMMENDED initial_timeout of 4 seconds
   and a RECOMMENDED maximum_timeout of 120 seconds.
"

[BA] The draft justifies the initial_timeout value of 4 seconds, but not
the maximum_timeout value of 120 seconds, which seems somewhat high.
I suspect the value is set this high to provide robustness against
potential routing transients. It would be helpful to state the reasoning
in the draft.
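
To make the range under discussion concrete, here is a minimal Python sketch
of the quoted rule. The function names are hypothetical, and the uniform
distribution is an assumption: the draft text only says "random value" and
does not name a distribution.

```python
import random


def timeout_upper_bound(retry_count, initial_timeout=4.0, maximum_timeout=120.0):
    """Upper end of the range: MIN(initial_timeout * 2^retry_count, maximum_timeout)."""
    return min(initial_timeout * 2 ** retry_count, maximum_timeout)


def retry_timeout(retry_count, initial_timeout=4.0, maximum_timeout=120.0, rng=random):
    """Pick a timeout in [initial_timeout, upper bound].

    Assumption: the value is drawn uniformly; the draft does not say so.
    """
    upper = timeout_upper_bound(retry_count, initial_timeout, maximum_timeout)
    return rng.uniform(initial_timeout, upper)
```

With the recommended values, the upper bound doubles from 4 seconds and is
capped at 120 seconds from retry_count 5 onward (4 * 2^5 = 128 > 120), so the
maximum_timeout only matters after several consecutive failures.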

Section 2.5.4.2

"  In some gateway deployments, it is also feasible to monitor the
   health of traffic flows through the gateway, for example by detecting
   the rate of packet loss by communicating out of band with receivers,
   or monitoring the packets of known protocols with sequence numbers.
   Where feasible, it's encouraged for gateways to use such traffic
   health information to trigger a restart of the discovery process
   during event #3 (before sending a new Request message).

   However, to avoid synchronized rediscovery by many gateways
   simultaneously after a transient network event upstream of a relay
   results in many receivers detecting poor flow health at the same
   time, it's recommended to add a random delay before restarting the
   discovery process in this case.

   The span of the random portion of the delay should be no less than 10
   seconds by default, but may be administratively configured to support
   different performance requirements."

[BA] There is good reason to be concerned about causing synchronized
rediscovery after a transient network event if "poor flow health" is
diagnosed too readily. It would therefore be useful to have more specific
advice on how to define "poor flow health" and on how to calculate the
"random delay".

My assumption is that we are talking about *major* and *sustained*
loss here (e.g. a period larger than most routing transients), as well 
as a *substantial* delay (to avoid instability). 
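
A minimal Python sketch of the kind of advice I have in mind follows. It
treats flow health as poor only when loss is both high and sustained, and then
applies a random delay before rediscovery. Everything except the 10-second
default span is an assumption for illustration: the function names, the loss
threshold, the number of consecutive intervals, and the uniform distribution
do not come from the draft.

```python
import random


def is_poor_flow_health(loss_samples, loss_threshold=0.2, min_consecutive=5):
    """Return True only for major and sustained loss.

    loss_samples: per-interval loss ratios (0.0 to 1.0), oldest first.
    loss_threshold and min_consecutive are hypothetical values, not from the draft.
    """
    if len(loss_samples) < min_consecutive:
        return False
    recent = loss_samples[-min_consecutive:]
    return all(sample >= loss_threshold for sample in recent)


def rediscovery_delay(span=10.0, rng=random):
    """Random delay before restarting discovery.

    span defaults to the draft's 10-second minimum; the uniform draw is an assumption.
    """
    return rng.uniform(0.0, span)
```

A single lossy interval, or loss that recovers within the observation window,
does not trigger rediscovery in this sketch, which is the behavior needed to
ride out short routing transients.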

Concerns unrelated to Transport

Security

Section 6.2

   "There must be a trust relationship between the end consumer of this
   resource record and the DNS server.  This relationship may be end-to-
   end DNSSEC validation, a TSIG [RFC2845] or SIG(0) [RFC2931] channel
   to another secure source, a secure local channel on the host, DNS
   over TLS [RFC7858] or HTTPS [RFC8484], or some other secure
   mechanism."

[BA] This paragraph mixes end-to-end security mechanisms (DNSSEC) with
channel-protection mechanisms such as DoT and DoH. The threats addressed by
each mechanism are different (e.g. RR modification versus snooping), so it
would be helpful to state the threat model clearly. Is there a privacy
concern relating to unauthorized snooping of AMTRELAY RRs? Or is the concern
primarily modification of the RRs?

Overall utility

[BA] It is not clear to me why the AMTRELAY RR is needed, given that
Section 2.3.1 makes it clear that querying this record is a last
resort: 

"  It's only appropriate for an AMT gateway to discover an AMT relay by
   querying an AMTRELAY RR owned by a sender when all of these
   conditions are met:

   1.  The gateway needs to propagate a join of an (S,G) over AMT,
       because in the gateway's network, no RPF next hop toward the
       source can propagate a native multicast join of the (S,G); and

   2.  The gateway is not already connected to a relay that forwards
       multicast traffic from the source of the (S,G); and

   3.  The gateway is not configured to use a particular IP address for
       AMT discovery, or a relay discovered with that IP is not able to
       forward traffic from the source of the (S,G); and

   4.  The gateway is not able to find an upstream AMT relay with DNS-SD
       [RFC6763], using "_amt._udp" as the Service section of the
       queries, or a relay discovered this way is not able to forward
       traffic from the source of the (S,G) (as described in
       Section 2.5.4.1 or Section 2.5.5); and

   5.  The gateway is not able to find an upstream AMT relay with the
       well-known anycast addresses from Section 7 of [RFC7450]."

In particular, DNS-SD RRs can easily be added with DNS service
providers, while this is not necessarily the case for a new
AMTRELAY RR.  So are there really situations in which it is not
feasible to add DNS-SD RRs, but the AMTRELAY RR is more convenient
or easier to deploy?
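
To show how narrow the resulting window is, the five conditions quoted from
Section 2.3.1 can be sketched as a single conjunction. This is a minimal
Python illustration, not an implementation from the draft; the class and
field names are hypothetical, and each field collapses the corresponding
condition into one boolean.

```python
from dataclasses import dataclass


@dataclass
class GatewayState:
    # Hypothetical flags, one per condition in Section 2.3.1.
    native_join_possible: bool             # an RPF next hop can propagate a native join
    connected_relay_forwards_source: bool  # already connected to a suitable relay
    configured_relay_works: bool           # configured IP exists and its relay can forward
    dns_sd_relay_works: bool               # DNS-SD ("_amt._udp") found a relay that can forward
    anycast_relay_found: bool              # RFC 7450 anycast discovery found a relay


def should_query_amtrelay(state: GatewayState) -> bool:
    """True only when every other discovery option is unavailable."""
    return not (
        state.native_join_possible
        or state.connected_relay_forwards_source
        or state.configured_relay_works
        or state.dns_sd_relay_works
        or state.anycast_relay_found
    )
```

The AMTRELAY query is reached only when all five flags are false; a working
DNS-SD record alone is enough to bypass it.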