Ballot for draft-ietf-dprive-rfc7626-bis-09

Comment (2020-10-05 for -06) Sent

[[ questions ]]

[ section 6.1.1.1 ]

* Does "Strict DoT" have a definition somewhere?  I couldn't find one
  in 8499 nor in 7858.


[[ nits ]]

[ section 1 ]

* "sent in clear", consider perhaps: "sent in the clear"

[ section 4.1 ]

* "those transaction" -> "those transactions"

[ section 6.1.1 ]

* "to limited subset" -> "to a limited subset"

[ section 6.1.3 ]

* "know to be used" -> "known to be used"

Comment (2020-10-08 for -06) Sent

[ Thank you for addressing my DISCUSS point. ]

[Edit: I accidentally hit "Send" too early; I have another few comments, also non-blocking:
1: "Also, sometimes, the QNAME embeds the software one uses, which could be a privacy issue. For instance, _ldap._tcp.Default-First-Site-Name._sites.gc._msdcs.example.org."... Unless you are a Microsoft or DNS weenie, this is likely not at all clear -- what is being leaked here? The fact that the site uses TCP? LDAP? Windows? Goldbach's Conjecture? Example software? (I think adding a sentence here would be helpful...)
]
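
As a rough illustration of what such a name can give away, here is a minimal Python sketch that flags the labels hinting at specific software. The helper name `revealing_labels` and the small hint table are assumptions made up for this example, not anything taken from the draft; `_msdcs` is, to the best of my knowledge, a label specific to Microsoft Active Directory domain controller location.

```python
# Hypothetical hint table, for illustration only; not an exhaustive registry.
SOFTWARE_HINTS = {
    "_msdcs": "Microsoft Active Directory (domain controller locator)",
    "_ldap": "LDAP directory service",
    "_bittorrent-tracker": "BitTorrent client",
}


def revealing_labels(qname):
    """Return (label, hint) pairs for labels in qname found in the hint table."""
    # DNS names are case-insensitive, so compare in lower case.
    labels = qname.rstrip(".").lower().split(".")
    return [(label, SOFTWARE_HINTS[label]) for label in labels if label in SOFTWARE_HINTS]


if __name__ == "__main__":
    name = "_ldap._tcp.Default-First-Site-Name._sites.gc._msdcs.example.org."
    for label, hint in revealing_labels(name):
        print(f"{label}: {hint}")
```

An observer needs nothing more than this kind of string match on the QNAME to guess that the querying host is probably a Windows machine joined to a domain.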

Thank you for this document - it's really useful, and readable as well.

I do have a few small comments to (possibly) make it even better - I will in no way be offended if you ignore these...

The background on how DNS works is nicely written, and I'm going to point people at it when I need to explain how the DNS works -- but I think a better name example than:
"What are the SRV records of _xmpp-server._tcp.example.com?" would be good -- SRV is an unusual record type, and names with underscores surprise people. I'd instead suggest "What are the MX records for example.com?" or "What is the A record for ftp.example.com?" -- I'm only mentioning this because the rest of the section is a very general introduction and this might confuse newcomers...

"At the time of writing, almost all this DNS traffic is currently sent in clear (i.e., unencrypted). However there is increasing deployment of DNS-over-TLS (DoT) [RFC7858] and DNS-over-HTTPS (DoH) [RFC8484], particularly in mobile devices, browsers, and by providers of anycast recursive DNS resolution services."
I think that you might want to remove the "particularly in ..." - I suspect that it will not age well; the document does say "At the time of writing" and "increasing", etc., but this document is likely foundational enough that it will still be referenced many many years from now, and this text may just cloud matters then.

Whatever the case, thanks again for this document!

Comment (2020-10-07 for -06) Sent

I'll add my thanks for this document.  I have tripped on some of the issues in my experience, but some of the others described here were eye-opening.  I'm also learning from the ensuing discussions.

Section 1:

A couple of nits:

* "DNS relies on caching heavily ..." -- suggest "DNS relies heavily on caching ..."

* "Both are a big privacy concern since ..." -- suggest "Both are big privacy concerns since ...", unless you mean the two of them collectively (in which case, please say so)

I agree with Warren in that it's not clear what's leaking in the example at the bottom of the second paragraph of Section 4.2.

In Section 5.1, please expand "CPE" on first use.

I'm having trouble parsing the third paragraph of Section 5.2.  The fourth paragraph in the same section needs some commas.

Comment (2020-10-07 for -06) Sent

Thank you for responding to the SECDIR review (and thank you to Stephen Farrell for performing the SECDIR review).

** Section 3.5.1.1.  Per “These resolvers may have strong, medium, or weak privacy policies …”, what are the dimensions of this Likert-style scale?  I recommend a simpler sentence -- “… may have varied privacy policies”.

** Section 6.1.1.  Per “All major OS's expose the system DNS settings and allow users to manually override them if desired”, agreed.  However, in managed environments, users may not be able to manually override these settings.

** Section 6.1.3.  Per “User privacy can also be at risk if there is blocking (by local network operators or more general mechanisms) …”, what is a “more general mechanism”?  Also, "local network operator" describes who is doing the blocking and "general mechanisms" seems to be describing a technique.

** Section 8.  Editorial.  Per “They are used for many reasons – some good, some bad.”, I’d recommend against making judgements and instead sticking to a rubric of operational practices and attacker behavior (say RFC7258).  I’m not sure this sentence is needed.

Editorial nits

-- ** Section 6.1.1.  Editorial.  s/additionally highly dependent/highly dependent/

-- Section 12.  Typo. s/apprecriated/appreciated/

No Objection (2021-03-09 for -08) Sent for earlier

No Objection (2020-10-05 for -06) Sent

It would have been nice to include a narrative section indicating the differences with respect to rfc7626.  Maybe turn the bullets in §14 into a short explanation of the major changes.

No Objection (2020-10-07 for -06) Sent

I have comments on the DISCUSS positions of Alissa and Warren, both of which I support to some extent:

On Warren’s point, which I wouldn’t have made a DISCUSS myself, I agree that editorial changes are warranted so as to make the point more clearly and with less baggage. I think we all know what the document means here, but not all readers will, and there’s sufficient FUD in this area that it behooves us to be very careful about how we say things. Avoiding things such as “alleged” and “it has long been claimed” is easy, would go a long way toward clarity and avoidance of feeding the FUD, and is worth a brief editing pass. I leave it to Warren to work the details out with the working group.

On Alissa’s first point — why publish this update now, rather than waiting until more things shake out and settle down — I basically agree, though I’m torn between thinking that waiting is better... and, on the other hand, acknowledging that enough has already changed that it’s important to get the update out there, and that it can be updated again later.

On her second point, I’ll go in a different direction: it’s bordering on silly to think that any real end user can be said to “be aware of and have the ability to control” anything related to DNS settings and resolution options. If “users” refers to those of us writing these specs, sure. But when we’re talking about our siblings and cousins and parents, who are doctors and nurses, chefs and bakers, bank tellers and car mechanics, there is no hope of awareness and understanding of the choices and their consequences, nor that any form of “communicate clearly” will really accomplish anything. I see little to recommend pretending that it will.

So I, too, am not sure what this text is really meant to convey.

No Objection (2020-10-06 for -06) Sent

Section 1

   At the time of writing, almost all this DNS traffic is currently sent
   in clear (i.e., unencrypted).  However there is increasing deployment

nit: I think that "in the clear" is the term of art (add "the").

   Today, almost all DNS queries are sent over UDP [thomas-ditl-tcp].

It looks like
(https://mailarchive.ietf.org/arch/msg/dns-privacy/1pZL1FA57hzE1e09mQ2HMg2aWYY/)
Sara was going to follow up with the DITL authors to try and ascertain
whether "almost all queries" is still accurate for the "UDP" aspect,
though the IETF mailarchive search doesn't seem to find any more recent
traffic on that topic.  Do we know if anyone actually heard back about
this (or the "sent in [the] clear" a few lines previously)?
I do not pretend to have the expertise needed to judge how the changes
deployed by major browsers affect the statistics for "all DNS traffic"
(which presumably includes both stub-to-resolver and
resolver-to-authoritative).

   This has practical consequences when considering encryption of the
   traffic as a possible privacy technique.  Some encryption solutions
   are only designed for TCP, not UDP and new solutions are still
   emerging [I-D.ietf-quic-transport] [I-D.huitema-quic-dnsoquic].

[It looks like dnsoquic became draft-huitema-dprive-dnsoquic.]

Section 3

   multiple dynamic contexts of each device.  This document does not
   attempt such a complex analysis, instead it presents an overview of
   the various considerations that could form the basis of such an
   analysis.

nit: looks like a comma splice.

Section 4.1

   authentication or authorization of the client (resolver).  Due to the
   lack of search capabilities, only a given QNAME will reveal the
   resource records associated with that name (or that name's non-
   existence).  In other words: one needs to know what to ask for, in

I agree with Warren that this statement ("only [...] will reveal [...]
or that name's non-existence") is overly strong.

Section 4.2

   The DNS request includes many fields, but two of them seem
   particularly relevant for the privacy issues: the QNAME and the
   source IP address. "source IP address" is used in a loose sense of
   "source IP address + maybe source port number", because the port

In other contexts I've seen this combination referred to as the
"transport address".

   The QNAME is the full name sent by the user.  It gives information
   about what the user does ("What are the MX records of example.net?"
   means he probably wants to send email to someone at example.net,
   which may be a domain used by only a few persons and is therefore
   very revealing about communication relationships).  [...]

(editorial) something like not-a-secret-cabal.example might make the
example more visceral than example.net does.

   create more problems for the user.  Also, sometimes, the QNAME embeds
   the software one uses, which could be a privacy issue.  For instance,
   _ldap._tcp.Default-First-Site-Name._sites.gc._msdcs.example.org.

(nit) I trust that this can be made into a complete sentence while
addressing Warren's more-substantive comment.

   There are also some BitTorrent clients that query an SRV record for
   _bittorrent-tracker._tcp.domain.example.

In a similar vein, I'm not sure what domain.example is supposed to
represent here -- the domain of the author of the BitTorrent client?

   Therefore, all the issues and warnings about collection of IP
   addresses apply here.  For the communication between the recursive

I mostly assume that this is intended to be a reference to the generic
concerns about "IP addresses are PII", etc., that one is ambiently
exposed to by reading enough about the Internet.  (There does not seem
to be previous discussion of "collection of IP addresses" in this
document, which would seem to indicate that it is not trying to refer
back to previous text.)  If so, perhaps an extra word or two would help
("all the standard issues and warnings", "all the generic issues and
warnings", etc.) clarify the intent of the reference.

   However, hiding does not always work.  Sometimes EDNS(0) Client
   subnet [RFC7871] is used (see its privacy analysis in
   [denis-edns-client-subnet]).  [...]

(nit) The wording here ("its privacy analysis") suggests that the
referenced document is an authoritative/official IETF position, but it
seems to be a blog post by a single individual.  Using "one" or "a"
rather than "its" would convey a less-authoritative connotation.

                                       In both cases, the IP address
   originating queries to the authoritative server is as sensitive as it
   is for HTTP [sidn-entrada].

I don't see how [sidn-entrada] supports the claim that end-user-adjacent
DNS client IP addresses are as sensitive as HTTP client IP
addresses; it mentions "sensitive" only twice (as "privacy-sensitive",
admittedly, applying to such IP addresses, but as an assertion without
justification) and "http" only in URLs (mostly in the references) and as
an example request.  It would feel more natural to use an IETF reference
here, as well -- e.g., RFC 7624 discusses correlating client IP
addresses with end users, RFC 7239 clearly covers privacy considerations
for sending client IP addresses in the "forwarded" header field, and
there are no doubt others -- though I do note the contents of the
paragraph after this one.

                     However, for both IPv4 and IPv6 addresses, it is
   important to note that source addresses are propagated with queries
   and comprise metadata about the host, user, or application that
   originated them.

(This "propagated with queries" is still contingent on EDNS(0) Client
Subnet from the previous paragraph, right?)

Section 4.2.1

   cache poisoning attacks by off-path attackers.  It is noted, however,
   that they are designed to just verify IP addresses (and should change
   once a client's IP address changes), they are not designed to
   actively track users (like HTTP cookies).

nit: comma splice.

Section 5.1

   not be.  When other protocols will become more and more privacy-aware
   and secured against surveillance (e.g., [RFC8446],
   [I-D.ietf-quic-transport]), the use of unencrypted transports for DNS
   may become "the weakest link" in privacy.  It is noted that at the
   time of writing there is on-going work attempting to encrypt the SNI
   in the TLS handshake [I-D.ietf-tls-sni-encryption].

This mention of encrypted "SNI" (now encrypted ClientHello) comes as a
bit of a non sequitur.  I suggest a bit of transition such as an
additional clause at the end of the sentence ", which is one of the
last remaining non-DNS cleartext identifiers of a connection target".
(While the actual work itself has progressed to encrypting the entire
ClientHello, I think it's okay to focus the exposition here on the SNI,
as the relevant attribute.)

                                                         It can be noted
      that if the user selects a single resolver with a small client
      population (even when using an encrypted transport) it can
      actually serve to aid tracking of that user as they move across
      network environment.

I wonder if it is worth adding another clause at the end: ", and that an
attacker in a position to observe the moving user is likely also able to
observe the likely-unencrypted DNS queries from the resolver to the
authoritative servers"
Also, nit: "environments" plural.

Section 5.2

   Traffic analysis of unpadded encrypted traffic is also possible
   [pitfalls-of-dns-encryption] because the sizes and timing of
   encrypted DNS requests and responses can be correlated to unencrypted
   DNS requests upstream of a recursive resolver.

We could (but don't have to) note that effective padding policies remain
an open area of research.
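
To make the size-correlation point concrete, here is a minimal sketch of block-length padding, in which every message is rounded up to a multiple of a fixed block so that many distinct sizes collapse into one observable size. The function name `padded_length` is hypothetical, the 128-octet default is an assumption that mirrors the query block size suggested in RFC 8467, and the sketch ignores the overhead of the EDNS(0) padding option itself.

```python
def padded_length(message_len, block=128):
    """Round message_len up to the next multiple of block (block-length padding)."""
    if message_len < 0 or block <= 0:
        raise ValueError("message_len must be >= 0 and block must be > 0")
    # Ceiling division, then scale back up to a whole number of blocks.
    return -(-message_len // block) * block


if __name__ == "__main__":
    for size in (45, 100, 128, 129):
        print(size, "->", padded_length(size))
```

Padding of this kind hides sizes within a block but does nothing about timing, which is part of why the choice of policy is not a settled question.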

Section 6.1.1.2

   o communicate clearly the change in default to users

I think this is intending to say "when the default application resolver
changes away from the system resolver", but the present text is perhaps
a little unclear about what "the change" is referring to.

Section 6.1.2

                                                                Even if
   encrypted DNS such as DoH or DoT is used, unless the client has been
   configured in a secure way with the server identity, an active
   attacker can impersonate the server.  [...]

More than the server identity is needed -- the credentials or trust
anchor needed to authenticate a peer as that identity are also needed.
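
A minimal sketch of the two pieces, using Python's standard `ssl` module: the context carries the trust anchor (a CA file, or the platform default store), and the expected server name is supplied separately at connection time. The function name `strict_tls_context` and the host name in the note below are hypothetical; this is an illustration under those assumptions, not a description of how any particular DoT or DoH client is built.

```python
import ssl


def strict_tls_context(cafile=None):
    """Build a TLS context that checks the server name against a trust anchor."""
    # cafile=None falls back to the platform's default trust store.
    ctx = ssl.create_default_context(cafile=cafile)
    # Require that the certificate matches the name the client was configured with.
    ctx.check_hostname = True
    # Refuse the connection unless the chain validates to the trust anchor.
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx
```

A client would then pass the configured identity when wrapping the connection, e.g. `ctx.wrap_socket(sock, server_hostname="dns.example")`. Knowing the name alone is not enough: without a trusted anchor to validate against, an active attacker can present a certificate for that name.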

Section 6.1.3

   User privacy can also be at risk if there is blocking (by local
   network operators or more general mechanisms) of access to remote
   recursive servers that offer encrypted transports when the local
   resolver does not offer encryption and/or has very poor privacy
   policies.  [...]

I suggest adding "e.g." before "when the local resolver" to avoid giving
the impression that this is an exhaustive list.

   This is a form of Rendezvous-Based Blocking as described in
   Section 4.3 of [RFC7754].  Such blocklists often include servers know
   to be used for malware, bots or other security risks.  In order to
   prevent circumvention of their blocking policies, some networks also
   block access to resolvers with incompatible policies.

Perhaps this is touching too much on the controversial topic, but it
seems to me that the networks in question "attempt to block access";
whether or not they fully and reliably succeed at doing so is not clear.
(See also the near-impossibility of closing covert channels in
protocols.)

   It is also noted that attacks on remote resolver services, e.g., DDoS
   could force users to switch to other services that do not offer
   encrypted transports for DNS.

nit: comma after DDoS.

Section 6.1.4.2

   Some implementations have, in fact, chosen to restrict the use of the
   'User-Agent' header so that resolver operators cannot identify the
   specific application that is originating the DNS queries.

With similar disclaimer as previously, perhaps "trivially identify"?
There are other fingerprinting techniques possible even at, e.g., the TLS
layer (that we discussed previously in this document!), which still
apply to DoH.

Section 6.2

   This "protection", when using a large resolver with many clients, is
   no longer present if ECS [RFC7871] is used because, in this case, the
   authoritative name server sees the original IP address (or prefix,
   depending on the setup).

(side note) this has always been a bit confusing to me -- ECS is "client
subnet", not "client address", and I don't really understand why someone
would set the prefix length to the full 128 (or 32) bits of the address.
Are there really a lot of non-truncated client addresses being sent
around like this?  How did that happen?
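
To make the address-versus-prefix distinction concrete, here is a minimal sketch of the truncation a resolver could apply before attaching a client subnet. The function name `ecs_prefix` is hypothetical, and the defaults of 24 bits for IPv4 and 56 bits for IPv6 are assumptions based on the maximum source prefix lengths that RFC 7871 recommends, as I recall them.

```python
import ipaddress


def ecs_prefix(client_addr, v4_bits=24, v6_bits=56):
    """Return the truncated network (as a string) that would be sent instead of the address."""
    addr = ipaddress.ip_address(client_addr)
    bits = v4_bits if addr.version == 4 else v6_bits
    # strict=False zeroes the host bits rather than rejecting the address.
    return str(ipaddress.ip_network(f"{addr}/{bits}", strict=False))


if __name__ == "__main__":
    print(ecs_prefix("192.0.2.77"))
    print(ecs_prefix("2001:db8:1234:5678::1"))
    # A full-length prefix reproduces the exact client address.
    print(ecs_prefix("192.0.2.77", v4_bits=32))
```

With a full-length prefix (32 or 128 bits) the "subnet" is the client's own address, which is the case the draft's parenthetical "(or prefix, depending on the setup)" appears to allow for.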

                                                                    So,
   requests to a given ccTLD may go to servers managed by organizations
   outside of the ccTLD's country.  End users may not anticipate that,
   when doing a security analysis.

(Is this a "for example"?  It seems plausibly relevant for non-cc TLDs
as well.)

Section 7.1

   The IAB privacy and security program also have a work in progress
   [RFC7624] that considers such inference-based attacks in a more
   general framework.

I do not really think the final RFC constitutes a "work in progress"
anymore.

Section 8

   Passive DNS systems [passive-dns] allow reconstruction of the data of
   sometimes an entire zone.  They are used for many reasons -- some
   good, some bad.  Well-known passive DNS systems keep only the DNS
   responses, and not the source IP address of the client, precisely for
   privacy reasons.  Other passive DNS systems may not be so careful.

Perhaps not so well-intentioned, either...

   The revelations from the Edward Snowden documents, which were leaked
   from the National Security Agency (NSA) provide evidence of the use

nit: comma after "(NSA)".

Section 9

   To our knowledge, there are no specific privacy laws for DNS data, in
   any country.  Interpreting general privacy laws like
   [data-protection-directive] or GDPR [10] applicable in the European
   Union in the context of DNS traffic data is not an easy task, and we
   do not know a court precedent here.  See an interesting analysis in
   [sidn-entrada].

This text is essentially unchanged since RFC 7626; did we do much of a
search for whether the past five years have brought about changes in the
legal landscape?

No Objection (for -06) Not sent

No Objection (2020-10-03 for -06) Not sent

Thank you for this document; it is important and I learned a few things.

No Objection (2020-10-05 for -06) Sent

Hi,

Thank you for this document.  I found it interesting and easy to read.

A few minor comments/nits that I spotted whilst reading this document:

"in clear (i.e., unencrypted)." => "unencrypted."
"However there is" => "However, there is"
"designed for TCP, not UDP and new" => "designed for TCP, not UDP, and new"
"It can be noted also that" => "It can also be noted that"
"Both are a big privacy concern" => "Both are significant privacy concerns"
"de-NAT DNS queries dns-de-nat [3]" => "de-NAT DNS queries [3]"?

Regards,
Rob

DNS Privacy Considerations draft-ietf-dprive-rfc7626-bis-09
