IP Fragmentation Avoidance in DNS over UDP
draft-ietf-dnsop-avoid-fragmentation-17

Document Type Active Internet-Draft (dnsop WG)
Authors Kazunori Fujiwara, Paul A. Vixie
Last updated 2024-02-29
Replaces draft-fujiwara-dnsop-avoid-fragmentation
RFC stream Internet Engineering Task Force (IETF)
Intended RFC status Best Current Practice
Stream WG state Submitted to IESG for Publication
Document shepherd Tim Wicinski
Shepherd write-up Last changed 2023-10-10
IESG IESG state IESG Evaluation::AD Followup
Action Holder
Consensus boilerplate Yes
Telechat date (None)
Needs 2 more YES or NO OBJECTION positions to pass.
Responsible AD Warren "Ace" Kumari
Send notices to benno@NLnetLabs.nl, swoolf@pir.org, tjw.ietf@gmail.com
IANA IANA review state Version Changed - Review Needed
Internet Research Task Force (IRTF)                         M. Waehlisch
Request for Comments: 7046                          link-lab & FU Berlin
Category: Experimental                                        T. Schmidt
ISSN: 2070-1721                                              HAW Hamburg
                                                               S. Venaas
                                                           Cisco Systems
                                                           December 2013

             A Common API for Transparent Hybrid Multicast

Abstract

   Group communication services exist in a large variety of flavors and
   technical implementations at different protocol layers.  Multicast
   data distribution is most efficiently performed on the lowest
   available layer, but a heterogeneous deployment status of multicast
   technologies throughout the Internet requires an adaptive service
   binding at runtime.  Today, it is difficult to write an application
   that runs everywhere and at the same time makes use of the most
   efficient multicast service available in the network.  Facing
   robustness requirements, developers are frequently forced to use a
   stable upper-layer protocol provided by the application itself.  This
   document describes a common multicast API that is suitable for
   transparent communication in underlay and overlay and that grants
   access to the different flavors of multicast.  It proposes an
   abstract naming scheme that uses multicast URIs, and it discusses
   mapping mechanisms between different namespaces and distribution
   technologies.  Additionally, this document describes the application
   of this API for building gateways that interconnect current Multicast
   Domains throughout the Internet.  It reports on an implementation of
   the programming Interface, including service middleware.  This
   document is a product of the Scalable Adaptive Multicast (SAM)
   Research Group.

Waehlisch, et al.             Experimental                      [Page 1]
RFC 7046                    Common Mcast API               December 2013

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for examination, experimental implementation, and
   evaluation.

   This document defines an Experimental Protocol for the Internet
   community.  This document is a product of the Internet Research Task
   Force (IRTF).  The IRTF publishes the results of Internet-related
   research and development activities.  These results might not be
   suitable for deployment.  This RFC represents the consensus of the
   Scalable Adaptive Multicast Research Group of the Internet Research
   Task Force (IRTF).  Documents approved for publication by the IRSG
   are not a candidate for any level of Internet Standard; see Section 2
   of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc7046.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

Table of Contents

   1. Introduction ....................................................4
      1.1. Use Cases for the Common API ...............................6
      1.2. Illustrative Examples ......................................7
           1.2.1. Support of Multiple Underlying Technologies .........7
           1.2.2. Support of Multi-Resolution Multicast ...............9
   2. Terminology ....................................................10
   3. Overview .......................................................10
      3.1. Objectives and Reference Scenarios ........................10
      3.2. Group Communication API and Protocol Stack ................12
      3.3. Naming and Addressing .....................................14
      3.4. Namespaces ................................................15

      3.5. Name-to-Address Mapping ...................................15
           3.5.1. Canonical Mapping ..................................16
           3.5.2. Mapping at End Points ..............................16
           3.5.3. Mapping at Inter-Domain Multicast Gateways .........16
      3.6. A Note on Explicit Multicast (Xcast) ......................16
      3.7. MTU Handling ..............................................17
   4. Common Multicast API ...........................................18
      4.1. Notation ..................................................18
      4.2. URI Scheme Definition .....................................18
           4.2.1. Syntax .............................................18
           4.2.2. Semantic ...........................................19
           4.2.3. Generic Namespaces .................................20
           4.2.4. Application-Centric Namespaces .....................20
           4.2.5. Future Namespaces ..................................20
      4.3. Additional Abstract Data Types ............................21
           4.3.1. Interface ..........................................21
           4.3.2. Membership Events ..................................21
      4.4. Group Management Calls ....................................22
           4.4.1. Create .............................................22
           4.4.2. Delete .............................................22
           4.4.3. Join ...............................................22
           4.4.4. Leave ..............................................23
           4.4.5. Source Register ....................................23
           4.4.6. Source Deregister ..................................23
      4.5. Send and Receive Calls ....................................24
           4.5.1. Send ...............................................24
           4.5.2. Receive ............................................24
      4.6. Socket Options ............................................25
           4.6.1. Get Interfaces .....................................25
           4.6.2. Add Interface ......................................25
           4.6.3. Delete Interface ...................................26
           4.6.4. Set TTL ............................................26
           4.6.5. Get TTL ............................................26
           4.6.6. Atomic Message Size ................................27
      4.7. Service Calls .............................................27
           4.7.1. Group Set ..........................................27
           4.7.2. Neighbor Set .......................................28
           4.7.3. Children Set .......................................28
           4.7.4. Parent Set .........................................28
           4.7.5. Designated Host ....................................29
           4.7.6. Enable Membership Events ...........................29
           4.7.7. Disable Membership Events ..........................30
           4.7.8. Maximum Message Size ...............................30
   5. Implementation .................................................30
   6. IANA Considerations ............................................30
   7. Security Considerations ........................................31
   8. Acknowledgements ...............................................31

   9. References .....................................................32
      9.1. Normative References ......................................32
      9.2. Informative References ....................................33
   Appendix A. C Signatures ..........................................35
   Appendix B. Use Case for the API ..................................37
   Appendix C. Deployment Use Cases for Hybrid Multicast .............38
     C.1. DVMRP ......................................................38
     C.2. PIM-SM .....................................................38
     C.3. PIM-SSM ....................................................39
     C.4. BIDIR-PIM ..................................................40

1.  Introduction

   Currently, group application programmers need to choose the
   distribution technology that the application will require at runtime.
   There is no common communication Interface that abstracts multicast
   transmission and subscriptions from the deployment state at runtime,
   nor has the use of DNS for Group Addresses been established.  The
   standard multicast socket options [RFC3493] [RFC3678] are bound to an
   IP version by not distinguishing between the naming and addressing of
   multicast identifiers.  Group communication, however,

   o  is commonly implemented in different flavors, such as any-source
      multicast (ASM) vs. source-specific multicast (SSM),

   o  is commonly implemented on different layers (e.g., IP vs.
      application-layer multicast), and

   o  may be based on different technologies on the same tier, as seen
      with IPv4 vs. IPv6.

   The objective of this document is to provide programmers with
   universal access to group services.

   Multicast application development should be decoupled from
   technological deployment throughout the infrastructure.  It requires
   a common multicast API that offers calls to transmit and receive
   multicast data independent of the supporting layer and the underlying
   technological details.  For inter-technology transmissions, a
   consistent view of multicast states is needed as well.  This document
   describes an abstract group communication API and core functions
   necessary for transparent operations.  Specific implementation
   guidelines with respect to operating systems or programming languages
   are out of scope for this document.

   In contrast to the standard multicast socket Interface, the API
   introduced in this document abstracts naming from addressing.  Using
   a multicast address in the current socket API predefines the
   corresponding routing layer.  In this specification, the multicast
   name used for joining a group denotes an application-layer data
   stream that is identified by a multicast URI, independent of its
   binding to a specific distribution technology.  Such a Group Name can
   be mapped to variable routing identifiers.
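
   The separation of naming from addressing can be sketched as a lookup
   from one Group Name to several technology-specific routing
   identifiers.  This is only an illustration: the GROUP_BINDINGS table,
   the name "movie@example.org", the overlay key, and the resolve()
   helper are hypothetical and are not part of the API described in this
   document.  The addresses come from the multicast documentation
   ranges.

```python
import ipaddress

# Hypothetical binding table: one Group Name, several routing identifiers.
GROUP_BINDINGS = {
    "movie@example.org": {
        "ipv4": "233.252.0.1",         # IPv4 multicast documentation range
        "ipv6": "ff0e::db8:0:1",       # IPv6 multicast documentation range
        "overlay": "overlay-key-for-movie",  # made-up overlay identifier
    }
}


def resolve(group_name, technology):
    """Map a Group Name to the identifier used by one technology."""
    binding = GROUP_BINDINGS[group_name][technology]
    if technology in ("ipv4", "ipv6"):
        address = ipaddress.ip_address(binding)
        if not address.is_multicast:
            raise ValueError(f"{binding} is not a multicast address")
        return address
    # Non-IP technologies keep their identifier as an opaque string.
    return binding
```

   The application only ever handles the Group Name; which identifier
   is used is decided at lookup time.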

   The aim of this common API is twofold:

   o  Enable any application programmer to implement group-oriented data
      communication independent of the underlying delivery mechanisms.
      In particular, allow for a late binding of group applications to
      multicast technologies that makes applications efficient but
      robust with respect to deployment aspects.

   o  Allow for flexible namespace support in group addressing and
      thereby separate naming and addressing (or routing) schemes from
      the application design.  This abstraction not only decouples
      programs from specific aspects of underlying protocols but may
      open application design to extend to specifically flavored group
      services.

   Multicast technologies may be of various peer-to-peer kinds, IPv4 or
   IPv6 network-layer multicast, or implemented by some other
   application service.  Corresponding namespaces may be IP addresses or
   DNS naming, overlay hashes, or other application-layer group
   identifiers like <sip:*@peanuts.org>, but they can also be names
   independently defined by the applications.  Common namespaces are
   introduced later in this document but follow an open concept suitable
   for further extensions.

   This document also discusses mapping mechanisms between different
   namespaces and forwarding technologies and proposes expressions of
   defaults for an intended binding.  Additionally, the multicast API
   provides internal Interfaces to access current multicast states at
   the host.  Multiple multicast protocols may run in parallel on a
   single host.  These protocols may interact to provide a gateway
   function that bridges data between different domains.  The usage of
   this API at gateways operating between current multicast instances
   throughout the Internet is described as well.  Finally, a report on
   an implementation of the programming Interface, including service
   middleware, is presented.

   This document represents the consensus of the SAM Research Group.  It
   has been reviewed by the Research Group members active in the
   specific area of work.  In addition, this document has been
   comprehensively reviewed by people who are not "in" the Research
   Group but are experts in the area.

1.1.  Use Cases for the Common API

   The following generic use cases can be identified; these use cases
   require an abstract common API for multicast services:

   Application Programming Independent of Technologies:  Application
      programmers are provided with group primitives that remain
      independent of multicast technologies and their deployment in
      target domains.  Thus, for a given application, they can develop a
      program that will run in every deployment scenario.  The use of
      Group Names in the form of abstract metadata types allows
      applications to remain namespace-agnostic in the sense that the
      resolution of namespaces and name-to-address mappings may be
      delegated to a system service at runtime.  Complexity is thereby
      minimized, as developers need not care about how data is
      distributed in groups, while the system service can take advantage
      of extended information of the network environment as acquired at
      startup.

   Global Identification of Groups:  Groups can be identified
      independent of technological instantiations and beyond deployment
      domains.  Taking advantage of the abstract naming, an application
      can thus match data received from different Interface technologies
      (e.g., IPv4, IPv6, and overlays) to belong to the same group.
      This not only increases flexibility -- an application may, for
      instance, combine heterogeneous multipath streams -- but also
      simplifies the design and implementation of gateways.

   Uniform Access to Multicast Flavors:  The URI naming scheme uniformly
      supports different flavors of group communication, such as
      any-source multicast and source-specific multicast, and selective
      broadcast, independent of their service instantiation.  The
      traditional SSM model, for instance, can experience manifold
      support by directly mapping the multicast URI (i.e.,
      "group@instantiation") to an (S,G) state on the IP layer, by first
      resolving S for a subsequent Group Address query, by transferring
      this process to any of the various source-specific overlay
      schemes, or by delegating to a plain replication server.  The
      application programmer can invoke any of these underlying
      mechanisms with the same line of code.

   Simplified Service Deployment through Generic Gateways:  The common
      multicast API allows for an implementation of abstract gateway
      functions with mappings to specific technologies residing at the
      system level.  Generic gateways may provide a simple bridging
      service and facilitate an inter-domain deployment of multicast.

   Mobility-Agnostic Group Communication:  Group naming and management
      as foreseen in the common multicast API remain independent of
      locators.  Naturally, applications stay unaware of any mobility-
      related address changes.  Handover-initiated re-addressing is
      delegated to the mapping services at the system level and may be
      designed to smoothly interact with mobility management solutions
      provided at the network or transport layer (see [RFC5757] for
      mobility-related aspects).

1.2.  Illustrative Examples

1.2.1.  Support of Multiple Underlying Technologies

   On a very high level, the common multicast API provides the
   application programmer with one single Interface to manage multicast
   content independent of the technology underneath.  Considering the
   following simple example in Figure 1, a multicast source S is
   connected via IPv4 and IPv6.  It distributes one flow of multicast
   content (e.g., a movie).  Receivers are connected via IPv4/v6 and
   Overlay Multicast (OM), respectively.

    +-------+       +-------+                       +-------+
    |   S   |       |  R1   |                       |  R3   |
    +-------+       +-------+                       +-------+
   v6|   v4|           |v4                             |OM
     |     |          /                                |
     |  ***| ***  ***/ **                          *** /***  ***  ***
      \*   |*   **  /**   *                       *   /*   **   **   *
      *\   \_______/_______*__v4__+-------+      *   /                *
       *\    IPv4/v6      *       |  R2   |__OM__ *_/ Overlay Mcast  *
      *  \_________________*__v6__+-------+      *                    *
       *   **   **   **   *                       *    **   **   **  *
        ***  ***  ***  ***                         ***  ***  ***  ***

   Figure 1: Common Scenario: Source S Sends the Same Multicast Content
                        via Different Technologies

   Using the current BSD socket API, the application programmer needs to
   decide on the IP technologies at coding time.  Additional
   distribution techniques, such as overlay multicast, must be
   individually integrated into the application.  For each technology,
   the application programmer needs to create a separate socket and

   ...'s maximum UDP payload size
           discussions . . . . . . . . . . . . . . . . . . . . . . .  10
   Appendix B.  Minimal-responses  . . . . . . . . . . . . . . . . .  11
   Appendix C.  Known Implementations  . . . . . . . . . . . . . . .  11
     C.1.  BIND 9  . . . . . . . . . . . . . . . . . . . . . . . . .  12
     C.2.  Knot DNS and Knot Resolver  . . . . . . . . . . . . . . .  12
     C.3.  PowerDNS Authoritative Server, PowerDNS Recursor, PowerDNS
           dnsdist . . . . . . . . . . . . . . . . . . . . . . . . .  13
     C.4.  PowerDNS Authoritative Server . . . . . . . . . . . . . .  13
     C.5.  Unbound . . . . . . . . . . . . . . . . . . . . . . . . .  13
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  13

1.  Introduction

   The widely deployed EDNS0 [RFC6891] mechanism enables a DNS receiver
   to indicate the UDP message size it is able to receive, which allows
   a DNS server to send large UDP responses.  DNS over UDP invites IP
   fragmentation when a packet is larger than the MTU of some network in
   the packet's path.

   Fragmented DNS UDP responses have systemic weaknesses, which expose
   the requestor to DNS cache poisoning from off-path attackers.  (See
   Section 7.3 for references and details.)

Fujiwara & Vixie        Expires 1 September 2024                [Page 2]
Internet-Draft             avoid-fragmentation             February 2024

   [RFC8900] states that IP fragmentation introduces fragility to
   Internet communication.  The transport of DNS messages over UDP
   should take account of the observations stated in that document.

   TCP avoids fragmentation by segmenting data into packets that are
   smaller than or equal to the Maximum Segment Size (MSS).  For each
   transmitted segment, the size of the IP and TCP headers is known, and
   the IP packet size can be chosen to keep it within the estimated MTU
   and the other end's MSS.  This takes advantage of the elasticity of
   TCP's packetizing process as to how much queued data will fit into
   the next segment.  In contrast, DNS over UDP has little datagram size
   elasticity and lacks insight into IP header and option size, so we
   must make more conservative estimates about available UDP payload
   space.

   [RFC7766] states that all general-purpose DNS implementations MUST
   support both UDP and TCP transport.

   DNS transaction security [RFC8945] [RFC2931] does protect against the
   security risks of fragmentation, including protecting delegation
   responses.  But [RFC8945] has limited applicability due to key
   distribution requirements and there is little if any deployment of
   [RFC2931].

   This document specifies various techniques to avoid IP fragmentation
   of UDP packets in DNS.  This document is primarily applicable to DNS
   use on the global Internet.

   A path MTU that deviates from the recommended value might be
   obtained through static configuration, server routing hints, or a
   future discovery protocol.  However, addressing this falls outside
   the scope of this document and may be the subject of future
   specifications.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   "Requestor" refers to the side that sends a request.  "Responder"
   refers to an authoritative server, recursive resolver or other DNS
   component that responds to questions.  (Quoted from EDNS0 [RFC6891])

   "Path MTU" is the minimum link MTU of all the links in a path between
   a source node and a destination node.  (Quoted from [RFC8201])

   In this document, the term "Path MTU discovery" includes both
   Classical Path MTU discovery [RFC1191], [RFC8201], and Packetization
   Layer Path MTU discovery [RFC8899].

   Many of the specialized terms used in this document are defined in
   DNS Terminology [RFC8499].

3.  How to avoid IP fragmentation in DNS

   These recommendations are intended for nodes with global IP addresses
   on the Internet.  Private networks or local networks are out of the
   scope of this document.

   The methods to avoid IP fragmentation in DNS are described below:

3.1.  Recommendations for UDP responders

   R1.  UDP responders SHOULD NOT use IPv6 fragmentation [RFC8200].

   R2.  Where supported, UDP responders SHOULD set the IP "Don't
   Fragment" (DF) flag bit [RFC0791] on IPv4.

   At the time of writing, most DNS server software did not set the DF
   bit for IPv4, and kernel constraints in many operating systems make
   it difficult to set the DF bit in all cases.
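
   As one illustration of R2, the sketch below asks a Linux kernel to
   set the DF bit on datagrams sent from a UDP socket.  It assumes
   Linux; the numeric fallbacks are the Linux values of IP_MTU_DISCOVER
   and IP_PMTUDISC_DO, and try_set_df() is a hypothetical helper name.
   Other operating systems use different options, so the helper simply
   reports failure there.

```python
import socket
import sys

# Linux option values; used only if the socket module does not export them.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)


def try_set_df(sock):
    """Try to make the kernel set DF on outgoing IPv4 datagrams.

    Returns True on success and False where this is unsupported.
    """
    if not sys.platform.startswith("linux"):
        # The numeric values above are Linux-specific; do not guess.
        return False
    try:
        sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    except OSError:
        return False
    return True
```

   With this setting, a datagram larger than the known path MTU is
   rejected at send time instead of being fragmented, which is the
   situation R4 addresses.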

   R3.  UDP responders SHOULD compose response packets that fit in the
   minimum of the offered requestor's maximum UDP payload size
   [RFC6891], the interface MTU, the network MTU value configured by
   the network operators based on their knowledge of the network, and
   the RECOMMENDED maximum DNS/UDP payload size of 1400.  (See
   Appendix A for more information.)
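
   The minimum in R3 can be sketched as follows.  The function name and
   parameters are hypothetical.  The sketch assumes that MTU values
   include the IP and UDP headers, so it subtracts a minimal header
   size (20 bytes for IPv4 or 40 bytes for IPv6, plus 8 bytes for UDP,
   with no IP options or extension headers) before comparing them with
   payload sizes.  It also treats offered sizes below 512 as 512, as
   [RFC6891] requires.

```python
RECOMMENDED_MAX_PAYLOAD = 1400  # RECOMMENDED maximum DNS/UDP payload size


def max_response_payload(requestor_payload_size, interface_mtu,
                         network_mtu=None, ipv6=False):
    """Return the largest DNS/UDP payload a responder should send."""
    # Assumed minimal header overhead: IP header plus 8-byte UDP header.
    overhead = (40 if ipv6 else 20) + 8
    candidates = [
        max(requestor_payload_size, 512),  # EDNS0 floor of 512 bytes
        interface_mtu - overhead,
        RECOMMENDED_MAX_PAYLOAD,
    ]
    if network_mtu is not None:
        # Operator-configured network MTU, if one is known.
        candidates.append(network_mtu - overhead)
    return min(candidates)
```

   For example, a requestor offering 4096 on a 1500-byte Ethernet
   interface is limited by the recommended value, while a requestor
   offering 1232 is limited by its own offer.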

   R4.  If the UDP responder detects an immediate error indicating that
   the UDP packet cannot be sent beyond the path MTU size, the UDP
   responder MAY recreate the response packets so that they fit in the
   path MTU size, or send them with the TC bit set.

   The cause and effect of the TC bit are unchanged [RFC1035].
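
   One way to produce the TC-bit variant mentioned in R4 is sketched
   below.  The helper truncate_response() is hypothetical: it keeps
   only the header and question section of an already-built response,
   clears the other section counts, and sets the TC bit.  A real
   responder would normally also keep the EDNS0 OPT record, which this
   sketch omits for brevity.

```python
import struct

TC_FLAG = 0x0200  # TC bit in the 16-bit DNS header flags word [RFC1035]


def truncate_response(message):
    """Return header plus question section only, with the TC bit set."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    msg_id, flags, qdcount, _an, _ns, _ar = struct.unpack("!6H", message[:12])
    offset = 12
    for _ in range(qdcount):
        # Walk the labels of the question name.
        while True:
            length = message[offset]
            if length & 0xC0:
                offset += 2  # compression pointer ends the name
                break
            offset += 1 + length
            if length == 0:
                break  # root label ends the name
        offset += 4  # QTYPE and QCLASS
    if offset > len(message):
        raise ValueError("question section runs past end of message")
    header = struct.pack("!6H", msg_id, flags | TC_FLAG, qdcount, 0, 0, 0)
    return header + message[12:offset]
```

   The requestor then sees TC=1 and is expected to retry, typically
   over TCP.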

3.2.  Recommendations for UDP requestors

   R5.  UDP requestors SHOULD limit the requestor's maximum UDP payload
   size.  It SHOULD use a limit of 1400 bytes, but a smaller limit MAY
   be used.  (See Appendix A for more information.)
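
   The requestor's maximum UDP payload size in R5 is carried in the
   CLASS field of the EDNS0 OPT pseudo-record [RFC6891].  The following
   sketch builds a query that advertises 1400 bytes; build_query() and
   its parameters are hypothetical names, and the query sets only the
   RD flag.

```python
import struct

OPT_TYPE = 41  # RR type code of the EDNS0 OPT pseudo-record


def build_query(qname, qtype=1, udp_payload_size=1400, msg_id=0):
    """Build a DNS query with an OPT record advertising udp_payload_size."""
    # Header: ID, flags (RD set), 1 question, 0 answers, 0 authority,
    # 1 additional record (the OPT record).
    header = struct.pack("!6H", msg_id, 0x0100, 1, 0, 0, 1)
    labels = qname.rstrip(".").encode("ascii").split(b".")
    name = b"".join(bytes([len(l)]) + l for l in labels if l) + b"\x00"
    question = name + struct.pack("!2H", qtype, 1)  # QCLASS IN
    # OPT: root owner name, TYPE 41, CLASS = payload size,
    # TTL = extended RCODE/version/flags (all zero), empty RDATA.
    opt = struct.pack("!BHHIH", 0, OPT_TYPE, udp_payload_size, 0, 0)
    return header + question + opt
```

   A smaller limit, as permitted by R5, is selected by passing a
   different udp_payload_size.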

   R6.  UDP requestors SHOULD drop fragmented DNS/UDP responses without
   IP reassembly to avoid cache poisoning attacks.
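
   R6 requires recognizing a fragment before reassembly.  Where that
   check is made (packet filter, kernel, or a raw-socket receiver) is
   deployment specific; the sketch below only shows the IPv4 header
   test itself.  The helper name is hypothetical, and IPv6, where
   fragmentation is signalled by a Fragment extension header, is not
   covered.

```python
import struct

MORE_FRAGMENTS = 0x2000        # MF flag in the flags/offset word [RFC0791]
FRAGMENT_OFFSET_MASK = 0x1FFF  # low 13 bits hold the fragment offset


def is_ipv4_fragment(ip_header):
    """Return True if the IPv4 header belongs to a fragment."""
    if len(ip_header) < 20 or ip_header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    (flags_and_offset,) = struct.unpack("!H", ip_header[6:8])
    more_fragments = bool(flags_and_offset & MORE_FRAGMENTS)
    fragment_offset = flags_and_offset & FRAGMENT_OFFSET_MASK
    # First fragments have MF set; later ones have a non-zero offset.
    return more_fragments or fragment_offset != 0
```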

   R7.  DNS responses may be dropped because of IP fragmentation.  Upon
   a timeout, to avoid resolution failures, UDP requestors SHOULD retry
   using TCP or UDP with a smaller EDNS requestor's maximum UDP payload
   size per local policy.  UDP requestors SHOULD observe [RFC8961] in
   setting their timeout.
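
   The retry behavior in R7 depends on local policy.  The sketch below
   shows one possible policy: try UDP with 1400, then UDP with a
   smaller size, then TCP.  The function, the callables it takes, and
   the second size (1232) are assumptions chosen for illustration, and
   timeout selection per [RFC8961] is left to the callables.

```python
import socket


def query_with_fallback(send_udp, send_tcp, udp_payload_sizes=(1400, 1232)):
    """Try UDP with decreasing payload sizes, then fall back to TCP.

    send_udp(size) and send_tcp() are caller-supplied callables that
    return a response or raise a timeout exception.
    """
    for size in udp_payload_sizes:
        try:
            return send_udp(size)
        except (TimeoutError, socket.timeout):
            # The response may have been dropped; try the next option.
            continue
    return send_tcp()
```

   A policy that goes straight to TCP after the first timeout is
   obtained by passing a single-element udp_payload_sizes.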

4.  Recommendations for DNS operators

   Large DNS responses are typically the result of zone configuration.
   People who publish information in the DNS SHOULD seek configurations
   that result in small responses.  For example,

   R8.  Use a smaller number of name servers.

   R9.  Use a smaller number of A/AAAA RRs for a domain name.

   R10.  Use minimal-responses configuration: Some implementations have
   a 'minimal responses' configuration option that causes DNS servers to
   make response packets smaller, containing only mandatory and required
   data (Appendix B).

   R11.  Use a smaller signature / public key size algorithm for DNSSEC.
   Notably, the signature sizes of ECDSA and EdDSA are smaller than
   those of equivalent cryptographic strength using RSA.

   It is difficult to determine a specific upper limit for R8, R9, and
   R11, but it is sufficient if all responses from the DNS servers are
   below the size limits given in R3 and R5.
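
   The effect of R11 on response size can be estimated with a small
   calculation.  The sketch assumes an uncompressed owner name, the
   10-byte fixed RR fields (TYPE, CLASS, TTL, RDLENGTH), the 18-byte
   fixed part of the RRSIG RDATA, and typical signature lengths of 256
   bytes for 2048-bit RSA and 64 bytes for ECDSA P-256 or Ed25519.  The
   helper name is hypothetical.

```python
RR_FIXED_FIELDS = 10     # TYPE (2) + CLASS (2) + TTL (4) + RDLENGTH (2)
RRSIG_FIXED_RDATA = 18   # RRSIG RDATA fields before the signer's name

SIGNATURE_BYTES = {
    "RSA-2048": 256,
    "ECDSA-P256": 64,
    "Ed25519": 64,
}


def rrsig_rr_size(owner_name_len, signer_name_len, algorithm):
    """Estimate the wire size of one RRSIG resource record."""
    return (owner_name_len + RR_FIXED_FIELDS + RRSIG_FIXED_RDATA
            + signer_name_len + SIGNATURE_BYTES[algorithm])
```

   For "example.com." (13 bytes on the wire) as both owner and signer,
   this gives 310 bytes per RRSIG with 2048-bit RSA and 118 bytes with
   ECDSA P-256, a difference that adds up quickly in responses carrying
   several signatures.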

5.  Protocol compliance considerations

   Some authoritative servers deviate from the DNS standard as follows:

   *  Some authoritative servers ignore the EDNS0 requestor's maximum
      UDP payload size and return large UDP responses.  [Fujiwara2018]

   *  Some authoritative servers do not support TCP transport.

   Such non-compliant behavior cannot become implementation or
   configuration constraints for the rest of the DNS.  If failure is the
   result, then that failure must be localized to the non-compliant
   servers.

6.  IANA Considerations

   This document requests no IANA actions.

7.  Security Considerations

7.1.  On-path fragmentation on IPv4

   If the Don't Fragment (DF) bit is not set, on-path fragmentation may
   happen on IPv4, and the fragmented response is then vulnerable to
   the attacks described in Section 7.3.  To avoid this, recommendation
   R6 SHOULD be used to discard the fragmented responses and retry
   using TCP.

7.2.  Small MTU network

   When avoiding fragmentation, a DNS/UDP requestor behind a small MTU
   network may experience UDP timeouts, which would reduce performance
   and which may lead to TCP fallback.  This would indicate prior
   reliance upon IP fragmentation, which is considered to be harmful to
   both the performance and stability of applications, endpoints, and
   gateways.  Avoiding IP fragmentation will improve operating
   conditions overall, and the performance of DNS/TCP has increased and
   will continue to increase.

   If a UDP response packet is dropped in transit, up to and including
   the network stack of the initiator, it increases the attack window
   for poisoning the requestor's cache.

7.3.  Weaknesses of IP fragmentation

   "Fragmentation Considered Poisonous" [Herzberg2013] proposed
   effective off-path DNS cache poisoning attack vectors using IP
   fragmentation.  "IP fragmentation attack on DNS" [Hlavacek2013] and
   "Domain Validation++ For MitM-Resilient PKI" [Brandt2018] showed
   that off-path attackers can interfere with path MTU discovery
   [RFC1191] to induce intentionally fragmented responses from
   authoritative servers.  [RFC7739] describes the security implications
   of predictable fragment identification values.

   Section 3.2 (Message Size Guidelines) of UDP Usage Guidelines
   [RFC8085] states that an application SHOULD NOT send UDP datagrams
   that result in IP packets that exceed the Maximum Transmission Unit
   (MTU) along the path to the destination.

   A DNS message receiver cannot trust fragmented UDP datagrams
   primarily due to the small amount of entropy provided by UDP port
   numbers and DNS message identifiers, each of which is only 16 bits
   in size, and both of which are likely to be in the first fragment of
   a packet if fragmentation occurs.  By comparison, the TCP protocol
   stack controls
   packet size and avoids IP fragmentation under ICMP NEEDFRAG attacks.
   In TCP, fragmentation should be avoided for performance reasons,
   whereas for UDP, fragmentation should be avoided for resiliency and
   authenticity reasons.
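
   The entropy argument can be made concrete with a short calculation.
   It assumes an off-path attacker who spoofs a later fragment, which
   carries neither the UDP port nor the DNS message identifier, and who
   therefore only has to match the 16-bit IPv4 Identification field
   [RFC0791].  Predictable identification values [RFC7739] reduce the
   effort further.

```python
PORT_BITS = 16      # UDP source port
DNS_ID_BITS = 16    # DNS message identifier
IPV4_ID_BITS = 16   # IPv4 Identification field (assumed attack target)

# Values an off-path attacker must guess to spoof an unfragmented response.
unfragmented_guesses = 2 ** (PORT_BITS + DNS_ID_BITS)

# Values to guess when spoofing only a later fragment.
later_fragment_guesses = 2 ** IPV4_ID_BITS

reduction_factor = unfragmented_guesses // later_fragment_guesses
```

   Under these assumptions the search space shrinks from 4294967296 to
   65536 values, a factor of 65536.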

7.4.  DNS Security Protections

   DNSSEC is a countermeasure against cache poisoning attacks that use
   IP fragmentation.  However, DNS delegation responses are not signed
   with DNSSEC, and DNSSEC does not have a mechanism to get the correct
   response if an incorrect delegation is injected.  This is a denial-
   of-service vulnerability that can yield failed name resolutions.  If
   cache poisoning attacks can be avoided, DNSSEC validation failures
   will be avoided.

8.  Acknowledgments

   The authors would like to specifically thank Paul Wouters, Mukund
   Sivaraman, Tony Finch, Hugo Salgado, Peter van Dijk, Brian Dickson,
   Puneet Sood, Jim Reid, Petr Spacek, Andrew McConachie, Joe Abley,
   Daisuke Higashi, Joe Touch and Wouter Wijngaards for extensive review
   and comments.

9.  References

9.1.  Normative References

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              DOI 10.17487/RFC0791, September 1981,
              <https://www.rfc-editor.org/rfc/rfc791>.

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, DOI 10.17487/RFC1035,
              November 1987, <https://www.rfc-editor.org/rfc/rfc1035>.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              DOI 10.17487/RFC1191, November 1990,
              <https://www.rfc-editor.org/rfc/rfc1191>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/rfc/rfc2119>.

Fujiwara & Vixie        Expires 1 September 2024                [Page 7]
Internet-Draft             avoid-fragmentation             February 2024

   [RFC2931]  Eastlake 3rd, D., "DNS Request and Transaction Signatures
              ( SIG(0)s )", RFC 2931, DOI 10.17487/RFC2931, September
              2000, <https://www.rfc-editor.org/rfc/rfc2931>.

   [RFC6891]  Damas, J., Graff, M., and P. Vixie, "Extension Mechanisms
              for DNS (EDNS(0))", STD 75, RFC 6891,
              DOI 10.17487/RFC6891, April 2013,
              <https://www.rfc-editor.org/rfc/rfc6891>.

   [RFC7739]  Gont, F., "Security Implications of Predictable Fragment
              Identification Values", RFC 7739, DOI 10.17487/RFC7739,
              February 2016, <https://www.rfc-editor.org/rfc/rfc7739>.

   [RFC7766]  Dickinson, J., Dickinson, S., Bellis, R., Mankin, A., and
              D. Wessels, "DNS Transport over TCP - Implementation
              Requirements", RFC 7766, DOI 10.17487/RFC7766, March 2016,
              <https://www.rfc-editor.org/rfc/rfc7766>.

   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
              Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
              March 2017, <https://www.rfc-editor.org/rfc/rfc8085>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", STD 86, RFC 8200,
              DOI 10.17487/RFC8200, July 2017,
              <https://www.rfc-editor.org/rfc/rfc8200>.

   [RFC8201]  McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed.,
              "Path MTU Discovery for IP version 6", STD 87, RFC 8201,
              DOI 10.17487/RFC8201, July 2017,
              <https://www.rfc-editor.org/rfc/rfc8201>.

   [RFC8499]  Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS
              Terminology", BCP 219, RFC 8499, DOI 10.17487/RFC8499,
              January 2019, <https://www.rfc-editor.org/rfc/rfc8499>.

   [RFC8899]  Fairhurst, G., Jones, T., Tüxen, M., Rüngeler, I., and T.
              Völker, "Packetization Layer Path MTU Discovery for
              Datagram Transports", RFC 8899, DOI 10.17487/RFC8899,
              September 2020, <https://www.rfc-editor.org/rfc/rfc8899>.

   [RFC8945]  Dupont, F., Morris, S., Vixie, P., Eastlake 3rd, D.,
              Gudmundsson, O., and B. Wellington, "Secret Key
              Transaction Authentication for DNS (TSIG)", STD 93,
              RFC 8945, DOI 10.17487/RFC8945, November 2020,
              <https://www.rfc-editor.org/rfc/rfc8945>.

   [RFC8961]  Allman, M., "Requirements for Time-Based Loss Detection",
              BCP 233, RFC 8961, DOI 10.17487/RFC8961, November 2020,
              <https://www.rfc-editor.org/rfc/rfc8961>.

9.2.  Informative References

   [Brandt2018]
              Brandt, M., Dai, T., Klein, A., Shulman, H., and M.
              Waidner, "Domain Validation++ For MitM-Resilient PKI",
              Proceedings of the 2018 ACM SIGSAC Conference on Computer
              and Communications Security, 2018.

   [DNSFlagDay2020]
              "DNS flag day 2020", n.d., <https://dnsflagday.net/2020/>.

   [Fujiwara2018]
              Fujiwara, K., "Measures against cache poisoning attacks
              using IP fragmentation in DNS", OARC 30 Workshop, 2019.

   [Herzberg2013]
              Herzberg, A. and H. Shulman, "Fragmentation Considered
              Poisonous", IEEE Conference on Communications and Network
              Security, 2013.

   [Hlavacek2013]
              Hlavacek, T., "IP fragmentation attack on DNS", RIPE 67
              Meeting, 2013, <https://ripe67.ripe.net/
              presentations/240-ipfragattack.pdf>.

   [Huston2021]
              Huston, G. and J. Damas, "Measuring DNS Flag Day 2020",
              OARC 34 Workshop, February 2021.

   [RFC2308]  Andrews, M., "Negative Caching of DNS Queries (DNS
              NCACHE)", RFC 2308, DOI 10.17487/RFC2308, March 1998,
              <https://www.rfc-editor.org/rfc/rfc2308>.

   [RFC2782]  Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for
              specifying the location of services (DNS SRV)", RFC 2782,
              DOI 10.17487/RFC2782, February 2000,
              <https://www.rfc-editor.org/rfc/rfc2782>.

   [RFC4035]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
              Rose, "Protocol Modifications for the DNS Security
              Extensions", RFC 4035, DOI 10.17487/RFC4035, March 2005,
              <https://www.rfc-editor.org/rfc/rfc4035>.

   [RFC5155]  Laurie, B., Sisson, G., Arends, R., and D. Blacka, "DNS
              Security (DNSSEC) Hashed Authenticated Denial of
              Existence", RFC 5155, DOI 10.17487/RFC5155, March 2008,
              <https://www.rfc-editor.org/rfc/rfc5155>.

   [RFC8900]  Bonica, R., Baker, F., Huston, G., Hinden, R., Troan, O.,
              and F. Gont, "IP Fragmentation Considered Fragile",
              BCP 230, RFC 8900, DOI 10.17487/RFC8900, September 2020,
              <https://www.rfc-editor.org/rfc/rfc8900>.

   [RFC9460]  Schwartz, B., Bishop, M., and E. Nygren, "Service Binding
              and Parameter Specification via the DNS (SVCB and HTTPS
              Resource Records)", RFC 9460, DOI 10.17487/RFC9460,
              November 2023, <https://www.rfc-editor.org/rfc/rfc9460>.

   [RFC9471]  Andrews, M., Huque, S., Wouters, P., and D. Wessels, "DNS
              Glue Requirements in Referral Responses", RFC 9471,
              DOI 10.17487/RFC9471, September 2023,
              <https://www.rfc-editor.org/rfc/rfc9471>.

Appendix A.  Details of requestor's maximum UDP payload size discussions

   There have been many discussions about the default path MTU size and
   the requestor's maximum UDP payload size.

   *  The minimum MTU for an IPv6 interface is 1280 octets (see
      Section 5 of [RFC8200]).  It can therefore be used as the default
      path MTU value for IPv6.  The corresponding minimum MTU for an
      IPv4 interface is 68 octets (60 + 8) [RFC0791].

   *  [RFC4035] defines that "A security-aware name server MUST support
      the EDNS0 message size extension, MUST support a message size of
      at least 1220 octets".  Thus, the smallest permitted value of the
      maximum DNS/UDP payload size is 1220 octets.

   *  In order to avoid IP fragmentation, [DNSFlagDay2020] proposed that
      the UDP requestors set the requestor's payload size to 1232, and
      the UDP responders compose UDP responses so they fit in 1232
      octets.  The size 1232 is based on an MTU of 1280, which is
      required by the IPv6 specification [RFC8200], minus 48 octets for
      the IPv6 and UDP headers.

   *  Most of the Internet, and especially the inner core, has an MTU of
      at least 1500 octets.  The maximum DNS/UDP payload size for IPv6
      on MTU 1500 Ethernet is 1452 octets (1500 minus 40 (IPv6 header
      size) minus 8 (UDP header size)).  To allow for possible IP
      options and distant tunnel overhead, the recommended default
      maximum DNS/UDP payload size is 1400 octets.

   *  [Huston2021] analyzed the results of [DNSFlagDay2020] and reported
      that, in the interior of the Internet between recursive resolvers
      and authoritative servers, the prevailing MTU is 1500 octets and
      there is no measurable signal of smaller MTUs in use.  Based on
      those measurements, they proposed setting the EDNS0 requestor's
      UDP payload size to 1472 octets for IPv4 and 1452 octets for IPv6.

   As a result of these discussions, this document recommends a value
   of 1400, with smaller values also allowed.
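
   The header arithmetic behind the values above can be checked with a
   short sketch.  The function name and constants are illustrative
   only; the header sizes assume an IPv4 header without options and an
   IPv6 header without extension headers.

```python
IPV4_HEADER = 20  # octets, IPv4 header without options
IPV6_HEADER = 40  # octets, fixed IPv6 header without extension headers
UDP_HEADER = 8    # octets


def max_dns_udp_payload(mtu: int, ip_version: int) -> int:
    """Largest DNS message that fits in one unfragmented UDP datagram."""
    if ip_version == 4:
        ip_header = IPV4_HEADER
    elif ip_version == 6:
        ip_header = IPV6_HEADER
    else:
        raise ValueError("ip_version must be 4 or 6")
    return mtu - ip_header - UDP_HEADER


print(max_dns_udp_payload(1280, 6))  # 1232, the [DNSFlagDay2020] value
print(max_dns_udp_payload(1500, 6))  # 1452
print(max_dns_udp_payload(1500, 4))  # 1472
```

   With these numbers, a value of 1400 leaves 52 octets of headroom
   below the IPv6 maximum on a path with an MTU of 1500.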

Appendix B.  Minimal-responses

   Some implementations have a "minimal responses" configuration
   setting/option that causes a DNS server to make response packets
   smaller, containing only the data that is required.

   Under the minimal-responses configuration, a DNS server composes
   responses containing only necessary RRs.  For delegations, see
   [RFC9471].  In case of a non-existent domain name or non-existent
   type, the answer section is empty and the authority section contains
   an SOA record, as defined in Section 2 of [RFC2308].

   Some resource records (MX, SRV, SVCB, HTTPS) require additional A,
   AAAA, and SVCB records in the Additional Section, as defined in
   [RFC1035], [RFC2782], and [RFC9460].

   In addition, if the zone is DNSSEC signed and a query has the DNSSEC
   OK bit, signatures are added in the answer section, or the
   corresponding DS RRSet and signatures are added in the authority
   section.  Details are defined in [RFC4035] and [RFC5155].

Appendix C.  Known Implementations

   Editor note: RFC Editor, please remove this entire section.

   This section records the status of known implementations of the
   best practices defined by this specification at the time of
   publication, and any deviation from the specification.

   Please note that the listing of any individual implementation here
   does not imply endorsement by the IETF.  Furthermore, no effort has
   been spent to verify the information presented here that was supplied
   by IETF contributors.

C.1.  BIND 9

   BIND 9 does not implement recommendations 1 and 2 in Section 3.1.

   BIND 9 on Linux sets IP_MTU_DISCOVER to IP_PMTUDISC_OMIT with a
   fallback to IP_PMTUDISC_DONT.

   On systems with IP_DONTFRAG (such as FreeBSD), BIND 9 disables
   IP_DONTFRAG.

   Accepting Path MTU Discovery for UDP is considered harmful and
   dangerous.  BIND 9's settings avoid attacks on path MTU discovery.
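
   The "OMIT with a fallback to DONT" pattern can be sketched as
   follows.  This is a Linux-only illustration, not BIND 9's code: the
   function name is hypothetical, and the numeric fallbacks are the
   values from Linux's <linux/in.h>, used on the assumption that the
   socket module may not export these names.

```python
import socket
from typing import Optional

# Linux constants; the numeric fallbacks are an assumption of this sketch
# and are not valid on other operating systems.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DONT = getattr(socket, "IP_PMTUDISC_DONT", 0)
IP_PMTUDISC_OMIT = getattr(socket, "IP_PMTUDISC_OMIT", 5)


def ignore_path_mtu(sock) -> Optional[str]:
    """Try IP_PMTUDISC_OMIT first, then IP_PMTUDISC_DONT.

    Returns the name of the mode that was accepted, or None if the
    kernel rejected both.
    """
    candidates = (
        ("IP_PMTUDISC_OMIT", IP_PMTUDISC_OMIT),
        ("IP_PMTUDISC_DONT", IP_PMTUDISC_DONT),
    )
    for name, value in candidates:
        try:
            sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, value)
            return name
        except OSError:
            continue  # mode not supported here; try the next one
    return None
```

   On Linux, a caller would pass a UDP socket created with
   socket.socket(socket.AF_INET, socket.SOCK_DGRAM) before binding it.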

   For recommendation 3, BIND 9 will honor the requestor's size up to
   the configured limit (max-udp-size).  The UDP response size is
   bounded between 512 and 4096 bytes, with the default set to 1232.
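
   The clamping described above might look like the following sketch.
   It is not BIND 9's implementation; the function and parameter names
   are hypothetical.  It assumes that a query without an OPT record is
   limited to 512 octets [RFC1035] and that advertised sizes below 512
   are treated as 512 [RFC6891].

```python
from typing import Optional

MIN_UDP_SIZE = 512   # lower bound stated in the text
MAX_UDP_SIZE = 4096  # upper bound stated in the text


def response_size_limit(requestor_size: Optional[int],
                        max_udp_size: int = 1232) -> int:
    """Upper bound on the size of a UDP response, in octets.

    requestor_size is the EDNS0 payload size from the query, or None
    when the query carried no OPT record.
    """
    # Keep the configured limit inside the 512..4096 range.
    configured = min(max(max_udp_size, MIN_UDP_SIZE), MAX_UDP_SIZE)
    if requestor_size is None:
        return MIN_UDP_SIZE
    # Honor the requestor's size, but never exceed the configured limit.
    return min(max(requestor_size, MIN_UDP_SIZE), configured)


print(response_size_limit(4096))  # 1232
print(response_size_limit(1000))  # 1000
```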

   For recommendation 4, if the send fails with EMSGSIZE, BIND 9 sets
   the TC bit and tries to send a minimal answer again.
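
   A minimal sketch of this retry, under stated assumptions: the
   function names are hypothetical, question_end is an offset the
   caller already knows from parsing the query, and the "minimal
   answer" is simplified here to the header and question section only
   (a real server would typically also keep the OPT record).

```python
import errno
import struct

TC_FLAG = 0x0200  # TC bit within the 16-bit DNS header flags field


def truncated_reply(response: bytes, question_end: int) -> bytes:
    """Return header plus question only, with TC set and no other RRs."""
    msg_id, flags, qdcount = struct.unpack("!HHH", response[:6])
    header = struct.pack("!HHHHHH", msg_id, flags | TC_FLAG, qdcount, 0, 0, 0)
    return header + response[12:question_end]


def send_reply(sock, response: bytes, question_end: int, addr) -> bool:
    """Send response; on EMSGSIZE, resend a truncated reply instead.

    Returns True if the truncated reply was sent.
    """
    try:
        sock.sendto(response, addr)
        return False
    except OSError as exc:
        if exc.errno != errno.EMSGSIZE:
            raise  # unrelated failure; let the caller handle it
    sock.sendto(truncated_reply(response, question_end), addr)
    return True
```

   A requestor that receives the reply with the TC bit set is expected
   to retry the query over TCP.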

   For recommendation 1 of Section 3.2, BIND 9 uses the edns-udp-size
   option, with a default of 1232.

   BIND 9 does implement recommendation 2 of Section 3.2.

   For recommendation 3 of Section 3.2, after two UDP timeouts, BIND 9
   will fall back to TCP.
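
   The timeout-driven fallback can be sketched as below.  This is an
   illustration of the behavior, not BIND 9's code: send_udp and
   send_tcp are hypothetical callables supplied by the caller, and
   send_udp is assumed to raise socket.timeout when no response
   arrives in time.

```python
import socket


def query_with_tcp_fallback(query: bytes, send_udp, send_tcp,
                            udp_attempts: int = 2) -> bytes:
    """Try UDP up to udp_attempts times, then fall back to TCP."""
    for _ in range(udp_attempts):
        try:
            return send_udp(query)
        except socket.timeout:
            continue  # no answer in time; retry or fall through to TCP
    return send_tcp(query)
```

   The default of two attempts mirrors the count given above; other
   implementations may use a different threshold.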

C.2.  Knot DNS and Knot Resolver

   Both Knot servers set IP_PMTUDISC_OMIT to avoid path MTU spoofing.
   The UDP size limit is 1232 by default.

   Fragments are ignored if they arrive over an XDP interface.

   TCP is attempted after repeated UDP timeouts.

   Minimal responses are returned and are currently not configurable.

   Smaller signatures are used, with ecdsap256sha256 as the default.

C.3.  PowerDNS Authoritative Server, PowerDNS Recursor, PowerDNS dnsdist

   *  IP_PMTUDISC_OMIT with fallback to IP_PMTUDISC_DONT

   *  default EDNS buffer size of 1232, no probing for smaller sizes

   *  no handling of EMSGSIZE

   *  Recursor: UDP timeouts do not cause a switch to TCP.  "Spoofing
      nearmisses" do.

C.4.  PowerDNS Authoritative Server

   *  the default DNSSEC algorithm is 13

   *  responses are minimal; this is not configurable

C.5.  Unbound

   Unbound sets IP_MTU_DISCOVER to IP_PMTUDISC_OMIT with fallback to
   IP_PMTUDISC_DONT.  It also disables IP_DONTFRAG on systems that have
   it, but not on Apple systems.  On systems that support it, Unbound
   sets IPV6_USE_MIN_MTU, with a fallback to IPV6_MTU at 1280, with a
   fallback to IPV6_USER_MTU.  It also sets IPV6_MTU_DISCOVER to
   IPV6_PMTUDISC_OMIT with a fallback to IPV6_PMTUDISC_DONT.

   Unbound requests UDP size 1232 from peers, by default.  The
   requestor's size is limited to a maximum of 1232.

   After some timeouts, Unbound retries with a smaller size (1232 for
   IPv6 and 1472 for IPv4) if that is smaller than the current size.
   Since the flag day change of the default to 1232, this retry has no
   effect.

   Unbound has minimal responses as an option, default on.

Authors' Addresses

   Kazunori Fujiwara
   Japan Registry Services Co., Ltd.
   Chiyoda First Bldg. East 13F, 3-8-1 Nishi-Kanda, Chiyoda-ku, Tokyo
   101-0065
   Japan
   Phone: +81 3 5215 8451
   Email: fujiwara@jprs.co.jp

   Paul Vixie
   AWS Security
   11400 La Honda Road
   Woodside, CA,  94062
   United States of America
   Phone: +1 650 393 3994
   Email: paul@redbarn.org