Skip to main content

FlowQueue-Codel
draft-ietf-aqm-fq-codel-03

The information below is for an old version of the document.
Document Type
This is an older version of an Internet-Draft that was ultimately published as RFC 8290.
Authors Toke Høiland-Jørgensen , Paul McKenney , dave.taht@gmail.com , Jim Gettys , Eric Dumazet
Last updated 2015-11-20
RFC stream Internet Engineering Task Force (IETF)
Formats
Reviews
Additional resources Mailing list discussion
Stream WG state WG Document
Waiting for Referenced Document
Document shepherd Wesley Eddy
IESG IESG state Became RFC 8290 (Experimental)
Consensus boilerplate Unknown
Telechat date (None)
Responsible AD (None)
Send notices to (None)
draft-ietf-aqm-fq-codel-03
Operations Area Working Group                                   F. Baker
Internet-Draft                                             Cisco Systems
Intended status: BCP                                       June 12, 2012
Expires: December 14, 2012

                   On Firewalls in Internet Security
                     draft-ietf-opsawg-firewalls-00

Abstract

   There is an ongoing discussion regarding the place of firewalls in
   security.  This note is intended to capture and try to make sense out
   of it.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

5.3.  Probability of hash collisions

   Since the Linux FQ-CoDel implementation by default uses 1024 hash
   buckets, the probability that (say) 100 flows will all hash to the
   same bucket is something like ten to the power of minus 300.  Thus,
   the probability that at least one of the flows will hash to some
   other queue is very high indeed.

   Expanding on this, based on analytical equations for hash collision
   probabilities, for 100 flows, the probability of no collision is
   90.78%; the probability that no more than two of the 100 flows will
   be involved in any given collision = 99.57%; and the probability that
   no more than three of the 100 flows will be involved in any given
   collision = 99.99%.

   These probabilities can be improved upon by using set-associative
   hashing, a technique used in the Cake algorithm currently being
   developed as a further development upon the FQ-CoDel principles.  For
   a 4-way associative hash with the same number of total queues, the
   probability of no collisions for 100 flows is 99.93%, while for an
   8-way associative hash it is ~100%.

5.4.  Memory Overhead

   FQ-CoDel can be implemented with a very low memory footprint (less
   than 64 bytes per queue on 64 bit systems).  These are the data
   structures used in the Linux implementation:

   struct codel_vars {
      u32             count;
      u32             lastcount;
      bool            dropping;
      u16             rec_inv_sqrt;
      codel_time_t    first_above_time;
      codel_time_t    drop_next;
      codel_time_t    ldelay;
   };

   struct fq_codel_flow {
      struct sk_buff    *head;
      struct sk_buff    *tail;
      struct list_head  flowchain;
      int               deficit;
      u32               dropped; /* # of drops (or ECN marks) on flow */
      struct codel_vars cvars;
   };

   The master table managing all queues looks like this:

Hoeiland-Joergensen, et alExpires May 23, 2016                 [Page 12]
Internet-Draft                  fq-codel                   November 2015

   struct fq_codel_sched_data {
      struct tcf_proto *filter_list;  /* optional external classifier */
      struct fq_codel_flow *flows;    /* Flows table [flows_cnt] */
      u32             *backlogs;      /* backlog table [flows_cnt] */
      u32             flows_cnt;      /* number of flows */
      u32             perturbation;   /* hash perturbation */
      u32             quantum;        /* psched_mtu(qdisc_dev(sch)); */
      struct codel_params cparams;
      struct codel_stats cstats;
      u32             drop_overlimit;
      u32             new_flow_count;

      struct list_head new_flows;     /* list of new flows */
      struct list_head old_flows;     /* list of old flows */
   };

5.5.  Per-Packet Timestamping

   The CoDel portion of the algorithm requires per-packet timestamps be
   stored along with the packet.  While this approach works well for
   software-based routers, it may be impossible to retrofit devices that
   do most of their processing in silicon and lack space or mechanism
   for timestamping.

   Also, while perfect resolution is not needed, timestamp resolution
   below the target is necessary.  Furthermore, timestamping functions
   in the core OS need to be efficient as they are called at least once
   on each packet enqueue and dequeue.

5.6.  Other forms of "Fair Queueing"

   Much of the scheduling portion of FQ-CoDel is derived from DRR and is
   substantially similar to DRR++. SFQ-based versions have also been
   produced and tested in ns2.  Other forms of Fair Queueing, such as
   WFQ or QFQ, have not been thoroughly explored, but there's no a
   priori reason why the round-robin scheduling of FQ-CoDel couldn't be
   replaced with something else.

5.7.  Differences between CoDel and FQ-CoDel behaviour

   CoDel can be applied to a single queue system as a straight AQM,
   where it converges towards an "ideal" drop rate (i.e. one that
   minimises delay while keeping a high link utilisation), and then
   optimises around that control point.

   The scheduling of FQ-CoDel mixes packets of competing flows, which
   acts to pace bursty flows to better fill the pipe.  Additionally, a
   new flow gets substantial "credit" over other flows until CoDel finds

Hoeiland-Joergensen, et alExpires May 23, 2016                 [Page 13]
Internet-Draft                  fq-codel                   November 2015

   an ideal drop rate for it.  However, for a new flow that exceeds the
   configured quantum, more time passes before all of its data is
   delivered (as packets from it, too, are mixed across the other
   existing queue-building flows).  Thus, FQ-CoDel takes longer (as
   measured in time) to converge towards an ideal drop rate for a given
   new flow, but does so within fewer delivered _packets_ from that
   flow.

   Finally, the flow isolation FQ-CoDel provides means that the CoDel
   drop mechanism operates on the flows actually building queues, which
   results in packets being dropped more accurately from the largest
   flows than CoDel alone manages.  Additionally, flow isolation
   radically improves the transient behaviour of the network when
   traffic or link characteristics change (e.g. when new flows start up
   or the link bandwidth changes); while CoDel itself can take a while
   to respond, fq_codel doesn't miss a beat.

6.  Limitations of flow queueing

   While FQ-CoDel has been shown in many scenarios to offer significant
   performance gains, there are some scenarios where the scheduling
   algorithm in particular is not a good fit.  This section documents
   some of the known cases which either may require tweaking the default
   behaviour, or where alternatives to flow queueing should be
   considered.

6.1.  Fairness between things other than flows

   In some parts of the network, enforcing flow-level fairness may not
   be desirable, or some other form of fairness may be more important.
   An example of this can be an Internet Service Provider that may be
   more interested in ensuring fairness between customers than between
   flows.  Or a hosting or transit provider that wishes to ensure
   fairness between connecting Autonomous Systems or networks.  Another
   issue can be that the number of simultaneous flows experienced at a
   particular link can be too high for flow-based fairness queueing to
   be effective.

   Whatever the reason, in a scenario where fairness between flows is
   not desirable, reconfiguring FQ-CoDel to match on a different
   characteristic can be a way forward.  The implementation in Linux can
   leverage the packet matching mechanism of the _tc_ subsystem to use
   any available packet field to partition packets into virtual queues,
   to for instance match on address or subnet source/destination pairs,
   application layer characteristics, etc.

   Furthermore, as commonly deployed today, FQ-CoDel is used with three
   or more tiers of classification: priority, best effort and

Hoeiland-Joergensen, et alExpires May 23, 2016                 [Page 14]
Internet-Draft                  fq-codel                   November 2015

   background, based on diffserv markings.  Some products do more
   detailed classification, including deep packet inspection and
   destination-specific filters to achieve their desired result.

6.2.  Flow bunching by opaque encapsulation

   Where possible, FQ-CoDel will attempt to decapsulate packets before
   matching on the header fields for the flow hashing.  However, for
   some encapsulation techniques, most notably encrypted VPNs, this is
   not possible.  If several flows are bunched into one such
   encapsulated tunnel, they will be seen as one flow by the FQ-CoDel
   algorithm.  This means that they will share a queue, and drop
   behaviour, and so flows inside the encapsulation will not benefit
   from the implicit prioritisation of FQ-CoDel, but will continue to
   benefit from the reduced overall queue length from the CoDel
   algorithm operating on the queue.  In addition, when such an
   encapsulated bunch competes against other flows, it will count as one
   flow, and not assigned a share of the bandwidth based on how many
   flows are inside the encapsulation.

   Depending on the application, this may or may not be desirable
   behaviour.  In cases where it is not, changing FQ-CoDel's matching to
   not be flow-based (as detailed in the previous subsection above) can
   be a mitigation.  Going forward, having some mechanism for opaque
   encapsulations to express to the outer layer which flow a packet
   belongs to, could be a way to mitigate this.

6.3.  Low-priority congestion control algorithms

   In the presence of queue management schemes that contain latency
   under load, low-priority congestion control algorithms such as LEDBAT
   [RFC6817] (or, in general, algorithms that try to voluntarily use up
   less than their fair share of bandwidth) experiences very little
   added latency when the link is congested.  Thus, they lack the signal
   to back off that added latency previously afforded them.  This effect
   is seen with FQ-CoDel as well as with any effective AQM [GONG2014].

   As such, these delay-based algorithms tend to revert to loss-based
   congestion control, and will consume the fair share of bandwidth
   afforded to them by the FQ-CoDel scheduler.  However, low-priority
   congestion control mechanisms may be able to take steps to continue
   to be low priority, for instance by taking into account the vastly
   reduced level of delay afforded by an AQM, or by using a coupled
   approach to observing the behaviour of multiple flows.

Hoeiland-Joergensen, et alExpires May 23, 2016                 [Page 15]
Internet-Draft                  fq-codel                   November 2015

7.  Deployment status and future work

   The FQ-CoDel algorithm as described in this document has been shipped
   as part of the Linux kernel since version 3.5, released on the 21st
   of July, 2012, with the ce_threshold being added in version 4.2.  The
   algorithm has seen widespread testing in a variety of contexts and is
   configured as the default queueing discipline in a number of mainline
   Linux distributions (as of this writing at least OpenWRT, Arch Linux
   and Fedora).  We believe it to be a safe default and encourage people
   running Linux to turn it on: It is a massive improvement over the
   previous default FIFO queue.

   Of course there is always room for improvement, and this document has
   listed some of the know limitations of the algorithm.  As such, we
   encourage further research into algorithm refinements and addressing
   of limitations.  One such effort is undertaken by the bufferbloat
   community in the form of the Cake [1] queue management scheme.  In
   addition to this we believe the following (non-exhaustive) list of
   issues to be worthy of further enquiry:

   o  Variations on the flow classification mechanism to fit different
      notions of flows.  For instance, an ISP might want to deploy per-
      subscriber scheduling, while in other cases several flows can
      share a 5-tuple, as exemplified by the RTCWEB QoS recommendations
      [RTCWEB-QOS].

   o  Interactions between flow queueing and delay-based congestion
      control algorithms and scavenger protocols.

   o  Other scheduling mechanisms to replace the DRR portion of the
      algorithm, e.g.  QFQ or WFQ.

   o  Sensitivity of parameters, most notably the number of queues and
      the CoDel parameters.

8.  Security Considerations

   There are no specific security exposures associated with FQ-CoDel
   that are not also present in current FIFO systems.  And some are in
   fact reduced (e.g. simple minded packet floods).  However, some care
   is needed in the implementation to ensure this is the case.  These
   are included in the description above, however we reiterate them
   here:

   o  To prevent packets in the new queues from starving old queues, it
      is important that when a queue on the list of new queues empties,
      it is moved to _the end of_ the list of old queues.  This is
      described at the end of Section 4.2.

Hoeiland-Joergensen, et alExpires May 23, 2016                 [Page 16]
Internet-Draft                  fq-codel                   November 2015

   o  To prevent an attacker targeting a specific flow for a denial of
      service attack, the hash that maps packets to queues should not be
      predictable.  To achieve this, FQ-CoDel salts the hash, as
      described in the beginning of Section 4.1.  The size of the salt
      and the strength of the hash function is obviously a tradeoff
      between performance and security.  The Linux implementation uses a
      32 bit random value as the salt and a Jenkins hash function.  This
      makes it possible to achieve very high throughput, and we consider
      it sufficient to ward off the most obvious attacks.

   o  Packet fragments without a layer 4 header can be hashed into
      different bins than the first fragment with the header intact.
      This can cause reordering and/or adversely affect the performance
      of the flow.  Keeping state to match the fragments to the
      beginning of the packet, or simply putting all fragmented packets
      into the same queue, are two ways to alleviate this.

9.  IANA Considerations

   This document has no actions for IANA.

10.  Acknowledgements

   Our deepest thanks to Kathie Nichols, Van Jacobson, and all the
   members of the bufferbloat.net effort for all the help on developing
   and testing the algorithm.  In addition, our thanks to Anil Agarwal
   for his help with getting the hash collision probabilities in this
   document right.

11.  Conclusions

   FQ-CoDel is a very general, efficient, nearly parameterless queue
   management approach combining flow queueing with CoDel.  It is a very
   powerful tool for solving bufferbloat, and we believe it to be safe
   to turn on by default, as has already happened in a number of Linux
   distributions.  In this document we have documented the Linux
   implementation in sufficient detail for an independent
   implementation, and we encourage such implementations be widely
   deployed.

12.  References

12.1.  Normative References

   [CODELDRAFT]
              Nichols, K., Jacobson, V., McGregor, A., and J. Iyengar,
              "Controlled Delay Active Queue Management", October 2014,
              <https://datatracker.ietf.org/doc/draft-ietf-aqm-codel/>.

Hoeiland-Joergensen, et alExpires May 23, 2016                 [Page 17]
Internet-Draft                  fq-codel                   November 2015

   [RFC6817]  Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
              "Low Extra Delay Background Transport (LEDBAT)", RFC 6817,
              DOI 10.17487/RFC6817, December 2012,
              <http://www.rfc-editor.org/info/rfc6817>.

   [RTCWEB-QOS]
              Dhesikan, S., Jennings, C., Druta, D., Jones, P., and J.
              Polk, "DSCP and other packet markings for RTCWeb QoS",
              December 2014, <https://datatracker.ietf.org/doc/draft-
              dhesikan-tsvwg-rtcweb-qos/>.

12.2.  Informative References

   [CODEL2012]
              Nichols, K. and V. Jacobson, "Controlling Queue Delay",
              July 2012, <http://queue.acm.org/detail.cfm?id=2209336>.

   [DRR]      Shreedhar, M. and G. Varghese, "Efficient Fair Queueing
              Using Deficit Round Robin", June 1996,
              <http://users.ece.gatech.edu/~siva/ECE4607/presentations/
              DRR.pdf>.

   [DRRPP]    MacGregor, M. and W. Shi, "Deficits for Bursty Latency-
              critical Flows: DRR++", 2000,
              <http://ieeexplore.ieee.org/xpls/
              abs_all.jsp?arnumber=875803>.

   [GONG2014]
              Gong, Y., Rossi, D., Testa, C., Valenti, S., and D. Taht,
              "Fighting the bufferbloat: on the coexistence of AQM and
              low priority congestion control", July 2014,
              <http://perso.telecom-paristech.fr/~drossi/paper/
              rossi14comnet-b.pdf>.

   [SQF2012]  Bonald, T., Muscariello, L., and N. Ostallo, "On the
              impact of TCP and per-flow scheduling on Internet
              Performance - IEEE/ACM transactions on Networking", April
              2012, <http://perso.telecom-
              paristech.fr/~bonald/Publications_files/BMO2011.pdf>.

12.3.  URIs

   [1] http://www.bufferbloat.net/projects/codel/wiki/Cake

Hoeiland-Joergensen, et alExpires May 23, 2016                 [Page 18]
Internet-Draft                  fq-codel                   November 2015

Authors' Addresses

   Toke Hoeiland-Joergensen
   Karlstad University
   Dept. of Computer Science
   Karlstad  65188
   Sweden

   Email: toke.hoiland-jorgensen@kau.se

   Paul McKenney
   IBM Linux Technology Center
   1385 NW Amberglen Parkway
   Hillsboro, OR  97006
   USA

   Email: paulmck@linux.vnet.ibm.com
   URI:   http://www2.rdrop.com/~paulmck/

   Dave Taht
   Teklibre
   2104 W First street
   Apt 2002
   FT Myers, FL  33901
   USA

   Email: dave.taht@gmail.com
   URI:   http://www.teklibre.com/

   Jim Gettys
   21 Oak Knoll Road
   Carlisle, MA  993
   USA

   Email: jg@freedesktop.org
   URI:   https://en.wikipedia.org/wiki/Jim_Gettys

   Eric Dumazet
   Google, Inc.
   1600 Amphitheater Pkwy
   Mountain View, CA  94043
   USA

   Email: edumazet@gmail.com

Hoeiland-Joergensen, et alExpires May 23, 2016                 [Page 19]