FlowQueue-Codel
draft-ietf-aqm-fq-codel-03
The information below is for an old version of the document.
Document | Type |
This is an older version of an Internet-Draft that was ultimately published as RFC 8290.
|
|
---|---|---|---|
Authors | Toke Høiland-Jørgensen , Paul McKenney , dave.taht@gmail.com , Jim Gettys , Eric Dumazet | ||
Last updated | 2015-11-20 | ||
RFC stream | Internet Engineering Task Force (IETF) | ||
Formats | |||
Reviews |
GENART Last Call review
(of
-05)
by Elwyn Davies
Ready w/nits
GENART Last Call review
(of
-05)
by Elwyn Davies
Almost ready
|
||
Additional resources | Mailing list discussion | ||
Stream | WG state | WG Document | |
Document shepherd | Wesley Eddy | ||
IESG | IESG state | Became RFC 8290 (Experimental) | |
Consensus boilerplate | Unknown | ||
Telechat date | (None) | ||
Responsible AD | (None) | ||
Send notices to | (None) |
draft-ietf-aqm-fq-codel-03
Operations Area Working Group F. Baker Internet-Draft Cisco Systems Intended status: BCP June 12, 2012 Expires: December 14, 2012 On Firewalls in Internet Security draft-ietf-opsawg-firewalls-00 Abstract There is an ongoing discussion regarding the place of firewalls in security. This note is intended to capture and try to make sense out of it. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. 5.3. Probability of hash collisions Since the Linux FQ-CoDel implementation by default uses 1024 hash buckets, the probability that (say) 100 flows will all hash to the same bucket is something like ten to the power of minus 300. Thus, the probability that at least one of the flows will hash to some other queue is very high indeed. Expanding on this, based on analytical equations for hash collision probabilities, for 100 flows, the probability of no collision is 90.78%; the probability that no more than two of the 100 flows will be involved in any given collision = 99.57%; and the probability that no more than three of the 100 flows will be involved in any given collision = 99.99%. These probabilities can be improved upon by using set-associative hashing, a technique used in the Cake algorithm currently being developed as a further development upon the FQ-CoDel principles. For a 4-way associative hash with the same number of total queues, the probability of no collisions for 100 flows is 99.93%, while for an 8-way associative hash it is ~100%. 5.4. Memory Overhead FQ-CoDel can be implemented with a very low memory footprint (less than 64 bytes per queue on 64 bit systems). These are the data structures used in the Linux implementation: struct codel_vars { u32 count; u32 lastcount; bool dropping; u16 rec_inv_sqrt; codel_time_t first_above_time; codel_time_t drop_next; codel_time_t ldelay; }; struct fq_codel_flow { struct sk_buff *head; struct sk_buff *tail; struct list_head flowchain; int deficit; u32 dropped; /* # of drops (or ECN marks) on flow */ struct codel_vars cvars; }; The master table managing all queues looks like this: Hoeiland-Joergensen, et alExpires May 23, 2016 [Page 12] Internet-Draft fq-codel November 2015 struct fq_codel_sched_data { struct tcf_proto *filter_list; /* optional external classifier */ struct fq_codel_flow *flows; /* Flows table [flows_cnt] */ u32 *backlogs; /* backlog table [flows_cnt] */ u32 flows_cnt; /* number of flows */ u32 perturbation; /* hash perturbation */ u32 quantum; /* psched_mtu(qdisc_dev(sch)); */ struct codel_params cparams; struct codel_stats cstats; u32 drop_overlimit; u32 new_flow_count; struct list_head new_flows; /* list of new flows */ struct list_head old_flows; /* list of old flows */ }; 5.5. Per-Packet Timestamping The CoDel portion of the algorithm requires per-packet timestamps be stored along with the packet. While this approach works well for software-based routers, it may be impossible to retrofit devices that do most of their processing in silicon and lack space or mechanism for timestamping. Also, while perfect resolution is not needed, timestamp resolution below the target is necessary. Furthermore, timestamping functions in the core OS need to be efficient as they are called at least once on each packet enqueue and dequeue. 5.6. Other forms of "Fair Queueing" Much of the scheduling portion of FQ-CoDel is derived from DRR and is substantially similar to DRR++. SFQ-based versions have also been produced and tested in ns2. Other forms of Fair Queueing, such as WFQ or QFQ, have not been thoroughly explored, but there's no a priori reason why the round-robin scheduling of FQ-CoDel couldn't be replaced with something else. 5.7. Differences between CoDel and FQ-CoDel behaviour CoDel can be applied to a single queue system as a straight AQM, where it converges towards an "ideal" drop rate (i.e. one that minimises delay while keeping a high link utilisation), and then optimises around that control point. The scheduling of FQ-CoDel mixes packets of competing flows, which acts to pace bursty flows to better fill the pipe. Additionally, a new flow gets substantial "credit" over other flows until CoDel finds Hoeiland-Joergensen, et alExpires May 23, 2016 [Page 13] Internet-Draft fq-codel November 2015 an ideal drop rate for it. However, for a new flow that exceeds the configured quantum, more time passes before all of its data is delivered (as packets from it, too, are mixed across the other existing queue-building flows). Thus, FQ-CoDel takes longer (as measured in time) to converge towards an ideal drop rate for a given new flow, but does so within fewer delivered _packets_ from that flow. Finally, the flow isolation FQ-CoDel provides means that the CoDel drop mechanism operates on the flows actually building queues, which results in packets being dropped more accurately from the largest flows than CoDel alone manages. Additionally, flow isolation radically improves the transient behaviour of the network when traffic or link characteristics change (e.g. when new flows start up or the link bandwidth changes); while CoDel itself can take a while to respond, fq_codel doesn't miss a beat. 6. Limitations of flow queueing While FQ-CoDel has been shown in many scenarios to offer significant performance gains, there are some scenarios where the scheduling algorithm in particular is not a good fit. This section documents some of the known cases which either may require tweaking the default behaviour, or where alternatives to flow queueing should be considered. 6.1. Fairness between things other than flows In some parts of the network, enforcing flow-level fairness may not be desirable, or some other form of fairness may be more important. An example of this can be an Internet Service Provider that may be more interested in ensuring fairness between customers than between flows. Or a hosting or transit provider that wishes to ensure fairness between connecting Autonomous Systems or networks. Another issue can be that the number of simultaneous flows experienced at a particular link can be too high for flow-based fairness queueing to be effective. Whatever the reason, in a scenario where fairness between flows is not desirable, reconfiguring FQ-CoDel to match on a different characteristic can be a way forward. The implementation in Linux can leverage the packet matching mechanism of the _tc_ subsystem to use any available packet field to partition packets into virtual queues, to for instance match on address or subnet source/destination pairs, application layer characteristics, etc. Furthermore, as commonly deployed today, FQ-CoDel is used with three or more tiers of classification: priority, best effort and Hoeiland-Joergensen, et alExpires May 23, 2016 [Page 14] Internet-Draft fq-codel November 2015 background, based on diffserv markings. Some products do more detailed classification, including deep packet inspection and destination-specific filters to achieve their desired result. 6.2. Flow bunching by opaque encapsulation Where possible, FQ-CoDel will attempt to decapsulate packets before matching on the header fields for the flow hashing. However, for some encapsulation techniques, most notably encrypted VPNs, this is not possible. If several flows are bunched into one such encapsulated tunnel, they will be seen as one flow by the FQ-CoDel algorithm. This means that they will share a queue, and drop behaviour, and so flows inside the encapsulation will not benefit from the implicit prioritisation of FQ-CoDel, but will continue to benefit from the reduced overall queue length from the CoDel algorithm operating on the queue. In addition, when such an encapsulated bunch competes against other flows, it will count as one flow, and not assigned a share of the bandwidth based on how many flows are inside the encapsulation. Depending on the application, this may or may not be desirable behaviour. In cases where it is not, changing FQ-CoDel's matching to not be flow-based (as detailed in the previous subsection above) can be a mitigation. Going forward, having some mechanism for opaque encapsulations to express to the outer layer which flow a packet belongs to, could be a way to mitigate this. 6.3. Low-priority congestion control algorithms In the presence of queue management schemes that contain latency under load, low-priority congestion control algorithms such as LEDBAT [RFC6817] (or, in general, algorithms that try to voluntarily use up less than their fair share of bandwidth) experiences very little added latency when the link is congested. Thus, they lack the signal to back off that added latency previously afforded them. This effect is seen with FQ-CoDel as well as with any effective AQM [GONG2014]. As such, these delay-based algorithms tend to revert to loss-based congestion control, and will consume the fair share of bandwidth afforded to them by the FQ-CoDel scheduler. However, low-priority congestion control mechanisms may be able to take steps to continue to be low priority, for instance by taking into account the vastly reduced level of delay afforded by an AQM, or by using a coupled approach to observing the behaviour of multiple flows. Hoeiland-Joergensen, et alExpires May 23, 2016 [Page 15] Internet-Draft fq-codel November 2015 7. Deployment status and future work The FQ-CoDel algorithm as described in this document has been shipped as part of the Linux kernel since version 3.5, released on the 21st of July, 2012, with the ce_threshold being added in version 4.2. The algorithm has seen widespread testing in a variety of contexts and is configured as the default queueing discipline in a number of mainline Linux distributions (as of this writing at least OpenWRT, Arch Linux and Fedora). We believe it to be a safe default and encourage people running Linux to turn it on: It is a massive improvement over the previous default FIFO queue. Of course there is always room for improvement, and this document has listed some of the know limitations of the algorithm. As such, we encourage further research into algorithm refinements and addressing of limitations. One such effort is undertaken by the bufferbloat community in the form of the Cake [1] queue management scheme. In addition to this we believe the following (non-exhaustive) list of issues to be worthy of further enquiry: o Variations on the flow classification mechanism to fit different notions of flows. For instance, an ISP might want to deploy per- subscriber scheduling, while in other cases several flows can share a 5-tuple, as exemplified by the RTCWEB QoS recommendations [RTCWEB-QOS]. o Interactions between flow queueing and delay-based congestion control algorithms and scavenger protocols. o Other scheduling mechanisms to replace the DRR portion of the algorithm, e.g. QFQ or WFQ. o Sensitivity of parameters, most notably the number of queues and the CoDel parameters. 8. Security Considerations There are no specific security exposures associated with FQ-CoDel that are not also present in current FIFO systems. And some are in fact reduced (e.g. simple minded packet floods). However, some care is needed in the implementation to ensure this is the case. These are included in the description above, however we reiterate them here: o To prevent packets in the new queues from starving old queues, it is important that when a queue on the list of new queues empties, it is moved to _the end of_ the list of old queues. This is described at the end of Section 4.2. Hoeiland-Joergensen, et alExpires May 23, 2016 [Page 16] Internet-Draft fq-codel November 2015 o To prevent an attacker targeting a specific flow for a denial of service attack, the hash that maps packets to queues should not be predictable. To achieve this, FQ-CoDel salts the hash, as described in the beginning of Section 4.1. The size of the salt and the strength of the hash function is obviously a tradeoff between performance and security. The Linux implementation uses a 32 bit random value as the salt and a Jenkins hash function. This makes it possible to achieve very high throughput, and we consider it sufficient to ward off the most obvious attacks. o Packet fragments without a layer 4 header can be hashed into different bins than the first fragment with the header intact. This can cause reordering and/or adversely affect the performance of the flow. Keeping state to match the fragments to the beginning of the packet, or simply putting all fragmented packets into the same queue, are two ways to alleviate this. 9. IANA Considerations This document has no actions for IANA. 10. Acknowledgements Our deepest thanks to Kathie Nichols, Van Jacobson, and all the members of the bufferbloat.net effort for all the help on developing and testing the algorithm. In addition, our thanks to Anil Agarwal for his help with getting the hash collision probabilities in this document right. 11. Conclusions FQ-CoDel is a very general, efficient, nearly parameterless queue management approach combining flow queueing with CoDel. It is a very powerful tool for solving bufferbloat, and we believe it to be safe to turn on by default, as has already happened in a number of Linux distributions. In this document we have documented the Linux implementation in sufficient detail for an independent implementation, and we encourage such implementations be widely deployed. 12. References 12.1. Normative References [CODELDRAFT] Nichols, K., Jacobson, V., McGregor, A., and J. Iyengar, "Controlled Delay Active Queue Management", October 2014, <https://datatracker.ietf.org/doc/draft-ietf-aqm-codel/>. Hoeiland-Joergensen, et alExpires May 23, 2016 [Page 17] Internet-Draft fq-codel November 2015 [RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind, "Low Extra Delay Background Transport (LEDBAT)", RFC 6817, DOI 10.17487/RFC6817, December 2012, <http://www.rfc-editor.org/info/rfc6817>. [RTCWEB-QOS] Dhesikan, S., Jennings, C., Druta, D., Jones, P., and J. Polk, "DSCP and other packet markings for RTCWeb QoS", December 2014, <https://datatracker.ietf.org/doc/draft- dhesikan-tsvwg-rtcweb-qos/>. 12.2. Informative References [CODEL2012] Nichols, K. and V. Jacobson, "Controlling Queue Delay", July 2012, <http://queue.acm.org/detail.cfm?id=2209336>. [DRR] Shreedhar, M. and G. Varghese, "Efficient Fair Queueing Using Deficit Round Robin", June 1996, <http://users.ece.gatech.edu/~siva/ECE4607/presentations/ DRR.pdf>. [DRRPP] MacGregor, M. and W. Shi, "Deficits for Bursty Latency- critical Flows: DRR++", 2000, <http://ieeexplore.ieee.org/xpls/ abs_all.jsp?arnumber=875803>. [GONG2014] Gong, Y., Rossi, D., Testa, C., Valenti, S., and D. Taht, "Fighting the bufferbloat: on the coexistence of AQM and low priority congestion control", July 2014, <http://perso.telecom-paristech.fr/~drossi/paper/ rossi14comnet-b.pdf>. [SQF2012] Bonald, T., Muscariello, L., and N. Ostallo, "On the impact of TCP and per-flow scheduling on Internet Performance - IEEE/ACM transactions on Networking", April 2012, <http://perso.telecom- paristech.fr/~bonald/Publications_files/BMO2011.pdf>. 12.3. URIs [1] http://www.bufferbloat.net/projects/codel/wiki/Cake Hoeiland-Joergensen, et alExpires May 23, 2016 [Page 18] Internet-Draft fq-codel November 2015 Authors' Addresses Toke Hoeiland-Joergensen Karlstad University Dept. of Computer Science Karlstad 65188 Sweden Email: toke.hoiland-jorgensen@kau.se Paul McKenney IBM Linux Technology Center 1385 NW Amberglen Parkway Hillsboro, OR 97006 USA Email: paulmck@linux.vnet.ibm.com URI: http://www2.rdrop.com/~paulmck/ Dave Taht Teklibre 2104 W First street Apt 2002 FT Myers, FL 33901 USA Email: dave.taht@gmail.com URI: http://www.teklibre.com/ Jim Gettys 21 Oak Knoll Road Carlisle, MA 993 USA Email: jg@freedesktop.org URI: https://en.wikipedia.org/wiki/Jim_Gettys Eric Dumazet Google, Inc. 1600 Amphitheater Pkwy Mountain View, CA 94043 USA Email: edumazet@gmail.com Hoeiland-Joergensen, et alExpires May 23, 2016 [Page 19]