Internet-Draft	CC Guidelines	October 2023
Fairhurst & Welzl	Expires 25 April 2024	[Page]

Workgroup: Network Working Group
Internet-Draft: draft-ietf-xml2rfc-template-06
Published: 23 October 2023
Intended Status: Best Current Practice
Expires: 25 April 2024
Authors: G. Fairhurst (University of Aberdeen), M. Welzl (University of Oslo)

Guidelines for Internet Congestion Control at Endpoints

Abstract

When published as an RFC, this document provides guidance on the design of methods to avoid congestion collapse and how an endpoint needs to react to congestion. Based on these, and Internet engineering experience, the document provides best current practice for the design of new congestion control methods in Internet protocols.¶

When published, the document will update or replace the Best Current Practice in BCP 41, which currently includes "Congestion Control Principles" provided in RFC2914.¶

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶

This Internet-Draft will expire on 25 April 2024.¶

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶


1. Introduction

This document has two purposes. First, it identifies changes in practice and network design that have occurred since the publication of IETF BCPs on the topic of congestion control, and identifies current issues in congestion control. Second, it updates the guidance on the use of Congestion Control (CC) mechanisms. It also provides background information for the design of new mechanisms. A related document provides guidance on the evaluation of these new methods.¶

The IETF has specified a set of Internet transports (e.g., TCP [RFC9293], UDP [RFC0768], UDP-Lite [RFC3828], SCTP [RFC4960], and DCCP [RFC4340]) as well as protocols layered on top of these transports (e.g., RTP [RFC3550], QUIC [RFC9000] [RFC9002], SCTP/UDP [RFC6951], DCCP/UDP [RFC6773]) and transports that work directly over the IP network layer. These transports are implemented in endpoints (either Internet hosts or routers acting as endpoints), and can be designed to detect and react to network congestion. TCP was the first transport to provide this, although the specifications found in RFC 793 [RFC793] predate the inclusion of CC and did not contain any discussion of using or managing a congestion window (cwnd). RFC 9293 [RFC9293] has addressed this.¶

Section 3 of [RFC2914] states "The equitable sharing of bandwidth among flows depends on the fact that all flows are running compatible congestion control algorithms". Internet transports therefore need to react to avoid congestion that could impact other flows sharing a path. The Requirements for Internet Hosts [RFC1122] formally mandates that endpoints perform CC. "Because congestion control is critical to the stable operation of the Internet, applications and other protocols that choose to use UDP as an Internet transport must employ mechanisms to prevent congestion collapse and to establish some degree of fairness with concurrent flows" [RFC8085].¶

The popularity of the Internet has led to the deployment of many implementations: some use standard CC mechanisms, while others have chosen to adopt approaches that differ from present standards. Guidance is needed to ensure the safe evolution of the CC methods used by transport protocols.¶

There are several reasons to think that things have changed since the original best current practice was published. At one time, it was common for the serialisation delay of a packet at the bottleneck to form a large proportion of the round trip time (RTT) of a path, motivating a need for conservative loss recovery. This is not often the case for today's higher capacity links. Increased link speeds mean that, for many users, traffic does not normally experience persistent congestion, and under-load (inability to achieve the bottleneck rate) is often as common as over-load (exceeding the bottleneck rate). That is, a current challenge is that conservative methods lead to under-utilisation of the path, and safe scalable methods need to be found.¶

There also have been changes in the way that protocol mechanisms are deployed in Internet endpoints:¶

On the one hand, techniques have evolved that allow incremental deployment and testing of new methods, which can enable the rapid development of methods to detect and react to congestion. This allows new mechanisms to be tested to ensure that the majority of users see a benefit in the networks they use. There has been considerable progress in developing new loss recovery and congestion responses that have been evaluated in this way.¶

On the other hand, the Internet continues to be heterogeneous: some endpoints experience very different network path characteristics, and some endpoints generate very different patterns of traffic. There is still a need to avoid harm to other flows (starvation of capacity, unnecessary increase of latency, congestion collapse).¶

This document focuses on unicast point-to-point transports, including migration from using one path to another path. Some recommendations [RFC5783] and requirements will also apply to point-to-multipoint transports (e.g., multicast); however, these are beyond the scope of the current document. [RFC2914] provides additional guidance on the use of multicast.¶

Finally, experience has shown that successful protocols developed in a specific context, or for a particular application, also tend to be used in a wider range of contexts. Therefore, IETF specifications ought to target deployment on the general Internet, or be specified for use only within a controlled environment.¶

1.1. Incipient and Persistent Congestion

Internet paths experience congestion (loss or delay) when there is excess load at a bottleneck that they traverse. This document differentiates two levels of congestion:¶

Incipient congestion: This is a consequential side effect of the statistical multiplexing of packet flows. There will be times when packets need to be buffered or dropped at the bottleneck(s) on a path, irrespective of the long-term average load.¶
Persistent congestion: This occurs when the pattern of arriving traffic results in over-consumption of a path's resources. Typically this results in packet loss. The effects of persistent congestion might impact the flow that induces the congestion, but could adversely impact other flows, e.g., starving them of resources or reducing the efficiency of the path (e.g., congestion collapse).¶

Flows need to react when they encounter either form of congestion, to reduce their contribution to the load. For persistent congestion, the reaction needs to be sufficient to avoid excessive harm to other flows.¶

1.2. The Need to Mitigate the Effects of Incipient Congestion

Incipient congestion occurs during normal operation of the Internet. Buffering (which causes an increase in latency) or congestion loss (discard of a packet) arises when the traffic arriving at a bottleneck exceeds the resources available. A network device will drop excess packets when its queue(s) become full. This can be managed using Active Queue Management (AQM) [RFC7567], which can be combined with Explicit Congestion Notification (ECN) signalling [RFC3168] to mitigate incipient congestion [RFC8087].¶

Buffers can be divided into pools and traffic can be associated with a specific pool (e.g., using local configuration, or coordinated using the Differentiated Services [RFC2475] architecture). A scheduler [RFC7806] can isolate the queuing of packets for different flows, or aggregates of flows, and reduce the impact of flow multiplexing on other flows (e.g., flow scheduling [RFC7567]). This could distribute resources equally between sharing flows, but this equality is explicitly not a requirement [Flow-Rate-Fairness].¶

Even when a path is expected to support such methods, an endpoint MUST NOT rely on the presence and correct configuration of these methods, and therefore needs to employ CC methods that work end-to-end, or employ in-network control, such as a circuit-breaker.¶

In some controlled environments, Internet transports can use mechanisms to reserve capacity. Most Internet paths do not support this. In the absence of such a reservation, endpoints are unable to determine a safe rate at which to start a new transmission. The use of an Internet path therefore requires end-to-end CC mechanisms to detect and respond to congestion.¶

Section 3.3 of [RFC2914] notes that a flow can use CC to "optimize its own performance regarding throughput, delay, and loss. In some circumstances, for example in environments with high statistical multiplexing, the delay and loss rate experienced by a flow are largely independent of its own sending rate." and continues: "in environments with lower levels of statistical multiplexing or with per-flow scheduling, the delay and loss rate experienced by a flow is in part a function of the flow's own sending rate. Thus, a flow can use end-to-end congestion control to limit the delay or loss experienced by its own packets."¶

1.3. The Need to Avoid the Effects of Persistent Congestion

Early RFCs recognised that a poorly designed transport can lead to significant congestion, which could result in severe service degradation or "Internet meltdown". One effect is called "Congestion Collapse", where an increase in the network load results in a decrease in the useful work done by the network [RFC0896] [RFC0970] [RFC2914]. This was first observed in the mid-1980s. At that time, it was aggravated by connections that did not use CC and that unnecessarily retransmitted packets that were either in transit or had already been received, resulting in stable persistent congestion [RFC0896].¶

[RFC2914] also notes that it is even more destructive when applications increase their sending rate in response to an increase in the packet loss rate (e.g., automatically using an increased level of FEC (Forward Error Correction)).¶

The problems of congestion collapse have generally been corrected by improvements to the loss recovery and congestion control mechanisms in transport protocols [Jac88], designed to avoid starving other flows of capacity (e.g., [RFC7567]). Section 3.1 describes preventing congestion collapse. [RFC2309] adds that "all UDP-based streaming applications should incorporate effective congestion avoidance mechanisms." [RFC7567] and [RFC8085] both reaffirm the continued need to provide methods to prevent starvation.¶

1.4. New Congestion Control Methods

CC is an evolving subject, responding to changes in protocol design, in the operation of applications using the network, and in the understanding of network operation under load. The IETF has provided guidance [RFC5033] for considering and evaluating alternative CC algorithms.¶

The IRTF has described a set of metrics, and the related trade-offs between metrics, to compare, contrast, and evaluate CC algorithms [RFC5166]. [RFC5783] provided a snapshot of CC research in 2008. [RFC6077] discussed open issues in CC research in 2011.¶

In contrast to considering the fairness in distributing capacity between flows, a different approach is to analyse persistent congestion effects to understand the harm to other flows (collateral impact of loss, starvation, collapse, etc.). Such an analysis of the suitability of a new mechanism can evaluate how changes impact other flows sharing a bottleneck, and consider the impact on the flows that have outliers in performance (e.g., the last 5% or 1%). For example, overall performance often does not indicate whether a new method could starve other applications that share the bottleneck, or whether it sends patterns of packets (e.g., bursts) that disrupt the packet timing needed by another application flow.¶
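The following Python sketch illustrates why a tail analysis can reveal harm that an average hides. The throughput samples are hypothetical, and the function name `tail_throughput` is an illustrative assumption rather than part of any evaluation methodology defined in the RFC series.¶

```python
import statistics


def tail_throughput(samples_mbps, pct):
    """Return the pct-th percentile (lower tail) of per-flow throughput samples."""
    # With n=100, statistics.quantiles returns 99 cut points; index pct-1 is the pct-th percentile.
    cuts = statistics.quantiles(samples_mbps, n=100, method="inclusive")
    return cuts[pct - 1]


# Hypothetical throughput of 100 competing flows while a new method shares the
# bottleneck: most flows are unaffected, but 5 flows are nearly starved.
competing = [10.0] * 95 + [0.5] * 5

print(f"mean: {statistics.mean(competing):.3f} Mbit/s")            # the mean stays high
print(f"1st percentile: {tail_throughput(competing, 1)} Mbit/s")   # the tail exposes starvation
```

Here the mean remains close to the unaffected rate, while the 1st percentile shows that some flows sharing the bottleneck are close to starvation.¶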

1.5. Current Challenges

Recommendations and requirements on CC are distributed across many documents in the RFC series. This section gathers and consolidates these recommendations. These, together with Internet engineering experience, are used to derive the best current practice in the design of Internet CC methods.¶

Standardization of new CC algorithms can avoid an "arms race" among competing protocols [RFC2914], that is, competition for Internet resources in a way that significantly reduces the ability of other flows to use the Internet.¶

The general recommendation in the UDP Guidelines [RFC8085] is that applications SHOULD leverage existing CC techniques, such as those defined for TCP [RFC9293], TCP-Friendly Rate Control (TFRC) [RFC5348], SCTP [RFC4960], and other IETF-defined transports. This is because there are many trade-offs and details that can have a serious impact on the performance of a CC mechanism and upon other traffic that seeks to share a bottleneck.¶

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].¶

The path between endpoints (sometimes called "Internet Hosts" for IPv4 and called "source nodes" and "destination nodes" in IPv6) consists of the endpoint protocol stack at the sender and the receiver (which together implement the transport service), and a succession of links and network devices (routers or middleboxes) forming the network path. The set of network devices forming the path is not usually fixed, and it should generally be assumed that this set can change over arbitrary lengths of time.¶

[RFC5783] defines CC as "the feedback-based adjustment of the rate at which data is sent into the network. Congestion control is an indispensable set of principles and mechanisms for maintaining the stability of the Internet."¶

The document draws on language used in the specifications of TCP and other IETF transports. For example, a protocol timer is generally needed to detect persistent congestion, and this document uses the term Retransmission Timeout (RTO) to refer to the operation of this timer. Similarly, it refers to a congestion window (cwnd) as a variable that controls the rate of transmission by the CC. Each new transport needs to make its own design decisions about how to meet the recommendations and requirements for CC. The use of these terms does not imply that endpoints need to implement functions in the way currently used by TCP.¶

Other terminology is directly copied from the cited RFCs.¶

3. Requirements from the RFC Series

3.1. The Need to React to Congestion

Requirements include:¶

Endpoints MUST perform congestion control [RFC1122] and SHOULD leverage existing techniques [RFC8085].¶
If an application or protocol chooses not to use a CC, it SHOULD control the rate at which it sends datagrams to a destination host, to fulfil the requirements of [RFC2914], as stated in [RFC8085].¶
Endpoints SHOULD control the aggregate traffic that is sent [RFC8085]. An endpoint can become aware of congestion by various means (including delay variation, timeout, ECN, and packet loss). A signal that indicates congestion SHOULD result in a reaction to reduce the sending rate [RFC8087].¶
Although network devices can be configured to reduce the impact of multiplexing on other flows, endpoints MUST NOT rely solely on the presence and correct configuration of these mechanisms, except in a controlled environment.¶
A transport that does not target Internet deployment needs to be constrained to only operate in a controlled environment (e.g., see Section 3.6 of [RFC8085]) and provide appropriate mechanisms to prevent this traffic from accidentally leaving the controlled environment [RFC8084].¶

3.2. Tolerance to a Diversity of Path Characteristics

Internet transports need to use a CC method designed for Internet paths.¶

Path Capacity: The forward path can be congested in terms of the number of packets it can support and/or the number of bytes per second it can transfer. The return path (towards the sender) can also be congested. Methods need to operate over paths where the capacities of the forward and return directions differ significantly.¶
Path Loss: Paths can experience packet loss for various reasons besides experiencing congestion (e.g., link corruption [RFC3819]), but an endpoint cannot usually reliably disambiguate the cause of loss. Whilst mechanisms below the transport layer can mitigate this loss, the only way to surely confirm that a sending endpoint has successfully communicated with a remote endpoint is to utilise a timer (see Section 4.2) to detect a lack of response that could result from a change in the path or the path characteristics. The detection of congestion and the resulting reduction in rate MUST NOT solely depend upon reception of a signal from the remote endpoint, because congestion indications could themselves be lost due to congestion.¶
Path RTT: The RTT from an endpoint cannot be determined a priori, and must be measured dynamically (see Section 4.2).¶
Path Change: An endpoint MUST assume that path characteristics can change over time (i.e., path characteristics and sharing traffic, once discovered, do not necessarily remain valid in the future).¶
Network devices MAY provide mechanisms to mitigate the impact of congestion caused by transport flows (e.g., priority forwarding of control information, and starvation detection), and ought to mitigate the impact of non-conformant and malicious flows [RFC7567]. These mechanisms complement, but do not replace, the endpoint congestion avoidance mechanisms.¶
Security: Internet endpoints need to be protected from intentional disruption of the service they provide, and from the exploitation of methods to attack other endpoints or services (see Section 3.3).¶

3.3. Protection of Protocol Mechanisms

An endpoint needs to be protected from attacks on the traffic it generates, and from attacks that seek to increase the capacity that it consumes (impacting other traffic that shares a bottleneck).¶

The following guidance is provided on protection:¶

Off-Path Attack: A design MUST protect against off-path attacks on the protocol [RFC8085] (i.e., where the attacker is unable to observe packets). Such an attack can lead to a Denial of Service (DoS) vulnerability for the flow being controlled and/or other flows that share network resources along the path.¶
On-Path Attack: A protocol can be designed to protect against on-path attacks (i.e., where an attacker can observe the packets). Protecting against on-path attacks can require more complexity and typically utilises encryption and/or authentication mechanisms (e.g., IPsec [RFC4301], QUIC [RFC9000]).¶
Validation of Signals: To protect from malicious abuse, network signals and control messages (e.g., ICMP [RFC0792]) MUST be validated before they are used (see Section 3.3). Transports MUST at least include protection against off-path attacks that use such signals [RFC8085] (e.g., validating the quoted information in an ICMP message enables checking that it corresponds to the flow, before utilising the signalling it contains).¶
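As an illustration of validating the quoted information in an ICMP message, the following Python sketch accepts an ICMP message only when its quoted transport header matches a known flow and the quoted sequence number lies within the data currently in flight. The flow table layout, the names `FlowKey`, `FlowState`, and `icmp_quote_is_valid`, and the use of a TCP-like sequence number are all assumptions; sequence number wrap-around is ignored for brevity, and a transport without sequence numbers in its headers would need a different check.¶

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    """Addresses, ports, and protocol quoted from the offending packet (hypothetical layout)."""
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 for TCP


@dataclass
class FlowState:
    snd_una: int  # oldest unacknowledged sequence number
    snd_nxt: int  # next sequence number to be sent


def icmp_quote_is_valid(flows, quoted_key, quoted_seq):
    """Accept an ICMP message only if its quoted header identifies a live flow
    and the quoted sequence number refers to data that is currently in flight."""
    state = flows.get(quoted_key)
    if state is None:
        return False  # no matching flow: possibly spoofed or stale
    # Wrap-around of the sequence space is ignored in this sketch.
    return state.snd_una <= quoted_seq < state.snd_nxt


key = FlowKey("192.0.2.1", 40000, "198.51.100.7", 443, 6)
flows = {key: FlowState(snd_una=1000, snd_nxt=5000)}
print(icmp_quote_is_valid(flows, key, 1200))  # True: quotes data that is in flight
print(icmp_quote_is_valid(flows, key, 9000))  # False: outside the in-flight window
```

An off-path attacker that cannot observe the flow would need to guess both the flow identifiers and an in-window sequence number for such a message to be accepted.¶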

4. Principles of Congestion Control

This section summarises the principles for providing CC. It describes principles associated with preventing persistent congestion, reacting to incipient congestion and utilising additional path information.¶

4.1. Initialisation and Using Capacity

4.1.1. Starting to use Path Capacity

A sender needs to regulate the maximum volume of data in flight over the interval of the current RTT (the cwnd). It needs to react to incipient congestion.¶

Setting an initial cwnd: A TCP sender "SHOULD set the congestion window to no more than the Restart Window (R)" before beginning transmission, if the sender has not sent data in an interval that exceeds the current retransmission timeout, i.e., when an application becomes idle [RFC9293]. Congestion Window Validation (CWV) [RFC7661] describes how a TCP sender can tentatively maintain a cwnd larger than the path has supported in the last RTT when a flow is application-limited, provided that the endpoint rapidly reduces the cwnd when congestion is detected.¶
Using the cwnd: A sender that does not use capacity cannot know whether previously used capacity remains available, or whether that capacity has disappeared (e.g., a change in the path that causes a flow to experience a smaller bottleneck, or when more traffic emerges that consumes previously available capacity resulting in a new bottleneck). For this reason, a transport that is limited by the volume of data available to send MUST NOT continue to grow its cwnd when the current cwnd is more than twice the volume of data acknowledged in the last RTT. The reduction needs to be commensurate with the increase that preceded it. This factor of 2 decrease corresponds to an increase factor of 2 in slow start.¶
Collateral Damage: Even in the absence of congestion, statistical multiplexing can result in transient effects for flows sharing common resources. A sender SHOULD avoid persistently inducing excessive congestion in other flows (collateral damage that could result in flow starvation). For example, a sender ought to avoid a sudden surge in sending rate that lasts for more than one RTT.¶
Burst Mitigation: While an endpoint ought to limit its sending rate at the granularity of the current RTT, this can be insufficient to satisfy the need to mitigate collateral damage. Endpoints SHOULD provide mechanisms to regulate the bursts of transmission that the application/protocol sends (Section 3.1.6 of [RFC8085]). ACK-Clocking [RFC9293] can help mitigate bursts for transports that receive continuous feedback of reception (such as TCP). Sender pacing can also mitigate bursts [RFC8085] (as described in Section 4.6 of [RFC3449]), and has been recommended for TCP in conditions where ACK-Clocking is not effective (e.g., [RFC3742], [RFC7661]). SCTP [RFC4960] defines a maximum burst length (Max.Burst) with a recommended value of 4 segments to limit the SCTP burst size. QUIC recommends that a sender pace all in-flight packets based on input from the CC [RFC9002].¶
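The following Python sketch expresses two of the rules above: the limit on growing the cwnd of an application-limited sender, and a pacing interval that spreads one cwnd of data over roughly one RTT. The function names and the default pacing gain of 1.25 are illustrative assumptions, not values specified by this document.¶

```python
def may_grow_cwnd(cwnd_bytes, acked_last_rtt_bytes):
    """An application-limited sender must not keep growing its cwnd once the
    cwnd is more than twice the volume of data acknowledged in the last RTT."""
    return cwnd_bytes <= 2 * acked_last_rtt_bytes


def pacing_interval(smoothed_rtt_s, packet_size_bytes, cwnd_bytes, gain=1.25):
    """Seconds between packet departures so that one cwnd is sent over about
    one RTT; a gain above 1 (assumed here) lets the pacer run slightly ahead of cwnd/RTT."""
    rate_bytes_per_s = gain * cwnd_bytes / smoothed_rtt_s
    return packet_size_bytes / rate_bytes_per_s


# Example: a 100 ms RTT, 1200-byte packets, and a 120000-byte cwnd.
print(may_grow_cwnd(120000, 30000))        # False: only 30000 bytes acknowledged in the last RTT
print(pacing_interval(0.1, 1200, 120000))  # roughly 0.8 ms between packets
```

Pacing at slightly more than cwnd/RTT avoids the pacer itself becoming the limit when the RTT varies, while still breaking a full window into evenly spaced packets rather than a single burst.¶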

4.1.2. Using More Capacity

When the CC is increasing the cwnd, it transmits faster than the last confirmed safe rate. Such an increase needs to be regarded as tentative and a sender needs to reduce its rate below the last confirmed safe rate when congestion is detected.¶

Increasing the cwnd: In the absence of congestion, an endpoint MAY increase the sending rate or cwnd. The cwnd ought only to be increased when there is additional data available to send (i.e., when the sender will utilise the additional capacity in the next RTT).¶
A sender MUST NOT increase the sending rate for a time longer than one RTT period after congestion is first detected. This helps manage incipient congestion.¶
Avoiding Overshoot: Overshoot of the cwnd beyond the point of congestion can significantly impact other flows sharing resources along a path, and can impact the performance of the flow itself. As endpoints experience more paths with a large Bandwidth Delay Product (BDP) and a wider range of potential path RTTs, variability or changes in the path can significantly impact the appropriate dynamics for increasing the cwnd (see also burst mitigation, Section 4.1.1). Methods such as HyStart are designed to avoid overshoot [RFC9406].¶
Response to Detected Congestion: The sending rate MUST be below the previously confirmed safe rate for multiple RTT periods after a congestion event. In TCP Reno [RFC9293], this is performed by using a conservative (linear) increase from a slow start threshold that is re-initialised each time congestion is experienced.¶
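A minimal Python sketch of the Reno-style behaviour described above, assuming byte-counted windows: the cwnd grows exponentially in slow start, grows by roughly one segment per RTT above the slow start threshold (ssthresh), and ssthresh is re-initialised each time congestion is detected. The per-ACK increments follow the style of the standard TCP algorithms; the function names, and using the cwnd (rather than the amount of data in flight) to compute the new ssthresh, are simplifying assumptions.¶

```python
from typing import Tuple


def on_ack(cwnd: int, ssthresh: int, acked_bytes: int, mss: int) -> int:
    """Return the new cwnd after an ACK (Reno-style increase, in bytes)."""
    if cwnd < ssthresh:
        # Slow start: grow by the data acknowledged, at most one MSS per ACK.
        return cwnd + min(acked_bytes, mss)
    # Congestion avoidance: about MSS*MSS/cwnd per ACK, i.e. roughly one MSS
    # per RTT; at least 1 byte so that the window keeps growing.
    return cwnd + max(1, mss * mss // cwnd)


def on_congestion(cwnd: int, mss: int) -> Tuple[int, int]:
    """Re-initialise ssthresh on a congestion event; return (cwnd, ssthresh)."""
    ssthresh = max(cwnd // 2, 2 * mss)  # never below two segments
    return ssthresh, ssthresh


mss = 1000
cwnd, ssthresh = 10 * mss, 64 * mss
cwnd = on_ack(cwnd, ssthresh, mss, mss)    # slow start: 11000
cwnd, ssthresh = on_congestion(cwnd, mss)  # both become 5500
cwnd = on_ack(cwnd, ssthresh, mss, mss)    # congestion avoidance: 5681
```

Because ssthresh is reset at each congestion event, the subsequent linear growth keeps the sending rate below the previously confirmed safe rate for several RTTs.¶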

4.2. Robustness: Timers and Retransmission

An endpoint can utilise timers to implement transport mechanisms, e.g., to recover from loss, to trigger pre-emptive retransmission and other protocol functions. An endpoint that does utilise timers needs to follow the rules in section 3.3 of [RFC8085].¶

Principles include:¶

Initial RTO Interval: When a flow sends the first packet(s), it has no way to know the RTT of the path. An initial timer value is needed to detect any lack of responsiveness from the remote endpoint. In TCP, this is the starting value of the RTO. A safe initial value is important for overall Internet stability [RFC6298] [RFC8085]. In the absence of any knowledge about the latency of a path (including the initial value), senders SHOULD conservatively set the RTO to no less than 1 second. (Although Linux TCP has deployed a smaller initial RTO value, the appendix of [RFC6298] confirms that values shorter than 1 second can be problematic.)¶
Initial RTO Expiry: If the RTO timer expires while awaiting completion of a connection setup, or handshake (e.g., the ACK of a SYN segment in the three-way handshake in TCP), and the implementation is using an RTO of less than 3 seconds, the local endpoint can resend the connection setup. The RTO MUST then be re-initialized to increase it to 3 seconds once data transmission begins (i.e., after the handshake completes) [RFC6298] [RFC8085]. This conservative increase is necessary to avoid congestion collapse when many flows retransmit across a shared bottleneck with restricted capacity.¶
Initial Measured RTO: Once an RTT measurement is available (e.g., through reception of an acknowledgement), the timeout value must be adjusted. This adjustment MUST take into account the RTT variance. For the first sample, this variance cannot be determined, and a local endpoint MUST therefore initialise the variance to RTT/2 (see equation 2.2 of [RFC6298] and related text for UDP in section 3.1.1 of [RFC8085]).¶
Updating the Path RTT: Once an endpoint has started communicating with its peer, the RTT estimate MUST be adjusted by measuring the actual path RTT. This adjustment MUST include adapting to the measured RTT variance (see equation 2.3 of [RFC6298]). An RTO interval SHOULD be set based on recent RTT observations, including the RTT variance (e.g., Section 3.1.1 of [RFC8085]).¶
Persistent Lack of Feedback: Persistent lack of feedback (e.g., detected by an RTO expiry, or other means) MUST be treated as persistent congestion. A failure to receive any specific response could be a result of an RTT change, a change of path, excessive loss, or even congestion collapse. If there is no response within the RTO interval, TCP collapses the cwnd to one segment [RFC9293]. Other transports MUST similarly respond when they fail to receive confirmation of feedback. An endpoint MUST exponentially back off the RTO interval [RFC8085] each time persistent congestion is detected [RFC1122], until the path characteristics can again be confirmed [RFC6298] [RFC8085].¶
Maximum RTO: A maximum value MAY be placed on the RTO interval. This maximum RTO interval MUST NOT be less than 60 seconds [RFC6298].¶
[[Author Note: Re-check RTO-Consider. ]]¶
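The RTO rules above can be sketched in Python using the estimator of [RFC6298]: the first sample initialises the smoothed RTT (SRTT) and sets the variance (RTTVAR) to half the sample, later samples update both, and each expiry doubles the RTO. The class name and the 1 ms clock granularity are assumptions; the 1 second initial value and the 60 second maximum follow the text above, and the 1 second minimum follows [RFC6298].¶

```python
class RtoEstimator:
    """RTO computation in the style of [RFC6298] (times in seconds)."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variance
    K = 4          # variance multiplier

    def __init__(self, initial_rto=1.0, min_rto=1.0, max_rto=60.0, granularity=0.001):
        self.srtt = None
        self.rttvar = None
        self.rto = initial_rto
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.granularity = granularity  # assumed clock granularity G

    def _clamp(self, value):
        return min(max(value, self.min_rto), self.max_rto)

    def on_rtt_sample(self, r):
        if self.srtt is None:
            # First sample: the variance cannot be measured yet, so use R/2.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated using the previous SRTT, then SRTT is updated.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = self._clamp(self.srtt + max(self.granularity, self.K * self.rttvar))

    def on_timeout(self):
        # Lack of feedback: back off exponentially, bounded by the maximum RTO.
        self.rto = self._clamp(2 * self.rto)


est = RtoEstimator()
est.on_rtt_sample(2.0)  # SRTT = 2.0, RTTVAR = 1.0, RTO = 2.0 + 4 * 1.0 = 6.0
est.on_timeout()        # RTO doubles to 12.0
```

On a short path, the computed value (e.g., 0.3 s for a first sample of 100 ms) is raised to the 1 second minimum, which keeps the timer conservative until more samples are available.¶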

4.3. Detecting and Reacting to Incipient Congestion

This section describes the principles related to mitigation of incipient congestion (see Section 1.2).¶

4.3.1. Congestion Control Initialization

When a connection to a new destination is first established, the endpoints have little information about the characteristics of the network path they will use. The safety and responsiveness of new CC proposals need to be evaluated [RFC5166].¶

Flow Start: A new flow between two endpoints needs to initialise a CC for the path. The TCP slow-start algorithm is an accepted standard for flow startup [RFC9293]. This uses the notion of an Initial congestion Window (IW) [RFC3390], updated by [RFC6928]. The IW is not the smallest burst size, nor the smallest cwnd. It is a safe starting point for a path that is not suffering persistent congestion, and is applicable until feedback about the path is received.¶
Utilised Capacity: A CC MAY assume that the recently used capacity between a pair of endpoints is an indication of the capacity that might be available between the same endpoints in the near future (Section 4.3.4). The CC MUST reduce its rate if this is not subsequently confirmed to be true. [[Author note: we likely need to bound this reaction in time or size]].¶
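The IW formulas from [RFC3390] and its update [RFC6928] can be written in Python as below. Both give the IW in bytes as a function of the maximum segment size (MSS); the function names are illustrative.¶

```python
def initial_window_rfc3390(mss: int) -> int:
    """IW in bytes per [RFC3390]: min(4*MSS, max(2*MSS, 4380))."""
    return min(4 * mss, max(2 * mss, 4380))


def initial_window_rfc6928(mss: int) -> int:
    """IW in bytes per [RFC6928]: min(10*MSS, max(2*MSS, 14600))."""
    return min(10 * mss, max(2 * mss, 14600))


for mss in (536, 1460, 9000):
    print(mss, initial_window_rfc3390(mss), initial_window_rfc6928(mss))
```

For a typical 1460-byte MSS this gives 4380 bytes (3 segments) and 14600 bytes (10 segments) respectively; the byte limit means that larger segments do not proportionally increase the initial burst.¶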

4.3.2. Loss-Based Congestion Detection and Retransmission

This section describes mechanisms to detect loss and provide retransmission, and to protect the network in the absence of timely feedback.¶

Congestion Detection: Loss is typically detected when a sender cannot confirm delivery within an expected period (e.g., by observing the time-ordering of the reception of ACKs, as in TCP DupACK) or by utilising a timer to detect loss (e.g., a transmission timer with a period less than the RTO, [RFC8085] [RFC8985]) or a combination of the two. A transport is usually unable to reliably detect whether a loss is a result of congestion. For this reason, loss needs to be treated as incipient congestion, at least until the cause of loss can be reliably determined.¶
Retransmission: When loss is detected, the sender can choose to retransmit the lost data, ignore the loss, or send other data (e.g., [RFC8085] [RFC9002]), depending on the reliability provided by the transport service. All transmissions consume network capacity, therefore retransmissions MUST NOT increase the network load in response to congestion loss (which worsens that congestion) [RFC8085]. Any method that sends additional data following loss is therefore responsible for CC of the retransmissions (and any other packets sent, including FEC information) as well as the original traffic.¶
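One way to combine the time-ordering and timer-based approaches is the packet- and time-threshold loss detection specified for QUIC in [RFC9002]. The Python sketch below follows that approach, using the constants recommended there (a packet threshold of 3, a time threshold of 9/8 of the RTT, and a 1 ms timer granularity); the data structures and the function name are assumptions. Any packet declared lost in this way is then treated as a congestion signal.¶

```python
K_PACKET_THRESHOLD = 3    # reordering threshold, in packets
K_TIME_THRESHOLD = 9 / 8  # reordering threshold, as a multiple of the RTT
K_GRANULARITY = 0.001     # timer granularity, in seconds


def detect_lost_packets(unacked, largest_acked, now, smoothed_rtt, latest_rtt):
    """Return the packet numbers declared lost.

    unacked maps packet number -> send time (seconds) for packets in flight.
    """
    loss_delay = max(K_TIME_THRESHOLD * max(smoothed_rtt, latest_rtt), K_GRANULARITY)
    lost = []
    for pn, sent_time in sorted(unacked.items()):
        if pn >= largest_acked:
            continue  # only packets sent before an acknowledged packet are candidates
        if largest_acked - pn >= K_PACKET_THRESHOLD or now - sent_time >= loss_delay:
            lost.append(pn)
    return lost


inflight = {1: 0.000, 4: 0.090, 5: 0.095, 7: 0.110}
print(detect_lost_packets(inflight, largest_acked=6, now=0.120,
                          smoothed_rtt=0.1, latest_rtt=0.1))  # → [1]
```

Packets 4 and 5 are not yet declared lost because they could still be reordered; they are declared lost once the time threshold (here 112.5 ms after they were sent) has passed without an acknowledgement.¶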

4.3.3. Responding to Incipient Congestion

In determining an appropriate congestion response to incipient congestion, designs could consider the size of the packets that experience congestion [RFC4828].¶

Congestion Response: An endpoint MUST promptly reduce the sending rate when there is an indication of congestion (e.g., loss) [RFC2914]. TCP Reno established a method that relies on multiplicative decrease to halve the sending rate when congestion is detected. This response to congestion indications is sufficient for safe Internet operation, but other decrease factors have also been published in the RFC Series [RFC9438].¶
ECN Detection: ECN can help determine an appropriate cwnd to enable early indication of incipient congestion when it is supported by routers on the path [RFC7567]. An early detection of incipient congestion allows a different reaction to an explicit congestion signal compared to the reaction to a detected packet loss [RFC8311] [RFC8087]. Congestion control design should provide the necessary mechanisms to support ECN [RFC3168] [RFC6679], as described in section 3.1.7 of [RFC8085].¶
Response to ECN Congestion Marking: Simple feedback of received Congestion Experienced (CE) marks [RFC3168] relies only on an indication that congestion has been experienced within the last RTT. This response is appropriate when a flow uses ECT(0) [RFC3168]. Alternative Backoff with ECN (ABE) modified this reaction to ECN [RFC8511]. Extended RTP feedback and accurate TCP receiver feedback provide more detail about CE-marking [I-D.ietf-tcpm-accurate-ecn], supporting a finer granularity of congestion response. The L4S architecture [RFC9330] allows routers to use a different marking system that can provide early reaction to incipient congestion [RFC9332], and defines a reaction for this feedback when packets are marked with ECT(1).¶
[RFC8085] provides guidelines for a sender that does not, or is unable to, adapt the cwnd.¶

4.3.4. Utilising Additional Path Information

Path information can be cached and shared between connections. In TCP, this was previously called TCP Control Block (TCB) sharing and is now called TCP Control Block Interdependence [RFC9040]. A CC can also utilise signals from the network to help determine how to regulate the traffic it sends.¶

Utilising Cached Path Information: A transport connection between a pair of endpoints can share CC parameters with other connections that share the same path. A CC that recently used a specific path could allow another flow to take over the previously consumed capacity. Information used to accelerate the growth of the cwnd MUST be viewed as tentative until it is confirmed that the flow was able to utilise the capacity (i.e., the new flow needs to either "use or lose" the capacity). A sender MUST reduce its rate if the capacity is not confirmed within the current RTO interval.¶
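The "use or lose" rule for cached path information can be sketched as a window with two values: the tentative cwnd seeded from the cache, and the portion this flow has confirmed by having it acknowledged. If the tentative capacity is not confirmed by the end of the current RTO interval, the window falls back to the confirmed value. The class, its fields, the example window sizes, and the choice of confirmed bytes as the fallback are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TentativeWindow:
    """Hypothetical cwnd seeded from cached path state."""
    cwnd: int        # bytes the sender may currently have in flight
    confirmed: int   # bytes this flow has shown it can use
    deadline: float  # time by which cached capacity must be confirmed

    @classmethod
    def from_cache(cls, cached_cwnd: int, initial_cwnd: int,
                   now: float, rto: float) -> "TentativeWindow":
        # Start from the cached value, but treat only the normal initial
        # window as confirmed until this flow's own ACKs show otherwise.
        return cls(cwnd=max(cached_cwnd, initial_cwnd),
                   confirmed=initial_cwnd, deadline=now + rto)

    def on_round_trip(self, bytes_acked: int) -> None:
        # Capacity that was used and acknowledged is no longer tentative.
        self.confirmed = max(self.confirmed, min(bytes_acked, self.cwnd))

    def on_timer(self, now: float) -> None:
        # "Use or lose": unconfirmed capacity is withdrawn at the deadline.
        if now >= self.deadline and self.confirmed < self.cwnd:
            self.cwnd = self.confirmed


w = TentativeWindow.from_cache(cached_cwnd=48_000, initial_cwnd=12_000,
                               now=0.0, rto=1.0)
w.on_round_trip(bytes_acked=20_000)
w.on_timer(now=1.0)
print(w.cwnd)  # 20000
```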
[RFC8085] adds "An application that forks multiple worker processes or otherwise uses multiple sockets to generate UDP datagrams SHOULD perform congestion control over the aggregate traffic."¶
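Aggregate control can be sketched as one controller shared by all senders of an application, so that opening more sockets does not obtain more capacity. This sketch uses threads and a lock for brevity; separate worker processes, as in the quoted [RFC8085] text, would need shared memory or a coordinating process instead. `AggregateController` and its methods are hypothetical names.

```python
import threading


class AggregateController:
    """Hypothetical controller whose single window covers every socket."""

    def __init__(self, cwnd_bytes: int) -> None:
        self._lock = threading.Lock()
        self.cwnd_bytes = cwnd_bytes
        self.in_flight = 0

    def try_acquire(self, size: int) -> bool:
        """Reserve capacity for one datagram; False means wait."""
        with self._lock:
            if self.in_flight + size > self.cwnd_bytes:
                return False
            self.in_flight += size
            return True

    def release(self, size: int, congested: bool = False) -> None:
        """Return capacity on ACK or loss; a loss shrinks the shared window."""
        with self._lock:
            self.in_flight = max(0, self.in_flight - size)
            if congested:
                self.cwnd_bytes = max(self.cwnd_bytes // 2, size)


controller = AggregateController(cwnd_bytes=5_000)
granted = []


def worker() -> None:
    for _ in range(3):
        granted.append(controller.try_acquire(1_000))


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Four workers asked for 12 datagrams, but only 5 fit in the shared window.
print(sum(granted))  # 5
```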
Utilising Network Signals: A mechanism that utilises signals originating in the network (e.g., RSVP, NSIS, Quick-Start, ECN) MUST assume that the set of network devices on the path can change. This motivates the use of soft state in protocols (e.g., ECN) [RFC9049], including context-sensitive treatment of "soft" signals provided to the endpoint [RFC5164]. Endpoints MUST assume the set of routers and links forming the path can change and that network devices can be reconfigured or reset. A changing set of on-path devices can also affect which types of packets traverse a path (e.g., whether IP options are supported or a specific treatment applies).¶
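Soft state, as referred to above, means that a network-provided signal is kept only while it continues to be refreshed, so a change of on-path devices cannot leave the endpoint acting on a stale hint. A minimal sketch follows; the class, the lifetime value, the example rate hint, and the injectable clock are assumptions for illustration. After expiry, the endpoint would fall back to its normal congestion control.

```python
import time
from typing import Callable, Optional


class SoftStateSignal:
    """Hypothetical cache for a network-originated hint (e.g. a rate
    indication) that is discarded unless it is refreshed in time."""

    def __init__(self, lifetime_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lifetime_s = lifetime_s
        self._clock = clock
        self._value: Optional[float] = None
        self._expires_at = float("-inf")

    def refresh(self, value: float) -> None:
        """Record a newly received signal and restart its lifetime."""
        self._value = value
        self._expires_at = self._clock() + self.lifetime_s

    def get(self) -> Optional[float]:
        """Return the signal, or None once it has expired."""
        if self._clock() >= self._expires_at:
            self._value = None
        return self._value


now = [0.0]
hint = SoftStateSignal(lifetime_s=3.0, clock=lambda: now[0])
hint.refresh(5_000_000.0)  # e.g. a rate hint in bit/s (illustrative)
now[0] = 2.9
print(hint.get())  # 5000000.0
now[0] = 3.0
print(hint.get())  # None: fall back to normal congestion control
```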

4.4. Avoiding Persistent Congestion

All endpoints are required to implement mechanisms that avoid persistent congestion, and to be able to demonstrate that they do not induce flow starvation or congestion collapse (see Section 1.3).¶

Principles include:¶

Persistent congestion can result in congestion collapse, which MUST be aggressively avoided [RFC2914]. Endpoints that experience persistent congestion and have already reduced their cwnd to the loss window (e.g., one packet) MUST further reduce the rate if the RTO timer continues to expire. For example, TFRC [RFC5348] continues to reduce its sending rate under persistent congestion to one packet per RTT, and then exponentially backs off the time between single-packet transmissions if a congestion event continues to persist [RFC2914]. QUIC [RFC9002] uses a Probe Timeout (PTO) to detect tail loss, and declares persistent congestion when the packets sent across a period of approximately three PTOs are all declared lost; this recommended threshold is intended to be approximately equivalent to a TCP sender declaring an RTO after two Tail Loss Probes (TLPs) [RFC8985].¶
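The TFRC-style behaviour described above can be sketched as a schedule of intervals between transmissions: each further expiry without feedback doubles the interval (halving the rate) until one packet per RTT is reached, after which the interval between single-packet transmissions keeps doubling. The 64-second cap corresponds to the maximum back-off interval (t_mbi) used by TFRC [RFC5348]; the function name and the example values are assumptions.

```python
def backoff_schedule(initial_interval_s: float, rtt_s: float,
                     expiries: int, t_mbi_s: float = 64.0) -> list:
    """Return the inter-packet interval after each successive expiry."""
    interval = initial_interval_s
    schedule = []
    for _ in range(expiries):
        if interval < rtt_s:
            # Still faster than one packet per RTT: halve the rate,
            # but do not skip past the one-packet-per-RTT point.
            interval = min(interval * 2, rtt_s)
        else:
            # Already at one packet per RTT or slower: exponential back-off.
            interval = min(interval * 2, t_mbi_s)
        schedule.append(interval)
    return schedule


# 40 packets/s on a 100 ms path, followed by six expiries without feedback.
print(backoff_schedule(0.025, 0.1, 6))  # [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
```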

4.4.1. Avoiding Congestion Collapse and Flow Starvation

Principles include:¶

Transports MUST avoid inducing starvation of other flows that share resources along the path.¶
Endpoints MUST treat a loss of all feedback (e.g., RTO expiry) as an indication of persistent congestion.¶
When an endpoint detects persistent congestion, it MUST reduce the maximum rate/cwnd.¶
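For a window-based sender, the principles above could be applied on RTO expiry as sketched below: cwnd collapses to the loss window of one segment, ssthresh is reduced, and the RTO is doubled so that the rate keeps falling while expiries continue. The timer doubling follows the back-off in [RFC6298]; the ssthresh rule is the one used by standard TCP congestion control (RFC 5681, which this document does not cite); and the 60-second cap is an assumed value ([RFC6298] permits a maximum RTO provided it is at least 60 seconds).

```python
from dataclasses import dataclass


@dataclass
class WindowState:
    """Hypothetical per-connection state for a window-based sender."""
    smss: int                # sender maximum segment size, in bytes
    cwnd: int                # congestion window, in bytes
    ssthresh: int            # slow-start threshold, in bytes
    rto_s: float             # current retransmission timeout, in seconds
    max_rto_s: float = 60.0  # assumed upper bound on the RTO

    def on_rto_expiry(self, flight_size: int) -> None:
        # Loss of all feedback is treated as persistent congestion.
        self.ssthresh = max(flight_size // 2, 2 * self.smss)
        self.cwnd = self.smss  # the loss window: one segment
        # Back off the timer so repeated expiries keep lowering the rate.
        self.rto_s = min(self.rto_s * 2, self.max_rto_s)


state = WindowState(smss=1000, cwnd=20_000, ssthresh=65_535, rto_s=1.0)
state.on_rto_expiry(flight_size=20_000)
print(state.cwnd, state.ssthresh, state.rto_s)  # 1000 10000 2.0
```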

4.5. Additional Considerations

Many designs place the responsibility for rate adaptation at the sender (source) endpoint, utilising feedback information provided by the remote endpoint (receiver). CC can also be implemented by determining an appropriate rate limit at a receiver and using this limit to control the maximum transport rate (e.g., using methods such as [RFC5348] and [RFC4828]).¶
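The rate limit in TFRC is derived from the TCP throughput equation given in [RFC5348], which takes the segment size s, the round-trip time R, and the loss event rate p. A direct transcription is sketched below; the same computation could be performed at whichever endpoint holds estimates of p and R. The defaults b = 1 and t_RTO = 4R are assumptions intended to match the simplifications in [RFC5348], and the function name is hypothetical.

```python
import math
from typing import Optional


def tfrc_rate(s_bytes: float, rtt_s: float, p: float,
              b: float = 1.0, t_rto_s: Optional[float] = None) -> float:
    """Allowed sending rate in bytes/s from the TCP throughput equation.

    s_bytes: segment size; rtt_s: round-trip time R; p: loss event rate;
    b: packets acknowledged per ACK; t_rto_s: retransmission timeout.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("loss event rate p must be in (0, 1]")
    if t_rto_s is None:
        t_rto_s = 4.0 * rtt_s  # assumed simplification t_RTO = 4R
    denominator = (rtt_s * math.sqrt(2.0 * b * p / 3.0)
                   + t_rto_s * (3.0 * math.sqrt(3.0 * b * p / 8.0))
                   * p * (1.0 + 32.0 * p * p))
    return s_bytes / denominator


# 1460-byte segments, 100 ms RTT, 1% loss event rate.
print(round(tfrc_rate(1460, 0.1, 0.01)))  # roughly 164 kB/s
```

The allowed rate falls as either the loss event rate or the RTT grows, which is what lets a receiver-derived limit track the TCP-friendly share of the path.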

Applications at an endpoint can send more than one flow. "The specific issue of a browser opening multiple connections to the same destination has been addressed by [RFC2616]. Section 8.1.4 states that 'Clients that use persistent connections SHOULD limit the number of simultaneous connections that they maintain to a given server. A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy.'" [RFC9040].¶

5. Acknowledgements

This document owes much to the insight offered by Sally Floyd, both at the time RFC 2914 was written and through her help and review in the many years that followed.¶

Nicholas Kuhn helped develop the first draft of these guidelines. Tom Jones and Ana Custura reviewed the first version of this draft. Many discussions with Michael Welzl and others have provided immeasurable help to get this far. The University of Aberdeen received funding to support this work from the European Space Agency.¶

6. IANA Considerations

This memo includes no request to IANA.¶

RFC Editor Note: If there are no requirements for IANA, the section will be removed during conversion into an RFC by the RFC Editor.¶

7. Security Considerations

This document introduces no new security considerations. Each RFC listed in this document discusses the security considerations of the specification it contains. The security considerations for the use of each transport are provided in the cited RFCs. Security guidance for applications using UDP is provided in the UDP Usage Guidelines [RFC8085].¶

Section 3.3 describes general requirements relating to the design of safe protocols and their protection from on-path and off-path attacks.¶

Section 4.3.4 follows current best practice to validate ICMP messages prior to use.¶

8. Normative References

[RFC1122]: Braden, R., Ed., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, DOI 10.17487/RFC1122, October 1989, <https://www.rfc-editor.org/info/rfc1122>.
[RFC2119]: Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC2914]: Floyd, S., "Congestion Control Principles", BCP 41, RFC 2914, DOI 10.17487/RFC2914, September 2000, <https://www.rfc-editor.org/info/rfc2914>.
[RFC3168]: Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, September 2001, <https://www.rfc-editor.org/info/rfc3168>.
[RFC3390]: Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's Initial Window", RFC 3390, DOI 10.17487/RFC3390, October 2002, <https://www.rfc-editor.org/info/rfc3390>.
[RFC5348]: Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP Friendly Rate Control (TFRC): Protocol Specification", RFC 5348, DOI 10.17487/RFC5348, September 2008, <https://www.rfc-editor.org/info/rfc5348>.
[RFC6298]: Paxson, V., Allman, M., Chu, J., and M. Sargent, "Computing TCP's Retransmission Timer", RFC 6298, DOI 10.17487/RFC6298, June 2011, <https://www.rfc-editor.org/info/rfc6298>.
[RFC7567]: Baker, F., Ed. and G. Fairhurst, Ed., "IETF Recommendations Regarding Active Queue Management", BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015, <https://www.rfc-editor.org/info/rfc7567>.
[RFC8085]: Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, March 2017, <https://www.rfc-editor.org/info/rfc8085>.

9. Informative References

[Flow-Rate-Fairness]: Briscoe, B., "Flow Rate Fairness: Dismantling a Religion", ACM Computer Communication Review, vol. 37, no. 2, pp. 63-74, April 2007.
[I-D.ietf-tcpm-accurate-ecn]: Briscoe, B., Kühlewind, M., and R. Scheffenegger, "More Accurate Explicit Congestion Notification (ECN) Feedback in TCP", Work in Progress, Internet-Draft, draft-ietf-tcpm-accurate-ecn-26, 24 July 2023, <https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-26>.
[Jac88]: Jacobson, V., "Congestion Avoidance and Control", Computer Communication Review, vol. 18, no. 4, pp. 314-329, August 1988, <ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z>.
[RFC0768]: Postel, J., "User Datagram Protocol", STD 6, RFC 768, DOI 10.17487/RFC0768, August 1980, <https://www.rfc-editor.org/info/rfc768>.
[RFC0792]: Postel, J., "Internet Control Message Protocol", STD 5, RFC 792, DOI 10.17487/RFC0792, September 1981, <https://www.rfc-editor.org/info/rfc792>.
[RFC0896]: Nagle, J., "Congestion Control in IP/TCP Internetworks", RFC 896, DOI 10.17487/RFC0896, January 1984, <https://www.rfc-editor.org/info/rfc896>.
[RFC0970]: Nagle, J., "On Packet Switches With Infinite Storage", RFC 970, DOI 10.17487/RFC0970, December 1985, <https://www.rfc-editor.org/info/rfc970>.
[RFC2309]: Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, S., Wroclawski, J., and L. Zhang, "Recommendations on Queue Management and Congestion Avoidance in the Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998, <https://www.rfc-editor.org/info/rfc2309>.
[RFC2475]: Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., and W. Weiss, "An Architecture for Differentiated Services", RFC 2475, DOI 10.17487/RFC2475, December 1998, <https://www.rfc-editor.org/info/rfc2475>.
[RFC2525]: Paxson, V., Allman, M., Dawson, S., Fenner, W., Griner, J., Heavens, I., Lahey, K., Semke, J., and B. Volz, "Known TCP Implementation Problems", RFC 2525, DOI 10.17487/RFC2525, March 1999, <https://www.rfc-editor.org/info/rfc2525>.
[RFC2616]: Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, DOI 10.17487/RFC2616, June 1999, <https://www.rfc-editor.org/info/rfc2616>.
[RFC3449]: Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. Sooriyabandara, "TCP Performance Implications of Network Path Asymmetry", BCP 69, RFC 3449, DOI 10.17487/RFC3449, December 2002, <https://www.rfc-editor.org/info/rfc3449>.
[RFC3550]: Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, July 2003, <https://www.rfc-editor.org/info/rfc3550>.
[RFC3742]: Floyd, S., "Limited Slow-Start for TCP with Large Congestion Windows", RFC 3742, DOI 10.17487/RFC3742, March 2004, <https://www.rfc-editor.org/info/rfc3742>.
[RFC3819]: Karn, P., Ed., Bormann, C., Fairhurst, G., Grossman, D., Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L. Wood, "Advice for Internet Subnetwork Designers", BCP 89, RFC 3819, DOI 10.17487/RFC3819, July 2004, <https://www.rfc-editor.org/info/rfc3819>.
[RFC3828]: Larzon, L., Degermark, M., Pink, S., Jonsson, L., Ed., and G. Fairhurst, Ed., "The Lightweight User Datagram Protocol (UDP-Lite)", RFC 3828, DOI 10.17487/RFC3828, July 2004, <https://www.rfc-editor.org/info/rfc3828>.
[RFC4301]: Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, DOI 10.17487/RFC4301, December 2005, <https://www.rfc-editor.org/info/rfc4301>.
[RFC4340]: Kohler, E., Handley, M., and S. Floyd, "Datagram Congestion Control Protocol (DCCP)", RFC 4340, DOI 10.17487/RFC4340, March 2006, <https://www.rfc-editor.org/info/rfc4340>.
[RFC4828]: Floyd, S. and E. Kohler, "TCP Friendly Rate Control (TFRC): The Small-Packet (SP) Variant", RFC 4828, DOI 10.17487/RFC4828, April 2007, <https://www.rfc-editor.org/info/rfc4828>.
[RFC4960]: Stewart, R., Ed., "Stream Control Transmission Protocol", RFC 4960, DOI 10.17487/RFC4960, September 2007, <https://www.rfc-editor.org/info/rfc4960>.
[RFC5033]: Floyd, S. and M. Allman, "Specifying New Congestion Control Algorithms", BCP 133, RFC 5033, DOI 10.17487/RFC5033, August 2007, <https://www.rfc-editor.org/info/rfc5033>.
[RFC5164]: Melia, T., Ed., "Mobility Services Transport: Problem Statement", RFC 5164, DOI 10.17487/RFC5164, March 2008, <https://www.rfc-editor.org/info/rfc5164>.
[RFC5166]: Floyd, S., Ed., "Metrics for the Evaluation of Congestion Control Mechanisms", RFC 5166, DOI 10.17487/RFC5166, March 2008, <https://www.rfc-editor.org/info/rfc5166>.
[RFC5783]: Welzl, M. and W. Eddy, "Congestion Control in the RFC Series", RFC 5783, DOI 10.17487/RFC5783, February 2010, <https://www.rfc-editor.org/info/rfc5783>.
[RFC6077]: Papadimitriou, D., Ed., Welzl, M., Scharf, M., and B. Briscoe, "Open Research Issues in Internet Congestion Control", RFC 6077, DOI 10.17487/RFC6077, February 2011, <https://www.rfc-editor.org/info/rfc6077>.
[RFC6363]: Watson, M., Begen, A., and V. Roca, "Forward Error Correction (FEC) Framework", RFC 6363, DOI 10.17487/RFC6363, October 2011, <https://www.rfc-editor.org/info/rfc6363>.
[RFC6679]: Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P., and K. Carlberg, "Explicit Congestion Notification (ECN) for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August 2012, <https://www.rfc-editor.org/info/rfc6679>.
[RFC6773]: Phelan, T., Fairhurst, G., and C. Perkins, "DCCP-UDP: A Datagram Congestion Control Protocol UDP Encapsulation for NAT Traversal", RFC 6773, DOI 10.17487/RFC6773, November 2012, <https://www.rfc-editor.org/info/rfc6773>.
[RFC6928]: Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, "Increasing TCP's Initial Window", RFC 6928, DOI 10.17487/RFC6928, April 2013, <https://www.rfc-editor.org/info/rfc6928>.
[RFC6951]: Tuexen, M. and R. Stewart, "UDP Encapsulation of Stream Control Transmission Protocol (SCTP) Packets for End-Host to End-Host Communication", RFC 6951, DOI 10.17487/RFC6951, May 2013, <https://www.rfc-editor.org/info/rfc6951>.
[RFC7661]: Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating TCP to Support Rate-Limited Traffic", RFC 7661, DOI 10.17487/RFC7661, October 2015, <https://www.rfc-editor.org/info/rfc7661>.
[RFC7806]: Baker, F. and R. Pan, "On Queuing, Marking, and Dropping", RFC 7806, DOI 10.17487/RFC7806, April 2016, <https://www.rfc-editor.org/info/rfc7806>.
[RFC793]: Postel, J., "Transmission Control Protocol", RFC 793, DOI 10.17487/RFC0793, September 1981, <https://www.rfc-editor.org/info/rfc793>.
[RFC8084]: Fairhurst, G., "Network Transport Circuit Breakers", BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017, <https://www.rfc-editor.org/info/rfc8084>.
[RFC8087]: Fairhurst, G. and M. Welzl, "The Benefits of Using Explicit Congestion Notification (ECN)", RFC 8087, DOI 10.17487/RFC8087, March 2017, <https://www.rfc-editor.org/info/rfc8087>.
[RFC8311]: Black, D., "Relaxing Restrictions on Explicit Congestion Notification (ECN) Experimentation", RFC 8311, DOI 10.17487/RFC8311, January 2018, <https://www.rfc-editor.org/info/rfc8311>.
[RFC8511]: Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst, "TCP Alternative Backoff with ECN (ABE)", RFC 8511, DOI 10.17487/RFC8511, December 2018, <https://www.rfc-editor.org/info/rfc8511>.
[RFC8985]: Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "The RACK-TLP Loss Detection Algorithm for TCP", RFC 8985, DOI 10.17487/RFC8985, February 2021, <https://www.rfc-editor.org/info/rfc8985>.
[RFC9000]: Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based Multiplexed and Secure Transport", RFC 9000, DOI 10.17487/RFC9000, May 2021, <https://www.rfc-editor.org/info/rfc9000>.
[RFC9002]: Iyengar, J., Ed. and I. Swett, Ed., "QUIC Loss Detection and Congestion Control", RFC 9002, DOI 10.17487/RFC9002, May 2021, <https://www.rfc-editor.org/info/rfc9002>.
[RFC9040]: Touch, J., Welzl, M., and S. Islam, "TCP Control Block Interdependence", RFC 9040, DOI 10.17487/RFC9040, July 2021, <https://www.rfc-editor.org/info/rfc9040>.
[RFC9049]: Dawkins, S., Ed., "Path Aware Networking: Obstacles to Deployment (A Bestiary of Roads Not Taken)", RFC 9049, DOI 10.17487/RFC9049, June 2021, <https://www.rfc-editor.org/info/rfc9049>.
[RFC9293]: Eddy, W., Ed., "Transmission Control Protocol (TCP)", STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, <https://www.rfc-editor.org/info/rfc9293>.
[RFC9330]: Briscoe, B., Ed., De Schepper, K., Bagnulo, M., and G. White, "Low Latency, Low Loss, and Scalable Throughput (L4S) Internet Service: Architecture", RFC 9330, DOI 10.17487/RFC9330, January 2023, <https://www.rfc-editor.org/info/rfc9330>.
[RFC9332]: De Schepper, K., Briscoe, B., Ed., and G. White, "Dual-Queue Coupled Active Queue Management (AQM) for Low Latency, Low Loss, and Scalable Throughput (L4S)", RFC 9332, DOI 10.17487/RFC9332, January 2023, <https://www.rfc-editor.org/info/rfc9332>.
[RFC9406]: Balasubramanian, P., Huang, Y., and M. Olson, "HyStart++: Modified Slow Start for TCP", RFC 9406, DOI 10.17487/RFC9406, May 2023, <https://www.rfc-editor.org/info/rfc9406>.
[RFC9438]: Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., "CUBIC for Fast and Long-Distance Networks", RFC 9438, DOI 10.17487/RFC9438, August 2023, <https://www.rfc-editor.org/info/rfc9438>.

Appendix A. Revision Notes

Note to RFC-Editor: please remove this entire section prior to publication.¶

Previous versions of the document were presented and discussed in tsvwg, and evolved through several versions. This version refocuses the document towards the newly formed CC Working Group, where it is offered as a candidate for progression.¶

Individual draft -00:¶

First draft contributed to CC WG targeting publication as BCP.¶
Reduced overlap¶

Authors' Addresses

Godred Fairhurst

University of Aberdeen

School of Engineering
Fraser Noble Building

Aberdeen

AB24 3UE

United Kingdom

Email: gorry@erg.abdn.ac.uk

Michael Welzl

University of Oslo

Oslo

Norway

Email: michawe@ifi.uio.no
