Congestion Control in Linux TCP


A general overview of the TCP congestion control algorithms implemented in the Linux kernel.

What is congestion control?

The vast majority of data on the Internet today is transmitted using the Transmission Control Protocol (TCP) or the User Datagram Protocol (UDP). UDP is a simple stateless protocol that sends packets through the network without checking whether they arrived at the destination. TCP, on the other hand, is a more complex stateful protocol that establishes a connection between the source and the destination and verifies that packets arrive there correctly. TCP also provides additional functionality such as segmentation, reliable delivery, and congestion control.

TCP's congestion control prevents packet transfers from congesting a given network link. It uses an additive increase / multiplicative decrease (AIMD) scheme, complemented by additional mechanisms such as slow start and the congestion window (CWND). AIMD is a closed-loop control algorithm that combines linear growth of the congestion window with a multiplicative reduction when congestion takes place. The CWND is maintained by the sender and lets it keep the link to the receiver from becoming overloaded with too much traffic; its size is determined by estimating how much congestion there is on the link. The flow of data over a TCP connection is also limited by the receive window advertised by the receiver, so the sender may have no more data in flight than the smaller of its CWND and the receive window. Slow start begins with the initial CWND size and increases it by one segment for every acknowledgement received, roughly doubling the window every round trip, until packet loss is detected or the slow start threshold (ssthresh) is reached.
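
As a concrete illustration, below is a minimal C sketch of this AIMD loop with slow start. It assumes a fixed segment size and hypothetical on_ack()/on_loss() hooks called by the stack; it is a simplified model, not the kernel's implementation.

#include <stdint.h>

#define MSS 1460u                      /* assumed maximum segment size, bytes */

static uint32_t cwnd = 10u * MSS;      /* congestion window, bytes */
static uint32_t ssthresh = UINT32_MAX; /* slow start threshold, bytes */

/* Called for every ACK that acknowledges new data. */
static void on_ack(void)
{
    if (cwnd < ssthresh)
        cwnd += MSS;              /* slow start: +1 MSS per ACK, doubling each RTT */
    else
        cwnd += MSS * MSS / cwnd; /* additive increase: roughly +1 MSS per RTT */
}

/* Called when packet loss signals congestion. */
static void on_loss(void)
{
    ssthresh = cwnd / 2;          /* multiplicative decrease */
    cwnd = ssthresh;
}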

Congestion control in Linux

There is a plethora of algorithms for TCP congestion control. Linux, however, implements only a subset of them, described below.

Binary Increase Congestion control (BIC)

BIC was the default in Linux kernels 2.6.8 through 2.6.18, i.e. from August 2004 until November 2006. The protocol adapts its CWND by combining binary search increase with additive increase for bandwidth probing. The BIC algorithm has the following parameters:

Smax:    the maximum increment
Smin:    the minimum increment
Wmax:    the maximum window size  
B:       multiplicative window decrease factor
cwnd:    congestion window size  
bic_inc: window increment per RTT (round trip time)

The algorithm updates the CWND once per RTT. When no packets are dropped, the CWND grows in one of three ways: binary search increase, additive increase, or slow start:

if (cwnd < Wmax)                    // binary search OR additive
    bic_inc = (Wmax - cwnd) / 2;
else                                // slow start OR additive
    bic_inc = cwnd - Wmax;

if (bic_inc > Smax)                 // additive
    bic_inc = Smax;
else if (bic_inc < Smin)            // binary search OR slow start
    bic_inc = Smin;

cwnd = cwnd + (bic_inc / cwnd);

When one or more packets are dropped, the CWND is reduced using multiplicative decrease. Depending on the current CWND size, Wmax is updated with either fast or slow convergence:

if (cwnd < Wmax)                    // fast convergence
    Wmax = cwnd * (2-B) / 2;
else                                // slow convergence
    Wmax = cwnd;

cwnd = cwnd * (1-B);
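
Putting the two cases together, here is a small, runnable C sketch of the BIC update. The parameter values are illustrative rather than the kernel's tuning, and the window is modeled as a floating-point segment count for clarity.

#include <stdio.h>

#define SMAX 32.0           /* maximum increment */
#define SMIN 0.01           /* minimum increment */
#define BETA 0.2            /* multiplicative window decrease factor */

static double wmax = 100.0; /* window size before the last reduction */

/* Growth step, applied for each ACK while no loss is observed. */
static double bic_on_ack(double cwnd)
{
    double bic_inc;

    if (cwnd < wmax)                /* binary search toward Wmax */
        bic_inc = (wmax - cwnd) / 2.0;
    else                            /* probing beyond Wmax */
        bic_inc = cwnd - wmax;

    if (bic_inc > SMAX)             /* clamp to additive increase */
        bic_inc = SMAX;
    else if (bic_inc < SMIN)
        bic_inc = SMIN;

    return cwnd + bic_inc / cwnd;   /* roughly +bic_inc per RTT */
}

/* Decrease step, applied when one or more packets are dropped. */
static double bic_on_loss(double cwnd)
{
    if (cwnd < wmax)                /* fast convergence */
        wmax = cwnd * (2.0 - BETA) / 2.0;
    else                            /* slow convergence */
        wmax = cwnd;

    return cwnd * (1.0 - BETA);
}

int main(void)
{
    double cwnd = 20.0;
    for (int ack = 0; ack < 200; ack++)
        cwnd = bic_on_ack(cwnd);    /* climbs toward wmax, then probes past it */
    printf("cwnd after 200 ACKs: %.2f\n", cwnd);
    printf("cwnd after a loss:   %.2f\n", bic_on_loss(cwnd));
    return 0;
}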

Cubic Binary Increase Congestion control (CUBIC)

CUBIC has been the default in Linux kernels since version 2.6.19, which was released in November 2006. The protocol improves upon BIC by growing the CWND as a cubic function of the time since the last congestion event, which lets the network stabilize before more bandwidth is claimed. Because the window is a function of elapsed real time rather than of arriving acknowledgements, CUBIC's growth is independent of RTT, and the algorithm is simpler than BIC. The CUBIC algorithm has the following parameters:

B:      Multiplicative decrease factor (default value = 0.7)
Wmax:   Window size just before the last reduction
T:      Time elapsed since the last window reduction
C:      A scaling constant (default value = 0.4)
cwnd:   The congestion window at the current time

The size of CWND is then modeled as follows:

$cwnd = C(T-K)^3 + W_{max}$, where $K = \sqrt[3]{\frac{W_{max}(1-B)}{C}}$
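
A minimal C sketch of this window function, assuming time is measured in seconds and the window in segments, using the default B and C values from the parameter list:

#include <math.h>
#include <stdio.h>

#define C_SCALE  0.4 /* scaling constant C */
#define B_FACTOR 0.7 /* multiplicative decrease factor B */

/* Congestion window T seconds after a reduction from wmax. */
static double cubic_cwnd(double t, double wmax)
{
    /* K: the time at which the window returns to wmax */
    double k = cbrt(wmax * (1.0 - B_FACTOR) / C_SCALE);
    double d = t - k;
    return C_SCALE * d * d * d + wmax;
}

int main(void)
{
    /* at t = 0 the window is B * wmax; it plateaus near wmax around t = K */
    for (double t = 0.0; t <= 5.0; t += 1.0)
        printf("t = %.0f s  cwnd = %6.1f\n", t, cubic_cwnd(t, 100.0));
    return 0;
}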

Proportional Rate Reduction (PRR)

PRR has been used in the Linux kernel to improve loss recovery since version 3.2, which became available in January 2012. After a packet loss is detected, the protocol determines the number of new bytes to send based on the amount of data the incoming acknowledgements report as delivered, so that the congestion window converges smoothly and accurately toward ssthresh during recovery. The PRR algorithm has the following parameters:

prr_delivered:  Total bytes delivered to the receiver since the start of recovery.
prr_out:        Total bytes transmitted during recovery.
RecoverFS:      The FlightSize at the start of recovery.
DeliveredData:  The number of bytes that the current acknowledgement indicates have been delivered to the receiver.
sndcnt:         The number of bytes to send in response to each ACK.

Initialization on entering recovery:

ssthresh = CongCtrlAlg()        // target CWND after recovery
prr_delivered = 0               // total bytes delivered during recovery
prr_out = 0                     // total bytes sent during recovery
RecoverFS = snd.nxt - snd.una   // FlightSize at the start of recovery

On every acknowledgement during recovery, compute:

DeliveredData = delta(snd.una) + delta(SACKd)                       // new bytes this ACK reports as delivered (cumulative ACK plus SACK)
prr_delivered += DeliveredData
pipe = PipeAlgorithm()                                              // in-flight estimate per RFC 3517

if (pipe > ssthresh)
    sndcnt = CEIL(prr_delivered * ssthresh/RecoverFS) - prr_out     // proportional rate reduction
else
    ss_limit = MAX(prr_delivered - prr_out, DeliveredData) + 1      // slow start
    sndcnt = MIN(ssthresh - pipe, ss_limit)

sndcnt = MAX(sndcnt, 0)
cwnd = pipe + sndcnt

On any data transmission or retransmission:

prr_out += data_sent

At the end of recovery:

cwnd = ssthresh
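
The per-ACK step above translates directly into C. The sketch below is a simplified model that assumes all quantities are byte counts supplied by the stack's recovery state; the names prr_state and prr_on_ack are invented for illustration.

#include <stdint.h>
#include <stdio.h>

struct prr_state {
    uint32_t ssthresh;      /* target cwnd after recovery */
    uint32_t prr_delivered; /* bytes delivered since recovery started */
    uint32_t prr_out;       /* bytes transmitted during recovery */
    uint32_t recover_fs;    /* FlightSize at the start of recovery */
};

/* Returns sndcnt (bytes that may be sent for this ACK) and updates cwnd;
 * pipe is the RFC 3517 estimate of bytes currently in flight. */
static uint32_t prr_on_ack(struct prr_state *s, uint32_t delivered_data,
                           uint32_t pipe, uint32_t *cwnd)
{
    int64_t sndcnt;

    s->prr_delivered += delivered_data;

    if (pipe > s->ssthresh) {
        /* proportional rate reduction, with CEIL done in integer math */
        sndcnt = ((int64_t)s->prr_delivered * s->ssthresh
                  + s->recover_fs - 1) / s->recover_fs
                 - s->prr_out;
    } else {
        /* slow start back toward ssthresh, limited by delivered data */
        int64_t ss_limit = (int64_t)s->prr_delivered - s->prr_out;
        if (ss_limit < (int64_t)delivered_data)
            ss_limit = delivered_data;
        ss_limit += 1;
        sndcnt = (int64_t)s->ssthresh - pipe;
        if (sndcnt > ss_limit)
            sndcnt = ss_limit;
    }

    if (sndcnt < 0)
        sndcnt = 0;
    *cwnd = pipe + (uint32_t)sndcnt;
    return (uint32_t)sndcnt;
}

int main(void)
{
    /* one ACK arrives early in recovery: pipe is still above ssthresh,
     * so roughly half of the newly delivered data may be sent */
    struct prr_state s = { .ssthresh = 7000, .prr_delivered = 0,
                           .prr_out = 0, .recover_fs = 14000 };
    uint32_t cwnd;
    uint32_t send = prr_on_ack(&s, 1400, 12600, &cwnd);
    printf("sndcnt = %u, cwnd = %u\n", send, cwnd);
    return 0;
}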

Bottleneck Bandwidth and Round-trip propagation time (BBR)

BBR has provided model-based congestion control in the Linux kernel since version 4.9, which became available in December 2016. The protocol builds a model of the network over time from the maximum bandwidth and the round-trip time at which the network delivered recent packets. Instead of relying on packet loss as the congestion signal, it uses the latency introduced by bufferbloat to model the maximum throughput of the path, which yields both higher throughput and lower latency. The BBR algorithm has two main parts. First, each new acknowledgement provides new RTT and average delivery rate measurements that update the algorithm's estimates:

rtt = now - packet.sendtime
update_min_filter(RTpropFilter, rtt)
delivered += packet.size
delivered_time = now
deliveryRate = (delivered - packet.delivered) / (delivered_time - packet.delivered_time)

if (deliveryRate > BtlBwFilter.currentMax || ! packet.app_limited)
    update_max_filter(BtlBwFilter, deliveryRate)

if (app_limited_until > 0)
    app_limited_until = app_limited_until - packet.size
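
The update_min_filter and update_max_filter helpers are left undefined above. One naive way to realize them, sketched below in C, is a fixed-length window of recent samples; the real BBR bounds its filters by time (on the order of ten RTTs for bandwidth and roughly ten seconds for RTprop) rather than by sample count.

#include <stddef.h>

#define FILTER_LEN 16 /* illustrative window length, in samples */

struct window_filter {
    double samples[FILTER_LEN];
    size_t next;  /* circular write position */
    size_t count; /* number of valid samples so far */
};

static void filter_add(struct window_filter *f, double sample)
{
    f->samples[f->next] = sample;
    f->next = (f->next + 1) % FILTER_LEN;
    if (f->count < FILTER_LEN)
        f->count++;
}

/* Running maximum over the window, assuming at least one sample was added;
 * the minimum filter used for RTprop is symmetric. */
static double filter_max(const struct window_filter *f)
{
    double best = f->samples[0];
    for (size_t i = 1; i < f->count; i++)
        if (f->samples[i] > best)
            best = f->samples[i];
    return best;
}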

Second, it paces the sending of each data packet to match the link's bottleneck bandwidth:

bdp = BtlBwFilter.currentMax × RTpropFilter.currentMin

if (inflight >= cwnd_gain × bdp)
    // wait for ack or retransmission timeout
    return

if (now >= nextSendTime)
    packet = nextPacketToSend()
    if (! packet)
        app_limited_until = inflight
        return
    packet.app_limited = (app_limited_until > 0)
    packet.sendtime = now
    packet.delivered = delivered
    packet.delivered_time = delivered_time
    ship(packet)
    nextSendTime = now + packet.size / (pacing_gain × BtlBwFilter.currentMax)

timerCallbackAt(send, nextSendTime)
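
To make the pacing arithmetic concrete, the following small C program computes the bandwidth-delay product, the in-flight cap, and the packet spacing for one set of illustrative filter values (the numbers are assumptions, not measurements):

#include <stdio.h>

int main(void)
{
    double btl_bw      = 12.5e6; /* BtlBwFilter max: 12.5 MB/s (100 Mbit/s) */
    double rt_prop     = 0.040;  /* RTpropFilter min: 40 ms */
    double cwnd_gain   = 2.0;    /* bounds the data allowed in flight */
    double pacing_gain = 1.25;   /* above 1 while probing for more bandwidth */
    double pkt_size    = 1500.0; /* packet size, bytes */

    double bdp = btl_bw * rt_prop; /* bandwidth-delay product */
    printf("bdp             = %.0f bytes\n", bdp);
    printf("inflight cap    = %.0f bytes\n", cwnd_gain * bdp);
    printf("packet interval = %.0f us\n",
           1e6 * pkt_size / (pacing_gain * btl_bw));
    return 0;
}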


Glossary

Layer 4 (Transport Layer)

The layer of the OSI model that handles end-to-end traffic between hosts (e.g., TCP and UDP).

TCP

Transmission Control Protocol.
