[TCP]: Update sysctl and congestion control documentation.

Update the documentation to remove the old sysctl values and include the
new congestion control infrastructure. Includes changes to tcp.txt by
Ian McDonald.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Committed by: David S. Miller
Parent: 056ede6cfa
Commit: 9d7bcfc6b8

@@ -304,57 +304,6 @@ tcp_low_latency - BOOLEAN
         changed would be a Beowulf compute cluster.
         Default: 0
 
-tcp_westwood - BOOLEAN
-        Enable TCP Westwood+ congestion control algorithm.
-        TCP Westwood+ is a sender-side only modification of the TCP Reno
-        protocol stack that optimizes the performance of TCP congestion
-        control. It is based on end-to-end bandwidth estimation to set
-        congestion window and slow start threshold after a congestion
-        episode. Using this estimation, TCP Westwood+ adaptively sets a
-        slow start threshold and a congestion window which takes into
-        account the bandwidth used at the time congestion is experienced.
-        TCP Westwood+ significantly increases fairness wrt TCP Reno in
-        wired networks and throughput over wireless links.
-        Default: 0
-
-tcp_vegas_cong_avoid - BOOLEAN
-        Enable TCP Vegas congestion avoidance algorithm.
-        TCP Vegas is a sender-side only change to TCP that anticipates
-        the onset of congestion by estimating the bandwidth. TCP Vegas
-        adjusts the sending rate by modifying the congestion
-        window. TCP Vegas should provide less packet loss, but it is
-        not as aggressive as TCP Reno.
-        Default: 0
-
-tcp_bic - BOOLEAN
-        Enable BIC TCP congestion control algorithm.
-        BIC-TCP is a sender-side only change that ensures a linear RTT
-        fairness under large windows while offering both scalability and
-        bounded TCP-friendliness. The protocol combines two schemes
-        called additive increase and binary search increase. When the
-        congestion window is large, additive increase with a large
-        increment ensures linear RTT fairness as well as good
-        scalability. Under small congestion windows, binary search
-        increase provides TCP friendliness.
-        Default: 0
-
-tcp_bic_low_window - INTEGER
-        Sets the threshold window (in packets) where BIC TCP starts to
-        adjust the congestion window. Below this threshold BIC TCP behaves
-        the same as the default TCP Reno.
-        Default: 14
-
-tcp_bic_fast_convergence - BOOLEAN
-        Forces BIC TCP to more quickly respond to changes in congestion
-        window. Allows two flows sharing the same connection to converge
-        more rapidly.
-        Default: 1
-
-tcp_default_win_scale - INTEGER
-        Sets the minimum window scale TCP will negotiate for on all
-        connections.
-        Default: 7
-
 tcp_tso_win_divisor - INTEGER
         This allows control over what percentage of the congestion window
         can be consumed by a single TSO frame.
@@ -368,6 +317,11 @@ tcp_frto - BOOLEAN
         where packet loss is typically due to random radio interference
         rather than intermediate router congestion.
 
+tcp_congestion_control - STRING
+        Set the congestion control algorithm to be used for new
+        connections. The algorithm "reno" is always available, but
+        additional choices may be available based on kernel configuration.
+
 somaxconn - INTEGER
         Limit of socket listen() backlog, known in userspace as SOMAXCONN.
         Defaults to 128. See also tcp_max_syn_backlog for additional tuning
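
Reviewer note: the tcp_congestion_control entry added in the hunk above can be inspected from userspace. The following is a minimal Python sketch, assuming the usual procfs mapping of net.ipv4 sysctls to /proc/sys/net/ipv4/. The helper names are hypothetical and are not part of this commit.

```python
from pathlib import Path

# Assumed procfs location of the net.ipv4.tcp_congestion_control sysctl.
SYSCTL_PATH = Path("/proc/sys/net/ipv4/tcp_congestion_control")


def read_congestion_control(path=SYSCTL_PATH):
    """Return the algorithm name stored in the sysctl file, e.g. "reno"."""
    return Path(path).read_text().strip()


def write_congestion_control(name, path=SYSCTL_PATH):
    """Request `name` as the algorithm for new connections.

    On a real system this needs root privileges, and the write is
    expected to fail with an OSError if the algorithm is unknown.
    """
    Path(path).write_text(name + "\n")
```

Both helpers take the path as a parameter so they can be exercised against an ordinary file when experimenting without touching the running kernel.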
@@ -1,5 +1,72 @@
-How the new TCP output machine [nyi] works.
-
+TCP protocol
+============
+
+Last updated: 21 June 2005
+
+Contents
+========
+
+- Congestion control
+- How the new TCP output machine [nyi] works
+
+Congestion control
+==================
+
+The following variables are used in the tcp_sock for congestion control:
+snd_cwnd        The size of the congestion window
+snd_ssthresh    Slow start threshold. We are in slow start if
+                snd_cwnd is less than this.
+snd_cwnd_cnt    A counter used to slow down the rate of increase
+                once we exceed the slow start threshold.
+snd_cwnd_clamp  This is the maximum size that snd_cwnd can grow to.
+snd_cwnd_stamp  Timestamp for when congestion window was last validated.
+snd_cwnd_used   Used as a highwater mark for how much of the
+                congestion window is in use. It is used to adjust
+                snd_cwnd down when the link is limited by the
+                application rather than the network.
+
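
Reviewer note: the way the fields listed above interact can be illustrated with a small userspace model. This is a sketch of Reno-style window growth under stated assumptions, not the kernel implementation: the class, the function names, and the starting values are all hypothetical, and the halving rule in on_loss is the textbook Reno behaviour rather than anything this patch specifies.

```python
from dataclasses import dataclass


@dataclass
class CongState:
    """Userspace stand-in for the tcp_sock fields above (units: packets)."""
    snd_cwnd: int = 2          # illustrative initial window
    snd_ssthresh: int = 8      # illustrative slow start threshold
    snd_cwnd_cnt: int = 0
    snd_cwnd_clamp: int = 64   # illustrative upper bound


def on_ack(s: CongState) -> None:
    """Grow the window for one ACK, Reno style (illustrative only)."""
    if s.snd_cwnd < s.snd_ssthresh:
        # Slow start: one extra packet per ACK, capped by the clamp.
        s.snd_cwnd = min(s.snd_cwnd + 1, s.snd_cwnd_clamp)
        return
    # Congestion avoidance: count ACKs, add one packet per full window.
    s.snd_cwnd_cnt += 1
    if s.snd_cwnd_cnt >= s.snd_cwnd:
        s.snd_cwnd_cnt = 0
        s.snd_cwnd = min(s.snd_cwnd + 1, s.snd_cwnd_clamp)


def on_loss(s: CongState) -> None:
    """Halve the window on loss (textbook Reno rule, floor of 2)."""
    s.snd_ssthresh = max(s.snd_cwnd // 2, 2)
    s.snd_cwnd = s.snd_ssthresh
    s.snd_cwnd_cnt = 0
```

In this model snd_cwnd_cnt is what slows growth past the threshold: below snd_ssthresh every ACK grows the window, while above it a whole window of ACKs is needed for a single increment.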
+As of 2.6.13, Linux supports pluggable congestion control algorithms.
+A congestion control mechanism can be registered through functions in
+tcp_cong.c. The functions used by the congestion control mechanism are
+registered by passing a tcp_congestion_ops struct to
+tcp_register_congestion_control. At a minimum, name, ssthresh,
+cong_avoid, and min_cwnd must be valid.
+
+Private data for a congestion control mechanism is stored in tp->ca_priv.
+tcp_ca(tp) returns a pointer to this space. This is preallocated space, so it
+is important to check that your private data will fit in this space;
+alternatively, space could be allocated elsewhere and a pointer to it could
+be stored here.
+
+There are three kinds of congestion control algorithms currently: The
+simplest ones are derived from TCP Reno (highspeed, scalable) and just
+provide an alternative congestion window calculation. More complex
+ones like BIC try to look at other events to provide better
+heuristics. There are also round trip time based algorithms like
+Vegas and Westwood+.
+
+Good TCP congestion control is a complex problem because the algorithm
+needs to maintain fairness and performance. Please review current
+research and RFCs before developing new modules.
+
+The congestion control mechanism that is used is determined by the
+setting of the sysctl net.ipv4.tcp_congestion_control.
+The default congestion control will be the last one registered (LIFO);
+so if you build everything as modules, the default will be reno. If you
+build with the defaults from Kconfig, then BIC will be built in (not a module)
+and it will end up the default.
+
+If you really want a particular default value then you will need
+to set it with the sysctl. If you use a sysctl, the module will be autoloaded
+if needed and you will get the expected protocol. If you ask for an
+unknown congestion method, then the sysctl attempt will fail.
+
+If you remove a TCP congestion control module, then you will get the next
+available one. Since reno cannot be built as a module, and cannot be
+deleted, it will always be available.
+
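
Reviewer note: the selection rules described in the preceding paragraphs (required ops, LIFO default, sysctl override, fallback on removal) can be sketched as a toy registry. This is a Python model under stated assumptions, not kernel code: the class and method names are hypothetical, module autoloading is not modelled, and clearing the sysctl choice when its module is removed is one reading of "you will get the next available one".

```python
class CongestionRegistry:
    """Toy model of the registration list; not kernel code."""

    # Fields the text says must be valid at registration time.
    REQUIRED = ("name", "ssthresh", "cong_avoid", "min_cwnd")

    def __init__(self):
        # reno is built in, so it always sits at the tail of the list.
        self._algos = ["reno"]
        self._forced = None  # value chosen through the sysctl, if any

    def register(self, ops: dict) -> None:
        missing = [k for k in self.REQUIRED if not ops.get(k)]
        if missing:
            raise ValueError(f"missing ops: {missing}")
        # LIFO: the newest registration goes to the head of the list.
        self._algos.insert(0, ops["name"])

    def unregister(self, name: str) -> None:
        if name == "reno":
            raise ValueError("reno cannot be removed")
        self._algos.remove(name)
        if self._forced == name:
            # Assumption: fall back to the next available algorithm.
            self._forced = None

    def set_sysctl(self, name: str) -> None:
        # Asking for an unknown algorithm fails.
        if name not in self._algos:
            raise LookupError(f"unknown congestion control: {name}")
        self._forced = name

    def default(self) -> str:
        return self._forced or self._algos[0]
```

With only reno present, default() returns "reno"; registering "bic" afterwards makes it the default until either set_sysctl picks something else or "bic" is unregistered.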
+How the new TCP output machine [nyi] works.
+===========================================
 
 Data is kept on a single queue. The skb->users flag tells us if the frame is
 one that has been queued already. To add a frame we throw it on the end. Ack