linux-kernel-test/net/ipv4
Eric Dumazet 9b462d02d6 tcp: TCP Small Queues and strange attractors
TCP Small queues tries to keep number of packets in qdisc
as small as possible, and depends on a tasklet to feed following
packets at TX completion time.
Choice of tasklet was driven by latencies requirements.

Then, TCP stack tries to avoid reorders, by locking flows with
outstanding packets in qdisc in a given TX queue.

What can happen is that many flows get attracted by a low performing
TX queue, and cpu servicing TX completion has to feed packets for all of
them, making this cpu 100% busy in softirq mode.

This became particularly visible with latest skb->xmit_more support

Strategy adopted in this patch is to detect when tcp_wfree() is called
from ksoftirqd and let the outstanding queue for this flow being drained
before feeding additional packets, so that skb->ooo_okay can be set
to allow select_queue() to select the optimal queue :

Incoming ACKS are normally handled by different cpus, so this patch
gives more chance for these cpus to take over the burden of feeding
qdisc with future packets.

Tested:

lpaa23:~# ./super_netperf 1400 --google-pacing-rate 3028000 -H lpaa24 -l 3600 &

lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:18 AM      eth1 595448.00 1190564.00  38381.09 1760253.12      0.00      0.00      1.00
06:16:19 AM      eth1 594858.00 1189686.00  38340.76 1758952.72      0.00      0.00      0.00
06:16:20 AM      eth1 597017.00 1194019.00  38480.79 1765370.29      0.00      0.00      1.00
06:16:21 AM      eth1 595450.00 1190936.00  38380.19 1760805.05      0.00      0.00      0.00
06:16:22 AM      eth1 596385.00 1193096.00  38442.56 1763976.29      0.00      0.00      1.00
06:16:23 AM      eth1 598155.00 1195978.00  38552.97 1768264.60      0.00      0.00      0.00
06:16:24 AM      eth1 594405.00 1188643.00  38312.57 1757414.89      0.00      0.00      1.00
06:16:25 AM      eth1 593366.00 1187154.00  38252.16 1755195.83      0.00      0.00      0.00
06:16:26 AM      eth1 593188.00 1186118.00  38232.88 1753682.57      0.00      0.00      1.00
06:16:27 AM      eth1 596301.00 1192241.00  38440.94 1762733.09      0.00      0.00      0.00
Average:         eth1 595457.30 1190843.50  38381.69 1760664.84      0.00      0.00      0.50
lpaa23:~# ./tc -s -d qd sh dev eth1 | grep backlog
 backlog 7606336b 2513p requeues 167982
 backlog 224072b 74p requeues 566
 backlog 581376b 192p requeues 5598
 backlog 181680b 60p requeues 1070
 backlog 5305056b 1753p requeues 110166    // Here, this TX queue is attracting flows
 backlog 157456b 52p requeues 1758
 backlog 672216b 222p requeues 3025
 backlog 60560b 20p requeues 24541
 backlog 448144b 148p requeues 21258

lpaa23:~# echo 1 >/proc/sys/net/ipv4/tcp_tsq_enable_tcp_wfree_ksoftirqd_detect

Immediate jump to full bandwidth, and traffic is properly
shard on all tx queues.

lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:46 AM      eth1 1397632.00 2795397.00  90081.87 4133031.26      0.00      0.00      1.00
06:16:47 AM      eth1 1396874.00 2793614.00  90032.99 4130385.46      0.00      0.00      0.00
06:16:48 AM      eth1 1395842.00 2791600.00  89966.46 4127409.67      0.00      0.00      1.00
06:16:49 AM      eth1 1395528.00 2791017.00  89946.17 4126551.24      0.00      0.00      0.00
06:16:50 AM      eth1 1397891.00 2795716.00  90098.74 4133497.39      0.00      0.00      1.00
06:16:51 AM      eth1 1394951.00 2789984.00  89908.96 4125022.51      0.00      0.00      0.00
06:16:52 AM      eth1 1394608.00 2789190.00  89886.90 4123851.36      0.00      0.00      1.00
06:16:53 AM      eth1 1395314.00 2790653.00  89934.33 4125983.09      0.00      0.00      0.00
06:16:54 AM      eth1 1396115.00 2792276.00  89984.25 4128411.21      0.00      0.00      1.00
06:16:55 AM      eth1 1396829.00 2793523.00  90030.19 4130250.28      0.00      0.00      0.00
Average:         eth1 1396158.40 2792297.00  89987.09 4128439.35      0.00      0.00      0.50

lpaa23:~# tc -s -d qd sh dev eth1 | grep backlog
 backlog 7900052b 2609p requeues 173287
 backlog 878120b 290p requeues 589
 backlog 1068884b 354p requeues 5621
 backlog 996212b 329p requeues 1088
 backlog 984100b 325p requeues 115316
 backlog 956848b 316p requeues 1781
 backlog 1080996b 357p requeues 3047
 backlog 975016b 322p requeues 24571
 backlog 990156b 327p requeues 21274

(All 8 TX queues get a fair share of the traffic)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 17:16:26 -04:00
..
netfilter netfilter: nft_masq: register/unregister notifiers on module init/exit 2014-10-03 14:24:35 +02:00
af_inet.c ipv4: mentions skb_gro_postpull_rcsum() in inet_gro_receive() 2014-10-01 13:44:05 -04:00
ah4.c ipsec: Remove obsolete MAX_AH_AUTH_LEN 2014-09-18 10:54:36 +02:00
arp.c arp: Do not perturb drop profiles with ignored ARP packets 2014-09-28 17:30:35 -04:00
cipso_ipv4.c cipso: add __init to cipso_v4_cache_init 2014-10-01 15:46:20 -04:00
datagram.c net: Save TX flow hash in sock and set in skbuf on xmit 2014-07-07 21:14:21 -07:00
devinet.c ipv4: fail early when creating netdev named all or default 2014-07-29 11:43:50 -07:00
esp4.c
fib_frontend.c ipv4: Restore accept_local behaviour in fib_validate_source() 2014-08-22 12:23:10 -07:00
fib_lookup.h
fib_rules.c
fib_semantics.c ipv4: fix nexthop attlen check in fib_nh_match 2014-10-14 15:59:37 -04:00
fib_trie.c list: fix order of arguments for hlist_add_after(_rcu) 2014-08-06 18:01:24 -07:00
fou.c gue: Receive side for Generic UDP Encapsulation 2014-10-03 16:53:33 -07:00
geneve.c net: fix a sparse warning 2014-10-07 00:10:47 -04:00
gre_demux.c net: Fix GRE RX to use skb_transport_header for GRE header offset 2014-09-08 15:23:05 -07:00
gre_offload.c net: Remove gso_send_check as an offload callback 2014-09-26 00:22:47 -04:00
icmp.c icmp: add a global rate limitation 2014-09-23 12:47:38 -04:00
igmp.c ipv4: igmp: fix v3 general query drop monitor false positive 2014-10-06 17:14:54 -04:00
inet_connection_sock.c ipv4: make ip_local_reserved_ports per netns 2014-05-14 15:31:45 -04:00
inet_diag.c
inet_fragment.c inet: frags: use kmem_cache for inet_frag_queue 2014-08-02 15:31:31 -07:00
inet_hashtables.c net: use reciprocal_scale() helper 2014-08-23 12:21:21 -07:00
inet_lro.c
inet_timewait_sock.c
inetpeer.c inet: remove dead inetpeer sequence code 2014-09-08 16:42:42 -07:00
ip_forward.c net: rename local_df to ignore_df 2014-05-12 14:03:41 -04:00
ip_fragment.c inet: frags: add __init to ip4_frags_ctl_register 2014-10-01 15:46:19 -04:00
ip_gre.c net: better IFF_XMIT_DST_RELEASE support 2014-10-07 13:22:11 -04:00
ip_input.c
ip_options.c ipv4: rename ip_options_echo to __ip_options_echo() 2014-09-28 16:35:42 -04:00
ip_output.c netfilter: use IS_ENABLED(CONFIG_BRIDGE_NETFILTER) 2014-10-02 18:30:54 +02:00
ip_sockglue.c ipv4: rcu cleanup in ip_ra_control() 2014-09-09 20:10:44 -07:00
ip_tunnel_core.c net: Support for multiple checksums with gso 2014-06-04 22:46:38 -07:00
ip_tunnel.c ip_tunnel: Add GUE support 2014-10-03 16:53:33 -07:00
ip_vti.c net: better IFF_XMIT_DST_RELEASE support 2014-10-07 13:22:11 -04:00
ipcomp.c ipcomp4: Use the IPsec protocol multiplexer API 2014-02-25 07:04:17 +01:00
ipconfig.c ipconfig: Use time_before 2014-08-22 12:23:11 -07:00
ipip.c net: better IFF_XMIT_DST_RELEASE support 2014-10-07 13:22:11 -04:00
ipmr.c net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
Kconfig openvswitch: fix a compilation error when CONFIG_INET is not setW! 2014-10-07 00:10:49 -04:00
Makefile net: Add Geneve tunneling protocol driver 2014-10-06 00:32:20 -04:00
netfilter.c
ping.c net/ipv4: bind ip_nonlocal_bind to current netns 2014-09-09 11:27:09 -07:00
proc.c inet: frag: don't account number of fragment queues 2014-07-27 22:34:36 -07:00
protocol.c net: Export inet_offloads and inet6_offloads 2014-09-19 17:15:31 -04:00
raw.c ipv4: Make IP_MULTICAST_ALL and IP_MSFILTER work on raw sockets 2014-07-23 15:13:26 -07:00
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-10-02 11:25:43 -07:00
syncookies.c tcp: syncookies: mark cookie_secret read_mostly 2014-08-27 16:30:49 -07:00
sysctl_net_ipv4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-10-08 21:40:54 -04:00
tcp_bic.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_cong.c tcp: Change tcp_slow_start function to return void 2014-09-30 17:09:16 -04:00
tcp_cubic.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_dctcp.c net: tcp: add DCTCP congestion control algorithm 2014-09-29 00:13:10 -04:00
tcp_diag.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_fastopen.c tcp: remove unnecessary assignment. 2014-09-29 12:31:12 -04:00
tcp_highspeed.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_htcp.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_hybla.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_illinois.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_input.c tcp: fix tcp_ack() performance problem 2014-10-14 15:59:37 -04:00
tcp_ipv4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-10-08 21:40:54 -04:00
tcp_lp.c tcp: remove in_flight parameter from cong_avoid() methods 2014-05-03 19:23:07 -04:00
tcp_memcontrol.c cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes() 2014-07-15 11:05:09 -04:00
tcp_metrics.c tcp: don't allow syn packets without timestamps to pass tcp_tw_recycle logic 2014-08-14 14:38:54 -07:00
tcp_minisocks.c tcp: change TCP_ECN prefixes to lower case 2014-09-29 14:41:22 -04:00
tcp_offload.c net: Remove gso_send_check as an offload callback 2014-09-26 00:22:47 -04:00
tcp_output.c tcp: TCP Small Queues and strange attractors 2014-10-14 17:16:26 -04:00
tcp_probe.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_scalable.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_timer.c tcp: abort orphan sockets stalling on zero window probes 2014-10-01 16:27:52 -04:00
tcp_vegas.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_vegas.h
tcp_veno.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp_westwood.c net: tcp: split ack slow/fast events from cwnd_event 2014-09-29 00:13:10 -04:00
tcp_yeah.c tcp: whitespace fixes 2014-09-01 18:12:45 -07:00
tcp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-10-08 21:40:54 -04:00
tunnel4.c
udp_diag.c
udp_impl.h
udp_offload.c fou: eliminate IPv4,v6 specific GRO functions 2014-10-03 16:53:32 -07:00
udp_tunnel.c udp-tunnel: Add a few more UDP tunnel APIs 2014-09-19 15:57:15 -04:00
udp.c net: merge cases where sock_efree and sock_edemux are the same function 2014-09-05 17:43:45 -07:00
udplite.c net: Eliminate no_check from protosw 2014-05-23 16:28:53 -04:00
xfrm4_input.c
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c inetpeer: get rid of ip_id_count 2014-06-02 11:00:41 -07:00
xfrm4_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-05-24 00:32:30 -04:00
xfrm4_policy.c xfrm: Introduce xfrm_input_afinfo to access the the callbacks properly 2014-03-14 07:28:07 +01:00
xfrm4_protocol.c xfrm4: Remove duplicate semicolon 2014-06-30 07:49:47 +02:00
xfrm4_state.c
xfrm4_tunnel.c