linux-kernel-test

Author	SHA1	Message	Date
David S. Miller	0c1072ae02	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/freescale/fec_main.c drivers/net/ethernet/renesas/sh_eth.c net/ipv4/gre.c The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list) and the splitting of the gre.c code into seperate files. The FEC conflict was two sets of changes adding ethtool support code in an "!CONFIG_M5272" CPP protected block. Finally the sh_eth.c conflict was between one commit add bits set in the .eesr_err_check mask whilst another commit removed the .tx_error_check member and assignments. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-03 14:55:13 -07:00
Nicolas Dichtel	621e84d6f3	dev: introduce skb_scrub_packet() The goal of this new function is to perform all needed cleanup before sending an skb into another netns. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-27 22:29:05 -07:00
Eric Dumazet	bd8a7036c0	gre: fix a possible skb leak commit `68c3316311` ("v4 GRE: Add TCP segmentation offload for GRE") added a possible skb leak, because it frees only the head of segment list, in case a skb_linearize() call fails. This patch adds a kfree_skb_list() helper to fix the bug. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pravin B Shelar <pshelar@nicira.com> Cc: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-25 16:07:44 -07:00
Cong Wang	30f3a40f9a	net: remove last caller of skb_tail_offset() and itself Similar to the following commits: commit `00f97da17a` (netpoll: fix position of network header) commit `525cebedb3` (pktgen: Fix position of ip and udp header) using skb_tail_offset() seems not correct since the offset is based on head pointer. With the last caller removed, skb_tail_offset() can be killed finally. Cc: Thomas Graf <tgraf@suug.ch> Cc: Daniel Borkmann <dborkmann@redhat.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-10 22:22:23 -07:00
Eliezer Tamir	0602129286	net: add low latency socket poll Adds an ndo_ll_poll method and the code that supports it. This method can be used by low latency applications to busy-poll Ethernet device queues directly from the socket code. sysctl_net_ll_poll controls how many microseconds to poll. Default is zero (disabled). Individual protocol support will be added by subsequent patches. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Tested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-10 21:22:35 -07:00
David S. Miller	6bc19fb82d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Merge 'net' bug fixes into 'net-next' as we have patches that will build on top of them. This merge commit includes a change from Emil Goode (emilgoode@gmail.com) that fixes a warning that would have been introduced by this merge. Specifically it fixes the pingv6_ops method ipv6_chk_addr() to add a "const" to the "struct net_device *dev" argument and likewise update the dummy_ipv6_chk_addr() declaration. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-05 16:37:30 -07:00
Pravin B Shelar	1e2bd517c1	udp6: Fix udp fragmentation for tunnel traffic. udp6 over GRE tunnel does not work after to GRE tso changes. GRE tso handler passes inner packet but keeps track of outer header start in SKB_GSO_CB(skb)->mac_offset. udp6 fragment need to take care of outer header, which start at the mac_offset, while adding fragment header. This bug is introduced by commit `68c3316311` (GRE: Add TCP segmentation offload for GRE). Reported-by: Dmitry Kravkov <dkravkov@gmail.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Tested-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-31 17:06:07 -07:00
Cong Wang	35d0461061	net: clean up skb headers code commit `1a37e412a0` (net: Use 16bits for _headers fields of struct skbuff) converts skb->_header to u16, some #if NET_SKBUFF_DATA_USES_OFFSET are now useless, and to be safe, we could just use "X = (typeof(X)) ~0U;" as suggested by David. Cc: David S. Miller <davem@davemloft.net> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-31 17:02:47 -07:00
Simon Horman	7cc4619005	net, ipv4, ipv6: Correct assignment of skb->network_header to skb->tail This corrects an regression introduced by "net: Use 16bits for *_headers fields of struct skbuff" when NET_SKBUFF_DATA_USES_OFFSET is not set. In that case skb->tail will be a pointer however skb->network_header is now an offset. This patch corrects the problem by adding a wrapper to return skb tail as an offset regardless of the value of NET_SKBUFF_DATA_USES_OFFSET. It seems that skb->tail that this offset may be more than 64k and some care has been taken to treat such cases as an error. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-28 23:49:07 -07:00
Simon Horman	0d89d2035f	MPLS: Add limited GSO support In the case where a non-MPLS packet is received and an MPLS stack is added it may well be the case that the original skb is GSO but the NIC used for transmit does not support GSO of MPLS packets. The aim of this code is to provide GSO in software for MPLS packets whose skbs are GSO. SKB Usage: When an implementation adds an MPLS stack to a non-MPLS packet it should do the following to skb metadata: * Set skb->inner_protocol to the old non-MPLS ethertype of the packet. skb->inner_protocol is added by this patch. * Set skb->protocol to the new MPLS ethertype of the packet. * Set skb->network_header to correspond to the end of the L3 header, including the MPLS label stack. I have posted a patch, "[PATCH v3.29] datapath: Add basic MPLS support to kernel" which adds MPLS support to the kernel datapath of Open vSwtich. That patch sets the above requirements in datapath/actions.c:push_mpls() and was used to exercise this code. The datapath patch is against the Open vSwtich tree but it is intended that it be added to the Open vSwtich code present in the mainline Linux kernel at some point. Features: I believe that the approach that I have taken is at least partially consistent with the handling of other protocols. Jesse, I understand that you have some ideas here. I am more than happy to change my implementation. This patch adds dev->mpls_features which may be used by devices to advertise features supported for MPLS packets. A new NETIF_F_MPLS_GSO feature is added for devices which support hardware MPLS GSO offload. Currently no devices support this and MPLS GSO always falls back to software. Alternate Implementation: One possible alternate implementation is to teach netif_skb_features() and skb_network_protocol() about MPLS, in a similar way to their understanding of VLANs. I believe this would avoid the need for net/mpls/mpls_gso.c and in particular the calls to __skb_push() and __skb_push() in mpls_gso_segment(). I have decided on the implementation in this patch as it should not introduce any overhead in the case where mpls_gso is not compiled into the kernel or inserted as a module. MPLS GSO suggested by Jesse Gross. Based in part on "v4 GRE: Add TCP segmentation offload for GRE" by Pravin B Shelar. Cc: Jesse Gross <jesse@nicira.com> Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-27 22:50:59 -07:00
Simon Horman	1a37e412a0	net: Use 16bits for _headers fields of struct skbuff In order to mitigate ongoing incresase in the size of struct skbuff use 16 bit integer offsets rather than pointers for inner__headers. This appears to reduce the size of struct skbuff from 0xd0 to 0xc0 bytes on x86_64 with the following all unset. CONFIG_XFRM CONFIG_NF_CONNTRACK CONFIG_NF_CONNTRACK_MODULE NET_SKBUFF_NF_DEFRAG_NEEDED CONFIG_BRIDGE_NETFILTER CONFIG_NET_SCHED CONFIG_IPV6_NDISC_NODETYPE CONFIG_NET_DMA CONFIG_NETWORK_SECMARK Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-27 22:50:59 -07:00
Patrick McHardy	0ebd0ac5ff	net: add function to allocate sk_buff head without data area Add a function to allocate a sk_buff head without any data. This will be used by memory mapped netlink to attach data from the mmaped area to the skb. Additionally change skb_release_all() to check whether the skb has a data area to allow the skb destructor to clear the data pointer in case only a head has been allocated. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:57:57 -04:00
Patrick McHardy	86a9bad3ab	net: vlan: add protocol argument to packet tagging functions Add a protocol argument to the VLAN packet tagging functions. In case of HW tagging, we need that protocol available in the ndo_start_xmit functions, so it is stored in a new field in the skb. The new field fits into a hole (on 64 bit) and doesn't increase the sks's size. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:46:06 -04:00
David S. Miller	d978a6361a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/nfc/microread/mei.c net/netfilter/nfnetlink_queue_core.c Pull in 'net' to get Eric Biederman's AF_UNIX fix, upon which some cleanups are going to go on-top. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-07 18:37:01 -04:00
Patrick McHardy	124dff01af	netfilter: don't reset nf_trace in nf_reset() Commit `130549fe` ("netfilter: reset nf_trace in nf_reset") added code to reset nf_trace in nf_reset(). This is wrong and unnecessary. nf_reset() is used in the following cases: - when passing packets up the the socket layer, at which point we want to release all netfilter references that might keep modules pinned while the packet is queued. nf_trace doesn't matter anymore at this point. - when encapsulating or decapsulating IPsec packets. We want to continue tracing these packets after IPsec processing. - when passing packets through virtual network devices. Only devices on that encapsulate in IPv4/v6 matter since otherwise nf_trace is not used anymore. Its not entirely clear whether those packets should be traced after that, however we've always done that. - when passing packets through virtual network devices that make the packet cross network namespace boundaries. This is the only cases where we clearly want to reset nf_trace and is also what the original patch intended to fix. Add a new function nf_reset_trace() and use it in dev_forward_skb() to fix this properly. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-05 15:38:10 -04:00
Julian Anastasov	932bc4d7a5	net: add skb_dst_set_noref_force Rename skb_dst_set_noref to __skb_dst_set_noref and add force flag as suggested by David Miller. The new wrapper skb_dst_set_noref_force will force dst entries that are not cached to be attached as skb dst without taking reference as long as provided dst is reclaimed after RCU grace period. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off by: Hans Schillstrom <hans@schillstrom.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-04-02 00:22:53 +02:00
Ying Xue	fbbdb8f096	net: fix compile error of implicit declaration of skb_probe_transport_header The commit 40893fd(net: switch to use skb_probe_transport_header()) involes a new error accidently. When NET_SKBUFF_DATA_USES_OFFSE is not enabled, below compile error happens: CC net/packet/af_packet.o net/packet/af_packet.c: In function ‘packet_sendmsg_spkt’: net/packet/af_packet.c:1516:2: error: implicit declaration of function ‘skb_probe_transport_header’ [-Werror=implicit-function-declaration] cc1: some warnings being treated as errors make[2]: * [net/packet/af_packet.o] Error 1 make[1]: * [net/packet] Error 2 make: *** [net] Error 2 As it seems skb_probe_transport_header() is not related to NET_SKBUFF_DATA_USES_OFFSE, we should move the definition of skb_probe_transport_header() out of scope of NET_SKBUFF_DATA_USES_OFFSE macro. Cc: Jason Wang <jasowang@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-27 23:21:58 -04:00
Jason Wang	5203cd28db	net: core: introduce skb_probe_transport_header() Sometimes, we need probe and set the transport header for packets (e.g from untrusted source). This patch introduces a new helper skb_probe_transport_header() which tries to probe and set the l4 header through skb_flow_dissect(), if not just set the transport header to the hint passed by caller. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-27 12:48:31 -04:00
Gao feng	130549fed8	netfilter: reset nf_trace in nf_reset We forgot to clear the nf_trace of sk_buff in nf_reset, When we use veth device, this nf_trace information will be leaked from one net namespace to another net namespace. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-03-25 14:21:23 +01:00
Daniel Borkmann	f77668dc25	net: flow_dissector: add __skb_get_poff to get a start offset to payload __skb_get_poff() returns the offset to the payload as far as it could be dissected. The main user is currently BPF, so that we can dynamically truncate packets without needing to push actual payload to the user space and instead can analyze headers only. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 13:15:45 -04:00
David S. Miller	61816596d1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull in the 'net' tree to get Daniel Borkmann's flow dissector infrastructure change. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:46:26 -04:00
Pavel Emelyanov	cca7af3889	skb: Propagate pfmemalloc on skb from head page only Hi. I'm trying to send big chunks of memory from application address space via TCP socket using vmsplice + splice like this mem = mmap(128Mb); vmsplice(pipe[1], mem); /* splice memory into pipe / splice(pipe[0], tcp_socket); / send it into network / When I'm lucky and a huge page splices into the pipe and then into the socket _and_ client and server ends of the TCP connection are on the same host, communicating via lo, the whole connection gets stuck! The sending queue becomes full and app stops writing/splicing more into it, but the receiving queue remains empty, and that's why. The __skb_fill_page_desc observes a tail page of a huge page and erroneously propagates its page->pfmemalloc value onto socket (the pfmemalloc on tail pages contain garbage). Then this skb->pfmemalloc leaks through lo and due to the tcp_v4_rcv sk_filter if (skb->pfmemalloc && !sock_flag(sk, SOCK_MEMALLOC)) / true */ return -ENOMEM goto release_and_discard; no packets reach the socket. Even TCP re-transmits are dropped by this, as skb cloning clones the pfmemalloc flag as well. That said, here's the proper page->pfmemalloc propagation onto socket: we must check the huge-page's head page only, other pages' pfmemalloc and mapping values do not contain what is expected in this place. However, I'm not sure whether this fix is _complete_, since pfmemalloc propagation via lo also oesn't look great. Both, bit propagation from page to skb and this check in sk_filter, were introduced by `c48a11c7` (netvm: propagate page->pfmemalloc to skb), in v3.5 so Mel and stable@ are in Cc. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-14 11:53:32 -04:00
Eric Dumazet	16fad69cfe	tcp: fix skb_availroom() Chrome OS team reported a crash on a Pixel ChromeBook in TCP stack : https://code.google.com/p/chromium/issues/detail?id=182056 commit `a21d45726a` (tcp: avoid order-1 allocations on wifi and tx path) did a poor choice adding an 'avail_size' field to skb, while what we really needed was a 'reserved_tailroom' one. It would have avoided commit `22b4a4f22d` (tcp: fix retransmit of partially acked frames) and this commit. Crash occurs because skb_split() is not aware of the 'avail_size' management (and should not be aware) Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Mukesh Agrawal <quiche@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-14 11:49:45 -04:00
Pravin B Shelar	7313626745	tunneling: Add generic Tunnel segmentation. Adds generic tunneling offloading support for IPv4-UDP based tunnels. GSO type is added to request this offload for a skb. netdev feature NETIF_F_UDP_TUNNEL is added for hardware offloaded udp-tunnel support. Currently no device supports this feature, software offload is used. This can be used by tunneling protocols like VXLAN. CC: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-09 16:09:17 -05:00
Pravin B Shelar	aefbd2b3c2	tunneling: Capture inner mac header during encapsulation. This patch adds inner mac header. This will be used in next patch to find tunner header length. Header len is required to copy tunnel header to each gso segment. This patch does not change any functionality. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-09 16:08:57 -05:00
Pravin B Shelar	68c3316311	v4 GRE: Add TCP segmentation offload for GRE Following patch adds GRE protocol offload handler so that skb_gso_segment() can segment GRE packets. SKB GSO CB is added to keep track of total header length so that skb_segment can push entire header. e.g. in case of GRE, skb_segment need to push inner and outer headers to every segment. New NETIF_F_GRE_GSO feature is added for devices which support HW GRE TSO offload. Currently none of devices support it therefore GRE GSO always fall backs to software GSO. [ Compute pkt_len before ip_local_out() invocation. -DaveM ] Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-15 15:17:11 -05:00
Pravin B Shelar	14bbd6a565	net: Add skb_unclone() helper function. This function will be used in next GRE_GSO patch. This patch does not change any functionality. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com>	2013-02-15 15:10:37 -05:00
Pravin B Shelar	c9af6db4c1	net: Fix possible wrong checksum generation. Patch `cef401de7b` (net: fix possible wrong checksum generation) fixed wrong checksum calculation but it broke TSO by defining new GSO type but not a netdev feature for that type. net_gso_ok() would not allow hardware checksum/segmentation offload of such packets without the feature. Following patch fixes TSO and wrong checksum. This patch uses same logic that Eric Dumazet used. Patch introduces new flag SKBTX_SHARED_FRAG if at least one frag can be modified by the user. but SKBTX_SHARED_FRAG flag is kept in skb shared info tx_flags rather than gso_type. tx_flags is better compared to gso_type since we can have skb with shared frag without gso packet. It does not link SHARED_FRAG to GSO, So there is no need to define netdev feature for this. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-13 13:30:10 -05:00
Alexander Duyck	e5e6730588	skbuff: Move definition of NETDEV_FRAG_PAGE_MAX_SIZE In order to address the fact that some devices cannot support the full 32K frag size we need to have the value accessible somewhere so that we can use it to do comparisons against what the device can support. As such I am moving the values out of skbuff.c and into skbuff.h. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-08 17:33:35 -05:00
Eric Dumazet	cef401de7b	net: fix possible wrong checksum generation Pravin Shelar mentioned that GSO could potentially generate wrong TX checksum if skb has fragments that are overwritten by the user between the checksum computation and transmit. He suggested to linearize skbs but this extra copy can be avoided for normal tcp skbs cooked by tcp_sendmsg(). This patch introduces a new SKB_GSO_SHARED_FRAG flag, set in skb_shinfo(skb)->gso_type if at least one frag can be modified by the user. Typical sources of such possible overwrites are {vm}splice(), sendfile(), and macvtap/tun/virtio_net drivers. Tested: $ netperf -H 7.7.8.84 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3959.52 $ netperf -H 7.7.8.84 -t TCP_SENDFILE TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3216.80 Performance of the SENDFILE is impacted by the extra allocation and copy, and because we use order-0 pages, while the TCP_STREAM uses bigger pages. Reported-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-28 00:27:15 -05:00
Eric Dumazet	fda55eca5a	net: introduce skb_transport_header_was_set() We have skb_mac_header_was_set() helper to tell if mac_header was set on a skb. We would like the same for transport_header. __netif_receive_skb() doesn't reset the transport header if already set by GRO layer. Note that network stacks usually reset the transport header anyway, after pulling the network header, so this change only allows a followup patch to have more precise qdisc pkt_len computation for GSO packets at ingress side. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-08 17:51:54 -08:00
Joseph Gasparakis	6a674e9c75	net: Add support for hardware-offloaded encapsulation This patch adds support in the kernel for offloading in the NIC Tx and Rx checksumming for encapsulated packets (such as VXLAN and IP GRE). For Tx encapsulation offload, the driver will need to set the right bits in netdev->hw_enc_features. The protocol driver will have to set the skb->encapsulation bit and populate the inner headers, so the NIC driver will use those inner headers to calculate the csum in hardware. For Rx encapsulation offload, the driver will need to set again the skb->encapsulation flag and the skb->ip_csum to CHECKSUM_UNNECESSARY. In that case the protocol driver should push the decapsulated packet up to the stack, again with CHECKSUM_UNNECESSARY. In ether case, the protocol driver should set the skb->encapsulation flag back to zero. Finally the protocol driver should have NETIF_F_RXCSUM flag set in its features. Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com> Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-09 00:20:28 -05:00
Michael S. Tsirkin	25121173f7	skb: api to report errors for zero copy skbs Orphaning frags for zero copy skbs needs to allocate data in atomic context so is has a chance to fail. If it does we currently discard the skb which is safe, but we don't report anything to the caller, so it can not recover by e.g. disabling zero copy. Add an API to free skb reporting such errors: this is used by tun in case orphaning frags fails. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:29:57 -04:00
Michael S. Tsirkin	e19d6763cc	skb: report completion status for zero copy skbs Even if skb is marked for zero copy, net core might still decide to copy it later which is somewhat slower than a copy in user context: besides copying the data we need to pin/unpin the pages. Add a parameter reporting such cases through zero copy callback: if this happens a lot, device can take this into account and switch to copying in user context. This patch updates all users but ignores the passed value for now: it will be used by follow-up patches. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:29:57 -04:00
Willem de Bruijn	ecd5cf5dc5	net: compute skb->rxhash if nic hash may be 3-tuple Network device drivers can communicate a Toeplitz hash in skb->rxhash, but devices differ in their hashing capabilities. All compute a 5-tuple hash for TCP over IPv4, but for other connection-oriented protocols, they may compute only a 3-tuple. This breaks RPS load balancing, e.g., for TCP over IPv6 flows. Additionally, for GRE and other tunnels, the kernel computes a 5-tuple hash over the inner packet if possible, but devices do not. This patch recomputes the rxhash in software in all cases where it cannot be certain that a 5-tuple was computed. Device drivers can avoid recomputation by setting the skb->l4_rxhash flag. Recomputing adds cycles to each packet when RPS is enabled or the packet arrives over a tunnel. A comparison of 200x TCP_STREAM between two servers running unmodified netnext with rxhash computation in hardware vs software (using ethtool -K eth0 rxhash [on\|off]) shows how much time is spent in __skb_get_rxhash in this worst case: 0.03% swapper [kernel.kallsyms] [k] __skb_get_rxhash 0.03% swapper [kernel.kallsyms] [k] __skb_get_rxhash 0.05% swapper [kernel.kallsyms] [k] __skb_get_rxhash With 200x TCP_RR it increases to 0.10% netperf [kernel.kallsyms] [k] __skb_get_rxhash 0.10% netperf [kernel.kallsyms] [k] __skb_get_rxhash 0.10% netperf [kernel.kallsyms] [k] __skb_get_rxhash I considered having the patch explicitly skips recomputation when it knows that it will not improve the hash (TCP over IPv4), but that conditional complicates code without saving many cycles in practice, because it has to take place after flow dissector. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-31 13:56:40 -04:00
Eric Dumazet	acb600def2	net: remove skb recycling Over time, skb recycling infrastructure got litle interest and many bugs. Generic rx path skb allocation is now using page fragments for efficient GRO / TCP coalescing, and recyling a tx skb for rx path is not worth the pain. Last identified bug is that fat skbs can be recycled and it can endup using high order pages after few iterations. With help from Maxime Bizon, who pointed out that commit `87151b8689` (net: allow pskb_expand_head() to get maximum tailroom) introduced this regression for recycled skbs. Instead of fixing this bug, lets remove skb recycling. Drivers wanting really hot skbs should use build_skb() anyway, to allocate/populate sk_buff right before netif_receive_skb() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-07 00:40:54 -04:00
Eric Dumazet	47061bc440	net: skb_share_check() should use consume_skb() In order to avoid false drop_monitor indications, we should call consume_skb() if skb_clone() was successful. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-04 01:27:57 -07:00
Mel Gorman	0614002bb5	netvm: propagate page->pfmemalloc from skb_alloc_page to skb The skb->pfmemalloc flag gets set to true iff during the slab allocation of data in __alloc_skb that the the PFMEMALLOC reserves were used. If page splitting is used, it is possible that pages will be allocated from the PFMEMALLOC reserve without propagating this information to the skb. This patch propagates page->pfmemalloc from pages allocated for fragments to the skb. It works by reintroducing and expanding the skb_alloc_page() API to take an skb. If the page was allocated from pfmemalloc reserves, it is automatically copied. If the driver allocates the page before the skb, it should call skb_propagate_pfmemalloc() after the skb is allocated to ensure the flag is copied properly. Failure to do so is not critical. The resulting driver may perform slower if it is used for swap-over-NBD or swap-over-NFS but it should not result in failure. [davem@davemloft.net: API rename and consistency] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: David S. Miller <davem@davemloft.net> Cc: Neil Brown <neilb@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Mel Gorman <mgorman@suse.de> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:46 -07:00
Mel Gorman	c48a11c7ad	netvm: propagate page->pfmemalloc to skb The skb->pfmemalloc flag gets set to true iff during the slab allocation of data in __alloc_skb that the the PFMEMALLOC reserves were used. If the packet is fragmented, it is possible that pages will be allocated from the PFMEMALLOC reserve without propagating this information to the skb. This patch propagates page->pfmemalloc from pages allocated for fragments to the skb. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: David S. Miller <davem@davemloft.net> Cc: Neil Brown <neilb@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Mel Gorman <mgorman@suse.de> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:46 -07:00
Mel Gorman	c93bdd0e03	netvm: allow skb allocation to use PFMEMALLOC reserves Change the skb allocation API to indicate RX usage and use this to fall back to the PFMEMALLOC reserve when needed. SKBs allocated from the reserve are tagged in skb->pfmemalloc. If an SKB is allocated from the reserve and the socket is later found to be unrelated to page reclaim, the packet is dropped so that the memory remains available for page reclaim. Network protocols are expected to recover from this packet loss. [a.p.zijlstra@chello.nl: Ideas taken from various patches] [davem@davemloft.net: Use static branches, coding style corrections] [sebastian@breakpoint.cc: Avoid unnecessary cast, fix !CONFIG_NET build] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: David S. Miller <davem@davemloft.net> Cc: Neil Brown <neilb@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Mel Gorman <mgorman@suse.de> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:46 -07:00
Michael S. Tsirkin	a353e0ce0f	skbuff: add an api to orphan frags Many places do if ((skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY)) skb_copy_ubufs(skb, gfp_mask); to copy and invoke frag destructors if necessary. Add an inline helper for this. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:39:33 -07:00
Eric Dumazet	62b1a8ab9b	net: remove skb_orphan_try() Orphaning skb in dev_hard_start_xmit() makes bonding behavior unfriendly for applications sending big UDP bursts : Once packets pass the bonding device and come to real device, they might hit a full qdisc and be dropped. Without orphaning, the sender is automatically throttled because sk->sk_wmemalloc reaches sk->sk_sndbuf (assuming sk_sndbuf is not too big) We could try to defer the orphaning adding another test in dev_hard_start_xmit(), but all this seems of little gain, now that BQL tends to make packets more likely to be parked in Qdisc queues instead of NIC TX ring, in cases where performance matters. Reverts commits : `fc6055a5ba` net: Introduce skb_orphan_try() `87fd308cfc` net: skb_tx_hash() fix relative to skb_orphan_try() and removes SKBTX_DRV_NEEDS_SK_REF flag Reported-and-bisected-by: Jean-Michel Hautbois <jhautbois@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-15 15:30:15 -07:00
Felix Fietkau	617c8c1123	skb: avoid unnecessary reallocations in __skb_cow At the beginning of __skb_cow, headroom gets set to a minimum of NET_SKB_PAD. This causes unnecessary reallocations if the buffer was not cloned and the headroom is just below NET_SKB_PAD, but still more than the amount requested by the caller. This was showing up frequently in my tests on VLAN tx, where vlan_insert_tag calls skb_cow_head(skb, VLAN_HLEN). Locally generated packets should have enough headroom, and for forward paths, we already have NET_SKB_PAD bytes of headroom, so we don't need to add any extra space here. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-29 17:30:08 -04:00
Eric Dumazet	bad43ca832	net: introduce skb_try_coalesce() Move tcp_try_coalesce() protocol independent part to skb_try_coalesce(). skb_try_coalesce() can be used in IPv4 defrag and IPv6 reassembly, to build optimized skbs (less sk_buff, and possibly less 'headers') skb_try_coalesce() is zero copy, unless the copy can fit in destination header (its a rare case) kfree_skb_partial() is also moved to net/core/skbuff.c and exported, because IPv6 will need it in patch (ipv6: use skb coalescing in reassembly). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-19 18:34:57 -04:00
Eric Dumazet	6f532612cc	net: introduce netdev_alloc_frag() Fix two issues introduced in commit `a1c7fff7e1` ( net: netdev_alloc_skb() use build_skb() ) - Must be IRQ safe (non NAPI drivers can use it) - Must not leak the frag if build_skb() fails to allocate sk_buff This patch introduces netdev_alloc_frag() for drivers willing to use build_skb() instead of __netdev_alloc_skb() variants. Factorize code so that : __dev_alloc_skb() is a wrapper around __netdev_alloc_skb(), and dev_alloc_skb() a wrapper around netdev_alloc_skb() Use __GFP_COLD flag. Almost all network drivers now benefit from skb->head_frag infrastructure. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-18 13:31:25 -04:00
David S. Miller	0d6c4a2e46	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/intel/e1000e/param.c drivers/net/wireless/iwlwifi/iwl-agn-rx.c drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c drivers/net/wireless/iwlwifi/iwl-trans.h Resolved the iwlwifi conflict with mainline using 3-way diff posted by John Linville and Stephen Rothwell. In 'net' we added a bug fix to make iwlwifi report a more accurate skb->truesize but this conflicted with RX path changes that happened meanwhile in net-next. In e1000e a conflict arose in the validation code for settings of adapter->itr. 'net-next' had more sophisticated logic so that logic was used. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-07 23:35:40 -04:00
Alexander Duyck	ec47ea8247	skb: Add inline helper for getting the skb end offset from head With the recent changes for how we compute the skb truesize it occurs to me we are probably going to have a lot of calls to skb_end_pointer - skb->head. Instead of running all over the place doing that it would make more sense to just make it a separate inline skb_end_offset(skb) that way we can return the correct value without having gcc having to do all the optimization to cancel out skb->head - skb->head. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-06 13:13:19 -04:00
Alexander Duyck	3a7c1ee4ab	skb: Add skb_head_is_locked helper function This patch adds support for a skb_head_is_locked helper function. It is meant to be used any time we are considering transferring the head from skb->head to a paged frag. If the head is locked it means we cannot remove the head from the skb so it must be copied or we must take the skb as a whole. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-03 13:18:37 -04:00
Eric Dumazet	d961949660	net: fix two typos in skbuff.h fix kernel doc typos in function names Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-01 09:40:19 -04:00
Eric Dumazet	18d0700024	net: skb_peek()/skb_peek_tail() cleanups remove useless casts and rename variables for less confusion. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-01 09:39:48 -04:00

1 2 3 4 5 ...

374 Commits