Commit Graph

441692 Commits

Author SHA1 Message Date
Tom Herbert
79e0f1c9f2 ipv6: Need to sock_put on csum error
Commit 4068579e1e ("net: Implmement
RFC 6936 (zero RX csums for UDP/IPv6)") introduced zero checksums
being allowed for IPv6, but in the case that a socket disallows a
zero checksum on RX we need to sock_put.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 23:17:16 -04:00
David S. Miller
5a50a92784 Merge branch 'tipc-next'
Ying Xue says:

====================
tipc: purge signal handler infrastructure

When we delay some actions to be executed in asynchronous contexts,
these usually add unnecessary code complexities, and make their
behaviours unpredictable and indeterministic. Moreover, as the signal
handler infrastructure is first stopped when tipc module is removed,
this may cause some potential risks for us. For instance, although
signal handler is already stopped, some tipc components still submit
signal requests to signal handler infrastructure, which may lead to
some resources not to be released or freed correctly.

So the series aims to convert all actions being performed in tasklet
context asynchronously with interface provided by signal handler
infrastructure to be executed synchronously, thereby deleting the
whole infrastructure of signal handler.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 17:26:54 -04:00
Ying Xue
52ff872055 tipc: purge signal handler infrastructure
In the previous commits of this series, we removed all asynchronous
actions which were based on the tasklet handler - "tipc_k_signal()".

So the moment has now come when we can completely remove the tasklet
handler infrastructure. That is done with this commit.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 17:26:45 -04:00
Ying Xue
3f5a12bd9f tipc: avoid to asynchronously reset all links
Postpone the actions of resetting all links until after bclink
lock is released, avoiding to asynchronously reset all links.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 17:26:45 -04:00
Ying Xue
eb8b00f5f2 tipc: convert allocations of global variables associated with bclink
Convert allocations of global variables associated with bclink from
static way to dynamical way for the convenience of bclink instance
initialisation. Meanwhile, this also helps TIPC support name space
in the future easily.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 17:26:45 -04:00
Ying Xue
d69afc90b8 tipc: define new functions to operate bc_lock
As we are going to do more jobs when bc_lock is released, the two
operations of holding/releasing the lock should be encapsulated with
functions. In addition, we move bc_lock spin lock into tipc_bclink
structure avoiding to define the global variable.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 17:26:44 -04:00
Ying Xue
ca0c42732c tipc: avoid to asynchronously deliver name tables to peer node
Postpone the actions of delivering name tables until after node
lock is released, avoiding to do it under asynchronous context.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 17:26:44 -04:00
Ying Xue
9d56194968 tipc: remove TIPC_NAMES_GONE node flag
Since previously what all publications pertaining to the lost node
were removed from name table was finished in tasklet context
asynchronously, we need to TIPC_NAMES_GONE flag indicating whether
the node cleanup work is finished or not. But now as the cleanup work
has been finished when node lock is released, the flag becomes
meaningless for us.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 17:26:44 -04:00
Ying Xue
9db9fdd198 tipc: avoid to asynchronously notify subscriptions
Postpone the actions of notifying subscriptions until after node lock
is released, avoiding to asynchronously execute registered handlers
when node is lost.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 17:26:44 -04:00
Ying Xue
10f465c496 tipc: rename setup_blocked variable of node struct to flags
Rename setup_blocked variable of node struct to a more common
name called "flags", which will be used to represent kinds of
node states.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 17:26:44 -04:00
Ying Xue
486f930ac5 tipc: adjust order of variables in tipc_node structure
Move more frequently used variables up to the head of tipc_node
structure, hopefully improving a bit performance.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 17:26:44 -04:00
Ying Xue
5356f3d7d4 tipc: always use tipc_node_lock() to hold node lock
Although we obtain node lock with tipc_node_lock() in most time, there
are still places where we directly use native spin lock interface
to grab node lock. But as we will do more jobs in the future when node
lock is released, we should ensure that tipc_node_lock() is always
called when node lock is taken.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 17:26:43 -04:00
Alexey Charkov
5b579e212f net: via-rhine: Convert #ifdef USE_MMIO to a runtime flag
This introduces another flag in 'quirks' to replace the preprocessor
define (USE_MMIO) used to indicate whether the device needs a
separate enable routine to operate in MMIO mode.

All of the currently known platform Rhine cores operate in MMIO
mode by default, and on PCI it is preferred over PIO for performance
reasons. However, a comment in code suggests that some (?) early
Rhine cores only work in PIO mode, so they should not be switched
to MMIO.

Enabling MMIO on PCI is still triggered by the same Kconfig option
to avoid breaking user configs needlessly, but this can be changed
going forward towards automatic runtime detection in case a list of
PIO-only Rhine revisions can be compiled.

This also fixes a couple of compiler warnings detected by Fengguang
Wu's test bot (!USE_MMIO case):

   drivers/net/ethernet/via/via-rhine.c: In function 'rhine_init_one_pci':
   drivers/net/ethernet/via/via-rhine.c:1108:1: warning: label 'err_out_unmap' defined but not used [-Wunused-label]
    err_out_unmap:
    ^
   drivers/net/ethernet/via/via-rhine.c:1022:6: warning: unused variable 'i' [-Wunused-variable]
     int i, rc;
         ^
   drivers/net/ethernet/via/via-rhine.c:916:22: warning: 'quirks' may be used uninitialized in this function [-Wmaybe-uninitialized]
     phy_id = rp->quirks & rqIntPHY ? 1 : 0;
                         ^
   drivers/net/ethernet/via/via-rhine.c:1026:6: note: 'quirks' was declared here
     u32 quirks;
         ^

Signed-off-by: Alexey Charkov <alchark@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 15:36:41 -04:00
WANG Cong
07c8e35a38 ipv6: remove unused function ipv6_inherit_linklocal()
It is no longer used after commit e837735ec4
(ip6_tunnel: ensure to always have a link local address).

Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 15:30:20 -04:00
David S. Miller
c020b9d420 Merge branch 'inet_csums'
Tom Herbert says:

====================
net: Checksum offload changes

I am working on overhauling RX checksum offload. Goals of this effort
are:

- Specify what exactly it means when driver returns CHECKSUM_UNNECESSARY
- Preserve CHECKSUM_COMPLETE through encapsulation layers
- Don't do skb_checksum more than once per packet
- Unify GRO and non-GRO csum verification as much as possible
- Unify the checksum functions (checksum_init)
- Simply code

What is in this first patch set:

- Create a common "checksum_init" function which is called from
  TCPv{4,6} and UDPv{4,6}
- Add some for RFC6936, UDP/IPv6 zero checksums
- Add architecture support for csum_add and provide implementations
  for x86_64 and Sparc 32 and 64 bit (please test the latter)

Please review carefully and test if possible, mucking with basic
checksum functions is always a little precarious :-)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 15:28:52 -04:00
Tom Herbert
4068579e1e net: Implmement RFC 6936 (zero RX csums for UDP/IPv6)
RFC 6936 relaxes the requirement of RFC 2460 that UDP/IPv6 packets which
are received with a zero UDP checksum value must be dropped. RFC 6936
allows zero checksums to support tunnels over UDP.

When sk_no_check is set we allow on a socket we allow a zero IPv6
UDP checksum. This is for both sending zero checksum and accepting
a zero checksum on receive.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 15:26:30 -04:00
Tom Herbert
e4f45b7f40 net: Call skb_checksum_init in IPv6
Call skb_checksum_init instead of private functions.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 15:26:30 -04:00
Tom Herbert
ed70fcfcee net: Call skb_checksum_init in IPv4
Call skb_checksum_init instead of private functions.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 15:26:30 -04:00
Tom Herbert
76ba0aae67 net: Generalize checksum_init functions
Create a general __skb_checksum_validate function (actually a
macro) to subsume the various checksum_init functions. This
function can either init the checksum, or do the full validation
(logically checksum_init+skb_check_complete)-- a flag specifies
if full vaidation is performed. Also, there is a flag to the function
to indicate that zero checksums are allowed (to support optional
UDP checksums).

Added several stub functions for calling __skb_checksum_validate.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 15:26:29 -04:00
Tom Herbert
20fce54fa7 sparc: csum_add for Sparc
versions.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 15:26:29 -04:00
Tom Herbert
4405b4d635 net: Change x86_64 add32_with_carry to allow memory operand
Note add32_with_carry(a, b) is suboptimal, as it forces
a and b in registers.

b could be a memory or a register operand.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 15:26:29 -04:00
Tom Herbert
a278534406 x86_64: csum_add for x86_64
Add csum_add function for x86_64.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 15:26:29 -04:00
Tom Herbert
07064c6e02 net: Allow csum_add to be provided in arch
csum_add is really nothing more then add-with-carry which
can be implemented efficiently in some architectures.
Allow architecture to define this protected by HAVE_ARCH_CSUM_ADD.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 15:26:29 -04:00
David S. Miller
2ad0649687 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-05-02

Please pull this batch of updates intended for the 3.16 stream...

For the mac80211 bits, Johannes says:

"In this round we have a large number of small features and
improvements from people too numerous to list here. The only really
bit thing is Michał and Luca's CSA work (including changing how
interface combination verification is done)."

For the Bluetooth bits, Gustavo says:

"Here goes some patches for the -next release. There is nothing
really special for this pull request, just a bunch of refactors,
fixes and clean ups."

For the ath10k/ath6kl bits, Kalle says:

"For ath6kl Kalle fixed a bunch of checkpatch warnings.

In ath10k we had more changes, major ones being:

* fix memory allocation failures after a firmware crash (Michal)

* some rework of DFS configuration to enable it correctly in all cases
  (Michal)

* add a new firmware crash option to make it possible to crash 10.1
  firmware for testing purposes (Marek P)

* fix RTS/CTS protection in certain cases (Marek K)

* fix wrong RSSI and rate reporting in some cases (Janusz)

* fix firmware stats reporting (Chun, Ben & Bartosz)"

For the iwlwifi bits, Emmanuel says:

"I have here a bunch of unrelated things. I disabled support for
-7.ucode which means that I can removed a lot of code. Eliad has
a brand new feature: we reduce the Tx power when the link allows -
this reduces our power consumption. The regular changes in power and
scan area. One interesting thing though is the patches from Johannes,
we have now GRO which allows to increase our throughput in TCP Rx. The
main advantage is that it reduces the number of TCP Acks - these TCP
Acks are completely useless when we are using A-MPDU since the first
packet of the A-MPDU generates a TCP Ack which is made obsolete by
the next packets."

Along with that, there are a variety of updates to b43, mwifiex,
rtl8180 and wil6210 drivers and a handful of other updates here
and there.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 13:36:26 -04:00
David S. Miller
05f4640979 Merge branch 'am437x'
George Cherian says:

====================
The series adds CPTS support for AM4372.

Patch 1 - DT changes w.r.t clock changes for AM33xx.
Patch 2 - CPTS clock name harcoding in the driver is removed.
	  Easier to pass the clock name from dt rather than hardcoding in driver.
	  Also in prepration for DRA7x CPTS support.
Patch 3 - Enable the CPTS support for both DRA7x and AM4372 in the driver.
Patch 4 - Enable the Annexe F for L2 PTP for AM437x and DRA7x.
Patch 5 - Change the default clocksource to dpll_core_m5
Patch 6 - DT changes for AM4372.

v1 -> v2
	Patch 1 and 2 Re-ordering.
	Seperate TS_BITS define for Hw version V2 and V3
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 13:19:00 -04:00
George Cherian
de21b26e51 ARM: dts: am4372: Add clock names for cpsw and cpts
Add CPSW fck and CPTS clock and clock names for AM4372

Signed-off-by: George Cherian <george.cherian@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 13:18:50 -04:00
George Cherian
f9786f419d ARM: AM43xx: clk: Change the cpts ref clock source to dpll_core_m5 clk
cpsw_cpts_rft_clk has got the choice of 3 clocksources
 -dpll_core_m4_ck
 -dpll_core_m5_ck
 -dpll_disp_m2_ck

By default dpll_core_m4_ck is selected, witn this as clock
source the CPTS doesnot work properly. It gives clockcheck errors
while running PTP.

 clockcheck: clock jumped backward or running slower than expected!

By selecting dpll_core_m5_ck as the clocksource fixes this issue.
In AM335x dpll_core_m5_ck is the default clocksource.

Signed-off-by: George Cherian <george.cherian@ti.com>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 13:18:50 -04:00
George Cherian
09c5537246 drivers: net: cpsw: Enable Annexe F Time sync
Enable the Annex F Time Sync explicitly for DRA7x and AM4372.
With this enabled the L2 PTP is working.

while at that rename TS_BIT8 to TS_TTL_NONZERO

Signed-off-by: George Cherian <george.cherian@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 13:18:50 -04:00
George Cherian
f7d403cb38 drivers: net: cpsw: Enable CPTS for DRA7xx and AM4372
Enable cpts hardware time stamping for Dra7xx and AM4372.
This enables PTPv2 for DRA7xx and AM4372.

Signed-off-by: George Cherian <george.cherian@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 13:18:50 -04:00
George Cherian
d0415e7cc0 drivers: net: cpts: Remove hardcoded clock name for CPTS
CPTS refclk name is hardcoded, which makes it fail in case of DRA7x
Remove the hardcoded clock name for CPTS refclk and get the same from DT.

Signed-off-by: George Cherian <george.cherian@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 13:18:50 -04:00
George Cherian
0987a6ef94 ARM: dts: am33xx: Add clock names for cpsw and cpts
Add CPSW fck and CPTS clock and clock names

Signed-off-by: George Cherian <george.cherian@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 13:18:49 -04:00
Roopa Prabhu
56bfa7ee7c unregister_netdevice : move RTM_DELLINK to until after ndo_uninit
This patch fixes ordering of rtnl notifications during unregister_netdevice
by moving RTM_DELLINK notification to until after ndo_uninit.

The problem was seen with unregistering bond netdevices.

bond ndo_uninit callback generates a few RTM_NEWLINK notifications for
NETDEV_CHANGEADDR and NETDEV_FEAT_CHANGE. This is seen mostly when the
bond is deleted with slaves still enslaved to the bond.

During unregister netdevice (rollback_registered_many to be specific)
bond ndo_uninit is called after RTM_DELLINK notification goes out.
This results in userspace seeing RTM_DELLINK followed by a couple of
RTM_NEWLINK's.

In userspace problem was seen with libnl. libnl cache deletes the bond
when it sees RTM_DELLINK and re-adds the bond with the following
RTM_NEWLINK. Resulting in a stale bond entry in libnl cache when the kernel
has already deleted the bond.

This patch has been tested for bond, bridges and vlan devices.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-05 13:11:36 -04:00
David S. Miller
f4a7b5eec2 Merge branch 'filter-cleanups'
Daniel Borkmann says:

====================
BPF cleanups

v3->v4:
 - Sorry, noticed and fixed a typo in patch 3, rest as is
v2->v3:
 - Included Dave's feedback for unsigned long type in patch 3
 - Patch 1 and patch 2 unchanged since v1, dropped other
   two for now
v1->v2:
 - Only changed patch 5 as to suggestion from Alexei
 - Rest is the same
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-04 19:46:59 -04:00
Daniel Borkmann
eb9672f4a1 net: filter: misc/various cleanups
This contains only some minor misc cleanpus. We can spare us the
extra variable declaration in __skb_get_pay_offset(), the cast in
__get_random_u32() is rather unnecessary and in __sk_migrate_realloc()
we can remove the memcpy() and do a direct assignment of the structs.
Latter was suggested by Fengguang Wu found with coccinelle. Also,
remaining pointer casts of long should be unsigned long instead.

Suggested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-04 19:46:31 -04:00
Daniel Borkmann
30743837dd net: filter: make register naming more comprehensible
The current code is a bit hard to parse on which registers can be used,
how they are mapped and all play together. It makes much more sense to
define this a bit more clearly so that the code is a bit more intuitive.
This patch cleans this up, and makes naming a bit more consistent among
the code. This also allows for moving some of the defines into the header
file. Clearing of A and X registers in __sk_run_filter() do not get a
particular register name assigned as they have not an 'official' function,
but rather just result from the concrete initial mapping of old BPF
programs. Since for BPF helper functions for BPF_CALL we already use
small letters, so be consistent here as well. No functional changes.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-04 19:46:31 -04:00
Daniel Borkmann
5bcfedf06f net: filter: simplify label names from jump-table
This patch simplifies label naming for the BPF jump-table.
When we define labels via DL(), we just concatenate/textify
the combination of instruction opcode which consists of the
class, subclass, word size, target register and so on. Each
time we leave BPF_ prefix intact, so that e.g. the preprocessor
generates a label BPF_ALU_BPF_ADD_BPF_X for DL(BPF_ALU, BPF_ADD,
BPF_X) whereas a label name of ALU_ADD_X is much more easy
to grasp. Pure cleanup only.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-04 19:46:31 -04:00
Alexei Starovoitov
dfee07ccef net: filter: doc: expand and improve BPF documentation
In particular, this patch tries to clarify internal BPF calling
convention and adds internal BPF examples, JIT guide, use cases.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-04 19:40:00 -04:00
Eric Dumazet
249015515f tcp: remove in_flight parameter from cong_avoid() methods
Commit e114a710aa ("tcp: fix cwnd limited checking to improve
congestion control") obsoleted in_flight parameter from
tcp_is_cwnd_limited() and its callers.

This patch does the removal as promised.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-03 19:23:07 -04:00
Eric Dumazet
e114a710aa tcp: fix cwnd limited checking to improve congestion control
Yuchung discovered tcp_is_cwnd_limited() was returning false in
slow start phase even if the application filled the socket write queue.

All congestion modules take into account tcp_is_cwnd_limited()
before increasing cwnd, so this behavior limits slow start from
probing the bandwidth at full speed.

The problem is that even if write queue is full (aka we are _not_
application limited), cwnd can be under utilized if TSO should auto
defer or TCP Small queues decided to hold packets.

So the in_flight can be kept to smaller value, and we can get to the
point tcp_is_cwnd_limited() returns false.

With TCP Small Queues and FQ/pacing, this issue is more visible.

We fix this by having tcp_cwnd_validate(), which is supposed to track
such things, take into account unsent_segs, the number of segs that we
are not sending at the moment due to TSO or TSQ, but intend to send
real soon. Then when we are cwnd-limited, remember this fact while we
are processing the window of ACKs that comes back.

For example, suppose we have a brand new connection with cwnd=10; we
are in slow start, and we send a flight of 9 packets. By the time we
have received ACKs for all 9 packets we want our cwnd to be 18.
We implement this by setting tp->lsnd_pending to 9, and
considering ourselves to be cwnd-limited while cwnd is less than
twice tp->lsnd_pending (2*9 -> 18).

This makes tcp_is_cwnd_limited() more understandable, by removing
the GSO/TSO kludge, that tried to work around the issue.

Note the in_flight parameter can be removed in a followup cleanup
patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-02 17:54:35 -04:00
Stéphane Graber
4e8bbb819d net: Allow tc changes in user namespaces
This switches a few remaining capable(CAP_NET_ADMIN) to ns_capable so
that root in a user namespace may set tc rules inside that namespace.

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-02 17:43:25 -04:00
David S. Miller
3c4de5a0a3 Merge branch 'davinci_mdio'
Grygorii Strashko says:

====================
introduce devm_mdiobus_alloc/free and clean up davinci mdio

Introduce a resource managed devm_mdiobus_alloc[_size]()/devm_mdiobus_free()
to automatically clean up MDIO bus alocations made by MDIO drivers,
thus leading to simplified MDIO drivers code.

Clean up Davinci MDIO driver and use new devm API.

Changes in v3:
- added devm_mdiobus_alloc_size() and
  devm_mdiobus_alloc() converted to be just a simple wrapper now.

Changes in v2:
- minor comments taken into account
- additional patches added for cleaning up Davinci MDIO driver
====================

Acked-by: Santosh Shilimkar<santosh.shilimkar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-02 16:17:18 -04:00
Grygorii Strashko
9728e1a7d3 net: davinci_mdio: simplify IO memory mapping
Simplify IO memory mapping by using devm_ioremap_resource()
which will do all errors handling and reporting for us.

Acked-and-tested-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-02 16:16:26 -04:00
Grygorii Strashko
4e8b4c802c net: davinci_mdio: drop pinctrl_pm_select_default_state from probe
The "default" pinctrl state is set by Drivers core now before
calling the driver's probe.
Hence, it's safe to drop pinctrl_pm_select_default_state() call
from Davinci mdio driver probe.

Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Acked-and-tested-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-02 16:16:26 -04:00
Grygorii Strashko
50d0636eef net: davinci_mdio: use devm_* api
Use devm_* API for memory allocation and to get device's clock
to simplify driver's code.

Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-and-tested-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-02 16:16:26 -04:00
Grygorii Strashko
6d48f44b7b mdio_bus: implement devm_mdiobus_alloc/devm_mdiobus_free
Add a resource managed devm_mdiobus_alloc[_size]()/devm_mdiobus_free()
to automatically clean up MDIO bus alocations made by MDIO drivers,
thus leading to simplified MDIO drivers code.

Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-and-tested-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-02 16:16:26 -04:00
Alexey Charkov
ca8b6e04bc net: via-rhine: Drop revision property, use quirks instead
This adds two new flags to quirks and thus removes the need to carry
revision in rhine_private. As a result, the init logic is simplified
a bit.

This also fixes a compiler warning in OF code on 64bit due to pointer
casting:

        drivers/net/ethernet/via/via-rhine.c: In function ‘rhine_init_one_platform’:
        drivers/net/ethernet/via/via-rhine.c:1132:13: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
          revision = (u32)match->data;
                     ^

That code was added in commit 2d283862dc
("net: via-rhine: add OF bus binding").

Tested in platform configuration on a VIA WM8950 APC Rock board.

Reported-by: Jan Moskyto Matejka <mq@suse.cz>
Signed-off-by: Alexey Charkov <alchark@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-02 15:55:42 -04:00
John W. Linville
406a94d7fa Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-05-02 13:47:50 -04:00
KY Srinivasan
c25aaf814a hyperv: Enable sendbuf mechanism on the send path
We send packets using a copy-free mechanism (this is the Guest to Host transport
via VMBUS). While this is obviously optimal for large packets,
it may not be optimal for small packets. Hyper-V host supports
a second mechanism for sending packets that is "copy based". We implement that
mechanism in this patch.

In this version of the patch I have addressed a comment from David Miller.

With this patch (and all of the other offload and VRSS patches), we are now able
to almost saturate a 10G interface between Linux VMs on Hyper-V
on different hosts - close to  9 Gbps as measured via iperf.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-30 13:48:46 -04:00
Dinh Nguyen
cc80ee1360 net: stmmac: set phy to use polling by default
mii_irq[] array is never initialized anywhere in the driver, thus mii_irq[]
will always equate to zero. So, for the case where the PHY does not have an
irq, we should use PHY_POLL for that situation.

Signed-off-by: Dinh Nguyen <dinguyen@altera.com>
Tested-by: Vince Bridgers <vbridger@altera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-30 13:31:26 -04:00
Zhangjie \(HZ\)
6ebbc1a638 virtio-net: Set needed_headroom for virtio-net when VIRTIO_F_ANY_LAYOUT is true
This is a small supplement for commit e7428e95a0
("virtio-net: put virtio-net header inline with data"). TCP packages have
enough room to put virtio-net header in, but UDP packages do not. By
setting dev->needed_headroom for virtio-net device, UDP packages could have
enough room.

For UDP packages, sk_buff is alloced in fun __ip_append_data. The size is
"alloclen + hh_len + 15", and "hh_len = LL_RESERVED_SPACE(rt-dst.dev);".
The Macro is defined as follows:
#define LL_RESERVED_SPACE(dev) \
     ((((dev)->hard_header_len+(dev)->needed_headroom)\
     &~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
By default, for UDP packages, after skb is allocated, only 16 bytes
reserved. And 2 bytes remained after mac header is set. That is not enough
to put virtio-net header in. If we set dev->needed_headroom to 12 or 10
(according to mergeable_rx_bufs is on or off ), more room can be reserved.
Then there is enough room for UDP packages to put the header in.

test result list as below:
guest and host: suse11sp3, netperf, intel 2.4GHz
+-------+---------+---------+---------+---------+
|       |   old             |   new             |
+-------+---------+---------+---------+---------+
| UDP   |  Gbit/s | pps     |  Gbit/s | pps     |
| 64    |  0.57   | 692232  |  0.61   | 742420  |
| 256   |  1.60   | 686860  |  1.71   | 733331  |
| 512   |  2.92   | 674576  |  3.07   | 710446  |
| 1024  |  4.99   | 598977  |  5.17   | 620821  |
| 1460  |  5.68   | 483757  |  7.16   | 610519  |
| 4096  |  6.98   | 637468  |  7.21   | 658471  |
+-------+---------+---------+---------+---------+

Signed-off-by: Zhang Jie <zhangjie14@huawei.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-30 13:31:26 -04:00