Commit Graph

191011 Commits

Author SHA1 Message Date
Vladislav Zolotarov
8eb5a20ccc bnx2x: use mask in test_registers() to avoid parity error
Properly mask the value to be written to the register (according to the register size) during the self-test.
Otherwise an immediate parity error would be generated.

Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-19 13:17:10 -07:00
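
The masking described in the commit above can be illustrated with a minimal sketch. The helper names `reg_width_mask` and `masked_test_value` are hypothetical and are not taken from the bnx2x driver; the sketch only assumes a register that implements its low `width_bits` bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: all-ones mask for a register that implements
 * only its low width_bits bits (1..32). */
static uint32_t reg_width_mask(unsigned width_bits)
{
    assert(width_bits >= 1 && width_bits <= 32);
    /* Shifting a 32-bit value by 32 is undefined, so handle it apart. */
    return width_bits == 32 ? UINT32_MAX
                            : ((UINT32_C(1) << width_bits) - 1);
}

/* Value a self-test would write: the test pattern limited to the bits
 * the register actually implements. */
static uint32_t masked_test_value(uint32_t pattern, unsigned width_bits)
{
    return pattern & reg_width_mask(width_bits);
}
```

For a 10-bit register, `masked_test_value(0xffffffff, 10)` yields `0x3ff`, so no unimplemented bit is ever written.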
Vladislav Zolotarov
1ac218c83f bnx2x: Fixed MSI-X enabling flow
Try to enable fewer MSI-X vectors if the initial request has failed.

Author: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-19 13:17:09 -07:00
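
A retry flow of the kind this commit describes might look like the sketch below. It is not the bnx2x code: the callback type `enable_vectors_fn`, the function `enable_with_fallback` and the toy allocator `fake_enable` are all hypothetical, and the return convention (0 on success, a positive count of vectors that could be granted, negative on error) is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocator callback: returns 0 on success, a positive
 * number of vectors that could be granted instead, or a negative error. */
typedef int (*enable_vectors_fn)(int nvec, void *ctx);

/* Ask for `wanted` vectors; if that fails with a smaller hint, retry
 * with the hint as long as at least `min_needed` remain.
 * Returns the number of vectors enabled, or a negative value. */
static int enable_with_fallback(enable_vectors_fn enable, void *ctx,
                                int wanted, int min_needed)
{
    int nvec = wanted;

    assert(enable != NULL && min_needed >= 1 && wanted >= min_needed);
    while (nvec >= min_needed) {
        int rc = enable(nvec, ctx);

        if (rc == 0)
            return nvec;        /* request satisfied */
        if (rc < 0)
            return rc;          /* hard failure, do not retry */
        if (rc >= nvec)
            return -1;          /* hint is not smaller: avoid looping */
        nvec = rc;              /* retry with fewer vectors */
    }
    return -1;                  /* fewer vectors than we can work with */
}

/* Toy allocator for demonstration: ctx points to the available count. */
static int fake_enable(int nvec, void *ctx)
{
    int avail = *(int *)ctx;

    if (avail <= 0)
        return -1;
    return nvec <= avail ? 0 : avail;
}
```

With 5 vectors available, a request for 17 with a minimum of 2 settles on 5; with only 1 available the same request fails, and a caller could then fall back to another interrupt mode.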
Vladislav Zolotarov
dea7aab192 bnx2x: Added new statistics
Added total_mcast/bcast_pkts_transmitted statistics.

Author: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-19 13:17:09 -07:00
Vladislav Zolotarov
cdaa7cb84b bnx2x: White spaces
White spaces, code readability and prints.

Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-19 13:17:08 -07:00
Vladislav Zolotarov
2145a92057 bnx2x: Protect code with NOMCP
Don't run code that can't be run if MCP is not present.
This will prevent NULL pointer dereferencing.

Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-19 13:17:07 -07:00
Vladislav Zolotarov
02e3c6cb3f bnx2x: Increase DMAE max write size for 57711
Increase DMAE max write size for 57711 to the maximum allowed value.

Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-19 13:17:07 -07:00
Vladislav Zolotarov
34f24c7fc0 bnx2x: Use VPD-R V0 entry to display firmware revision
Author: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-19 13:17:06 -07:00
Vladislav Zolotarov
72fd071833 bnx2x: Parity errors handling for 57710 and 57711
This patch introduces the parity errors handling code for 57710 and 57711 chips.

HW is configured to stop all DMA transactions to the host and to stop sending packets to the network
once a parity error is detected, which is meant to prevent silent data corruption.
At the same time, HW generates the attention interrupt to every function of the device where a parity
error has been detected so that the driver can start the recovery flow.

The recovery consists of resetting the chip and restarting the driver on all active functions
of the chip where the parity error has been reported.

Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-19 13:17:05 -07:00
Tyler Hicks
9f37622f89 eCryptfs: Turn lower lookup error messages into debug messages
Vague warnings about ENAMETOOLONG errors when looking up an encrypted
file name have caused many users to become concerned about their data.
Since this is a rather harmless condition, I'm moving this warning to
only be printed when the ecryptfs_verbosity module param is 1.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-04-19 14:42:18 -05:00
Tyler Hicks
3a8380c075 eCryptfs: Copy lower directory inode times and size on link
The timestamps and size of a lower inode involved in a link() call were
being copied to the upper parent inode.  Instead, we should be
copying the lower parent inode's timestamps and size to the upper parent
inode.  I discovered this bug using the POSIX test suite at Tuxera.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-04-19 14:42:15 -05:00
Jeff Mahoney
133b8f9d63 ecryptfs: fix use with tmpfs by removing d_drop from ecryptfs_destroy_inode
Since tmpfs has no persistent storage, it pins all its dentries in memory
so they have d_count=1 when other file systems would have d_count=0.
->lookup is only used to create new dentries. If the caller doesn't
instantiate it, it's freed immediately at dput(). ->readdir reads
directly from the dcache and depends on the dentries being hashed.

When an ecryptfs mount is mounted, it associates the lower file and dentry
with the ecryptfs files as they're accessed. When it's umounted and
destroys all the in-memory ecryptfs inodes, it fput's the lower_files and
d_drop's the lower_dentries. Commit 4981e081 added this and a d_delete in
2008 and several months later commit caeeeecf removed the d_delete. I
believe the d_drop() needs to be removed as well.

The d_drop effectively hides any file that has been accessed via ecryptfs
from the underlying tmpfs since it depends on it being hashed for it to
be accessible. I've removed the d_drop on my development node and see no
ill effects with basic testing on both tmpfs and persistent storage.

As a side effect, after ecryptfs d_drops the dentries on tmpfs, tmpfs
BUGs on umount. This is due to the dentries being unhashed.
tmpfs->kill_sb is kill_litter_super which calls d_genocide to drop
the reference pinning the dentry. It skips unhashed and negative dentries,
but shrink_dcache_for_umount_subtree doesn't. Since those dentries
still have an elevated d_count, we get a BUG().

This patch removes the d_drop call and fixes both issues.

This issue was reported at:
https://bugzilla.novell.com/show_bug.cgi?id=567887

Reported-by:  Árpád Bíró <biroa@demasz.hu>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Dustin Kirkland <kirkland@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-04-19 14:42:13 -05:00
Christian Pulvermacher
cfce08c6bd ecryptfs: fix error code for missing xattrs in lower fs
If the lower file system driver has extended attributes disabled,
ecryptfs' own access functions return -ENOSYS instead of -EOPNOTSUPP.
This breaks execution of programs in the ecryptfs mount, since the
kernel expects the latter error when checking for security
capabilities in xattrs.

Signed-off-by: Christian Pulvermacher <pulvermacher@gmx.de>
Cc: stable@kernel.org
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-04-19 14:42:09 -05:00
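
The error-code distinction in this commit can be sketched as follows. The struct `lower_inode_ops` and the function `wrapper_getxattr` are hypothetical simplifications, not the eCryptfs sources; the only point shown is that a missing lower xattr operation should surface as -EOPNOTSUPP rather than -ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a lower inode's xattr operations.
 * A NULL pointer models a lower file system built without xattr support. */
struct lower_inode_ops {
    long (*getxattr)(const char *name, void *buf, size_t size);
};

/* Stacked-filesystem wrapper: forward to the lower file system if it can
 * handle xattrs, otherwise report "operation not supported". */
static long wrapper_getxattr(const struct lower_inode_ops *ops,
                             const char *name, void *buf, size_t size)
{
    assert(ops != NULL);
    if (ops->getxattr == NULL)
        return -EOPNOTSUPP;     /* what callers probing xattrs expect */
    return ops->getxattr(name, buf, size);
}
```

A caller that probes for a capability xattr can then treat -EOPNOTSUPP as "no such attribute support" and continue, whereas an unexpected -ENOSYS would be propagated as a failure.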
Tyler Hicks
3a60a1686f eCryptfs: Decrypt symlink target for stat size
Create a getattr handler for eCryptfs symlinks that is capable of
reading the lower target and decrypting its path.  Prior to this patch,
a stat's st_size field would represent the strlen of the encrypted path,
while readlink() would return the strlen of the decrypted path.  This
could lead to confusion in some userspace applications, since the two
values should be equal.

https://bugs.launchpad.net/bugs/524919

Reported-by: Loïc Minier <loic.minier@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-04-19 14:41:51 -05:00
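
The invariant this commit restores (the st_size of a symlink equals the length returned by readlink()) can be checked from userspace with a small sketch like the one below. The function name `symlink_sizes_agree` is hypothetical; only standard POSIX calls are used.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if lstat() and readlink() report the same target length,
 * 0 if they differ, and -1 if path is not a readable symlink. */
static int symlink_sizes_agree(const char *path)
{
    struct stat st;
    char target[PATH_MAX];
    ssize_t len;

    assert(path != NULL);
    if (lstat(path, &st) != 0 || !S_ISLNK(st.st_mode))
        return -1;
    len = readlink(path, target, sizeof(target));
    if (len < 0)
        return -1;
    return (off_t)len == st.st_size ? 1 : 0;
}
```

Applications commonly size their readlink() buffer from st_size, which is why a mismatch between the encrypted and decrypted lengths could confuse them.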
Linus Torvalds
76e506a754 Fix ISDN/Gigaset build failure
Commit b91ecb00 ("gigaset: include cleanup cleanup") removed an implicit
sched.h inclusion that came in via slab.h, and caused various compile
problems as a result.

This should fix it.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-19 11:53:17 -07:00
Linus Torvalds
85341c6136 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: Make RCU lockdep check the lockdep_recursion variable
  rcu: Update docs for rcu_access_pointer and rcu_dereference_protected
  rcu: Better explain the condition parameter of rcu_dereference_check()
  rcu: Add rcu_access_pointer and rcu_dereference_protected
2010-04-19 08:35:47 -07:00
Linus Torvalds
375db4810b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  gigaset: include cleanup cleanup
  packet : remove init_net restriction
  WAN: flush tx_queue in hdlc_ppp to prevent panic on rmmod hw_driver.
  ip: Fix ip_dev_loopback_xmit()
  net: dev_pick_tx() fix
  fib: suppress lockdep-RCU false positive in FIB trie.
  tun: orphan an skb on tx
  forcedeth: fix tx limit2 flag check
  iwlwifi: work around bogus active chains detection
2010-04-19 07:27:45 -07:00
Linus Torvalds
73c6c7fbb7 Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/radeon/kms: add FireMV 2400 PCI ID.
  drm/radeon/kms: allow R500 regs VAP_ALT_NUM_VERTICES and VAP_INDEX_OFFSET
  drivers/gpu/radeon: Add MSPOS regs to safe list.
  drm/radeon/kms: disable the tv encoder when tv/cv is not in use
  drm/radeon/kms: adjust pll settings for tv
  drm/radeon/kms: fix tv dac conflict resolver
  drm/radeon/kms/evergreen: don't enable hdmi audio stuff
  drm/radeon/kms/atom: fix dual-link DVI on DCE3.2/4.0
  drm/radeon/kms: fix rs600 tlb flush
  drm/radeon/kms: print GPU family and device id when loading
  drm/radeon/kms: fix calculation of mipmapped 3D texture sizes
  drm/radeon/kms: only change mode when coherent value changes.
  drm/radeon/kms: more atom parser fixes (v2)
2010-04-19 07:27:06 -07:00
Linus Torvalds
eb3e5cce2b Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: 5974/1: arm/mach-at91 Makefile: remove two blanks.
  ARM: 6052/1: kdump: make kexec work in interrupt context
  ARM: 6051/1: VFP: preserve the HW context when calling signal handlers
  ARM: 6050/1: VFP: fix the SMP versions of vfp_{sync,flush}_hwstate
  ARM: 6007/1: fix highmem with VIPT cache and DMA
  ARM: 5975/1: AT91 slow-clock suspend: don't wait when turning PLLs off
2010-04-19 07:26:21 -07:00
Dan Carpenter
07a71415d5 pcmcia: fix error handling in cm4000_cs.c
In the original code we used -ENODEV as the number of bytes to
copy_to_user() and we didn't release the locks.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2010-04-19 16:04:13 +02:00
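
The two problems named in this commit (a negative error code used as a byte count, and a lock left held on the error path) follow a common pattern. The sketch below is a userspace model, not the cm4000_cs code: `struct dev`, `dev_read` and the `lock_depth` counter standing in for a real lock are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device model; lock_depth stands in for a real lock. */
struct dev {
    int lock_depth;
    int present;
    char data[16];
    size_t data_len;
};

/* Returns the number of bytes copied, or a negative error code.
 * The error code is never used as a length, and every exit path
 * passes through the unlock label. */
static long dev_read(struct dev *d, char *out, size_t cap)
{
    long ret;

    assert(d != NULL && out != NULL);
    d->lock_depth++;                    /* take the lock */
    if (!d->present) {
        ret = -ENODEV;
        goto unlock;                    /* skip the copy entirely */
    }
    ret = (long)(d->data_len < cap ? d->data_len : cap);
    memcpy(out, d->data, (size_t)ret);  /* ret is non-negative here */
unlock:
    d->lock_depth--;                    /* released on success and error */
    return ret;
}
```

Routing the error path through a single unlock label keeps the lock balanced and ensures the copy only ever sees a valid length.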
Dave Airlie
79b9517a33 drm/radeon/kms: add FireMV 2400 PCI ID.
This is an M24/X600 chip.

From RH# 581927

cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-04-19 18:53:10 +10:00
David S. Miller
6c94b1ee0c sparc64: Fix PREEMPT_ACTIVE value.
It currently overlaps the NMI bit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-19 01:30:51 -07:00
Paul E. McKenney
bc293d62b2 rcu: Make RCU lockdep check the lockdep_recursion variable
The lockdep facility temporarily disables lockdep checking by
incrementing the current->lockdep_recursion variable.  Such
disabling happens in NMIs and in other situations where lockdep
might expect to recurse on itself.

This patch therefore checks current->lockdep_recursion, disabling RCU
lockdep splats when this variable is non-zero.  In addition, this patch
removes the "likely()", as suggested by Lai Jiangshan.

Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Reported-by: David Miller <davem@davemloft.net>
Tested-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: eric.dumazet@gmail.com
LKML-Reference: <20100415195039.GA22623@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-19 08:37:19 +02:00
Marek Olšák
cae94b0ad9 drm/radeon/kms: allow R500 regs VAP_ALT_NUM_VERTICES and VAP_INDEX_OFFSET
[airlied: fix V_A_N_V to not be safe and fix check to make sure only r500
 - bump userspace version]

Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-04-19 14:17:02 +10:00
Corbin Simpson
f12eebb0ac drivers/gpu/radeon: Add MSPOS regs to safe list.
Permits MSAA and D3D-style rasterization.

[airlied: add rs600]

Signed-off-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-04-19 14:04:32 +10:00
Alex Deucher
d3a67a43b0 drm/radeon/kms: disable the tv encoder when tv/cv is not in use
Switching between TV and VGA caused VGA to break on some systems
since the TV encoder was left enabled when VGA was used.

fixes fdo bug 25520.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-04-19 13:52:52 +10:00
Alex Deucher
a1a4b23b66 drm/radeon/kms: adjust pll settings for tv
May fix fdo bug 26582.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-04-19 13:52:43 +10:00
Alex Deucher
08d075116d drm/radeon/kms: fix tv dac conflict resolver
On systems with the tv dac shared between DVI and TV,
we can only use the dac for one of the connectors.
However, when using a digital monitor on the DVI port,
you can use the dac for the TV connector just fine.
Check the use_digital status when resolving the conflict.

Fixes fdo bug 27649, possibly others.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-04-19 11:59:58 +10:00
Alex Deucher
16823d16f5 drm/radeon/kms/evergreen: don't enable hdmi audio stuff
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-04-19 11:59:41 +10:00
Alex Deucher
b317a9ce22 drm/radeon/kms/atom: fix dual-link DVI on DCE3.2/4.0
Got broken during the evergreen merge.
Fixes fdo bug 27001.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-04-19 11:59:20 +10:00
Jerome Glisse
30f69f3fb2 drm/radeon/kms: fix rs600 tlb flush
A typo in the flush code led to no flush of the RS600 TLB, which
ultimately led to massive system RAM corruption. With
this patch everything seems to work properly.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-04-19 11:26:13 +10:00
Jerome Glisse
1b5331d9c6 drm/radeon/kms: print GPU family and device id when loading
This will help identify the GPU when looking at bug logs.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-04-19 11:25:24 +10:00
Eric Dumazet
fc6055a5ba net: Introduce skb_orphan_try()
Transmitted skb might be attached to a socket and a destructor, for
memory accounting purposes.

Traditionally, this destructor is called at tx completion time, when skb
is freed.

When tx completion is performed by another cpu than the sender, this
forces some cache lines to change ownership. XPS was an attempt to give
tx completion to initial cpu.

David's idea is to call destructor right before giving skb to device (call
to ndo_start_xmit()). Because device queues are usually small, orphaning
skb before tx completion is not a big deal. Some drivers already do
this, we could do it in upper level.

There is one known exception to this early orphaning, called tx
timestamping. It needs to keep a reference to socket until device can
give a hardware or software timestamp.

This patch adds a skb_orphan_try() helper, to centralize all exceptions
to early orphaning in one spot, and use it in dev_hard_start_xmit().

"tbench 16" results on a Nehalem machine (2 X5570  @ 2.93GHz)
before: Throughput 4428.9 MB/sec 16 procs
after: Throughput 4448.14 MB/sec 16 procs

UDP should get even better results, its destructor being more complex,
since SOCK_USE_WRITE_QUEUE is not set (four atomic ops instead of one)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-18 02:39:41 -07:00
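
The early-orphaning rule described in this commit can be modelled in userspace as below. This is a sketch under stated assumptions, not the kernel helper: `struct pkt`, `struct sock_model`, `orphan` and `orphan_try` are hypothetical names, and the single `wants_tx_timestamp` flag stands in for the tx timestamping exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket model: counts packets charged to the sender. */
struct sock_model {
    int wmem_charged;
};

/* Hypothetical packet model with an owner and a destructor. */
struct pkt {
    struct sock_model *owner;
    void (*destructor)(struct pkt *);
    bool wants_tx_timestamp;    /* the one known exception */
};

/* Destructor used for memory accounting: uncharge the owner. */
static void release_charge(struct pkt *p)
{
    p->owner->wmem_charged--;
}

/* Detach the packet from its owner, running the destructor now. */
static void orphan(struct pkt *p)
{
    if (p->destructor != NULL)
        p->destructor(p);
    p->destructor = NULL;
    p->owner = NULL;
}

/* Orphan before handing the packet to the device, unless something
 * still needs the socket reference. Returns true if orphaned. */
static bool orphan_try(struct pkt *p)
{
    assert(p != NULL);
    if (p->wants_tx_timestamp)
        return false;           /* keep the socket until timestamped */
    orphan(p);
    return true;
}
```

Keeping every exception inside one helper means a transmit path only needs a single call before handing the packet to the device.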
Eric Dumazet
9958da0501 net: remove time limit in process_backlog()
- There is no point in enforcing a time limit in process_backlog(), since
other napi instances don't follow the same rule. We can exit after only one
packet processed...
The normal quota of 64 packets per napi instance should be the norm, and
net_rx_action() already has its own time limit.
Note : /proc/net/core/dev_weight can be used to tune this 64 default
value.

- Use DEFINE_PER_CPU_ALIGNED for softnet_data definition.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-18 02:36:13 -07:00
Tilman Schmidt
b91ecb0027 gigaset: include cleanup cleanup
Commit 5a0e3ad causes slab.h to be included twice in many of the
Gigaset driver's source files, first via the common include file
gigaset.h and then a second time directly. Drop the spares, and
use the opportunity to clean up a few more similar cases.

Impact: cleanup, no functional change
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
CC: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-18 02:33:29 -07:00
Linus Torvalds
13bd8e4673 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
  drm/i915: Ignore LVDS EDID when it is unavailabe or invalid
  drm/i915: Add no_lvds entry for the Clientron U800
  drm/i915: Rename many remaining uses of "output" to encoder or connector.
  drm/i915: Rename intel_output to intel_encoder.
  agp/intel: intel_845_driver is an agp driver!
  drm/i915: introduce to_intel_bo helper
  drm/i915: Disable FBC on 915GM and 945GM.
2010-04-17 14:28:50 -07:00
Linus Torvalds
d6f533c8c7 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  ACPI: EC: Limit burst to 64 bits
2010-04-17 10:58:38 -07:00
Linus Torvalds
65832940eb Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: don't warn on EAGAIN in inode reclaim
  xfs: ensure that sync updates the log tail correctly
2010-04-17 10:57:56 -07:00
Julia Lawall
42d284b986 drivers/pcmcia: Add missing local_irq_restore
Use local_irq_restore in this error-handling case just like in the one just
below.

A simplified version of the semantic patch that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r exists@
expression E1;
identifier f;
@@

f (...) { <+...
* local_irq_save (E1,...);
... when != E1
* return ...;
...+> }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2010-04-17 17:54:38 +02:00
Timur Maximov
6f4567c8cf serial_cs: MD55x support (PCMCIA GPRS/EDGE modem) (kernel 2.6.33)
Many PCMCIA GPRS modems like: Onda Edge N100E, Novaway PC98 (OEM SPC98Z),
Rovermate Edgus Adaptmate-039 and others have same construction and
identification:

lspcmcia -vvv
Product Name:   Generic Modem: MD55x 1.00 Serial number: xxxxx-xxx
Identification: manf_id: 0x015d card_id: 0x4c45
                function: 2 (serial)
                prod_id(1): "Generic" (0xc49e4731)
                prod_id(2): "Modem: MD55x" (0x8913b110)
                prod_id(3): "1.00" (0x83dbf271)
                prod_id(4): "Serial number: xxxxx-xxx" (0x73ee9514)

Serial connection to GSM module based on Elan VPU16551 PCMCIA UART with
datasheet recommended 14.7456MHz crystal oscillator.

By default serial_cs sets the UART clock to 1843200 Hz.
For correct operation the clock needs to be set to 14745600 Hz.
This quirk is already present in the driver; the device only needs to be added to the quirk list.

Signed-off-by: Timur Maximov <xcom.org@gmail.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2010-04-17 17:53:32 +02:00
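
Why the clock value matters can be seen from the usual 16550-style baud rate relation, baud = uartclk / (16 * divisor). The helper below is a hypothetical illustration of that arithmetic, not code from serial_cs.

```c
#include <assert.h>

/* 16550-style UART: the line rate is the input clock divided by
 * 16 (oversampling) times the programmed divisor. */
static unsigned long uart_baud(unsigned long uartclk_hz, unsigned divisor)
{
    assert(divisor != 0);
    return uartclk_hz / (16UL * divisor);
}
```

With the default 1843200 Hz assumption, divisor 1 is expected to give 115200 baud. On a part actually clocked at 14745600 Hz the same divisor gives 921600 baud, eight times too fast, and divisor 8 is needed for 115200.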
Dominik Brodowski
a8408c17d0 pcmcia: avoid late calls to pccard_validate_cis
pccard_validate_cis() nowadays destroys the CIS cache. Therefore,
calling it after card setup should be avoided. We can't control
the deprecated PCMCIA ioctl (which is only used on ARM nowadays),
but we can avoid -- and report -- any other calls.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2010-04-17 17:37:33 +02:00
Eric Dumazet
8770acf049 rps: rps_sock_flow_table is mostly read
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-17 00:54:36 -07:00
Tom Herbert
fec5e652e5 rfs: Receive Flow Steering
This patch implements receive flow steering (RFS).  RFS steers
received packets for layer 3 and 4 processing to the CPU where
the application for the corresponding flow is running.  RFS is an
extension of Receive Packet Steering (RPS).

The basic idea of RFS is that when an application calls recvmsg
(or sendmsg) the application's running CPU is stored in a hash
table that is indexed by the connection's rxhash which is stored in
the socket structure.  The rxhash is passed in skb's received on
the connection from netif_receive_skb.  For each received packet,
the associated rxhash is used to look up the CPU in the hash table,
if a valid CPU is set then the packet is steered to that CPU using
the RPS mechanisms.

The complication of the simple approach is that it would potentially
allow out-of-order (OOO) packets.  If threads are thrashing around CPUs or multiple
threads are trying to read from the same sockets, a quickly changing
CPU value in the hash table could cause rampant OOO packets--
we consider this a non-starter.

To avoid OOO packets, this solution implements two types of hash
tables: rps_sock_flow_table and rps_dev_flow_table.

rps_sock_flow_table is a global hash table.  Each entry is just a CPU
number and it is populated in recvmsg and sendmsg as described above.
This table contains the "desired" CPUs for flows.

rps_dev_flow_table is specific to each device queue.  Each entry
contains a CPU and a tail queue counter.  The CPU is the "current"
CPU for a matching flow.  The tail queue counter holds the value
of a tail queue counter for the associated CPU's backlog queue at
the time of last enqueue for a flow matching the entry.

Each backlog queue has a queue head counter which is incremented
on dequeue, and so a queue tail counter is computed as queue head
count + queue length.  When a packet is enqueued on a backlog queue,
the current value of the queue tail counter is saved in the hash
entry of the rps_dev_flow_table.

And now the trick: when selecting the CPU for RPS (get_rps_cpu)
the rps_sock_flow table and the rps_dev_flow table for the RX queue
are consulted.  When the desired CPU for the flow (found in the
rps_sock_flow table) does not match the current CPU (found in the
rps_dev_flow table), the current CPU is changed to the desired CPU
if one of the following is true:

- The current CPU is unset (equal to RPS_NO_CPU)
- Current CPU is offline
- The current CPU's queue head counter >= queue tail counter in the
rps_dev_flow table.  This checks if the queue head has advanced
beyond the last packet that was enqueued using this table entry.
This guarantees that all packets queued using this entry have been
dequeued, thus preserving in order delivery.
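
The selection rule above can be sketched as a small userspace model. This is not the kernel's get_rps_cpu(): the structs `backlog_queue` and `dev_flow`, the functions `queue_tail`, `record_enqueue` and `select_cpu`, and the value chosen for `NO_CPU` are hypothetical. The wrap-safe signed comparison of the counters is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_CPU UINT16_MAX       /* stand-in for RPS_NO_CPU (value assumed) */

struct backlog_queue {
    uint32_t head;              /* incremented on every dequeue */
    uint32_t len;               /* packets currently queued */
    bool online;
};

struct dev_flow {
    uint16_t cpu;               /* "current" CPU for the flow */
    uint32_t last_qtail;        /* tail counter saved at last enqueue */
};

/* Queue tail counter = queue head count + queue length. */
static uint32_t queue_tail(const struct backlog_queue *q)
{
    return q->head + q->len;
}

/* Enqueue one packet on the flow's current CPU and remember the tail. */
static void record_enqueue(struct dev_flow *flow, struct backlog_queue *queues)
{
    struct backlog_queue *q;

    assert(flow->cpu != NO_CPU);
    q = &queues[flow->cpu];
    q->len++;
    flow->last_qtail = queue_tail(q);
}

/* Switch to the desired CPU only if the current CPU is unset, offline,
 * or has already dequeued everything queued through this entry. */
static uint16_t select_cpu(struct dev_flow *flow, uint16_t desired,
                           const struct backlog_queue *queues,
                           unsigned nqueues)
{
    uint16_t cur = flow->cpu;

    assert(desired == NO_CPU || desired < nqueues);
    assert(cur == NO_CPU || cur < nqueues);

    if (desired != NO_CPU && desired != cur) {
        /* head >= saved tail, compared so that counter wrap is harmless */
        bool drained = cur != NO_CPU &&
            (int32_t)(queues[cur].head - flow->last_qtail) >= 0;

        if (cur == NO_CPU || !queues[cur].online || drained)
            flow->cpu = desired;
    }
    return flow->cpu;
}
```

In this model a flow with one packet still queued on CPU 0 stays on CPU 0 even if the application has moved to CPU 1; once that packet is dequeued, the next lookup moves the flow, so packets cannot overtake each other.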

Making each queue have its own rps_dev_flow table has two advantages:
1) the tail queue counters will be written on each receive, so
keeping the table local to interrupting CPU is good for locality.  2)
this allows lockless access to the table-- the CPU number and queue
tail counter need to be accessed together under mutual exclusion
from netif_receive_skb, we assume that this is only called from
device napi_poll which is non-reentrant.

This patch implements RFS for TCP and connected UDP sockets.
It should be usable for other flow oriented protocols.

There are two configuration parameters for RFS.  The
"rps_flow_entries" kernel init parameter sets the number of
entries in the rps_sock_flow_table, the per rxqueue sysfs entry
"rps_flow_cnt" contains the number of entries in the rps_dev_flow
table for the rxqueue.  Both are rounded to power of two.

The obvious benefit of RFS (over just RPS) is that it achieves
CPU locality between the receive processing for a flow and the
applications processing; this can result in increased performance
(higher pps, lower latency).

The benefits of RFS are dependent on cache hierarchy, application
load, and other factors.  On simple benchmarks, we don't necessarily
see improvement and sometimes see degradation.  However, for more
complex benchmarks and for applications where cache pressure is
much higher this technique seems to perform very well.

Below are some benchmark results which show the potential benefit of
this patch.  The netperf test has 500 instances of netperf TCP_RR
test with 1 byte req. and resp.  The RPC test is a request/response
test similar in structure to netperf RR test with 100 threads on
each host, but does more work in userspace than netperf.

e1000e on 8 core Intel
   No RFS or RPS                104K tps at 30% CPU
   No RFS (best RPS config):    290K tps at 63% CPU
   RFS                          303K tps at 61% CPU

RPC test        tps     CPU%    50/90/99% usec latency  Latency StdDev
  No RFS/RPS    103K    48%     757/900/3185            4472.35
  RPS only:     174K    73%     415/993/2468            491.66
  RFS           223K    73%     379/651/1382            315.61

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-16 16:01:27 -07:00
Daniel Lezcano
1c4f019732 packet : remove init_net restriction
The af_packet protocol is used by Perl to do ioctls as reported by
Stephane Riviere:

"Net::RawIP relies on SIOCGIFADDR et SIOCGIFHWADDR to get the IP and MAC
addresses of the network interface."

But in a new network namespace these ioctls fail because they are disabled for
a namespace different from the init_net_ns.

These two lines should not be there, as af_inet and af_packet have been
namespace aware for a long time now. I suppose we forgot to remove these
lines because we sent the af_packet first, before af_inet was supported.

Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Reported-by: Stephane Riviere <stephane.riviere@regis-dgac.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-16 15:41:04 -07:00
Krzysztof Halasa
31f634a63d WAN: flush tx_queue in hdlc_ppp to prevent panic on rmmod hw_driver.
tx_queue is used as a temporary queue when not allowed to queue skb
directly to the hw device driver (which may sleep). Most paths flush
it before returning, but ppp_start() currently cannot. Make sure we
don't leave skbs pointing to a non-existent device.

Thanks to Michael Barkowski for reporting this problem.

Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-16 15:41:03 -07:00
Johannes Berg
e7cb49550e iwlwifi: make scan antenna forcing more generic
Some future hardware will also require some antenna
overrides so make the current logic more generic;
right now it is semantically based on a workaround
for off-channel reception but the reasons for the
new antenna overrides will be different.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
2010-04-16 13:54:29 -07:00
Johannes Berg
ee102603c0 iwlwifi: remove monitor check
Off-channel reception is acceptable in monitor
mode, and checking for monitor mode this way is
not really correct anyway since it could be the
case while operating.

Now iwl_is_monitor_mode() is no longer used so
remove it completely.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
2010-04-16 13:54:16 -07:00
Johannes Berg
fa58b6a3b3 iwlwifi: don't check monitor for scanning
Monitor mode operation need not (and probably should
not) affect scanning this way since real monitoring
can not properly happen while scanning anyway.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
2010-04-16 13:54:06 -07:00
Johannes Berg
b2e8690d5a iwlwifi: rename TX_CMD_FLG_BT_DIS_MSK
The flag name is a little misleading: this
flag instructs the device to ignore bluetooth
messages for purposes of frame transmissions,
so rename the flag to TX_CMD_FLG_IGNORE_BT.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
2010-04-16 13:53:46 -07:00
Johannes Berg
65b52bde68 iwlwifi: make BT coex config a virtual method
Some future hardware will require a different command to
be sent for bluetooth coexist, so make this a virtual
method that can be changed on a per-device basis.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
2010-04-16 13:53:34 -07:00
Wey-Yi Guy
f4388adc92 iwlwifi: more code clean up for agn devices
Since multiple new devices have a similar uCode architecture and use the same
register addresses, remove more references to the 5000 series to eliminate the
confusion.

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
2010-04-16 13:53:20 -07:00