This implements a mqprio queueing discipline that by default creates
a pfifo_fast qdisc per tx queue and provides the needed configuration
interface.
Using the mqprio qdisc the number of tcs currently in use along
with the range of queues alloted to each class can be configured. By
default skbs are mapped to traffic classes using the skb priority.
This mapping is configurable.
Configurable parameters,
struct tc_mqprio_qopt {
__u8 num_tc;
__u8 prio_tc_map[TC_BITMASK + 1];
__u8 hw;
__u16 count[TC_MAX_QUEUE];
__u16 offset[TC_MAX_QUEUE];
};
Here the count/offset pairing give the queue alignment and the
prio_tc_map gives the mapping from skb->priority to tc.
The hw bit determines if the hardware should configure the count
and offset values. If the hardware bit is set then the operation
will fail if the hardware does not implement the ndo_setup_tc
operation. This is to avoid undetermined states where the hardware
may or may not control the queue mapping. Also minimal bounds
checking is done on the count/offset to verify a queue does not
exceed num_tx_queues and that queue ranges do not overlap. Otherwise
it is left to user policy or hardware configuration to create
useful mappings.
It is expected that hardware QOS schemes can be implemented by
creating appropriate mappings of queues in ndo_tc_setup().
One expected use case is drivers will use the ndo_setup_tc to map
queue ranges onto 802.1Q traffic classes. This provides a generic
mechanism to map network traffic onto these traffic classes and
removes the need for lower layer drivers to know specifics about
traffic types.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides a mechanism for lower layer devices to
steer traffic using skb->priority to tx queues. This allows
for hardware based QOS schemes to use the default qdisc without
incurring the penalties related to global state and the qdisc
lock. While reliably receiving skbs on the correct tx ring
to avoid head of line blocking resulting from shuffling in
the LLD. Finally, all the goodness from txq caching and xps/rps
can still be leveraged.
Many drivers and hardware exist with the ability to implement
QOS schemes in the hardware but currently these drivers tend
to rely on firmware to reroute specific traffic, a driver
specific select_queue or the queue_mapping action in the
qdisc.
By using select_queue for this drivers need to be updated for
each and every traffic type and we lose the goodness of much
of the upstream work. Firmware solutions are inherently
inflexible. And finally if admins are expected to build a
qdisc and filter rules to steer traffic this requires knowledge
of how the hardware is currently configured. The number of tx
queues and the queue offsets may change depending on resources.
Also this approach incurs all the overhead of a qdisc with filters.
With the mechanism in this patch users can set skb priority using
expected methods ie setsockopt() or the stack can set the priority
directly. Then the skb will be steered to the correct tx queues
aligned with hardware QOS traffic classes. In the normal case with
single traffic class and all queues in this class everything
works as is until the LLD enables multiple tcs.
To steer the skb we mask out the lower 4 bits of the priority
and allow the hardware to configure upto 15 distinct classes
of traffic. This is expected to be sufficient for most applications
at any rate it is more then the 8021Q spec designates and is
equal to the number of prio bands currently implemented in
the default qdisc.
This in conjunction with a userspace application such as
lldpad can be used to implement 8021Q transmission selection
algorithms one of these algorithms being the extended transmission
selection algorithm currently being used for DCB.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a rtnetlink request specifies a negative or zero ifindex and has no
interface name attribute, but has a group attribute, then the chenges
are made to all the interfaces belonging to the specified group.
Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Net devices can now be grouped, enabling simpler manipulation from
userspace. This patch adds a group field to the net_device structure, as
well as rtnetlink support to query and modify it.
Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean up some unused macros in net/*.
1. be left for code change. e.g. PGV_FROM_VMALLOC, PGV_FROM_VMALLOC, KMEM_SAFETYZONE.
2. never be used since introduced to kernel.
e.g. P9_RDMA_MAX_SGE, UTIL_CTRL_PKT_SIZE.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To reduce the possibility of losing an interrupt in the handler due to a
race between an interrupt processing and disable/enable of interrupts,
enable MSIX one shot.
Also, add support for adaptive interrupt coalesing
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Masroor Vettuparambil <masroor.vettuparambil@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The firmware PXE EPROM version detection is failing due to passing the
wrong parameter into firmware query function. Also, the version
printing function has an extraneous newline.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Sivakumar Subramani <sivakumar.subramani@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reorder the commands to be in the inverse order of their allocations
(instead of the random order they appear to be in), propagate return
code on errors from pci_request_region and register_netdev, reduce the
config_dev_cnt and total_dev_cnt counters on remove, and return the
correct error code for vdev->vpaths kzalloc failures. Also, prevent
leaking of vdev->vpaths memory and netdev in vxge_probe error path due
to freeing for these not occurring in vxge_device_unregister.
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Sivakumar Subramani <sivakumar.subramani@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Unify "numa=" command line option handling
Revert "x86: Make relocatable kernel work with new binutils"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (41 commits)
sctp: user perfect name for Delayed SACK Timer option
net: fix can_checksum_protocol() arguments swap
Revert "netlink: test for all flags of the NLM_F_DUMP composite"
gianfar: Fix misleading indentation in startup_gfar()
net/irda/sh_irda: return to RX mode when TX error
net offloading: Do not mask out NETIF_F_HW_VLAN_TX for vlan.
USB CDC NCM: tx_fixup() race condition fix
ns83820: Avoid bad pointer deref in ns83820_init_one().
ipv6: Silence privacy extensions initialization
bnx2x: Update bnx2x version to 1.62.00-4
bnx2x: Fix AER setting for BCM57712
bnx2x: Fix BCM84823 LED behavior
bnx2x: Mark full duplex on some external PHYs
bnx2x: Fix BCM8073/BCM8727 microcode loading
bnx2x: LED fix for BCM8727 over BCM57712
bnx2x: Common init will be executed only once after POR
bnx2x: Swap BCM8073 PHY polarity if required
iwlwifi: fix valid chain reading from EEPROM
ath5k: fix locking in tx_complete_poll_work
ath9k_hw: do PA offset calibration only on longcal interval
...
When clearing the DMA channel, clear all status bits.
When handling a DMA interrupt, clear only the interrupt
status bits that have been read and are passed to the
channel's interrupt handler, not every status bit.
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: G, Manjunath Kondaiah <manjugk@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
gpio7 on the tps65930 is used as an output on the devkit8000 and
gpio1 is not connected. Remove gpio7 and change gpio1 to pulldown
Signed-off-by: Daniel Morsing <daniel.morsing@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
commit 0363466866 (net offloading: Convert checksums to use
centrally computed features.) mistakenly swapped can_checksum_protocol()
arguments.
This broke IPv6 on bnx2 for instance, on NIC without TCPv6 checksum
offloads.
Reported-by: Hans de Bruin <jmdebruin@xmsnet.nl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When NOHZ=y and high res timers are disabled (via cmdline or
Kconfig) tick_nohz_switch_to_nohz() will notify the user about
switching into NOHZ mode. Nothing is printed for the case where
HIGH_RES_TIMERS=y. Fix this for the HIGH_RES_TIMERS=y case by
duplicating the printk from the low res NOHZ path in the high
res NOHZ path.
This confused me since I was thinking 'dmesg | grep -i NOHZ' would
tell me if NOHZ was enabled, but if I have hrtimers there is
nothing.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1295419594-13085-1-git-send-email-sboyd@codeaurora.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf_event_init_task() should clear child->perf_event_ctxp[]
before anything else. Otherwise, if
perf_event_init_context(perf_hw_context) fails,
perf_event_free_task() can free perf_event_ctxp[perf_sw_context]
copied from parent->perf_event_ctxp[] by dup_task_struct().
Also move the initialization of perf_event_mutex and
perf_event_list from perf_event_init_context() to
perf_event_init_context().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Roland McGrath <roland@redhat.com>
LKML-Reference: <20110119182228.GC12183@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
find_get_context() must not install the new perf_event_context
if the task has already passed perf_event_exit_task().
If nothing else, this means the memory leak. Initially
ctx->refcount == 2, it is supposed that
perf_event_exit_task_context() should participate and do the
necessary put_ctx().
find_lively_task_by_vpid() checks PF_EXITING but this buys
nothing, by the time we call find_get_context() this task can be
already dead. To the point, cmpxchg() can succeed when the task
has already done the last schedule().
Change find_get_context() to populate task->perf_event_ctxp[]
under task->perf_event_mutex, this way we can trust PF_EXITING
because perf_event_exit_task() takes the same mutex.
Also, change perf_event_exit_task_context() to use
rcu_dereference(). Probably this is not strictly needed, but
with or without this change find_get_context() can race with
setup_new_exec()->perf_event_exit_task(), rcu_dereference()
looks better.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Roland McGrath <roland@redhat.com>
LKML-Reference: <20110119182207.GB12183@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Earlier patches select HAVE_SCHED_CLOCK for omaps. To have working sched_clock
also for MPU timer, we need to implement it in a way where the right one gets
selected during the runtime.
Signed-off-by: Tony Lindgren <tony@atomide.com>
For omap15xx and 730 we need to use the MPU timer
as the 32K timer is not available. For omap16xx
we want to use the 32K timer because of PM. Fix this
by allowing to build in both timers.
Signed-off-by: Tony Lindgren <tony@atomide.com>
When no tstamp extension exists, ct_delta_time() returns -1, which is
then assigned to an u64 and tested for negative values to decide
whether to display the lifetime. This obviously doesn't work, use
a s64 and merge the two minor functions into one.
Signed-off-by: Patrick McHardy <kaber@trash.net>
In later patches, we're going to need to have finer-grained control
over the addition and removal of these structs from the pending_mid_q
and we'll need to be able to call the destructor while holding the
spinlock. Move the locked sections out of both routines and into
the callers. Fix up current callers of DeleteMidQEntry to call a new
routine that dequeues the entry and then destroys it.
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
It's an atomic_t and the code accesses the "counter" field in it directly
instead of using atomic_read(). It also is sometimes accessed under a
spinlock and sometimes not. Move it out of the spinlock since we don't need
belt-and-suspenders for something that's just informational.
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The TCP_Server_Info is refcounted and every SMB session holds a
reference to it. Thus, smb_ses_list is always going to be empty when
cifsd is coming down. This is dead code.
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
If CIFSSMBWrite2 returns -EAGAIN, then the error should be considered
temporary. CIFS should retry the write instead of setting an error on
the mapping and returning.
For WB_SYNC_ALL, just retry the write immediately. In the WB_SYNC_NONE
case, call redirty_page_for_writeback on all of the pages that didn't
get written out and then move on.
Also, fix up the handling of a short write with a successful return
code. MS-CIFS says that 0 bytes_written means ENOSPC or EFBIG. It
doesn't mention what a short, but non-zero write means, so for now
treat it as we would an -EAGAIN return.
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When we get oplock break notification we should set the appropriate
value of OplockLevel field in oplock break acknowledge according to
the oplock level held by the client in this time. As we only can have
level II oplock or no oplock in the case of oplock break, we should be
aware only about clientCanCacheRead field in cifsInodeInfo structure.
Also fix bug connected with wrong interpretation of OplockLevel field
during oplock break notification processing.
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This adds destination address-based selection. The old "inverse"
member is overloaded (memory-wise) with a new "flags" variable,
similar to how J.Park did it with xt_string rev 1. Since revision 0
userspace only sets flag 0x1, no great changes are made to explicitly
test for different revisions.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
sound/pci/hda/patch_realtek.c: In function ‘alc_apply_fixup’:
sound/pci/hda/patch_realtek.c:1724:14: warning: unused variable ‘modelname’
snd_printdd() is evaluated only when CONFIG_SND_DEBUG_VERBOSE=y.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Commit a4f740cf, "of/flattree: Add of_flat_dt_match() helper function"
introduced build failures in arch/powerpc/platform/83xx by mistyping
'static' as 'struct' in the compatible string list, and omitting a few
semicolons. This patch fixes it.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This patch adds flow-based timestamping for conntracks. This
conntrack extension is disabled by default. Basically, we use
two 64-bits variables to store the creation timestamp once the
conntrack has been confirmed and the other to store the deletion
time. This extension is disabled by default, to enable it, you
have to:
echo 1 > /proc/sys/net/netfilter/nf_conntrack_timestamp
This patch allows to save memory for user-space flow-based
loogers such as ulogd2. In short, ulogd2 does not need to
keep a hashtable with the conntrack in user-space to know
when they were created and destroyed, instead we use the
kernel timestamp. If we want to have a sane IPFIX implementation
in user-space, this nanosecs resolution timestamps are also
useful. Other custom user-space applications can benefit from
this via libnetfilter_conntrack.
This patch modifies the /proc output to display the delta time
in seconds since the flow start. You can also obtain the
flow-start date by means of the conntrack-tools.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
When the lirc drivers were converted over to using memdup_user, I
mistakenly also removed corresponding calls to kfree. Add those back. I
also screwed up on the allocation error check in lirc_serial, using if
(PTR_ERR()) instead of if (IS_ERR()), which broke transmit.
Reported-by: Jiri Fojtasek <jiri.fojtasek@hlohovec.net>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The current hdpvr code kmalloc's a new buffer for every i2c read and
write. Rather than do that, lets allocate a buffer in the driver's
device struct and just use that every time.
The size I've chosen for the buffer is the maximum size I could
ascertain might be used by either ir-kbd-i2c or lirc_zilog, plus a bit
of padding (lirc_zilog may use up to 100 bytes on tx, rounded that up
to 128).
Note that this might also remedy user reports of very sluggish behavior
of IR receive with hdpvr hardware.
v2: make sure (len <= (dev->i2c_buf)) [Jean Delvare]
Reported-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
A number of things going on here, but the end result is that the IR part
on the hdpvr gets enabled, and can be used with ir-kbd-i2c and/or
lirc_zilog.
First up, there are some conditional build fixes that come into play
whether i2c is built-in or modular. Second, we're swapping out
i2c_new_probed_device() for i2c_new_device(), as in my testing, probing
always fails, but we *know* that all hdpvr devices have a z8 chip at
0x70 and 0x71. Third, we're poking at an i2c address directly without a
client, and writing some magic bits to actually turn on this IR part
(this could use some improvement in the future). Fourth, some of the
i2c_adapter storage has been reworked, as the existing implementation
used to lead to an oops following i2c changes c. 2.6.31.
Earlier editions of this patch have been floating around the 'net for a
while, including being patched into Fedora kernels, and they *do* work.
This specific version isn't yet tested, beyond loading ir-kbd-i2c and
confirming that it does bind to the RX address of the hdpvr.
[mchehab@redhat.com: I2C_CLASS_TV_ANALOG is not defined. Fix compilation bug]
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Andy Walls <awalls@md.metrocast.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Fixes an egregious bug in mceusb driver, where the receiver was being
put into idle mode far sooner than it should have, thanks to storing a
timeout value that in us where it should be ns. Basically, the receiver
kept going into idle mode before a trailing space had been fully
received, which was causing problems for some protocols, most notably
manifesting as lirc userspace never receiving a trailing space for any
rc5 signals.
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Fix a bug in v4l2_device_unregister where the sd pointer can be dereferenced
after it was freed.
Normally the i2c adapter is removed before this function is called. Removing
the adapter will also unregister all subdevs on that adapter, so generally
v4l2_device_unregister has nothing to do. However, in the case of a platform
i2c bus that bus is generally not freed.
In that case, after freeing the i2c subdevice the code will fall into the
second block when it tests if the subdev is a SPI device. But by that time
the subdev is already freed and the kernel oopses.
The fix is trivial: continue with the loop after freeing the i2c or spi
subdevice.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Reported-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Zeroing video_device.dev causes a memory leak if video_set_drvdata
was called before video_register_device was called. video_set_drvdata
calls dev_set_drvdata which allocates video_device.dev.p.
memsetting this will prevent freeing of that memory.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
After a detach zero the whole device state to ensure a clean slate
on the next attach.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>