Commit Graph

469978 Commits

Author SHA1 Message Date
Al Viro
4023bfc9f3 be careful with nd->inode in path_init() and follow_dotdot_rcu()
in the former we simply check if dentry is still valid after picking
its ->d_inode; in the latter we fetch ->d_inode in the same places
where we fetch dentry and its ->d_seq, under the same checks.

Cc: stable@vger.kernel.org # 2.6.38+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-09-14 14:24:47 -04:00
Al Viro
7bd88377d4 don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu()
return the value instead, and have path_init() do the assignment.  Broken by
"vfs: Fix absolute RCU path walk failures due to uninitialized seq number",
which was Cc-stable with 2.6.38+ as destination.  This one should go where
it went.

To avoid dummy value returned in case when root is already set (it would do
no harm, actually, since the only caller that doesn't ignore the return value
is guaranteed to have nd->root *not* set, but it's more obvious that way),
lift the check into callers.  And do the same to set_root(), to keep them
in sync.

Cc: stable@vger.kernel.org # 2.6.38+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-09-14 14:19:44 -04:00
Linus Torvalds
02c1be3d0c Merge tag 'ntb-3.17' of git://github.com/jonmason/ntb
Pull ntb driver bugfixes from Jon Mason:
 "NTB driver fixes for queue spread and buffer alignment.  Also, update
  to MAINTAINERS to reflect new e-mail address"

* tag 'ntb-3.17' of git://github.com/jonmason/ntb:
  ntb: Add alignment check to meet hardware requirement
  MAINTAINERS: update NTB info
  NTB: correct the spread of queues over mw's
2014-09-14 10:54:12 -07:00
Linus Torvalds
8ac19f0d90 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull ARM irq chip fixes from Thomas Gleixner:
 "Another pile of ARM specific irq chip fixlets:

   - off by one bugs in the crossbar driver
   - missing annotations
   - a bunch of "make it compile" updates

  I pulled the lot today from Jason, but it has been in -next for at
  least a week"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: gic-v3: Declare rdist as __percpu pointer to __iomem pointer
  irqchip: gic: Make gic_default_routable_irq_domain_ops static
  irqchip: exynos-combiner: Fix compilation error on ARM64
  irqchip: crossbar: Off by one bugs in init
  irqchip: gic-v3: Tag all low level accessors __maybe_unused
  irqchip: gic-v3: Only define gic_peek_irq() when building SMP
2014-09-14 10:37:10 -07:00
Denis CIOCCA
a31d092899 iio:magnetometer: bugfix magnetometers gain values
This patch fix gains values. The first driver was designed using
engineering samples, in mass production the values are changed.

Signed-off-by: Denis Ciocca <denis.ciocca@st.com>
Cc: Stable@vger.kernel.org
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2014-09-14 18:21:23 +01:00
Ludovic Desroches
d4f51956ac iio: adc: at91: don't use the last converted data register
If touchscreen mode is enabled and a conversion is requested on another
channel, the result in the last converted data register can be a
touchscreen relative value. Starting a conversion involves to do a
conversion for all active channel. It starts with ADC channels and ends
with touchscreen channels. Then if ADC_LCD register is not read quickly,
its content may be a touchscreen conversion.
To remove this temporal constraint, the conversion value is taken from
the channel data register.

Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Stable@vger.kernel.org
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2014-09-14 18:20:12 +01:00
Subbaraya Sundeep Bhatta
1887e724e2 iio: adc: xilinx-xadc: assign auxiliary channels address correctly
This patch fixes incorrect logic for assigning address
to auxiliary channels of xilinx xadc.

Signed-off-by: Subbaraya Sundeep Bhatta <sbhatta@xilinx.com>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: Stable@vger.kernel.org
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2014-09-14 18:18:22 +01:00
Ard Biesheuvel
85c8555ff0 KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()
Read-only memory ranges may be backed by the zero page, so avoid
misidentifying it a a MMIO pfn.

This fixes another issue I identified when testing QEMU+KVM_UEFI, where
a read to an uninitialized emulated NOR flash brought in the zero page,
but mapped as a read-write device region, because kvm_is_mmio_pfn()
misidentifies it as a MMIO pfn due to its PG_reserved bit being set.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Fixes: b88657674d ("ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-14 16:26:05 +02:00
Ard Biesheuvel
0b70068e47 mm: export symbol dependencies of is_zero_pfn()
In order to make the static inline function is_zero_pfn() callable by
modules, export its symbol dependencies 'zero_pfn' and (for s390 and
mips) 'zero_page_mask'.

We need this for KVM, as CONFIG_KVM is a tristate for all supported
architectures except ARM and arm64, and testing a pfn whether it refers
to the zero page is required to correctly distinguish the zero page
from other special RAM ranges that may also have the PG_reserved bit
set, but need to be treated as MMIO memory.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-14 16:25:14 +02:00
Dave Young
3eddc69ffe x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8
3.16 kernel boot fail with earlyprintk=efi, it keeps scrolling at the
bottom line of screen.

Bisected, the first bad commit is below:
commit 86dfc6f339
Author: Lv Zheng <lv.zheng@intel.com>
Date:   Fri Apr 4 12:38:57 2014 +0800

    ACPICA: Tables: Fix table checksums verification before installation.

I did some debugging by enabling both serial and efi earlyprintk, below is
some debug dmesg, seems early_ioremap fails in scroll up function due to
no free slot, see below dmesg output:

  WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:116 __early_ioremap+0x90/0x1c4()
  __early_ioremap(ed00c800, 00000c80) not found slot
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 3.17.0-rc1+ #204
  Hardware name: Hewlett-Packard HP Z420 Workstation/1589, BIOS J61 v03.15 05/09/2013
  Call Trace:
    dump_stack+0x4e/0x7a
    warn_slowpath_common+0x75/0x8e
    ? __early_ioremap+0x90/0x1c4
    warn_slowpath_fmt+0x47/0x49
    __early_ioremap+0x90/0x1c4
    ? sprintf+0x46/0x48
    early_ioremap+0x13/0x15
    early_efi_map+0x24/0x26
    early_efi_scroll_up+0x6d/0xc0
    early_efi_write+0x1b0/0x214
    call_console_drivers.constprop.21+0x73/0x7e
    console_unlock+0x151/0x3b2
    ? vprintk_emit+0x49f/0x532
    vprintk_emit+0x521/0x532
    ? console_unlock+0x383/0x3b2
    printk+0x4f/0x51
    acpi_os_vprintf+0x2b/0x2d
    acpi_os_printf+0x43/0x45
    acpi_info+0x5c/0x63
    ? __acpi_map_table+0x13/0x18
    ? acpi_os_map_iomem+0x21/0x147
    acpi_tb_print_table_header+0x177/0x186
    acpi_tb_install_table_with_override+0x4b/0x62
    acpi_tb_install_standard_table+0xd9/0x215
    ? early_ioremap+0x13/0x15
    ? __acpi_map_table+0x13/0x18
    acpi_tb_parse_root_table+0x16e/0x1b4
    acpi_initialize_tables+0x57/0x59
    acpi_table_init+0x50/0xce
    acpi_boot_table_init+0x1e/0x85
    setup_arch+0x9b7/0xcc4
    start_kernel+0x94/0x42d
    ? early_idt_handlers+0x120/0x120
    x86_64_start_reservations+0x2a/0x2c
    x86_64_start_kernel+0xf3/0x100

Quote reply from Lv.zheng about the early ioremap slot usage in this case:

"""
In early_efi_scroll_up(), 2 mapping entries will be used for the src/dst screen buffer.
In drivers/acpi/acpica/tbutils.c, we've improved the early table loading code in acpi_tb_parse_root_table().
We now need 2 mapping entries:
1. One mapping entry is used for RSDT table mapping. Each RSDT entry contains an address for another ACPI table.
2. For each entry in RSDP, we need another mapping entry to map the table to perform necessary check/override before installing it.

When acpi_tb_parse_root_table() prints something through EFI earlyprintk console, we'll have 4 mapping entries used.
The current 4 slots setting of early_ioremap() seems to be too small for such a use case.
"""

Thus increase the slot to 8 in this patch to fix this issue.
boot-time mappings become 512 page with this patch.

Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: <stable@vger.kernel.org> # v3.16
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-09-14 15:24:31 +01:00
Thomas Gleixner
938c04a870 Merge tag 'irqchip-urgent-3.17' of git://git.infradead.org/users/jcooper/linux into irq/urgent
irqchip fixes for v3.17 from Jason Cooper

 - GIC/GICV3: Various fixlets
 - crossbar: Fix off-by-one bug
 - exynos-combiner: Fix arm64 build error
2014-09-14 15:20:54 +02:00
Dave Jiang
3cc5ba1938 ntb: Add alignment check to meet hardware requirement
The NTB translate register must have the value to be BAR size aligned.
This alignment check make sure that the DMA memory allocated has the
proper alignment. Another requirement for NTB to function properly with
memory window BAR size greater or equal to 4M is to use the CMA feature
in 3.16 kernel with the appropriate CONFIG_CMA_ALIGNMENT and
CONFIG_CMA_SIZE_MBYTES set.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2014-09-14 00:10:38 -04:00
Jon Mason
9ef6bf6c75 MAINTAINERS: update NTB info
Update my contact info to my personal email address and add Dave Jiang.

Signed-off-by: Jon Mason <jon.mason@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2014-09-14 00:10:38 -04:00
Jon Mason
a1413cfbcb NTB: correct the spread of queues over mw's
The detection of an uneven number of queues on the given memory windows
was not correct.  The mw_num is zero based and the mod should be
division to spread them evenly over the mw's.

Signed-off-by: Jon Mason <jon.mason@intel.com>
2014-09-14 00:10:38 -04:00
Al Viro
f5be3e2912 fix bogus read_seqretry() checks introduced in b37199e
read_seqretry() returns true on mismatch, not on match...

Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-09-13 22:14:16 -04:00
Al Viro
6f18493e54 move the call of __d_drop(anon) into __d_materialise_unique(dentry, anon)
and lock the right list there

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-09-13 22:14:03 -04:00
Al Viro
f77ced6637 [fix] lustre: d_make_root() does iput() on dentry allocation failure
double-free is a bad thing

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-09-13 22:13:39 -04:00
Linus Torvalds
1536340e7c Merge branches 'locking-urgent-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex and timer fixes from Thomas Gleixner:
 "A oneliner bugfix for the jinxed futex code:

   - Drop hash bucket lock in the error exit path.  I really could slap
     myself for intruducing that bug while fixing all the other horror
     in that code three month ago ...

  and the timer department is not too proud about the following fixes:

   - Deal with a long standing rounding bug in the timeval to jiffies
     conversion.  It's a real issue and this fix fell through the cracks
     for quite some time.

   - Another round of alarmtimer fixes.  Finally this code gets used
     more widely and the subtle issues hidden for quite some time are
     noticed and fixed.  Nothing really exciting, just the itty bitty
     details which bite the serious users here and there"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Unlock hb->lock in futex_wait_requeue_pi() error path

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  alarmtimer: Lock k_itimer during timer callback
  alarmtimer: Do not signal SIGEV_NONE timers
  alarmtimer: Return relative times in timer_gettime
  jiffies: Fix timeval conversion to jiffies
2014-09-13 14:22:12 -07:00
David S. Miller
9e07a42238 Merge branch 'bridge_vlan_filtering'
Vladislav Yasevich says:

====================
bridge: Two small fixes to vlan filtering code.

This series corrects 2 small issues that I've ran across recently
while doing more work with vlan filtering changes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 17:22:02 -04:00
Vlad Yasevich
635126b7ca bridge: Allow clearing of pvid and untagged bitmap
Currently, it is possible to modify the vlan filter
configuration to add pvid or untagged support.
For example:
  bridge vlan add vid 10 dev eth0
  bridge vlan add vid 10 dev eth0 untagged pvid

The second statement will modify vlan 10 to
include untagged and pvid configuration.
However, it is currently impossible to go backwards
  bridge vlan add vid 10 dev eth0 untagged pvid
  bridge vlan add vid 10 dev eth0

Here nothing happens.  This patch correct this so
that any modifiers not supplied are removed from
the configuration.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 17:21:56 -04:00
Vlad Yasevich
20adfa1a81 bridge: Check if vlan filtering is enabled only once.
The bridge code checks if vlan filtering is enabled on both
ingress and egress.   When the state flip happens, it
is possible for the bridge to currently be forwarding packets
and forwarding behavior becomes non-deterministic.  Bridge
may drop packets on some interfaces, but not others.

This patch solves this by caching the filtered state of the
packet into skb_cb on ingress.  The skb_cb is guaranteed to
not be over-written between the time packet entres bridge
forwarding path and the time it leaves it.  On egress, we
can then check the cached state to see if we need to
apply filtering information.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 17:21:56 -04:00
Nikolay Aleksandrov
9a72c2da69 bonding: fix div by zero while enslaving and transmitting
The problem is that the slave is first linked and slave_cnt is
incremented afterwards leading to a div by zero in the modes that use it
as a modulus. What happens is that in bond_start_xmit()
bond_has_slaves() is used to evaluate further transmission and it becomes
true after the slave is linked in, but when slave_cnt is used in the xmit
path it is still 0, so fetch it once and transmit based on that. Since
it is used only in round-robin and XOR modes, the fix is only for them.
Thanks to Eric Dumazet for pointing out the fault in my first try to fix
this.

Call trace (took it out of net-next kernel, but it's the same with net):
[46934.330038] divide error: 0000 [#1] SMP
[46934.330041] Modules linked in: bonding(O) 9p fscache
snd_hda_codec_generic crct10dif_pclmul
[46934.330041] bond0: Enslaving eth1 as an active interface with an up
link
[46934.330051]  ppdev joydev crc32_pclmul crc32c_intel 9pnet_virtio
ghash_clmulni_intel snd_hda_intel 9pnet snd_hda_controller parport_pc
serio_raw pcspkr snd_hda_codec parport virtio_balloon virtio_console
snd_hwdep snd_pcm pvpanic i2c_piix4 snd_timer i2ccore snd soundcore
virtio_blk virtio_net virtio_pci virtio_ring virtio ata_generic
pata_acpi floppy [last unloaded: bonding]
[46934.330053] CPU: 1 PID: 3382 Comm: ping Tainted: G           O
3.17.0-rc4+ #27
[46934.330053] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[46934.330054] task: ffff88005aebf2c0 ti: ffff88005b728000 task.ti:
ffff88005b728000
[46934.330059] RIP: 0010:[<ffffffffa0198c33>]  [<ffffffffa0198c33>]
bond_start_xmit+0x1c3/0x450 [bonding]
[46934.330060] RSP: 0018:ffff88005b72b7f8  EFLAGS: 00010246
[46934.330060] RAX: 0000000000000679 RBX: ffff88004b077000 RCX:
000000000000002a
[46934.330061] RDX: 0000000000000000 RSI: ffff88004b3f0500 RDI:
ffff88004b077940
[46934.330061] RBP: ffff88005b72b830 R08: 00000000000000c0 R09:
ffff88004a83e000
[46934.330062] R10: 000000000000ffff R11: ffff88004b1f12c0 R12:
ffff88004b3f0500
[46934.330062] R13: ffff88004b3f0500 R14: 000000000000002a R15:
ffff88004b077940
[46934.330063] FS:  00007fbd91a4c740(0000) GS:ffff88005f080000(0000)
knlGS:0000000000000000
[46934.330064] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[46934.330064] CR2: 00007f803a8bb000 CR3: 000000004b2c9000 CR4:
00000000000406e0
[46934.330069] Stack:
[46934.330071]  ffffffff811e6169 00000000e772fa05 ffff88004b077000
ffff88004b3f0500
[46934.330072]  ffffffff81d17d18 000000000000002a 0000000000000000
ffff88005b72b8a0
[46934.330073]  ffffffff81620108 ffffffff8161fe0e ffff88005b72b8c4
ffff88005b302000
[46934.330073] Call Trace:
[46934.330077]  [<ffffffff811e6169>] ?
__kmalloc_node_track_caller+0x119/0x300
[46934.330084]  [<ffffffff81620108>] dev_hard_start_xmit+0x188/0x410
[46934.330086]  [<ffffffff8161fe0e>] ? harmonize_features+0x2e/0x90
[46934.330088]  [<ffffffff81620b06>] __dev_queue_xmit+0x456/0x590
[46934.330089]  [<ffffffff81620c50>] dev_queue_xmit+0x10/0x20
[46934.330090]  [<ffffffff8168f022>] arp_xmit+0x22/0x60
[46934.330091]  [<ffffffff8168f090>] arp_send.part.16+0x30/0x40
[46934.330092]  [<ffffffff8168f1e5>] arp_solicit+0x115/0x2b0
[46934.330094]  [<ffffffff8160b5d7>] ? copy_skb_header+0x17/0xa0
[46934.330096]  [<ffffffff8162875a>] neigh_probe+0x4a/0x70
[46934.330097]  [<ffffffff8162979c>] __neigh_event_send+0xac/0x230
[46934.330098]  [<ffffffff8162a00b>] neigh_resolve_output+0x13b/0x220
[46934.330100]  [<ffffffff8165f120>] ? ip_forward_options+0x1c0/0x1c0
[46934.330101]  [<ffffffff81660478>] ip_finish_output+0x1f8/0x860
[46934.330102]  [<ffffffff81661f08>] ip_output+0x58/0x90
[46934.330103]  [<ffffffff81661602>] ? __ip_local_out+0xa2/0xb0
[46934.330104]  [<ffffffff81661640>] ip_local_out_sk+0x30/0x40
[46934.330105]  [<ffffffff81662a66>] ip_send_skb+0x16/0x50
[46934.330106]  [<ffffffff81662ad3>] ip_push_pending_frames+0x33/0x40
[46934.330107]  [<ffffffff8168854c>] raw_sendmsg+0x88c/0xa30
[46934.330110]  [<ffffffff81612b31>] ? skb_recv_datagram+0x41/0x60
[46934.330111]  [<ffffffff816875a9>] ? raw_recvmsg+0xa9/0x1f0
[46934.330113]  [<ffffffff816978d4>] inet_sendmsg+0x74/0xc0
[46934.330114]  [<ffffffff81697a9b>] ? inet_recvmsg+0x8b/0xb0
[46934.330115] bond0: Adding slave eth2
[46934.330116]  [<ffffffff8160357c>] sock_sendmsg+0x9c/0xe0
[46934.330118]  [<ffffffff81603248>] ?
move_addr_to_kernel.part.20+0x28/0x80
[46934.330121]  [<ffffffff811b4477>] ? might_fault+0x47/0x50
[46934.330122]  [<ffffffff816039b9>] ___sys_sendmsg+0x3a9/0x3c0
[46934.330125]  [<ffffffff8144a14a>] ? n_tty_write+0x3aa/0x530
[46934.330127]  [<ffffffff810d1ae4>] ? __wake_up+0x44/0x50
[46934.330129]  [<ffffffff81242b38>] ? fsnotify+0x238/0x310
[46934.330130]  [<ffffffff816048a1>] __sys_sendmsg+0x51/0x90
[46934.330131]  [<ffffffff816048f2>] SyS_sendmsg+0x12/0x20
[46934.330134]  [<ffffffff81738b29>] system_call_fastpath+0x16/0x1b
[46934.330144] Code: 48 8b 10 4c 89 ee 4c 89 ff e8 aa bc ff ff 31 c0 e9
1a ff ff ff 0f 1f 00 4c 89 ee 4c 89 ff e8 65 fb ff ff 31 d2 4c 89 ee 4c
89 ff <f7> b3 64 09 00 00 e8 02 bd ff ff 31 c0 e9 f2 fe ff ff 0f 1f 00
[46934.330146] RIP  [<ffffffffa0198c33>] bond_start_xmit+0x1c3/0x450
[bonding]
[46934.330146]  RSP <ffff88005b72b7f8>

CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
Fixes: 278b208375 ("bonding: initial RCU conversion")
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 17:16:56 -04:00
David S. Miller
638898f11a Merge branch 'r8169-net'
Hayes Wang says:

====================
r8169: fix rx vlan

There are two issues for hw rx vlan. The patches are
used to fix them.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:52:52 -04:00
hayeswang
36d8e82541 r8169: fix setting rx vlan
The setting should depend on the new features not the current one.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:52:44 -04:00
hayeswang
48c20407f4 r8169: fix the default setting of rx vlan
If the parameter "features" of __rtl8169_set_features() is equal to
dev->features, the variable "changed" is alwayes 0, and nothing would
be changed.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:52:44 -04:00
Guy Martin
8920649120 parisc: Implement new LWS CAS supporting 64 bit operations.
The current LWS cas only works correctly for 32bit. The new LWS allows
for CAS operations of variable size.

Signed-off-by: Guy Martin <gmsoft@tuxicoman.be>
Cc: <stable@vger.kernel.org> # 3.13+
Signed-off-by: Helge Deller <deller@gmx.de>
2014-09-13 22:40:48 +02:00
Linus Torvalds
99d263d4c5 vfs: fix bad hashing of dentries
Josef Bacik found a performance regression between 3.2 and 3.10 and
narrowed it down to commit bfcfaa77bd ("vfs: use 'unsigned long'
accesses for dcache name comparison and hashing"). He reports:

 "The test case is essentially

      for (i = 0; i < 1000000; i++)
              mkdir("a$i");

  On xfs on a fio card this goes at about 20k dir/sec with 3.2, and 12k
  dir/sec with 3.10.  This is because we spend waaaaay more time in
  __d_lookup on 3.10 than in 3.2.

  The new hashing function for strings is suboptimal for <
  sizeof(unsigned long) string names (and hell even > sizeof(unsigned
  long) string names that I've tested).  I broke out the old hashing
  function and the new one into a userspace helper to get real numbers
  and this is what I'm getting:

      Old hash table had 1000000 entries, 0 dupes, 0 max dupes
      New hash table had 12628 entries, 987372 dupes, 900 max dupes
      We had 11400 buckets with a p50 of 30 dupes, p90 of 240 dupes, p99 of 567 dupes for the new hash

  My test does the hash, and then does the d_hash into a integer pointer
  array the same size as the dentry hash table on my system, and then
  just increments the value at the address we got to see how many
  entries we overlap with.

  As you can see the old hash function ended up with all 1 million
  entries in their own bucket, whereas the new one they are only
  distributed among ~12.5k buckets, which is why we're using so much
  more CPU in __d_lookup".

The reason for this hash regression is two-fold:

 - On 64-bit architectures the down-mixing of the original 64-bit
   word-at-a-time hash into the final 32-bit hash value is very
   simplistic and suboptimal, and just adds the two 32-bit parts
   together.

   In particular, because there is no bit shuffling and the mixing
   boundary is also a byte boundary, similar character patterns in the
   low and high word easily end up just canceling each other out.

 - the old byte-at-a-time hash mixed each byte into the final hash as it
   hashed the path component name, resulting in the low bits of the hash
   generally being a good source of hash data.  That is not true for the
   word-at-a-time case, and the hash data is distributed among all the
   bits.

The fix is the same in both cases: do a better job of mixing the bits up
and using as much of the hash data as possible.  We already have the
"hash_32|64()" functions to do that.

Reported-by: Josef Bacik <jbacik@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-13 11:30:10 -07:00
Linus Torvalds
23d0db76ff Make hash_64() use a 64-bit multiply when appropriate
The hash_64() function historically does the multiply by the
GOLDEN_RATIO_PRIME_64 number with explicit shifts and adds, because
unlike the 32-bit case, gcc seems unable to turn the constant multiply
into the more appropriate shift and adds when required.

However, that means that we generate those shifts and adds even when the
architecture has a fast multiplier, and could just do it better in
hardware.

Use the now-cleaned-up CONFIG_ARCH_HAS_FAST_MULTIPLIER (together with
"is it a 64-bit architecture") to decide whether to use an integer
multiply or the explicit sequence of shift/add instructions.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-13 11:24:03 -07:00
Linus Torvalds
72d9310460 Make ARCH_HAS_FAST_MULTIPLIER a real config variable
It used to be an ad-hoc hack defined by the x86 version of
<asm/bitops.h> that enabled a couple of library routines to know whether
an integer multiply is faster than repeated shifts and additions.

This just makes it use the real Kconfig system instead, and makes x86
(which was the only architecture that did this) select the option.

NOTE! Even for x86, this really is kind of wrong.  If we cared, we would
probably not enable this for builds optimized for netburst (P4), where
shifts-and-adds are generally faster than multiplies.  This patch does
*not* change that kind of logic, though, it is purely a syntactic change
with no code changes.

This was triggered by the fact that we have other places that really
want to know "do I want to expand multiples by constants by hand or
not", particularly the hash generation code.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-13 11:14:53 -07:00
Linus Torvalds
186cec317e Merge tag 'dm-3.17-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fix from Mike Snitzer:
 "Fix a race in the DM cache target that caused dirty blocks to be
  marked as clean.  This could cause no writeback to occur or spurious
  dirty block counts"

* tag 'dm-3.17-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm cache: fix race causing dirty blocks to be marked as clean
2014-09-13 10:04:10 -07:00
Mugunthan V N
618073e30c drivers: net: cpsw: dual_emac: in suspend/resume bring down/up all the netdev
During suspend and resume in Dual EMAC, second port is not working as in
suspend/resume only the first slave netdev is closed and opened. So bring
down and up all the interfaces that are up during suspend/resume.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Tested-by: Nishanth Menon <nm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 13:00:05 -04:00
Jianqun
d67f660edb ASoC: rockchip-i2s: dt: swap tx and rx channed request number in example
Reference to RK3288 TRM, fix an error channel id for i2s tx and rx
Table 10-1 DMAC_BUS Request Mapping Table
Req number	Source	Polarity
0		I2S tx	High level
1		I2S rx	High level

Tested on RK3288 board.

Signed-off-by: Jianqun <jay.xu@rock-chips.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2014-09-13 09:39:58 -07:00
Linus Torvalds
645cc09381 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A small collection of fixes for the current rc series.  This contains:

   - Two small blk-mq patches from Rob Elliott, cleaning up error case
     at init time.

   - A fix from Ming Lei, fixing SG merging for blk-mq where
     QUEUE_FLAG_SG_NO_MERGE is the default.

   - A dev_t minor lifetime fix from Keith, fixing an issue where a
     minor might be reused before all references to it were gone.

   - Fix from Alan Stern where an unbalanced queue bypass caused SCSI
     some headaches when it does a series of add/del on devices without
     fully registrering the queue.

   - A fix from me for improving the scaling of tag depth in blk-mq if
     we are short on memory"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: scale depth and rq map appropriate if low on memory
  Block: fix unbalanced bypass-disable in blk_register_queue
  block: Fix dev_t minor allocation lifetime
  blk-mq: cleanup after blk_mq_init_rq_map failures
  blk-mq: pass along blk_mq_alloc_tag_set return values
  blk-merge: fix blk_recount_segments
2014-09-13 09:39:55 -07:00
Jianqun
2f1e93f81c ASoC: rockchip-i2s: fix registers' property of rockchip i2s controller
Reference rockchip I2S controller TRM, modify some registers' property
I2S_FIFOLR: read / write, but not volatile, not precious
I2S_INTSR: read / write
I2S_CLR: volatile, register value will be cleared by read

Test on RK3288 with max98090.

Signed-off-by: Jianqun Xu <jay.xu@rock-chips.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2014-09-13 09:35:43 -07:00
Jianqun
07833d8831 ASoC: rockchip-i2s: fix master mode set bit error
Fix error format set to I2S master or slave mode.
Test on RK3288 board with max98090.

Signed-off-by: Jianqun Xu <jay.xu@rock-chips.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2014-09-13 09:34:16 -07:00
Grygorii Strashko
8936decdd9 spi: davinci: request cs_gpio's from probe
Now CS GPIOs are requested from struct spi_master.setup() callback
and that causes failures when Client SPI device is getting accessed
through SPIDEV driver. The failure happens, because .setup() callback
may be called many times from IOCTL handler and when it's called
second time gpio_request() will fail and return -EBUSY.

Hence, fix it by moving CS GPIOs requesting code in .probe().

Reported-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2014-09-13 09:25:46 -07:00
Linus Torvalds
fc486b03ca Merge tag 'stable/for-linus-3.17-b-rc4-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen ARM bugfix from Stefano Stabellini:
 "The patches fix the "xen_add_mach_to_phys_entry: cannot add" bug that
  has been affecting xen on arm and arm64 guests since 3.16.  They
  require a few hypervisor side changes that just went in xen-unstable.

  A couple of days ago David sent out a pull request with a few other
  Xen fixes (it is already in master).  Sorry we didn't synchronized
  better among us"

* tag 'stable/for-linus-3.17-b-rc4-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/arm: remove mach_to_phys rbtree
  xen/arm: reimplement xen_dma_unmap_page & friends
  xen/arm: introduce XENFEAT_grant_map_identity
2014-09-12 17:45:27 -07:00
Mark Einon
0bc9b73be4 drivers: net: b44: Fix typo in returning multicast stats
nstat->multicast refers to received packets, not transmitted as
is returned here. Change it so that received packet stats are
given.

Signed-off-by: Mark Einon <mark.einon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 18:23:55 -04:00
David S. Miller
cffc6c4c94 Merge tag 'master-2014-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-09-11

Please pull this batch of fixes intended for the 3.17 stream:

For the mac80211 bits, Johannes says:

"Two more fixes for mac80211 - one of them addresses a long-standing
issue that we only found when using vendor events more frequently;
the other addresses some bad information being reported in userspace
that people were starting to actually look at."

For the iwlwifi bits, Emmanuel says:

"I re-enable scheduled scan on firmware that contain the fix for
the bug that Linus reported.  A few trivial fixes: endianity issues,
the same DTIM period fix that I did in mac80211.  Eyal fixes a few
issues we identified with EAPOL, we now send them just as if they were
management frames, this solves interrop issues.  Johannes has another
set of trivial fixes, while Luca fixes the way we configure the filters
in the firmware. Last but not least, a new device is added by Oren."

Emmanuel was traveling, resulting in his pull to be a bit larger than
I would have liked to see at this point.  FWIW, I have asked Emmanuel
to be much more strict for any more pull requests in this cycle.

In addition to the above, Sujith Manoharan reverts an earlier ath9k
patch.  The earlier change was found to allow for the device to sleep
too long and miss beacons.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 18:21:47 -04:00
David S. Miller
a7f8289d12 Merge tag 'linux-can-fixes-for-3.17-20140911' of git://gitorious.org/linux-can/linux-can
Marc Kleine-Budde says:

====================
pull-request: can 2014-09-11

this is a pull request for the current release cycle of a single patch.

The patch by David Jander fixes a scheduling while atomic problem in the
flexcan driver, that was introduced by me in v3.14-rc6.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 18:04:04 -04:00
Sabrina Dubroca
381f4dca48 ipv6: clean up anycast when an interface is destroyed
If we try to rmmod the driver for an interface while sockets with
setsockopt(JOIN_ANYCAST) are alive, some refcounts aren't cleaned up
and we get stuck on:

  unregister_netdevice: waiting for ens3 to become free. Usage count = 1

If we LEAVE_ANYCAST/close everything before rmmod'ing, there is no
problem.

We need to perform a cleanup similar to the one for multicast in
addrconf_ifdown(how == 1).

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 17:33:06 -04:00
David S. Miller
dcbc0054d7 Merge branch 'arc_emac'
Beniamino Galvani says:

====================
net: arc_emac: fix tx issues

the patches below solve some issues found in the tx ring reclaim
strategy currently implemented in the arc_emac driver.

Without these patches a simple outgoing UDP flow blocks almost
immediately with the socket send buffer full, until some new rx
packets trigger a clean of the tx ring.

Everything seems to work fine on a Radxa Rock with this fix applied.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 17:18:09 -04:00
Beniamino Galvani
74dd40bca9 net: arc_emac: prevent reuse of unreclaimed tx descriptors
This patch changes the logic in tx path to ensure that tx descriptors
are reused for transmission only after they have been reclaimed by
arc_emac_tx_clean().

Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 17:18:03 -04:00
Beniamino Galvani
7ce7679d6b net: arc_emac: enable tx interrupts
In the current implementation the cleaning of tx ring is done by the
NAPI poll handler, which is scheduled after rx interrupts. Thus, in
absence of received packets the reclaim of used tx buffers is never
executed, blocking further transmission.

This can be easily reproduced starting the transmission of a UDP flow
with iperf, which blocks almost immediately because skbs are not
returned to the stack and the socket send buffer becomes full.

The patch enables tx interrupts so that the tx reclaim is scheduled
after completed transmissions.

Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 17:17:49 -04:00
Richard Larocque
474e941bed alarmtimer: Lock k_itimer during timer callback
Locks the k_itimer's it_lock member when handling the alarm timer's
expiry callback.

The regular posix timers defined in posix-timers.c have this lock held
during timout processing because their callbacks are routed through
posix_timer_fn().  The alarm timers follow a different path, so they
ought to grab the lock somewhere else.

Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sharvil Nanavati <sharvil@google.com>
Signed-off-by: Richard Larocque <rlarocque@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2014-09-12 13:59:12 -07:00
Richard Larocque
265b81d23a alarmtimer: Do not signal SIGEV_NONE timers
Avoids sending a signal to alarm timers created with sigev_notify set to
SIGEV_NONE by checking for that special case in the timeout callback.

The regular posix timers avoid sending signals to SIGEV_NONE timers by
not scheduling any callbacks for them in the first place.  Although it
would be possible to do something similar for alarm timers, it's simpler
to handle this as a special case in the timeout.

Prior to this patch, the alarm timer would ignore the sigev_notify value
and try to deliver signals to the process anyway.  Even worse, the
sanity check for the value of sigev_signo is skipped when SIGEV_NONE was
specified, so the signal number could be bogus.  If sigev_signo was an
unitialized value (as it often would be if SIGEV_NONE is used), then
it's hard to predict which signal will be sent.

Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sharvil Nanavati <sharvil@google.com>
Signed-off-by: Richard Larocque <rlarocque@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2014-09-12 13:59:12 -07:00
Richard Larocque
e86fea7649 alarmtimer: Return relative times in timer_gettime
Returns the time remaining for an alarm timer, rather than the time at
which it is scheduled to expire.  If the timer has already expired or it
is not currently scheduled, the it_value's members are set to zero.

This new behavior matches that of the other posix-timers and the POSIX
specifications.

This is a change in user-visible behavior, and may break existing
applications.  Hopefully, few users rely on the old incorrect behavior.

Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sharvil Nanavati <sharvil@google.com>
Signed-off-by: Richard Larocque <rlarocque@google.com>
[jstultz: minor style tweak]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2014-09-12 13:59:11 -07:00
Andrew Hunter
d78c9300c5 jiffies: Fix timeval conversion to jiffies
timeval_to_jiffies tried to round a timeval up to an integral number
of jiffies, but the logic for doing so was incorrect: intervals
corresponding to exactly N jiffies would become N+1. This manifested
itself particularly repeatedly stopping/starting an itimer:

setitimer(ITIMER_PROF, &val, NULL);
setitimer(ITIMER_PROF, NULL, &val);

would add a full tick to val, _even if it was exactly representable in
terms of jiffies_ (say, the result of a previous rounding.)  Doing
this repeatedly would cause unbounded growth in val.  So fix the math.

Here's what was wrong with the conversion: we essentially computed
(eliding seconds)

jiffies = usec  * (NSEC_PER_USEC/TICK_NSEC)

by using scaling arithmetic, which took the best approximation of
NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC =
x/(2^USEC_JIFFIE_SC), and computed:

jiffies = (usec * x) >> USEC_JIFFIE_SC

and rounded this calculation up in the intermediate form (since we
can't necessarily exactly represent TICK_NSEC in usec.) But the
scaling arithmetic is a (very slight) *over*approximation of the true
value; that is, instead of dividing by (1 usec/ 1 jiffie), we
effectively divided by (1 usec/1 jiffie)-epsilon (rounding
down). This would normally be fine, but we want to round timeouts up,
and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this
would be fine if our division was exact, but dividing this by the
slightly smaller factor was equivalent to adding just _over_ 1 to the
final result (instead of just _under_ 1, as desired.)

In particular, with HZ=1000, we consistently computed that 10000 usec
was 11 jiffies; the same was true for any exact multiple of
TICK_NSEC.

We could possibly still round in the intermediate form, adding
something less than 2^USEC_JIFFIE_SC - 1, but easier still is to
convert usec->nsec, round in nanoseconds, and then convert using
time*spec*_to_jiffies.  This adds one constant multiplication, and is
not observably slower in microbenchmarks on recent x86 hardware.

Tested: the following program:

int main() {
  struct itimerval zero = {{0, 0}, {0, 0}};
  /* Initially set to 10 ms. */
  struct itimerval initial = zero;
  initial.it_interval.tv_usec = 10000;
  setitimer(ITIMER_PROF, &initial, NULL);
  /* Save and restore several times. */
  for (size_t i = 0; i < 10; ++i) {
    struct itimerval prev;
    setitimer(ITIMER_PROF, &zero, &prev);
    /* on old kernels, this goes up by TICK_USEC every iteration */
    printf("previous value: %ld %ld %ld %ld\n",
           prev.it_interval.tv_sec, prev.it_interval.tv_usec,
           prev.it_value.tv_sec, prev.it_value.tv_usec);
    setitimer(ITIMER_PROF, &prev, NULL);
  }
    return 0;
}

Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Turner <pjt@google.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Paul Turner <pjt@google.com>
Reported-by: Aaron Jacobs <jacobsa@google.com>
Signed-off-by: Andrew Hunter <ahh@google.com>
[jstultz: Tweaked to apply to 3.17-rc]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2014-09-12 13:59:03 -07:00
Tejun Heo
e09c2c2954 workqueue: apply __WQ_ORDERED to create_singlethread_workqueue()
create_singlethread_workqueue() is a compat interface for single
threaded workqueue which maps to ordered workqueue w/ rescuer in the
current implementation.  create_singlethread_workqueue() currently
implemented by invoking alloc_workqueue() w/ appropriate parameters.

8719dceae2 ("workqueue: reject adjusting max_active or applying
attrs to ordered workqueues") introduced __WQ_ORDERED to protect
ordered workqueues against dynamic attribute changes which can break
ordering guarantees but forgot to apply it to
create_singlethread_workqueue().  This in itself is okay as nobody
currently uses dynamic attribute change on workqueues created with
create_singlethread_workqueue().

However, 4c16bd327c ("workqueue: implement NUMA affinity for unbound
workqueues") broke singlethreaded guarantee for ordered workqueues
through allocating a separate pool_workqueue on each NUMA node by
default.  A later change 8a2b753844 ("workqueue: fix ordered
workqueues in NUMA setups") fixed it by allocating only one global
pool_workqueue if __WQ_ORDERED is set.

Combined, the __WQ_ORDERED omission in create_singlethread_workqueue()
became critical breaking its single threadedness and ordering
guarantee.

Let's make create_singlethread_workqueue() wrap
alloc_ordered_workqueue() instead so that it inherits __WQ_ORDERED and
can implicitly track future ordered_workqueue changes.

v2: I missed that __WQ_ORDERED now protects against pwq splitting
    across NUMA nodes and incorrectly described the patch as a
    nice-to-have fix to protect against future dynamic attribute
    usages.  Oleg pointed out that this is actually a critical
    breakage due to 8a2b753844 ("workqueue: fix ordered workqueues
    in NUMA setups").

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Mike Anderson <mike.anderson@us.ibm.com>
Cc: Oleg Nesterov <onestero@redhat.com>
Cc: Gustavo Luiz Duarte <gduarte@redhat.com>
Cc: Tomas Henzl <thenzl@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 4c16bd327c ("workqueue: implement NUMA affinity for unbound workqueues")
2014-09-13 05:13:08 +09:00
Thomas Gleixner
13c42c2f43 futex: Unlock hb->lock in futex_wait_requeue_pi() error path
futex_wait_requeue_pi() calls futex_wait_setup(). If
futex_wait_setup() succeeds it returns with hb->lock held and
preemption disabled. Now the sanity check after this does:

        if (match_futex(&q.key, &key2)) {
	   	ret = -EINVAL;
		goto out_put_keys;
	}

which releases the keys but does not release hb->lock.

So we happily return to user space with hb->lock held and therefor
preemption disabled.

Unlock hb->lock before taking the exit route.

Reported-by: Dave "Trinity" Jones <davej@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409112318500.4178@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-09-12 22:04:36 +02:00