Commit Graph

398644 Commits

Author SHA1 Message Date
Linus Torvalds
6cccc7d301 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
 "This includes both the first pile of Ceph patches (which I sent to
  torvalds@vger, sigh) and a few new patches that add support for
  fscache for Ceph.  That includes a few fscache core fixes that David
  Howells asked go through the Ceph tree.  (Thanks go to Milosz Tanski
  for putting this feature together)

  This first batch of patches (included here) had (has) several
  important RBD bug fixes, hole punch support, several different
  cleanups in the page cache interactions, improvements in the truncate
  code (new truncate mutex to avoid shenanigans with i_mutex), and a
  series of fixes in the synchronous striping read/write code.

  On top of that is a random collection of small fixes all across the
  tree (error code checks and error path cleanup, obsolete wq flags,
  etc)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (43 commits)
  ceph: use d_invalidate() to invalidate aliases
  ceph: remove ceph_lookup_inode()
  ceph: trivial buildbot warnings fix
  ceph: Do not do invalidate if the filesystem is mounted nofsc
  ceph: page still marked private_2
  ceph: ceph_readpage_to_fscache didn't check if marked
  ceph: clean PgPrivate2 on returning from readpages
  ceph: use fscache as a local presisent cache
  fscache: Netfs function for cleanup post readpages
  FS-Cache: Fix heading in documentation
  CacheFiles: Implement interface to check cache consistency
  FS-Cache: Add interface to check consistency of a cached object
  rbd: fix null dereference in dout
  rbd: fix buffer size for writes to images with snapshots
  libceph: use pg_num_mask instead of pgp_num_mask for pg.seed calc
  rbd: fix I/O error propagation for reads
  ceph: use vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem
  ceph: allow sync_read/write return partial successed size of read/write.
  ceph: fix bugs about handling short-read for sync read mode.
  ceph: remove useless variable revoked_rdcache
  ...
2013-09-09 09:13:22 -07:00
Linus Torvalds
255ae3fbd2 Merge tag 'metag-for-v3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag
Pull metag architecture changes from James Hogan:
 - Device tree updates for TZ1090 GPIO drivers merged via GPIO tree.
 - Add driver for ImgTec PDC irqchip as found in TZ1090 SoC.
 - Add linux-metag mailing list to MAINTAINERS file.

* tag 'metag-for-v3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
  irq-imgpdc: add ImgTec PDC irqchip driver
  MAINTAINERS: add linux-metag mailing list
  metag: tz1090: instantiate gpio-tz1090-pdc
  metag: tz1090: select and instantiate gpio-tz1090
  metag: tz1090: select and instantiate irq-imgpdc
2013-09-09 09:09:44 -07:00
Konrad Rzeszutek Wilk
fb78e58c27 Revert "xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM"
This reverts commit 70dd4998cb.

Now that the bugs have been resolved we can re-enable the
PV ticketlock implementation under PVHVM Xen guests.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
2013-09-09 12:06:45 -04:00
Konrad Rzeszutek Wilk
3310bbedac xen/spinlock: Don't setup xen spinlock IPI kicker if disabled.
There is no need to setup this kicker IPI if we are never going
to use the paravirtualized ticketlock mechanism.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
2013-09-09 12:06:38 -04:00
Konrad Rzeszutek Wilk
26a7999527 xen/smp: Update pv_lock_ops functions before alternative code starts under PVHVM
Before this patch we would patch all of the pv_lock_ops sites
using alternative assembler. Then later in the bootup cycle
change the unlock_kick and lock_spinning to the Xen specific -
without re patching.

That meant that for the core of the kernel we would be running
with the baremetal version of unlock_kick and lock_spinning while
for modules we would have the proper Xen specific slowpaths.

As most of the module uses some API from the core kernel that ended
up with slowpath lockers waiting forever to be kicked (b/c they
would be using the Xen specific slowpath logic). And the
kick never came b/c the unlock path that was taken was the
baremetal one.

On PV we do not have the problem as we initialise before the
alternative code kicks in.

The fix is to make the updating of the pv_lock_ops function
be done before the alternative code starts patching.

Note that this patch fixes issues discovered by commit
f10cd522c5.
("xen: disable PV spinlocks on HVM") wherein it mentioned

   PV spinlocks cannot possibly work with the current code because they are
   enabled after pvops patching has already been done, and because PV
   spinlocks use a different data structure than native spinlocks so we
   cannot switch between them dynamically.

The first problem is solved by this patch.

The second problem has been solved by commit
816434ec4a
(Merge branch 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)

P.S.
There is still the commit 70dd4998cb
(xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM) to
revert but that can be done later after all other bugs have been
fixed.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
2013-09-09 12:06:31 -04:00
Konrad Rzeszutek Wilk
6055aaf87d xen/spinlock: We don't need the old structure anymore
As we are using the generic ticketlock structs and these
old structures are not needed anymore.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
2013-09-09 12:06:24 -04:00
Konrad Rzeszutek Wilk
1fb3a8b2cf xen/spinlock: Fix locking path engaging too soon under PVHVM.
The xen_lock_spinning has a check for the kicker interrupts
and if it is not initialized it will spin normally (not enter
the slowpath).

But for PVHVM case we would initialize the kicker interrupt
before the CPU came online. This meant that if the booting
CPU used a spinlock and went in the slowpath - it would
enter the slowpath and block forever. The forever part because
during bootup: the spinlock would be taken _before_ the CPU
sets itself to be online (more on this further), and we enter
to poll on the event channel forever.

The bootup CPU (see commit fc78d343fa
"xen/smp: initialize IPI vectors before marking CPU online"
for details) and the CPU that started the bootup consult
the cpu_online_mask to determine whether the booting CPU should
get an IPI. The booting CPU has to set itself in this mask via:

  set_cpu_online(smp_processor_id(), true);

However, if the spinlock is taken before this (and it is) and
it polls on an event channel - it will never be woken up as
the kernel will never send an IPI to an offline CPU.

Note that the PVHVM logic in sending IPIs is using the HVM
path which has numerous checks using the cpu_online_mask
and cpu_active_mask. See above mention git commit for details.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
2013-09-09 12:06:16 -04:00
Konrad Rzeszutek Wilk
65320fceda Merge tag 'v3.11-rc7' into stable/for-linus-3.12
Linux 3.11-rc7

As we need the git commit 28817e9de4f039a1a8c1fe1df2fa2df524626b9e
Author: Chuck Anderson <chuck.anderson@oracle.com>
Date:   Tue Aug 6 15:12:19 2013 -0700

    xen/smp: initialize IPI vectors before marking CPU online

* tag 'v3.11-rc7': (443 commits)
  Linux 3.11-rc7
  ARC: [lib] strchr breakage in Big-endian configuration
  VFS: collect_mounts() should return an ERR_PTR
  bfs: iget_locked() doesn't return an ERR_PTR
  efs: iget_locked() doesn't return an ERR_PTR()
  proc: kill the extra proc_readfd_common()->dir_emit_dots()
  cope with potentially long ->d_dname() output for shmem/hugetlb
  usb: phy: fix build breakage
  USB: OHCI: add missing PCI PM callbacks to ohci-pci.c
  staging: comedi: bug-fix NULL pointer dereference on failed attach
  lib/lz4: correct the LZ4 license
  memcg: get rid of swapaccount leftovers
  nilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP error detection
  nilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP error
  drivers/platform/olpc/olpc-ec.c: initialise earlier
  ipv4: expose IPV4_DEVCONF
  ipv6: handle Redirect ICMP Message with no Redirected Header option
  be2net: fix disabling TX in be_close()
  Revert "ACPI / video: Always call acpi_video_init_brightness() on init"
  Revert "genetlink: fix family dump race"
  ...

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-09-09 12:05:37 -04:00
Linus Torvalds
89c5a9461d Merge tag 'arc-v3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC changes from Vineet Gupta:

 - ARC MM changes:
    - preparation for MMUv4 (accomodate new PTE bits, new cmds)
    - Rework the ASID allocation algorithm to remove asid-mm reverse map
 - Boilerplate code consolidation in Exception Handlers
 - Disable FRAME_POINTER for ARC
 - Unaligned Access Emulation for Big-Endian from Noam
 - Bunch of fixes (udelay, missing accessors) from Mischa

* tag 'arc-v3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: fix new Section mismatches in build (post __cpuinit cleanup)
  Kconfig.debug: Add FRAME_POINTER anti-dependency for ARC
  ARC: Fix __udelay calculation
  ARC: remove console_verbose() from setup_arch()
  ARC: Add read*_relaxed to asm/io.h
  ARC: Handle un-aligned user space access in BE.
  ARC: [ASID] Track ASID allocation cycles/generations
  ARC: [ASID] activate_mm() == switch_mm()
  ARC: [ASID] get_new_mmu_context() to conditionally allocate new ASID
  ARC: [ASID] Refactor the TLB paranoid debug code
  ARC: [ASID] Remove legacy/unused debug code
  ARC: No need to flush the TLB in early boot
  ARC: MMUv4 preps/3 - Abstract out TLB Insert/Delete
  ARC: MMUv4 preps/2 - Reshuffle PTE bits
  ARC: MMUv4 preps/1 - Fold PTE K/U access flags
  ARC: Code cosmetics (Nothing semantical)
  ARC: Entry Handler tweaks: Optimize away redundant IRQ_DISABLE_SAVE
  ARC: Exception Handlers Code consolidation
  ARC: Add some .gitignore entries
2013-09-09 09:05:33 -07:00
Bartlomiej Zolnierkiewicz
2bc552df76 of/platform: add error reporting to of_amba_device_create()
Add error reporting to of_amba_device_create() so the user knows
when (and why) some device tree nodes fail to initialize.

[ The issue was spotted on Universal C210 board (using revision 0 of
  ARM Exynos4210 SoC) on which initialization was silently failing
  for PL330 MDMA1 device tree node (it was using the wrong addres
  resulting in amba_device_add() returning -ENODEV). ]

Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
2013-09-09 17:04:52 +01:00
Linus Torvalds
833ae40b51 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu fixes from Greg Ungerer:
 "Just a small collection of cleanups and fixes this time, no big
  changes.  The most interresting are to make the m68k and m68knommu
  consistently use CONFIG_IOMAP, clean out some unused board config
  options and flush the cache on signal stack creation"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68k: remove 16 unused boards in Kconfig.machine
  m68k: define 'VM_DATA_DEFAULT_FLAGS' no matter whether has 'NOMMU' or not
  m68knommu: user generic iomap to support ioread*/iowrite*
  m68k/coldfire: flush cache when creating the signal stack frame
  m68knommu: Mark functions only called from setup_arch() __init
2013-09-09 09:04:46 -07:00
Linus Torvalds
20e029d791 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
 "This pile contains mostly fixes and improvements for issues identified
  by Richard W M Jones while adding UML as backend to libguestfs"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Add irq chip um/mask handlers
  um: prctl: Do not include linux/ptrace.h
  um: Run UML in it's own session.
  um: Cleanup SIGTERM handling
  um: ubd: Introduce submit_request()
  um: ubd: Add REQ_FLUSH suppport
  um: Implement probe_kernel_read()
  um: hostfs: Fix writeback
2013-09-09 09:03:46 -07:00
Yijing Wang
d84ff46a9e irq/of: Fix comment typo for irq_of_parse_and_map
Fix trivial comment typo for irq_of_parse_and_map().

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
2013-09-09 17:03:19 +01:00
Konrad Rzeszutek Wilk
c3f31f6a6f Merge branch 'x86/spinlocks' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into stable/for-linus-3.12
* 'x86/spinlocks' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kvm/guest: Fix sparse warning: "symbol 'klock_waiting' was not declared as static"
  kvm: Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  kvm guest: Add configuration support to enable debug information for KVM Guests
  kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi
  xen, pvticketlock: Allow interrupts to be enabled while blocking
  x86, ticketlock: Add slowpath logic
  jump_label: Split jumplabel ratelimit
  x86, pvticketlock: When paravirtualizing ticket locks, increment by 2
  x86, pvticketlock: Use callee-save for lock_spinning
  xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks
  xen, pvticketlock: Xen implementation for PV ticket locks
  xen: Defer spinlock setup until boot CPU setup
  x86, ticketlock: Collapse a layer of functions
  x86, ticketlock: Don't inline _spin_unlock when using paravirt spinlocks
  x86, spinlock: Replace pv spinlocks with pv ticketlocks
2013-09-09 12:01:15 -04:00
Julien Grall
e1a9c16b30 xen/arm: disable cpuidle and cpufreq when linux is running as dom0
When linux is running as dom0, Xen doesn't show the physical cpu but a
virtual CPU.
On some ARM SOC (for instance the exynos 5250), linux registers callbacks
for cpuidle and cpufreq. When these callbacks are called, they will modify
directly the physical cpu not the virtual one. It can impact the whole board
instead of only dom0.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-09-09 11:35:26 +00:00
Boris Ostrovsky
d7f8f48d1e xen/p2m: Don't call get_balloon_scratch_page() twice, keep interrupts disabled for multicalls
m2p_remove_override() calls get_balloon_scratch_page() in
MULTI_update_va_mapping() even though it already has pointer to this page from
the earlier call (in scratch_page). This second call doesn't have a matching
put_balloon_scratch_page() thus not restoring preempt count back. (Also, there
is no put_balloon_scratch_page() in the error path.)

In addition, the second multicall uses __xen_mc_entry() which does not disable
interrupts. Rearrange xen_mc_* calls to keep interrupts off while performing
multicalls.

This commit fixes a regression introduced by:

commit ee0726407f
Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date:   Tue Jul 23 17:23:54 2013 +0000

    xen/m2p: use GNTTABOP_unmap_and_replace to reinstate the original mapping

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2013-09-09 10:50:52 +00:00
Rob Herring
9dd4b2944c ARM: xen: only set pm function ptrs for Xen guests
xen_pm_init was unconditionally setting pm_power_off and arm_pm_restart
function pointers. This breaks multi-platform kernels. Make this
conditional on running as a Xen guest and make it a late_initcall to
ensure it is setup after platform code for Dom0.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: stable@vger.kernel.org
2013-09-09 10:50:26 +00:00
Heiko Carstens
9e75c6274a s390/irq: reduce size of external interrupt handler hash array
Change the hash algorithm a bit so it produces only values in the
range of 0..31.
This allows to reduce the size of the external interrupt handler hash
array even further while making sure that each of the known interrupt
sources keeps its unique hash with the slightly modified algorithm:

0x1004 --> 12
0x1201 --> 10
0x1202 --> 11
0x1406 --> 16
0x1407 --> 17
0x2401 --> 19
0x2603 --> 22
0x4000 --> 0

This also means that the entire array now fits into exactly one cache
line; so add a proper align statement as well.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-09 08:57:32 +02:00
Ian Kent
ac83871996 autofs4 - fix device ioctl mount lookup
When reconnecting to automounts at startup an autofs ioctl is used
to find the device and inode of existing mounts so they can be used
to open a file descriptor of possibly covered mounts.

At this time the the caller might not yet "own" the mount so it can
trigger calling ->d_automount(). This causes automount to hang when
trying to reconnect to direct or offset mount types.

Consequently kern_path() can't be used but kern_path_mountpoint() can be.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-08 22:07:47 -04:00
Linus Torvalds
e5c832d555 vfs: fix dentry RCU to refcounting possibly sleeping dput()
This is the fix that the last two commits indirectly led up to - making
sure that we don't call dput() in a bad context on the dentries we've
looked up in RCU mode after the sequence count validation fails.

This basically expands d_rcu_to_refcount() into the callers, and then
fixes the callers to delay the dput() in the failure case until _after_
we've dropped all locks and are no longer in an RCU-locked region.

The case of 'complete_walk()' was trivial, since its failure case did
the unlock_rcu_walk() directly after the call to d_rcu_to_refcount(),
and as such that is just a pure expansion of the function with a trivial
movement of the resulting dput() to after 'unlock_rcu_walk()'.

In contrast, the unlazy_walk() case was much more complicated, because
not only does convert two different dentries from RCU to be reference
counted, but it used to not call unlock_rcu_walk() at all, and instead
just returned an error and let the caller clean everything up in
"terminate_walk()".

Happily, one of the dentries in question (called "parent" inside
unlazy_walk()) is the dentry of "nd->path", which terminate_walk() wants
a refcount to anyway for the non-RCU case.

So what the new and improved unlazy_walk() does is to first turn that
dentry into a refcounted one, and once that is set up, the error cases
can continue to use the terminate_walk() helper for cleanup, but for the
non-RCU case.  Which makes it possible to drop out of RCU mode if we
actually hit the sequence number failure case.

Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-08 18:13:49 -07:00
Aaron Lu
9e266ece21 virtio_pci: pm: Use CONFIG_PM_SLEEP instead of CONFIG_PM
The virtio_pci_freeze/restore are defined under CONFIG_PM but is used
by SET_SYSTEM_SLEEP_PM_OPS macro, which is defined under
CONFIG_PM_SLEEP. So if CONFIG_PM_SLEEP is not cofigured but
CONFIG_PM_RUNTIME is, the following warning message appeared:

drivers/virtio/virtio_pci.c:770:12: warning: ‘virtio_pci_freeze’ defined but not used [-Wunused-function]
 static int virtio_pci_freeze(struct device *dev)
            ^
drivers/virtio/virtio_pci.c:790:12: warning: ‘virtio_pci_restore’ defined but not used [-Wunused-function]
 static int virtio_pci_restore(struct device *dev)
            ^
Fix it by changing CONFIG_PM to CONFIG_PM_SLEEP.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-09-09 10:02:53 +09:30
Al Viro
2d86465101 introduce kern_path_mountpoint()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-08 20:20:23 -04:00
Al Viro
197df04c74 rename user_path_umountat() to user_path_mountpoint_at()
... and move the extern from linux/namei.h to fs/internal.h,
along with that of vfs_path_lookup().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-08 20:20:21 -04:00
Al Viro
35759521ee take unlazy_walk() into umount_lookup_last()
... and massage it a bit to reduce nesting

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-08 20:20:19 -04:00
Linus Torvalds
0d98439ea3 vfs: use lockred "dead" flag to mark unrecoverably dead dentries
This simplifies the RCU to refcounting code in particular.

I was originally intending to leave this for later, but walking through
all the dput() logic (see previous commit), I realized that the dput()
"might_sleep()" check was misleadingly weak.  And I removed it as
misleading, both for performance profiling and for debugging.

However, the might_sleep() debugging case is actually true: the final
dput() can indeed sleep, if the inode of the dentry that you are
releasing ends up sleeping at iput time (see dentry_iput()).  So the
problem with the might_sleep() in dput() wasn't that it wasn't true, it
was that it wasn't actually testing and triggering on the interesting
case.

In particular, just about *any* dput() can indeed sleep, if you happen
to race with another thread deleting the file in question, and you then
lose the race to the be the last dput() for that file.  But because it's
a very rare race, the debugging code would never trigger it in practice.

Why is this problematic? The new d_rcu_to_refcount() (see commit
15570086b5: "vfs: reimplement d_rcu_to_refcount() using
lockref_get_or_lock()") does a dput() for the failure case, and it does
it under the RCU lock.  So potentially sleeping really is a bug.

But there's no way I'm going to fix this with the previous complicated
"lockref_get_or_lock()" interface.  And rather than revert to the old
and crufty nested dentry locking code (which did get this right by
delaying the reference count updates until they were verified to be
safe), let's make forward progress.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-08 13:46:52 -07:00
Linus Torvalds
8aab6a2733 vfs: reorganize dput() memory accesses
This is me being a bit OCD after all the dentry optimization work this
merge window: profiles end up showing 'dput()' as a rather expensive
operation, and there were two unrelated bad reasons for that.

The first reason was reading d_lockref.count for debugging purposes,
which touches the lockref cacheline (for reads) before really need to.
More importantly, the debugging test in question is _wrong_, and has
hidden bugs.  It's true that we can only sleep when the count goes down
to zero, but the test as-is hides the much more subtle bug that happens
if we race with somebody else deleting the file.

Anyway we _will_ touch that cacheline, but let's do it for a write and
in the right routine (ie in "lockref_put_or_lock()") which annotates the
costs better.  So remove the misleading debug code.

The other was an unnecessary access to the cacheline that contains the
d_lru list, just to check whether we already were on the LRU list or
not.  This is exactly what we have d_flags for, so that we can avoid
touching extra cache lines for the common case.  So just add another bit
for "is this dentry on the LRU".

Finally, mark the tests properly likely/unlikely, so that the common
fast-paths are dense in the instruction stream.

This makes the profiles look much saner.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-08 13:26:18 -07:00
Linus Torvalds
b409624ad5 Merge git://git.infradead.org/users/willy/linux-nvme
Pull NVM Express driver update from Matthew Wilcox.

* git://git.infradead.org/users/willy/linux-nvme:
  NVMe: Merge issue on character device bring-up
  NVMe: Handle ioremap failure
  NVMe: Add pci suspend/resume driver callbacks
  NVMe: Use normal shutdown
  NVMe: Separate controller init from disk discovery
  NVMe: Separate queue alloc/free from create/delete
  NVMe: Group pci related actions in functions
  NVMe: Disk stats for read/write commands only
  NVMe: Bring up cdev on set feature failure
  NVMe: Fix checkpatch issues
  NVMe: Namespace IDs are unsigned
  NVMe: Update nvme_id_power_state with latest spec
  NVMe: Split header file into user-visible and kernel-visible pieces
  NVMe: Call nvme_process_cq from submission path
  NVMe: Remove "process_cq did something" message
  NVMe: Return correct value from interrupt handler
  NVMe: Disk IO statistics
  NVMe: Restructure MSI / MSI-X setup
  NVMe: Use kzalloc instead of kmalloc+memset
2013-09-07 20:19:02 -07:00
Linus Torvalds
c4c1725228 Merge tag 'ntb-3.12' of git://github.com/jonmason/ntb
Pull NTB (non-transparent bridge) updates from Jon Mason:
 "NTB driver bug fixes to address issues in NTB-RP enablement, spad,
  debugfs, and USD/DSD identification.

  Add a workaround on Xeon NTB devices for b2bdoorbell errata.  Also,
  add new NTB driver features to support 32bit x86, DMA engine support,
  and NTB-RP support.

  Finally, a few clean-ups and update to MAINTAINERS for the NTB git
  tree and wiki location"

* tag 'ntb-3.12' of git://github.com/jonmason/ntb:
  ntb: clean up unnecessary MSI/MSI-X capability find
  MAINTAINERS: Add Website and Git Tree for NTB
  NTB: Update Version
  NTB: Comment Fix
  NTB: Remove unused variable
  NTB: Remove References of non-B2B BWD HW
  NTB: NTB-RP support
  NTB: Rename Variables for NTB-RP
  NTB: Use DMA Engine to Transmit and Receive
  NTB: Enable 32bit Support
  NTB: Update Device IDs
  NTB: BWD Link Recovery
  NTB: Xeon Errata Workaround
  NTB: Correct debugfs to work with more than 1 NTB Device
  NTB: Correct USD/DSD Identification
  NTB: Correct Number of Scratch Pad Registers
  NTB: Add Error Handling in ntb_device_setup
2013-09-07 20:17:44 -07:00
Linus Torvalds
8de4651abe Merge tag 'mfd-3.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-next
Pull MFD (multi-function device) updates from Samuel Ortiz:
 "For the 3.12 merge window we have one new driver for the DA9063 PMIC
  from Dialog Semiconductor.

  Besides that driver we also have:

   - Device tree support for the s2mps11 driver

   - More devm_* conversion for the pm8921, max89xx, menelaus, tps65010,
     wl1273 and pcf50633-adc drivers.

   - A conversion to threaded IRQ and IRQ domain for the twl6030 driver.

   - A fairly big update for the rtsx driver: Better power saving
     support, better vendor settings handling, and a few fixes.

   - Support for a couple more boards (COMe-bHL6 and COMe-cTH6) for the
     Kontron driver.

   - A conversion to the dev_get_platdata() API for all MFD drivers.

   - A removal of non-DT (legacy) support for the twl6040 driver.

   - A few fixes and additions (Mic detect level) to the wm5110 register
     tables.

   - Regmap support for the davinci_voicecodec driver.

   - The usual bunch of minor cleanups and janitorial fixes"

* tag 'mfd-3.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-next: (81 commits)
  mfd: ucb1x00-core: Rewrite ucb1x00_add_dev()
  mfd: ab8500-debugfs: Apply a check for -ENOMEM after allocating memory for event name
  mfd: ab8500-debugfs: Apply a check for -ENOMEM after allocating memory for sysfs
  mfd: timberdale: Use module_pci_driver
  mfd: timberdale: Remove redundant break
  mfd: timberdale: Staticize local variables
  mfd: ab8500-debugfs: Staticize local variables
  mfd: db8500-prcmu: Staticize clk_mgt
  mfd: db8500-prcmu: Use ANSI function declaration
  mfd: omap-usb-host: Staticize usbhs_driver_name
  mfd: 88pm805: Fix potential NULL pdata dereference
  mfd: 88pm800: Fix potential NULL pdata dereference
  mfd: twl6040: Use regmap for register cache
  mfd: davinci_voicecodec: Provide a regmap for register I/O
  mfd: davinci_voicecodec: Remove unused read and write functions
  mmc: memstick: rtsx: Modify copyright comments
  mmc: rtsx: Clear SD_CLK toggle enable bit if switching voltage fail
  mfd: mmc: rtsx: Change default tx phase
  mfd: pcf50633-adc: Use devm_*() functions
  mfd: rtsx: Copyright modifications
  ...
2013-09-07 20:14:19 -07:00
Linus Torvalds
327fff3e13 Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull misc kbuild updates from Michal Marek:
 "In the kbuild misc branch, I have:
   - make rpm-pkg updates, most importantly the rpm package now calls
     /sbin/installkernel
   - make deb-pkg: debuginfo split, correct kernel image path for
     parisc, mips and powerpc and a couple more minor fixes
   - New coccinelle check"

* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  scripts/checkkconfigsymbols.sh: replace echo -e with printf
  Provide version number for Debian firmware package
  coccinelle: replace 0/1 with false/true in functions returning bool
  deb-pkg: add a hook argument to match debian hooks parameters
  deb-pkg: fix installed image path on parisc, mips and powerpc
  deb-pkg: split debug symbols in their own package
  deb-pkg: use KCONFIG_CONFIG instead of .config file directly
  rpm-pkg: add generation of kernel-devel
  rpm-pkg: install firmware files in kernel relative directory
  rpm-pkg: add %post section to create initramfs and grub hooks
2013-09-07 19:47:35 -07:00
Linus Torvalds
1ff5e37e72 Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild update from Michal Marek:
 "Only these two commits are in the kbuild branch this time:
   - Using filechk for include/config/kernel.release
   - Cleanup in scripts/sortextable.c"

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kbuild: Do not overwrite include/config/kernel.release needlessly
  scripts: remove unused function in sortextable.c
2013-09-07 19:46:50 -07:00
Al Viro
4e10f3c988 Kill indirect include of file.h from eventfd.h, use fdget() in cgroup.c
kernel/cgroup.c is the only place in the tree that relies on eventfd.h
pulling file.h; move that include there.  Switch from eventfd_fget()/fput()
to fdget()/fdput(), while we are at it - eventfd_ctx_fileget() will fail
on non-eventfd descriptors just fine, no need to do that check twice...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-07 19:54:57 -04:00
Al Viro
d040790391 prune_super(): sb->s_op is never NULL
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-07 19:54:56 -04:00
Al Viro
dfc59e2c90 exportfs: don't assume that ->iterate() won't feed us too long entries
On some filesystems it's impossible even with fs corruption, but we'd
better not rely on that, what with memcpy() into on-stack array we
are doing there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-07 19:54:55 -04:00
Al Viro
5d8943b04b afs: get rid of redundant ->d_name.len checks
No dentry can get to directory modification methods without
having passed either ->lookup() or ->atomic_open(); if name is
rejected by those two (or by ->d_hash()) with an error, it won't
be seen by anything else.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-07 19:54:55 -04:00
Linus Torvalds
e7d33bb5ea lockref: add ability to mark lockrefs "dead"
The only actual current lockref user (dcache) uses zero reference counts
even for perfectly live dentries, because it's a cache: there may not be
any users, but that doesn't mean that we want to throw away the dentry.

At the same time, the dentry cache does have a notion of a truly "dead"
dentry that we must not even increment the reference count of, because
we have pruned it and it is not valid.

Currently that distinction is not visible in the lockref itself, and the
dentry cache validation uses "lockref_get_or_lock()" to either get a new
reference to a dentry that already had existing references (and thus
cannot be dead), or get the dentry lock so that we can then verify the
dentry and increment the reference count under the lock if that
verification was successful.

That's all somewhat complicated.

This adds the concept of being "dead" to the lockref itself, by simply
using a count that is negative.  This allows a usage scenario where we
can increment the refcount of a dentry without having to validate it,
and pushing the special "we killed it" case into the lockref code.

The dentry code itself doesn't actually use this yet, and it's probably
too late in the merge window to do that code (the dentry_kill() code
with its "should I decrement the count" logic really is pretty complex
code), but let's introduce the concept at the lockref level now.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-07 15:49:18 -07:00
Weston Andros Adamson
b1b3e13694 NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity
Commit 97431204ea introduced a regression
that causes SECINFO_NO_NAME to fail without sending an RPC if:

 1) the nfs_client's rpc_client is using krb5i/p (now tried by default)
 2) the current user doesn't have valid kerberos credentials

This situation is quite common - as of now a sec=sys mount would use
krb5i for the nfs_client's rpc_client and a user would hardly be faulted
for not having run kinit.

The solution is to use the machine cred when trying to use an integrity
protected auth flavor for SECINFO_NO_NAME.

Older servers may not support using the machine cred or an integrity
protected auth flavor for SECINFO_NO_NAME in every circumstance, so we fall
back to using the user's cred and the filesystem's auth flavor in this case.

We run into another problem when running against linux nfs servers -
they return NFS4ERR_WRONGSEC when using integrity auth flavor (unless the
mount is also that flavor) even though that is not a valid error for
SECINFO*.  Even though it's against spec, handle WRONGSEC errors on
SECINFO_NO_NAME by falling back to using the user cred and the
filesystem's auth flavor.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 18:39:25 -04:00
Trond Myklebust
0aea92bf67 NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set
Also don't worry about obsolete mount flags...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 18:38:04 -04:00
Linus Torvalds
44a0cf9292 lockref: fix docbook argument names
The code got rewritten, but the comments got copied as-is from older
versions, and as a result the argument name in the comment didn't
actually match the code any more.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-07 15:30:29 -07:00
Trond Myklebust
47040da3c7 NFSv4: Allow security autonegotiation for submounts
In cases where the parent super block was not mounted with a 'sec=' line,
allow autonegotiation of security for the submounts.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 17:52:42 -04:00
Trond Myklebust
41d058c3ba NFSv4: Disallow security negotiation for lookups when 'sec=' is specified
Ensure that nfs4_proc_lookup_common respects the NFS_MOUNT_SECFLAVOUR
flag.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 17:52:42 -04:00
Linus Torvalds
dc0755cdb1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 2 (of many) from Al Viro:
 "Mostly Miklos' series this time"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  constify dcache.c inlined helpers where possible
  fuse: drop dentry on failed revalidate
  fuse: clean up return in fuse_dentry_revalidate()
  fuse: use d_materialise_unique()
  sysfs: use check_submounts_and_drop()
  nfs: use check_submounts_and_drop()
  gfs2: use check_submounts_and_drop()
  afs: use check_submounts_and_drop()
  vfs: check unlinked ancestors before mount
  vfs: check submounts and drop atomically
  vfs: add d_walk()
  vfs: restructure d_genocide()
2013-09-07 14:36:57 -07:00
Linus Torvalds
c7c4591db6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace changes from Eric Biederman:
 "This is an assorted mishmash of small cleanups, enhancements and bug
  fixes.

  The major theme is user namespace mount restrictions.  nsown_capable
  is killed as it encourages not thinking about details that need to be
  considered.  A very hard to hit pid namespace exiting bug was finally
  tracked and fixed.  A couple of cleanups to the basic namespace
  infrastructure.

  Finally there is an enhancement that makes per user namespace
  capabilities usable as capabilities, and an enhancement that allows
  the per userns root to nice other processes in the user namespace"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  userns:  Kill nsown_capable it makes the wrong thing easy
  capabilities: allow nice if we are privileged
  pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
  userns: Allow PR_CAPBSET_DROP in a user namespace.
  namespaces: Simplify copy_namespaces so it is clear what is going on.
  pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
  sysfs: Restrict mounting sysfs
  userns: Better restrictions on when proc and sysfs can be mounted
  vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces
  kernel/nsproxy.c: Improving a snippet of code.
  proc: Restrict mounting the proc filesystem
  vfs: Lock in place mounts from more privileged users
2013-09-07 14:35:32 -07:00
Linus Torvalds
11c7b03d42 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Nothing major for this kernel, just maintenance updates"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (21 commits)
  apparmor: add the ability to report a sha1 hash of loaded policy
  apparmor: export set of capabilities supported by the apparmor module
  apparmor: add the profile introspection file to interface
  apparmor: add an optional profile attachment string for profiles
  apparmor: add interface files for profiles and namespaces
  apparmor: allow setting any profile into the unconfined state
  apparmor: make free_profile available outside of policy.c
  apparmor: rework namespace free path
  apparmor: update how unconfined is handled
  apparmor: change how profile replacement update is done
  apparmor: convert profile lists to RCU based locking
  apparmor: provide base for multiple profiles to be replaced at once
  apparmor: add a features/policy dir to interface
  apparmor: enable users to query whether apparmor is enabled
  apparmor: remove minimum size check for vmalloc()
  Smack: parse multiple rules per write to load2, up to PAGE_SIZE-1 bytes
  Smack: network label match fix
  security: smack: add a hash table to quicken smk_find_entry()
  security: smack: fix memleak in smk_write_rules_list()
  xattr: Constify ->name member of "struct xattr".
  ...
2013-09-07 14:34:07 -07:00
Linus Torvalds
6be48f2940 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "Here is the crypto update for 3.12:

   - Added MODULE_SOFTDEP to allow pre-loading of modules.
   - Reinstated crct10dif driver using the module softdep feature.
   - Allow via rng driver to be auto-loaded.

   - Split large input data when necessary in nx.
   - Handle zero length messages correctly for GCM/XCBC in nx.
   - Handle SHA-2 chunks bigger than block size properly in nx.

   - Handle unaligned lengths in omap-aes.
   - Added SHA384/SHA512 to omap-sham.
   - Added OMAP5/AM43XX SHAM support.
   - Added OMAP4 TRNG support.

   - Misc fixes"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (66 commits)
  Reinstate "crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework"
  hwrng: via - Add MODULE_DEVICE_TABLE
  crypto: fcrypt - Fix bitoperation for compilation with clang
  crypto: nx - fix SHA-2 for chunks bigger than block size
  crypto: nx - fix GCM for zero length messages
  crypto: nx - fix XCBC for zero length messages
  crypto: nx - fix limits to sg lists for AES-CCM
  crypto: nx - fix limits to sg lists for AES-XCBC
  crypto: nx - fix limits to sg lists for AES-GCM
  crypto: nx - fix limits to sg lists for AES-CTR
  crypto: nx - fix limits to sg lists for AES-CBC
  crypto: nx - fix limits to sg lists for AES-ECB
  crypto: nx - add offset to nx_build_sg_lists()
  padata - Register hotcpu notifier after initialization
  padata - share code between CPU_ONLINE and CPU_DOWN_FAILED, same to CPU_DOWN_PREPARE and CPU_UP_CANCELED
  hwrng: omap - reorder OMAP TRNG driver code
  crypto: omap-sham - correct dma burst size
  crypto: omap-sham - Enable Polling mode if DMA fails
  crypto: tegra-aes - bitwise vs logical and
  crypto: sahara - checking the wrong variable
  ...
2013-09-07 14:31:18 -07:00
Linus Torvalds
0ffb01d9de Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A quick set of fixes, some to deal with fallout from yesterday's
  net-next merge.

   1) Fix compilation of bnx2x driver with CONFIG_BNX2X_SRIOV disabled,
      from Dmitry Kravkov.

   2) Fix a bnx2x regression caused by one of Dave Jones's mistaken
      braces changes, from Eilon Greenstein.

   3) Add some protective filtering in the netlink tap code, from Daniel
      Borkmann.

   4) Fix TCP congestion window growth regression after timeouts, from
      Yuchung Cheng.

   5) Correctly adjust TCP's rcv_ssthresh for out of order packets, from
      Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  tcp: properly increase rcv_ssthresh for ofo packets
  net: add documentation for BQL helpers
  mlx5: remove unused MLX5_DEBUG param in Kconfig
  bnx2x: Restore a call to config_init
  bnx2x: fix broken compilation with CONFIG_BNX2X_SRIOV is not set
  tcp: fix no cwnd growth after timeout
  net: netlink: filter particular protocols from analyzers
2013-09-07 14:27:46 -07:00
Trond Myklebust
5e6b19901b NFSv4: Fix security auto-negotiation
NFSv4 security auto-negotiation has been broken since
commit 4580a92d44 (NFS:
Use server-recommended security flavor by default (NFSv3))
because nfs4_try_mount() will automatically select AUTH_SYS
if it sees no auth flavours.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
2013-09-07 16:18:30 -04:00
Trond Myklebust
19e7b8d240 NFS: Clean up nfs_parse_security_flavors()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 16:12:45 -04:00
Trond Myklebust
74c9881162 NFS: Clean up the auth flavour array mess
What is the point of having a 'auth_flavor_len' field, if it is
always set to 1, and can't be used to determine if the user has
selected an auth flavour?
This cleanup goes back to using auth_flavor_len for its original
intended purpose, and gets rid of the ad-hoc replacements.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 16:11:24 -04:00
Linus Torvalds
7b4022fa17 Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
Pull hwmon fixes from Jean Delvare.

* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  hwmon: (emc6w201) Do not declare enum variable
  hwmon: (w83792d) Update module author
2013-09-07 10:54:19 -07:00