Commit Graph

398425 Commits

Author SHA1 Message Date
Alex Williamson
8b27ee60bf vfio-pci: PCI hot reset interface
The current VFIO_DEVICE_RESET interface only maps to PCI use cases
where we can isolate the reset to the individual PCI function.  This
means the device must support FLR (PCIe or AF), PM reset on D3hot->D0
transition, device specific reset, or be a singleton device on a bus
for a secondary bus reset.  FLR does not have widespread support,
PM reset is not very reliable, and bus topology is dictated by the
system and device design.  We need to provide a means for a user to
induce a bus reset in cases where the existing mechanisms are not
available or not reliable.

This device specific extension to VFIO provides the user with this
ability.  Two new ioctls are introduced:
 - VFIO_DEVICE_PCI_GET_HOT_RESET_INFO
 - VFIO_DEVICE_PCI_HOT_RESET

The first provides the user with information about the extent of
devices affected by a hot reset.  This is essentially a list of
devices and the IOMMU groups they belong to.  The user may then
initiate a hot reset by calling the second ioctl.  We must be
careful that the user has ownership of all the affected devices
found via the first ioctl, so the second ioctl takes a list of file
descriptors for the VFIO groups affected by the reset.  Each group
must have IOMMU protection established for the ioctl to succeed.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-09-04 11:28:04 -06:00
Alex Williamson
3bc4f3993b Merge remote branch 'origin/master' into next-merge 2013-09-04 11:25:44 -06:00
Alexander Sverdlin
c08751c851 net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4
net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4

Initially the problem was observed with ipsec, but later it became clear that
SCTP data chunk fragmentation algorithm has problems with MTU values which are
not multiple of 4. Test program was used which just transmits 2000 bytes long
packets to other host. tcpdump was used to observe re-fragmentation in IP layer
after SCTP already fragmented data chunks.

With MTU 1500:
12:54:34.082904 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1500)
    10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 2366088589] [SID: 0] [SSEQ 1] [PPID 0x0]
12:54:34.082933 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 596)
    10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 2366088590] [SID: 0] [SSEQ 1] [PPID 0x0]
12:54:34.090576 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
    10.151.24.91.54321 > 10.151.38.153.39303: sctp (1) [SACK] [cum ack 2366088590] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

With MTU 1499:
13:02:49.955220 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 0, flags [+], proto SCTP (132), length 1492)
    10.151.38.153.39084 > 10.151.24.91.54321: sctp[|sctp]
13:02:49.955249 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 1472, flags [none], proto SCTP (132), length 28)
    10.151.38.153 > 10.151.24.91: ip-proto-132
13:02:49.955262 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
    10.151.38.153.39084 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 404355346] [SID: 0] [SSEQ 1] [PPID 0x0]
13:02:49.956770 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
    10.151.24.91.54321 > 10.151.38.153.39084: sctp (1) [SACK] [cum ack 404355346] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

Here problem in data portion limit calculation leads to re-fragmentation in IP,
which is sub-optimal. The problem is max_data initial value, which doesn't take
into account the fact, that data chunk must be padded to 4-bytes boundary.
It's enough to correct max_data, because all later adjustments are correctly
aligned to 4-bytes boundary.

After the fix is applied, everything is fragmented correctly for uneven MTUs:
15:16:27.083881 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1496)
    10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 3077098183] [SID: 0] [SSEQ 1] [PPID 0x0]
15:16:27.083907 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
    10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 3077098184] [SID: 0] [SSEQ 1] [PPID 0x0]
15:16:27.085640 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
    10.151.24.91.54321 > 10.151.38.153.53417: sctp (1) [SACK] [cum ack 3077098184] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

The bug was there for years already, but
 - is a performance issue, the packets are still transmitted
 - doesn't show up with default MTU 1500, but possibly with ipsec (MTU 1438)

Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 13:20:27 -04:00
Julia Lawall
0a171933a4 drivers:net: delete premature free_irq
Free_irq is not needed if there has been no request_irq.  Free_irq is
removed from both the probe and remove functions.  The correct request_irq
and free_irq are found in the open and close functions.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression e;
@@

*e = platform_get_irq(...);
... when != request_irq(e,...)
*free_irq(e,...)
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 13:18:19 -04:00
Carlos O'Donell
cfd280c912 net: sync some IP headers with glibc
Solution:
=========

- Synchronize linux's `include/uapi/linux/in6.h'
  with glibc's `inet/netinet/in.h'.
- Synchronize glibc's `inet/netinet/in.h with linux's
  `include/uapi/linux/in6.h'.
- Allow including the headers in either other.
- First header included defines the structures and macros.

Details:
========

The kernel promises not to break the UAPI ABI so I don't
see why we can't just have the two userspace headers
coordinate?

If you include the kernel headers first you get those,
and if you include the glibc headers first you get those,
and the following patch arranges a coordination and
synchronization between the two.

Let's handle `include/uapi/linux/in6.h' from linux,
and `inet/netinet/in.h' from glibc and ensure they compile
in any order and preserve the required ABI.

These two patches pass the following compile tests:

cat >> test1.c <<EOF
int main (void) {
  return 0;
}
EOF
gcc -c test1.c

cat >> test2.c <<EOF
int main (void) {
  return 0;
}
EOF
gcc -c test2.c

One wrinkle is that the kernel has a different name for one of
the members in ipv6_mreq. In the kernel patch we create a macro
to cover the uses of the old name, and while that's not entirely
clean it's one of the best solutions (aside from an anonymous
union which has other issues).

I've reviewed the code and it looks to me like the ABI is
assured and everything matches on both sides.

Notes:
- You want netinet/in.h to include bits/in.h as early as possible,
  but it needs in_addr so define in_addr early.
- You want bits/in.h included as early as possible so you can use
  the linux specific code to define __USE_KERNEL_DEFS based on
  the _UAPI_* macro definition and use those to cull in.h.
- glibc was missing IPPROTO_MH, added here.

Compile tested and inspected.

Reported-by: Thomas Backlund <tmb@mageia.org>
Cc: Thomas Backlund <tmb@mageia.org>
Cc: libc-alpha@sourceware.org
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: David S. Miller <davem@davemloft.net>
Tested-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 13:12:43 -04:00
Dan Carpenter
42a5a5c128 sfc: check for allocation failure
It upsets static analyzers when we don't check for allocation failure.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 13:07:47 -04:00
Alex Williamson
17638db1b8 vfio-pci: Test for extended config space
Having PCIe/PCI-X capability isn't enough to assume that there are
extended capabilities.  Both specs define that the first capability
header is all zero if there are no extended capabilities.  Testing
for this avoids an erroneous message about hiding capability 0x0 at
offset 0x100.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-09-04 10:58:52 -06:00
H. Peter Anvin
f2a7b303d6 x86, paravirt: Remove duplicate definition for DEF_NATIVE
DEF_NATIVE() is defined in paravirt_types.h, remove duplicate
definition in paravirt.c

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Andi Kleen <ak@linux.kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/CA%2B55aFxVv==DC0JdS87V%2BcPr-twN%2BTujYg5XmgHOjJOAkZ4xwQ@mail.gmail.com
2013-09-04 09:46:43 -07:00
David S. Miller
b163b42fd2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:

====================
This series contains updates to igb only.

Todd provides a fix for igb to not look for a PBA in the iNVM on
devices that are flashless.

Akeem provides igb patches to add a new PHY id for i354, as well as
a couple of patches to implement the new PHY id.  He also provides
several patches to correctly report the appropriate media type as
well as correctly report advertised/supported link for i354 devices.
Lastly Akeem implements a 1 second delay mechanism for i210 devices
to avoid erroneous link issue with the link partner.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 12:40:37 -04:00
Linus Torvalds
cb3e4330e6 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "Misc smaller fixes:

   - a parse_setup_data() boot crash fix

   - a memblock and an __early_ioremap cleanup

   - turn the always-on CONFIG_ARCH_MEMORY_PROBE=y into a configurable
     option and turn it off - it's an unrobust debug facility, it
     shouldn't be enabled by default"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: avoid remapping data in parse_setup_data()
  x86: Use memblock_set_current_limit() to set limit for memblock.
  mm: Remove unused variable idx0 in __early_ioremap()
  mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default
2013-09-04 09:39:26 -07:00
Linus Torvalds
aafcd5d757 Merge branch 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 relocation changes from Ingo Molnar:
 "This tree contains a single change, ELF relocation handling in C - one
  of the kernel randomization patches that makes sense even without
  randomization present upstream"

* 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, relocs: Move ELF relocation handling to C
2013-09-04 09:38:10 -07:00
Linus Torvalds
6832d9652f Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers/nohz changes from Ingo Molnar:
 "It mostly contains fixes and full dynticks off-case optimizations, by
  Frederic Weisbecker"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  nohz: Include local CPU in full dynticks global kick
  nohz: Optimize full dynticks's sched hooks with static keys
  nohz: Optimize full dynticks state checks with static keys
  nohz: Rename a few state variables
  vtime: Always debug check snapshot source _before_ updating it
  vtime: Always scale generic vtime accounting results
  vtime: Optimize full dynticks accounting off case with static keys
  vtime: Describe overriden functions in dedicated arch headers
  m68k: hardirq_count() only need preempt_mask.h
  hardirq: Split preempt count mask definitions
  context_tracking: Split low level state headers
  vtime: Fix racy cputime delta update
  vtime: Remove a few unneeded generic vtime state checks
  context_tracking: User/kernel broundary cross trace events
  context_tracking: Optimize context switch off case with static keys
  context_tracking: Optimize guest APIs off case with static key
  context_tracking: Optimize main APIs off case with static key
  context_tracking: Ground setup for static key use
  context_tracking: Remove full dynticks' hacky dependency on wide context tracking
  nohz: Only enable context tracking on full dynticks CPUs
  ...
2013-09-04 09:36:54 -07:00
David S. Miller
48f8e0af86 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
The following batch contains:

* Three fixes for the new synproxy target available in your
  net-next tree, from Jesper D. Brouer and Patrick McHardy.

* One fix for TCPMSS to correctly handling the fragmentation
  case, from Phil Oester. I'll pass this one to -stable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 12:28:02 -04:00
Trond Myklebust
f6de7a39c1 NFSv4: Document the recover_lost_locks kernel parameter
Rename the new 'recover_locks' kernel parameter to 'recover_lost_locks'
and change the default to 'false'. Document why in
Documentation/kernel-parameters.txt

Move the 'recover_lost_locks' kernel parameter to fs/nfs/super.c to
make it easy to backport to kernels prior to 3.6.x, which don't have
a separate NFSv4 module.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-04 12:26:32 -04:00
NeilBrown
ef1820f9be NFSv4: Don't try to recover NFSv4 locks when they are lost.
When an NFSv4 client loses contact with the server it can lose any
locks that it holds.

Currently when it reconnects to the server it simply tries to reclaim
those locks.  This might succeed even though some other client has
held and released a lock in the mean time.  So the first client might
think the file is unchanged, but it isn't.  This isn't good.

If, when recovery happens, the locks cannot be claimed because some
other client still holds the lock, then we get a message in the kernel
logs, but the client can still write.  So two clients can both think
they have a lock and can both write at the same time.  This is equally
not good.

There was a patch a while ago
  http://comments.gmane.org/gmane.linux.nfs/41917

which tried to address some of this, but it didn't seem to go
anywhere.  That patch would also send a signal to the process.  That
might be useful but for now this patch just causes writes to fail.

For NFSv4 (unlike v2/v3) there is a strong link between the lock and
the write request so we can fairly easily fail any IO of the lock is
gone.  While some applications might not expect this, it is still
safer than allowing the write to succeed.

Because this is a fairly big change in behaviour a module parameter,
"recover_locks", is introduced which defaults to true (the current
behaviour) but can be set to "false" to tell the client not to try to
recover things that were lost.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-04 12:26:32 -04:00
Trond Myklebust
40b5ea0c25 SUNRPC: Add tracepoints to help debug socket connection issues
Add client side debugging to help trace socket connection/disconnection
and unexpected state change issues.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-04 12:26:31 -04:00
Chuck Lever
b6a85258d8 NFS: Fix warning introduced by NFSv4.0 transport blocking patches
When CONFIG_NFS_V4_1 is not enabled, gcc emits this warning:

linux/fs/nfs/nfs4state.c:255:12: warning:
 ‘nfs4_begin_drain_session’ defined but not used [-Wunused-function]
 static int nfs4_begin_drain_session(struct nfs_client *clp)
            ^

Eventually NFSv4.0 migration recovery will invoke this function, but
that has not yet been merged.  Hide nfs4_begin_drain_session()
behind CONFIG_NFS_V4_1 for now.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-04 12:26:30 -04:00
Chuck Lever
1cec16abf2 When CONFIG_NFS_V4_1 is not enabled, "make C=2" emits this warning:
linux/fs/nfs/nfs4session.c:337:6: warning:
 symbol 'nfs41_set_target_slotid' was not declared. Should it be static?

Move nfs41_set_target_slotid() and nfs41_update_target_slotid() back
behind CONFIG_NFS_V4_1, since, in the final revision of this work,
they are used only in NFSv4.1 and later.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-04 12:26:30 -04:00
Linus Torvalds
228abe73ad Merge branch 'x86-fb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fb changes from Ingo Molnar:
 "This tree includes preparatory patches for SimpleDRM driver support,
  by David Herrmann.  They clean up x86 framebuffer support by creating
  simplefb devices wherever possible.  More background can be found at

     http://lwn.net/Articles/558104/"

* 'x86-fb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  fbdev: fbcon: select VT_HW_CONSOLE_BINDING
  fbdev: efifb: bind to efi-framebuffer
  fbdev: vesafb: bind to platform-framebuffer device
  fbdev: simplefb: add common x86 RGB formats
  x86: sysfb: move EFI quirks from efifb to sysfb
  x86: provide platform-devices for boot-framebuffers
  fbdev: simplefb: mark as fw and allocate apertures
  fbdev: simplefb: add init through platform_data
2013-09-04 09:12:17 -07:00
Linus Torvalds
1f9c52e16b Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu feature fixes from Ingo Molnar:
 "Two small cpufeature support updates"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix override new_cpu_data.x86 with 486
  x86, cpufeature: Use new CC_HAVE_ASM_GOTO
2013-09-04 09:11:16 -07:00
Linus Torvalds
9cb87aaf40 Merge branches 'x86-boot-for-linus' and 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull tiny x86 boot cleanups from Ingo Molnar.

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Fix a sanity check in printf.c

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, boot: Fix warning due to undeclared strlen()
2013-09-04 09:10:27 -07:00
Linus Torvalds
2a475501b8 Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/asmlinkage changes from Ingo Molnar:
 "As a preparation for Andi Kleen's LTO patchset (link time
  optimizations using GCC's -flto which build time optimization has
  steadily increased in quality over the past few years and might
  eventually be usable for the kernel too) this tree includes a handful
  of preparatory patches that make function calling convention
  annotations consistent again:

   - Mark every function without arguments (or 64bit only) that is used
     by assembly code with asmlinkage()

   - Mark every function with parameters or variables that is used by
     assembly code as __visible.

  For the vanilla kernel this has documentation, consistency and
  debuggability advantages, for the time being"

* 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asmlinkage: Fix warning in xen asmlinkage change
  x86, asmlinkage, vdso: Mark vdso variables __visible
  x86, asmlinkage, power: Make various symbols used by the suspend asm code visible
  x86, asmlinkage: Make dump_stack visible
  x86, asmlinkage: Make 64bit checksum functions visible
  x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops
  x86, asmlinkage, apm: Make APM data structure used from assembler visible
  x86, asmlinkage: Make syscall tables visible
  x86, asmlinkage: Make several variables used from assembler/linker script visible
  x86, asmlinkage: Make kprobes code visible and fix assembler code
  x86, asmlinkage: Make various syscalls asmlinkage
  x86, asmlinkage: Make 32bit/64bit __switch_to visible
  x86, asmlinkage: Make _*_start_kernel visible
  x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible
  x86, asmlinkage: Change dotraplinkage into __visible on 32bit
  x86: Fix sys_call_table type in asm/syscall.h
2013-09-04 08:42:44 -07:00
Dong Fang
05726acabe fuse: use list_for_each_entry() for list traversing
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-09-04 17:42:42 +02:00
Linus Torvalds
3d7e5fc37f Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/asm changes from Ingo Molnar:
 "Main changes:

   - Apply low level mutex optimization on x86-64, by Wedson Almeida
     Filho.

   - Change bitops to be naturally 'long', by H Peter Anvin.

   - Add TSX-NI opcodes support to the x86 (instrumentation) decoder, by
     Masami Hiramatsu.

   - Add clang compatibility adjustments/workarounds, by Jan-Simon
     Möller"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, doc: Update uaccess.h comment to reflect clang changes
  x86, asm: Fix a compilation issue with clang
  x86, asm: Extend definitions of _ASM_* with a raw format
  x86, insn: Add new opcodes as of June, 2013
  x86/ia32/asm: Remove unused argument in macro
  x86, bitops: Change bitops to be native operand size
  x86: Use asm-goto to implement mutex fast path on x86-64
2013-09-04 08:39:38 -07:00
Linus Torvalds
6924a4672d Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/apic changes from Ingo Molnar:
 "Smaller fixes"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioapic: Check attr against the previous setting when programmed more than once
  x86/ioapic/kcrash: Prevent crash_kexec() from deadlocking on ioapic_lock
  x86/acpi: Fix incorrect sanity check in acpi_register_lapic()
2013-09-04 08:39:05 -07:00
Linus Torvalds
ac3c1c4f1c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer changes from Ingo Molnar:
 "Various clocksource driver updates: extend the core with memory mapped
  hardware (mmio) support and add new (ARM) Moxart SoC and sun4i
  hardware support"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  clocksource: arch_timer: Add support for memory mapped timers
  clocksource: arch_timer: Push the read/write wrappers deeper
  Documentation: Add memory mapped ARM architected timer binding
  clocksource: arch_timer: Pass clock event to set_mode callback
  clocksource: arch_timer: Make register accessors less error-prone
  ARM: clocksource: moxart: documentation: Update device tree bindings document
  ARM: clocksource: moxart: Add bitops.h include
  ARM: clocksource: moxart: documentation: Fix device tree bindings document
  ARM: clocksource: Add support for MOXA ART SoCs
  clocksource: cadence_ttc: Reuse clocksource as sched_clock
  clocksource: cadence_ttc: Remove unused header
  clocksource: sun4i: Fix bug when switching from periodic to oneshot modes
  clocksource: sun4i: Cleanup parent clock setup
  clocksource: sun4i: Remove TIMER_SCAL variable
  clocksource: sun4i: Factor out some timer code
  clocksource: sun4i: Fix the next event code
  clocksource: sun4i: Don't forget to enable the clock we use
  clocksource: sun4i: Add clocksource and sched clock drivers
  clocksource: sun4i: rename AUTORELOAD define to RELOAD
  clocksource: sun4i: Wrap macros arguments in parenthesis
  ...
2013-09-04 08:38:26 -07:00
Linus Torvalds
5e0b3a4e88 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "Various optimizations, cleanups and smaller fixes - no major changes
  in scheduler behavior"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix the sd_parent_degenerate() code
  sched/fair: Rework and comment the group_imb code
  sched/fair: Optimize find_busiest_queue()
  sched/fair: Make group power more consistent
  sched/fair: Remove duplicate load_per_task computations
  sched/fair: Shrink sg_lb_stats and play memset games
  sched: Clean-up struct sd_lb_stat
  sched: Factor out code to should_we_balance()
  sched: Remove one division operation in find_busiest_queue()
  sched/cputime: Use this_cpu_add() in task_group_account_field()
  cpumask: Fix cpumask leak in partition_sched_domains()
  sched/x86: Optimize switch_mm() for multi-threaded workloads
  generic-ipi: Kill unnecessary variable - csd_flags
  numa: Mark __node_set() as __always_inline
  sched/fair: Cleanup: remove duplicate variable declaration
  sched/__wake_up_sync_key(): Fix nr_exclusive tasks which lead to WF_SYNC clearing
2013-09-04 08:36:35 -07:00
Daniel Vetter
a2dc53e7dc drm/i915: fix i9xx_crtc_clock_get for multiplied pixels
The dpll actually runs at the port clock so we don't need
to multiply it again with the pixel multiplier to get the
adjusted_mode.clock. This is in contrast to the ironlake
pixel clock readout code which uses the fdi dotclock: That
one does _not_ run with multiplied pixels.

This issue goes back to the original clock readout code added
in

commit f1f644dc66
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Thu Jun 27 00:39:25 2013 +0300

    drm/i915: get mode clock when reading the pipe config v9

Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:03 +02:00
Daniel Vetter
eeb4793779 drm/i915: handle sdvo input pixel multiplier correctly again
The sdvo input timing needs to be the actual mode, the sdvo
encoder automatically adjusts for the need of pixel doubling or
quadrupling. This was lost in pipe config conversion of the
pixel multiplier in

commit 6cc5f341b5
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Mar 27 00:44:53 2013 +0100

    drm/i915: add pipe_config->pixel_multiplier

While at it ditch the intel_ prefix from the crtc in
intel_sdvo_mode_set.

Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:03 +02:00
Daniel Vetter
645416f5ad drm/i915: fix hpd work vs. flush_work in the pageflip code deadlock
Historically we've run our own driver hotplug handling in our own
work-queue, which then launched the drm core hotplug handling in the
system workqueue. This is important since we flush our own driver
workqueue in the pageflip code while hodling modeset locks, and only
the drm hotplug code grabbed these locks. But with

commit 69787f7da6
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Oct 23 18:23:34 2012 +0000

    drm: run the hpd irq event code directly

this was changed and now we could deadlock in our flip handler if
there's a hotplug work blocking the progress of the crucial unpin
works. So this broke the careful deadlock avoidance implemented in

commit b4a98e57fc
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Nov 1 09:26:26 2012 +0000

    drm/i915: Flush outstanding unpin tasks before pageflipping

Since the rule thus far has been that work items on our own workqueue
may never grab modeset locks simply restore that rule again.

v2: Add a comment to the declaration of dev_priv->wq to warn readers
about the tricky implications of using it. Suggested by Chris Wilson.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Stuart Abercrombie <sabercrombie@chromium.org>
Reported-by: Stuart Abercrombie <sabercrombie@chromium.org>
References: http://permalink.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/26239
Cc: stable@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Squash in a comment at the place where we schedule the work.
Requested after-the-fact by Chris on irc since the hpd work isn't the
only place we botch this.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-04 17:34:02 +02:00
Linus Torvalds
0d99b70873 Merge branches 'perf-urgent-for-linus' and 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "As a first remark I'd like to point out that the obsolete '-f'
  (--force) option, which has not done anything for several releases,
  has been removed from 'perf record' and related utilities.  Everyone
  please update muscle memory accordingly! :-)

  Main changes on the perf kernel side:

   - Performance optimizations:
        . for trace events, by Steve Rostedt.
        . for time values, by Peter Zijlstra

   - New hardware support:
        . for Intel Silvermont (22nm Atom) CPUs, by Zheng Yan
        . for Intel SNB-EP uncore PMUs, by Zheng Yan

   - Enhanced hardware support:
        . for Intel uncore PMUs: add filter support for QPI boxes, by Zheng Yan

   - Core perf events code enhancements and fixes:
        . for full-nohz feature handling, by Frederic Weisbecker
        . for group events, by Jiri Olsa
        . for call chains, by Frederic Weisbecker
        . for event stream parsing, by Adrian Hunter

   - New ABI details:
        . Add attr->mmap2 attribute, by Stephane Eranian
        . Add PERF_EVENT_IOC_ID ioctl to return event ID, by Jiri Olsa
        . Export u64 time_zero on the mmap header page to allow TSC
          calculation, by Adrian Hunter
        . Add dummy software event, by Adrian Hunter.
        . Add a new PERF_SAMPLE_IDENTIFIER to make samples always
          parseable, by Adrian Hunter.
        . Make Power7 events available via sysfs, by Runzhen Wang.

   - Code cleanups and refactorings:
        . for nohz-full, by Frederic Weisbecker
        . for group events, by Jiri Olsa

   - Documentation updates:
        . for perf_event_type, by Peter Zijlstra

  Main changes on the perf tooling side (some of these tooling changes
  utilize the above kernel side changes):

   - Lots of 'perf trace' enhancements:

        . Make 'perf trace' command line arguments consistent with
          'perf record', by David Ahern.

        . Allow specifying syscalls a la strace, by Arnaldo Carvalho de Melo.

        . Add --verbose and -o/--output options, by Arnaldo Carvalho de Melo.

        . Support ! in -e expressions, to filter a list of syscalls,
          by Arnaldo Carvalho de Melo.

        . Arg formatting improvements to allow masking arguments in
          syscalls such as futex and open, where the some arguments are
          ignored and thus should not be printed depending on other args,
          by Arnaldo Carvalho de Melo.

        . Beautify futex open, openat, open_by_handle_at, lseek and futex
          syscalls, by Arnaldo Carvalho de Melo.

        . Add option to analyze events in a file versus live, so that
          one can do:

           [root@zoo ~]# perf record -a -e raw_syscalls:* sleep 1
           [ perf record: Woken up 0 times to write data ]
           [ perf record: Captured and wrote 25.150 MB perf.data (~1098836 samples) ]
           [root@zoo ~]# perf trace -i perf.data -e futex --duration 1
              17.799 ( 1.020 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, ua
             113.344 (95.429 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 4294967
             133.778 ( 1.042 ms): 18004 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 429496
           [root@zoo ~]#

          By David Ahern.

        . Honor target pid / tid options when analyzing a file, by David Ahern.

        . Introduce better formatting of syscall arguments, including so
          far beautifiers for mmap, madvise, syscall return values,
          by Arnaldo Carvalho de Melo.

        . Handle HUGEPAGE defines in the mmap beautifier, by David Ahern.

   - 'perf report/top' enhancements:

        . Do annotation using /proc/kcore and /proc/kallsyms when
          available, removing the forced need for a vmlinux file kernel
          assembly annotation. This also improves this use case because
          vmlinux has just the initial kernel image, not what is actually
          in use after various code patchings by things like alternatives.
          By Adrian Hunter.

        . Add --ignore-callees=<regex> option to collapse undesired parts
          of call graphs, by Greg Price.

        . Simplify symbol filtering by doing it at machine class level,
          by Adrian Hunter.

        . Add support for callchains in the gtk UI, by Namhyung Kim.

        . Add --objdump option to 'perf top', by Sukadev Bhattiprolu.

   - 'perf kvm' enhancements:

        . Add option to print only events that exceed a specified time
          duration, by David Ahern.

        . Improve stack trace printing, by David Ahern.

        . Update documentation of the live command, by David Ahern

        . Add perf kvm stat live mode that combines aspects of 'perf kvm
          stat' record and report, by David Ahern.

        . Add option to analyze specific VM in perf kvm stat report, by
          David Ahern.

        . Do not require /lib/modules/* on a guest, by Jason Wessel.

   - 'perf script' enhancements:

        . Fix symbol offset computation for some dsos, by David Ahern.

        . Fix named threads support, by David Ahern.

        . Don't install scripting files files when perl/python support
          is disabled, by Arnaldo Carvalho de Melo.

   - 'perf test' enhancements:

        . Add various improvements and fixes to the "vmlinux matches
          kallsyms" 'perf test' entry, related to the /proc/kcore
          annotation feature. By Adrian Hunter.

        . Add sample parsing test, by Adrian Hunter.

        . Add test for reading object code, by Adrian Hunter.

        . Add attr record group sampling test, by Jiri Olsa.

        . Misc testing infrastructure improvements and other details,
          by Jiri Olsa.

   - 'perf list' enhancements:

        . Skip unsupported hardware events, by Namhyung Kim.

        . List pmu events, by Andi Kleen.

   - 'perf diff' enhancements:

        . Add support for more than two files comparison, by Jiri Olsa.

   - 'perf sched' enhancements:

        . Various improvements, including removing reliance on some
          scheduler tracepoints that provide the same information as the
          PERF_RECORD_{FORK,EXIT} events. By David Ahern.

        . Remove odd build stall by moving a large struct initialization
          from a local variable to a global one, by Namhyung Kim.

   - 'perf stat' enhancements:

        . Add --initial-delay option to skip measuring for a defined
          startup phase, by Andi Kleen.

   - Generic perf tooling infrastructure/plumbing changes:

        . Tidy up sample parsing validation, by Adrian Hunter.

        . Fix up jobserver setup in libtraceevent Makefile.
          by Arnaldo Carvalho de Melo.

        . Debug improvements, by Adrian Hunter.

        . Fix correlation of samples coming after PERF_RECORD_EXIT event,
          by David Ahern.

        . Improve robustness of the topology parsing code,
          by Stephane Eranian.

        . Add group leader sampling, that allows just one event in a group
          to sample while the other events have just its values read,
          by Jiri Olsa.

        . Add support for a new modifier "D", which requests that the
          event, or group of events, be pinned to the PMU.
          By Michael Ellerman.

        . Support callchain sorting based on addresses, by Andi Kleen

        . Prep work for multi perf data file storage, by Jiri Olsa.

        . libtraceevent cleanups, by Namhyung Kim.

  And lots and lots of other fixes and code reorganizations that did not
  make it into the list, see the shortlog, diffstat and the Git log for
  details!"

[ Also merge a leftover from the 3.11 cycle ]

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Prevent race in unthrottling code

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (237 commits)
  perf trace: Tell arg formatters the arg index
  perf trace: Add beautifier for open's flags arg
  perf trace: Add beautifier for lseek's whence arg
  perf tools: Fix symbol offset computation for some dsos
  perf list: Skip unsupported events
  perf tests: Add 'keep tracking' test
  perf tools: Add support for PERF_COUNT_SW_DUMMY
  perf: Add a dummy software event to keep tracking
  perf trace: Add beautifier for futex 'operation' parm
  perf trace: Allow syscall arg formatters to mask args
  perf: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node()
  perf: Export struct perf_branch_entry to userspace
  perf: Add attr->mmap2 attribute to an event
  perf/x86: Add Silvermont (22nm Atom) support
  perf/x86: use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_X
  perf trace: Handle missing HUGEPAGE defines
  perf trace: Honor target pid / tid options when analyzing a file
  perf trace: Add option to analyze events in a file versus live
  perf evlist: Add tracepoint lookup by name
  perf tests: Add a sample parsing test
  ...
2013-09-04 08:25:35 -07:00
Heiko Carstens
82003c3e60 s390/irq: rework irq subclass handling
Let's not add a function for every external interrupt subclass for
which we need reference counting. Just have two register/unregister
functions which have a subclass parameter:

void irq_subclass_register(enum irq_subclass subclass);
void irq_subclass_unregister(enum irq_subclass subclass);

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-04 17:19:13 +02:00
Heiko Carstens
50ce749d0d s390/irq: use hlists for external interrupt handler array
Use hlists for the hashed array of external interrupt handlers.
Reduces the size of the array by 50% (2KB).

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-04 17:19:10 +02:00
Heiko Carstens
8237ac3c4c s390/dumpstack: convert print_symbol to %pSR
This is the same as what other architectures did.
The change has also the advantage that there won't be any interleaving
messages between printk() and print_symbol().

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-04 17:19:07 +02:00
Hendrik Brueckner
ae6834c1e4 s390/perf: Remove print_hex_dump_bytes() debug output
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-04 17:19:04 +02:00
Heiko Carstens
3b459c5421 s390: update defconfig
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-04 17:19:01 +02:00
Heiko Carstens
4784955a52 s390/bpf,jit: fix address randomization
Add misssing braces to hole calculation. This resulted in an addition
instead of an substraction. Which in turn means that the jit compiler
could try to write out of bounds of the allocated piece of memory.

This bug was introduced with aa2d2c73 "s390/bpf,jit: address randomize
and write protect jit code".

Fixes this one:

[   37.320956] Unable to handle kernel pointer dereference at virtual kernel address 000003ff80231000
[   37.320984] Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   37.320993] Modules linked in: dm_multipath scsi_dh eadm_sch dm_mod ctcm fsm autofs4
[   37.321007] CPU: 28 PID: 6443 Comm: multipathd Not tainted 3.10.9-61.x.20130829-s390xdefault #1
[   37.321011] task: 0000004ada778000 ti: 0000004ae3304000 task.ti: 0000004ae3304000
[   37.321014] Krnl PSW : 0704c00180000000 000000000012d1de (bpf_jit_compile+0x198e/0x23d0)
[   37.321022]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 000000004350207d 0000004a00000001 0000000000000007 000003ff80231002
[   37.321029]            0000000000000007 000003ff80230ffe 00000000a7740000 000003ff80230f76
[   37.321032]            000003ffffffffff 000003ff00000000 000003ff0000007d 000000000071e820
[   37.321035]            0000004adbe99950 000000000071ea18 0000004af3d9e7c0 0000004ae3307b80
[   37.321046] Krnl Code: 000000000012d1d0: 41305004            la      %r3,4(%r5)
                          000000000012d1d4: e330f0f80021        clg     %r3,248(%r15)
                         #000000000012d1da: a7240009            brc     2,12d1ec
                         >000000000012d1de: 50805000            st      %r8,0(%r5)
                          000000000012d1e2: e330f0f00004        lg      %r3,240(%r15)
                          000000000012d1e8: 41303004            la      %r3,4(%r3)
                          000000000012d1ec: e380f0e00004        lg      %r8,224(%r15)
                          000000000012d1f2: e330f0f00024        stg     %r3,240(%r15)
[   37.321074] Call Trace:
[   37.321077] ([<000000000012da78>] bpf_jit_compile+0x2228/0x23d0)
[   37.321083]  [<00000000006007c2>] sk_attach_filter+0xfe/0x214
[   37.321090]  [<00000000005d2d92>] sock_setsockopt+0x926/0xbdc
[   37.321097]  [<00000000005cbfb6>] SyS_setsockopt+0x8a/0xe8
[   37.321101]  [<00000000005ccaa8>] SyS_socketcall+0x264/0x364
[   37.321106]  [<0000000000713f1c>] sysc_nr_ok+0x22/0x28
[   37.321113]  [<000003fffce10ea8>] 0x3fffce10ea8
[   37.321118] INFO: lockdep is turned off.
[   37.321121] Last Breaking-Event-Address:
[   37.321124]  [<000000000012d192>] bpf_jit_compile+0x1942/0x23d0
[   37.321132]
[   37.321135] Kernel panic - not syncing: Fatal exception: panic_on_oops

Cc: stable@vger.kernel.org # v3.11
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-04 17:18:55 +02:00
Linus Torvalds
4689550bb2 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core/locking changes from Ingo Molnar:
 "Main changes:

   - another mutex optimization, from Davidlohr Bueso

   - improved lglock lockdep tracking, from Michel Lespinasse

   - [ assorted smaller updates, improvements, cleanups. ]"

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  generic-ipi/locking: Fix misleading smp_call_function_any() description
  hung_task debugging: Print more info when reporting the problem
  mutex: Avoid label warning when !CONFIG_MUTEX_SPIN_ON_OWNER
  mutex: Do not unnecessarily deal with waiters
  mutex: Fix/document access-once assumption in mutex_can_spin_on_owner()
  lglock: Update lockdep annotations to report recursive local locks
  lockdep: Introduce lock_acquire_exclusive()/shared() helper macros
2013-09-04 08:18:19 -07:00
Linus Torvalds
b854e4de0b Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "Main RCU changes this cycle were:

   - Full-system idle detection.  This is for use by Frederic
     Weisbecker's adaptive-ticks mechanism.  Its purpose is to allow the
     timekeeping CPU to shut off its tick when all other CPUs are idle.

   - Miscellaneous fixes.

   - Improved rcutorture test coverage.

   - Updated RCU documentation"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
  nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
  nohz_full: Add full-system-idle state machine
  jiffies: Avoid undefined behavior from signed overflow
  rcu: Simplify _rcu_barrier() processing
  rcu: Make rcutorture emit online failures if verbose
  rcu: Remove unused variable from rcu_torture_writer()
  rcu: Sort rcutorture module parameters
  rcu: Increase rcutorture test coverage
  rcu: Add duplicate-callback tests to rcutorture
  doc: Fix memory-barrier control-dependency example
  rcu: Update RTFP documentation
  nohz_full: Add full-system-idle arguments to API
  nohz_full: Add full-system idle states and variables
  nohz_full: Add per-CPU idle-state tracking
  nohz_full: Add rcu_dyntick data for scalable detection of all-idle state
  nohz_full: Add Kconfig parameter for scalable detection of all-idle state
  nohz_full: Add testing information to documentation
  rcu: Eliminate unused APIs intended for adaptive ticks
  rcu: Select IRQ_WORK from TREE_PREEMPT_RCU
  rculist: list_first_or_null_rcu() should use list_entry_rcu()
  ...
2013-09-04 08:17:12 -07:00
Michal Simek
54ea21f078 microblaze: Show message when reset gpio is not present
Signed-off-by: Michal Simek <monstr@monstr.eu>
2013-09-04 17:01:37 +02:00
Bob Peterson
068213f7d3 GFS2: Remove unnecessary memory barrier
Function test_and_clear_bit implies a memory barrier, so subsequent
memory barriers are unnecessary.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-09-04 15:58:21 +01:00
Stanislaw Gruszka
5a8e01f8fa sched/cputime: Do not scale when utime == 0
scale_stime() silently assumes that stime < rtime, otherwise
when stime == rtime and both values are big enough (operations
on them do not fit in 32 bits), the resulting scaling stime can
be bigger than rtime. In consequence utime = rtime - stime
results in negative value.

User space visible symptoms of the bug are overflowed TIME
values on ps/top, for example:

 $ ps aux | grep rcu
 root         8  0.0  0.0      0     0 ?        S    12:42   0:00 [rcuc/0]
 root         9  0.0  0.0      0     0 ?        S    12:42   0:00 [rcub/0]
 root        10 62422329  0.0  0     0 ?        R    12:42 21114581:37 [rcu_preempt]
 root        11  0.1  0.0      0     0 ?        S    12:42   0:02 [rcuop/0]
 root        12 62422329  0.0  0     0 ?        S    12:42 21114581:35 [rcuop/1]
 root        10 62422329  0.0  0     0 ?        R    12:42 21114581:37 [rcu_preempt]

or overflowed utime values read directly from /proc/$PID/stat

Reference:

  https://lkml.org/lkml/2013/8/20/259

Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: stable@vger.kernel.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20130904131602.GC2564@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-04 16:31:25 +02:00
Christoph Hellwig
02afc27fae direct-io: Handle O_(D)SYNC AIO
Call generic_write_sync() from the deferred I/O completion handler if
O_DSYNC is set for a write request.  Also make sure various callers
don't call generic_write_sync if the direct I/O code returns
-EIOCBQUEUED.

Based on an earlier patch from Jan Kara <jack@suse.cz> with updates from
Jeff Moyer <jmoyer@redhat.com> and Darrick J. Wong <darrick.wong@oracle.com>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-04 09:23:46 -04:00
Christoph Hellwig
7b7a8665ed direct-io: Implement generic deferred AIO completions
Add support to the core direct-io code to defer AIO completions to user
context using a workqueue.  This replaces opencoded and less efficient
code in XFS and ext4 (we save a memory allocation for each direct IO)
and will be needed to properly support O_(D)SYNC for AIO.

The communication between the filesystem and the direct I/O code requires
a new buffer head flag, which is a bit ugly but not avoidable until the
direct I/O code stops abusing the buffer_head structure for communicating
with the filesystems.

Currently this creates a per-superblock unbound workqueue for these
completions, which is taken from an earlier patch by Jan Kara.  I'm
not really convinced about this use and would prefer a "normal" global
workqueue with a high concurrency limit, but this needs further discussion.

JK: Fixed ext4 part, dynamic allocation of the workqueue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-04 09:23:46 -04:00
Joel Fernandes
5622ff1a4d dma: edma: Remove limits on number of slots
With this series, this check is no longer required and
we shouldn't need to reject drivers DMA'ing more than the
MAX number of slots.

Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2013-09-04 18:38:46 +05:30
Joel Fernandes
b267b3bc1e dma: edma: Leave linked to Null slot instead of DUMMY slot
Dummy slot has been used as a way for missed-events not to be
reported as missing. This has been particularly troublesome for cases
where we might want to temporarily pause all incoming events.

For EDMA DMAC, there is no way to do any such pausing of events as
the occurence of the "next" event is not software controlled.
Using "edma_pause" in IRQ handlers doesn't help as by then the event
in concern from the slave is already missed.

Linking a dummy slot, is seen to absorb these events which we didn't
want to miss. So we don't link to dummy, but instead leave it linked
to NULL set, allow an error condition and detect the channel that
missed it.

Consider the case where we have a scatter-list like:
SG1->SG2->SG3->SG4->SG5->SG6->Null

For ex, for a MAX_NR_SG of 2, earlier we were splitting this as:
SG1->SG2->Null
SG3->SG4->Null
SG5->SG6->Null

Now we split it as
SG1->SG2->Null
SG3->SG4->Null
SG5->SG6->Dummy

This approach results in lesser unwanted interrupts that occur
for the last list split. The Dummy slot has the property of not
raising an error condition if events are missed unlike the Null
slot. We are OK with this as we're done with processing the
whole list once we reach Dummy.

Signed-off-by: Joel Fernandes <joelf@ti.com>
[modifed duplicate s-o-b & patch title]
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2013-09-04 18:38:46 +05:30
Joel Fernandes
c5f47990aa dma: edma: Find missed events and issue them
In an effort to move to using Scatter gather lists of any size with
EDMA as discussed at [1] instead of placing limitations on the driver,
we work through the limitations of the EDMAC hardware to find missed
events and issue them.

The sequence of events that require this are:

For the scenario where MAX slots for an EDMA channel is 3:

SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null

The above SG list will have to be DMA'd in 2 sets:

(1) SG1 -> SG2 -> SG3 -> Null
(2) SG4 -> SG5 -> SG6 -> Null

After (1) is succesfully transferred, the events from the MMC controller
donot stop coming and are missed by the time we have setup the transfer
for (2). So here, we catch the events missed as an error condition and
issue them manually.

In the second part of the patch, we make handle the NULL slot cases:
For crypto IP, we continue to receive events even continuously in
NULL slot, the setup of the next set of SG elements happens after
the error handler executes. This is results in some recursion problems.
Due to this, we continously receive error interrupts when we manually
trigger an event from the error handler.

We fix this, by first detecting if the Channel is currently transferring
from a NULL slot or not, that's where the edma_read_slot in the error
callback from interrupt handler comes in. With this we can determine if
the set up of the next SG list has completed, and we manually trigger
only in this case. If the setup has _not_ completed, we are still in NULL
so we just set a missed flag and allow the manual triggerring to happen
in edma_execute which will be eventually called. This fixes the above
mentioned race conditions seen with the crypto drivers.

[1] http://marc.info/?l=linux-omap&m=137416733628831&w=2

Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2013-09-04 18:38:46 +05:30
Joel Fernandes
96874b9a24 ARM: edma: Add function to manually trigger an EDMA channel
Manual trigger for events missed as a result of splitting a
scatter gather list and DMA'ing it in batches. Add a helper
function to trigger a channel incase any such events are missed.

Signed-off-by: Joel Fernandes <joelf@ti.com>
Acked-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2013-09-04 18:38:46 +05:30
Joel Fernandes
534070622d dma: edma: Write out and handle MAX_NR_SG at a given time
Process SG-elements in batches of MAX_NR_SG if they are greater
than MAX_NR_SG. Due to this, at any given time only those many
slots will be used in the given channel no matter how long the
scatter list is. We keep track of how much has been written
inorder to process the next batch of elements in the scatter-list
and detect completion.

For such intermediate transfer completions (one batch of MAX_NR_SG),
make use of pause and resume functions instead of start and stop
when such intermediate transfer is in progress or completed as we
donot want to clear any pending events.

Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2013-09-04 18:38:46 +05:30
Joel Fernandes
6fbe24da82 dma: edma: Setup parameters to DMA MAX_NR_SG at a time
Changes are made here for configuring existing parameters to support
DMA'ing them out in batches as needed.

Also allocate as many as slots as needed by the SG list, but not more
than MAX_NR_SG. Then these slots will be reused accordingly.
For ex, if MAX_NR_SG=10, and number of SG entries is 40, still only
10 slots will be allocated to DMA the entire SG list of size 40.

Also enable TC interrupts for slots that are a last in a current
iteration, or that fall on a MAX_NR_SG boundary.

Signed-off-by: Joel Fernandes <joelf@ti.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2013-09-04 18:38:46 +05:30