Commit Graph

401135 Commits

Author SHA1 Message Date
Arnaldo Carvalho de Melo
46d525eae2 perf test: Update command line callchain attribute tests
The "struct perf_event_attr setup" entry in 'perf test' is in fact a
series of tests that will exec the tools, passing different sets of
command line arguments to then intercept the sys_perf_event_open
syscall, in user space, to check that the perf_event_attr->sample_type
and other feature request bits are setup as expected.

We recently restored the callchain requesting command line argument, -g,
to not require a parameter ("dwarf" or "fp"), instead using a default
("fp" for now) and making the long option variant, --call-chain, be the
one to be used when a different callchain collection method is
preferred.

The "struct perf_event_attr setup" test failed because we forgot to
update the tests involving callchains, not switching from, '-g dwarf' to
'--call-chain dwarf', making 'perf test' detect it:

  [root@sandy ~]# perf test -v 13
  13: struct perf_event_attr setup                           :
  --- start ---
  running '/home/acme/libexec/perf-core/tests/attr/test-record-basic'
  running '/home/acme/libexec/perf-core/tests/attr/test-record-branch-any'
  <SNIP>
  running '/home/acme/libexec/perf-core/tests/attr/test-record-graph-default'
  running '/home/acme/libexec/perf-core/tests/attr/test-record-graph-dwarf'
  expected sample_type=12583, got 295
  expected exclude_callchain_user=1, got 0
  expected sample_stack_user=8192, got 0
  FAILED '/home/acme/libexec/perf-core/tests/attr/test-record-graph-dwarf' - match failure
  ---- end ----
  struct perf_event_attr setup: FAILED!
  [root@sandy ~]#

Fix all of them now to use --call-chain when explicitely specifying a
method.

There is still work to do, as '-g fp', for instance, passed without
problems.

In that case 'perf test' saw no problems as the intercepted syscall got
the bits as expected, i.e. the default is 'fp', but the fact that 'fp'
may be an existing program and the specified workload would then be
passed as a parameter to it is an usability problem that needs fixing.

Next merge window tho.

Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-jr3oq1k5iywnp7vvqlslzydm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-11-01 10:42:57 -03:00
Wei Yang
32bf5bd181 perf bench: Fix two warnings
There are two warnings in bench/numa, when building this on 32-bit
machine.

The warning output is attached:

bench/numa.c:1113:20: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
bench/numa.c:1161:6: error: format ‘%lx’ expects argument of t'long unsigned int’, but argument 5 has type ‘u64’ [-Werror=format]

This patch fixes these two warnings.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Link: http://lkml.kernel.org/r/1379839764-9245-1-git-send-email-weiyang@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-11-01 10:41:54 -03:00
Michael Hudson-Doyle
53805eca3d perf tools: Remove cast of non-variadic function to variadic
The 4fb71074a5 (perf ui/hist: Consolidate hpp helpers) cset introduced
a cast of percent_color_snprintf to a function pointer type with
varargs.  Change percent_color_snprintf to be variadic and remove the
cast.

The symptom of this was all percentages being reported as 0.00% in perf
report --stdio output on the armhf arch.

Signed-off-by: Michael Hudson-Doyle <michael.hudson@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/87zjppvw7y.fsf@canonical.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-11-01 10:40:51 -03:00
Peter Zijlstra
e8a923cc1f perf/x86: Fix NMI measurements
OK, so what I'm actually seeing on my WSM is that sched/clock.c is
'broken' for the purpose we're using it for.

What triggered it is that my WSM-EP is broken :-(

  [    0.001000] tsc: Fast TSC calibration using PIT
  [    0.002000] tsc: Detected 2533.715 MHz processor
  [    0.500180] TSC synchronization [CPU#0 -> CPU#6]:
  [    0.505197] Measured 3 cycles TSC warp between CPUs, turning off TSC clock.
  [    0.004000] tsc: Marking TSC unstable due to check_tsc_sync_source failed

For some reason it consistently detects TSC skew, even though NHM+
should have a single clock domain for 'reasonable' systems.

This marks sched_clock_stable=0, which means that we do fancy stuff to
try and get a 'sane' clock. Part of this fancy stuff relies on the tick,
clearly that's gone when NOHZ=y. So for idle cpus time gets stuck, until
it either wakes up or gets kicked by another cpu.

While this is perfectly fine for the scheduler -- it only cares about
actually running stuff, and when we're running stuff we're obviously not
idle. This does somewhat break down for perf which can trigger events
just fine on an otherwise idle cpu.

So I've got NMIs get get 'measured' as taking ~1ms, which actually
don't last nearly that long:

          <idle>-0     [013] d.h.   886.311970: rcu_nmi_enter <-do_nmi
  ...
          <idle>-0     [013] d.h.   886.311997: perf_sample_event_took: HERE!!! : 1040990

So ftrace (which uses sched_clock(), not the fancy bits) only sees
~27us, but we measure ~1ms !!

Now since all this measurement stuff lives in x86 code, we can actually
fix it.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mingo@kernel.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Link: http://lkml.kernel.org/r/20131017133350.GG3364@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:01:20 +01:00
Peter Zijlstra
bf378d341e perf: Fix perf ring buffer memory ordering
The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.

When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.

Reported-by: Victor Kaplansky <victork@il.ibm.com>
Tested-by: Victor Kaplansky <victork@il.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: michael@ellerman.id.au
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:01:19 +01:00
Ingo Molnar
cd65718712 Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

 * Add color overhead for stdio output buffer, which fixes
   --stdio output being chopped up on the hot (red) entries,
   fix from Jiri Olsa.

 * Get 'perf record -g -a sleep 1' working again, removing the
   need for -- separating perf options from the workload, restoring
   ages old behaviour, fix from Jiri Olsa.
   More patches allowing ~/.perfconfig setting up of default
   callchain collecting method ("fp" or "dwarf") left for next
   merge window.

 * Fixup mmap event consumption, where we were acking the
   consumption by writing the tail before actually accessing
   the event, which could lead to using overwritten records
   in things like 'perf record --call-graph'. From Zhouyi Zhou.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-29 09:06:07 +01:00
Zhouyi Zhou
8e50d384cc perf tools: Fixup mmap event consumption
The tail position of the event buffer should only be modified after
actually use that event.

If not the event buffer could be invalid before use, and segment fault
occurs when invoking perf top -G.

Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn>
Cc: David Ahern <dsahern@gmail.com>
Cc: Zhouyi Zhou <yizhouzhou@ict.ac.cn>
Link: http://lkml.kernel.org/r/1382600613-32177-1-git-send-email-zhouzhouyi@gmail.com
[ Simplified the logic using exit gotos and renamed write_tail method to mmap_consume ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-28 16:06:00 -03:00
Jiri Olsa
ae779a6309 perf top: Split -G and --call-graph
Splitting -G and --call-graph for record command, so we could use '-G'
with no option.

The '-G' option now takes NO argument and enables the configured unwind
method, which is currently the frame pointers method.

It will be possible to configure unwind method via config file in
upcoming patches.

All current '-G' arguments is overtaken by --call-graph option.

NOTE: The documentation for top --call-graph option
      was wrongly copied from report command.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: David Ahern <dsahern@gmail.com>
Tested-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1382797536-32303-3-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-28 16:06:00 -03:00
Jiri Olsa
09b0fd45ff perf record: Split -g and --call-graph
Splitting -g and --call-graph for record command, so we could use '-g'
with no option.

The '-g' option now takes NO argument and enables the configured unwind
method, which is currently the frame pointers method.

It will be possible to configure unwind method via config file in
upcoming patches.

All current '-g' arguments is overtaken by --call-graph option.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: David Ahern <dsahern@gmail.com>
Tested-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1382797536-32303-2-git-send-email-jolsa@redhat.com
[ reordered -g/--call-graph on --help and expanded the man page
  according to comments by David Ahern and Namhyung Kim ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-28 16:05:59 -03:00
Jiri Olsa
9754c4f9b2 perf hists: Add color overhead for stdio output buffer
Following commit tightened up the buffer size for output to strict width
of used format columns:

  99cf666 perf hists: Fix formatting of long symbol names

This works fine until you hit color overhead output which places extra
bytes into output buffer. We need to account for color overhead in the
output buffer. Adding maximum color byte size to the output buffer size.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1382700293-1803-1-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-28 16:05:59 -03:00
Ingo Molnar
d17cccbea9 Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

* Fix up /proc/PID/maps parsing, where perfectly fine mmap entries
  were being trown away when synthesizing PERF_RECORD_MMAP for
  preexisting threads, prevenging symbol resolution to work
  for those threads, broken in the MMAP2 removal. Reported and
  pinpointed by Markus Trippelsdorf,

* Fix mem leak in the python 'perf script' backend, due to missing Py_DECREFs
  on dict entries, fix from Joseph Schuchart.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-28 15:56:50 +01:00
Arnaldo Carvalho de Melo
2fd869f08a perf tools: Fix up /proc/PID/maps parsing
When introducing support for MMAP2 we considered more parts of each map
representation in /proc/PID/maps, and when disabling it we forgot to
reduce the number of expected parsed/assigned entries in the sscanf
call, fix it to expect the right number of desired fields, 5.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Based-on-a-patch-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-vrbo1wik997ahjzl1chm3bdm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-28 09:38:12 -03:00
Linus Torvalds
959f58544b Linux 3.12-rc7 2013-10-27 16:12:03 -07:00
Linus Torvalds
a2ff82065b Merge branch 'parisc-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
 "This is a 2-line patch to save the CPU register which holds our task
  thread info pointer before calling a firmware function and then to
  restore it again afterwards.

  This is necessary because on some 64bit machines the high-order 32bits
  are being clobbered by the firmware call, and thus we failed to bring
  up secondary CPUs (and instead crashed the kernel) in some situations
  eg if we had more than 4GB RAM.  This patch fixes a bug which has been
  since ever in the parisc linux kernel and which prevented some people
  to use a 64bit kernel"

* 'parisc-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Do not crash 64bit SMP kernels on machines with >= 4GB RAM
2013-10-27 10:45:00 -07:00
Linus Torvalds
aff22d3f1a Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
 "This tree contains a clockevents regression fix for certain ARM
  subarchitectures"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clockevents: Sanitize ticks to nsec conversion
2013-10-27 10:29:25 -07:00
Linus Torvalds
e2756f5e0f Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "The tree contains three fixes:

   - Two tooling fixes

   - Reversal of the new 'MMAP2' extended mmap record ABI, introduced in
     this merge window.  (Patches were proposed to fix it but it was all
     a bit late and we felt it's safer to just delay the ABI one more
     kernel release and do it right)"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Disable PERF_RECORD_MMAP2 support
  perf scripting perl: Fix build error on Fedora 12
  perf probe: Fix to initialize fname always before use it
2013-10-27 10:28:35 -07:00
Linus Torvalds
1c99ca43a4 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
 "This tree fixes a boot crash in CONFIG_DEBUG_MUTEXES=y kernels, on
  kernels built with GCC 3.x (there are still such distros)"

Side note: it's not just a fix for old gcc versions, it's also removing
an incredibly broken/subtle check that LLVM had issues with, and that
made no sense.

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  mutex: Avoid gcc version dependent __builtin_constant_p() usage
2013-10-27 10:18:15 -07:00
Linus Torvalds
acda24c47e Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "Here are the outstanding target pending fixes for v3.12-rc7.

  This includes a number of EXTENDED_COPY related fixes as a result of
  Thomas and Doug's continuing testing and feedback.

  Also included is an important vhost/scsi fix that addresses a long
  standing issue where the 'write' parameter for get_user_pages_fast()
  was incorrectly set for virtio-scsi WRITEs -> DMA_TO_DEVICE, and not
  for virtio-scsi READs -> DMA_FROM_DEVICE.

  This resulted in random userspace segfaults and other unpleasantness
  on KVM host, and unfortunately has been an issue since the initial
  merge of vhost/scsi in v3.6.  This patch is CC'ed to stable, along
  with two other less critical items"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  vhost/scsi: Fix incorrect usage of get_user_pages_fast write parameter
  target/pscsi: fix return value check
  target: Fail XCOPY for non matching source + destination block_size
  target: Generate failure for XCOPY I/O with non-zero scsi_status
  target: Add missing XCOPY I/O operation sense_buffer
  iser-target: check device before dereferencing its variable
  target: Return an error for WRITE SAME with ANCHOR==1
  target: Fix assignment of LUN in tracepoints
  target: Reject EXTENDED_COPY when emulate_3pc is disabled
  target: Allow non zero ListID in EXTENDED_COPY parameter list
  target: Make target_do_xcopy failures return INVALID_PARAMETER_LIST
2013-10-27 10:16:33 -07:00
Linus Torvalds
63e656083d Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave-dmaengine fixes from Vinod Koul:
 "Here is the late fixes pull request for dmaengine while you fly back
  from KS.

  We have a new dmaengine ML hosted by vger so a patch for that along
  with addition of Dave as driver mainatainer for ioat.  Other fixes are
  memeory leak fixes on edma driver, small fixes on rcar-hpbdma driver
  by Sergei"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: edma: fix another memory leak
  dma: edma: Fix memory leak
  MAINTAINERS: add to ioatdma maintainer list
  MAINTAINERS: add the new dmaengine mailing list
2013-10-27 10:13:03 -07:00
Helge Deller
54e181e073 parisc: Do not crash 64bit SMP kernels on machines with >= 4GB RAM
Since the beginning of the parisc-linux port, sometimes 64bit SMP kernels were
not able to bring up other CPUs than the monarch CPU and instead crashed the
kernel.  The reason was unclear, esp. since it involved various machines (e.g.
J5600, J6750 and SuperDome). Testing showed, that those crashes didn't happened
when less than 4GB were installed, or if a 32bit Linux kernel was booted.

In the end, the fix for those SMP problems is trivial:
During the early phase of the initialization of the CPUs, including the monarch
CPU, the PDC_PSW firmware function to enable WIDE (=64bit) mode is called.
It's documented that this firmware function may clobber various registers, and
one one of those possibly clobbered registers is %cr30 which holds the task
thread info pointer.

Now, if %cr30 would always have been clobbered, then this bug would have been
detected much earlier. But lots of testing finally showed, that - at least for
%cr30 - on some machines only the upper 32bits of the 64bit register suddenly
turned zero after the firmware call.

So, after finding the root cause, the explanation for the various crashes
became clear:
- On 32bit SMP Linux kernels all upper 32bit were zero, so we didn't faced this
  problem.
- Monarch CPUs in 64bit mode always booted sucessfully, because the inital task
  thread info pointer was below 4GB.
- Secondary CPUs booted sucessfully on machines with less than 4GB RAM because
  the upper 32bit were zero anyay.
- Secondary CPus failed to boot if we had more than 4GB RAM and the task thread
  info pointer was located above the 4GB boundary.

Finally, the patch to fix this problem is trivial by saving the %cr30 register
before the firmware call and restoring it afterwards.

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # 2.6.12+
Signed-off-by: Helge Deller <deller@gmx.de>
2013-10-27 15:58:44 +01:00
Linus Torvalds
20582e34c8 Merge tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from
 "These fix two bugs in the intel_pstate driver, a hibernate bug leading
  to nasty resume failures sometimes and acpi-cpufreq initialization bug
  that causes problems to happen during module unload when intel_pstate
  is in use.

  Specifics:

   - Fix for rounding errors in intel_pstate causing CPU utilization to
     be underestimated from Brennan Shacklett.

   - intel_pstate fix to always use the correct max pstate value when
     computing the min pstate from Dirk Brandewie.

   - Hibernation fix for deadlocking resume in cases when the probing of
     the device containing the image is deferred from Russ Dill.

   - acpi-cpufreq fix to prevent the module from staying in memory when
     the driver cannot be registered and then attempting to unregister
     things that have never been registered on exit"

* tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  acpi-cpufreq: Fail initialization if driver cannot be registered
  PM / hibernate: Move software_resume to late_initcall_sync
  intel_pstate: Correct calculation of min pstate value
  intel_pstate: Improve accuracy by not truncating until final result
2013-10-26 04:38:47 +01:00
Linus Torvalds
d255c59aab Merge tag 'for-linus-20131025' of git://git.infradead.org/linux-mtd
Pull final mtd fixes from Brian Norris:
 "A few more last-minute regression fixes, prepared jointly by me and
  David Woodhouse:

   - Revert pxa3xx to its old name to avoid breaking existing
     'mtdparts=' boot strings.

   - Return GPMI NAND to its legacy ECC layout for backwards
     compatibility.  We will revisit this in 3.13.

  A note from David on the latter fix: 'This leaves a harmless cosmetic
  warning about an unused function.  At this point in the cycle I really
  don't care.'"

* tag 'for-linus-20131025' of git://git.infradead.org/linux-mtd:
  mtd: gpmi: fix ECC regression
  mtd: nand: pxa3xx: Fix registered MTD name
2013-10-25 20:15:13 +01:00
Nicholas Bellinger
60a01f558a vhost/scsi: Fix incorrect usage of get_user_pages_fast write parameter
This patch addresses a long-standing bug where the get_user_pages_fast()
write parameter used for setting the underlying page table entry permission
bits was incorrectly set to write=1 for data_direction=DMA_TO_DEVICE, and
passed into get_user_pages_fast() via vhost_scsi_map_iov_to_sgl().

However, this parameter is intended to signal WRITEs to pinned userspace
PTEs for the virtio-scsi DMA_FROM_DEVICE -> READ payload case, and *not*
for the virtio-scsi DMA_TO_DEVICE -> WRITE payload case.

This bug would manifest itself as random process segmentation faults on
KVM host after repeated vhost starts + stops and/or with lots of vhost
endpoints + LUNs.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Asias He <asias@redhat.com>
Cc: <stable@vger.kernel.org> # 3.6+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-25 11:03:34 -07:00
Wei Yongjun
58932e96e4 target/pscsi: fix return value check
In case of error, the function scsi_host_lookup() returns NULL
pointer not ERR_PTR(). The IS_ERR() test in the return value check
should be replaced with NULL test.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-25 10:42:09 -07:00
Linus Torvalds
f55ac56d5e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes (try two) from Al Viro:
 "nfsd performance regression fix + seq_file lseek(2) fix"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  seq_file: always update file->f_pos in seq_lseek()
  nfsd regression since delayed fput()
2013-10-25 18:16:47 +01:00
David Woodhouse
031e2777e0 mtd: gpmi: fix ECC regression
The "legacy" ECC layout used until 3.12-rc1 uses all the OOB area by
computing the ECC strength and ECC step size ourselves.

Commit 2febcdf84b ("mtd: gpmi: set the BCHs geometry with the ecc info")
makes the driver use the ECC info (ECC strength and ECC step size)
provided by the MTD code, and creates a different NAND ECC layout
for the BCH, and use the new ECC layout. This causes a regression:

   We can not mount the ubifs which was created by the old NAND ECC layout.

This patch fixes this issue by reverting to the legacy ECC layout.

We will probably introduce a new device-tree property to indicate that
the new ECC layout can be used. For now though, for the imminent 3.12
release, we just unconditionally revert to the 3.11 behaviour.

This leaves a harmless cosmetic warning about an unused function. At
this point in the cycle I really don't care.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Acked-by: Huang Shijie <b32955@freescale.com>
Acked-by: Marek Vasut <marex@denx.de>
Tested-by: Marek Vasut <marex@denx.de>
2013-10-25 10:09:43 -07:00
Gu Zheng
05e16745c0 seq_file: always update file->f_pos in seq_lseek()
This issue was first pointed out by Jiaxing Wang several months ago, but no
further comments:
https://lkml.org/lkml/2013/6/29/41

As we know pread() does not change f_pos, so after pread(), file->f_pos
and m->read_pos become different. And seq_lseek() does not update file->f_pos
if offset equals to m->read_pos, so after pread() and seq_lseek()(lseek to
m->read_pos), then a subsequent read may read from a wrong position, the
following program produces the problem:

    char str1[32] = { 0 };
    char str2[32] = { 0 };
    int poffset = 10;
    int count = 20;

    /*open any seq file*/
    int fd = open("/proc/modules", O_RDONLY);

    pread(fd, str1, count, poffset);
    printf("pread:%s\n", str1);

    /*seek to where m->read_pos is*/
    lseek(fd, poffset+count, SEEK_SET);

    /*supposed to read from poffset+count, but this read from position 0*/
    read(fd, str2, count);
    printf("read:%s\n", str2);

out put:
pread:
 ck_netbios_ns 12665
read:
 nf_conntrack_netbios

/proc/modules:
nf_conntrack_netbios_ns 12665 0 - Live 0xffffffffa038b000
nf_conntrack_broadcast 12589 1 nf_conntrack_netbios_ns, Live 0xffffffffa0386000

So we always update file->f_pos to offset in seq_lseek() to fix this issue.

Signed-off-by: Jiaxing Wang <hello.wjx@gmail.com>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-25 10:46:40 -04:00
Rafael J. Wysocki
75c0758137 acpi-cpufreq: Fail initialization if driver cannot be registered
Make acpi_cpufreq_init() return error codes when the driver cannot be
registered so that the module doesn't stay useless in memory and so
that acpi_cpufreq_exit() doesn't attempt to unregister things that
have never been registered when the module is unloaded.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2013-10-25 16:22:47 +02:00
Linus Torvalds
4208c47199 Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
 "There's really only one bugfix in this branch, which is a fix for
  timers on the integrator platform.  Since Linus Walleij is
  resurrecting support for the platform it seems valuable to get the fix
  into 3.12 even though the regression has been around a while.

  The rest are a handful of maintainers updates.  If you prefer to hold
  those until 3.13 then just merge the first patch on the branch which
  is the fix"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  MAINTAINERS: Add maintainers entry for Rockchip SoCs
  MAINTAINERS: Tegra updates, and driver ownership
  MAINTAINERS: ARM: mvebu: add Sebastian Hesselbarth
  ARM: integrator: deactivate timer0 on the Integrator/CP
2013-10-25 11:49:23 +01:00
Linus Torvalds
88829dfe4b Merge tag 'ecryptfs-3.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull ecryptfs fixes from Tyler Hicks:
 "Two important fixes
   - Fix long standing memory leak in the (rarely used) public key
     support
   - Fix large file corruption on 32 bit architectures"

* tag 'ecryptfs-3.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: fix 32 bit corruption issue
  ecryptfs: Fix memory leakage in keystore.c
2013-10-25 07:32:01 +01:00
Russ Dill
d3c345dbc7 PM / hibernate: Move software_resume to late_initcall_sync
software_resume is being called after deferred_probe_initcall in
drivers base. If the probing of the device that contains the resume
image is deferred, and the system has been instructed to wait for
it to show up, this wait will occur in software_resume. This causes
a deadlock.

Move software_resume into late_initcall_sync so that it happens
after all the other late_initcalls.

Signed-off-by: Russ Dill <Russ.Dill@ti.com>
Acked-by: Pavel Machek <Pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-25 01:58:49 +02:00
Ezequiel Garcia
18a84e935e mtd: nand: pxa3xx: Fix registered MTD name
In a recent commit:

  commit f455578dd9
  Author: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
  Date:   Mon Aug 12 14:14:53 2013 -0300

  mtd: nand: pxa3xx: Remove hardcoded mtd name

  There's no advantage in using a hardcoded name for the mtd device.
  Instead use the provided by the platform_device.

The MTD name was changed to use the one provided by the platform_device.
However, this can be problematic as some users want to set partitions
using the kernel parameter 'mtdparts', where the name is needed.

Therefore, to avoid regressions in users relying in 'mtdparts' we revert
the change and use the previous one 'pxa3xx_nand-0'.

While at it, let's put a big comment and prevent this change from happening
ever again.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2013-10-24 14:44:28 -07:00
Colin Ian King
43b7c6c6a4 eCryptfs: fix 32 bit corruption issue
Shifting page->index on 32 bit systems was overflowing, causing
data corruption of > 4GB files. Fix this by casting it first.

https://launchpad.net/bugs/1243636

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reported-by: Lars Duesing <lars.duesing@camelotsweb.de>
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2013-10-24 12:36:30 -07:00
Vinod Koul
7261828776 dmaengine: edma: fix another memory leak
commit 4b6271a6 fix a menory leak but one more existed in driver so fix that

Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2013-10-24 22:17:50 +05:30
Valentin Ilie
4b6271a644 dma: edma: Fix memory leak
When it fails to allocate a slot, edesc should be free'd before return;

Signed-off-by: Valentin Ilie <valentin.ilie@gmail.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2013-10-24 22:16:15 +05:30
Joseph Schuchart
c0268e8d1f perf script python: Fix mem leak due to missing Py_DECREFs on dict entries
We are using the Python scripting interface in perf to extract kernel
events relevant for performance analysis of HPC codes. We noticed that
the "perf script" call allocates a significant amount of memory (in the
order of several 100 MiB) during it's run, e.g. 125 MiB for a 25 MiB
input file:

  $> perf record -o perf.data -a -R -g fp \
       -e power:cpu_frequency -e sched:sched_switch \
       -e sched:sched_migrate_task -e sched:sched_process_exit \
       -e sched:sched_process_fork -e sched:sched_process_exec \
       -e cycles  -m 4096 --freq 4000
  $> /usr/bin/time perf script -i perf.data -s dummy_script.py
  0.84user 0.13system 0:01.92elapsed 51%CPU (0avgtext+0avgdata
  125532maxresident)k
  73072inputs+0outputs (57major+33086minor)pagefaults 0swaps

Upon further investigation using the valgrind massif tool, we noticed
that Python objects that are created in trace-event-python.c via
PyString_FromString*() (and their Integer and Long counterparts) are
never free'd.

The reason for this seem to be missing Py_DECREF calls on the objects
that are returned by these functions and stored in the Python
dictionaries. The Python dictionaries do not steal references (as
opposed to Python tuples and lists) but instead add their own reference.

Hence, the reference that is returned by these object creation functions
is never released and the memory is leaked. (see [1,2])

The attached patch fixes this by wrapping all relevant calls to
PyDict_SetItemString() and decrementing the reference counter
immediately after the Python function call.

This reduces the allocated memory to a reasonable amount:

  $> /usr/bin/time perf script -i perf.data -s dummy_script.py
  0.73user 0.05system 0:00.79elapsed 99%CPU (0avgtext+0avgdata
  49132maxresident)k
  0inputs+0outputs (0major+14045minor)pagefaults 0swaps

For comparison, with a 120 MiB input file the memory consumption
reported by time drops from almost 600 MiB to 146 MiB.

The patch has been tested using Linux 3.8.2 with Python 2.7.4 and Linux
3.11.6 with Python 2.7.5.

Please let me know if you need any further information.

[1] http://docs.python.org/2/c-api/tuple.html#PyTuple_SetItem
[2] http://docs.python.org/2/c-api/dict.html#PyDict_SetItemString

Signed-off-by: Joseph Schuchart <joseph.schuchart@tu-dresden.de>
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: http://lkml.kernel.org/r/1381468543-25334-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-24 10:16:54 -03:00
Nicholas Bellinger
48502ddbfb target: Fail XCOPY for non matching source + destination block_size
This patch adds an explicit check + failure for XCOPY I/O to source +
destination devices with a non-matching block_size.

This limitiation is currently due to the fact that the scatterlist
memory allocated for the XCOPY READ operation is passed zero-copy
to the XCOPY WRITE operation.

Reported-by: Thomas Glanzmann <thomas@glanzmann.de>
Reported-by: Douglas Gilbert <dgilbert@interlog.com>
Cc: Thomas Glanzmann <thomas@glanzmann.de>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-24 00:28:52 -07:00
Nicholas Bellinger
8a955d6dcc target: Generate failure for XCOPY I/O with non-zero scsi_status
This patch adds the missing non-zero se_cmd->scsi_status check required
for local XCOPY I/O within target_xcopy_issue_pt_cmd() to signal an
exception case failure.

This will trigger the generation of SAM_STAT_CHECK_CONDITION status
from within target_xcopy_do_work() process context code.

Reported-by: Thomas Glanzmann <thomas@glanzmann.de>
Reported-by: Douglas Gilbert <dgilbert@interlog.com>
Cc: Thomas Glanzmann <thomas@glanzmann.de>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-24 00:28:27 -07:00
Nicholas Bellinger
366bda191c target: Add missing XCOPY I/O operation sense_buffer
This patch adds the missing xcopy_pt_cmd->sense_buffer[] required for
correctly handling CHECK_CONDITION exceptions within the locally
generated XCOPY I/O path.

Also update target_xcopy_read_source() + target_xcopy_setup_pt_cmd()
to pass this buffer into transport_init_se_cmd() to correctly setup
se_cmd->sense_buffer.

Reported-by: Thomas Glanzmann <thomas@glanzmann.de>
Reported-by: Douglas Gilbert <dgilbert@interlog.com>
Cc: Thomas Glanzmann <thomas@glanzmann.de>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-24 00:28:08 -07:00
Linus Torvalds
e6036c0b88 Merge tag 'md/3.12-fixes' of git://neil.brown.name/md
Pull md bugfixes from Neil Brown:
 "Assorted md bug-fixes for 3.12.

  All tagged for -stable releases too"

* tag 'md/3.12-fixes' of git://neil.brown.name/md:
  raid5: avoid finding "discard" stripe
  raid5: set bio bi_vcnt 0 for discard request
  md: avoid deadlock when md_set_badblocks.
  md: Fix skipping recovery for read-only arrays.
2013-10-24 07:45:34 +01:00
Linus Torvalds
be6e8c7604 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "This is a set of two fixes which cause oopses (Buslogic, qla2xxx) and
 one fix which may cause a hang because of request miscounting (sd)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  [SCSI] sd: call blk_pm_runtime_init before add_disk
  [SCSI] qla2xxx: Fix request queue null dereference.
  [SCSI] BusLogic: Fix an oops when intializing multimaster adapter
2013-10-24 07:44:47 +01:00
Vu Pham
0a66614b93 iser-target: check device before dereferencing its variable
This patch changes isert_connect_release() to correctly check for
the existence struct isert_device *device before checking for
isert_device->use_frwr.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-23 21:42:33 -07:00
Shaohua Li
d47648fcf0 raid5: avoid finding "discard" stripe
SCSI discard will damage discard stripe bio setting, eg, some fields are
changed. If the stripe is reused very soon, we have wrong bios setting. We
remove discard stripe from hash list, so next time the strip will be fully
initialized.

Suitable for backport to 3.7+.

Cc: <stable@vger.kernel.org> (3.7+)
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-10-24 13:00:24 +11:00
Shaohua Li
37c61ff31e raid5: set bio bi_vcnt 0 for discard request
SCSI layer will add new payload for discard request. If two bios are merged
to one, the second bio has bi_vcnt 1 which is set in raid5. This will confuse
SCSI and cause oops.

Suitable for backport to 3.7+

Cc: stable@vger.kernel.org (v3.7+)
Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
2013-10-24 12:57:36 +11:00
Bian Yu
905b0297a9 md: avoid deadlock when md_set_badblocks.
When operate harddisk and hit errors, md_set_badblocks is called after
scsi_restart_operations which already disabled the irq. but md_set_badblocks
will call write_sequnlock_irq and enable irq. so softirq can preempt the
current thread and that may cause a deadlock. I think this situation should
use write_sequnlock_irqsave/irqrestore instead.

I met the situation and the call trace is below:
[  638.919974] BUG: spinlock recursion on CPU#0, scsi_eh_13/1010
[  638.921923]  lock: 0xffff8800d4d51fc8, .magic: dead4ead, .owner: scsi_eh_13/1010, .owner_cpu: 0
[  638.923890] CPU: 0 PID: 1010 Comm: scsi_eh_13 Not tainted 3.12.0-rc5+ #37
[  638.925844] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS 4.6.5 03/05/2013
[  638.927816]  ffff880037ad4640 ffff880118c03d50 ffffffff8172ff85 0000000000000007
[  638.929829]  ffff8800d4d51fc8 ffff880118c03d70 ffffffff81730030 ffff8800d4d51fc8
[  638.931848]  ffffffff81a72eb0 ffff880118c03d90 ffffffff81730056 ffff8800d4d51fc8
[  638.933884] Call Trace:
[  638.935867]  <IRQ>  [<ffffffff8172ff85>] dump_stack+0x55/0x76
[  638.937878]  [<ffffffff81730030>] spin_dump+0x8a/0x8f
[  638.939861]  [<ffffffff81730056>] spin_bug+0x21/0x26
[  638.941836]  [<ffffffff81336de4>] do_raw_spin_lock+0xa4/0xc0
[  638.943801]  [<ffffffff8173f036>] _raw_spin_lock+0x66/0x80
[  638.945747]  [<ffffffff814a73ed>] ? scsi_device_unbusy+0x9d/0xd0
[  638.947672]  [<ffffffff8173fb1b>] ? _raw_spin_unlock+0x2b/0x50
[  638.949595]  [<ffffffff814a73ed>] scsi_device_unbusy+0x9d/0xd0
[  638.951504]  [<ffffffff8149ec47>] scsi_finish_command+0x37/0xe0
[  638.953388]  [<ffffffff814a75e8>] scsi_softirq_done+0xa8/0x140
[  638.955248]  [<ffffffff8130e32b>] blk_done_softirq+0x7b/0x90
[  638.957116]  [<ffffffff8104fddd>] __do_softirq+0xfd/0x330
[  638.958987]  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
[  638.960861]  [<ffffffff8174a5cc>] call_softirq+0x1c/0x30
[  638.962724]  [<ffffffff81004c7d>] do_softirq+0x8d/0xc0
[  638.964565]  [<ffffffff8105024e>] irq_exit+0x10e/0x150
[  638.966390]  [<ffffffff8174ad4a>] smp_apic_timer_interrupt+0x4a/0x60
[  638.968223]  [<ffffffff817499af>] apic_timer_interrupt+0x6f/0x80
[  638.970079]  <EOI>  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
[  638.971899]  [<ffffffff8173fa6a>] ? _raw_spin_unlock_irq+0x3a/0x50
[  638.973691]  [<ffffffff8173fa60>] ? _raw_spin_unlock_irq+0x30/0x50
[  638.975475]  [<ffffffff81562393>] md_set_badblocks+0x1f3/0x4a0
[  638.977243]  [<ffffffff81566e07>] rdev_set_badblocks+0x27/0x80
[  638.978988]  [<ffffffffa00d97bb>] raid5_end_read_request+0x36b/0x4e0 [raid456]
[  638.980723]  [<ffffffff811b5a1d>] bio_endio+0x1d/0x40
[  638.982463]  [<ffffffff81304ff3>] req_bio_endio.isra.65+0x83/0xa0
[  638.984214]  [<ffffffff81306b9f>] blk_update_request+0x7f/0x350
[  638.985967]  [<ffffffff81306ea1>] blk_update_bidi_request+0x31/0x90
[  638.987710]  [<ffffffff813085e0>] __blk_end_bidi_request+0x20/0x50
[  638.989439]  [<ffffffff8130862f>] __blk_end_request_all+0x1f/0x30
[  638.991149]  [<ffffffff81308746>] blk_peek_request+0x106/0x250
[  638.992861]  [<ffffffff814a62a9>] ? scsi_kill_request.isra.32+0xe9/0x130
[  638.994561]  [<ffffffff814a633a>] scsi_request_fn+0x4a/0x3d0
[  638.996251]  [<ffffffff813040a7>] __blk_run_queue+0x37/0x50
[  638.997900]  [<ffffffff813045af>] blk_run_queue+0x2f/0x50
[  638.999553]  [<ffffffff814a5750>] scsi_run_queue+0xe0/0x1c0
[  639.001185]  [<ffffffff814a7721>] scsi_run_host_queues+0x21/0x40
[  639.002798]  [<ffffffff814a2e87>] scsi_restart_operations+0x177/0x200
[  639.004391]  [<ffffffff814a4fe9>] scsi_error_handler+0xc9/0xe0
[  639.005996]  [<ffffffff814a4f20>] ? scsi_unjam_host+0xd0/0xd0
[  639.007600]  [<ffffffff81072f6b>] kthread+0xdb/0xe0
[  639.009205]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170
[  639.010821]  [<ffffffff81748cac>] ret_from_fork+0x7c/0xb0
[  639.012437]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170

This bug was introduce in commit  2e8ac30312
(the first time rdev_set_badblock was call from interrupt context),
so this patch is appropriate for 3.5 and subsequent kernels.

Cc: <stable@vger.kernel.org> (3.5+)
Signed-off-by: Bian Yu <bianyu@kedacom.com>
Reviewed-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-10-24 12:57:11 +11:00
Lukasz Dorau
61e4947c99 md: Fix skipping recovery for read-only arrays.
Since:
        commit 7ceb17e87b
        md: Allow devices to be re-added to a read-only array.

spares are activated on a read-only array. In case of raid1 and raid10
personalities it causes that not-in-sync devices are marked in-sync
without checking if recovery has been finished.

If a read-only array is degraded and one of its devices is not in-sync
(because the array has been only partially recovered) recovery will be skipped.

This patch adds checking if recovery has been finished before marking a device
in-sync for raid1 and raid10 personalities. In case of raid5 personality
such condition is already present (at raid5.c:6029).

Bug was introduced in 3.10 and causes data corruption.

Cc: stable@vger.kernel.org
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-10-24 12:55:17 +11:00
Dave Jiang
18ebd564e4 MAINTAINERS: add to ioatdma maintainer list
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
[djbw: add dmaengine list]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2013-10-23 21:51:18 +05:30
Vinod Koul
17b59560ef MAINTAINERS: add the new dmaengine mailing list
We have a new mailing list hosted by vger for dmaengine

Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2013-10-23 21:48:05 +05:30
Aaron Lu
10c580e423 [SCSI] sd: call blk_pm_runtime_init before add_disk
Sujit has found a race condition that would make q->nr_pending
unbalanced, it occurs as Sujit explained:

"
sd_probe_async() ->
	add_disk() ->
		disk_add_event() ->
			schedule(disk_events_workfn)
	sd_revalidate_disk()
	blk_pm_runtime_init()
return;

Let's say the disk_events_workfn() calls sd_check_events() which tries
to send test_unit_ready() and because of sd_revalidate_disk() trying to
send another commands the test_unit_ready() might be re-queued as the
tagged command queuing is disabled.

So the race condition is -

Thread 1 			  |		Thread 2
sd_revalidate_disk()		  |	sd_check_events()
...nr_pending = 0 as q->dev = NULL|	scsi_queue_insert()
blk_runtime_pm_init()		  | 	blk_pm_requeue_request() ->
				  |	nr_pending = -1 since
				  |	q->dev != NULL
"

The problem is, the test_unit_ready request doesn't get counted the
first time it is queued, so the later decrement of q->nr_pending in
blk_pm_requeue_request makes it unbalanced.

Fix this by calling blk_pm_runtime_init before add_disk so that all
requests initiated there will all be counted.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reported-and-tested-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-23 14:09:18 +01:00
Chad Dupuis
36008cf118 [SCSI] qla2xxx: Fix request queue null dereference.
If an invalid IOCB is returned on the response queue then the index into the
request queue map could be invalid and could return to us a bogus value. This
could cause us to try to deference an invalid pointer and cause an exception.

If we encounter this condition, simply return as no context can be established
for this response.

Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-23 14:09:18 +01:00