The gpmi-nand driver uses virt_addr_valid() to check whether a buffer
is suitable for dma. If it's not, a driver allocated buffer is used
instead. Then after a page read the driver allocated buffer must be
copied to the user supplied buffer. This does not happen since commit
7725cc8593.
This patch fixes the issue. The bug is encountered with UBI which uses a
vmalloced buffer for the volume table.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Tested-by: snijsure@grid-net.com
Acked-by: Huang Shijie <b32955@freescale.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The following commit changes the function used to copy from/to
the hardware buffer to memcpy_[from|to]io. This does not work
since the hardware cannot handle the byte accesses used by these
functions. Instead of reverting this patch introduce 32bit
correspondents of these functions.
| commit 5775ba36ea9c760c2d7e697dac04f2f7fc95aa62
| Author: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
| Date: Tue Apr 24 10:05:22 2012 +0200
|
| mtd: mxc_nand: fix several sparse warnings about incorrect address space
|
| Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
| Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The intent here was clearly to set result to true if the 0x40000000 flag
was set. But instead there was a | vs & typo and we always set result
to true.
Artem: check the spec at
wiki.laptop.org/images/5/5c/88ALP01_Datasheet_July_2007.pdf
and this fix looks correct.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
In the new PCM streaming logic, the interface number is assigned to
usb stream instance (subs->interface) after the format and rate setups
are succeeded, but some codes are still passing subs->interface as the
reference to helper functions. This leads to initializing with an
invalid iface number (-1).
This patch replaces the wrong references with the ones from the target
fmt correctly.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Kevin discovered that commit c8d82ff68f
("ARM: OMAP2/3: hwmod data: Add 32k-sync timer data to hwmod
database") broke CORE idle on OMAP3. This prevents device low power
states.
The root cause is that the 32K sync timer IP block does not support
smart-idle mode[1], and so the hwmod code keeps the IP block in
no-idle mode while it is active. This in turn prevents the WKUP
clockdomain from transitioning to idle. There is a hardcoded sleep
dependency that prevents the CORE_L3 and CORE_CM clockdomains from
transitioning to idle when the WKUP clockdomain is active[2], so the
chip cannot enter any device low power states.
It turns out that there is no need to take the 32k sync timer out of
idle. The IP block itself probably does not have any native idle
handling at all, due to its simplicity. Furthermore, the PRCM will
never request target idle for this IP block while the kernel is
running, due to the sleep dependency that prevents the WKUP
clockdomain from idling while the CORE_L3 clockdomain is active. So
we can safely leave the 32k sync timer in target-force-idle mode, even
while we continue to access it.
This workaround is implemented by defining a new clockdomain flag,
CLKDM_ACTIVE_WITH_MPU, that indicates that the clockdomain is
guaranteed to be active whenever the MPU is inactive. If an IP
block's main functional clock exists inside this clockdomain, and the
IP block does not support smart-idle modes, then the hwmod code will
place the IP block into target force-idle mode even when enabled. The
WKUP clockdomains on OMAP3/4 are marked with this flag. (On OMAP2xxx,
no OCP header existed on the 32k sync timer.) Other clockdomains also
should be marked with this flag, but those changes are deferred until
a later merge window, to create a minimal fix.
Another theoretically clean fix for this problem would be to implement
PM runtime-based control for 32k sync timer accesses. These PM
runtime calls would need to located in a custom clocksource, since the
32k sync timer is currently used as an MMIO clocksource. But in
practice, there would be little benefit to doing so; and there would
be some cost, due to the addition of unnecessary lines of code and the
additional CPU overhead of the PM runtime and hwmod code - unnecessary
in this case.
Another possible fix would have been to modify the pm34xx.c code to
force the IP block idle before entering WFI. But this would not have
been an acceptable approach: we are trying to remove this type of
centralized IP block idle control from the PM code.
This patch is a collaboration between Kevin Hilman <khilman@ti.com>
and Paul Walmsley <paul@pwsan.com>.
Thanks to Vaibhav Hiremath <hvaibhav@ti.com> for providing comments on
an earlier version of this patch. Thanks to Tero Kristo
<t-kristo@ti.com> for identifying a bug in an earlier version of this
patch. Thanks to Benoît Cousson <b-cousson@ti.com> for identifying
some bugs in several versions of this patch and for implementation
comments.
References:
1. Table 16-96 "REG_32KSYNCNT_SYSCONFIG" of the OMAP34xx TRM Rev. ZU
(SWPU223U), available from:
http://www.ti.com/pdfs/wtbu/OMAP34x_ES3.1.x_PUBLIC_TRM_vzU.zip
2. Table 4-72 "Sleep Dependencies" of the OMAP34xx TRM Rev. ZU
(SWPU223U)
3. ibid.
Cc: Tony Lindgren <tony@atomide.com>
Cc: Vaibhav Hiremath <hvaibhav@ti.com>
Cc: Benoît Cousson <b-cousson@ti.com>
Cc: Tero Kristo <t-kristo@ti.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Kevin Hilman <khilman@ti.com>
xhci: Fix driver hang and resume error path.
Hi Greg,
Here's two bug fixes for 3.5. The first fixes an issue with port
connections not being reported to the USB core after a system resume.
The second fixes a driver hang when there are two back to back stalls on
an endpoint.
Sarah Sharp
Pull ARM SoC fixes from Arnd Bergmann:
"Small fixes on multiple ARM platforms
- A build regression from a previous fix on dove and mv78xx0
- Two fixes for recently (3.5-rc1) changed mmp/pxa code
- multiple omap2+ bug fixes
- two trivial fixes for i.MX
- one v3.5 regression for mxs"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: apx4devkit: fix FEC enabling PHY clock
ARM: OMAP2+: hwmod data: Fix wrong McBSP clock alias on OMAP4
ARM: OMAP4: hwmod data: temporarily comment out data for the usb_host_fs and aess IP blocks
ARM: Orion: Fix WDT compile for Dove and MV78xx0
ARM: mmp: remove mach/gpio-pxa.h
ARM: imx: assert SCC gate stays enabled
ARM: OMAP4: TWL6030: ensure sys_nirq1 is mux'd and wakeup enabled
ARM: OMAP2: Overo: init I2C before MMC to fix MMC suspend/resume failure
ARM: imx27_visstrim_m10: Do not include <asm/system.h>
ARM: pxa: hx4700: Fix basic suspend/resume
Pull KVM fix from Marcelo Tosatti:
"Memory leak and oops on the x86 mmu code, and sanitization of the
KVM_IRQFD ioctl."
* git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: MMU: fix shrinking page from the empty mmu
KVM: fix fault page leak
KVM: Sanitize KVM_IRQFD flags
KVM: Add missing KVM_IRQFD API documentation
KVM: Pass kvm_irqfd to functions
Pull leds fix from Bryan Wu:
"Fix for heartbeat led trigger driver"
* 'fixes-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
leds: heartbeat: fix bug on panic
Pull btrfs updates from Chris Mason:
"I held off on my rc5 pull because I hit an oops during log recovery
after a crash. I wanted to make sure it wasn't a regression because
we have some logging fixes in here.
It turns out that a commit during the merge window just made it much
more likely to trigger directory logging instead of full commits,
which exposed an old bug.
The new backref walking code got some additional fixes. This should
be the final set of them.
Josef fixed up a corner where our O_DIRECT writes and buffered reads
could expose old file contents (not stale, just not the most recent).
He and Liu Bo fixed crashes during tree log recover as well.
Ilya fixed errors while we resume disk balancing operations on
readonly mounts."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: run delayed directory updates during log replay
Btrfs: hold a ref on the inode during writepages
Btrfs: fix tree log remove space corner case
Btrfs: fix wrong check during log recovery
Btrfs: use _IOR for BTRFS_IOC_SUBVOL_GETFLAGS
Btrfs: resume balance on rw (re)mounts properly
Btrfs: restore restriper state on all mounts
Btrfs: fix dio write vs buffered read race
Btrfs: don't count I/O statistic read errors for missing devices
Btrfs: resolve tree mod log locking issue in btrfs_next_leaf
Btrfs: fix tree mod log rewind of ADD operations
Btrfs: leave critical region in btrfs_find_all_roots as soon as possible
Btrfs: always put insert_ptr modifications into the tree mod log
Btrfs: fix tree mod log for root replacements at leaf level
Btrfs: support root level changes in __resolve_indirect_ref
Btrfs: avoid waiting for delayed refs when we must not
Thanks to Charles Wang for spotting the defects in the current code:
- If we go idle during the sample window -- after sampling, we get a
negative bias because we can negate our own sample.
- If we wake up during the sample window we get a positive bias
because we push the sample to a known active period.
So rewrite the entire nohz load-avg muck once again, now adding
copious documentation to the code.
Reported-and-tested-by: Doug Smythies <dsmythies@telus.net>
Reported-and-tested-by: Charles Wang <muming.wq@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1340373782.18025.74.camel@twins
[ minor edits ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull DT fixes from Rob Herring:
"Mainly some documentation updates and 2 fixes:
- An export symbol fix for of_platform_populate from Stephen W.
- A fix for the order compatible entries are matched to ensure the
first compatible string is matched when there are multiple matches."
Normally these would go through Grant Likely (thus the "fixes-for-grant"
branch name), but Grant is in the middle of moving to Scotland, and is
practically offline until sometime in August. So pull directly from Rob.
* 'fixes-for-grant' of git://sources.calxeda.com/kernel/linux:
of: match by compatible property first
dt: mc13xxx.txt: Fix gpio number assignment
dt: fsl-fec.txt: Fix gpio number assignment
dt: fsl-mma8450.txt: Add missing 'reg' description
dt: fsl-imx-esdhc.txt: Fix gpio number assignment
dt: fsl-imx-cspi.txt: Fix comment about GPIOs used for chip selects
of: Add Avionic Design vendor prefix
of: export of_platform_populate()
Initialize the gpio chip's of_node to the device's node
to work with DT based system.
Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The chained handler was set for the platform device with id == 0.
When the gpio devices are instantiated by a device tree, all have id ==
-1 and so the handler was unset resulting in unusable gpio irqs on
i.MX21 and i.MX27 (when using oftree).
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
We already have this pattern at quite a few places, and moving part of
the modeset helper stuff into the driver will add more.
v2: Don't clobber the crtc struct name with the macro parameter ...
v3: Convert two more places noticed by Paulo Zanoni.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The bit 2 and 3 in GPIO flag are allocated for the
flag OPEN_DRAIN/OPEN_SOURCE. These bits are reused
for the flag EXPORT/EXPORT_CHANGEABLE and so creating
conflict.
Fix this conflict by assigning bit 4 and 5 for the
flag EXPORT/EXPORT_CHANGEABLE.
Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
This patch fixes two checks for valid gpio number, formerly (wrongly)
considering zero as invalid, now using gpio_is_valid().
Signed-off-by: Roland Stigge <stigge@antcom.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Without this, modules can't use this API, leading to build failures.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Not paying attention to the value being set is a bad thing because it
means that we'll not set the hardware up to reflect what was requested.
Not setting the hardware up to reflect what was requested means that the
caller won't get the results they wanted.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The feature GPIO_MSM_V1 is only available on three SoCs. On all other MSM SoCs
the INT_GPIO_GROUP{1,2} is undeclared, but Kconfig does allow such
configurations. Therefore the produced configuration is valid, but does not
compile. The problem is fixed by adding the missing Kconfig constraints.
drivers/gpio/gpio-msm-v1.c: In function âmsm_init_gpioâ:
drivers/gpio/gpio-msm-v1.c:629:26: error: 'INT_GPIO_GROUP1' undeclared
drivers/gpio/gpio-msm-v1.c:630:26: error: 'INT_GPIO_GROUP2' undeclared
Signed-off-by: Christian Dietrich <christian.dietrich@informatik.uni-erlangen.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
If there is no platform data available, the driver shouldn't use the
pointer or it will oops. Since things will mostly work nonetheless,
(the BIOS may have set up the pins properly), I'd better not fail the
probe even in this case.
Signed-off-by: Alessandro Rubini <rubini@gnudd.com>
Acked-by: Giancarlo Asnaghi <giancarlo.asnaghi@st.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
arch/arm/mm/init.c: In function 'arm_memblock_init':
arch/arm/mm/init.c:380: warning: comparison of distinct pointer types lacks a cast
by fixing the typecast in its definition when DMA_ZONE is disabled.
This was missed in 4986e5c7c (ARM: mm: fix type of the arm_dma_limit
global variable).
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Fix:
net/netfilter/xt_connbytes.c: In function 'connbytes_mt':
net/netfilter/xt_connbytes.c:43: warning: passing argument 1 of 'atomic64_read' discards qualifiers from pointer target type
...
by adding the missing const.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
'sub pc, pc, #1b-2b+8-2' results in address<1:0> == '10'.
sub pc, pc, #const (== ADR pc, #const) performs an interworking branch
(BXWritePC()) on ARMv7+ and a simple branch (BranchWritePC()) on earlier
versions.
In ARM state, BXWritePC() is UNPREDICTABLE when address<1:0> == '10'.
In ARM state on ARMv6+, BranchWritePC() ignores address<1:0>. Before
ARMv6, BranchWritePC() is UNPREDICTABLE if address<1:0> != '00'
So the instruction is UNPREDICTABLE both before and after v6.
Acked-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Rabin Vincent <rabin.vincent@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The tileoffset register only supports a limited offset in x/y of 4096,
so for giant screen configuration with a shared fb we wrap around.
Fix this by computing a linear offset in tiles (pages) and only use
the tileoffset register to offset within the tile.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
To avoid recomputing the display framebuffer offset on gen2/3
pageflips. This is also prep work to do similar trickery on gen4+
Also:
- kill "Start", such upper-case remnants from the ddx must surely die.
- rename "Offset" to linear_offset, to make it clearer that on gen4+
this is only used by the hw for linear buffers, for tiled buffers it
uses the TILEOFF register.
- call DSAPADDR DSPLINOFF on gen4+ for the same reason (and because
the documentation really renamed the register).
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
MI display flips can't handle some changes in the framebuffer
format or layout. Return an error in such cases.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Zero initialize the mode_cmd structure when creating the kernel
framebuffer. Avoids having uninitialized data in offsets[0] for
instance.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
PM related fixes for omaps mostly to get suspend/resume
working again.
* tag 'omap-fixes-for-v3.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: hwmod data: Fix wrong McBSP clock alias on OMAP4
ARM: OMAP4: hwmod data: temporarily comment out data for the usb_host_fs and aess IP blocks
ARM: OMAP4: TWL6030: ensure sys_nirq1 is mux'd and wakeup enabled
ARM: OMAP2: Overo: init I2C before MMC to fix MMC suspend/resume failure
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
We currently return -EPERM if the user requests mode exclusion that is
not supported by the CPU. This looks pretty confusing from userspace
and is inconsistent with other architectures (ppc, x86).
This patch returns -EOPNOTSUPP instead.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This reverts commit 6b5c8045ec.
Conflicts:
arch/arm/kernel/ptrace.c
The new syscall restarting code can lead to problems if we take an
interrupt in userspace just before restarting the svc instruction. If
a signal is delivered when returning from the interrupt, the
TIF_SYSCALL_RESTARTSYS will remain set and cause any syscalls executed
from the signal handler to be treated as a restart of the previously
interrupted system call. This includes the final sigreturn call, meaning
that we may fail to exit from the signal context. Furthermore, if a
system call made from the signal handler requires a restart via the
restart_block, it is possible to clear the thread flag and fail to
restart the originally interrupted system call.
The right solution to this problem is to perform the restarting in the
kernel, avoiding the possibility of handling a further signal before the
restart is complete. Since we're almost at -rc6, let's revert the new
method for now and aim for in-kernel restarting at a later date.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Avoid polluting drivers with a set_domain() macro, which interferes with
structure member names:
drivers/net/wireless/ath/ath9k/dfs_pattern_detector.c:294:33: error: macro "set_domain" passed 2 arguments, but takes just 1
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Ocfs2 uses kiocb.*private as a flag of unsigned long size. In
commit a11f7e6 ocfs2: serialize unaligned aio, the unaligned
io flag is involved in it to serialize the unaligned aio. As
*private is not initialized in init_sync_kiocb() of do_sync_write(),
this unaligned io flag may be unexpectly set in an aligned dio.
And this will cause OCFS2_I(inode)->ip_unaligned_aio decreased
to -1 in ocfs2_dio_end_io(), thus the following unaligned dio
will hang forever at ocfs2_aiodio_wait() in ocfs2_file_aio_write().
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: stable@vger.kernel.org
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
The issue with this check is that it results in userspace receiving an
-EIO while the gpu reset hasn't completed, resulting in fallback to sw
rendering or worse.
Now there's also a stern comment in intel_ring_wait_seqno saying that
intel_ring_begin should not return -EAGAIN, ever, because some callers
can't handle that. But after an audit of the callsites I don't see any
issues. I guess the last problematic spot disappeared with the removal
of the pipelined fencing code.
So do the right thing and call check_wedge, which should properly
decide whether an -EAGAIN or -EIO is appropriate if wedged is set.
Note that the early check for a wedged gpu before touching the ring is
rather important (and it took me quite some time of acting like the
densest doofus to figure that out): If we don't do that and the gpu
died for good, not having been resurrect by the reset code, userspace
can merrily fill up the entire ring until it notices that something is
amiss.
Allowing userspace to emit more render, despite that we know that it
will fail can't lead to anything good (and by experience can lead to
all sorts of havoc, including angering the OOM gods and hard-hanging
the hw for good).
v2: Fix EAGAIN mispell, noticed by Chris Wilson.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... instead of looping endless with no hope of ever serving that
page-fault. We only need to break out of this loop when the gpu died,
to run the reset work (and hopefully resurrect it).
To clarify questions Chris raised on irc: This is about handling I/O
errors not from our own code, but e.g. when the disk died when trying
to swap in a gem bo. So this patch remidies the issue that the current
handling only handles gpu-death-induced cases of -EIO. Admittedly,
dying disks are much rarer than hanging gpus ...To clarify questions
Chris raised on irc: This is about handling I/O errors not from our
own code, but e.g. when the disk died when trying to swap in a gem bo.
So this patch remidies the issue that the current handling only
handles gpu-death-induced cases of -EIO. Admittedly, dying disks are
much rarer than hanging gpus ...
This seems to have been lost in:
commit d9bc7e9f32
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Feb 7 13:09:31 2011 +0000
drm/i915: Fix infinite loop regression from 21dd3734
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the gpu reset no longer using a trylock we've increased the
chances of userspace getting stuck quite a bit. To make that
(hopefully) rare case more paletable time out when waiting for the gpu
reset code to complete and signal this little issue to the caller by
returning -EIO.
This should help userspace to somewhat gracefully fall back and
hopefully allow the user to grab some logs and reboot the machine
(instead of staring at a frozen X screen in agony).
Suggested by Chris Wilson because I've been stubborn about allowing
the gpu reset code no to fail, ever (by removing the trylock).
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
So don't return -EAGAIN, even in the case of a gpu hang. Remap it to
-EIO instead. Note that this isn't really an issue with
interruptability, but more that we have quite a few codepaths (mostly
around kms stuff) that simply can't handle any errors and hence not
even -EAGAIN. Instead of adding proper failure paths so that we could
restart these ioctls we've opted for the cheap way out of sleeping
non-interruptibly. Which works everywhere but when the gpu dies,
which this patch fixes.
So essentially interruptible == false means 'wait for the gpu or die
trying'.'
This patch is a bit ugly because intel_ring_begin is all non-interruptible
and hence only returns -EIO. But as the comment in there says,
auditing all the callsites would be a pain.
To avoid duplicating code, reuse i915_gem_check_wedge in __wait_seqno
and intel_wait_ring_buffer. Also use the opportunity to clarify the
different cases in i915_gem_check_wedge a bit with comments.
v2: Don't access dev_priv->mm.interruptible from check_wedge - we
might not hold dev->struct_mutex, making this racy. Instead pass
interruptible in as a parameter. I've noticed this because I've hit a
BUG_ON(!mutex_is_locked) at the top of check_wedge. This has been
added in
commit b4aca0106c
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Wed Apr 25 20:50:12 2012 -0700
drm/i915: extract some common olr+wedge code
although that commit is missing any justification for this. I guess
it's just copy&paste, because the same commit add the same BUG_ON
check to check_olr, where it indeed makes sense.
But in check_wedge everything we access is protected by other means,
so this is superflous. And because it now gets in the way (we add a
new caller in __wait_seqno, which can be called without
dev->struct_mutext) let's just remove it.
v3: Group all the i915_gem_check_wedge refactoring into this patch, so
that this patch here is all about not returning -EAGAIN to callsites
that can't handle syscall restarting.
v4: Add clarification what interuptible == fales means in our code,
requested by Ben Widawsky.
v5: Fix EAGAIN mispell noticed by Chris Wilson.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>