Implement timer_broadcast for the arm architecture, allowing for the use
of clock_event_device_drivers decoupled from the timer tick broadcast
mechanism.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Currently, the ARM backend must maintain a redundant list of timers for
the purpose of centralising timer broadcast functionality. This prevents
sharing timer drivers across architectures.
This patch moves the pain of dealing with timer broadcasts to the core
clockevents tick broadcast code, which already maintains its own list
of timers.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Pull more device-mapper fixes from Alasdair G Kergon:
"A fix for stacked dm thin devices and a fix for the new dm WRITE SAME
support."
* tag 'dm-3.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
dm: fix write same requests counting
dm thin: fix queue limits stacking
Alex writes:
"A few more radeon fixes for 3.8. Mostly small stuff. The big
change is disabling the use of the DMA ring for VM PT updates. This
reverts back to the 3.7 behavior. Problem is we can get huge PT
updates in certain cases that are too big for the DMA ring. I've
got patches to use an IB for this so I can re-enable the use of the
DMA ring for VM PT updates in 3.9. This request also includes the
patches from the last pull request I sent on Monday in case you haven't
pulled them yet."
* 'drm-fixes-3.8' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: switch back to the CP ring for VM PT updates
drm/radeon: prevent crash in the ring space allocation
drm/radeon: Calling object_unrefer() when creating fb failure
drm/radeon/r5xx-r7xx: wait for the MC to settle after MC blackout
drm/radeon/evergreen+: wait for the MC to settle after MC blackout
drm/radeon: protect against div by 0 in backend setup
drm/radeon: fix backend map setup on 1 RB sumo boards
drm/radeon: add quirk for RV100 board
drm/radeon: add WAIT_UNTIL to the non-VM safe regs list for cayman/TN
drm/radeon: fix MC blackout on evergreen+
This patch fixes a possible divide by zero bug when the fabric_max_sectors
device attribute is written and backend se_device failed to be successfully
configured -> enabled.
Go ahead and use block_size=512 within se_dev_set_fabric_max_sectors()
in the event of a target_configure_device() failure case, as no valid
dev->dev_attrib.block_size value will have been setup yet.
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch fixes a v3.8-rc1 regression bug where an unconfigured se_device
was incorrectly allowed to perform a fabric port-link. This bug was
introduced in commit:
commit 0fd97ccf45
Author: Christoph Hellwig <hch@infradead.org>
Date: Mon Oct 8 00:03:19 2012 -0400
target: kill struct se_subsystem_dev
which ended up dropping the original se_subsystem_dev->se_dev_ptr check
preventing this from happening with pre commit 0fd97ccf code.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
PullHID fixes from Jiri Kosina:
- fix i2c-hid and hidraw interaction, by Benjamin Tissoires
- a quirk to make a particular device (Formosa IR receiver) work
properly, by Nicholas Santos
* 'for-3.8/upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: i2c-hid: fix i2c_hid_output_raw_report
HID: usbhid: quirk for Formosa IR receiver
HID: remove x bit from sensor doc
Pull NFS client bugfixes from Trond Myklebust:
- Error reporting in nfs_xdev_mount incorrectly maps all errors to
ENOMEM
- Fix an NFSv4 refcounting issue
- Fix a mount failure when the server reboots during NFSv4 trunking
discovery
- NFSv4.1 mounts may need to run the lease recovery thread.
- Don't silently fail setattr() requests on mountpoints
- Fix a SUNRPC socket/transport livelock and priority queue issue
- We must handle NFS4ERR_DELAY when resetting the NFSv4.1 session.
* tag 'nfs-for-3.8-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4.1: Handle NFS4ERR_DELAY when resetting the NFSv4.1 session
SUNRPC: When changing the queue priority, ensure that we change the owner
NFS: Don't silently fail setattr() requests on mountpoints
NFSv4.1: Ensure that nfs41_walk_client_list() does start lease recovery
NFSv4: Fix NFSv4 trunking discovery
NFSv4: Fix NFSv4 reference counting for trunked sessions
NFS: Fix error reporting in nfs_xdev_mount
Pull MIPS updates from Ralf Baechle:
"A number of fixes all across the MIPS tree. No area is particularly
standing out and things have cooled down quite nicely for a release."
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: Function tracer: Fix broken function tracing
mips: Move __virt_addr_valid() to a place for MIPS 64
MIPS: Netlogic: Fix UP compilation on XLR
MIPS: AR71xx: Fix AR71XX_PCI_MEM_SIZE
MIPS: AR724x: Fix AR724X_PCI_MEM_SIZE
MIPS: Lantiq: Fix cp0_perfcount_irq mapping
MIPS: DSP: Fix DSP mask for registers.
MIPS: Fix build failure by adding definition of pfn_pmd().
MIPS: Octeon: Fix warning.
MIPS: delay.c: Check BITS_PER_LONG instead of __SIZEOF_LONG__
MIPS: PNX833x: Fix comment.
MIPS: Add struct p_format to union mips_instruction.
MIPS: Export <asm/break.h>.
MIPS: BCM47xx: Enable SSB prerequisite SSB_DRIVER_PCICORE.
MIPS: BCM47xx: Select GPIOLIB for BCMA on bcm47xx platform
MIPS: vpe.c: Fix null pointer dereference in print arguments.
For large VM page table updates, we can sometimes generate
more packets than there is space on the ring. This happens
more readily with the DMA ring since it is 64K (vs 1M for the
CP). For now, switch back to the CP. For the next kernel,
I have a patch to utilize IBs for VM PT updates which
alleviates this problem.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=58354
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If the requested number of DWs on the ring is larger than
the size of the ring itself, return an error.
In testing with large VM updates, we've seen crashes when we
try and allocate more space on the ring than the total size
of the ring without checking.
This prevents the crash but for large VM updates or bo moves
of very large buffers, we will need to break the transaction
down into multiple batches. I have patches to use IBs for
the next kernel.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Some chips seem to need a little delay after blacking out
the MC before the requests actually stop. Stop DMAR errors
reported by Shuah Khan.
Reported-by: Shuah Khan <shuahkhan@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's OK to get kick before backend is set or after
it is cleared, we can just ignore it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
On receiving the SYN-ACK, Fast Open checks icsk_retransmit for SYN
retransmission to detect SYN/data drops. But if F-RTO is disabled,
icsk_retransmit is reset at step D of tcp_fastretrans_alert() (
under tcp_ack()) before tcp_rcv_fastopen_synack(). The fix is to use
total_retrans instead which accounts for SYN retransmission regardless
the use of F-RTO.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_ip6 is incorrectly using the IPv4-specific ip_cmsg_recv to handle
ancillary data. This means that socket options such as IPV6_RECVPKTINFO are
not honoured in userspace.
Convert l2tp_ip6 to use the IPv6-specific handler.
Ref: net/ipv6/udp.c
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6_datagram_recv_ctl and ip6_datagram_send_ctl are used for handling IPv6
ancillary data. Since ip6_datagram_send_ctl is already publicly exported for
use in modules, ip6_datagram_recv_ctl should also be available to support
ancillary data in the receive path.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The datagram_*_ctl functions in net/ipv6/datagram.c are IPv6-specific. Since
datagram_send_ctl is publicly exported it should be appropriately named to
reflect the fact that it's for IPv6 only.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If occurs a LE or SCO hci_conn timeout and the connection is already
established (BT_CONNECTED state), the connection is not terminated as
expected. This bug can be reproduced using l2test or scotest tool.
Once the connection is established, kill l2test/scotest and the
connection won't be terminated.
This patch fixes hci_conn_disconnect helper so it is able to
terminate LE and SCO connections, as well as ACL.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
The conn->smp_chan pointer can be NULL if SMP PDUs arrive at unexpected
moments. To avoid NULL pointer dereferences the code should be checking
for this and disconnect if an unexpected SMP PDU arrives. This patch
fixes the issue by adding a check for conn->smp_chan for all other PDUs
except pairing request and security request (which are are the first
PDUs to come to initialize the SMP context).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
CC: stable@vger.kernel.org
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
i2c_hid_output_raw_report is used by hidraw to forward set_report requests.
The current implementation of i2c_hid_set_report needs to take the
report_id as an argument. The report_id is stored in the first byte
of the buffer in argument of i2c_hid_output_raw_report.
Not removing the report_id from the given buffer adds this byte 2 times
in the command, leading to a non working command.
Reported-by: Andrew Duggan <aduggan@synaptics.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
If we're booted in HYP mode, it is possible that we'll run some
kind of virtualized environment. In this case, it is a better to
switch to the physical timers, and leave the virtual timers to
guests.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Currently the documentation for the arch_timer devicetree binding only
lists "arm,armv7-timer".
Add "arm,armv8-timer" to the list of compatible strings.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
The arch_timer driver supports a superset of the functionality of the
arm_generic driver, and is not tied to a particular arch.
This patch moves arm64 to use the arch_timer driver, gaining additional
functionality in doing so, and removes the (now unused) arm_generic
driver. Timer-related hooks specific to arm64 are moved into
arch/arm64/kernel/time.c.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Currently arch_counter_get_cnt{p,v}ct can be speculated, allowing for
stale time values to be read. This could be problematic for the delay
loop and other sensitive functions, as the time delta could jump around
unexpectedly.
This patch adds isbs to arch_counter_get_cnt{p,v}ct, preventing this
possibility.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
The core functionality of the arch_timer driver is not directly tied to
anything under arch/arm, and can be split out.
This patch factors out the core of the arch_timer driver, so it can be
shared with other architectures. A couple of functions are added so
that architecture-specific code can interact with the driver without
needing to touch its internals.
The ARM_ARCH_TIMER config variable is moved out to
drivers/clocksource/Kconfig, existing uses in arch/arm are replaced with
HAVE_ARM_ARCH_TIMER, which selects it.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Several bits in CNTKCTL reset to 0, including PL0VTEN. For architectures
using the generic timer which wish to have a fast gettimeofday vDSO
implementation, these bits must be set to 1 by the kernel. For
architectures without a vDSO, it's best to leave the bits set to 0 for
now to ensure that if and when support is added, it's implemented sanely
architecture wide.
As the bootloader might set PL0VTEN to a value that doesn't correspond
to that which the kernel prefers, we must explicitly set it to the
architecture port's preferred value.
This patch adds arch_counter_set_user_access, which sets the PL0 access
permissions to that required by the architecture. For arch/arm, this
currently means disabling all userspace access.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Currently, the arch_timer driver is tied to the arm port, as it relies
on code in arch/arm/smp.c to setup and teardown timers as cores are
hotplugged on and off. The timer is registered through an arm-specific
registration mechanism, preventing sharing the driver with the arm64
port.
This patch moves the driver to using a cpu notifier instead, making it
easier to port.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Without the isbs in arch_timer_get_cnt{p,v}ct the cpu may speculate
reads and return stale values. This could be bad for code sensitive to
changes in expected deltas between calls (e.g. the delay loop).
Without isbs in arch_timer_reg_write the processor may reorder
instructions around enabling/disabling of the timer or writing the
compare value, which we probably don't want.
This patch adds isbs to prevent those issues.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Currently the arch_timer register accessors are thrown together with
the main driver, preventing us from porting the driver to other
architectures.
This patch moves the register accessors into a header file, as with
the arm64 version. Constants required by the accessors are also moved.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
The CNTFRQ register is not duplicated for physical and virtual timers,
and accessing it as if it were is confusing.
Instead, use a separate accessor which doesn't take the access type
as a parameter.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
We're currently inconsistent with respect to our accesses to the
physical and virtual counters, mixing and matching the two.
This patch introduces and uses a function pointer for accessing the
correct counter based on whether we're using physical or virtual
interrupts. All current accesses to the counter accessors are redirected
through it.
When the driver is moved out to drivers/clocksource, there's the
possibility that code called before the timer code is initialised will
attempt to call arch_timer_read_counter (e.g. sched_clock for AArch64).
To avoid having to have to check whether the timer has been initialised
either in arch_timer_read_counter or one of it's callers, a default
implementation is assigned that simply returns 0.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
To ensure the correct size of types, use u64 for the return value of
arch_timer_get_cnt{p,v}ct, and u32 for arch_timer_rate, matching the
size of the registers these values are taken from. While we're changing
them anyway, simplify the implementation of arch_timer_get_cnt{p,v}ct.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
This check is a holdover from the pre-devicetree days. As the timer
is not probed except by platforms which register it via devicetree,
it's not strictly necessary.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
When we get the device_node for the arch timer, it's refcount is
automatically incremented in of_find_matching_node, but it is
never decremented.
This patch decrements the refcount on the node after we're finished
using it.
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Function tracing is currently broken for all 32 bit MIPS platforms.
When tracing is enabled, the kernel immediately hangs on boot.
This is a result of commit b732d439cb
that changes the kernel/trace/Kconfig file so that is no longer
forces FRAME_POINTER when FUNCTION_TRACING is enabled.
MIPS frame pointers are generally considered to be useless because
they cannot be used to unwind the stack. Unfortunately the MIPS
function tracing code has bugs that are masked by the use of frame
pointers. This commit fixes the bugs so that MIPS frame pointers
don't need to be enabled.
The bugs are a result of the odd calling sequence used to call the trace
routine. This calling sequence is inserted into every traceable function
when the tracing CONFIG option is enabled. This sequence is generated
for 32bit MIPS platforms by the compiler via the "-pg" flag.
Part of the sequence is "addiu sp,sp,-8" in the delay slot after every
call to the trace routine "_mcount" (some legacy thing where 2 arguments
used to be pushed on the stack). The _mcount routine is expected to
adjust the sp by +8 before returning. So when not disabled, the original
jalr and addiu will be there, so _mcount has to adjust sp.
The problem is that when tracing is disabled for a function, the
"jalr _mcount" instruction is replaced with a nop, but the
"addiu sp,sp,-8" is still executed and the stack pointer is left
trashed. When frame pointers are enabled the problem is masked
because any access to the stack is done through the frame
pointer and the stack pointer is restored from the frame pointer when
the function returns.
This patch writes two nops starting at the address of the "jalr _mcount"
instruction whenever tracing is disabled. This means that the
"addiu sp,sp.-8" will be converted to a nop along with the "jalr". When
disabled, there will be two nops.
This is SMP safe because the first time this happens is during
ftrace_init() which is before any other processor has been started.
Subsequent calls to enable/disable tracing when other CPUs ARE running
will still be safe because the enable will only change the first nop
to a "jalr" and the disable, while writing 2 nops, will only be changing
the "jalr". This patch also stops using stop_machine() to call the
tracer enable/disable routines and calls them directly because the
routines are SMP safe.
When the kernel first boots we have to be able to handle the gcc
generated jalr, addui sequence until ftrace_init gets a chance to run
and change the sequence. At this point mcount just adjusts the stack
and returns. When ftrace_init runs, we convert the jalr/addui to nops.
Then whenever tracing is enabled we convert the first nop to a "jalr
mcount+8". The mcount+8 entry point skips the stack adjust.
[ralf@linux-mips.org: Folded in Steven Rostedt's build fix.]
Signed-off-by: Al Cooper <alcooperx@gmail.com>
Cc: rostedt@goodmis.org
Cc: ddaney.cavm@gmail.com
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/4806/
Patchwork: https://patchwork.linux-mips.org/patch/4841/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
When processing write same requests, fix dm to send the configured
number of WRITE SAME requests to the target rather than the number of
discards, which is not always the same.
Device-mapper WRITE SAME support was introduced by commit
23508a96cd ("dm: add WRITE SAME support").
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Commit d3ce884318 "MIPS: Fix modpost error in modules attepting to use
virt_addr_valid()" moved __virt_addr_valid() from a macro in a header
file to a function in ioremap.c. But ioremap.c is only compiled for MIPS
32, and not for MIPS 64.
When compiling for my yeeloong2, which supposedly supports hibernation,
which compiles kernel/power/snapshot.c which calls virt_addr_valid(), I
got this error:
LD init/built-in.o
kernel/built-in.o: In function `memory_bm_free':
snapshot.c:(.text+0x4c9c4): undefined reference to `__virt_addr_valid'
snapshot.c:(.text+0x4ca58): undefined reference to `__virt_addr_valid'
kernel/built-in.o: In function `snapshot_write_next':
(.text+0x4e44c): undefined reference to `__virt_addr_valid'
kernel/built-in.o: In function `snapshot_write_next':
(.text+0x4e890): undefined reference to `__virt_addr_valid'
make[1]: *** [vmlinux] Error 1
make: *** [sub-make] Error 2
I suspect that __virt_addr_valid() is fine for mips 64. I moved it to
mmap.c such that it gets compiled for mips 64 and 32.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4842/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
thin_io_hints() is blindly copying the queue limits from the thin-pool
which can lead to incorrect limits being set. The fix here simply
deletes the thin_io_hints() hook which leaves the existing stacking
infrastructure to set the limits correctly.
When a thin-pool uses an MD device for the data device a thin device
from the thin-pool must respect MD's constraints about disallowing a bio
from spanning multiple chunks. Otherwise we can see problems. If the raid0
chunksize is 1152K and thin-pool chunksize is 256K I see the following
md/raid0 error (with extra debug tracing added to thin_endio) when
mkfs.xfs is executed against the thin device:
md/raid0:md99: make_request bug: can't convert block across chunks or bigger than 1152k 6688 127
device-mapper: thin: bio sector=2080 err=-5 bi_size=130560 bi_rw=17 bi_vcnt=32 bi_idx=0
This extra DM debugging shows that the failing bio is spanning across
the first and second logical 1152K chunk (sector 2080 + 255 takes the
bio beyond the first chunk's boundary of sector 2304). So the bio
splitting that DM is doing clearly isn't respecting the MD limits.
max_hw_sectors_kb is 127 for both the thin-pool and thin device
(queue_max_hw_sectors returns 255 so we'll excuse sysfs's lack of
precision). So this explains why bi_size is 130560.
But the thin device's max_hw_sectors_kb should be 4 (PAGE_SIZE) given
that it doesn't have a .merge function (for bio_add_page to consult
indirectly via dm_merge_bvec) yet the thin-pool does sit above an MD
device that has a compulsory merge_bvec_fn. This scenario is exactly
why DM must resort to sending single PAGE_SIZE bios to the underlying
layer. Some additional context for this is available in the header for
commit 8cbeb67a ("dm: avoid unsupported spanning of md stripe boundaries").
Long story short, the reason a thin device doesn't properly get
configured to have a max_hw_sectors_kb of 4 (PAGE_SIZE) is that
thin_io_hints() is blindly copying the queue limits from the thin-pool
device directly to the thin device's queue limits.
Fix this by eliminating thin_io_hints. Doing so is safe because the
block layer's queue limits stacking already enables the upper level thin
device to inherit the thin-pool device's discard and minimum_io_size and
optimal_io_size limits that get set in pool_io_hints. But avoiding the
queue limits copy allows the thin and thin-pool limits to be different
where it is important, namely max_hw_sectors_kb.
Reported-by: Daniel Browning <db@kavod.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>