Now that the per-sb shrinker is responsible for shrinking 2 or more
caches, increase the batch size to keep economies of scale for
shrinking each cache. Increase the shrinker batch size to 1024
objects.
To allow for a large increase in batch size, add a conditional
reschedule to prune_icache_sb() so that we don't hold the LRU spin
lock for too long. This mirrors the behaviour of
__shrink_dcache_sb(), and allows us to increase the batch size
without needing to worry about problems caused by long lock hold
times.
To ensure that filesystems using the per-sb shrinker callouts don't
cause problems, document that the object freeing method must
reschedule appropriately inside loops.
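To illustrate, a minimal sketch of the reschedule pattern (not the
exact diff; the lock and list names are assumed from the per-sb inode
LRU this series introduces):

    static void prune_icache_sb(struct super_block *sb, int nr_to_scan)
    {
            spin_lock(&sb->s_inode_lru_lock);
            while (nr_to_scan--) {
                    /* ... isolate and dispose one inode from sb->s_inode_lru ... */

                    /*
                     * Drop and retake the LRU lock whenever a
                     * reschedule is due, so a 1024 object batch cannot
                     * hold the spinlock for the whole walk.
                     */
                    cond_resched_lock(&sb->s_inode_lru_lock);
            }
            spin_unlock(&sb->s_inode_lru_lock);
    }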
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that we have a per-superblock shrinker implementation, we can add a
filesystem specific callout to it to allow filesystem internal
caches to be shrunk by the superblock shrinker.
Rather than perpetuate the multipurpose shrinker callback API (i.e.
nr_to_scan == 0 meaning "tell me how many objects are freeable in the
cache"), two operations will be added. The first will return the
number of objects that are freeable, the second is the actual
shrinker call.
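A sketch of the resulting callout pair as it would sit in struct
super_operations (signatures follow the description above; details may
differ from the final diff):

    struct super_operations {
            /* ... */
            /* report how many objects the filesystem could free */
            int     (*nr_cached_objects)(struct super_block *);
            /* actually free up to nr of those objects */
            void    (*free_cached_objects)(struct super_block *, int);
    };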
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that we have per-sb shrinkers with a lifecycle that is a subset
of the superblock lifecycle and can reliably detect a filesystem
being unmounted, there is no longer any race condition for the
iprune_sem to protect against. Hence we can remove it.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
With context based shrinkers, we can implement a per-superblock
shrinker that shrinks the caches attached to the superblock. We
currently have global shrinkers for the inode and dentry caches that
split up into per-superblock operations via a coarse proportioning
method that does not batch very well. The global shrinkers also
have a dependency - dentries pin inodes - so we have to be very
careful about how we register the global shrinkers so that the
implicit call order is always correct.
With a per-sb shrinker callout, we can encode this dependency
directly into the per-sb shrinker, hence avoiding the need for
strictly ordering shrinker registrations. We also have no need for
any proportioning code because the shrinker subsystem already provides
this functionality across all shrinkers. Allowing the shrinker to
operate on a single superblock at a time means that we do less
superblock list traversals and locking and reclaim should batch more
effectively. This should result in less CPU overhead for reclaim and
potentially faster reclaim of items from each filesystem.
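For illustration, the dependency encoding amounts to fixing the call
order inside the per-sb shrinker; a rough sketch (proportioning
between the caches omitted, helper and member names assumed from the
descriptions in this series):

    static int prune_super(struct shrinker *shrink, struct shrink_control *sc)
    {
            struct super_block *sb =
                    container_of(shrink, struct super_block, s_shrink);

            /* dentries pin inodes, so always prune them first ... */
            prune_dcache_sb(sb, sc->nr_to_scan);
            /* ... then the inodes the dentries released ... */
            prune_icache_sb(sb, sc->nr_to_scan);
            /* ... then any filesystem internal caches. */
            if (sb->s_op->free_cached_objects)
                    sb->s_op->free_cached_objects(sb, sc->nr_to_scan);
            return 0;
    }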
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Both the filesystem and the lock manager can associate operations with a
lock. Confusingly, one of them (fl_release_private) actually has the
same name in both operation structures.
It would save some confusion to give the lock-manager ops different
names.
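For example, the colliding field would end up disambiguated roughly
like so (illustrative subset of the two structures):

    struct file_lock_operations {
            void (*fl_release_private)(struct file_lock *);
    };

    struct lock_manager_operations {
            /* was fl_release_private, same name as above */
            void (*lm_release_private)(struct file_lock *);
    };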
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Improve insight into IO completion behaviour.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
The list of active AIL cursors uses a roll-your-own linked list with
special casing for the AIL push cursor. Simplify this code by
replacing the list with standard struct list_head lists, and use a
separate list_head to track the active cursors. This allows us to
treat the AIL push cursor as a generic cursor rather than as a
special case, further simplifying the code.
Further, fix the duplicate push cursor initialisation that the
special case handling was hiding, and clean up all the comments
around the active cursor list handling.
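After the conversion the cursor boils down to something like this
(sketch):

    struct xfs_ail_cursor {
            struct list_head        list;   /* on the AIL's active cursor list */
            struct xfs_log_item     *item;  /* next item for this cursor */
    };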
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
xfs_trans_ail_cursor_set() doesn't set the cursor to the current log
item, it sets it to the next item. There is already a function for
doing this - xfs_trans_ail_cursor_next() - and the _set function is
simply a two line wrapper. Remove it and open code the setting of
the cursor in the two locations that call it to remove the
confusion.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Delayed logging can insert tens of thousands of log items into the
AIL at the same LSN. When the committing of log commit records
occurs, we can get insertions occurring at an LSN that is not at the
end of the AIL. If there are thousands of items in the AIL on the
tail LSN, each insertion has to walk the AIL to find the correct
place to insert the new item into the AIL. This can consume large
amounts of CPU time and block other operations from occurring while
the traversals are in progress.
To avoid this repeated walk, use an AIL cursor to record
where we should be inserting the new items into the AIL without
having to repeat the walk. The cursor infrastructure already
provides this functionality for push walks, so this is a simple extension
of existing code. While this will not avoid the initial walk, it
will avoid repeating it tens of thousands of times during a single
checkpoint commit.
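The shape of the change, as a rough sketch (ail_find_insert_pos() is a
hypothetical stand-in for the cursor-aware lookup):

    /*
     * Walk to the insertion point at most once, then keep the cursor
     * pointing at the last item inserted so the next insert at the
     * same LSN resumes there instead of walking from the AIL head.
     */
    lip = ail_find_insert_pos(ailp, cur, lsn);  /* hypothetical helper */
    list_add(&new_lip->li_ail, &lip->li_ail);   /* insert after lip */
    cur->item = new_lip;                        /* remember for next time */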
This version includes logic improvements from Christoph Hellwig.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
On xfs exports, nfsd is incorrectly returning ENOENT instead of
ESTALE on attempts to use a filehandle of a deleted file (spotted
with pynfs test PUTFH3). The ENOENT was coming from xfs_iget.
(It's tempting to wonder whether we should just map all xfs_iget
errors to ESTALE, but I don't believe so--xfs_iget can also return
ENOMEM at least, which we wouldn't want mapped to ESTALE.)
While we're at it, the other return of ENOENT in xfs_nfs_get_inode()
also looks wrong.
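A sketch of the intended mapping (XFS returns positive errnos
internally; the lookup call is abbreviated):

    error = xfs_iget(...);  /* lookup as done in xfs_nfs_get_inode() */
    if (error) {
            /* a deleted inode is a stale filehandle to nfsd ... */
            if (error == ENOENT)
                    return ERR_PTR(-ESTALE);
            /* ... but ENOMEM and friends must not be remapped */
            return ERR_PTR(-error);
    }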
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Remove the second parameter to xfs_sb_count() since all callers of
the function set it.
Also, fix the header comment regarding it being called periodically.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Avoid creating superfluous NUMA domains on non-NUMA systems
sched: Allow for overlapping sched_domain spans
sched: Break out cpu_power from the sched_group structure
Terribly embarrassing. Don't know how I committed this, but it's
KERN_WARNING, not KERN_WARN.
This fixes the following compile error:
kernel/time/timekeeping.c: In function ‘__timekeeping_inject_sleeptime’:
kernel/time/timekeeping.c:608: error: ‘KERN_WARN’ undeclared (first use in this function)
kernel/time/timekeeping.c:608: error: (Each undeclared identifier is reported only once
kernel/time/timekeeping.c:608: error: for each function it appears in.)
kernel/time/timekeeping.c:608: error: expected ‘)’ before string constant
make[2]: *** [kernel/time/timekeeping.o] Error 1
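The fix is just restoring the real log-level macro (message text
illustrative):

    printk(KERN_WARNING "timekeeping: invalid sleep time\n");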
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: John Stultz <john.stultz@linaro.org>
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, reboot: Make Dell Latitude E6320 use reboot=pci
x86, doc only: Correct real-mode kernel header offset for init_size
x86: Disable AMD_NUMA for 32bit for now
Since 6be63bbbda (lmo) resp.
af3a1f6f48 (kernel.org) the Malta code no
longer probes for the presence of GCMP if CMP is not configured. This
means that the variable gcmp_present will be left at its default value
of -1, which normally is meant to indicate that GCMP has not yet been
mmapped. This non-zero value is now interpreted as GCMP being present,
resulting in a write attempt to a GCMP register which crashes.
Rob Landley <rob@landley.net> reported this and contributed a build
fix on top of my fix.
Reported-by: Rob Landley <rob@landley.net>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/2413/
Renumbering was necessary because I had already wired up setns(2) in the
linux-mips.org tree in commit c3fce54644cabbb90700cc3acc040718a377f609
[MIPS: Wire up new sendmmsg syscall.] but the same syscall numbers were
used by 7b21fddd08 [ns: Wire up the setns
system call] resulting in a conflict.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Commit ee6114202e48dc282c6d04741e9c5d7171eb8bb8 (lmo) resp.
1685f3b158 (kernel.org) ["MIPS: SMTC: Move
declaration of smtc_init_secondary to <asm/smtc.h>."] didn't quite do
that; instead it lost the declaration of smtc_init_secondary, resulting
in a build error.
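The missing piece is the declaration itself, i.e. something like:

    /* in <asm/smtc.h> */
    extern void smtc_init_secondary(void);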
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
After the runtime conversion to handle clocks, the iclk node is no
longer used. However, the fclk node is still used to get the clock
rate.
Signed-off-by: Balaji T K <balajitk@ti.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
* Add runtime pm support to HSMMC host controller.
* Use runtime pm API to enable/disable HSMMC clock.
* Use runtime autosuspend APIs to enable auto suspend delay.
Based on OMAP HSMMC runtime implementation by Kevin Hilman and
Kishore Kadiyala.
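A sketch of the standard runtime PM pattern this implies (the delay
value is illustrative; the real patch picks its own):

    /* at probe time */
    pm_runtime_enable(dev);
    pm_runtime_set_autosuspend_delay(dev, 100);     /* ms, illustrative */
    pm_runtime_use_autosuspend(dev);

    /* around controller access */
    pm_runtime_get_sync(dev);
    /* ... touch the HSMMC registers ... */
    pm_runtime_mark_last_busy(dev);
    pm_runtime_put_autosuspend(dev);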
Signed-off-by: Balaji T K <balajitk@ti.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
The lazy_disable framework in OMAP HSMMC manages multiple low power
states, and the card is powered off after an inactivity time of 8
seconds. Based on previous discussion on the list, card power
(regulator) handling (when to power OFF/ON) should ideally be handled
by the core layer. Remove usage of lazy disable to allow the core
layer _only_ to handle card power. With the removal of the lazy
disable framework, MMC regulators are left ON until MMC_POWER_OFF via
set_ios.
Signed-off-by: Balaji T K <balajitk@ti.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
A non-default drive strength cannot be set automatically. It is a
function of the board design and can only be set if there is a
specific platform handler. The platform handler needs to take the
board design into account, so pass the necessary information to the
platform code.
For example: The card and host controller may indicate they support HIGH
and LOW drive strength. There is no way to know what should be chosen
without specific board knowledge. Setting HIGH may lead to reflections
and setting LOW may not suffice. There is no mechanism (like ethernet
duplex or speed pulses) to determine what should be done automatically.
If no platform handler is defined -- use the default value.
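For example, the platform hook could take a shape like this
(hypothetical signature; the point is the information passed in):

    /*
     * Hypothetical callback: given the strengths both host and card
     * advertise, the board code returns the one suiting its layout.
     */
    unsigned int (*select_drive_strength)(unsigned int max_dtr,
                                          int host_drv, int card_drv);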
Signed-off-by: Philip Rakity <prakity@marvell.com>
Reviewed-by: Arindam Nath <arindam.nath@amd.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Change mmc_blk_issue_rw_rq() to become asynchronous.
The execution flow looks like this:
* The mmc-queue calls issue_rw_rq(), which sends the request
to the host and returns back to the mmc-queue.
* The mmc-queue calls issue_rw_rq() again with a new request.
* This new request is prepared in issue_rw_rq(), then it waits for
the active request to complete before pushing it to the host.
* When the mmc-queue is empty it will call issue_rw_rq() with a NULL
req to finish off the active request without starting a new request.
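In outline (sketch; the helper names here are illustrative, not the
real functions):

    static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
    {
            if (req)
                    prepare_request(mq, req);   /* pre_req(): dma_map_sg etc. */

            if (mq->active)
                    wait_for_active(mq);        /* finish previous transfer */

            if (req) {
                    start_request(mq, req);     /* hand over to the host */
                    mq->active = req;           /* return without waiting */
            } else {
                    mq->active = NULL;          /* NULL req: just flush */
            }
            return 0;
    }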
Signed-off-by: Per Forlin <per.forlin@linaro.org>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Venkatraman S <svenkatr@ti.com>
Tested-by: Sourav Poddar <sourav.poddar@ti.com>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
The way the request data is organized in the mmc queue struct, it only
allows processing of one request at a time. This patch adds a new struct
to hold mmc queue request data such as sg list, request, blk request and
bounce buffers, and updates any functions depending on the mmc queue
struct. This prepares for using multiple active requests in one mmc queue.
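The new struct gathers the per-request state along these lines
(sketch; members as listed in the description above):

    struct mmc_queue_req {
            struct request          *req;
            struct mmc_blk_request  brq;
            struct scatterlist      *sg;
            char                    *bounce_buf;
            struct scatterlist      *bounce_sg;
            unsigned int            bounce_sg_len;
    };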
Signed-off-by: Per Forlin <per.forlin@linaro.org>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Venkatraman S <svenkatr@ti.com>
Tested-by: Sourav Poddar <sourav.poddar@ti.com>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Add a test that measures how the mmc bandwidth depends on the numbers of
sg elements in the sg list. The transfer size is fixed and the sg
length goes
from a few up to 512. The purpose is to measure overhead caused by
multiple sg elements.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Venkatraman S <svenkatr@ti.com>
Tested-by: Sourav Poddar <sourav.poddar@ti.com>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Add four tests for read and write performance across different
transfer sizes, 4k to 4M.
* Read using blocking mmc request
* Read using non-blocking mmc request
* Write using blocking mmc request
* Write using non-blocking mmc request
The host driver must support pre_req() and post_req()
in order to run the non-blocking test cases.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Venkatraman S <svenkatr@ti.com>
Tested-by: Sourav Poddar <sourav.poddar@ti.com>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
pre_req() runs dma_map_sg() and prepares the dma descriptor for the next
mmc data transfer. post_req() runs dma_unmap_sg(). If pre_req() is
not called before mmci_request(), mmci_request() will prepare the
cache and dma just as it did before. It is optional to use pre_req()
and post_req() for mmci.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
pre_req() runs dma_map_sg() and post_req() runs dma_unmap_sg(). If
pre_req() is not called before omap_hsmmc_request(), dma_map_sg() will
be issued before starting the transfer. It is optional to use
pre_req(); but if pre_req() is issued, post_req() must be called as
well.
Signed-off-by: Per Forlin <per.forlin@linaro.org>
Reviewed-by: Venkatraman S <svenkatr@ti.com>
Tested-by: Sourav Poddar <sourav.poddar@ti.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Previously there has only been one function, mmc_wait_for_req(),
to start and wait for a request. This patch adds:
* mmc_start_req() - starts a request without waiting.
If there is an ongoing request, wait for it to complete,
then start the new one and return.
Does not wait for the new command to complete.
This patch also adds new function members in struct mmc_host_ops
only called from core.c:
* pre_req - asks the host driver to prepare for the next job
* post_req - asks the host driver to clean up after a completed job
The intention is to use pre_req() and post_req() to do cache maintenance
while a request is active. pre_req() can be called while a request is
active to minimize latency to start next job. post_req() can be used after
the next job is started to clean up the request. This will minimize the
host driver request end latency. post_req() is typically used before
ending the block request and handing over the buffer to the block layer.
Add a host-private member in mmc_data to be used by pre_req to mark the
data. The host driver will then check this mark to see if the data is
prepared or not.
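Putting it together, the new core entry point and host hooks look
roughly like this (signatures as described above; details may differ):

    struct mmc_async_req *mmc_start_req(struct mmc_host *host,
                                        struct mmc_async_req *areq,
                                        int *error);

    struct mmc_host_ops {
            /* ... */
            void    (*pre_req)(struct mmc_host *host,
                               struct mmc_request *req,
                               bool is_first_req);
            void    (*post_req)(struct mmc_host *host,
                                struct mmc_request *req,
                                int err);
    };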
Signed-off-by: Per Forlin <per.forlin@linaro.org>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Venkatraman S <svenkatr@ti.com>
Tested-by: Sourav Poddar <sourav.poddar@ti.com>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Update the OMAP HSMMC entry in the MAINTAINERS file, as I will
no longer be able to maintain this driver.
Signed-off-by: Madhusudhan Chikkature <madhu.cr@ti.com>
[khilman@ti.com: change to Orphan rather than complete removal]
Signed-off-by: Kevin Hilman <khilman@ti.com>
Acked-by: Venkatraman S <svenkatr@ti.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
This driver has been used for years with this option enabled.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Unless MMC_CAP_8_BIT_DATA is set, the bus width defaults to 4.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>