linux-kernel-test/Documentation
David Howells 9f10523f89 FS-Cache: Fix operation state management and accounting
Fix the state management of internal fscache operations and the accounting of
what operations are in what states.

This is done by:

 (1) Give struct fscache_operation a enum variable that directly represents the
     state it's currently in, rather than spreading this knowledge over a bunch
     of flags, who's processing the operation at the moment and whether it is
     queued or not.

     This makes it easier to write assertions to check the state at various
     points and to prevent invalid state transitions.

 (2) Add an 'operation complete' state and supply a function to indicate the
     completion of an operation (fscache_op_complete()) and make things call
     it.  The final call to fscache_put_operation() can then check that an op
     in the appropriate state (complete or cancelled).

 (3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better
     govern the state of an object:

	(a) The ->n_ops is now the number of extant operations on the object
	    and is now decremented by fscache_put_operation() only.

	(b) The ->n_in_progress is simply the number of objects that have been
	    taken off of the object's pending queue for the purposes of being
	    run.  This is decremented by fscache_op_complete() only.

	(c) The ->n_exclusive is the number of exclusive ops that have been
	    submitted and queued or are in progress.  It is decremented by
	    fscache_op_complete() and by fscache_cancel_op().

     fscache_put_operation() and fscache_operation_gc() now no longer try to
     clean up ->n_exclusive and ->n_in_progress.  That was leading to double
     decrements against fscache_cancel_op().

     fscache_cancel_op() now no longer decrements ->n_ops.  That was leading to
     double decrements against fscache_put_operation().

     fscache_submit_exclusive_op() now decides whether it has to queue an op
     based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter
     will persist in being true even after all preceding operations have been
     cancelled or completed.  Furthermore, if an object is active and there are
     runnable ops against it, there must be at least one op running.

 (4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and
     provide a function to record completion of the pages as they complete.

     When n_pages reaches 0, the operation is deemed to be complete and
     fscache_op_complete() is called.

     Add calls to fscache_retrieval_complete() anywhere we've finished with a
     page we've been given to read or allocate for.  This includes places where
     we just return pages to the netfs for reading from the server and where
     accessing the cache fails and we discard the proposed netfs page.

The bugs in the unfixed state management manifest themselves as oopses like the
following where the operation completion gets out of sync with return of the
cookie by the netfs.  This is possible because the cache unlocks and returns
all the netfs pages before recording its completion - which means that there's
nothing to stop the netfs discarding them and returning the cookie.


FS-Cache: Cookie 'NFS.fh' still has outstanding reads
------------[ cut here ]------------
kernel BUG at fs/fscache/cookie.c:519!
invalid opcode: 0000 [#1] SMP
CPU 1
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090                  /DG965RY
RIP: 0010:[<ffffffffa007050a>]  [<ffffffffa007050a>] __fscache_relinquish_cookie+0x170/0x343 [fscache]
RSP: 0018:ffff8800368cfb00  EFLAGS: 00010282
RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000
RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c
RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000
R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98
R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370
FS:  0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040)
Stack:
 ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0
 ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0
 ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91
Call Trace:
 [<ffffffffa00b2c91>] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs]
 [<ffffffffa008f25f>] nfs_clear_inode+0x3c/0x41 [nfs]
 [<ffffffffa0090df1>] nfs4_evict_inode+0x2f/0x33 [nfs]
 [<ffffffff810d8d47>] evict+0xa1/0x15c
 [<ffffffff810d8e2e>] dispose_list+0x2c/0x38
 [<ffffffff810d9ebd>] prune_icache_sb+0x28c/0x29b
 [<ffffffff810c56b7>] prune_super+0xd5/0x140
 [<ffffffff8109b615>] shrink_slab+0x102/0x1ab
 [<ffffffff8109d690>] balance_pgdat+0x2f2/0x595
 [<ffffffff8103e009>] ? process_timeout+0xb/0xb
 [<ffffffff8109dba3>] kswapd+0x270/0x289
 [<ffffffff8104c5ea>] ? __init_waitqueue_head+0x46/0x46
 [<ffffffff8109d933>] ? balance_pgdat+0x595/0x595
 [<ffffffff8104bf7a>] kthread+0x7f/0x87
 [<ffffffff813ad6b4>] kernel_thread_helper+0x4/0x10
 [<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0
 [<ffffffff813abcdd>] ? retint_restore_args+0xe/0xe
 [<ffffffff8104befb>] ? __init_kthread_worker+0x53/0x53
 [<ffffffff813ad6b0>] ? gs_change+0xb/0xb

Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-20 21:58:26 +00:00
..
ABI Nothing all that exciting; a new module-from-fd syscall for those who want 2012-12-19 07:55:08 -08:00
accounting doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c 2012-11-26 14:22:21 +01:00
acpi Merge branch 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-12-14 10:03:23 -08:00
aoe aoe: allow user to disable target failure timeout 2012-12-17 17:15:25 -08:00
arm fbdev changes for 3.8: 2012-12-15 13:03:48 -08:00
arm64 Documentation: Fixes a word in Documentation/arm64/memory.txt 2012-11-29 16:33:18 +00:00
auxdisplay
backlight drivers/video/backlight/lp855x_bl.c: use generic PWM functions 2012-12-17 17:15:16 -08:00
blackfin
block
blockdev
bus-devices ARM: OMAP2+: gpmc: generic timing calculation 2012-11-09 18:07:11 +05:30
cdrom
cgroups kmem: add slab-specific documentation about the kmem controller 2012-12-18 15:02:15 -08:00
connector
console
cpu-freq
cpuidle
cris
crypto
development-process
device-mapper DM RAID: Add rebuild capability for RAID10 2012-10-11 13:40:24 +11:00
devicetree Device tree v3.8 bug fix branch. Fixes an undefined struct device build 2012-12-19 20:26:16 -08:00
DocBook kstrto*: add documentation 2012-12-17 17:15:22 -08:00
driver-model
dvb
early-userspace
EDID
extcon
fault-injection doc: fix quite a few typos within Documentation 2012-11-19 14:28:24 +01:00
fb
filesystems FS-Cache: Fix operation state management and accounting 2012-12-20 21:58:26 +00:00
firmware_class firmware loader: document firmware cache mechanism 2012-11-14 15:07:18 -08:00
frv
hid doc: fix quite a few typos within Documentation 2012-11-19 14:28:24 +01:00
hwmon hwmon: (it87) Report thermal sensor type as Intel PECI if appropriate 2012-12-19 22:17:02 +01:00
i2c i2c: Mention functionality flags in SMBus protocol documentation 2012-12-16 21:11:55 +01:00
i2o
ia64
ide
infiniband
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid 2012-12-13 12:00:48 -08:00
ioctl
isdn
ja_JP
kbuild Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-12-13 12:00:02 -08:00
kdump
ko_KR
laptops
leds
m68k
make
memory-devices
mips
misc-devices doc: fix quite a few typos within Documentation 2012-11-19 14:28:24 +01:00
mmc mmc: core: Extend sysfs to ext_csd parameters for RPMB support 2012-12-06 13:54:48 -05:00
mn10300
mtd
namespaces
netlabel
networking doc: Tighten-up and clarify description of tcp_fin_timeout 2012-12-10 17:14:28 -05:00
nfc
parisc
PCI PCI: SRIOV control and status via sysfs (documentation) 2012-11-28 11:25:47 -07:00
pcmcia
power Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux 2012-12-11 22:15:57 -08:00
powerpc powerpc/hw-breakpoint: Use generic hw-breakpoint interfaces for new PPC ptrace flags 2012-11-15 13:00:23 +11:00
pps
prctl
pti
ptp
rapidio
RCU Merge branches 'urgent.2012.10.27a', 'doc.2012.11.16a', 'fixes.2012.11.13a', 'srcu.2012.10.27a', 'stall.2012.11.13a', 'tracing.2012.11.08a' and 'idle.2012.10.24a' into HEAD 2012-11-16 09:59:58 -08:00
s390
scheduler
scsi [SCSI] hptiop: Support HighPoint RR4520/RR4522 HBA 2012-11-27 08:59:43 +04:00
security Documentation: fix Documentation/security/00-INDEX 2012-12-17 17:15:22 -08:00
serial
sh
sound ALSA: usb-audio: Deprecate async_unlink option 2012-11-21 11:37:40 +01:00
spi
sysctl
target
thermal Thermal: Add documentation for platform layer data 2012-11-05 14:00:09 +08:00
timers
trace
usb USB: report submission of active URBs 2012-11-11 18:10:46 -08:00
vDSO
video4linux doc: fix quite a few typos within Documentation 2012-11-19 14:28:24 +01:00
virtual KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface 2012-12-06 01:34:20 +01:00
vm Merge branch 'akpm' (Andrew's patch-bomb) 2012-12-13 13:11:15 -08:00
w1
watchdog
wimax
x86 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-12-19 12:56:42 -08:00
xtensa xtensa: initialize atomctl SR 2012-12-18 21:10:22 -08:00
zh_CN Documentation:Update Documentation/zh_CN/arm64/memory.txt 2012-11-26 16:25:37 -08:00
.gitignore
00-INDEX Documentation: remove reference to feature-removal-schedule.txt 2012-12-17 17:15:12 -08:00
applying-patches.txt
atomic_ops.txt
bad_memory.txt
basic_profiling.txt
binfmt_misc.txt
braille-console.txt
bt8xxgpio.txt
btmrvl.txt
BUG-HUNTING
bus-virt-phys-mapping.txt
cachetlb.txt
Changes
circular-buffers.txt
clk.txt
coccinelle.txt
CodingStyle
cpu-hotplug.txt doc: Add x86 CPU0 online/offline feature 2012-11-14 09:39:44 -08:00
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt
devices.txt firmware: remove last vestiges of dabusb 2012-11-21 13:03:01 -08:00
digsig.txt
DMA-API-HOWTO.txt
DMA-API.txt
DMA-attributes.txt common: DMA-mapping: add DMA_ATTR_FORCE_CONTIGUOUS attribute 2012-11-29 03:30:34 -08:00
dma-buf-sharing.txt doc: fix quite a few typos within Documentation 2012-11-19 14:28:24 +01:00
DMA-ISA-LPC.txt
dmaengine.txt
dontdiff x86: remove offsets.h from .gitignore and dontdiff 2012-11-19 14:10:53 +01:00
dynamic-debug-howto.txt
edac.txt
eisa.txt
email-clients.txt
flexible-arrays.txt
futex-requeue-pi.txt
gcov.txt
gpio.txt gpiolib: provide provision to register pin ranges 2012-11-11 19:06:00 +01:00
highuid.txt
HOWTO HOWTO: fix double words typo 2012-12-10 15:54:27 +01:00
hw_random.txt
hwspinlock.txt
init.txt
initrd.txt
intel_txt.txt
Intel-IOMMU.txt
io_ordering.txt
io-mapping.txt
iostats.txt
IPMI.txt IPMI: Remove SMBus driver info from the docs 2012-10-16 18:07:12 -07:00
IRQ-affinity.txt
IRQ-domain.txt irqdomain: update documentation 2012-12-05 23:52:10 +00:00
IRQ.txt
irqflags-tracing.txt
isapnp.txt
java.txt
kernel-doc-nano-HOWTO.txt Kernel-doc: Convention: Use a "Return" section to describe return values 2012-11-27 21:08:57 +01:00
kernel-docs.txt
kernel-parameters.txt Documentation/kernel-parameters.txt: update mem= option's spec according to its implementation 2012-12-17 17:15:12 -08:00
kmemcheck.txt
kmemleak.txt
kobject.txt
kprobes.txt
kref.txt kref: Add kref_get_unless_zero documentation 2012-11-28 18:36:06 +10:00
ldm.txt
local_ops.txt
lockdep-design.txt
lockstat.txt
lockup-watchdogs.txt
logo.gif
logo.txt
magic-number.txt
Makefile
ManagementStyle
md.txt
media-framework.txt
memory-barriers.txt Documentation: Fix memory-barriers.txt example 2012-10-23 14:44:46 -07:00
memory-hotplug.txt hotplug: update nodemasks management 2012-12-12 17:38:33 -08:00
mono.txt
mutex-design.txt
nommu-mmap.txt
numastat.txt
oops-tracing.txt
padata.txt
parport-lowlevel.txt
parport.txt
percpu-rw-semaphore.txt
pi-futex.txt
pinctrl.txt gpiolib: provide provision to register pin ranges 2012-11-11 19:06:00 +01:00
pnp.txt
preempt-locking.txt
printk-formats.txt
pwm.txt
ramoops.txt
rbtree.txt rbtree: move augmented rbtree functionality to rbtree_augmented.h 2012-10-09 16:22:40 +09:00
remoteproc.txt
rfkill.txt
robust-futex-ABI.txt
robust-futexes.txt
rpmsg.txt
rt-mutex-design.txt
rt-mutex.txt
rtc.txt
SAK.txt
SecurityBugs
serial-console.txt
sgi-ioc4.txt
sgi-visws.txt
SM501.txt
smsc_ece1099.txt
sparse.txt Documentation/sparse.txt: document context annotations for lock checking 2012-12-17 17:15:23 -08:00
spinlocks.txt
stable_api_nonsense.txt
stable_kernel_rules.txt
static-keys.txt
SubmitChecklist
SubmittingDrivers
SubmittingPatches
svga.txt
sysfs-rules.txt
sysrq.txt sparc64: Add global PMU register dumping via sysrq. 2012-10-16 09:34:01 -07:00
unaligned-memory-access.txt
unicode.txt
unshare.txt
vfio.txt
VGA-softcursor.txt
vgaarbiter.txt
video-output.txt
vme_api.txt
volatile-considered-harmful.txt
workqueue.txt
xz.txt
zorro.txt