linux-kernel-test/fs/xfs
Dave Chinner 666d644cd7 xfs: don't free EFIs before the EFDs are committed
Filesystems are occasionally being shut down with this error:

xfs_trans_ail_delete_bulk: attempting to delete a log item that is
not in the AIL.

It was diagnosed to be related to the EFI/EFD commit order when the
EFI and EFD are in different checkpoints and the EFD is committed
before the EFI here:

http://oss.sgi.com/archives/xfs/2013-01/msg00082.html

The real problem is that a single bit cannot fully describe the
states that the EFI/EFD processing can be in. These completion
states are:

EFI			EFI in AIL	EFD		Result
committed/unpinned	Yes		committed	OK
committed/pinned	No		committed	Shutdown
uncommitted		No		committed	Shutdown


Note that the "result" field is what should happen, not what does
happen. The current logic is broken and handles the first two cases
correctly by luck.  That is, the code will free the EFI if the
XFS_EFI_COMMITTED bit is *not* set, rather than if it is set. The
inverted logic "works" because if both EFI and EFD are committed,
then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED
bit, and the second frees the EFI item. Hence as long as
xfs_efi_item_committed() has been called, everything appears to be
fine.

It is the third case where the logic fails - where
xfs_efd_item_committed() is called before xfs_efi_item_committed(),
and that results in the EFI being freed before it has been
committed. That is the bug that triggered the shutdown, and hence
keeping track of whether the EFI has been committed or not is
insufficient to correctly order the EFI/EFD operations w.r.t. the
AIL.
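
The failure described above can be reproduced in a small userspace
model. This is only a sketch of the single-bit logic as this message
describes it, not the kernel code: the bit_* names are hypothetical,
BIT_EFI_COMMITTED stands in for XFS_EFI_COMMITTED, and the model
assumes the EFI enters the AIL at the point its commit runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; all bit_* names are hypothetical. */
#define BIT_EFI_COMMITTED 0x1u	/* stands in for XFS_EFI_COMMITTED */

struct bit_efi {
	unsigned int flags;
	bool in_ail;		/* set once the EFI commit has run */
	bool freed;
	bool freed_outside_ail;	/* the condition that forces a shutdown */
};

/* Old-style release: clear the bit if it is set, otherwise free the EFI. */
static void bit_efi_release(struct bit_efi *efi)
{
	if (efi->flags & BIT_EFI_COMMITTED) {
		efi->flags &= ~BIT_EFI_COMMITTED;
		return;
	}
	assert(!efi->freed);
	efi->freed = true;
	efi->freed_outside_ail = !efi->in_ail;
}

/* EFI commit: set the bit and (in this model) insert into the AIL. */
static void bit_efi_committed(struct bit_efi *efi)
{
	efi->flags |= BIT_EFI_COMMITTED;
	efi->in_ail = true;
}

/* Both the EFI unpin and the EFD commit end up in the release path. */
static void bit_efi_unpin(struct bit_efi *efi) { bit_efi_release(efi); }
static void bit_efd_committed(struct bit_efi *efi) { bit_efi_release(efi); }

/* First case: EFI committed and unpinned before the EFD commits. */
static struct bit_efi bit_run_efi_first(void)
{
	struct bit_efi efi = {0};

	bit_efi_committed(&efi);
	bit_efi_unpin(&efi);		/* clears the bit */
	bit_efd_committed(&efi);	/* frees the EFI, which is in the AIL */
	return efi;
}

/* Third case: the EFD commits before the EFI has been committed. */
static struct bit_efi bit_run_efd_first(void)
{
	struct bit_efi efi = {0};

	bit_efd_committed(&efi);	/* bit not set, so the EFI is freed now */
	return efi;
}
```

In this model bit_run_efi_first() frees the EFI while it is in the AIL,
whereas bit_run_efd_first() frees it before it was ever inserted, which
corresponds to the shutdown case.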

What we really want is this: the EFI is always placed into the
AIL before the last reference goes away. The only way to guarantee
that is to not free the EFI until after it has been unpinned
*and* the EFD has been committed. That is, restructure the logic so
that the only case that can occur is the first case.

This can be done easily by replacing the XFS_EFI_COMMITTED bit with an
EFI reference count. The EFI is initialised with its own count, and
that is not released until it is unpinned. However, there is a
complication to this method - the high level EFI/EFD code in
xfs_bmap_finish() does not hold direct references to the EFI
structure, and runs a transaction commit between the EFI and EFD
processing. Hence the EFI can be freed even before the EFD is
created using such a method.

Further, log recovery uses the AIL for tracking EFI/EFDs that need
to be recovered, but it uses the AIL *differently* to the EFI
transaction commit. Hence log recovery never pins or unpins EFIs, so
we can't drop the EFI reference count indirectly to free the EFI.

However, this doesn't prevent us from using a reference count here.
There is a 1:1 relationship between EFIs and EFDs, so when we
initialise the EFI we can take a reference count for the EFD as
well. This solves the xfs_bmap_finish() issue - the EFI will never
be freed until the EFD is processed. For log recovery, when the EFD
is committed we can check whether the XFS_EFI_RECOVERED bit is set
and, if so, drop the EFI's own reference too, so that the EFI is
freed correctly in that path as well.
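
The reference counting scheme can be sketched in the same kind of
userspace model. Again this is an illustration under assumptions, not
the patch itself: the ref_* names are hypothetical, the recovered flag
stands in for XFS_EFI_RECOVERED, a plain int is used where real code
would need an atomic counter, and the model assumes a recovered EFI is
already tracked in the AIL.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; all ref_* names are hypothetical. */
struct ref_efi {
	int refcount;		/* real code would need an atomic counter */
	bool recovered;		/* stands in for XFS_EFI_RECOVERED */
	bool in_ail;
	bool freed;
	bool freed_outside_ail;	/* the condition that forces a shutdown */
};

/* Take both references up front: one for the EFI, one for its EFD. */
static void ref_efi_init(struct ref_efi *efi, bool recovered)
{
	efi->refcount = 2;
	efi->recovered = recovered;
	efi->in_ail = recovered;	/* model assumption for log recovery */
	efi->freed = false;
	efi->freed_outside_ail = false;
}

/* Drop one reference; the EFI is freed only when the last one goes. */
static void ref_efi_put(struct ref_efi *efi)
{
	assert(efi->refcount > 0);
	if (--efi->refcount == 0) {
		efi->freed = true;
		efi->freed_outside_ail = !efi->in_ail;
	}
}

/* EFI commit: in this model, the point where it enters the AIL. */
static void ref_efi_committed(struct ref_efi *efi) { efi->in_ail = true; }

/* Unpin drops the EFI's own reference. */
static void ref_efi_unpin(struct ref_efi *efi) { ref_efi_put(efi); }

/*
 * EFD commit drops the EFD's reference. Recovery never unpins the EFI,
 * so in that case the EFI's own reference is dropped here as well.
 */
static void ref_efd_committed(struct ref_efi *efi)
{
	if (efi->recovered)
		ref_efi_put(efi);
	ref_efi_put(efi);
}
```

With both references taken at initialisation, calling
ref_efd_committed() before ref_efi_committed() and ref_efi_unpin() no
longer frees the EFI early in this model: the free happens on whichever
call drops the last reference, by which time the EFI is in the AIL.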

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-05 13:25:35 -05:00
Kconfig fs/xfs: remove depends on CONFIG_EXPERIMENTAL 2013-01-11 11:39:04 -08:00
kmem.c xfs: switch to proper __bitwise type for KM_... flags 2012-05-29 23:28:32 -04:00
kmem.h xfs: switch to proper __bitwise type for KM_... flags 2012-05-29 23:28:32 -04:00
Makefile xfs: remove xfs_flushinval_pages 2012-11-14 15:15:08 -06:00
mrlock.h
time.h
uuid.c
uuid.h xfs: add CRC infrastructure 2012-11-19 20:11:24 -06:00
xfs_acl.c userns: Pass a userns parameter into posix_acl_to_xattr and posix_acl_from_xattr 2012-09-18 01:01:35 -07:00
xfs_acl.h
xfs_ag.h xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_alloc_btree.c xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_alloc_btree.h xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_alloc.c xfs: rename random32() to prandom_u32() 2013-03-07 12:33:57 -06:00
xfs_alloc.h xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_aops.c xfs: Fix WARN_ON(delalloc) in xfs_vm_releasepage() 2013-03-22 16:12:37 -05:00
xfs_aops.h Prefix IO_XX flags with XFS_IO_XX to avoid namespace colision. 2012-07-22 11:00:55 -05:00
xfs_attr_leaf.c xfs: take inode version into account in XFS_LITINO 2013-03-14 16:19:14 -05:00
xfs_attr_leaf.h xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_attr_sf.h
xfs_attr.c xfs: refactor space log reservation for XFS_TRANS_ATTR_SET 2013-02-01 14:56:31 -06:00
xfs_attr.h
xfs_bit.c
xfs_bit.h
xfs_bmap_btree.c xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_bmap_btree.h xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_bmap.c xfs: take inode version into account in XFS_LITINO 2013-03-14 16:19:14 -05:00
xfs_bmap.h xfs: move allocation stack switch up to xfs_bmapi_allocate 2012-10-18 17:42:48 -05:00
xfs_btree.c xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_btree.h xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_buf_item.c xfs: recheck buffer pinned status after push trylock failure 2013-02-14 17:23:42 -06:00
xfs_buf_item.h xfs: rename bli_format to avoid confusion with bli_formats 2013-01-16 16:07:37 -06:00
xfs_buf.c xfs: ensure we capture IO errors correctly 2013-03-14 15:56:53 -05:00
xfs_buf.h xfs: use b_maps[] for discontiguous buffers 2013-01-16 16:07:11 -06:00
xfs_cksum.h xfs: add CRC infrastructure 2012-11-19 20:11:24 -06:00
xfs_da_btree.c xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_da_btree.h xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_dfrag.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
xfs_dfrag.h
xfs_dinode.h xfs: take inode version into account in XFS_LITINO 2013-03-14 16:19:14 -05:00
xfs_dir2_block.c xfs: recalculate leaf entry pointer after compacting a dir2 block 2013-01-16 16:08:55 -06:00
xfs_dir2_data.c xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_dir2_format.h
xfs_dir2_leaf.c xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_dir2_node.c xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_dir2_priv.h xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_dir2_sf.c xfs: remove struct xfs_dabuf and infrastructure 2012-07-01 14:50:07 -05:00
xfs_dir2.c xfs: remove struct xfs_dabuf and infrastructure 2012-07-01 14:50:07 -05:00
xfs_dir2.h
xfs_discard.c xfs: check for possible overflow in xfs_ioc_trim 2012-08-23 14:48:44 -05:00
xfs_discard.h
xfs_dquot_item.c xfs: clean up xfs_bit.h includes 2012-05-14 16:21:00 -05:00
xfs_dquot_item.h
xfs_dquot.c xfs: xfs_dquot prealloc throttling watermarks and low free space 2013-03-22 16:06:30 -05:00
xfs_dquot.h xfs: xfs_dquot prealloc throttling watermarks and low free space 2013-03-22 16:06:30 -05:00
xfs_error.c xfs: rename random32() to prandom_u32() 2013-03-07 12:33:57 -06:00
xfs_error.h
xfs_export.c fs: encode_fh: return FILEID_INVALID if invalid fid_type 2013-02-26 02:46:10 -05:00
xfs_export.h
xfs_extent_busy.c xfs: make xfs_extent_busy_trim not static 2012-05-14 16:21:04 -05:00
xfs_extent_busy.h xfs: make xfs_extent_busy_trim not static 2012-05-14 16:21:04 -05:00
xfs_extfree_item.c xfs: don't free EFIs before the EFDs are committed 2013-04-05 13:25:35 -05:00
xfs_extfree_item.h xfs: don't free EFIs before the EFDs are committed 2013-04-05 13:25:35 -05:00
xfs_file.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
xfs_filestream.c xfs: rename allocation range fields in struct xfs_bmalloca 2011-10-11 21:15:06 -05:00
xfs_filestream.h
xfs_fs.h xfs: add minimum file size filtering to eofblocks scan 2012-11-08 15:32:29 -06:00
xfs_fsops.c xfs: make use of XFS_SB_LOG_RES() at xfs_fs_log_dummy() 2013-02-01 14:55:59 -06:00
xfs_fsops.h
xfs_globals.c xfs: add background scanning to clear eofblocks inodes 2012-11-08 15:34:59 -06:00
xfs_ialloc_btree.c xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_ialloc_btree.h xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_ialloc.c xfs: rename random32() to prandom_u32() 2013-03-07 12:33:57 -06:00
xfs_ialloc.h xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_icache.c xfs: add background scanning to clear eofblocks inodes 2012-11-08 15:34:59 -06:00
xfs_icache.h xfs: add background scanning to clear eofblocks inodes 2012-11-08 15:34:59 -06:00
xfs_inode_item.c xfs remove the XFS_TRANS_DEBUG routines 2012-12-17 16:29:00 -06:00
xfs_inode_item.h xfs remove the XFS_TRANS_DEBUG routines 2012-12-17 16:29:00 -06:00
xfs_inode.c xfs remove the XFS_TRANS_DEBUG routines 2012-12-17 16:29:00 -06:00
xfs_inode.h xfs: take inode version into account in XFS_LITINO 2013-03-14 16:19:14 -05:00
xfs_inum.h xfs: move xfsagino_t to xfs_types.h 2012-05-14 16:20:54 -05:00
xfs_ioctl32.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
xfs_ioctl32.h
xfs_ioctl.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
xfs_ioctl.h
xfs_iomap.c xfs: xfs_iomap_prealloc_size() tracepoint 2013-03-22 16:07:56 -05:00
xfs_iomap.h
xfs_iops.c xfs: remove xfs_flush_pages 2012-11-14 15:12:45 -06:00
xfs_iops.h
xfs_itable.c xfs: convert buffer verifiers to an ops structure. 2012-11-15 21:35:12 -06:00
xfs_itable.h
xfs_linux.h xfs: Add ratelimited printk for different alert levels 2013-04-03 13:20:39 -05:00
xfs_log_cil.c xfs: rename log structure to xlog 2012-06-21 14:21:11 -05:00
xfs_log_priv.h xfs: fix sparse reported log CRC endian issue 2012-12-03 12:10:59 -06:00
xfs_log_recover.c xfs: don't free EFIs before the EFDs are committed 2013-04-05 13:25:35 -05:00
xfs_log_recover.h
xfs_log.c xfs: rename random32() to prandom_u32() 2013-03-07 12:33:57 -06:00
xfs_log.h xfs: xfs_quiesce_attr() should quiesce the log like unmount 2012-10-17 13:39:14 -05:00
xfs_message.c xfs: move xfsagino_t to xfs_types.h 2012-05-14 16:20:54 -05:00
xfs_message.h xfs: Add ratelimited printk for different alert levels 2013-04-03 13:20:39 -05:00
xfs_mount.c xfs: make use of XFS_SB_LOG_RES() at xfs_mount_log_sb() 2013-02-01 14:55:08 -06:00
xfs_mount.h xfs: Remove obsoleted m_inode_shrink from xfs_mount structure 2013-03-14 15:55:32 -05:00
xfs_mru_cache.c
xfs_mru_cache.h
xfs_qm_bhv.c xfs: Remove boolean_t typedef completely. 2013-01-17 17:32:57 -06:00
xfs_qm_syscalls.c xfs: xfs_dquot prealloc throttling watermarks and low free space 2013-03-22 16:06:30 -05:00
xfs_qm.c xfs: pass xfs_dquot to xfs_qm_adjust_dqlimits() instead of xfs_disk_dquot_t 2013-03-22 16:05:52 -05:00
xfs_qm.h xfs: xfs_dquot prealloc throttling watermarks and low free space 2013-03-22 16:06:30 -05:00
xfs_quota_priv.h xfs: use per-filesystem radix trees for dquot lookup 2012-03-14 11:09:06 -05:00
xfs_quota.h Define new macro XFS_ALL_QUOTA_ACTIVE and simply some usage 2012-02-03 11:32:20 -06:00
xfs_quotaops.c userns: Convert qutoactl 2012-09-18 01:01:39 -07:00
xfs_rename.c xfs: move xfsagino_t to xfs_types.h 2012-05-14 16:20:54 -05:00
xfs_rtalloc.c xfs: uncached buffer reads need to return an error 2012-11-15 21:34:05 -06:00
xfs_rtalloc.h
xfs_sb.h xfs: add CRC infrastructure 2012-11-19 20:11:24 -06:00
xfs_stats.c xfs: use common code for quota statistics 2012-03-14 11:09:06 -05:00
xfs_stats.h xfs: use common code for quota statistics 2012-03-14 11:09:06 -05:00
xfs_super.c fs/xfs remove obsolete simple_strto<foo> 2013-01-13 14:42:07 -06:00
xfs_super.h xfs: xfs_sync_data is redundant. 2012-10-17 12:01:25 -05:00
xfs_sysctl.c xfs: add background scanning to clear eofblocks inodes 2012-11-08 15:34:59 -06:00
xfs_sysctl.h xfs: add background scanning to clear eofblocks inodes 2012-11-08 15:34:59 -06:00
xfs_trace.c xfs: clean up xfs_bit.h includes 2012-05-14 16:21:00 -05:00
xfs_trace.h xfs: xfs_iomap_prealloc_size() tracepoint 2013-03-22 16:07:56 -05:00
xfs_trans_ail.c xfs remove the XFS_TRANS_DEBUG routines 2012-12-17 16:29:00 -06:00
xfs_trans_buf.c xfs: fix the multi-segment log buffer format 2013-01-16 16:08:08 -06:00
xfs_trans_dquot.c xfs: pass xfs_dquot to xfs_qm_adjust_dqlimits() instead of xfs_disk_dquot_t 2013-03-22 16:05:52 -05:00
xfs_trans_extfree.c xfs: move xfsagino_t to xfs_types.h 2012-05-14 16:20:54 -05:00
xfs_trans_inode.c xfs remove the XFS_TRANS_DEBUG routines 2012-12-17 16:29:00 -06:00
xfs_trans_priv.h xfs: re-enable xfsaild idle mode and fix associated races 2012-07-29 16:27:57 -05:00
xfs_trans_space.h
xfs_trans.c xfs: refactor space log reservation for XFS_TRANS_ATTR_SET 2013-02-01 14:56:31 -06:00
xfs_trans.h xfs: refactor space log reservation for XFS_TRANS_ATTR_SET 2013-02-01 14:56:31 -06:00
xfs_types.h xfs: Remove boolean_t typedef completely. 2013-01-17 17:32:57 -06:00
xfs_utils.c xfs: remove the alloc_done argument to xfs_dialloc 2012-07-29 16:00:31 -05:00
xfs_utils.h xfs: propagate umode_t 2012-01-03 22:55:00 -05:00
xfs_vnode.h xfs: remove remaining scraps of struct xfs_iomap 2012-03-15 13:40:16 -05:00
xfs_vnodeops.c xfs: take inode version into account in XFS_LITINO 2013-03-14 16:19:14 -05:00
xfs_vnodeops.h xfs: byte range granularity for XFS_IOC_ZERO_RANGE 2012-11-29 14:21:46 -06:00
xfs_xattr.c
xfs.h