linux-kernel-test/fs/btrfs
Filipe Manana 1ec9a1ae1e Btrfs: fix unreplayable log after snapshot delete + parent dir fsync
If we delete a snapshot, fsync its parent directory and crash/power fail
before the next transaction commit, on the next mount when we attempt to
replay the log tree of the root containing the parent directory we will
fail and prevent the filesystem from mounting, which is solvable by wiping
out the log trees with the btrfs-zero-log tool but very inconvenient as
we will lose any data and metadata fsynced before the parent directory
was fsynced.

For example:

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt
  $ mkdir /mnt/testdir
  $ btrfs subvolume snapshot /mnt /mnt/testdir/snap
  $ btrfs subvolume delete /mnt/testdir/snap
  $ xfs_io -c "fsync" /mnt/testdir
  < crash / power failure and reboot >
  $ mount /dev/sdc /mnt
  mount: mount(2) failed: No such file or directory

And in dmesg/syslog we get the following message and trace:

[192066.361162] BTRFS info (device dm-0): failed to delete reference to snap, inode 257 parent 257
[192066.363010] ------------[ cut here ]------------
[192066.365268] WARNING: CPU: 4 PID: 5130 at fs/btrfs/inode.c:3986 __btrfs_unlink_inode+0x17a/0x354 [btrfs]()
[192066.367250] BTRFS: Transaction aborted (error -2)
[192066.368401] Modules linked in: btrfs dm_flakey dm_mod ppdev sha256_generic xor raid6_pq hmac drbg ansi_cprng aesni_intel acpi_cpufreq tpm_tis aes_x86_64 tpm ablk_helper evdev cryptd sg parport_pc i2c_piix4 psmouse lrw parport i2c_core pcspkr gf128mul processor serio_raw glue_helper button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
[192066.377154] CPU: 4 PID: 5130 Comm: mount Tainted: G        W       4.4.0-rc6-btrfs-next-20+ #1
[192066.378875] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[192066.380889]  0000000000000000 ffff880143923670 ffffffff81257570 ffff8801439236b8
[192066.382561]  ffff8801439236a8 ffffffff8104ec07 ffffffffa039dc2c 00000000fffffffe
[192066.384191]  ffff8801ed31d000 ffff8801b9fc9c88 ffff8801086875e0 ffff880143923710
[192066.385827] Call Trace:
[192066.386373]  [<ffffffff81257570>] dump_stack+0x4e/0x79
[192066.387387]  [<ffffffff8104ec07>] warn_slowpath_common+0x99/0xb2
[192066.388429]  [<ffffffffa039dc2c>] ? __btrfs_unlink_inode+0x17a/0x354 [btrfs]
[192066.389236]  [<ffffffff8104ec68>] warn_slowpath_fmt+0x48/0x50
[192066.389884]  [<ffffffffa039dc2c>] __btrfs_unlink_inode+0x17a/0x354 [btrfs]
[192066.390621]  [<ffffffff81184b55>] ? iput+0xb0/0x266
[192066.391200]  [<ffffffffa039ea25>] btrfs_unlink_inode+0x1c/0x3d [btrfs]
[192066.391930]  [<ffffffffa03ca623>] check_item_in_log+0x1fe/0x29b [btrfs]
[192066.392715]  [<ffffffffa03ca827>] replay_dir_deletes+0x167/0x1cf [btrfs]
[192066.393510]  [<ffffffffa03cccc7>] replay_one_buffer+0x417/0x570 [btrfs]
[192066.394241]  [<ffffffffa03ca164>] walk_up_log_tree+0x10e/0x1dc [btrfs]
[192066.394958]  [<ffffffffa03cac72>] walk_log_tree+0xa5/0x190 [btrfs]
[192066.395628]  [<ffffffffa03ce8b8>] btrfs_recover_log_trees+0x239/0x32c [btrfs]
[192066.396790]  [<ffffffffa03cc8b0>] ? replay_one_extent+0x50a/0x50a [btrfs]
[192066.397891]  [<ffffffffa0394041>] open_ctree+0x1d8b/0x2167 [btrfs]
[192066.398897]  [<ffffffffa03706e1>] btrfs_mount+0x5ef/0x729 [btrfs]
[192066.399823]  [<ffffffff8108ad98>] ? trace_hardirqs_on+0xd/0xf
[192066.400739]  [<ffffffff8108959b>] ? lockdep_init_map+0xb9/0x1b3
[192066.401700]  [<ffffffff811714b9>] mount_fs+0x67/0x131
[192066.402482]  [<ffffffff81188560>] vfs_kern_mount+0x6c/0xde
[192066.403930]  [<ffffffffa03702bd>] btrfs_mount+0x1cb/0x729 [btrfs]
[192066.404831]  [<ffffffff8108ad98>] ? trace_hardirqs_on+0xd/0xf
[192066.405726]  [<ffffffff8108959b>] ? lockdep_init_map+0xb9/0x1b3
[192066.406621]  [<ffffffff811714b9>] mount_fs+0x67/0x131
[192066.407401]  [<ffffffff81188560>] vfs_kern_mount+0x6c/0xde
[192066.408247]  [<ffffffff8118ae36>] do_mount+0x893/0x9d2
[192066.409047]  [<ffffffff8113009b>] ? strndup_user+0x3f/0x8c
[192066.409842]  [<ffffffff8118b187>] SyS_mount+0x75/0xa1
[192066.410621]  [<ffffffff8147e517>] entry_SYSCALL_64_fastpath+0x12/0x6b
[192066.411572] ---[ end trace 2de42126c1e0a0f0 ]---
[192066.412344] BTRFS: error (device dm-0) in __btrfs_unlink_inode:3986: errno=-2 No such entry
[192066.413748] BTRFS: error (device dm-0) in btrfs_replay_log:2464: errno=-2 No such entry (Failed to recover log tree)
[192066.415458] BTRFS error (device dm-0): cleaner transaction attach returned -30
[192066.444613] BTRFS: open_ctree failed

This happens because when we are replaying the log and processing the
directory entry pointing to the snapshot in the subvolume tree, we treat
its btrfs_dir_item item as having a location with a key type matching
BTRFS_INODE_ITEM_KEY, which is wrong because the type matches
BTRFS_ROOT_ITEM_KEY and therefore must be processed differently, as the
object id refers to a root number and not to an inode in the root
containing the parent directory.

So fix this by triggering a transaction commit if an fsync against the
parent directory is requested after deleting a snapshot. This is the
simplest approach for a rare use case. Some alternative that avoids the
transaction commit would require more code to explicitly delete the
snapshot at log replay time (factoring out common code from ioctl.c:
btrfs_ioctl_snap_destroy()), special care at fsync time to remove the
log tree of the snapshot's root from the log root of the root of tree
roots, amongst other steps.

A test case for xfstests that triggers the issue follows.

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1	# failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      _cleanup_flakey
      cd /
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter
  . ./common/dmflakey

  # real QA test starts here
  _need_to_be_root
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch
  _require_dm_target flakey
  _require_metadata_journaling $SCRATCH_DEV

  rm -f $seqres.full

  _scratch_mkfs >>$seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create a snapshot at the root of our filesystem (mount point path), delete it,
  # fsync the mount point path, crash and mount to replay the log. This should
  # succeed and after the filesystem is mounted the snapshot should not be visible
  # anymore.
  _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap1
  _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap1
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT
  _flakey_drop_and_remount
  [ -e $SCRATCH_MNT/snap1 ] && \
      echo "Snapshot snap1 still exists after log replay"

  # Similar scenario as above, but this time the snapshot is created inside a
  # directory and not directly under the root (mount point path).
  mkdir $SCRATCH_MNT/testdir
  _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/testdir/snap2
  _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/testdir/snap2
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir
  _flakey_drop_and_remount
  [ -e $SCRATCH_MNT/testdir/snap2 ] && \
      echo "Snapshot snap2 still exists after log replay"

  _unmount_flakey

  echo "Silence is golden"
  status=0
  exit

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Tested-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-03-01 08:23:25 -08:00
..
tests btrfs: fix memory leak of fs_info in block group cache 2016-02-18 13:28:24 +01:00
acl.c Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2016-01-18 12:44:40 -08:00
async-thread.c btrfs: async-thread: Fix a use-after-free error for trace 2016-01-25 16:50:26 -08:00
async-thread.h btrfs: async_thread: Fix workqueue 'max_active' value when initializing 2015-08-31 11:46:40 -07:00
backref.c Merge branch 'cleanups-4.6' into for-chris-4.6 2016-02-26 15:38:33 +01:00
backref.h
btrfs_inode.h btrfs: put delayed item hook into inode 2016-01-07 14:26:58 +01:00
check-integrity.c btrfs: use list_for_each_entry* in check-integrity.c 2016-01-07 14:38:42 +01:00
check-integrity.h
compression.c Btrfs: remove no longer used function extent_read_full_page_nolock() 2016-02-03 19:27:10 +00:00
compression.h
ctree.c Merge branch 'dev/gfp-flags' into for-chris-4.6 2016-02-26 15:38:28 +01:00
ctree.h Merge branch 'dev/control-ioctl' into for-chris-4.6 2016-02-26 15:38:34 +01:00
delayed-inode.c Merge branch 'cleanups-4.6' into for-chris-4.6 2016-02-26 15:38:33 +01:00
delayed-inode.h btrfs: properly set the termination value of ctx->pos in readdir 2016-02-11 07:01:59 -08:00
delayed-ref.c btrfs: drop null testing before destroy functions 2016-02-18 11:46:03 +01:00
delayed-ref.h btrfs: better packing of btrfs_delayed_extent_op 2016-01-07 14:26:58 +01:00
dev-replace.c Merge branch 'foreign/liubo/replace-lockup' into for-chris-4.6 2016-02-26 15:38:32 +01:00
dev-replace.h Btrfs: fix lockdep deadlock warning due to dev_replace 2016-02-23 13:10:10 +01:00
dir-item.c
disk-io.c Merge branch 'cleanups-4.6' into for-chris-4.6 2016-02-26 15:38:33 +01:00
disk-io.h Merge branch 'misc-cleanups-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 2016-01-11 06:08:37 -08:00
export.c BTRFS: support NFSv2 export 2015-10-06 06:55:23 -07:00
export.h
extent_io.c Merge branch 'cleanups-4.6' into for-chris-4.6 2016-02-26 15:38:33 +01:00
extent_io.h Merge branch 'cleanups-4.6' into for-chris-4.6 2016-02-26 15:38:33 +01:00
extent_map.c btrfs: drop null testing before destroy functions 2016-02-18 11:46:03 +01:00
extent_map.h btrfs: cleanup, stop casting for extent_map->lookup everywhere 2016-01-15 19:22:28 +01:00
extent-tree.c Merge branch 'cleanups-4.6' into for-chris-4.6 2016-02-26 15:38:33 +01:00
file-item.c Btrfs: Compute and look up csums based on sectorsized blocks 2016-02-01 19:23:47 +01:00
file.c Merge branch 'misc-4.6' into for-chris-4.6 2016-02-26 15:38:34 +01:00
free-space-cache.c Merge branch 'misc-cleanups-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 2016-01-11 06:08:37 -08:00
free-space-cache.h btrfs: constify remaining structs with function pointers 2016-01-07 15:01:14 +01:00
free-space-tree.c Revert "btrfs: synchronize incompat feature bits with sysfs files" 2016-01-29 08:19:37 -08:00
free-space-tree.h Btrfs: implement the free space B-tree 2015-12-17 12:16:47 -08:00
hash.c
hash.h
inode-item.c Btrfs: consolidate btrfs_error() to btrfs_std_error() 2015-09-29 16:30:00 +02:00
inode-map.c Merge branch 'misc-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 2016-01-19 18:21:30 -08:00
inode-map.h Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots 2016-01-15 19:25:02 +01:00
inode.c Btrfs patchsets for 4.6 2016-03-01 08:13:56 -08:00
ioctl.c Btrfs: fix unreplayable log after snapshot delete + parent dir fsync 2016-03-01 08:23:25 -08:00
Kconfig
locking.c btrfs: cleanup, remove stray return statements 2016-01-07 14:30:52 +01:00
locking.h
lzo.c
Makefile Btrfs: add free space tree sanity tests 2015-12-17 12:16:47 -08:00
math.h
ordered-data.c btrfs: drop null testing before destroy functions 2016-02-18 11:46:03 +01:00
ordered-data.h Btrfs: change how we wait for pending ordered extents 2015-10-21 18:51:40 -07:00
orphan.c
print-tree.c btrfs: teach print_leaf about temporary item subtypes 2016-02-11 16:15:43 +01:00
print-tree.h
props.c btrfs: cleanup iterating over prop_handlers array 2015-10-21 18:28:48 +02:00
props.h
qgroup.c btrfs: qgroup: account shared subtree during snapshot delete 2015-11-25 05:27:33 -08:00
qgroup.h btrfs: qgroup: Check if qgroup reserved space leaked 2015-10-21 18:41:10 -07:00
raid56.c btrfs: raid56: Use raid_write_end_io for scrub 2016-01-20 07:22:18 -08:00
raid56.h Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation 2015-08-09 07:34:26 -07:00
rcu-string.h
reada.c Merge branch 'foreign/liubo/replace-lockup' into for-chris-4.6 2016-02-26 15:38:32 +01:00
relocation.c Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2016-01-29 15:46:49 -08:00
root-tree.c btrfs: Replace CURRENT_TIME by current_fs_time() 2016-02-18 11:46:03 +01:00
scrub.c Merge branch 'foreign/liubo/replace-lockup' into for-chris-4.6 2016-02-26 15:38:32 +01:00
send.c btrfs: send: use GFP_KERNEL everywhere 2016-02-11 15:19:39 +01:00
send.h Btrfs: use linux/sizes.h to represent constants 2016-01-07 14:38:02 +01:00
struct-funcs.c
super.c Merge branch 'dev/control-ioctl' into for-chris-4.6 2016-02-26 15:38:34 +01:00
sysfs.c btrfs: sysfs: check initialization state before updating features 2016-01-27 05:40:10 -08:00
sysfs.h btrfs: sysfs: introduce helper for syncing bits with sysfs files 2016-01-21 18:50:40 +01:00
transaction.c Merge branch 'cleanups-4.6' into for-chris-4.6 2016-02-26 15:38:33 +01:00
transaction.h btrfs: preallocate path for snapshot creation at ioctl time 2016-01-07 15:20:55 +01:00
tree-defrag.c Btrfs: fix locking bugs when defragging leaves 2015-12-18 02:51:32 +00:00
tree-log.c Btrfs: fix unreplayable log after snapshot delete + parent dir fsync 2016-03-01 08:23:25 -08:00
tree-log.h Btrfs: fix unreplayable log after snapshot delete + parent dir fsync 2016-03-01 08:23:25 -08:00
ulist.c btrfs: ulist: Add ulist_del() function. 2015-06-10 09:26:17 -07:00
ulist.h btrfs: ulist: Add ulist_del() function. 2015-06-10 09:26:17 -07:00
uuid-tree.c
volumes.c Merge branch 'foreign/liubo/replace-lockup' into for-chris-4.6 2016-02-26 15:38:32 +01:00
volumes.h Merge branch 'misc-cleanups-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 2016-01-11 06:08:37 -08:00
xattr.c btrfs: Replace CURRENT_TIME by current_fs_time() 2016-02-18 11:46:03 +01:00
xattr.h btrfs: Use xattr handler infrastructure 2015-12-06 21:34:14 -05:00
zlib.c