linux-kernel-test/fs
Eric Sandeen e9e3bcecf4 ext4: serialize unaligned asynchronous DIO
ext4 has a data corruption case when doing non-block-aligned
asynchronous direct IO into a sparse file, as demonstrated
by xfstest 240.

The root cause is that while ext4 preallocates space in the
hole, mappings of that space still look "new" and 
dio_zero_block() will zero out the unwritten portions.  When
more than one AIO thread is going, they both find this "new"
block and race to zero out their portion; this is uncoordinated
and causes data corruption.

Dave Chinner fixed this for xfs by simply serializing all
unaligned asynchronous direct IO.  I've done the same here.
The difference is that we only wait on conversions, not all IO.
This is a very big hammer, and I'm not very pleased with
stuffing this into ext4_file_write().  But since ext4 is
DIO_LOCKING, we need to serialize it at this high level.

I tried to move this into ext4_ext_direct_IO, but by then
we have the i_mutex already, and we will wait on the
work queue to do conversions - which must also take the
i_mutex.  So that won't work.

This was originally exposed by qemu-kvm installing to
a raw disk image with a normal sector-63 alignment.  I've
tested a backport of this patch with qemu, and it does
avoid the corruption.  It is also quite a lot slower
(14 min for package installs, vs. 8 min for well-aligned)
but I'll take slow correctness over fast corruption any day.

Mingming suggested that we can track outstanding
conversions, and wait on those so that non-sparse
files won't be affected, and I've implemented that here;
unaligned AIO to nonsparse files won't take a perf hit.

[tytso@mit.edu: Keep the mutex as a hashed array instead
 of bloating the ext4 inode]

[tytso@mit.edu: Fix up namespace issues so that global
 variables are protected with an "ext4_" prefix.]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-02-12 08:17:34 -05:00
..
9p switch 9p 2011-01-12 20:03:43 -05:00
adfs switch adfs 2011-01-12 20:02:45 -05:00
affs switch affs 2011-01-12 20:03:42 -05:00
afs Unexport do_add_mount() and add in follow_automount(), not ->d_automount() 2011-01-15 20:07:48 -05:00
autofs4 autofs4: clean ->d_release() and autofs4_free_ino() up 2011-01-18 01:21:29 -05:00
befs befs: don't pass huge structs by value 2011-01-13 08:03:15 -08:00
bfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable 2011-01-17 14:43:43 -08:00
cachefiles
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2011-01-13 10:25:24 -08:00
cifs cifs: fix up CIFSSMBEcho for unaligned access 2011-01-21 02:23:27 +00:00
coda Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-01-13 10:27:28 -08:00
configfs configfs: change depends -> select SYSFS 2011-01-16 21:22:29 +00:00
cramfs cramfs: generate unique inode number for better inode cache usage 2011-01-13 08:03:23 -08:00
debugfs convert get_sb_single() users 2010-10-29 04:16:28 -04:00
devpts convert get_sb_single() users 2010-10-29 04:16:28 -04:00
dlm dlm: Make DLM depend on CONFIGFS_FS 2011-01-16 21:22:37 +00:00
ecryptfs ecryptfs: remove unnecessary decrypt when extending a file 2011-01-17 13:01:25 -06:00
efs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
exofs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
exportfs fs: dcache per-inode inode alias locking 2011-01-07 17:50:31 +11:00
ext2 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 2011-01-11 14:37:31 -08:00
ext3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 2011-01-21 07:33:37 -08:00
ext4 ext4: serialize unaligned asynchronous DIO 2011-02-12 08:17:34 -05:00
fat switch fat to ->s_d_op, close exportfs races there 2011-01-12 20:02:43 -05:00
freevxfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
fscache FS-Cache: Fix operation handling 2011-01-14 09:23:36 -08:00
fuse switch fuse 2011-01-12 20:02:44 -05:00
gfs2 GFS2: Fix error path in gfs2_lookup_by_inum() 2011-01-18 14:49:08 +00:00
hfs switch hfs 2011-01-12 20:02:45 -05:00
hfsplus switch hfsplus 2011-01-12 20:02:45 -05:00
hostfs switch hostfs 2011-01-12 20:03:42 -05:00
hpfs hpfs_setattr error case avoids unlock_kernel 2011-01-17 05:11:37 -05:00
hppfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
hugetlbfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
isofs fix isofs d_op handling 2011-01-12 20:02:43 -05:00
jbd fix comment typos concerning "consistent" 2010-12-10 16:04:28 +01:00
jbd2 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-01-13 10:05:56 -08:00
jffs2 Merge git://git.infradead.org/mtd-2.6 2011-01-17 11:15:30 -08:00
jfs Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block 2011-01-13 10:45:01 -08:00
lockd lockd: double unlock in next_host_state() 2011-01-04 13:10:37 -05:00
logfs Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block 2011-01-13 10:45:01 -08:00
minix minixfs: kill dead code 2011-01-12 20:02:44 -05:00
ncpfs move internal-only parts of ncpfs headers to fs/ncpfs 2011-01-12 20:03:43 -05:00
nfs Unexport do_add_mount() and add in follow_automount(), not ->d_automount() 2011-01-15 20:07:48 -05:00
nfs_common
nfsd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-01-16 11:31:50 -08:00
nilfs2 Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block 2011-01-13 10:45:01 -08:00
nls
notify Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-01-13 10:05:56 -08:00
ntfs NTFS: writev() fix and maintenance/contact details update 2011-01-12 08:35:53 -08:00
ocfs2 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 2011-01-21 07:33:37 -08:00
omfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
openpromfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
partitions Merge branch 'for-2.6.38/event-handling' into for-2.6.38/core 2011-01-13 14:47:54 +01:00
proc kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
qnx4 fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
quota quota: Fix deadlock during path resolution 2011-01-12 19:14:55 +01:00
ramfs convert get_sb_nodev() users 2010-10-29 04:16:31 -04:00
reiserfs Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 2011-01-21 07:33:37 -08:00
romfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
squashfs Squashfs: simplify CONFIG_SQUASHFS_LZO handling 2011-01-13 21:38:46 +00:00
sysfs kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
sysv switch sysv 2011-01-12 20:02:44 -05:00
ubifs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
udf Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 2011-01-11 14:45:52 -08:00
ufs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
xfs xfs: Do not name variables "panic" 2011-01-17 12:39:07 -08:00
aio.c aio: check return value of create_workqueue() 2011-01-17 05:12:44 -05:00
anon_inodes.c sanitize vfsmount refcounting changes 2011-01-16 13:47:07 -05:00
attr.c
bad_inode.c fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
binfmt_aout.c
binfmt_elf_fdpic.c
binfmt_elf.c binfmt_elf: cleanups 2011-01-13 08:03:12 -08:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c convert get_sb_single() users 2010-10-29 04:16:28 -04:00
binfmt_script.c
binfmt_som.c
bio-integrity.c bio-integrity: mark kintegrityd_wq highpri and CPU intensive 2011-01-03 15:01:48 +01:00
bio.c bio: take care not overflow page count when mapping/copying user data 2010-11-10 14:40:43 +01:00
block_dev.c block: restore multiple bd_link_disk_holder() support 2011-01-14 18:44:22 +01:00
buffer.c fs: Use this_cpu_inc_return in buffer.c 2010-12-17 15:18:05 +01:00
char_dev.c Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block 2011-01-13 10:45:01 -08:00
compat_binfmt_elf.c
compat_ioctl.c Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 2011-01-07 14:39:20 -08:00
compat.c compat: copy missing fields in compat_statfs64 to user 2011-01-17 04:54:38 -05:00
dcache.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-01-16 11:31:50 -08:00
dcookies.c
direct-io.c fs/direct-io.c: don't try to allocate more than BIO_MAX_PAGES in a bio 2011-01-20 17:02:05 -08:00
drop_caches.c
eventfd.c
eventpoll.c epoll: convert max_user_watches to long 2011-01-13 08:03:12 -08:00
exec.c install_special_mapping skips security_file_mmap check. 2010-12-15 12:30:36 -08:00
fcntl.c fasync: Fix placement of FASYNC flag comment 2010-10-27 18:17:02 -07:00
fifo.c
file_table.c fs: Remove unlikely() from fget_light() 2011-01-17 03:26:27 -05:00
file.c
filesystems.c fs: rcu-walk for path lookup 2011-01-07 17:50:27 +11:00
fs_struct.c sanitize vfsmount refcounting changes 2011-01-16 13:47:07 -05:00
fs-writeback.c fs/fs-writeback.c: fix sync_inodes_sb() return value kernel-doc 2011-01-13 17:32:48 -08:00
generic_acl.c fs: provide simple rcu-walk generic_check_acl implementation 2011-01-07 17:50:29 +11:00
inode.c fs: avoid inode RCU freeing for pseudo fs 2011-01-07 17:50:26 +11:00
internal.h tidy up around finish_automount() 2011-01-17 01:47:59 -05:00
ioctl.c fs: fix address space warnings in ioctl_fiemap() 2011-01-17 08:21:42 -05:00
ioprio.c ioprio: grab rcu_read_lock in sys_ioprio_{set,get}() 2010-11-15 10:23:31 +01:00
Kconfig kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
Kconfig.binfmt coredump: default CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y 2010-10-27 18:03:12 -07:00
libfs.c pass default dentry_operations to mount_pseudo() 2011-01-12 20:03:43 -05:00
locks.c Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux 2011-01-14 13:17:26 -08:00
Makefile Merge 'staging-next' to Linus's tree 2010-10-28 09:44:56 -07:00
mbcache.c ext2: Resolve 'dereferencing pointer to incomplete type' when enabling EXT2_XATTR_DEBUG 2011-01-10 19:04:08 +01:00
mpage.c fs/mpage.c: consolidate code 2011-01-13 17:32:32 -08:00
namei.c vfs - fix dentry ref count in do_lookup() 2011-01-18 01:21:26 -05:00
namespace.c tidy up around finish_automount() 2011-01-17 01:47:59 -05:00
nfsctl.c
no-block.c
open.c fallocate should be a file operation 2011-01-17 02:25:31 -05:00
pipe.c Fix broken "pipe: use event aware wakeups" optimization 2011-01-20 16:21:59 -08:00
pnode.c fs: scale mntget/mntput 2011-01-07 17:50:33 +11:00
pnode.h
posix_acl.c
read_write.c fix signedness mess in rw_verify_area() on 64bit architectures 2011-01-12 20:06:58 -05:00
read_write.h
readdir.c
select.c fs/select.c: fix information leak to userspace 2011-01-13 08:03:12 -08:00
seq_file.c
signalfd.c Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 2010-10-26 10:13:10 -07:00
splice.c Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block 2011-01-13 10:45:01 -08:00
stack.c
stat.c Add an AT_NO_AUTOMOUNT flag to suppress terminal automount 2011-01-15 20:07:33 -05:00
statfs.c
super.c sanitize vfsmount refcounting changes 2011-01-16 13:47:07 -05:00
sync.c
timerfd.c
utimes.c
xattr_acl.c
xattr.c