Commit Graph

2934 Commits

Author SHA1 Message Date
Brian Foster
74564fb48c xfs: clean up xfs_inactive() error handling, kill VN_INACTIVE_[NO]CACHE
The xfs_inactive() return value is meaningless. Turn xfs_inactive()
into a void function and clean up the error handling appropriately.
Kill the VN_INACTIVE_[NO]CACHE directives as they are not relevant
to Linux.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08 17:20:41 -05:00
Brian Foster
88877d2b97 xfs: push down inactive transaction mgmt for ifree
Push the inode free work performed during xfs_inactive() down into
a new xfs_inactive_ifree() helper. This clears xfs_inactive() from
all inode locking and transaction management more directly
associated with freeing the inode xattrs, extents and the inode
itself.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08 17:15:01 -05:00
Brian Foster
f7be2d7f59 xfs: push down inactive transaction mgmt for truncate
Create the new xfs_inactive_truncate() function to handle the
truncate portion of xfs_inactive(). Push the locking and
transaction management into the new function.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08 15:32:11 -05:00
Brian Foster
36b21dde6e xfs: push down inactive transaction mgmt for remote symlinks
Push down the transaction management for remote symlinks from
xfs_inactive() down to xfs_inactive_symlink_rmt(). The latter is
cleaned up to avoid transaction management intended for the
calling context (i.e., trans duplication, reservation, item
attachment).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08 14:53:02 -05:00
Mark Tinguely
2900a579ab xfs: add the inode directory type support to XFS_IOC_FSGEOM
Add the inode type directory type support to XFS_IOC_FSGEOM
so that xfs_repair/xfs_info knows if the superblock v4 filesystem
enabled the feature.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08 14:28:09 -05:00
Ben Myers
d948709b8e xfs: remove usage of is_bad_inode
XFS never calls mark_inode_bad or iget_failed, so it will never see a
bad inode.  Remove all checks for is_bad_inode because they are
unnecessary.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2013-10-01 17:38:16 -05:00
Jie Liu
17ec81c15f xfs: fix the wrong new_size/rnew_size at xfs_iext_realloc_direct()
At xfs_iext_realloc_direct(), the new_size is changed by adding
if_bytes if originally the extent records are stored at the inline
extent buffer, and we have to switch from it to a direct extent
list for those new allocated extents, this is wrong. e.g,

Create a file with three extents which was showing as following,

xfs_io -f -c "truncate 100m" /xfs/testme

for i in $(seq 0 5 10); do
	offset=$(($i * $((1 << 20))))
	xfs_io -c "pwrite $offset 1m" /xfs/testme
done

Inline
------
irec:	if_bytes	bytes_diff	new_size
1st	0		16		16
2nd	16		16		32

Switching
---------						rnew_size
3rd	32		16		48 + 32 = 80	roundup=128

In this case, the desired value of new_size should be 48, and then
it will be roundup to 64 and be assigned to rnew_size.

However, this issue has been covered by resetting the if_bytes to
the new_size which is calculated at the begnning of xfs_iext_add()
before leaving out this function, and in turn make the rnew_size
correctly again. Hence, this can not be detected via xfstestes.

This patch fix above problem and revise the new_size comments at
xfs_iext_realloc_direct() to make it more readable.  Also, fix the
comments while switching from the inline extent buffer to a direct
extent list to reflect this change.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-01 17:33:10 -05:00
Jie Liu
0799a3e808 xfs: get rid of count from xfs_iomap_write_allocate()
Get rid of function variable count from xfs_iomap_write_allocate() as
it is unused.

Additionally, checkpatch warn me of the following for this change:
WARNING: extern prototypes should be avoided in .h files
+extern int xfs_iomap_write_allocate(struct xfs_inode *, xfs_off_t,

So this patch also remove all extern function prototypes at xfs_iomap.h
to suppress it to make this code style in consistent manner in this file.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-01 15:42:34 -05:00
Thierry Reding
aaaae98022 xfs: Use kmem_free() instead of free()
This fixes a build failure caused by calling the free() function which
does not exist in the Linux kernel.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-01 10:26:24 -05:00
tinguely@sgi.com
519ccb81ac xfs: fix memory leak in xlog_recover_add_to_trans
Free the memory in error path of xlog_recover_add_to_trans().
Normally this memory is freed in recovery pass2, but is leaked
in the error path.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-30 17:52:43 -05:00
Dave Chinner
367993e7c6 xfs: dirent dtype presence is dependent on directory magic numbers
The determination of whether a directory entry contains a dtype
field originally was dependent on the filesystem having CRCs
enabled. This meant that the format for dtype beign enabled could be
determined by checking the directory block magic number rather than
doing a feature bit check. This was useful in that it meant that we
didn't need to pass a struct xfs_mount around to functions that
were already supplied with a directory block header.

Unfortunately, the introduction of dtype fields into the v4
structure via a feature bit meant this "use the directory block
magic number" method of discriminating the dirent entry sizes is
broken. Hence we need to convert the places that use magic number
checks to use feature bit checks so that they work correctly and not
by chance.

The current code works on v4 filesystems only because the dirent
size roundup covers the extra byte needed by the dtype field in the
places where this problem occurs.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-30 17:49:28 -05:00
Dave Chinner
f112a04971 xfs: lockdep needs to know about 3 dquot-deep nesting
Michael Semon reported that xfs/299 generated this lockdep warning:

=============================================
[ INFO: possible recursive locking detected ]
3.12.0-rc2+ #2 Not tainted
---------------------------------------------
touch/21072 is trying to acquire lock:
 (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

but task is already holding lock:
 (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&xfs_dquot_other_class);
  lock(&xfs_dquot_other_class);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

7 locks held by touch/21072:
 #0:  (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e
 #1:  (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40
 #2:  (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35
 #3:  (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1
 #4:  (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f
 #5:  (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
 #6:  (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

The lockdep annotation for dquot lock nesting only understands
locking for user and "other" dquots, not user, group and quota
dquots. Fix the annotations to match the locking heirarchy we now
have.

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-30 17:48:25 -05:00
Mark Tinguely
997def25e4 xfs: fix node forward in xfs_node_toosmall
Commit f5ea1100 cleans up the disk to host conversions for
node directory entries, but because a variable is reused in
xfs_node_toosmall() the next node is not correctly found.
If the original node is small enough (<= 3/8 of the node size),
this change may incorrectly cause a node collapse when it should
not. That will cause an assert in xfstest generic/319:

   Assertion failed: first <= last && last < BBTOB(bp->b_length),
   file: /root/newest/xfs/fs/xfs/xfs_trans_buf.c, line: 569

Keep the original node header to get the correct forward node.

(When a node is considered for a merge with a sibling, it overwrites the
 sibling pointers of the original incore nodehdr with the sibling's
 pointers.  This leads to loop considering the original node as a merge
 candidate with itself in the second pass, and so it incorrectly
 determines a merge should occur.)

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

[v3: added Dave Chinner's (slightly modified) suggestion to the commit header,
	cleaned up whitespace.  -bpm]
2013-09-26 10:38:17 -05:00
Dave Chinner
566055d33a xfs: log recovery lsn ordering needs uuid check
After a fair number of xfstests runs, xfs/182 started to fail
regularly with a corrupted directory - a directory read verifier was
failing after recovery because it found a block with a XARM magic
number (remote attribute block) rather than a directory data block.

The first time I saw this repeated failure I did /something/ and the
problem went away, so I was never able to find the underlying
problem. Test xfs/182 failed again today, and I found the root
cause before I did /something else/ that made it go away.

Tracing indicated that the block in question was being correctly
logged, the log was being flushed by sync, but the buffer was not
being written back before the shutdown occurred. Tracing also
indicated that log recovery was also reading the block, but then
never writing it before log recovery invalidated the cache,
indicating that it was not modified by log recovery.

More detailed analysis of the corpse indicated that the filesystem
had a uuid of "a4131074-1872-4cac-9323-2229adbcb886" but the XARM
block had a uuid of "8f32f043-c3c9-e7f8-f947-4e7f989c05d3", which
indicated it was a block from an older filesystem. The reason that
log recovery didn't replay it was that the LSN in the XARM block was
larger than the LSN of the transaction being replayed, and so the
block was not overwritten by log recovery.

Hence, log recovery cant blindly trust the magic number and LSN in
the block - it must verify that it belongs to the filesystem being
recovered before using the LSN. i.e. if the UUIDs don't match, we
need to unconditionally recovery the change held in the log.

This patch was first tested on a block device that was repeatedly
causing xfs/182 to fail with the same failure on the same block with
the same directory read corruption signature (i.e. XARM block). It
did not fail, and hasn't failed since.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-24 12:35:57 -05:00
Dave Chinner
b771af2fcb xfs: fix XFS_IOC_FREE_EOFBLOCKS definition
It uses a kernel internal structure in it's definition rather than
the user visible structure that is passed to the ioctl.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-24 12:35:08 -05:00
Dave Chinner
b313a5f1cb xfs: asserting lock not held during freeing not valid
When we free an inode, we do so via RCU. As an RCU lookup can occur
at any time before we free an inode, and that lookup takes the inode
flags lock, we cannot safely assert that the flags lock is not held
just before marking it dead and running call_rcu() to free the
inode.

We check on allocation of a new inode structre that the lock is not
held, so we still have protection against locks being leaked and
hence not correctly initialised when allocated out of the slab.
Hence just remove the assert...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-24 12:32:57 -05:00
Dave Chinner
4885235806 xfs: lock the AIL before removing the buffer item
Regression introduced by commit 46f9d2e ("xfs: aborted buf items can
be in the AIL") which fails to lock the AIL before removing the
item. Spinlock debugging throws a warning about this.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-24 12:31:41 -05:00
Linus Torvalds
e0ea4045bc xfs: update #2 for v3.12-rc1
Here we have defrag support for v5 superblock, a number of bugfixes and
 a cleanup or two.
 
 - defrag support for CRC filesystems
 - fix endian worning in xlog_recover_get_buf_lsn
 - fixes for sparse warnings
 - fix for assert in xfs_dir3_leaf_hdr_from_disk
 - fix for log recovery of remote symlinks
 - fix for log recovery of btree root splits
 - fixes formemory allocation failures with ACLs
 - fix for assert in xfs_buf_item_relse
 - fix for assert in xfs_inode_buf_verify
 - fix an assignment in an assert that should be a test in
   xfs_bmbt_change_owner
 - remove dead code in xlog_recover_inode_pass2
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJSMjQUAAoJENaLyazVq6ZOu2IP/1OHZYy+Bkmj0tO9pdsdEa4s
 w4FEBPsQePMJPjwdN693rKpW1exZue5sUmPMErH3ENzc2DPAwpUAlc9XAIohtdFx
 rTqrz2q+qTfZTq8oYBIA/RCOifJ2cHWN8tDYZPJpp5wceV7CRGYQeR1foiudE3ZH
 QDIPXioy8P9IkfGaXCtr/iWf9kycMO2lgNTNfdL6qtwX99HCqHZanTlsWx1BIYGQ
 Fa5TaOsXis6idPMCFMuEC15iEwA+YXc0HmXuHkMFLj+9mwFc4h/Aq65bwUkYZLmy
 +T1Wo/uQ/21rl6im/rWqgCh6fFS8NJQp8NIJeCIyihUEHbarfPyJIJRJjoP457YO
 cv8OkixCkt4zX6CkTxaL5ZFEBW9FYbRb13Gg96J6hb4WfdAFMtQg7FAjThSU/+Qr
 HwjaAso3GXimEaZD1C3c0TtZEQ0x9E6pENVI7/ewB1I0p92p7GJBMq4C7CTAYThV
 5zhdcOnViSrJTJvVQxm+gfOYzubkWWiVmbVku3RCO6//kvPBOvJ9juSYsl0mKeRu
 v2DZZB3AYJE/qnbYfZBlktX9obE6k+keKF6w8Eiufr2IqwJaqfaM4h9eogzAwTJA
 vyXKeLxUEmgHuqivFSZjw3sEK6sY654GCMMTP+2IpD19vlAIioYXdgp0ZbkkdiE3
 6twrzdFZAr1zy80xlM8W
 =2Uq6
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-v3.12-rc1-2' of git://oss.sgi.com/xfs/xfs

Pull xfs update #2 from Ben Myers:
 "Here we have defrag support for v5 superblock, a number of bugfixes
  and a cleanup or two.

   - defrag support for CRC filesystems
   - fix endian worning in xlog_recover_get_buf_lsn
   - fixes for sparse warnings
   - fix for assert in xfs_dir3_leaf_hdr_from_disk
   - fix for log recovery of remote symlinks
   - fix for log recovery of btree root splits
   - fixes formemory allocation failures with ACLs
   - fix for assert in xfs_buf_item_relse
   - fix for assert in xfs_inode_buf_verify
   - fix an assignment in an assert that should be a test in
     xfs_bmbt_change_owner
   - remove dead code in xlog_recover_inode_pass2"

* tag 'xfs-for-linus-v3.12-rc1-2' of git://oss.sgi.com/xfs/xfs:
  xfs: remove dead code from xlog_recover_inode_pass2
  xfs: = vs == typo in ASSERT()
  xfs: don't assert fail on bad inode numbers
  xfs: aborted buf items can be in the AIL.
  xfs: factor all the kmalloc-or-vmalloc fallback allocations
  xfs: fix memory allocation failures with ACLs
  xfs: ensure we copy buffer type in da btree root splits
  xfs: set remote symlink buffer type for recovery
  xfs: recovery of swap extents operations for CRC filesystems
  xfs: swap extents operations for CRC filesystems
  xfs: check magic numbers in dir3 leaf verifier first
  xfs: fix some minor sparse warnings
  xfs: fix endian warning in xlog_recover_get_buf_lsn()
2013-09-12 16:13:41 -07:00
Linus Torvalds
ac4de9543a Merge branch 'akpm' (patches from Andrew Morton)
Merge more patches from Andrew Morton:
 "The rest of MM.  Plus one misc cleanup"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits)
  mm/Kconfig: add MMU dependency for MIGRATION.
  kernel: replace strict_strto*() with kstrto*()
  mm, thp: count thp_fault_fallback anytime thp fault fails
  thp: consolidate code between handle_mm_fault() and do_huge_pmd_anonymous_page()
  thp: do_huge_pmd_anonymous_page() cleanup
  thp: move maybe_pmd_mkwrite() out of mk_huge_pmd()
  mm: cleanup add_to_page_cache_locked()
  thp: account anon transparent huge pages into NR_ANON_PAGES
  truncate: drop 'oldsize' truncate_pagecache() parameter
  mm: make lru_add_drain_all() selective
  memcg: document cgroup dirty/writeback memory statistics
  memcg: add per cgroup writeback pages accounting
  memcg: check for proper lock held in mem_cgroup_update_page_stat
  memcg: remove MEMCG_NR_FILE_MAPPED
  memcg: reduce function dereference
  memcg: avoid overflow caused by PAGE_ALIGN
  memcg: rename RESOURCE_MAX to RES_COUNTER_MAX
  memcg: correct RESOURCE_MAX to ULLONG_MAX
  mm: memcg: do not trap chargers with full callstack on OOM
  mm: memcg: rework and document OOM waiting and wakeup
  ...
2013-09-12 15:44:27 -07:00
Kirill A. Shutemov
7caef26767 truncate: drop 'oldsize' truncate_pagecache() parameter
truncate_pagecache() doesn't care about old size since commit
cedabed49b ("vfs: Fix vmtruncate() regression").  Let's drop it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12 15:38:02 -07:00
Mark Tinguely
08474ed639 xfs: remove dead code from xlog_recover_inode_pass2
Additional code in the error handler of xlog_recover_inode_pass2()
results in the following error:

static checker warning: "fs/xfs/xfs_log_recover.c:2999
xlog_recover_inode_pass2()
	 info: ignoring unreachable code."

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Ben Myers <bpm@sgi.com
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-12 09:51:49 -05:00
Dan Carpenter
aa9e10409e xfs: = vs == typo in ASSERT()
There is a '=' vs '==' typo so the ASSERT()s are always true.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-12 09:42:08 -05:00
Glauber Costa
f5e1dd3456 super: fix for destroy lrus
This patch adds the missing call to list_lru_destroy (spotted by Li Zhong)
and moves the deletion to after the shrinker is unregistered, as correctly
spotted by Dave

Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:32 -04:00
Glauber Costa
5ca302c8e5 list_lru: dynamically adjust node arrays
We currently use a compile-time constant to size the node array for the
list_lru structure.  Due to this, we don't need to allocate any memory at
initialization time.  But as a consequence, the structures that contain
embedded list_lru lists can become way too big (the superblock for
instance contains two of them).

This patch aims at ameliorating this situation by dynamically allocating
the node arrays with the firmware provided nr_node_ids.

Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:32 -04:00
Dave Chinner
35163417fb xfs: fix dquot isolation hang
The new LRU list isolation code in xfs_qm_dquot_isolate() isn't
completely up to date.  Firstly, it needs conversion to return enum
lru_status values, not raw numbers. Secondly - most importantly - it
fails to unlock the dquot and relock the LRU in the LRU_RETRY path.
This leads to deadlocks in xfstests generic/232. Fix them.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:31 -04:00
Andrew Morton
2f5b56f856 xfs-convert-dquot-cache-lru-to-list_lru-fix
fix warnings

Cc: Dave Chinner <dchinner@redhat.com>
Cc: Glauber Costa <glommer@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:31 -04:00
Dave Chinner
cd56a39a59 xfs: convert dquot cache lru to list_lru
Convert the XFS dquot lru to use the list_lru construct and convert the
shrinker to being node aware.

[glommer@openvz.org: edited for conflicts + warning fixes]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:31 -04:00
Dave Chinner
a408235726 xfs: rework buffer dispose list tracking
In converting the buffer lru lists to use the generic code, the locking
for marking the buffers as on the dispose list was lost.  This results in
confusion in LRU buffer tracking and acocunting, resulting in reference
counts being mucked up and filesystem beig unmountable.

To fix this, introduce an internal buffer spinlock to protect the state
field that holds the dispose list information.  Because there is now
locking needed around xfs_buf_lru_add/del, and they are used in exactly
one place each two lines apart, get rid of the wrappers and code the logic
directly in place.

Further, the LRU emptying code used on unmount is less than optimal.
Convert it to use a dispose list as per a normal shrinker walk, and repeat
the walk that fills the dispose list until the LRU is empty.  Thi avoids
needing to drop and regain the LRU lock for every item being freed, and
allows the same logic as the shrinker isolate call to be used.  Simpler,
easier to understand.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:31 -04:00
Andrew Morton
addbda40be xfs-convert-buftarg-lru-to-generic-code-fix
fix warnings

Cc: Dave Chinner <dchinner@redhat.com>
Cc: Glauber Costa <glommer@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:31 -04:00
Dave Chinner
e80dfa1997 xfs: convert buftarg LRU to generic code
Convert the buftarg LRU to use the new generic LRU list and take advantage
of the functionality it supplies to make the buffer cache shrinker node
aware.

Signed-off-by: Glauber Costa <glommer@openvz.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:31 -04:00
Dave Chinner
9b17c62382 fs: convert inode and dentry shrinking to be node aware
Now that the shrinker is passing a node in the scan control structure, we
can pass this to the the generic LRU list code to isolate reclaim to the
lists on matching nodes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:31 -04:00
Dave Chinner
0a234c6dcb shrinker: convert superblock shrinkers to new API
Convert superblock shrinker to use the new count/scan API, and propagate
the API changes through to the filesystem callouts.  The filesystem
callouts already use a count/scan API, so it's just changing counters to
longs to match the VM API.

This requires the dentry and inode shrinker callouts to be converted to
the count/scan API.  This is mainly a mechanical change.

[glommer@openvz.org: use mult_frac for fractional proportions, build fixes]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:30 -04:00
Glauber Costa
55f841ce93 super: fix calculation of shrinkable objects for small numbers
The sysctl knob sysctl_vfs_cache_pressure is used to determine which
percentage of the shrinkable objects in our cache we should actively try
to shrink.

It works great in situations in which we have many objects (at least more
than 100), because the aproximation errors will be negligible.  But if
this is not the case, specially when total_objects < 100, we may end up
concluding that we have no objects at all (total / 100 = 0, if total <
100).

This is certainly not the biggest killer in the world, but may matter in
very low kernel memory situations.

Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:29 -04:00
Dave Chinner
74ffa796e1 xfs: don't assert fail on bad inode numbers
Let the inode verifier do it's work by returning an error when we
fail to find correct magic numbers in an inode buffer.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-10 14:07:54 -05:00
Dave Chinner
46f9d2eb37 xfs: aborted buf items can be in the AIL.
Saw this on generic/270 after a DQALLOC transaction overrun
shutdown:

XFS: Assertion failed: !(bip->bli_item.li_flags & XFS_LI_IN_AIL), file: fs/xfs/xfs_buf_item.c, line: 952
.....
 xfs_buf_item_relse+0x4f/0xd0
 xfs_buf_item_unlock+0x1b4/0x1e0
 xfs_trans_free_items+0x7d/0xb0
 xfs_trans_cancel+0x13c/0x1b0
 xfs_symlink+0x37e/0xa60
....

When a transaction abort occured.

If we are aborting a transaction and trigger this code path, then
the item may be dirty. If the item is dirty, then it may be in the
AIL. Hence if we are aborting, we need to check if the item is in
the AIL and remove it before freeing it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-10 13:58:07 -05:00
Dave Chinner
fdd3cceef4 xfs: factor all the kmalloc-or-vmalloc fallback allocations
We have quite a few places now where we do:

	x = kmem_zalloc(large size)
	if (!x)
		x = kmem_zalloc_large(large size)

and do a similar dance when freeing the memory. kmem_free() already
does the correct freeing dance, and kmem_zalloc_large() is only ever
called in these constructs, so just factor it all into
kmem_zalloc_large() and kmem_free().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-10 13:57:03 -05:00
Dave Chinner
2dc164f296 xfs: fix memory allocation failures with ACLs
Ever since increasing the number of supported ACLs from 25 to as
many as can fit in an xattr, there have been reports of order 4
memory allocations failing in the ACL code. Fix it in the same way
we've fixed all the xattr read/write code that has the same problem.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-10 13:56:25 -05:00
Dave Chinner
0a4edc8f0b xfs: ensure we copy buffer type in da btree root splits
When splitting the root of the da btree, we shuffled data between
buffers and the structures that track them. At one point, we copy
data and state from one buffer to another, including the ops
associated with the buffer. When we do this, we also need to copy
the buffer type associated with the buf log item so that the buffer
is logged correctly. If we don't do that, log recovery won't
recognise it and hence it won't recalculate the CRC on the buffer
after recovery. This leads to a directory block that can't be read
after recovery has run.

Found by inspection after finding the same problem with remote
symlink buffers.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-10 13:34:05 -05:00
Dave Chinner
daf7b799a9 xfs: set remote symlink buffer type for recovery
The logging of a remote symlink block does not set the buffer type
being logged, and hence on recovery the type of buffer is not
recognised and hence CRCs are not calculated after replay. This
results in log recoery throwing:

XFS (vdc): Unknown buffer type 0

errors, and subsequent reads of the symlink failing CRC
verification. Found via fsstress + godown.

Reported by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-10 12:57:09 -05:00
Dave Chinner
638f44163d xfs: recovery of swap extents operations for CRC filesystems
This is the recovery side of the btree block owner change operation
performed by swapext on CRC enabled filesystems. We detect that an
owner change is needed by the flag that has been placed on the inode
log format flag field. Because the inode recovery is being replayed
after the buffers that make up the BMBT in the given checkpoint, we
can walk all the buffers and directly modify them when we see the
flag set on an inode.

Because the inode can be relogged and hence present in multiple
chekpoints with the "change owner" flag set, we could do multiple
passes across the inode to do this change. While this isn't optimal,
we can't directly ignore the flag as there may be multiple
independent swap extent operations being replayed on the same inode
in different checkpoints so we can't ignore them.

Further, because the owner change operation uses ordered buffers, we
might have buffers that are newer on disk than the current
checkpoint and so already have the owner changed in them. Hence we
cannot just peek at a buffer in the tree and check that it has the
correct owner and assume that the change was completed.

So, for the moment just brute force the owner change every time we
see an inode with the flag set. Note that we have to be careful here
because the owner of the buffers may point to either the old owner
or the new owner. Currently the verifier can't verify the owner
directly, so there is no failure case here right now. If we verify
the owner exactly in future, then we'll have to take this into
account.

This was tested in terms of normal operation via xfstests - all of
the fsr tests now pass without failure. however, we really need to
modify xfs/227 to stress v3 inodes correctly to ensure we fully
cover this case for v5 filesystems.

In terms of recovery testing, I used a hacked version of xfs_fsr
that held the temp inode open for a few seconds before exiting so
that the filesystem could be shut down with an open owner change
recovery flags set on at least the temp inode. fsr leaves the temp
inode unlinked and in btree format, so this was necessary for the
owner change to be reliably replayed.

logprint confirmed the tmp inode in the log had the correct flag set:

INO: cnt:3 total:3 a:0x69e9e0 len:56 a:0x69ea20 len:176 a:0x69eae0 len:88
        INODE: #regs:3   ino:0x44  flags:0x209   dsize:88
	                                 ^^^^^

0x200 is set, indicating a data fork owner change needed to be
replayed on inode 0x44.  A printk in the revoery code confirmed that
the inode change was recovered:

XFS (vdc): Mounting Filesystem
XFS (vdc): Starting recovery (logdev: internal)
recovering owner change ino 0x44
XFS (vdc): Version 5 superblock detected. This kernel L support enabled!
Use of these features in this kernel is at your own risk!
XFS (vdc): Ending recovery (logdev: internal)

The script used to test this was:

$ cat ./recovery-fsr.sh
#!/bin/bash

dev=/dev/vdc
mntpt=/mnt/scratch
testfile=$mntpt/testfile

umount $mntpt
mkfs.xfs -f -m crc=1 $dev
mount $dev $mntpt
chmod 777 $mntpt

for i in `seq 10000 -1 0`; do
        xfs_io -f -d -c "pwrite $(($i * 4096)) 4096" $testfile > /dev/null 2>&1
done
xfs_bmap -vp $testfile |head -20

xfs_fsr -d -v $testfile &
sleep 10
/home/dave/src/xfstests-dev/src/godown -f $mntpt
wait
umount $mntpt

xfs_logprint -t $dev |tail -20
time mount $dev $mntpt
xfs_bmap -vp $testfile
umount $mntpt
$

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-10 12:49:57 -05:00
Dave Chinner
21b5c9784b xfs: swap extents operations for CRC filesystems
For CRC enabled filesystems, we can't just swap inode forks from one
inode to another when defragmenting a file - the blocks in the inode
fork bmap btree contain pointers back to the owner inode. Hence if
we are to swap the inode forks we have to atomically modify every
block in the btree during the transaction.

We are doing an entire fork swap here, so we could create a new
transaction item type that indicates we are changing the owner of a
certain structure from one value to another. If we combine this with
ordered buffer logging to modify all the buffers in the tree, then
we can change the buffers in the tree without needing log space for
the operation. However, this then requires log recovery to perform
the modification of the owner information of the objects/structures
in question.

This does introduce some interesting ordering details into recovery:
we have to make sure that the owner change replay occurs after the
change that moves the objects is made, not before. Hence we can't
use a separate log item for this as we have no guarantee of strict
ordering between multiple items in the log due to the relogging
action of asynchronous transaction commits. Hence there is no
"generic" method we can use for changing the ownership of arbitrary
metadata structures.

For inode forks, however, there is a simple method of communicating
that the fork contents need the owner rewritten - we can pass a
inode log format flag for the fork for the transaction that does a
fork swap. This flag will then follow the inode fork through
relogging actions so when the swap actually gets replayed the
ownership can be changed immediately by log recovery.  So that gives
us a simple method of "whole fork" exchange between two inodes.

This is relatively simple to implement, so it makes sense to do this
as an initial implementation to support xfs_fsr on CRC enabled
filesytems in the same manner as we do on existing filesystems. This
commit introduces the swapext driven functionality, the recovery
functionality will be in a separate patch.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-10 10:26:47 -05:00
Dave Chinner
0f295a214b xfs: check magic numbers in dir3 leaf verifier first
Calling xfs_dir3_leaf_hdr_from_disk() in a verifier before
validating the magic numbers in the buffer results in ASSERT
failures due to mismatching magic numbers when a corruption occurs.
Seeing as the verifier is supposed to catch the corruption and pass
it back to the caller, having the verifier assert fail on error
defeats the purpose of detecting the errors in the first place.

Check the magic numbers direct from the buffer before decoding the
header.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-09 17:43:58 -05:00
Dave Chinner
a30b036797 xfs: fix some minor sparse warnings
A couple of simple locking annotations and 0 vs NULL warnings.
Nothing that changes any code behaviour, just removes build noise.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-09 17:43:05 -05:00
Dave Chinner
e9fbbad8d8 xfs: fix endian warning in xlog_recover_get_buf_lsn()
sparse reports:

fs/xfs/xfs_log_recover.c:2017:24: sparse: cast to restricted __be64

Because I used the wrong structure for the on-disk superblock cast
in 50d5c8d ("xfs: check LSN ordering for v5 superblocks during
recovery"). Fix it.

Reported-by: kbuild test robot
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-09 16:42:48 -05:00
Linus Torvalds
300893b08f xfs: update for v3.12-rc1
For 3.12-rc1 there are a number of bugfixes in addition to work to ease usage
 of shared code between libxfs and the kernel, the rest of the work to enable
 project and group quotas to be used simultaneously, performance optimisations
 in the log and the CIL, directory entry file type support, fixes for log space
 reservations, some spelling/grammar cleanups, and the addition of user
 namespace support.
 
 - introduce readahead to log recovery
 - add directory entry file type support
 - fix a number of spelling errors in comments
 - introduce new Q_XGETQSTATV quotactl for project quotas
 - add USER_NS support
 - log space reservation rework
 - CIL optimisations
 - kernel/userspace libxfs rework
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJSLeikAAoJENaLyazVq6ZOciEP/3tc850sQsPlNwP9aqd1l2Wk
 S1RJ8i+MUQ2W/PlbswCXvdUCT8DIwXWxL31tGvi8vtaLhh6t8ICSZwqNil+/GCIJ
 BErVvY4oXhEMHhlbIRRvpxblTfJGiYy3puUEz9VI0yDdUVnC33+DuEeLTQ/0mibo
 /UUqKFmM3KYpOc8vIQvH5K5i8PkjtMt9yge0k4l9COD30gtY2okkaD4b1voOsKc+
 5YFqulq7zcXBUYti+EFCQeV8aUBTGEPN4PJRdcS12/ylzsTzZivAOO+QREu7qBW8
 x+Gj8fOC+yYWCttmJlfa1n8taxge3ndEuzKN97nvvfQgjvvunMvwJ499skryYVdB
 EcPnBnpDUQuz/y7exKBT9uROK817vZBtfHzSova29ayQSWC+qDpNE4xXeDIqeCtT
 CPxdHuWMOvIdZg41E4x7je0elaZl8EAZ8hycc2WuRhtukEkIdE1O8aD7IVrMYee8
 kg+aVHG5nmYRInO1WuMinbtiCzwvVoBJToWM3y4cbfgW0dILASRyL53HDd+eCr1j
 kOpPIVgXlBZgiPMmdYahWxyVVWcE7zyex0w4frzWVlJMZ4lP5brppD6qfQg1JwOB
 z21Y95F5C2GxSyN/Lwps0G6jujHrpe6GVeYK7uKCtnqTD83nSShv5Naln7pQ3AUs
 qUMsqmJob4+bwt94Xgbx
 =V4s4
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs

Pull xfs updates from Ben Myers:
 "For 3.12-rc1 there are a number of bugfixes in addition to work to
  ease usage of shared code between libxfs and the kernel, the rest of
  the work to enable project and group quotas to be used simultaneously,
  performance optimisations in the log and the CIL, directory entry file
  type support, fixes for log space reservations, some spelling/grammar
  cleanups, and the addition of user namespace support.

   - introduce readahead to log recovery
   - add directory entry file type support
   - fix a number of spelling errors in comments
   - introduce new Q_XGETQSTATV quotactl for project quotas
   - add USER_NS support
   - log space reservation rework
   - CIL optimisations
  - kernel/userspace libxfs rework"

* tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs: (112 commits)
  xfs: XFS_MOUNT_QUOTA_ALL needed by userspace
  xfs: dtype changed xfs_dir2_sfe_put_ino to xfs_dir3_sfe_put_ino
  Fix wrong flag ASSERT in xfs_attr_shortform_getvalue
  xfs: finish removing IOP_* macros.
  xfs: inode log reservations are too small
  xfs: check correct status variable for xfs_inobt_get_rec() call
  xfs: inode buffers may not be valid during recovery readahead
  xfs: check LSN ordering for v5 superblocks during recovery
  xfs: btree block LSN escaping to disk uninitialised
  XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568
  xfs: fix bad dquot buffer size in log recovery readahead
  xfs: don't account buffer cancellation during log recovery readahead
  xfs: check for underflow in xfs_iformat_fork()
  xfs: xfs_dir3_sfe_put_ino can be static
  xfs: introduce object readahead to log recovery
  xfs: Simplify xfs_ail_min() with list_first_entry_or_null()
  xfs: Register hotcpu notifier after initialization
  xfs: add xfs sb v4 support for dirent filetype field
  xfs: Add write support for dirent filetype field
  xfs: Add read-only support for dirent filetype field
  ...
2013-09-09 11:19:09 -07:00
Christoph Hellwig
7b7a8665ed direct-io: Implement generic deferred AIO completions
Add support to the core direct-io code to defer AIO completions to user
context using a workqueue.  This replaces opencoded and less efficient
code in XFS and ext4 (we save a memory allocation for each direct IO)
and will be needed to properly support O_(D)SYNC for AIO.

The communication between the filesystem and the direct I/O code requires
a new buffer head flag, which is a bit ugly but not avoidable until the
direct I/O code stops abusing the buffer_head structure for communicating
with the filesystems.

Currently this creates a per-superblock unbound workqueue for these
completions, which is taken from an earlier patch by Jan Kara.  I'm
not really convinced about this use and would prefer a "normal" global
workqueue with a high concurrency limit, but this needs further discussion.

JK: Fixed ext4 part, dynamic allocation of the workqueue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-04 09:23:46 -04:00
Dave Chinner
1d03c6fa88 xfs: XFS_MOUNT_QUOTA_ALL needed by userspace
So move it to a header file shared with userspace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-03 15:00:06 -05:00
Dave Chinner
50fc5f7acc xfs: dtype changed xfs_dir2_sfe_put_ino to xfs_dir3_sfe_put_ino
So fix up the export in xfs_dir2.h that is needed by userspace.

<sigh>

Now xfs_dir3_sfe_put_ino has been made static. Revert 98f7462 ("xfs:
xfs_dir3_sfe_put_ino can be static") to being non static so that the
code shared with userspace is identical again.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-03 14:51:16 -05:00
Eric Sandeen
914ed44b17 Fix wrong flag ASSERT in xfs_attr_shortform_getvalue
This ASSERT is testing an if_flags flag value against
a di_aformat enum value.  di_aformat is never assigned
XFS_IFINLINE.

This happens to work for now, because XFS_IFINLINE has
the same value as XFS_DINODE_FMT_LOCAL, and that's tested
just before we call this function.

However, I think the intention is to assert that we have
read in the data, i.e. XFS_IFINLINE on if_flags, before
we use if_data.  This is done in other places through the
code as well.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-30 15:20:50 -05:00
Dave Chinner
904c17e683 xfs: finish removing IOP_* macros.
In optimising the CIL operations, some of the IOP_* macros for
calling log item operations were removed. Remove the rest of them as
Christoph requested.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Geoffrey Wehrman <gwehrman@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-30 14:14:35 -05:00