Commit Graph

1970 Commits

Author SHA1 Message Date
NeilBrown
93b3dbce64 md/raid5: unite fetch_block5 and fetch_block6
Provided that ->failed_num[1] is not a valid device number (which is
easily achieved) fetch_block6 provides all the functionality of
fetch_block5.

So remove the latter and rename the former to simply "fetch_block".

Then handle_stripe_fill5 and handle_stripe_fill6 become the same and
can similarly be united.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-27 11:00:36 +10:00
NeilBrown
5d35e09cae md/raid5: rearrange a test in fetch_block6.
Next patch will unite fetch_block5 and fetch_block6.
First I want to make the differences a little more clear.

For RAID6 if we are writing at all and there is a failed device, then
we need to load or compute every block so we can do a
reconstruct-write.
This case isn't needed for RAID5 - we will do a read-modify-write in
that case.
So make that test a separate test in fetch_block6 rather than merged
with two other tests.

Make a similar change in fetch_block5 so the one bit that is not
needed for RAID6 is clearly separate.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-27 11:00:36 +10:00
NeilBrown
c5a3100062 md/raid5: move more code into common handle_stripe
The difference between the RAID5 and RAID6 code here is easily
resolved using conf->max_degraded.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-27 11:00:36 +10:00
NeilBrown
3687c06188 md/raid5: Move code for finishing a reconstruction into handle_stripe.
Prior to commit ab69ae12ce the code in handle_stripe5 and
handle_stripe6 to "Finish reconstruct operations initiated by the
expansion process" was identical.
That commit added an identical stanza of code to each function, but in
different places.  That was careless.

The raid5 code was correct, so move that out into handle_stripe and
remove raid6 version.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-27 11:00:36 +10:00
NeilBrown
86c374ba9f md/raid5: Remove stripe_head_state arg from handle_stripe_expansion.
This arg is only used to differentiate between RAID5 and RAID6 but
that is not needed.  For RAID5, raid5_compute_sector will set qd_idx
to "~0" so j with certainly not equals qd_idx, so there is no need
for a guard on that condition.

So remove the guard and remove the arg from the declaration and
callers of handle_stripe_expansion.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-27 11:00:36 +10:00
NeilBrown
cc94015a9e md/raid5: move stripe_head_state and more code into handle_stripe.
By defining the 'stripe_head_state' in 'handle_stripe', we can move
some common code out of handle_stripe[56]() and into handle_stripe.

The means that all accesses for stripe_head_state in handle_stripe[56]
need to be 's->' instead of 's.', but the compiler should inline
those functions and just use a direct stack reference, and future
patches while hoist most of this code up into handle_stripe()
so we will revert to "s.".

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-26 11:35:35 +10:00
NeilBrown
c5709ef6a0 md/raid5: add some more fields to stripe_head_state
Adding these three fields will allow more common code to be moved
to handle_stripe()

struct field rearrangement by Namhyung Kim.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-26 11:35:20 +10:00
NeilBrown
f2b3b44dee md/raid5: unify stripe_head_state and r6_state
'struct stripe_head_state' stores state about the 'current' stripe
that is passed around while handling the stripe.
For RAID6 there is an extension structure: r6_state, which is also
passed around.
There is no value in keeping these separate, so move the fields from
the latter into the former.

This means that all code now needs to treat s->failed_num as an small
array, but this is a small cost.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-26 11:35:19 +10:00
NeilBrown
82e5a1718b md/raid5: move common code into handle_stripe
There is common code at the start of handle_stripe5 and
handle_stripe6.  Move it into handle_stripe.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-26 11:35:15 +10:00
NeilBrown
c4c1663be4 md/raid5: replace sh->lock with an 'active' flag.
sh->lock is now mainly used to ensure that two threads aren't running
in the locked part of handle_stripe[56] at the same time.

That can more neatly be achieved with an 'active' flag which we set
while running handle_stripe.  If we find the flag is set, we simply
requeue the stripe for later by setting STRIPE_HANDLE.

For safety we take ->device_lock while examining the state of the
stripe and creating a summary in 'stripe_head_state / r6_state'.
This possibly isn't needed but as shared fields like ->toread,
->towrite are checked it is safer for now at least.

We leave the label after the old 'unlock' called "unlock" because it
will disappear in a few patches, so renaming seems pointless.

This leaves the stripe 'locked' for longer as we clear STRIPE_ACTIVE
later, but that is not a problem.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-26 11:34:20 +10:00
NeilBrown
cbe47ec559 md/raid5: Protect some more code with ->device_lock.
Other places that change or follow dev->towrite and dev->written take
the device_lock as well as the sh->lock.
So it should really be held in these places too.
Also, doing so will allow sh->lock to be discarded.

with merged fixes by: Namhyung Kim <namhyung@gmail.com>


Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-26 11:20:35 +10:00
NeilBrown
83206d66b6 md/raid5: Remove use of sh->lock in sync_request
This is the start of a series of patches to remove sh->lock.

sync_request takes sh->lock before setting STRIPE_SYNCING to ensure
there is no race with testing it in handle_stripe[56].

Instead, use a new flag STRIPE_SYNC_REQUESTED and test it early
in handle_stripe[56] (after getting the same lock) and perform the
same set/clear operations if it was set.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-26 11:19:49 +10:00
Namhyung Kim
ffd96e35c1 md/raid5: get rid of duplicated call to bio_data_dir()
In raid5::make_request(), once bio_data_dir(@bi) is detected
it never (and couldn't) be changed. Use the result always.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-18 17:38:51 +10:00
Namhyung Kim
6ce328462c md/raid5: use kmem_cache_zalloc()
Replace kmem_cache_alloc + memset(,0,) to kmem_cache_zalloc.
I think it's not harmful since @conf->slab_cache already knows
actual size of struct stripe_head.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-18 17:38:50 +10:00
Namhyung Kim
c65060ad42 md/raid10: share pages between read and write bio's during recovery
When performing a recovery, only first 2 slots in r10_bio are in use,
for read and write respectively. However all of pages in the write bio
are never used and just replaced to read bio's when the read completes.

Get rid of those unused pages and share read pages properly.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-18 17:38:49 +10:00
Namhyung Kim
778ca01852 md/raid10: factor out common bio handling code
When normal-write and sync-read/write bio completes, we should
find out the disk number the bio belongs to. Factor those common
code out to a separate function.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-18 17:38:47 +10:00
Namhyung Kim
2c4193df37 md/raid10: get rid of duplicated conditional expression
Variable 'first' is initialized to zero and updated to @rdev->raid_disk
only if it is greater than 0. Thus condition '>= first' always implies
'>= 0' so the latter is not needed.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-18 17:38:43 +10:00
NeilBrown
4274215d24 md: avoid endless recovery loop when waiting for fail device to complete.
If a device fails in a way that causes pending request to take a while
to complete, md will not be able to immediately remove it from the
array in remove_and_add_spares.
It will then incorrectly look like a spare device and md will try to
recover it even though it is failed.
This leads to a recovery process starting and instantly aborting over
and over again.

We should check if the device is faulty before considering it to be a
spare.  This will avoid trying to start a recovery that cannot
proceed.

This bug was introduced in 2.6.26 so that patch is suitable for any
kernel since then.

Cc: stable@kernel.org
Reported-by: Jim Paradis <james.paradis@stratus.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-28 16:59:42 +10:00
Namhyung Kim
fcde90759a md/raid5: remove unusual use of bio_iovec_idx()
In the bio_for_each_segment loop, bvl always points current
bio_vec, so the same as bio_iovec_idx(, i). Let's get rid of
it.

Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-14 14:23:57 +10:00
Namhyung Kim
b062962edb md/raid5: fix FUA request handling in ops_run_io()
Commit e9c7469bb4 ("md: implment REQ_FLUSH/FUA support")
introduced R5_WantFUA flag and set rw to WRITE_FUA in that case.
However remaining code still checks whether rw is exactly same
as WRITE or not, so FUAed-write ends up with being treated as
READ. Fix it.

This bug has been present since 2.6.37 and the fix is suitable for any
-stable kernel since then.  It is not clear why this has not caused
more problems.

Cc: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-14 14:20:19 +10:00
Namhyung Kim
9b2dc8b665 md/raid5: fix raid5_set_bi_hw_segments
The @bio->bi_phys_segments consists of active stripes count in the
lower 16 bits and processed stripes count in the upper 16 bits. So
logical-OR operator should be bitwise one.

This bug has been present since 2.6.27 and the fix is suitable for any
-stable kernel since then.  Fortunately the bad code is only used on
error paths and is relatively unlikely to be hit.

Cc: stable@kernel.org
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-14 14:09:41 +10:00
Namhyung Kim
97b3d4aacf md/bitmap: remove unused fields from struct bitmap
Get rid of ->syncchunk and ->counter_bits since they're never used.

Also discard COUNTER_BYTE_RATIO which is unused.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-09 11:43:01 +10:00
Namhyung Kim
27d5ea04d0 md/bitmap: use proper accessor macro
Use COUNTER()/NEEDED() macro instead of open-coding them.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-09 11:42:57 +10:00
Namhyung Kim
01393f3d58 md: check ->hot_remove_disk when removing disk
Check pers->hot_remove_disk instead of pers->hot_add_disk in slot_store()
during disk removal. The linear personality only has ->hot_add_disk and
no ->hot_remove_disk, so that removing disk in the array resulted to
following kernel bug:

$ sudo mdadm --create /dev/md0 --level=linear --raid-devices=4 /dev/loop[0-3]
$ echo none | sudo tee /sys/block/md0/md/dev-loop2/slot
 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: [<          (null)>]           (null)
 PGD c9f5d067 PUD 8575a067 PMD 0
 Oops: 0010 [#1] SMP
 CPU 2
 Modules linked in: linear loop bridge stp llc kvm_intel kvm asus_atk0110 sr_mod cdrom sg

 Pid: 10450, comm: tee Not tainted 3.0.0-rc1-leonard+ #173 System manufacturer System Product Name/P5G41TD-M PRO
 RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
 RSP: 0018:ffff880085757df0  EFLAGS: 00010282
 RAX: ffffffffa00168e0 RBX: ffff8800d1431800 RCX: 000000000000006e
 RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff88008543c000
 RBP: ffff880085757e48 R08: 0000000000000002 R09: 000000000000000a
 R10: 0000000000000000 R11: ffff88008543c2e0 R12: 00000000ffffffff
 R13: ffff8800b4641000 R14: 0000000000000005 R15: 0000000000000000
 FS:  00007fe8c9e05700(0000) GS:ffff88011fa00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000000 CR3: 00000000b4502000 CR4: 00000000000406e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process tee (pid: 10450, threadinfo ffff880085756000, task ffff8800c9f08000)
 Stack:
  ffffffff8138496a ffff8800b4641000 ffff88008543c268 0000000000000000
  ffff8800b4641000 ffff88008543c000 ffff8800d1431868 ffffffff81a78a90
  ffff8800b4641000 ffff88008543c000 ffff8800d1431800 ffff880085757e98
 Call Trace:
  [<ffffffff8138496a>] ? slot_store+0xaa/0x265
  [<ffffffff81384bae>] rdev_attr_store+0x89/0xa8
  [<ffffffff8115a96a>] sysfs_write_file+0x108/0x144
  [<ffffffff81106b87>] vfs_write+0xb1/0x10d
  [<ffffffff8106e6c0>] ? trace_hardirqs_on_caller+0x111/0x135
  [<ffffffff81106cac>] sys_write+0x4d/0x77
  [<ffffffff814fe702>] system_call_fastpath+0x16/0x1b
 Code:  Bad RIP value.
 RIP  [<          (null)>]           (null)
  RSP <ffff880085757df0>
 CR2: 0000000000000000
 ---[ end trace ba5fc64319a826fb ]---

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-09 11:42:54 +10:00
马建朋
9864c0053d md: Using poll /proc/mdstat can monitor the events of adding a spare disks
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-09 11:42:48 +10:00
Jonathan Brassow
d744540cd3 MD: use is_power_of_2 macro
Make use of is_power_of_2 macro.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-09 11:42:36 +10:00
Jonathan Brassow
d6b212f4b1 MD: raid5 do not set fullsync
Add check to determine if a device needs full resync or if partial resync will do

RAID 5 was assuming that if a device was not In_sync, it must undergo a full
resync.  We add a check to see if 'saved_raid_disk' is the same as 'raid_disk'.
If it is, we can safely skip the full resync and rely on the bitmap for
partial recovery instead.  This is the legitimate purpose of 'saved_raid_disk',
from md.h:
int saved_raid_disk;            /* role that device used to have in the
                                 * array and could again if we did a partial
                                 * resync from the bitmap
                                 */

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-09 11:42:29 +10:00
Jonathan Brassow
9c81075f43 MD: support initial bitmap creation in-kernel
Add bitmap support to the device-mapper specific metadata area.

This patch allows the creation of the bitmap metadata area upon
initial array creation via device-mapper.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-09 11:41:36 +10:00
Jonathan Brassow
076f968b37 MD: add sync_super to mddev_t struct
Add the 'sync_super' function pointer to MD array structure (struct mddev_s)

If device-mapper (dm-raid.c) is to define its own on-disk superblock and be
able to load it, there must still be a way for MD to initiate superblock
updates.  The simplest way to make this happen is to provide a pointer in
the MD array structure that can be set by device-mapper (or other module)
with a function to do this.  If the function has been set, it will be used;
otherwise, the method with be looked up via 'super_types' as usual.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-08 15:11:31 +10:00
Jonathan Brassow
1ed7242e59 MD: raid1 changes to allow use by device mapper
MD RAID1: Changes to allow RAID1 to be used by device-mapper (dm-raid.c)

Added the necessary congestion function and conditionalize calls requiring an
array 'queue' or 'gendisk'.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-08 15:11:31 +10:00
Jonathan Brassow
0fd018af37 MD: move thread wakeups into resume
Move personality and sync/recovery thread starting outside md_run.

Moving the wakeup's of the personality and sync/recovery threads out of
md_run and into do_md_run and mddev_resume solves two issues:
1) It allows bitmap_load to be called before the sync_thread is run and
2) when MD personalities are used by device-mapper (dm-raid.c), the start-up
of the array is better alligned with device-mapper primatives
(CTR/resume/suspend/DTR).  I/O - in this case, recovery operations - should
not happen until after a resume has taken place.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-08 15:11:31 +10:00
Jonathan Brassow
ac42450c7c MD: possible typo
Make message a bit clearer by s/blocks/k/

I chose 'k' vs 'kiB' or 'kB' because it is what is used earlier in the
message.  'k' may be a bit ambigous, but I think it's better than "blocks"
which normally means 512, but means 1024 in MD.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-08 15:11:31 +10:00
Jonathan Brassow
68866e425b MD: no sync IO while suspended
Disallow resync I/O while the RAID array is suspended.

Recovery, resync, and metadata I/O should not be allowed while a device is
suspended.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-08 15:10:08 +10:00
Jonathan Brassow
629acb6aba MD: no integrity register if no gendisk
Don't attempt md_integrity_register if there is no gendisk struct available.

When MD arrays are built via device-mapper, the gendisk structure is not
available via mddev.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-08 15:10:08 +10:00
Mikulas Patocka
fa34ce7307 dm kcopyd: return client directly and not through a pointer
Return client directly from dm_kcopyd_client_create, not through a
parameter, making it consistent with dm_io_client_create.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29 13:03:13 +01:00
Mikulas Patocka
5f43ba2950 dm kcopyd: reserve fewer pages
Reserve just the minimum of pages needed to process one job.

Because we allocate pages from page allocator, we don't need to reserve
a large number of pages.  The maximum job size is SUB_JOB_SIZE and we
calculate the number of reserved pages based on this.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29 13:03:11 +01:00
Mikulas Patocka
bda8efec5c dm io: use fixed initial mempool size
Replace the arbitrary calculation of an initial io struct mempool size
with a constant.

The code calculated the number of reserved structures based on the request
size and used a "magic" multiplication constant of 4.  This patch changes
it to reserve a fixed number - itself still chosen quite arbitrarily.
Further testing might show if there is a better number to choose.

Note that if there is no memory pressure, we can still allocate an
arbitrary number of "struct io" structures.  One structure is enough to
process the whole request.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29 13:03:09 +01:00
Mikulas Patocka
d04714580f dm kcopyd: alloc pages from the main page allocator
This patch changes dm-kcopyd so that it allocates pages from the main
page allocator with __GFP_NOWARN | __GFP_NORETRY flags (so that it can
fail in case of memory pressure). If the allocation fails, dm-kcopyd
allocates pages from its own reserve.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29 13:03:07 +01:00
Mikulas Patocka
f99b55eec7 dm kcopyd: add gfp parm to alloc_pl
Introduce a parameter for gfp flags to alloc_pl() for use in following
patches.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29 13:03:04 +01:00
Mikulas Patocka
4cc1b4cffd dm kcopyd: remove superfluous page allocation spinlock
Remove the spinlock protecting the pages allocation.  The spinlock is only
taken on initialization or from single-threaded workqueue.  Therefore, the
spinlock is useless.

The spinlock is taken in kcopyd_get_pages and kcopyd_put_pages.

kcopyd_get_pages is only called from run_pages_job, which is only
called from process_jobs called from do_work.

kcopyd_put_pages is called from client_alloc_pages (which is initialization
function) or from run_complete_job. run_complete_job is only called from
process_jobs called from do_work.

Another spinlock, kc->job_lock is taken each time someone pushes or pops
some work for the worker thread.  Once we take kc->job_lock, we
guarantee that any written memory is visible to the other CPUs.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29 13:03:02 +01:00
Mikulas Patocka
c6ea41fbbe dm kcopyd: preallocate sub jobs to avoid deadlock
There's a possible theoretical deadlock in dm-kcopyd because multiple
allocations from the same mempool are required to finish a request.
Avoid this by preallocating sub jobs.

There is a mempool of 512 entries. Each request requires up to 9
entries from the mempool. If we have at least 57 concurrent requests
running, the mempool may overflow and mempool allocations may start
blocking until another entry is freed to the mempool. Because the same
thread is used to free entries to the mempool and allocate entries from
the mempool, this may result in a deadlock.

This patch changes it so that one mempool entry contains all 9 "struct
kcopyd_job" required to fulfill the whole request. The allocation is
done only once in dm_kcopyd_copy and no further mempool allocations are
done during request processing.

If dm_kcopyd_copy is not run in the completion thread, this
implementation is deadlock-free.

MIN_JOBS needs reducing accordingly and we've chosen to reduce it
further to 8.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29 13:03:00 +01:00
Mikulas Patocka
a705a34a56 dm kcopyd: avoid pointless job splitting
Don't split SUB_JOB_SIZE jobs

If the job size equals SUB_JOB_SIZE, there is no point in splitting it.
Splitting it just unnecessarily wastes time, because the split job size
is SUB_JOB_SIZE too.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29 13:02:58 +01:00
Martin K. Petersen
6f13f6fba7 dm mpath: do not fail paths after integrity errors
Integrity errors need to be passed to the owner of the integrity
metadata for processing. Consequently EILSEQ should be passed up the
stack.

Cc: stable@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29 13:02:55 +01:00
Milan Broz
f4808ca99a dm table: reject devices without request fns
This patch adds a check that a block device has a request function
defined before it is used.  Otherwise, misconfiguration can cause an oops.

Because we are allowing devices with zero size e.g. an offline multipath
device as in commit 2cd54d9bed
("dm: allow offline devices") there needs to be an additional check
to ensure devices are initialised.  Some block devices, like a loop
device without a backing file, exist but have no request function.

Reproducer is trivial: dm-mirror on unbound loop device
(no backing file on loop devices)

dmsetup create x --table "0 8 mirror core 2 8 sync 2 /dev/loop0 0 /dev/loop1 0"

and mirror resync will immediatelly cause OOps.

BUG: unable to handle kernel NULL pointer dereference at   (null)
 ? generic_make_request+0x2bd/0x590
 ? kmem_cache_alloc+0xad/0x190
 submit_bio+0x53/0xe0
 ? bio_add_page+0x3b/0x50
 dispatch_io+0x1ca/0x210 [dm_mod]
 ? read_callback+0x0/0xd0 [dm_mirror]
 dm_io+0xbb/0x290 [dm_mod]
 do_mirror+0x1e0/0x748 [dm_mirror]

Signed-off-by: Milan Broz <mbroz@redhat.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29 13:02:52 +01:00
Mike Snitzer
4c25932701 dm table: allow targets to support discards internally
Permit a target to support discards regardless of whether or not all its
underlying devices do.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29 12:52:55 +01:00
Linus Torvalds
57d19e80f4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  b43: fix comment typo reqest -> request
  Haavard Skinnemoen has left Atmel
  cris: typo in mach-fs Makefile
  Kconfig: fix copy/paste-ism for dell-wmi-aio driver
  doc: timers-howto: fix a typo ("unsgined")
  perf: Only include annotate.h once in tools/perf/util/ui/browsers/annotate.c
  md, raid5: Fix spelling error in comment ('Ofcourse' --> 'Of course').
  treewide: fix a few typos in comments
  regulator: change debug statement be consistent with the style of the rest
  Revert "arm: mach-u300/gpio: Fix mem_region resource size miscalculations"
  audit: acquire creds selectively to reduce atomic op overhead
  rtlwifi: don't touch with treewide double semicolon removal
  treewide: cleanup continuations and remove logging message whitespace
  ath9k_hw: don't touch with treewide double semicolon removal
  include/linux/leds-regulator.h: fix syntax in example code
  tty: fix typo in descripton of tty_termios_encode_baud_rate
  xtensa: remove obsolete BKL kernel option from defconfig
  m68k: fix comment typo 'occcured'
  arch:Kconfig.locks Remove unused config option.
  treewide: remove extra semicolons
  ...
2011-05-23 09:12:26 -07:00
NeilBrown
b098636cf0 md: allow resync_start to be set while an array is active.
The sysfs attribute 'resync_start' (known internally as recovery_cp),
records where a resync is up to.  A value of 0 means the array is
not known to be in-sync at all.  A value of MaxSector means the array
is believed to be fully in-sync.

When the size of member devices of an array (RAID1,RAID4/5/6) is
increased, the array can be increased to match.  This process sets
resync_start to the old end-of-device offset so that the new part of
the array gets resynced.

However with RAID1 (and RAID6) a resync is not technically necessary
and may be undesirable.  So it would be good if the implied resync
after the array is resized could be avoided.

So: change 'resync_start' so the value can be changed while the array
is active, and as a precaution only allow it to be changed while
resync/recovery is 'frozen'.  Changing it once resync has started is
not going to be useful anyway.

This allows the array to be resized without a resync by:
  write 'frozen' to 'sync_action'
  write new size to 'component_size' (this will set resync_start)
  write 'none' to 'resync_start'
  write 'idle' to 'sync_action'.

Also slightly improve some tests on recovery_cp when resizing
raid1/raid5.  Now that an arbitrary value could be set we should be
more careful in our tests.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-05-11 15:52:21 +10:00
NeilBrown
ab9d47e990 md/raid10: reformat some loops with less indenting.
When a loop ends with an 'if' with a large body, it is neater
to make the if 'continue' on the inverse condition, and then
the body is indented less.

Apply this pattern 3 times, and wrap some other long lines.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-05-11 14:54:41 +10:00
NeilBrown
f17ed07c85 md/raid10: remove unused variable.
This variable 'disk' is never used - how odd.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-05-11 14:54:32 +10:00
NeilBrown
a8830bcaf3 md/raid10: make more use of 'slot' in raid10d.
Now that we have a 'slot' variable, make better use of it to simplify
some code a little.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-05-11 14:54:19 +10:00