Merge reason: Linus applied an overlapping commit:
5f2e8e2b0b: kernel/watchdog.c: Use proper ANSI C prototypes
So merge it in to make sure we can iterate the file without conflicts.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ceph does not need these, and they screw up our use of the dcache as a
consistent cache.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
vfs_rename_dir() doesn't properly account for filesystems with
FS_RENAME_DOES_D_MOVE. If new_dentry has a target inode attached, it
unhashes the new_dentry prior to the rename() iop and rehashes it after,
but doesn't account for the possibility that rename() may have swapped
{old,new}_dentry. For FS_RENAME_DOES_D_MOVE filesystems, it rehashes
new_dentry (now the old renamed-from name, which d_move() expected to go
away), such that a subsequent lookup will find it. Currently all
FS_RENAME_DOES_D_MOVE filesystems compensate for this by failing in
d_revalidate.
The bug was introduced by: commit 349457ccf2
"[PATCH] Allow file systems to manually d_move() inside of ->rename()"
Fix by not rehashing the new dentry. Rehashing used to be needed by
d_move() but isn't anymore.
Reported-by: Sage Weil <sage@newdream.net>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There are no libfs issues with dangling references to empty directories.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Only a few file systems need this. Start by pushing it down into each
rename method (except gfs2 and xfs) so that it can be dealt with on a
per-fs basis.
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Only a few file systems need this. Start by pushing it down into each
fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs
basis.
This does not change behavior for any in-tree file systems.
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This serves no useful purpose that I can discern. All callers (rename,
rmdir) hold their own reference to the dentry.
A quick audit of all file systems showed no relevant checks on the value
of d_count in vfs_rmdir/vfs_rename_dir paths.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This presumes that there is no reason to unhash a dentry if we fail because
it is a mountpoint or the LSM check fails, and that the LSM checks do not
depend on the dentry being unhashed.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We should not allow file modification via mmap while the filesystem is
frozen. So block in block_page_mkwrite() while the filesystem is frozen.
We cannot do the blocking wait in __block_page_mkwrite() since e.g. ext4
will want to call that function with a transaction started in some cases,
and that would deadlock. But we can at least do the non-blocking reliable
check in __block_page_mkwrite(), which is the hardest part anyway.
We have to check for frozen filesystem with the page marked dirty and under
page lock with which we then return from ->page_mkwrite(). Only that way we
cannot race with writeback done by freezing code - either we mark the page
dirty after the writeback has started, see freezing in progress and block, or
writeback will wait for our page lock which is released only when the fault is
done and then writeback will writeout and writeprotect the page again.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Create a __block_page_mkwrite() helper which does everything
block_page_mkwrite() does, except that it passes back errors from the
__block_write_begin / block_commit_write calls.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This issue was discovered by busybox users. It definitely affects busybox
users; I don't know how it affects others. Apparently, mount is
called with and without MS_SILENT, and this affects mount() behaviour.
But MS_SILENT is only supposed to affect kernel logging verbosity.
The following script was run in an empty test directory:
mkdir -p mount.dir mount.shared1 mount.shared2
touch mount.dir/a mount.dir/b
mount -vv --bind mount.shared1 mount.shared1
mount -vv --make-rshared mount.shared1
mount -vv --bind mount.shared2 mount.shared2
mount -vv --make-rshared mount.shared2
mount -vv --bind mount.shared2 mount.shared1
mount -vv --bind mount.dir mount.shared2
ls -R mount.dir mount.shared1 mount.shared2
umount mount.dir mount.shared1 mount.shared2 2>/dev/null
umount mount.dir mount.shared1 mount.shared2 2>/dev/null
umount mount.dir mount.shared1 mount.shared2 2>/dev/null
rm -f mount.dir/a mount.dir/b mount.dir/c
rmdir mount.dir mount.shared1 mount.shared2
mount -vv was used to show the mount() call arguments and result.
Output shows that the flags argument has the 0x00008000 = MS_SILENT bit set:
mount: mount('mount.shared1','mount.shared1','(null)',0x00009000,'(null)'):0
mount: mount('','mount.shared1','',0x0010c000,''):0
mount: mount('mount.shared2','mount.shared2','(null)',0x00009000,'(null)'):0
mount: mount('','mount.shared2','',0x0010c000,''):0
mount: mount('mount.shared2','mount.shared1','(null)',0x00009000,'(null)'):0
mount: mount('mount.dir','mount.shared2','(null)',0x00009000,'(null)'):0
mount.dir:
a
b
mount.shared1:
mount.shared2:
a
b
After adding the --loud option to remove the MS_SILENT bit from just one mount command:
mkdir -p mount.dir mount.shared1 mount.shared2
touch mount.dir/a mount.dir/b
mount -vv --bind mount.shared1 mount.shared1 2>&1
mount -vv --make-rshared mount.shared1 2>&1
mount -vv --bind mount.shared2 mount.shared2 2>&1
mount -vv --loud --make-rshared mount.shared2 2>&1 # <-HERE
mount -vv --bind mount.shared2 mount.shared1 2>&1
mount -vv --bind mount.dir mount.shared2 2>&1
ls -R mount.dir mount.shared1 mount.shared2 2>&1
umount mount.dir mount.shared1 mount.shared2 2>/dev/null
umount mount.dir mount.shared1 mount.shared2 2>/dev/null
umount mount.dir mount.shared1 mount.shared2 2>/dev/null
rm -f mount.dir/a mount.dir/b mount.dir/c
rmdir mount.dir mount.shared1 mount.shared2
The result is different now - look closely at mount.shared1 directory listing.
Now it does show files 'a' and 'b':
mount: mount('mount.shared1','mount.shared1','(null)',0x00009000,'(null)'):0
mount: mount('','mount.shared1','',0x0010c000,''):0
mount: mount('mount.shared2','mount.shared2','(null)',0x00009000,'(null)'):0
mount: mount('','mount.shared2','',0x00104000,''):0
mount: mount('mount.shared2','mount.shared1','(null)',0x00009000,'(null)'):0
mount: mount('mount.dir','mount.shared2','(null)',0x00009000,'(null)'):0
mount.dir:
a
b
mount.shared1:
a
b
mount.shared2:
a
b
Analysis shows that the MS_SILENT flag, which is on by default for every
busybox mount operation, reaches the recently added
flags_to_propagation_type() function and makes it return an error from the
is_power_of_2() check, because the function expects exactly one bit to be
set. As a result, busybox mount cannot perform any of the --make-[r]shared,
--make-[r]private etc. operations while MS_SILENT is on. The idea of
clearing the MS_SILENT flag came from Denys Vlasenko.
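The rejection described above can be reproduced in userspace. The following is a minimal sketch, not the kernel code: the flag values are those of the mount(2) ABI, while the helper names propagation_type_strict(), propagation_type_masked() and check_type() are hypothetical and exist only to contrast a check that leaves MS_SILENT in place with one that clears it first.

```c
#include <stdbool.h>

/* Flag values as defined by the Linux mount(2) ABI. */
#define MS_REC        0x00004000UL
#define MS_SILENT     0x00008000UL
#define MS_UNBINDABLE 0x00020000UL
#define MS_PRIVATE    0x00040000UL
#define MS_SLAVE      0x00080000UL
#define MS_SHARED     0x00100000UL

#define MS_PROPAGATION (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE)

/* True when exactly one bit is set. */
static bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Reject any non-propagation bit, then require exactly one propagation
 * bit. Returns the propagation type, or 0 when the flags are rejected.
 */
static unsigned long check_type(unsigned long type)
{
	if (type & ~MS_PROPAGATION)
		return 0;
	return is_power_of_2(type) ? type : 0;
}

/* Hypothetical "before": only MS_REC is masked, so MS_SILENT survives. */
unsigned long propagation_type_strict(unsigned long flags)
{
	return check_type(flags & ~MS_REC);
}

/* Hypothetical "after": MS_SILENT is cleared along with MS_REC. */
unsigned long propagation_type_masked(unsigned long flags)
{
	return check_type(flags & ~(MS_REC | MS_SILENT));
}
```

With the 0x0010c000 value from the mount -vv output (MS_SHARED | MS_REC | MS_SILENT), the strict variant rejects the request while the masked variant yields MS_SHARED.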
Signed-off-by: Roman Borisov <ext-roman.borisov@nokia.com>
Reported-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: Alexander Shishkin <virtuoso@slind.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Commit 990d6c2d7a ("vfs: Add name to file
handle conversion support") changed EXPORTFS to be a bool.
This was needed for earlier revisions of the original patch, but the actual
commit put the code needing it into its own file that only gets compiled
when FHANDLE is selected which in turn selects EXPORTFS.
So EXPORTFS can be safely compiled as a module when not selecting FHANDLE.
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
new helper: complete_walk(). Done on successful completion
of walk, drops out of RCU mode, does d_revalidate of final
result if that hadn't been done already.
handle_reval_dot() and nameidata_drop_rcu_last() subsumed into
that one; callers converted to use of complete_walk().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Commit 4b06042 (bitmap, irq: add smp_affinity_list interface to
/proc/irq) causes the following warning:
[ 274.239500] WARNING: at fs/proc/generic.c:850 remove_proc_entry+0x24c/0x27a()
[ 274.251761] remove_proc_entry: removing non-empty directory 'irq/184',
leaking at least 'smp_affinity_list'
Remove the new file in the exit path.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/4DDDE094.6050505@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
On ARMv7 CPUs that cache first level page table entries (like the
Cortex-A15), using a reserved ASID while changing the TTBR or flushing
the TLB is unsafe.
This is because the CPU may cache the first level entry as the result of
a speculative memory access while the reserved ASID is assigned. After
the process owning the page tables dies, the memory will be reallocated
and may be written with junk values which can be interpreted as global,
valid PTEs by the processor. This will result in the TLB being populated
with bogus global entries.
This patch avoids the use of a reserved context ID in the v7 switch_mm
and ASID rollover code by temporarily using the swapper_pg_dir pointed
at by TTBR1, which contains only global entries that are not tagged
with ASIDs.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch makes TTBR1 point to swapper_pg_dir so that global, kernel
mappings can be used exclusively on v6 and v7 cores where they are
needed.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The v6 and v7 implementations of flush_kern_dcache_area do not align
the passed MVA to the size of a cacheline in the data cache. If a
misaligned address is used, only a subset of the requested area will
be flushed. This has been observed to cause failures in SMP boot where
the secondary_data initialised by the primary CPU is not cacheline
aligned, causing the secondary CPUs to read incorrect values for their
pgd and stack pointers.
This patch ensures that the base address is cacheline aligned before
flushing the d-cache.
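The effect of the missing alignment can be modelled in plain C. This is only an illustrative sketch under stated assumptions: the real routines are written in assembler, and count_flushed_lines() with its align_base switch is a hypothetical stand-in that counts loop iterations instead of issuing cache maintenance operations.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the cache lines a "flush one line, advance by line_size" loop
 * would touch for the region [start, start + size). line_size must be a
 * power of two. With align_base == 0 the loop starts at the possibly
 * misaligned address; with align_base != 0 the base is first rounded
 * down to a line boundary.
 */
size_t count_flushed_lines(uintptr_t start, size_t size, size_t line_size,
			   int align_base)
{
	uintptr_t end = start + size;
	uintptr_t addr = start;
	size_t lines = 0;

	if (align_base)
		addr &= ~((uintptr_t)line_size - 1);	/* round down to a line */

	while (addr < end) {
		lines++;		/* stands in for the per-line flush */
		addr += line_size;
	}
	return lines;
}
```

For a 64-byte region starting at offset 0x30 into a 64-byte line, the region spans two lines; the unaligned loop stops after one, and the aligned loop covers both.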
Cc: <stable@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Commit 228e548e (net: Add sendmmsg socket system call) added the new
sendmmsg syscall. Add this to the syscall table for ARM.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
My existing email address may stop working in a month or two, so update
email to one that will continue working.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Hotplug support was added in 9f1890a (msm: hotplug: support cpu hotplug
on msm, 2010-12-02)
Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The ST-Ericsson modified ARM PrimeCell PL180 block has not got
a correspondingly updated amba-id, although the IP block has
changed in db8500v2. The change was made to the datactrl register.
Account for this using the overridden subversion ID.
Signed-off-by: Philippe Langlais <philippe.langlais@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The DB8500v2 and DB5500 have a fifth version of the "PL023" and
PL180 blocks. However, the ASIC engineers forgot to bump the
revision in the PrimeCell peripheral ID registers. Since the
platform is aware of the actual silicon revision, we need to
hard-code the periphid from the platform, bumping the subrevision
field to 1.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This makes a hardcoded periphid from the platform override any
magic number found in the hardware. This shall henceforth be used
when the information found in the hardware is either missing,
i.e. not encoding the CID with the magic cookie 0xb105f00d, or
incorrect such that the revision number should have been bumped in
hardware, but the silicon designer has failed to do so.
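The precedence described above can be sketched as a small pure function. This is a hedged illustration rather than the bus code itself: resolve_periphid() and its parameters are hypothetical names, and the only value taken from the text is the 0xb105f00d magic cookie.

```c
#include <stdint.h>

#define AMBA_CID 0xb105f00dU	/* PrimeCell magic cookie named in the text */

/*
 * Pick the peripheral ID to use for a device.
 * platform_periphid: ID hard-coded by the platform, 0 if none was given.
 * hw_pid, hw_cid:    values read back from the ID registers.
 * Returns 0 when no usable ID is available.
 */
uint32_t resolve_periphid(uint32_t platform_periphid, uint32_t hw_pid,
			  uint32_t hw_cid)
{
	if (platform_periphid != 0)
		return platform_periphid;	/* platform override wins */
	if (hw_cid == AMBA_CID)
		return hw_pid;			/* hardware ID is trustworthy */
	return 0;				/* missing cookie, no override */
}
```

The design choice is that a non-zero platform value is never second-guessed, which covers both the missing-cookie case and the case where the hardware revision was not bumped.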
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This is redundant. The correct ID number is right there in the
hardware anyway. We will introduce a mechanism later to hard-code
this for deviant cells.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Drivers which make use of the FIQ interrupt may require the state
of the FIQ mode registers to be preserved across suspend/resume.
Because the FIQ mode registers are not saved and restored
automatically by the kernel, driver authors will need to do the
appropriate save/restore in their own driver suspend/resume
handlers.
Implementing global automatic save/restore of the FIQ state does
not appear appropriate, since this by itself is not sufficient for
FIQ-based drivers to function correctly across suspend/resume in
any case.
This patch adds a brief explanatory note to fiq.h documenting the
requirement placed on driver authors.
Signed-off-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* To remove the risk of inconvenient register allocation decisions
by the compiler, these functions are separated out as pure
assembler.
* The apcs frame manipulation code is not applicable for Thumb-2
(and also not easily compatible). Since it's not essential to
have a full frame on these leaf assembler functions, the frame
manipulation is removed, in the interests of simplicity.
* Split up ldm/stm instructions to be compatible with Thumb-2,
as well as avoiding instruction forms deprecated on >= ARMv7.
Signed-off-by: Dave Martin <dave.martin@linaro.org>
Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
sanity_check_meminfo walks over the registered memory banks and attempts
to split banks across lowmem and highmem when they would otherwise
overlap with the vmalloc space.
When SPARSEMEM is used, there are two potential problems that occur
when the virtual address of the start of a bank is equal to vmalloc_min.
1.) The end of lowmem is calculated as __pa(vmalloc_min - 1) + 1.
In the above scenario, this will give the end address of the
previous bank, rather than the actual bank we are interested in.
This value is later used as the memblock limit and artificially
restricts the total amount of available memory.
2.) The checks to determine whether or not a bank belongs to highmem
only test whether __va(bank->start) is greater or less than
vmalloc_min. In the case that it is equal, the bank is incorrectly
treated as lowmem, which hoses the vmalloc area.
This patch fixes these two problems by checking whether the virtual
start address of a bank is >= vmalloc_min and then calculating
lowmem_end by finding the virtual end address of the highest lowmem
bank.
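The two fixes can be sketched together as follows. This is a simplified model, assuming that no bank straddles vmalloc_min (the splitting step is left out); struct bank and classify_banks() are hypothetical names, and vstart stands for the value __va(bank->start) would give.

```c
#include <stddef.h>
#include <stdint.h>

struct bank {
	uintptr_t vstart;	/* virtual start address of the bank */
	size_t size;		/* bank size in bytes */
	int highmem;		/* filled in by classify_banks() */
};

/*
 * Mark every bank whose virtual start is >= vmalloc_min as highmem and
 * return the virtual end address of the highest lowmem bank (0 if there
 * is no lowmem bank at all).
 */
uintptr_t classify_banks(struct bank *banks, size_t n, uintptr_t vmalloc_min)
{
	uintptr_t lowmem_end = 0;

	for (size_t i = 0; i < n; i++) {
		struct bank *b = &banks[i];

		/* ">=" rather than ">": a bank starting exactly at
		 * vmalloc_min must not be treated as lowmem. */
		b->highmem = b->vstart >= vmalloc_min;

		/* Derive the end of lowmem from the banks themselves
		 * instead of from an address just below vmalloc_min. */
		if (!b->highmem && b->vstart + b->size > lowmem_end)
			lowmem_end = b->vstart + b->size;
	}
	return lowmem_end;
}
```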
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
In commit eb33575c ("[ARM] Double check memmap is actually valid with a
memmap has unexpected holes V2"), a new function, memmap_valid_within,
was introduced to mmzone.h so that holes in the memmap which pass
pfn_valid in SPARSEMEM configurations can be detected and avoided.
The fix to this problem checks that the pfn <-> page linkages are
correct by calculating the page for the pfn and then checking that
page_to_pfn on that page returns the original pfn. Unfortunately, in
SPARSEMEM configurations, this results in reading from the page flags to
determine the correct section. Since the memmap here has been freed,
junk is read from memory and the check is no longer robust.
In the best case, reading from /proc/pagetypeinfo will give you the
wrong answer. In the worst case, you get SEGVs, Kernel OOPses and hung
CPUs. Furthermore, ioremap implementations that use pfn_valid to
disallow the remapping of normal memory will break.
This patch allows architectures to provide their own pfn_valid function
instead of using the default implementation used by sparsemem. The
architecture-specific version is aware of the memmap state and will
return false when passed a pfn for a freed page within a valid section.
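The idea of a memmap-aware validity check can be illustrated with a small sketch. It is not the architecture implementation: struct mem_range and arch_pfn_valid() are hypothetical, and the table of ranges is assumed to list only pfns whose memmap is still present.

```c
#include <stddef.h>

struct mem_range {
	unsigned long start_pfn;	/* first pfn with a live memmap */
	unsigned long end_pfn;		/* one past the last such pfn */
};

/*
 * Return 1 only when pfn lies inside a range that is known to be backed
 * by memory, 0 otherwise. A pfn in a freed hole is rejected even if the
 * surrounding section as a whole would count as valid.
 */
int arch_pfn_valid(const struct mem_range *ranges, size_t n,
		   unsigned long pfn)
{
	for (size_t i = 0; i < n; i++)
		if (pfn >= ranges[i].start_pfn && pfn < ranges[i].end_pfn)
			return 1;
	return 0;
}
```

Because the answer comes from the range table alone, nothing is read from page flags in memory that may already have been freed.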
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The kernel already prints its build timestamp during boot, no need to
repeat it in random drivers and produce different object files each
time.
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: cluster-devel@redhat.com
Signed-off-by: Michal Marek <mmarek@suse.cz>
The kernel already prints its build timestamp during boot, no need to
repeat it in random drivers and produce different object files each
time.
Cc: Armin Schindler <mac@melware.de>
Cc: netdev@vger.kernel.org
Signed-off-by: Michal Marek <mmarek@suse.cz>
Add ZONE_DMA to 31-bit config again. The performance gain is minimal
and hardly anybody cares anymore about a 31-bit kernel.
So add ZONE_DMA again to help with SLAB_CACHE_DMA removal for
!CONFIG_ZONE_DMA configurations.
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
If e.g. copy_from_user() generates a page fault and the kernel runs
into an OOM situation the system might lock up.
If the OOM killer sends SIGKILL to the current process, the process can't
handle it since it is stuck in a copy_from_user() - page fault loop.
Fix this by adding the same fix as other architectures have.
E.g. the x86 variant f86268 "x86/mm: Handle mm_fault_error() in kernel
space"
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>