Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13302
hugetlbfs reserves huge pages but does not fault them at mmap() time to
ensure that future faults succeed. The reservation behaviour differs
depending on whether the mapping was mapped MAP_SHARED or MAP_PRIVATE.
For MAP_SHARED mappings, hugepages are reserved when mmap() is first
called and are tracked based on information associated with the inode.
Other processes mapping MAP_SHARED use the same reservation. MAP_PRIVATE
track the reservations based on the VMA created as part of the mmap()
operation. Each process mapping MAP_PRIVATE must make its own
reservation.
hugetlbfs currently checks if a VMA is MAP_SHARED with the VM_SHARED flag
and not VM_MAYSHARE. For file-backed mappings, such as hugetlbfs,
VM_SHARED is set only if the mapping is MAP_SHARED and the file was opened
read-write. If a shared memory mapping was mapped shared-read-write for
populating of data and mapped shared-read-only by other processes, then
hugetlbfs would account for the mapping as if it was MAP_PRIVATE. This
causes processes to fail to map the file MAP_SHARED even though it should
succeed as the reservation is there.
This patch alters mm/hugetlb.c and replaces VM_SHARED with VM_MAYSHARE
when the intent of the code was to check whether the VMA was mapped
MAP_SHARED or MAP_PRIVATE.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <starlight@binnacle.cx>
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13302
On x86 and x86-64, it is possible that page tables are shared beween
shared mappings backed by hugetlbfs. As part of this,
page_table_shareable() checks a pair of vma->vm_flags and they must match
if they are to be shared. All VMA flags are taken into account, including
VM_LOCKED.
The problem is that VM_LOCKED is cleared on fork(). When a process with a
shared memory segment forks() to exec() a helper, there will be shared
VMAs with different flags. The impact is that the shared segment is
sometimes considered shareable and other times not, depending on what
process is checking.
What happens is that the segment page tables are being shared but the
count is inaccurate depending on the ordering of events. As the page
tables are freed with put_page(), bad pmd's are found when some of the
children exit. The hugepage counters also get corrupted and the Total and
Free count will no longer match even when all the hugepage-backed regions
are freed. This requires a reboot of the machine to "fix".
This patch addresses the problem by comparing all flags except VM_LOCKED
when deciding if pagetables should be shared or not for hugetlbfs-backed
mapping.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <starlight@binnacle.cx>
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove wrong fifo size definition for some AT91 products.
Due to a misunderstanding of some AT91 datasheets, a fifo size of 2048
(words) has been introduced by mistake. In fact, all products (AT91/AT32)
are sharing the same fifo size of 512 words.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Andrew Victor <avictor.za@gmail.com>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
drivers/serial/8250_gsc.c:44: warning: format '%lx' expects type
'long unsigned int', but argument 2 has type 'resource_size_t'
[akpm@linux-foundation.org: fix it to handle u64's]
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
drivers/parport/parport_gsc.c:356: warning: format '%lx' expects type
'long unsigned int', but argument 2 has type 'resource_size_t'
[akpm@linux-foundation.org: fix it to handle u64's]
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The flat loader uses an architecture's flat_stack_align() to align the
stack but assumes word-alignment is enough for the data sections.
However, on the Xtensa S6000 we have registers up to 128bit width
which can be used from userspace and therefor need userspace stack and
data-section alignment of at least this size.
This patch drops flat_stack_align() and uses the same alignment that
is required for slab caches, ARCH_SLAB_MINALIGN, or wordsize if it's
not defined by the architecture.
It also fixes m32r which was obviously kaput, aligning an
uninitialized stack entry instead of the stack pointer.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Oskar Schirmer <os@emlix.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Bryan Wu <cooloney@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Johannes Weiner <jw@emlix.com>
Acked-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mapping->tree_lock can be acquired from interrupt context. Then,
following dead lock can occur.
Assume "A" as a page.
CPU0:
lock_page_cgroup(A)
interrupted
-> take mapping->tree_lock.
CPU1:
take mapping->tree_lock
-> lock_page_cgroup(A)
This patch tries to fix above deadlock by moving memcg's hook to out of
mapping->tree_lock. charge/uncharge of pagecache/swapcache is protected
by page lock, not tree_lock.
After this patch, lock_page_cgroup() is not called under mapping->tree_lock.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc_pident_instantiate() has following call flow.
proc_pident_lookup()
proc_pident_instantiate()
proc_pid_make_inode()
And, proc_pident_lookup() has following error handling.
const struct pid_entry *p, *last;
error = ERR_PTR(-ENOENT);
if (!task)
goto out_no_task;
Then, proc_pident_instantiate should return ENOENT too when racing against
exit(2) occur.
EINAL has two bad reason.
- it implies caller is wrong. bad the race isn't caller's mistake.
- man 2 open don't explain EINVAL. user often don't handle it.
Note: Other proc_pid_make_inode() caller already use ENOENT properly.
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When /proc/sys/vm/oom_dump_tasks is enabled, it is possible to get a NULL
pointer for tasks that have detached mm's since task_lock() is not held
during the tasklist scan. Add the task_lock().
Acked-by: Nick Piggin <npiggin@suse.de>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the local_irq_save() etc.. in routines that are smp function
calls, or have IRQs disabled by other means.
Then change the COMM, MMAP, and swcounter context iteration to
current->perf_counter_ctxp and RCU, since it really doesn't matter
which context they iterate, they're all folded.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit a63eaf34ae ("perf_counter: Dynamically allocate tasks'
perf_counter_context struct") broke COMM and MMAP notification for
cpu wide counters by dropping out early if there was no task context,
thereby also not iterating the cpu context.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This fixes a nasty crash and highlights a bug that we were
freeing failed-fork() counters incorrectly.
(the fix for that will come separately)
[ Impact: fix crashes/lockups with inherited counters ]
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Erase errors such as:
"Newly-erased block contained word 0xa4ef223e at offset 0x0296a014"
and failure to write the clean marker,
moves the offending erase block to erasing list before calling
jffs2_erase_failed(). This is bad as jffs2_erase_failed() will
also move the block to the bad_list, but is now moving the
wrong block, causing FS corruption.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The following patch fixes:
- re-initialization of host->col_addr which is used as byte index
between the successive READID flash commands.
- compile error when CONFIG_PM is enabled
- pass on the error code from clk_get()
- return -ENOMEM in case of failed ioremap()
- pass on the return value of platform_driver_probe() directly
- remove excessive printk
- let command line partition table parsing with mxc_nand name.
The cmd_line parsing is done via <mtd-id> name that differs
from mxc_nand by default and looks like "NAND 256MiB 1,8V 8-bit"
Signed-off-by: Vladimir Barinov <vbarinov@embeddedalley.com>
Signed-off-by: Lothar Wassmann <LW@KARO-electronics.de>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Peter Zijlstra pointed out that under some circumstances, we can take
the mutex in a context or a counter and then swap that context or
counter to another task, potentially leading to lock order inversions
or the mutexes not protecting what they are supposed to protect.
This fixes the problem by making sure that we never take a mutex in a
context or counter which could get swapped to another task. Most of
the cases where we take a mutex is on a top-level counter or context,
i.e. a counter which has an fd associated with it or a context that
contains such a counter. This adds WARN_ON_ONCE statements to verify
that.
The two cases where we need to take the mutex on a context that is a
clone of another are in perf_counter_exit_task and
perf_counter_init_task. The perf_counter_exit_task case is solved by
uncloning the context before starting to remove the counters from it.
The perf_counter_init_task is a little trickier; we temporarily
disable context swapping for the parent (forking) task by setting its
ctx->parent_gen to the all-1s value after locking the context, if it
is a cloned context, and restore the ctx->parent_gen value at the end
if the context didn't get uncloned in the meantime.
This also moves the increment of the context generation count to be
within the same critical section, protected by the context mutex, that
adds the new counter to the context. That way, taking the mutex is
sufficient to ensure that both the counter list and the generation
count are stable.
[ Impact: fix hangs, races with inherited and PID counters ]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <18975.31580.520676.619896@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now one has just to use dso__load_kernel() optionally passing a vmlinux
filename.
Will make things easier for perf top that will want to pass a callback
to filter some symbols.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We also fix a problem with cleaning up properly when initializing
drivers and devices, so checks like this will work successfully.
Portions of the patch by Linus and Greg and Ingo.
Reported-by: Ozan Çağlayan <ozan@pardus.org.tr>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This reverts commit 26e1287594.
A larger patch (f7e7aa585) a few days after this one added the same line
to the Makefile, but in a different place. While it'd be more correct to
revert that one, it's easier to revert this one because this is a
one-liner.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
CC: Greg Kroah-Hartman <gregkh@suse.de>
CC: linux-usb@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1244) fixes a crash in usb-serial that occurs when a
sub-driver returns a positive value from its attach method, indicating
that new firmware was loaded and the device will disconnect and
reconnect. The usb-serial core then skips the step of registering the
port devices; when the disconnect occurs, the attempt to unregister
the ports fails dramatically.
This problem shows up with Keyspan devices and it might affect others
as well.
When the attach method returns a positive value, the patch sets
num_ports to 0. This tells usb_serial_disconnect() not to try
unregistering any of the ports; instead they are cleaned up by
destroy_serial().
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The option driver (and presumably others) allocates several URBs when it
opens and tries to free them when it closes. The isp1760_urb_dequeue
function gets called, but the packet being dequeued is not necessarily at
the
front of one of the 32 queues. If not, the isp1760_urb_done function doesn't
get called for the URB and the process trying to free it hangs forever on a
wait_queue. This patch does two things. If the URB being dequeued has others
queued behind it, it re-queues them. And it searches the queues looking for
the URB being dequeued rather than just looking at the one at the front of
the queue.
[bigeasy@linutronix] whitespace fixes, reformating
Cc: stable <stable@kernel.org>
Signed-off-by: Warren Free <wfree@ipmn.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds another quirky Conexant USB Modem Clone to usb cdc-acm.c
Signed-off-by: Xiao Kaijian <xiaokj@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
usbtest #14 was failing with "udc: ep0: TXCOMP: Invalid endpoint state 2, halting endpoint..."
This occured since ep0 is bidirectional and ep->is_in is not valid (must always use ep->state)
Signed-off-by: Martin Fuzzey <mfuzzey@gmail.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Sometimes devices send us their responses in time but due to
unfortunate scheduling decisions the receiving thread does not
get scheduled till much later and we erroneously decide that
device timed out. Work around this problem by checking whether we
received the data we needed instead of checking timeout
condition.
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Commit 564c2b21 ("perf_counter: Optimize context switch between
identical inherited contexts") introduced a race where it is possible
that a counter being attached to a task could get attached to the
wrong task, if the task is one that has inherited its context from
another task via fork. This happens because the optimized context
switch could switch the context to another task after find_get_context
has read task->perf_counter_ctxp. In fact, it's possible that the
context could then get freed, if the other task then exits.
This fixes the problem by protecting both the context switch and the
critical code in find_get_context with spinlocks. The context switch
locks the cxt->lock of both the outgoing and incoming contexts before
swapping them. That means that once code such as find_get_context
has obtained the spinlock for the context associated with a task,
the context can't get swapped to another task. However, the context
may have been swapped in the interval between reading
task->perf_counter_ctxp and getting the lock, so it is necessary to
check and retry.
To make sure that none of the contexts being looked at in
find_get_context can get freed, this changes the context freeing code
to use RCU. Thus an rcu_read_lock() is sufficient to ensure that no
contexts can get freed. This part of the patch is lifted from a patch
posted by Peter Zijlstra.
This also adds a check to make sure that we can't add a counter to a
task that is exiting.
There is also a race between perf_counter_exit_task and
find_get_context; this solves the race by moving the get_ctx that
was in perf_counter_alloc into the locked region in find_get_context,
so that once find_get_context has got the context for a task, it
won't get freed even if the task calls perf_counter_exit_task. It
doesn't matter if new top-level (non-inherited) counters get attached
to the context after perf_counter_exit_task has detached the context
from the task. They will just stay there and never get scheduled in
until the counters' fds get closed, and then perf_release will remove
them from the context and eventually free the context.
With this, we are now doing the unclone in find_get_context rather
than when a counter was added to or removed from a context (actually,
we were missing the unclone_ctx() call when adding a counter to a
context). We don't need to unclone when removing a counter from a
context because we have no way to remove a counter from a cloned
context.
This also takes out the smp_wmb() in find_get_context, which Peter
Zijlstra pointed out was unnecessary because the cmpxchg implies a
full barrier anyway.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <18974.33033.667187.273886@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>