The modifications include:
1. Kconfig of Score: we don't support ioremap
2. Missed headfile including
3. There are some errors in other people's commit not checked by us, we fix it now
3.1 arch/score/kernel/entry.S: wrong instructions
3.2 arch/score/kernel/process.c : just some typos
Signed-off-by: Lennox Wu <lennox.wu@gmail.com>
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
* It was possible to use an uninitialized buffer when reading
kernel modules information and checking if the file was a
/proc/sys/kernel/kptr_restrict'ed one, fix for this from
Adrian Hunter.
* The libbfd demangler doesn't handle cloned functions (e.g. symbol.clone.NUM),
feed it unsuffixed symbol names, workaround from Andi Kleen.
* Fix segfault in 'perf trace' when processing perf.data files with PERF_RECORD_MMAP2
records, recently added but not handled in this tool, from David Ahern.
* Fix libdl related build in old systems like Fedora 12, from David Ahern.
* Make 'perf kmem' work again on non NUMA machines, fix from Jiri Olsa.
* Fix probing symbols with optimization suffix in 'perf probe' where some
operations that are entirely user level and involves vmlinux/DWARF were working
but when the symbol name was fed to the kprobes tracer, the in kernel code
would use /proc/kallsyms where the name had the suffix, from Masami Hiramatsu.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The ASID is represented as an unsigned int in mm_context_t and we
currently use the mmid assembler macro to access this element of the
struct. This should be accessed with a register of 32-bit width. If
the incorrect register width is used the ASID will be returned in
bits[32:63] of the register when running under big-endian.
Fix a use of the mmid macro in tlb.S to use a 32-bit access.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Matthew Leach <matthew.leach@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
get_user() is defined as a function macro in arm64, and trace_get_user()
calls it as followed:
get_user(ch, ptr++);
Since the second parameter occurs twice in the definition, 'ptr++' is
unexpectedly evaluated twice and trace_get_user() will generate a bogus
string from user-provided one. As a result, some ftrace sysfs operations,
like "echo FUNCNAME > set_ftrace_filter," hit this case and eventually fail.
This patch fixes the issue both in get_user() and put_user().
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
[catalin.marinas@arm.com: added __user type annotation and s/optr/__p/]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fix perf probe to probe on some symbols which have some optimzation
suffixes, e.g. ".part", ".isra", and ".constprop".
To fix this issue, instead of using the DIE name, perf probe uses the
symbol name found by dwfl_module_addrsym().
This also involves a perf probe --vars operation update which now shows
the symbol name instead of the DIE name.
Without this patch, putting a probe on an inlined function which was
compiled with a suffixed symbol will fail like this:
$ perf probe -v getname_flags
probe-definition(0): getname_flags
symbol:getname_flags file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Looking at the vmlinux_path (6 entries long)
Using /lib/modules/3.11.0+/build/vmlinux for symbols
found inline addr: 0xffffffff8119bb70
Probe point found: getname_flags+0
found inline addr: 0xffffffff8119bcb6
Probe point found: getname+6
found inline addr: 0xffffffff811a06a6
Probe point found: user_path_at_empty+6
find 3 probe_trace_events.
Opening /sys/kernel/debug//tracing/kprobe_events write=1
Added new events:
Writing event: p:probe/getname_flags getname_flags+0
Failed to write event: No such file or directory
Error: Failed to add events. (-1)
Because the debuginfo knows only the original (non suffix) symbol name,
it uses the original symbol for probe address but the kernel (kallsyms)
knows only suffixed symbol. Then, the kernel rejects that original
symbol.
This patch uses dwfl_module_addrsym() to get the correct (suffixed)
symbol from symtab when a probe point is found.
Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20130925131616.31632.46658.stgit@udc4-manage.rcp.hitachi.co.jp
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The check cpu_needs_post_dma_flush() in mips_dma_sync_sg_for_cpu() and
the check !plat_device_is_coherent() in mips_dma_sync_sg_for_device()
can be moved outside the for loop.
As a side effect, this also avoids a GCC bug that caused kernel compile
to fail with the error:
arch/mips/mm/dma-default.c: In function 'mips_dma_sync_sg_for_cpu':
arch/mips/mm/dma-default.c:316:1: internal compiler error: in add_insn_before, at emit-rtl.c:3852
This gcc failure is seen in Code Sourcery toolchains [e.g. gcc version
4.7.2 (Sourcery CodeBench Lite 2012.09-99)] after commit "MIPS: Optimize
current_cpu_type() for better code."
Signed-off-by: Jayachandran C <jchandra@broadcom.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5907/
Reviewed-by: Markos Chandras <markos.chandras@imgtec.com>
Tested-by: Markos Chandras <markos.chandras@imgtec.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Which disables in the ticketlock slowpath the Xen PV optimization's.
Useful for diagnosing issues and comparing benchmarks in
over-commit CPU scenarios.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The old rcu_is_cpu_idle() function is just __rcu_is_watching() with
preemption disabled. This commit therefore renames rcu_is_cpu_idle()
to rcu_is_watching.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Commit e6b80a3b (rcu: Detect illegal rcu dereference in extended
quiescent state) exported the pre-existing rcu_is_cpu_idle() function
using EXPORT_SYMBOL(). However, this is inconsistent with the remaining
exports from RCU, which are all EXPORT_SYMBOL_GPL(). The current state
of affairs means that a non-GPL module could use rcu_is_cpu_idle(),
but in a CONFIG_TREE_PREEMPT_RCU=y kernel would be unable to invoke
rcu_read_lock() and rcu_read_unlock().
This commit therefore makes rcu_is_cpu_idle()'s export be consistent
with the rest of RCU, namely EXPORT_SYMBOL_GPL().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
There is currently no way for kernel code to determine whether it
is safe to enter an RCU read-side critical section, in other words,
whether or not RCU is paying attention to the currently running CPU.
Given the large and increasing quantity of code shared by the idle loop
and non-idle code, the this shortcoming is becoming increasingly painful.
This commit therefore adds __rcu_is_watching(), which returns true if
it is safe to enter an RCU read-side critical section on the currently
running CPU. This function is quite fast, using only a __this_cpu_read().
However, the caller must disable preemption.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
If a non-lazy callback arrives on a CPU that has previously gone idle
with no non-lazy callbacks, invoke_rcu_core() forces the RCU core to
run. However, it does not update the conditions, which could result
in several closely spaced invocations of the RCU core, which in turn
could result in an excessively high context-switch rate and resulting
high overhead.
This commit therefore updates the ->all_lazy and ->nonlazy_posted_snap
fields to prevent closely spaced invocations.
Reported-by: Tibor Billes <tbilles@gmx.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Tibor Billes <tbilles@gmx.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
The rcu_try_advance_all_cbs() function is invoked on each attempted
entry to and every exit from idle. If this function determines that
there are callbacks ready to invoke, the caller will invoke the RCU
core, which in turn will result in a pair of context switches. If a
CPU enters and exits idle extremely frequently, this can result in
an excessive number of context switches and high CPU overhead.
This commit therefore causes rcu_try_advance_all_cbs() to throttle
itself, refusing to do work more than once per jiffy.
Reported-by: Tibor Billes <tbilles@gmx.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Tibor Billes <tbilles@gmx.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
The rcu_try_advance_all_cbs() function returns a bool saying whether or
not there are callbacks ready to invoke, but rcu_cleanup_after_idle()
rechecks this regardless. This commit therefore uses the value returned
by rcu_try_advance_all_cbs() instead of making rcu_cleanup_after_idle()
do this recheck.
Reported-by: Tibor Billes <tbilles@gmx.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Tibor Billes <tbilles@gmx.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
On hosts with more than 168 GB of memory, a 32-bit guest may attempt
to grant map an MFN that is error cannot lookup in its mapping of the
m2p table. There is an m2p lookup as part of m2p_add_override() and
m2p_remove_override(). The lookup falls off the end of the mapped
portion of the m2p and (because the mapping is at the highest virtual
address) wraps around and the lookup causes a fault on what appears to
be a user space address.
do_page_fault() (thinking it's a fault to a userspace address), tries
to lock mm->mmap_sem. If the gntdev device is used for the grant map,
m2p_add_override() is called from from gnttab_mmap() with mm->mmap_sem
already locked. do_page_fault() then deadlocks.
The deadlock would most commonly occur when a 64-bit guest is started
and xenconsoled attempts to grant map its console ring.
Introduce mfn_to_pfn_no_overrides() which checks the MFN is within the
mapped portion of the m2p table before accessing the table and use
this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn()
(which already had the correct range check).
All faults caused by accessing the non-existant parts of the m2p are
thus within the kernel address space and exception_fixup() is called
without trying to lock mm->mmap_sem.
This means that for MFNs that are outside the mapped range of the m2p
then mfn_to_pfn() will always look in the m2p overrides. This is
correct because it must be a foreign MFN (and the PFN in the m2p in
this case is only relevant for the other domain).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
--
v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides()
v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of
range as it's probably foreign.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Bit 12 is undefined in any of the following cases:
- If the "NMI exiting" VM-execution control is 1 and the "virtual NMIs"
VM-execution control is 0.
- If the VM exit sets the valid bit in the IDT-vectoring information field
Signed-off-by: Gleb Natapov <gleb@redhat.com>
[Add parentheses around & within && - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Starting secondary CPUs early on from Open Firmware and placing them
in a holding spin loop slows down the boot process significantly under
some hypervisors such as KVM.
This is also unnecessary when RTAS supports querying the CPU state
So let's not do it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This makes the "OF" zImage wrapper (zImage.pseries, zImage.pmac,
zImage.maple) work if booted via a flat device-tree (ePAPR boot
mode), and thus potentially usable with kexec.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We've been keeping that field in thread_struct for a while, it contains
the "limit" of the current stack pointer and is meant to be used for
detecting stack overflows.
It has a few problems however:
- First, it was never actually *used* on 64-bit. Set and updated but
not actually exploited
- When switching stack to/from irq and softirq stacks, it's update
is racy unless we hard disable interrupts, which is costly. This
is fine on 32-bit as we don't soft-disable there but not on 64-bit.
Thus rather than fixing 2 in order to implement 1 in some hypothetical
future, let's remove the code completely from 64-bit. In order to avoid
a clutter of ifdef's, we remove the updates from C code completely
during interrupt stack switching, and instead maintain it from the
asm helper that is used to do the stack switching in the first place.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nowadays, irq_exit() calls __do_softirq() pretty much directly
instead of calling do_softirq() which switches to the decicated
softirq stack.
This has lead to observed stack overflows on powerpc since we call
irq_enter() and irq_exit() outside of the scope that switches to
the irq stack.
This fixes it by moving the stack switching up a level, making
irq_enter() and irq_exit() run off the irq stack.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There is a loop in do_mlockall() that lacks a preemption point, which
means that the following can happen on non-preemptible builds of the
kernel. Dave Jones reports:
"My fuzz tester keeps hitting this. Every instance shows the non-irq
stack came in from mlockall. I'm only seeing this on one box, but
that has more ram (8gb) than my other machines, which might explain
it.
INFO: rcu_preempt self-detected stall on CPU { 3} (t=6500 jiffies g=470344 c=470343 q=0)
sending NMI to all CPUs:
NMI backtrace for cpu 3
CPU: 3 PID: 29664 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #32
Call Trace:
lru_add_drain_all+0x15/0x20
SyS_mlockall+0xa5/0x1a0
tracesys+0xdd/0xe2"
This commit addresses this problem by inserting the required preemption
point.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Much of of_irq.h is needlessly ifdef'ed. Clean this up and minimize the
amount ifdef'ed code. This fixes some build warnings when CONFIG_OF
is not enabled (seen on i386 and x86_64):
include/linux/of_irq.h:82:7: warning: 'struct device_node' declared inside parameter list [enabled by default]
include/linux/of_irq.h:82:7: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
include/linux/of_irq.h:87:47: warning: 'struct device_node' declared inside parameter list [enabled by default]
Compile tested on i386, sparc and arm.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Clean-up some copy/paste declarations that are not necessary. All the
functions either don't exist or are already declared in other headers.
This is needed in preparation of of_irq.h clean-up.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: linux@lists.openrisc.net
If 'dvfs_info' is NULL (due to devm_kzalloc failure) the failure
error message would try to dereference it. Use 'pdev' instead.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq_get() can be called from external drivers which might not be aware if
cpufreq driver is registered or not. And so we should actually check if cpufreq
driver is registered or not and also if cpufreq is active or disabled, at the
beginning of cpufreq_get().
Otherwise call to lock_policy_rwsem_read() might hit BUG_ON(!policy).
Reported-and-tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If the hw supports intel_pstate and acpi_cpufreq, intel_pstate will
get loaded first.
acpi_cpufreq_init() will call acpi_cpufreq_early_init()
and that will allocate perf data and init those perf data in ACPI core,
(that will cover all CPUs). But later it will free them as
cpufreq_register_driver(acpi_cpufreq) will fail as intel_pstate is
already registered
Use cpufreq_get_current_driver() to check if we can skip the
acpi_cpufreq loading.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch fixes the issues indicated by the test results that
ipmi_msg_handler() is invoked in atomic context.
BUG: scheduling while atomic: kipmi0/18933/0x10000100
Modules linked in: ipmi_si acpi_ipmi ...
CPU: 3 PID: 18933 Comm: kipmi0 Tainted: G AW 3.10.0-rc7+ #2
Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.0027.070120100606 07/01/2010
ffff8838245eea00 ffff88103fc63c98 ffffffff814c4a1e ffff88103fc63ca8
ffffffff814bfbab ffff88103fc63d28 ffffffff814c73e0 ffff88103933cbd4
0000000000000096 ffff88103fc63ce8 ffff88102f618000 ffff881035c01fd8
Call Trace:
<IRQ> [<ffffffff814c4a1e>] dump_stack+0x19/0x1b
[<ffffffff814bfbab>] __schedule_bug+0x46/0x54
[<ffffffff814c73e0>] __schedule+0x83/0x59c
[<ffffffff81058853>] __cond_resched+0x22/0x2d
[<ffffffff814c794b>] _cond_resched+0x14/0x1d
[<ffffffff814c6d82>] mutex_lock+0x11/0x32
[<ffffffff8101e1e9>] ? __default_send_IPI_dest_field.constprop.0+0x53/0x58
[<ffffffffa09e3f9c>] ipmi_msg_handler+0x23/0x166 [ipmi_si]
[<ffffffff812bf6e4>] deliver_response+0x55/0x5a
[<ffffffff812c0fd4>] handle_new_recv_msgs+0xb67/0xc65
[<ffffffff81007ad1>] ? read_tsc+0x9/0x19
[<ffffffff814c8620>] ? _raw_spin_lock_irq+0xa/0xc
[<ffffffffa09e1128>] ipmi_thread+0x5c/0x146 [ipmi_si]
...
Also Tony Camuso says:
We were getting occasional "Scheduling while atomic" call traces
during boot on some systems. Problem was first seen on a Cisco C210
but we were able to reproduce it on a Cisco c220m3. Setting
CONFIG_LOCKDEP and LOCKDEP_SUPPORT to 'y' exposed a lockdep around
tx_msg_lock in acpi_ipmi.c struct acpi_ipmi_device.
=================================
[ INFO: inconsistent lock state ]
2.6.32-415.el6.x86_64-debug-splck #1
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/3/17 [HC0[0]:SC1[1]:HE1:SE0] takes:
(&ipmi_device->tx_msg_lock){+.?...}, at: [<ffffffff81337a27>] ipmi_msg_handler+0x71/0x126
{SOFTIRQ-ON-W} state was registered at:
[<ffffffff810ba11c>] __lock_acquire+0x63c/0x1570
[<ffffffff810bb0f4>] lock_acquire+0xa4/0x120
[<ffffffff815581cc>] __mutex_lock_common+0x4c/0x400
[<ffffffff815586ea>] mutex_lock_nested+0x4a/0x60
[<ffffffff8133789d>] acpi_ipmi_space_handler+0x11b/0x234
[<ffffffff81321c62>] acpi_ev_address_space_dispatch+0x170/0x1be
The fix implemented by this change has been tested by Tony:
Tested the patch in a boot loop with lockdep debug enabled and never
saw the problem in over 400 reboots.
Reported-and-tested-by: Tony Camuso <tcamuso@redhat.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge fixes from Andrew Morton:
"Bunch of fixes.
And a reversion of mhocko's "Soft limit rework" patch series. This is
actually your fault for opening the merge window when I was off racing ;)
I didn't read the email thread before sending everything off.
Johannes Weiner raised significant issues:
http://www.spinics.net/lists/cgroups/msg08813.html
and we agreed to back it all out"
I clearly need to be more aware of Andrew's racing schedule.
* akpm:
MAINTAINERS: update mach-bcm related email address
checkpatch: make extern in .h prototypes quieter
cciss: fix info leak in cciss_ioctl32_passthru()
cpqarray: fix info leak in ida_locked_ioctl()
kernel/reboot.c: re-enable the function of variable reboot_default
audit: fix endless wait in audit_log_start()
revert "memcg, vmscan: integrate soft reclaim tighter with zone shrinking code"
revert "memcg: get rid of soft-limit tree infrastructure"
revert "vmscan, memcg: do softlimit reclaim also for targeted reclaim"
revert "memcg: enhance memcg iterator to support predicates"
revert "memcg: track children in soft limit excess to improve soft limit"
revert "memcg, vmscan: do not attempt soft limit reclaim if it would not scan anything"
revert "memcg: track all children over limit in the root"
revert "memcg, vmscan: do not fall into reclaim-all pass too quickly"
fs/ocfs2/super.c: use a bigger nodestr in ocfs2_dismount_volume
watchdog: update watchdog_thresh properly
watchdog: update watchdog attributes atomically
The arg64 struct has a hole after ->buf_size which isn't cleared. Or if
any of the calls to copy_from_user() fail then that would cause an
information leak as well.
This was assigned CVE-2013-2147.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 1b3a5d02ee ("reboot: move arch/x86 reboot= handling to generic
kernel") did some cleanup for reboot= command line, but it made the
reboot_default inoperative.
The default value of variable reboot_default should be 1, and if command
line reboot= is not set, system will use the default reboot mode.
[akpm@linux-foundation.org: fix comment layout]
Signed-off-by: Li Fei <fei.li@intel.com>
Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
Acked-by: Robin Holt <robinmholt@linux.com>
Cc: <stable@vger.kernel.org> [3.11.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert commit 3b38722efd ("memcg, vmscan: integrate soft reclaim
tighter with zone shrinking code")
I merged this prematurely - Michal and Johannes still disagree about the
overall design direction and the future remains unclear.
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert commit e883110aad ("memcg: get rid of soft-limit tree
infrastructure")
I merged this prematurely - Michal and Johannes still disagree about the
overall design direction and the future remains unclear.
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert commit a5b7c87f92 ("vmscan, memcg: do softlimit reclaim also
for targeted reclaim")
I merged this prematurely - Michal and Johannes still disagree about the
overall design direction and the future remains unclear.
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert commit de57780dc6 ("memcg: enhance memcg iterator to support
predicates")
I merged this prematurely - Michal and Johannes still disagree about the
overall design direction and the future remains unclear.
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert commit 7d910c054b ("memcg: track children in soft limit excess
to improve soft limit")
I merged this prematurely - Michal and Johannes still disagree about the
overall design direction and the future remains unclear.
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert commit e839b6a1c8 ("memcg, vmscan: do not attempt soft limit
reclaim if it would not scan anything")
I merged this prematurely - Michal and Johannes still disagree about the
overall design direction and the future remains unclear.
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert commit 1be171d60b ("memcg: track all children over limit in the
root")
I merged this prematurely - Michal and Johannes still disagree about the
overall design direction and the future remains unclear.
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert commit e975de998b ("memcg, vmscan: do not fall into reclaim-all
pass too quickly")
I merged this prematurely - Michal and Johannes still disagree about the
overall design direction and the future remains unclear.
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While printing 32-bit node numbers, an 8-byte string is not enough.
Increase the size of the string to 12 chars.
This got left out in commit 49fa8140e4 ("fs/ocfs2/super.c: Use bigger
nodestr to accomodate 32-bit node numbers").
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
watchdog_tresh controls how often nmi perf event counter checks per-cpu
hrtimer_interrupts counter and blows up if the counter hasn't changed
since the last check. The counter is updated by per-cpu
watchdog_hrtimer hrtimer which is scheduled with 2/5 watchdog_thresh
period which guarantees that hrtimer is scheduled 2 times per the main
period. Both hrtimer and perf event are started together when the
watchdog is enabled.
So far so good. But...
But what happens when watchdog_thresh is updated from sysctl handler?
proc_dowatchdog will set a new sampling period and hrtimer callback
(watchdog_timer_fn) will use the new value in the next round. The
problem, however, is that nobody tells the perf event that the sampling
period has changed so it is ticking with the period configured when it
has been set up.
This might result in an ear ripping dissonance between perf and hrtimer
parts if the watchdog_thresh is increased. And even worse it might lead
to KABOOM if the watchdog is configured to panic on such a spurious
lockup.
This patch fixes the issue by updating both nmi perf even counter and
hrtimers if the threshold value has changed.
The nmi one is disabled and then reinitialized from scratch. This has
an unpleasant side effect that the allocation of the new event might
fail theoretically so the hard lockup detector would be disabled for
such cpus. On the other hand such a memory allocation failure is very
unlikely because the original event is deallocated right before.
It would be much nicer if we just changed perf event period but there
doesn't seem to be any API to do that right now. It is also unfortunate
that perf_event_alloc uses GFP_KERNEL allocation unconditionally so we
cannot use on_each_cpu() and do the same thing from the per-cpu context.
The update from the current CPU should be safe because
perf_event_disable removes the event atomically before it clears the
per-cpu watchdog_ev so it cannot change anything under running handler
feet.
The hrtimer is simply restarted (thanks to Don Zickus who has pointed
this out) if it is queued because we cannot rely it will fire&adopt to
the new sampling period before a new nmi event triggers (when the
treshold is decreased).
[akpm@linux-foundation.org: the UP version of __smp_call_function_single ended up in the wrong place]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>