Commit Graph

7369 Commits

Author SHA1 Message Date
Paul Bolle
678301ecad x86, ioapic: Don't warn about non-existing IOAPICs if we have none
mp_find_ioapic() prints errors like:

    ERROR: Unable to locate IOAPIC for GSI 13

if it can't find the IOAPIC that manages that specific GSI. I
see errors like that at every boot of a laptop that apparently
doesn't have any IOAPICs.

But if there are no IOAPICs it doesn't seem to be an error that
none can be found. A solution that gets rid of this message is
to directly return if nr_ioapics (still) is zero. (But keep
returning -1 in that case, so nothing breaks from this change.)

The call chain that generates this error is:

pnpacpi_allocated_resource()
    case ACPI_RESOURCE_TYPE_IRQ:
        pnpacpi_parse_allocated_irqresource()
            acpi_get_override_irq()
                 mp_find_ioapic()

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-15 04:15:04 +01:00
Borislav Petkov
9e81509efc x86, amd: Initialize variable properly
Commit d518573de6 ("x86, amd: Normalize compute unit IDs on
multi-node processors") introduced compute unit normalization
but causes a compiler warning:

 arch/x86/kernel/cpu/amd.c: In function 'amd_detect_cmp':
 arch/x86/kernel/cpu/amd.c:268: warning: 'cores_per_cu' may be used uninitialized in this function
 arch/x86/kernel/cpu/amd.c:268: note: 'cores_per_cu' was declared here

The compiler is right - initialize it with a proper value.

Also, fix up a comment while at it.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20110214171451.GB10076@kryptos.osrc.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-15 03:03:19 +01:00
Feng Tang
6b617e224d x86/platform: Add a wallclock_init func to x86_init.timers ops
Some wall clock devices use MMIO based HW register, this new
function will give them a chance to do some initialization work
before their get/set_time service get called, which is usually
in early kernel boot phase.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-02-14 18:20:43 +01:00
Ingo Molnar
b366801c95 Merge commit 'v2.6.38-rc4' into x86/numa
Merge reason: Merge latest fixes before applying new patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 13:28:31 +01:00
Yinghai Lu
e5fea868e6 x86: Fix and clean up generic_processor_info()
One of the error printouts in generic_processor_info() prints out
the APIC version instead of the cpu index the warning text describes.

Move version validation down, after we get the right cpu index.

-v2: add comments about reason why we can have cpu=0 there.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D5240A9.4080703@kernel.org>
[ Cleaned up and made the BIOS bug printouts more consistent ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 13:23:51 +01:00
Ingo Molnar
91e04ec058 Merge commit 'v2.6.38-rc4' into x86/cpu
Merge reason: pick up the latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 13:18:56 +01:00
Kamal Mostafa
9a6d44b9ad x86: Emit "mem=nopentium ignored" warning when not supported
Emit warning when "mem=nopentium" is specified on any arch other
than x86_32 (the only that arch supports it).

Signed-off-by: Kamal Mostafa <kamal@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/553464
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
LKML-Reference: <1296783486-23033-2-git-send-email-kamal@canonical.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
2011-02-14 13:15:43 +01:00
Kamal Mostafa
77eed821ac x86: Fix panic when handling "mem={invalid}" param
Avoid removing all of memory and panicing when "mem={invalid}"
is specified, e.g. mem=blahblah, mem=0, or mem=nopentium (on
platforms other than x86_32).

Signed-off-by: Kamal Mostafa <kamal@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/553464
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: <stable@kernel.org> # .3x: as far back as it applies
LKML-Reference: <1296783486-23033-1-git-send-email-kamal@canonical.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 13:15:43 +01:00
Shaohua Li
3a09fb4570 x86: Allocate 32 tlb_invalidate_interrupt handler stubs
Add up to 32 invalidate_interrupt handlers. How many handlers are
added depends on NUM_INVALIDATE_TLB_VECTORS. So if
NUM_INVALIDATE_TLB_VECTORS is smaller than 32, we reduce code
size.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <1295232725.1949.708.camel@sli10-conroe>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 13:03:08 +01:00
Borislav Petkov
1c9d16e359 x86: Fix mwait_usable section mismatch
We use it in non __cpuinit code now too so drop marker.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20110211171754.GA21047@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 12:08:28 +01:00
Ingo Molnar
d2137d5af4 Merge branch 'linus' into x86/bootmem
Conflicts:
	arch/x86/mm/numa_64.c

Merge reason: fix the conflict, update to latest -rc and pick up this
              dependent fix from Yinghai:

  e6d2e2b2b1: memblock: don't adjust size in memblock_find_base()

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 11:55:18 +01:00
Thomas Gleixner
5117348dea x86: Readd missing irq_to_desc() in fixup_irq()
commit a3c08e5d(x86: Convert irq_chip access to new functions)
accidentally zapped desc = irq_to_desc(irq); in the vector loop.
So we lock some random irq descriptor.

Add it back.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org> # .37
2011-02-12 11:56:22 +01:00
Peter Zijlstra
d91309f69b x86: Fix text_poke_smp_batch() deadlock
Fix this deadlock - we are already holding the mutex:

=======================================================
[ INFO: possible circular locking dependency detected ] 2.6.38-rc4-test+ #1
-------------------------------------------------------
bash/1850 is trying to acquire lock:
 (text_mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f

but task is already holding lock:
 (smp_alt){+.+...}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (smp_alt){+.+...}:
       [<ffffffff81082d02>] lock_acquire+0xcd/0xf8
       [<ffffffff8192e119>] __mutex_lock_common+0x4c/0x339
       [<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43
       [<ffffffff8101050f>] alternatives_smp_switch+0x77/0x1d8
       [<ffffffff81926a6f>] do_boot_cpu+0xd7/0x762
       [<ffffffff819277dd>] native_cpu_up+0xe6/0x16a
       [<ffffffff81928e28>] _cpu_up+0x9d/0xee
       [<ffffffff81928f4c>] cpu_up+0xd3/0xe7
       [<ffffffff82268d4b>] kernel_init+0xe8/0x20a
       [<ffffffff8100ba24>] kernel_thread_helper+0x4/0x10

-> #1 (cpu_hotplug.lock){+.+.+.}:
       [<ffffffff81082d02>] lock_acquire+0xcd/0xf8
       [<ffffffff8192e119>] __mutex_lock_common+0x4c/0x339
       [<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43
       [<ffffffff810568cc>] get_online_cpus+0x41/0x55
       [<ffffffff810a1348>] stop_machine+0x1e/0x3e
       [<ffffffff819314c1>] text_poke_smp_batch+0x3a/0x3c
       [<ffffffff81932b6c>] arch_optimize_kprobes+0x10d/0x11c
       [<ffffffff81933a51>] kprobe_optimizer+0x152/0x222
       [<ffffffff8106bb71>] process_one_work+0x1d3/0x335
       [<ffffffff8106cfae>] worker_thread+0x104/0x1a4
       [<ffffffff810707c4>] kthread+0x9d/0xa5
       [<ffffffff8100ba24>] kernel_thread_helper+0x4/0x10

-> #0 (text_mutex){+.+.+.}:

other info that might help us debug this:

6 locks held by bash/1850:
 #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
 #1:  (s_active#75){.+.+.+}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
 #2:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
 #3:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
 #4:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
 #5:  (smp_alt){+.+...}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f

stack backtrace:
Pid: 1850, comm: bash Not tainted 2.6.38-rc4-test+ #1
Call Trace:

 [<ffffffff81080eb2>] print_circular_bug+0xa8/0xb7
 [<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43
 [<ffffffff81010302>] alternatives_smp_unlock+0x3d/0x93
 [<ffffffff81010630>] alternatives_smp_switch+0x198/0x1d8
 [<ffffffff8102568a>] native_cpu_die+0x65/0x95
 [<ffffffff818cc4ec>] _cpu_down+0x13e/0x202
 [<ffffffff8117a619>] sysfs_write_file+0x108/0x144
 [<ffffffff8111f5a2>] vfs_write+0xac/0xff
 [<ffffffff8111f7a9>] sys_write+0x4a/0x6e

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: mathieu.desnoyers@efficios.com
Cc: rusty@rustcorp.com.au
Cc: ananth@in.ibm.com
Cc: masami.hiramatsu.pt@hitachi.com
Cc: fweisbec@gmail.com
Cc: jbeulich@novell.com
Cc: jbaron@redhat.com
Cc: mhiramat@redhat.com
LKML-Reference: <1297458466.5226.93.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-12 02:34:34 +01:00
Ingo Molnar
3e86858133 Merge commit 'v2.6.38-rc4' into perf/core
Merge reason: pick up the latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-12 02:24:25 +01:00
Paul Bolle
45e8234cad x86: Fix printk typo WARING
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-02-11 15:17:24 +01:00
Jan Beulich
691269f0d9 x86: Adjust section placement in AMD northbridge related code
amd_nb_misc_ids[] can live in .rodata, and enable_pci_io_ecs()
can be moved into .cpuinit.text.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <4D525DDD0200007800030F07@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-10 13:32:52 +01:00
Jan Beulich
b82fef82d5 x86: Partly unify asm-offsets_{32,64}.c
Just consolidating the common parts. Full unification would seem
straight forward, but it's not clear the necessary #ifdef-s would
be acceptable.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4D525D520200007800030EE9@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-10 13:31:37 +01:00
Jan Beulich
94d1ac8b55 x86: Reduce back the alignment of the per-CPU data section
This complements commit:

  47f19a0814: percpu: Remove the multi-page alignment facility

reverting one leftover of:

  fe8e0c25ca: x86, 32-bit: Align percpu area and irq stacks to THREAD_SIZE

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4D525CE60200007800030EE5@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
2011-02-10 13:31:36 +01:00
Jan Beulich
2fb270f321 x86: Fix section mismatch in LAPIC initialization
Additionally doing things conditionally upon smp_processor_id()
being zero is generally a bad idea, as this means CPU 0 cannot
be offlined and brought back online later again.

While there may be other places where this is done, I think adding
more of those should be avoided so that some day SMP can really
become "symmetrical".

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
LKML-Reference: <4D525C7E0200007800030EE1@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-10 13:26:53 +01:00
Borislav Petkov
44d60c0f5c x86, microcode, AMD: Extend ucode size verification
The different families have a different max size for the ucode patch,
adjust size checking to the family we're running on. Also, do not
vzalloc the max size of the ucode but only the actual size that is
passed on from the firmware loader.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-02-10 12:24:03 +01:00
Borislav Petkov
258721ef34 x86, microcode, AMD: Cleanup dmesg output
Unify pr_* to use pr_fmt, shorten messages, correct type formatting.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Andreas Herrmann <Andreas.Herrmann3@amd.com>
2011-02-09 16:05:36 +01:00
Borislav Petkov
05ff02e4c0 x86, microcode, AMD: Remove unneeded memset call
collect_cpu_info_amd() clears its csig arg but this is done in the
microcode_core's collect_cpu_info() by clearing the embedding struct
ucode_cpu_info. Drop it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Andreas Herrmann <Andreas.Herrmann3@amd.com>
2011-02-09 16:05:35 +01:00
Borislav Petkov
7cc27349cb x86, microcode, AMD: Simplify get_next_ucode
Do not copy the section header but look at it directly through the
pointer. Also, make it return a ptr to a ucode header directly
thus dropping a bunch of unneeded casts. Finally, simplify
generic_load_microcode(), while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Andreas Herrmann <Andreas.Herrmann3@amd.com>
2011-02-09 16:05:34 +01:00
Borislav Petkov
10de52d665 x86, microcode, AMD: Simplify install_equiv_cpu_table
There's no need to memcpy the ucode header in order to look at it only
in this function - use the original buffer instead. Also, fix return
type semantics by returning a negative value on error and a positive
otherwise.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Andreas Herrmann <Andreas.Herrmann3@amd.com>
2011-02-09 16:05:33 +01:00
Borislav Petkov
ffc7e8ac82 x86, microcode, AMD: Release firmware on error
When the ucode magic is wrong, for whatever reason, we don't release the
loaded firmware binary and its related resources. Make sure we do. Also,
fix function naming to fit this driver's convention and shorten variable
names.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Andreas Herrmann <Andreas.Herrmann3@amd.com>
2011-02-09 16:05:32 +01:00
Borislav Petkov
6c53cbfced x86, microcode: Correct sysdev_add error path
When we encounter an error while initting the microcode driver on a CPU,
we must undo the previously added sysfs group.

Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Andreas Herrmann <Andreas.Herrmann3@amd.com>
2011-02-09 16:05:31 +01:00
Hans Rosenfeld
cabb5bd7ff x86, amd: Support L3 Cache Partitioning on AMD family 0x15 CPUs
L3 Cache Partitioning allows selecting which of the 4 L3 subcaches can be used
for evictions by the L2 cache of each compute unit. By writing a 4-bit
hexadecimal mask into the the sysfs file
/sys/devices/system/cpu/cpuX/cache/index3/subcaches, the user can set the
enabled subcaches for a CPU.

The settings are directly read from and written to the hardware, so there is no
way to have contradicting settings for two CPUs belonging to the same compute
unit. Writing will always overwrite any previous setting for a compute unit.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: <Andreas.Herrmann3@amd.com>
LKML-Reference: <1297098639-431383-1-git-send-email-hans.rosenfeld@amd.com>
[ -v3: minor style fixes ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-07 19:16:22 +01:00
H. Peter Anvin
d344e38b2c x86, nx: Mark the ACPI resume trampoline code as +x
We reserve lowmem for the things that need it, like the ACPI
wakeup code, way early to guarantee availability.  This happens
before we set up the proper pagetables, so set_memory_x() has no
effect.

Until we have a better solution, use an initcall to mark the
wakeup code executable.

Originally-by: Matthieu Castet <castet.matthieu@free.fr>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Matthias Hopf <mhopf@suse.de>
Cc: rjw@sisk.pl
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <4D4F8019.2090104@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-07 09:07:13 +01:00
Ingo Molnar
c7f9a6f377 Merge branch 'linus' into perf/core
Merge reason: Pick up perf fixes that are now upstream

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-07 08:44:26 +01:00
Linus Torvalds
07675f484b Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32: Make sure the stack is set up before we use it
  x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms
  x86, nx: Don't force pages RW when setting NX bits
2011-02-06 12:03:10 -08:00
H. Peter Anvin
11d4c3f9b6 x86-32: Make sure the stack is set up before we use it
Since checkin ebba638ae7 we call
verify_cpu even in 32-bit mode.  Unfortunately, calling a function
means using the stack, and the stack pointer was not initialized in
the 32-bit setup code!  This code initializes the stack pointer, and
simplifies the interface slightly since it is easier to rely on just a
pointer value rather than a descriptor; we need to have different
values for the segment register anyway.

This retains start_stack as a virtual address, even though a physical
address would be more convenient for 32 bits; the 64-bit code wants
the other way around...

Reported-by: Matthieu Castet <castet.matthieu@free.fr>
LKML-Reference: <4D41E86D.8060205@free.fr>
Tested-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-02-04 22:27:28 -08:00
Linus Torvalds
eb487ab4d5 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix reading in perf_event_read()
  watchdog: Don't change watchdog state on read of sysctl
  watchdog: Fix sysctl consistency
  watchdog: Fix broken nowatchdog logic
  perf: Fix Pentium4 raw event validation
  perf: Fix alloc_callchain_buffers()
2011-02-03 08:52:05 -08:00
Suresh Siddha
f7448548a9 x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms
Markus Kohn ran into a hard hang regression on an acer aspire
1310, when acpi is enabled. git bisect showed the following
commit as the bad one that introduced the boot regression.

	commit d0af9eed5a
	Author: Suresh Siddha <suresh.b.siddha@intel.com>
	Date:   Wed Aug 19 18:05:36 2009 -0700

	    x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init

Because of the UP configuration of that platform,
native_smp_prepare_cpus() bailed out (in smp_sanity_check())
before doing the set_mtrr_aps_delayed_init()

Further down the boot path, native_smp_cpus_done() will call the
delayed MTRR initialization for the AP's (mtrr_aps_init()) with
mtrr_aps_delayed_init not set. This resulted in the boot
processor reprogramming its MTRR's to the values seen during the
start of the OS boot. While this is not needed ideally, this
shouldn't have caused any side-effects. This is because the
reprogramming of MTRR's (set_mtrr_state() that gets called via
set_mtrr()) will check if the live register contents are
different from what is being asked to write and will do the actual
write only if they are different.

BP's mtrr state is read during the start of the OS boot and
typically nothing would have changed when we ask to reprogram it
on BP again because of the above scenario on an UP platform. So
on a normal UP platform no reprogramming of BP MTRR MSR's
happens and all is well.

However, on this platform, bios seems to be modifying the fixed
mtrr range registers between the start of OS boot and when we
double check the live registers for reprogramming BP MTRR
registers. And as the live registers are modified, we end up
reprogramming the MTRR's to the state seen during the start of
the OS boot.

During ACPI initialization, something in the bios (probably smi
handler?) don't like this fact and results in a hard lockup.

We didn't see this boot hang issue on this platform before the
commit d0af9eed5a, because only
the AP's (if any) will program its MTRR's to the value that BP
had at the start of the OS boot.

Fix this issue by checking mtrr_aps_delayed_init before
continuing further in the mtrr_aps_init(). Now, only AP's (if
any) will program its MTRR's to the BP values during boot.

Addresses https://bugzilla.novell.com/show_bug.cgi?id=623393

  [ By the way, this behavior of the bios modifying MTRR's after the start
    of the OS boot is not common and the kernel is not prepared to
    handle this situation well. Irrespective of this issue, during
    suspend/resume, linux kernel will try to reprogram the BP's MTRR values
    to the values seen during the start of the OS boot. So suspend/resume might
    be already broken on this platform for all linux kernel versions. ]

Reported-and-bisected-by: Markus Kohn <jabber@gmx.org>
Tested-by: Markus Kohn <jabber@gmx.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Thomas Renninger <trenn@novell.com>
Cc: Rafael Wysocki <rjw@novell.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: stable@kernel.org # [v2.6.32+]
LKML-Reference: <1296694975.4418.402.camel@sbsiddha-MOBL3.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-03 12:10:38 +01:00
Richard Cochran
ce26efdefa x86: Add clock_adjtime for x86
This patch adds the clock_adjtime system call to the x86 architecture.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Acked-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20110201134419.968905083@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-02-02 15:28:19 +01:00
Ingo Molnar
8104a4775a Merge commit 'v2.6.38-rc3' into perf/core
Merge reason: Pick up latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-02 07:10:06 +01:00
Tejun Heo
4e62445b90 x86: Fix build failure on X86_UP_APIC
Commit 4c321ff8 (x86: Replace cpu_2_logical_apicid[] with early
percpu variable) and following changes introduced and used
x86_cpu_to_logical_apicid percpu variable.  It was declared and
defined inside CONFIG_SMP && CONFIG_X86_32 but if
CONFIG_X86_UP_APIC is set UP configuration makes use of it and
build fails.

Fix it by declaring and defining it inside CONFIG_X86_LOCAL_APIC
&& CONFIG_X86_32.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <20110128162248.GA25746@htj.dyndns.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 17:24:49 +01:00
Tejun Heo
8db78cc4b4 x86: Unify NUMA initialization between 32 and 64bit
Now that everything else is unified, NUMA initialization can be
unified too.

* numa_init_array() and init_cpu_to_node() are moved from
  numa_64 to numa.

* numa_32::initmem_init() is updated to call numa_init_array()
  and setup_arch() to call init_cpu_to_node() on 32bit too.

* x86_cpu_to_node_map is now initialized to NUMA_NO_NODE on
  32bit too. This is safe now as numa_init_array() will initialize
  it early during boot.

This makes NUMA mapping fully initialized before
setup_per_cpu_areas() on 32bit too and thus makes the first
percpu chunk which contains all the static variables and some of
dynamic area allocated with NUMA affinity correctly considered.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-17-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
2011-01-28 14:54:10 +01:00
Tejun Heo
de2d9445f1 x86: Unify node_to_cpumask_map handling between 32 and 64bit
x86_32 has been managing node_to_cpumask_map explicitly from
map_cpu_to_node() and friends in a rather ugly way.  With
previous changes, it's now possible to share the code with
64bit.

* When CONFIG_NUMA_EMU is disabled, numa_add/remove_cpu() are
  implemented in numa.c and shared by 32 and 64bit.  CONFIG_NUMA_EMU
  versions still live in numa_64.c.

  NUMA_EMU's dependency on 64bit is planned to be removed and the
  above should go away together.

* identify_cpu() now calls numa_add_cpu() for 32bit too.  This
  makes the explicit mask management from map_cpu_to_node() unnecessary.

* The whole x86_32 specific map_cpu_to_node() chunk is no longer
  necessary.  Dropped.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-16-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: David Rientjes <rientjes@google.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
2011-01-28 14:54:10 +01:00
Tejun Heo
645a79195f x86: Unify CPU -> NUMA node mapping between 32 and 64bit
Unlike 64bit, 32bit has been using its own cpu_to_node_map[] for
CPU -> NUMA node mapping.  Replace it with early_percpu variable
x86_cpu_to_node_map and share the mapping code with 64bit.

* USE_PERCPU_NUMA_NODE_ID is now enabled for 32bit too.

* x86_cpu_to_node_map and numa_set/clear_node() are moved from
  numa_64 to numa.  For now, on 32bit, x86_cpu_to_node_map is initialized
  with 0 instead of NUMA_NO_NODE.  This is to avoid introducing unexpected
  behavior change and will be updated once init path is unified.

* srat_detect_node() is now enabled for x86_32 too.  It calls
  numa_set_node() and initializes the mapping making explicit
  cpu_to_node_map[] updates from map/unmap_cpu_to_node() unnecessary.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-15-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: David Rientjes <rientjes@google.com>
2011-01-28 14:54:09 +01:00
Tejun Heo
bbc9e2f452 x86: Unify cpu/apicid <-> NUMA node mapping between 32 and 64bit
The mapping between cpu/apicid and node is done via
apicid_to_node[] on 64bit and apicid_2_node[] +
apic->x86_32_numa_cpu_node() on 32bit. This difference makes it
difficult to further unify 32 and 64bit NUMA handling.

This patch unifies it by replacing both apicid_to_node[] and
apicid_2_node[] with __apicid_to_node[] array, which is accessed
by two accessors - set_apicid_to_node() and numa_cpu_node().  On
64bit, numa_cpu_node() always consults __apicid_to_node[]
directly while 32bit goes through apic->numa_cpu_node() method
to allow apic implementations to override it.

srat_detect_node() for amd cpus contains workaround for broken
NUMA configuration which assumes relationship between APIC ID,
HT node ID and NUMA topology.  Leave it to access
__apicid_to_node[] directly as mapping through CPU might result
in undesirable behavior change.  The comment is reformatted and
updated to note the ugliness.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-14-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: David Rientjes <rientjes@google.com>
2011-01-28 14:54:09 +01:00
Tejun Heo
89e5dc218e x86: Replace apic->apicid_to_node() with ->x86_32_numa_cpu_node()
apic->apicid_to_node() is 32bit specific apic operation which
determines NUMA node for a CPU.  Depending on the APIC
implementation, it can be easier to determine NUMA node from
either physical or logical apicid.  Currently,
->apicid_to_node() takes @logical_apicid and calls
hard_smp_processor_id() if the physical apicid is needed.

This prevents NUMA mapping from being queried from a different
CPU, which in turn makes it impossible to initialize NUMA
mapping before SMP bringup.

This patch replaces apic->apicid_to_node() with
->x86_32_numa_cpu_node() which takes @cpu, from which both
logical and physical apicids can easily be determined.  While at
it, drop duplicate implementations from bigsmp_32 and summit_32,
and use the default one.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-13-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:08 +01:00
Tejun Heo
df04cf011b x86: Implement x86_32_early_logical_apicid() for numaq_32
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-12-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:08 +01:00
Tejun Heo
3b39d93784 x86: Implement x86_32_early_logical_apicid() for summit_32
Factor out logical apic id calculation from
summit_init_apic_ldr() and use it for the
x86_32_early_logical_apicid() callback.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-11-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:07 +01:00
Tejun Heo
12bf24a47c x86: Implement x86_32_early_logical_apicid() for bigsmp_32
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-10-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:07 +01:00
Tejun Heo
3f6f679888 x86: Implement the default x86_32_early_logical_apicid()
Implement x86_32_early_logical_apicid() for the default apic
flat routing.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-9-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:07 +01:00
Tejun Heo
acb8bc09c6 x86: Add apic->x86_32_early_logical_apicid()
On x86_32, the mapping between cpu and logical apic ID differs
depending on the specific apic implementation in use.  The
mapping is initialized while bringing up CPUs; however, this
makes early inits ignore memory topology.

Add a x86_32 specific apic->x86_32_early_logical_apicid() which
is called early during boot to query the mapping.  The mapping
is later verified against the result of init_apic_ldr().  The
method is allowed to return BAD_APICID if it can't be determined
early.

noop variant which always returns BAD_APICID is implemented and
added to all x86_32 apic implementations.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-8-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:06 +01:00
Tejun Heo
7632611f53 x86: Kill apic->cpu_to_logical_apicid()
After the previous patch, apic->cpu_to_logical_apicid() is no
longer used.  Kill it.

For apic types with custom cpu_to_logical_apicid() which is also
used for other purposes, remove the function and modify its
users to do the mapping directly.

#ifdef's on CONFIG_SMP in es7000_32 and summit_32 are ignored
during conversion as they are not used for UP kernels.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-7-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:06 +01:00
Tejun Heo
6f802c4bfa x86: Always use x86_cpu_to_logical_apicid for cpu -> logical apic id
Currently, cpu -> logical apic id translation is done by
apic->cpu_to_logical_apicid() callback which may or may not use
x86_cpu_to_logical_apicid.  This is unnecessary as it should
always equal logical_smp_processor_id() which is known early
during CPU bring up.

Initialize x86_cpu_to_logical_apicid after apic->init_apic_ldr()
in setup_local_APIC() and always use x86_cpu_to_logical_apicid
for cpu -> logical apic id mapping.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-6-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:05 +01:00
Tejun Heo
4c321ff8a0 x86: Replace cpu_2_logical_apicid[] with early percpu variable
Unlike x86_64, on x86_32, the mapping from cpu to logical apicid
may vary depending on apic in use.  cpu_2_logical_apicid[] array
is used for this mapping.  Replace it with early percpu variable
x86_cpu_to_logical_apicid to make it better aligned with other
mappings.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-5-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:05 +01:00
Tejun Heo
1245e1668c x86: Make default_send_IPI_mask_sequence/allbutself_logical() 32bit only
Both functions are used only in 32bit.  Put them inside
CONFIG_X86_32. This is to prepare for logical apicid handling
update.

- Cyrill Gorcunov spotted that I forgot to move declarations in
ipi.h   under CONFIG_X86_32.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: brgerst@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-4-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:05 +01:00
Tejun Heo
b78aa66b1f x86: Drop x86_32 MAX_APICID
Commit 56d91f13 (x86, acpi: Add MAX_LOCAL_APIC for 32bit) added
MAX_LOCAL_APIC for x86_32 but didn't replace MAX_APICID users
with it. Convert MAX_APICID users to MAX_LOCAL_APIC and drop
MAX_APICID.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-3-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:04 +01:00
Tejun Heo
bd22a2f198 x86: Kill unused static boot_cpu_logical_apicid in smpboot.c
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-2-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-28 14:54:04 +01:00
Yinghai Lu
dda9911696 x86, perf: Change two init functions to static
init_hw_perf_events() is called via early_initcall now.
x86_pmu_event_init is x86_pmu member function.

So we can change them to static.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
LKML-Reference: <4D3A16F9.109@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-27 19:23:07 +01:00
Stephane Eranian
d038b12c6d perf: Fix Pentium4 raw event validation
This patch fixes some issues with raw event validation on
Pentium 4 (Netburst) based processors.

As I was testing libpfm4 Netburst support, I ran into two
problems in the p4_validate_raw_event() function:

   - the shared field must be checked ONLY when HT is on
   - the binding to ESCR register was missing

The second item was causing raw events to not be encoded
correctly compared to generic PMU events.

With this patch, I can now pass Netburst events to libpfm4
examples and get meaningful results:

  $ task -e global_power_events🏃u  noploop 1
  noploop for 1 seconds
  3,206,304,898 global_power_events:running

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: peterz@infradead.org
Cc: paulus@samba.org
Cc: davem@davemloft.net
Cc: fweisbec@gmail.com
Cc: perfmon2-devel@lists.sf.net
Cc: eranian@gmail.com
Cc: robert.richter@amd.com
Cc: acme@redhat.com
Cc: gorcunov@gmail.com
Cc: ming.m.lin@intel.com
LKML-Reference: <4d3efb2f.1252d80a.1a80.ffffc83f@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-27 19:21:53 +01:00
Yinghai Lu
792363d2be x86: Don't copy per_cpu cpuinfo for BSP two times
smp_store_cpu_info(0) will do that.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
LKML-Reference: <4D3A16F2.5090902@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-26 08:44:50 +01:00
Yinghai Lu
b3d7336db5 x86: Move llc_shared_map out of cpu_info
cpu_info is already with per_cpu, We can take llc_shared_map out
of cpu_info, and declare it as per_cpu variable directly.

So later referencing could be simple and directly instead of
diving to find cpu_info at first.

Also could make smp_store_cpu_info() much simple to avoid to do
save and restore trick.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Alok N Kataria <akataria@vmware.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Hans J. Koch <hjk@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <4D3A16E8.5020608@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-26 08:44:49 +01:00
Hans Rosenfeld
41b2610c34 x86, amd: Extend AMD northbridge caching code to support "Link Control" devices
"Link Control" devices (NB function 4) will be used by L3 cache
partitioning on family 0x15.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: <andreas.herrmann3@amd.com>
LKML-Reference: <1295881543-572552-4-git-send-email-hans.rosenfeld@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-26 08:28:23 +01:00
Hans Rosenfeld
b453de02b7 x86, amd: Enable L3 cache index disable on family 0x15
AMD family 0x15 CPUs support L3 cache index disable, so enable
it on them.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: <andreas.herrmann3@amd.com>
LKML-Reference: <1295881543-572552-3-git-send-email-hans.rosenfeld@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-26 08:28:23 +01:00
Andreas Herrmann
d518573de6 x86, amd: Normalize compute unit IDs on multi-node processors
On multi-node CPUs we don't need the socket wide compute unit ID
but the node-wide compute unit ID. Thus we need to normalize the
value. This is similar to what we do with cpu_core_id.

A compute unit is then identified by physical_package_id,
node_id, and compute_unit_id.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <1295881543-572552-2-git-send-email-hans.rosenfeld@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-26 08:28:22 +01:00
Fenghua Yu
9599ec0471 x86-64, mem: Convert memmove() to assembly file and fix return value bug
memmove_64.c only implements memmove() function which is completely written in
inline assembly code. Therefore it doesn't make sense to keep the assembly code
in .c file.

Currently memmove() doesn't store return value to rax. This may cause issue if
caller uses the return value. The patch fixes this issue.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <1295314755-6625-1-git-send-email-fenghua.yu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-01-25 16:58:39 -08:00
Tejun Heo
19df0c2fef percpu: align percpu readmostly subsection to cacheline
Currently percpu readmostly subsection may share cachelines with other
percpu subsections which may result in unnecessary cacheline bounce
and performance degradation.

This patch adds @cacheline parameter to PERCPU() and PERCPU_VADDR()
linker macros, makes each arch linker scripts specify its cacheline
size and use it to align percpu subsections.

This is based on Shaohua's x86 only patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Shaohua Li <shaohua.li@intel.com>
2011-01-25 14:26:50 +01:00
Jesper Juhl
2e5aa6824d x86-64: Don't use pointer to out-of-scope variable in dump_trace()
In arch/x86/kernel/dumpstack_64.c::dump_trace() we have this code:

...
  		if (!stack) {
  			unsigned long dummy;
  			stack = &dummy;
  			if (task && task != current)
  				stack = (unsigned long *)task->thread.sp;
  		}

  		bp = stack_frame(task, regs);
  		/*
  		 * Print function call entries in all stacks, starting at the
  		 * current stack address. If the stacks consist of nested
  		 * exceptions
  		 */
  		tinfo = task_thread_info(task);

  		for (;;) {
  			char *id;
  			unsigned long *estack_end;
  			estack_end = in_exception_stack(cpu, (unsigned long)stack,
  							&used, &id);
...

You'll notice that we assign to 'stack' the address of the variable
'dummy' which is only in-scope inside the 'if (!stack)'. So when we later
access stack (at the end of the above, and assuming we did not take the
'if (task && task != current)' branch) we'll be using the address of a
variable that is no longer in scope. I believe this patch is the proper
fix, but I freely admit that I'm not 100% certain.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
LKML-Reference: <alpine.LNX.2.00.1101242232590.10252@swampdragon.chaosbits.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-01-24 13:46:15 -08:00
Borislav Petkov
93789b32db x86, hotplug: Fix powersavings with offlined cores on AMD
ea53069231 made a CPU use monitor/mwait
when offline. This is not the optimal choice for AMD wrt to powersavings
and we'd prefer our cores to halt (i.e. enter C1) instead. For this, the
same selection whether to use monitor/mwait has to be used as when we
select the idle routine for the machine.

With this patch, offlining cores 1-5 on a X6 machine allows core0 to
boost again.

[ hpa: putting this in urgent since it is a (power) regression fix ]

Reported-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: stable@kernel.org # 37.x
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.hl>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1295534572-10730-1-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-01-21 18:14:54 -08:00
Fenghua Yu
f21bbec9ff x86, mcheck, therm_throt.c: Export symbol platform_thermal_notify to allow coretemp to handler intr
In therm_throt.c, commit
9e76a97efd patch doesn't export
the symbol platform_thermal_notify.

Other drivers (e.g. drivers/hwmon/coretemp.c) can not find the
symbol platform_thermal_notify when defining threshould
interrupt handler.

Please apply this patch to allow threshold interrupt handler in
coretemp.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: R Durgadoss <durgadoss.r@intel.com>
Cc: khali@linux-fr.org <khali@linux-fr.org>
Cc: lm-sensors@lm-sensors.org <lm-sensors@lm-sensors.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
LKML-Reference: <20110121041239.GB26954@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-21 14:11:12 +01:00
Dave Jones
fb87ec382f x86: Update CPU cache attributes table descriptors
Update to latest definitions in:

   http://www.intel.com/Assets/PDF/appnote/241618.pdf

[ Note, this update of the doc has removed some old values which
  we have listed.  I think until we have clarification that they
  were never used in production, they should be left there. ]

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
LKML-Reference: <20110120012055.GA15985@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-20 12:13:20 +01:00
Ingo Molnar
6b35eb9ddc Revert "x86: Make relocatable kernel work with new binutils"
This reverts commit 86b1e8dd83 ("x86: Make relocatable kernel work with
new binutils").

Markus Trippelsdorf reported a boot failure caused by this patch.

The real solution to the original patch will likely involve an
arch-generic solution to define an overlaid jiffies_64 and jiffies
variables.

Until that's done and tested on all architectures revert this commit to
solve the regression.

Reported-and-bisected-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <4D36A759.60704@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-19 10:09:42 +01:00
Linus Torvalds
404cbbd52f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Clear irqstack thread_info
  x86: Make relocatable kernel work with new binutils
2011-01-18 14:29:21 -08:00
Brian Gerst
7b698ea377 x86: Clear irqstack thread_info
Mathias Merz reported that v2.6.37 failed to boot on his
system.

Make sure that the thread_info part of the irqstack is
initialized to zeroes.

Reported-and-Tested-by: Matthias Merz <linux@merz-ka.de>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <AANLkTimyKXfJ1x8tgwrr1hYnNLrPfgE1NTe4z7L6tUDm@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-18 14:58:37 +01:00
Shaohua Li
86b1e8dd83 x86: Make relocatable kernel work with new binutils
The CONFIG_RELOCATABLE=y option is broken with new binutils, which will make
boot panic.

According to Lu Hongjiu, the affected binutils are from 2.20.51.0.12 to
2.21.51.0.3, which are release since Oct 22 this year. At least ubuntu 10.10 is
using such binutils. See:

    http://sourceware.org/bugzilla/show_bug.cgi?id=12327

The reason of the boot panic is that we have 'jiffies = jiffies_64;' in
vmlinux.lds.S. The jiffies isn't in any section. In kernel build, there is
warning saying jiffies is an absolute address and can't be relocatable. At
runtime, jiffies will have virtual address 0.

Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Cc: Lu Hongjiu<hongjiu.lu@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <1295312269.1949.725.camel@sli10-conroe>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-18 09:05:33 +01:00
Linus Torvalds
f9ee7f60d6 Merge branches 'core-fixes-for-linus', 'x86-fixes-for-linus', 'timers-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: avoid pointless blocked-task warnings
  rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status
  rtmutex: Fix comment about why new_owner can be NULL in wake_futex_pi()

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, olpc: Add missing Kconfig dependencies
  x86, mrst: Set correct APB timer IRQ affinity for secondary cpu
  x86: tsc: Fix calibration refinement conditionals to avoid divide by zero
  x86, ia64, acpi: Clean up x86-ism in drivers/acpi/numa.c

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  timekeeping: Make local variables static
  time: Rename misnamed minsec argument of clocks_calc_mult_shift()

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Remove syscall_exit_fields
  tracing: Only process module tracepoints once
  perf record: Add "nodelay" mode, disabled by default
  perf sched: Fix list of events, dropping unsupported ':r' modifier
  Revert "perf tools: Emit clearer message for sys_perf_event_open ENOENT return"
  perf top: Fix annotate segv
  perf evsel: Fix order of event list deletion
2011-01-15 12:45:00 -08:00
Jacob Pan
6550904ddb x86, mrst: Set correct APB timer IRQ affinity for secondary cpu
Offlining the secondary CPU causes the timer irq affinity to be set to
CPU 0. When the secondary CPU is back online again, the wrong irq
affinity will be used.

This patch ensures secondary per CPU timer always has the correct
IRQ affinity when enabled.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1294963604-18111-1-git-send-email-jacob.jun.pan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org> 2.6.37
2011-01-14 11:53:44 -08:00
John Stultz
62627bec8a x86: tsc: Fix calibration refinement conditionals to avoid divide by zero
Konrad Wilk reported that the new delayed calibration crashes with a
divide by zero on Xen. The reason is that Xen sets the pmtimer
address, but reading from it returns 0xffffff. That results in the
ref_start and ref_stop value being the same, so the delta is zero
which causes the divide by zero later in the calculation.

The conditional (!hpet && !ref_start && !ref_stop) which sanity checks
the calibration reference values doesn't really make sense. If the
refs are null, but hpet is on, we still want to break out.

The div by zero would be possible to trigger by chance if both reads
from the hardware provided the exact same value (due to hardware
wrapping).

So checking if both the ref values are the same should handle if we
don't have hardware (both null) or if they are the same value (either by
invalid hardware, or by chance), avoiding the div by zero issue.

[ tglx: Applied the same fix to native_calibrate_tsc() where this
  	check was copied from ]

Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1295024788-15619-1-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-01-14 18:28:01 +01:00
Linus Torvalds
52cfd503ad Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (59 commits)
  ACPI / PM: Fix build problems for !CONFIG_ACPI related to NVS rework
  ACPI: fix resource check message
  ACPI / Battery: Update information on info notification and resume
  ACPI: Drop device flag wake_capable
  ACPI: Always check if _PRW is present before trying to evaluate it
  ACPI / PM: Check status of power resources under mutexes
  ACPI / PM: Rename acpi_power_off_device()
  ACPI / PM: Drop acpi_power_nocheck
  ACPI / PM: Drop acpi_bus_get_power()
  Platform / x86: Make fujitsu_laptop use acpi_bus_update_power()
  ACPI / Fan: Rework the handling of power resources
  ACPI / PM: Register power resource devices as soon as they are needed
  ACPI / PM: Register acpi_power_driver early
  ACPI / PM: Add function for updating device power state consistently
  ACPI / PM: Add function for device power state initialization
  ACPI / PM: Introduce __acpi_bus_get_power()
  ACPI / PM: Introduce function for refcounting device power resources
  ACPI / PM: Add functions for manipulating lists of power resources
  ACPI / PM: Prevent acpi_power_get_inferred_state() from making changes
  ACPICA: Update version to 20101209
  ...
2011-01-13 20:15:35 -08:00
Linus Torvalds
dc8e7e3ec6 Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
  cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer
  intel_idle: open broadcast clock event
  cpuidle: CPUIDLE_FLAG_CHECK_BM is omap3_idle specific
  cpuidle: CPUIDLE_FLAG_TLB_FLUSHED is specific to intel_idle
  cpuidle: delete unused CPUIDLE_FLAG_SHALLOW, BALANCED, DEEP definitions
  SH, cpuidle: delete use of NOP CPUIDLE_FLAGS_SHALLOW
  cpuidle: delete NOP CPUIDLE_FLAG_POLL
  ACPI: processor_idle: delete use of NOP CPUIDLE_FLAGs
  cpuidle: Rename X86 specific idle poll state[0] from C0 to POLL
  ACPI, intel_idle: Cleanup idle= internal variables
  cpuidle: Make cpuidle_enable_device() call poll_idle_init()
  intel_idle: update Sandy Bridge core C-state residency targets
2011-01-13 20:15:18 -08:00
Andrea Arcangeli
bae9c19bf1 thp: split_huge_page_mm/vma
split_huge_page_pmd compat code.  Each one of those would need to be
expanded to hundred of lines of complex code without a fully reliable
split_huge_page_pmd design.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:41 -08:00
Andrea Arcangeli
8ac1f8320a thp: pte alloc trans splitting
pte alloc routines must wait for split_huge_page if the pmd is not present
and not null (i.e.  pmd_trans_splitting).  The additional branches are
optimized away at compile time by pmd_trans_splitting if the config option
is off.  However we must pass the vma down in order to know the anon_vma
lock to wait for.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:40 -08:00
Andrea Arcangeli
331127f799 thp: add pmd paravirt ops
Paravirt ops pmd_update/pmd_update_defer/pmd_set_at.  Not all might be
necessary (vmware needs pmd_update, Xen needs set_pmd_at, nobody needs
pmd_update_defer), but this is to keep full simmetry with pte paravirt
ops, which looks cleaner and simpler from a common code POV.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:39 -08:00
David Rientjes
d0a21265df mm: unify module_alloc code for vmalloc
Four architectures (arm, mips, sparc, x86) use __vmalloc_area() for
module_init().  Much of the code is duplicated and can be generalized in a
globally accessible function, __vmalloc_node_range().

__vmalloc_node() now calls into __vmalloc_node_range() with a range of
[VMALLOC_START, VMALLOC_END) for functionally equivalent behavior.

Each architecture may then use __vmalloc_node_range() directly to remove
the duplication of code.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:34 -08:00
Linus Torvalds
e691d24e9c Merge branch 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, olpc: Speed up device tree creation during boot
  x86, olpc: Add OLPC device-tree support
  x86, of: Define irq functions to allow drivers/of/* to build on x86
2011-01-13 10:15:12 -08:00
Linus Torvalds
55065bc527 Merge branch 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits)
  KVM: Initialize fpu state in preemptible context
  KVM: VMX: when entering real mode align segment base to 16 bytes
  KVM: MMU: handle 'map_writable' in set_spte() function
  KVM: MMU: audit: allow audit more guests at the same time
  KVM: Fetch guest cr3 from hardware on demand
  KVM: Replace reads of vcpu->arch.cr3 by an accessor
  KVM: MMU: only write protect mappings at pagetable level
  KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear()
  KVM: MMU: Initialize base_role for tdp mmus
  KVM: VMX: Optimize atomic EFER load
  KVM: VMX: Add definitions for more vm entry/exit control bits
  KVM: SVM: copy instruction bytes from VMCB
  KVM: SVM: implement enhanced INVLPG intercept
  KVM: SVM: enhance mov DR intercept handler
  KVM: SVM: enhance MOV CR intercept handler
  KVM: SVM: add new SVM feature bit names
  KVM: cleanup emulate_instruction
  KVM: move complete_insn_gp() into x86.c
  KVM: x86: fix CR8 handling
  KVM guest: Fix kvm clock initialization when it's configured out
  ...
2011-01-13 10:14:24 -08:00
Linus Torvalds
008d23e485 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
  Documentation/trace/events.txt: Remove obsolete sched_signal_send.
  writeback: fix global_dirty_limits comment runtime -> real-time
  ppc: fix comment typo singal -> signal
  drivers: fix comment typo diable -> disable.
  m68k: fix comment typo diable -> disable.
  wireless: comment typo fix diable -> disable.
  media: comment typo fix diable -> disable.
  remove doc for obsolete dynamic-printk kernel-parameter
  remove extraneous 'is' from Documentation/iostats.txt
  Fix spelling milisec -> ms in snd_ps3 module parameter description
  Fix spelling mistakes in comments
  Revert conflicting V4L changes
  i7core_edac: fix typos in comments
  mm/rmap.c: fix comment
  sound, ca0106: Fix assignment to 'channel'.
  hrtimer: fix a typo in comment
  init/Kconfig: fix typo
  anon_inodes: fix wrong function name in comment
  fix comment typos concerning "consistent"
  poll: fix a typo in comment
  ...

Fix up trivial conflicts in:
 - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
 - fs/ext4/ext4.h

Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
2011-01-13 10:05:56 -08:00
Stephen Hemminger
3e5c12409c set_rtc_mmss: show warning message only once
Occasionally the system gets into a state where the CMOS clock has gotten
slightly ahead of current time and the periodic update of RTC fails.  The
message is a nuisance and repeats spamming the log.

  See: http://www.ntp.org/ntpfaq/NTP-s-trbl-spec.htm#Q-LINUX-SET-RTC-MMSS

Rather than just removing the message, make it show only once and reduce
severity since it indicates a normal and non urgent condition.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:07 -08:00
Len Brown
43952886f0 Merge branch 'cpuidle-perf-events' into idle-test 2011-01-12 18:06:19 -05:00
Len Brown
56dbed129d Merge branch 'linus' into idle-test 2011-01-12 18:06:06 -05:00
Thomas Renninger
f77cfe4ea2 cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer
Currently intel_idle and acpi_idle driver show double cpu_idle "exit idle"
events -> this patch fixes it and makes cpu_idle events throwing less complex.

It also introduces cpu_idle events for all architectures which use
the cpuidle subsystem, namely:
  - arch/arm/mach-at91/cpuidle.c
  - arch/arm/mach-davinci/cpuidle.c
  - arch/arm/mach-kirkwood/cpuidle.c
  - arch/arm/mach-omap2/cpuidle34xx.c
  - arch/drivers/acpi/processor_idle.c (for all cases, not only mwait)
  - arch/x86/kernel/process.c (did throw events before, but was a mess)
  - drivers/idle/intel_idle.c (did throw events before)

Convention should be:
Fire cpu_idle events inside the current pm_idle function (not somewhere
down the the callee tree) to keep things easy.

Current possible pm_idle functions in X86:
c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle
-> this is really easy is now.

This affects userspace:
The type field of the cpu_idle power event can now direclty get
mapped to:
/sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
instead of throwing very CPU/mwait specific values.
This change is not visible for the intel_idle driver.
For the acpi_idle driver it should only be visible if the vendor
misses out C-states in his BIOS.
Another (perf timechart) patch reads out cpuidle info of cpu_idle
events from:
/sys/.../cpuidle/stateX/*, then the cpuidle events are mapped
to the correct C-/cpuidle state again, even if e.g. vendors miss
out C-states in their BIOS and for example only export C1 and C3.
-> everything is fine.

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Robert Schoene <robert.schoene@tu-dresden.de>
CC: Jean Pihet <j-pihet@ti.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: linux-pm@lists.linux-foundation.org
CC: linux-acpi@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-perf-users@vger.kernel.org
CC: linux-omap@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12 18:05:16 -05:00
Thomas Renninger
d18960494f ACPI, intel_idle: Cleanup idle= internal variables
Having four variables for the same thing:
  idle_halt, idle_nomwait, force_mwait and boot_option_idle_overrides
is rather confusing and unnecessary complex.

if idle= boot param is passed, only set up one variable:
boot_option_idle_overrides

Introduces following functional changes/fixes:
  - intel_idle driver does not register if any idle=xy
    boot param is passed.
  - processor_idle.c will also not register a cpuidle driver
    and get active if idle=halt is passed.
    Before a cpuidle driver with one (C1, halt) state got registered
    Now the default_idle function will be used which finally uses
    the same idle call to enter sleep state (safe_halt()), but
    without registering a whole cpuidle driver.

That means idle= param will always avoid cpuidle drivers to register
with one exception (same behavior as before):
idle=nomwait
may still register acpi_idle cpuidle driver, but C1 will not use
mwait, but hlt. This can be a workaround for IO based deeper sleep
states where C1 mwait causes problems.

Signed-off-by: Thomas Renninger <trenn@suse.de>
cc: x86@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12 12:47:30 -05:00
Avi Kivity
e5c3014282 KVM: Initialize fpu state in preemptible context
init_fpu() (which is indirectly called by the fpu switching code) assumes
it is in process context.  Rather than makeing init_fpu() use an atomic
allocation, which can cause a task to be killed, make sure the fpu is
already initialized when we enter the run loop.

KVM-Stable-Tag.
Reported-and-tested-by: Kirill A. Shutemov <kas@openvz.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 12:02:26 +02:00
Len Brown
03b6e6e58d Merge branch 'apei' into release 2011-01-12 05:02:22 -05:00
Avi Kivity
a63512a4d7 KVM guest: Fix kvm clock initialization when it's configured out
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:56 +02:00
Gleb Natapov
6adba52742 KVM: Let host know whether the guest can handle async PF in non-userspace context.
If guest can detect that it runs in non-preemptable context it can
handle async PFs at any time, so let host know that it can send async
PF even if guest cpu is not in userspace.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:23:21 +02:00
Gleb Natapov
6c047cd982 KVM paravirt: Handle async PF in non preemptable context
If async page fault is received by idle task or when preemp_count is
not zero guest cannot reschedule, so do sti; hlt and wait for page to be
ready. vcpu can still process interrupts while it waits for the page to
be ready.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:23:19 +02:00
Gleb Natapov
631bc48782 KVM: Handle async PF in a guest.
When async PF capability is detected hook up special page fault handler
that will handle async page fault events and bypass other page faults to
regular page fault handler. Also add async PF handling to nested SVM
emulation. Async PF always generates exit to L1 where vcpu thread will
be scheduled out until page is available.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:23:16 +02:00
Gleb Natapov
fd10cde929 KVM paravirt: Add async PF initialization to PV guest.
Enable async PF in a guest if async PF capability is discovered.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:23:14 +02:00
Gleb Natapov
ca3f10172e KVM paravirt: Move kvm_smp_prepare_boot_cpu() from kvmclock.c to kvm.c.
Async PF also needs to hook into smp_prepare_boot_cpu so move the hook
into generic code.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:23:10 +02:00
Huang Ying
81e88fdc43 ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support
Generic Hardware Error Source provides a way to report platform
hardware errors (such as that from chipset). It works in so called
"Firmware First" mode, that is, hardware errors are reported to
firmware firstly, then reported to Linux by firmware. This way, some
non-standard hardware error registers or non-standard hardware link
can be checked by firmware to produce more valuable hardware error
information for Linux.

This patch adds POLL/IRQ/NMI notification types support.

Because the memory area used to transfer hardware error information
from BIOS to Linux can be determined only in NMI, IRQ or timer
handler, but general ioremap can not be used in atomic context, so a
special version of atomic ioremap is implemented for that.

Known issue:

- Error information can not be printed for recoverable errors notified
  via NMI, because printk is not NMI-safe. Will fix this via delay
  printing to IRQ context via irq_work or make printk NMI-safe.

v2:

- adjust printk format per comments.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12 03:06:19 -05:00
Linus Torvalds
16ee8db6a9 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix Moorestown VRTC fixmap placement
  x86/gpio: Implement x86 gpio_to_irq convert function
  x86, UV: Fix APICID shift for Westmere processors
  x86: Use PCI method for enabling AMD extended config space before MSR method
  x86: tsc: Prevent delayed init if initial tsc calibration failed
  x86, lapic-timer: Increase the max_delta to 31 bits
  x86: Fix sparse non-ANSI function warnings in smpboot.c
  x86, numa: Fix CONFIG_DEBUG_PER_CPU_MAPS without NUMA emulation
  x86, AMD, PCI: Add AMD northbridge PCI device id for CPU families 12h and 14h
  x86, numa: Fix cpu to node mapping for sparse node ids
  x86, numa: Fake node-to-cpumask for NUMA emulation
  x86, numa: Fake apicid and pxm mappings for NUMA emulation
  x86, numa: Avoid compiling NUMA emulation functions without CONFIG_NUMA_EMU
  x86, numa: Reduce minimum fake node size to 32M

Fix up trivial conflict in arch/x86/kernel/apic/x2apic_uv_x.c
2011-01-11 11:11:46 -08:00
Linus Torvalds
42776163e1 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (28 commits)
  perf session: Fix infinite loop in __perf_session__process_events
  perf evsel: Support perf_evsel__open(cpus > 1 && threads > 1)
  perf sched: Use PTHREAD_STACK_MIN to avoid pthread_attr_setstacksize() fail
  perf tools: Emit clearer message for sys_perf_event_open ENOENT return
  perf stat: better error message for unsupported events
  perf sched: Fix allocation result check
  perf, x86: P4 PMU - Fix unflagged overflows handling
  dynamic debug: Fix build issue with older gcc
  tracing: Fix TRACE_EVENT power tracepoint creation
  tracing: Fix preempt count leak
  tracepoint: Add __rcu annotation
  tracing: remove duplicate null-pointer check in skb tracepoint
  tracing/trivial: Add missing comma in TRACE_EVENT comment
  tracing: Include module.h in define_trace.h
  x86: Save rbp in pt_regs on irq entry
  x86, dumpstack: Fix unused variable warning
  x86, NMI: Clean-up default_do_nmi()
  x86, NMI: Allow NMI reason io port (0x61) to be processed on any CPU
  x86, NMI: Remove DIE_NMI_IPI
  x86, NMI: Add priorities to handlers
  ...
2011-01-11 11:02:13 -08:00
Jack Steiner
990a32d1e5 x86, UV: Fix APICID shift for Westmere processors
Westmere processors use a different algorithm for
assigning APICIDs on SGI UV systems. The location of the
node number within the apicid is now a function of the
processor type.

Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20110110195210.GA18737@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-11 12:44:45 +01:00
Jan Beulich
24d9b70b8c x86: Use PCI method for enabling AMD extended config space before MSR method
While both methods should work equivalently well for the native
case, the Xen Dom0 case can't reliably work with the MSR one,
since there's no guarantee that the virtual CPUs it has
available fully cover all necessary physical ones.

As per the suggestion of Robert Richter the patch only adds the
PCI method, but leaves the MSR one as a fallback to cover new
systems the PCI IDs of which may not have got added to the code
base yet.

The only change in v2 is the breaking out of the new CPI
initialization method into a separate function, as requested by
Ingo.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Robert Richter <robert.richter@amd.com>
Cc: Andreas Herrmann3 <Andreas.Herrmann3@amd.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
LKML-Reference: <4D2B3FD7020000780002B67D@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-11 12:43:41 +01:00
Thomas Gleixner
29fe359ca2 x86: tsc: Prevent delayed init if initial tsc calibration failed
commit a8760ec (x86: Check tsc available/disabled in the delayed init
function) missed to prevent the setup of the delayed init function in
case the initial tsc calibration failed. This results in the same
divide by zero bug as we have seen without the tsc disabled check.

Skip the delayed work setup when tsc_khz (the initial calibration
value) is 0.

Bisected-and-tested-by: Kirill A. Shutemov <kas@openvz.org>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-01-11 11:48:39 +01:00
Linus Torvalds
8c8ae4e8cd Merge branch 'stable/generic' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/generic' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: HVM X2APIC support
  apic: Move hypervisor detection of x2apic to hypervisor.h
2011-01-10 08:48:29 -08:00
Pierre Tardy
4aed89d6b5 x86, lapic-timer: Increase the max_delta to 31 bits
Latest atom socs(penwell) does not have hpet timer.

As their local APIC timer is clocked at 400KHZ, and the current
code limit their Initial Counter register to 23 bits, they
cannot sleep more than 1.34 seconds which leads to ~2 spurious
wakeup per second (1 per thread)

These SOCs support 32bit timer so we change the max_delta to at
least 31bits. So we can at least sleep for 300 seconds.

We could not find any previous chip errata where lapic would
only have 23 bit precision As powertop is suggesting to activate
HPET to "sleep longer", this could mean this problem is already
known.

Problem is here since very first implementation of lapic timer
as a clock event e9e2cdb [PATCH] clockevents: i386 drivers.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Pierre Tardy <pierre.tardy@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Adrian Bunk <bunk@stusta.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andi Kleen <ak@suse.de>
LKML-Reference: <1294327409-19426-1-git-send-email-pierre.tardy@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-10 09:36:49 +01:00
Ingo Molnar
30285c6f03 Merge branch 'x86/apic-cleanups' into x86/urgent
Merge reason: Topic is ready for upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-10 09:34:50 +01:00
Ingo Molnar
4385428a47 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent 2011-01-09 10:42:21 +01:00
Cyrill Gorcunov
047a3772fe perf, x86: P4 PMU - Fix unflagged overflows handling
Don found that P4 PMU reads CCCR register instead of counter
itself (in attempt to catch unflagged event) this makes P4
NMI handler to consume all NMIs it observes. So the other
NMI users such as kgdb simply have no chance to get NMI
on their hands.

Side note: at moment there is no way to run nmi-watchdog
together with perf tool. This is because both 'perf top' and
nmi-watchdog use same event. So while nmi-watchdog reserves
one event/counter for own needs there is no room for perf tool
left (there is a way to disable nmi-watchdog on boot of course).

Ming has tested this patch with the following results

 | 1. watchdog disabled
 |
 | kgdb tests on boot OK
 | perf works OK
 |
 | 2. watchdog enabled, without patch perf-x86-p4-nmi-4
 |
 | kgdb tests on boot hang
 |
 | 3. watchdog enabled, without patch perf-x86-p4-nmi-4 and do not run kgdb
 | tests on boot
 |
 | "perf top" partialy works
 |   cpu-cycles            no
 |   instructions          yes
 |   cache-references      no
 |   cache-misses          no
 |   branch-instructions   no
 |   branch-misses         yes
 |   bus-cycles            no
 |
 | 4. watchdog enabled, with patch perf-x86-p4-nmi-4 applied
 |
 | kgdb tests on boot OK
 | perf does not work, NMI "Dazed and confused" messages show up
 |

Which means we still have problems with p4 box due to 'unknown'
nmi happens but at least it should fix kgdb test cases.

Reported-by: Jason Wessel <jason.wessel@windriver.com>
Reported-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Lin Ming <ming.m.lin@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <4D275E7E.3040903@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-09 10:40:52 +01:00
Randy Dunlap
91d88ce22b x86: Fix sparse non-ANSI function warnings in smpboot.c
Fix sparse warning for non-ANSI function declaration:

  arch/x86/kernel/smpboot.c💯30: warning: non-ANSI function declaration of function 'cpu_hotplug_driver_lock'
  arch/x86/kernel/smpboot.c:105:32: warning: non-ANSI function declaration of function 'cpu_hotplug_driver_unlock'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <20110108195914.95d366ea.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-09 10:15:19 +01:00
Linus Torvalds
72eb6a7914 Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
  gameport: use this_cpu_read instead of lookup
  x86: udelay: Use this_cpu_read to avoid address calculation
  x86: Use this_cpu_inc_return for nmi counter
  x86: Replace uses of current_cpu_data with this_cpu ops
  x86: Use this_cpu_ops to optimize code
  vmstat: User per cpu atomics to avoid interrupt disable / enable
  irq_work: Use per cpu atomics instead of regular atomics
  cpuops: Use cmpxchg for xchg to avoid lock semantics
  x86: this_cpu_cmpxchg and this_cpu_xchg operations
  percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
  percpu,x86: relocate this_cpu_add_return() and friends
  connector: Use this_cpu operations
  xen: Use this_cpu_inc_return
  taskstats: Use this_cpu_ops
  random: Use this_cpu_inc_return
  fs: Use this_cpu_inc_return in buffer.c
  highmem: Use this_cpu_xx_return() operations
  vmstat: Use this_cpu_inc_return for vm statistics
  x86: Support for this_cpu_add, sub, dec, inc_return
  percpu: Generic support for this_cpu_add, sub, dec, inc_return
  ...

Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c}
as per Tejun.
2011-01-07 17:02:58 -08:00
Frederic Weisbecker
625dbc3b8a x86: Save rbp in pt_regs on irq entry
From the x86_64 low level interrupt handlers, the frame pointer is
saved right after the partial pt_regs frame.

rbp is not supposed to be part of the irq partial saved registers,
but it only requires to extend the pt_regs frame by 8 bytes to
do so, plus a tiny stack offset fixup on irq exit.

This changes a bit the semantics or get_irq_entry() that is supposed
to provide only the value of caller saved registers and the cpu
saved frame. However it's a win for unwinders that can walk through
stack frames on top of get_irq_regs() snapshots.

A noticeable impact is that it makes perf events cpu-clock and
task-clock events based callchains working on x86_64.

Let's then save rbp into the irq pt_regs.

As a result with:

	perf record -e cpu-clock perf bench sched messaging
	perf report --stdio

Before:
    20.94%             perf  [kernel.kallsyms]        [k] lock_acquire
                       |
                       --- lock_acquire
                          |
                          |--44.01%-- __write_nocancel
                          |
                          |--43.18%-- __read
                          |
                          |--6.08%-- fork
                          |          create_worker
                          |
                          |--0.88%-- _dl_fixup
                          |
                          |--0.65%-- do_lookup_x
                          |
                          |--0.53%-- __GI___libc_read
                           --4.67%-- [...]

After:
    19.23%         perf  [kernel.kallsyms]    [k] __lock_acquire
                   |
                   --- __lock_acquire
                      |
                      |--97.74%-- lock_acquire
                      |          |
                      |          |--21.82%-- _raw_spin_lock
                      |          |          |
                      |          |          |--37.26%-- unix_stream_recvmsg
                      |          |          |          sock_aio_read
                      |          |          |          do_sync_read
                      |          |          |          vfs_read
                      |          |          |          sys_read
                      |          |          |          system_call
                      |          |          |          __read
                      |          |          |
                      |          |          |--24.09%-- unix_stream_sendmsg
                      |          |          |          sock_aio_write
                      |          |          |          do_sync_write
                      |          |          |          vfs_write
                      |          |          |          sys_write
                      |          |          |          system_call
                      |          |          |          __write_nocancel

v2: Fix cfi annotations.

Reported-by: Soeren Sandmann Pedersen <sandmann@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Jan Beulich <JBeulich@novell.com>
2011-01-07 17:40:56 +01:00
Rakib Mullick
39a6eebda2 x86, dumpstack: Fix unused variable warning
In dump_stack function, bp isn't used anymore, which is introduced by
commit 9c0729dc80. This patch removes bp
completely.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: Soeren Sandmann <sandmann@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <AANLkTik9U_Z0WSZ7YjrykER_pBUfPDdgUUmtYx=R74nL@mail.gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2011-01-07 16:59:49 +01:00
Sheng Yang
2904ed8dd5 apic: Move hypervisor detection of x2apic to hypervisor.h
Then we can reuse it for Xen later.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Avi Kivity <avi@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-07 10:03:49 -05:00
Don Zickus
f2fd43954a x86, NMI: Clean-up default_do_nmi()
Just re-arrange the code a bit to make it easier to follow what is
going on.  Basically un-negating the if-statement and swapping the code
inside the if-statement with code outside.

No functional changes.

Originally-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-7-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07 15:08:53 +01:00
Don Zickus
ab846f13f6 x86, NMI: Allow NMI reason io port (0x61) to be processed on any CPU
In original NMI handler, NMI reason io port (0x61) is only processed
on BSP.  This makes it impossible to hot-remove BSP.  To solve the
issue, a raw spinlock is used to allow the port to be processed on any
CPU.

Originally-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-6-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07 15:08:53 +01:00
Don Zickus
c410b83077 x86, NMI: Remove DIE_NMI_IPI
With priorities in place and no one really understanding the difference between
DIE_NMI and DIE_NMI_IPI, just remove DIE_NMI_IPI and convert everyone to DIE_NMI.

This also simplifies default_do_nmi() a little bit.  Instead of calling the
die_notifier in both the if and else part, just pull it out and call it before
the if-statement.  This has the side benefit of avoiding a call to the ioport
to see if there is an external NMI sitting around until after the (more frequent)
internal NMIs are dealt with.

Patch-Inspired-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-5-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07 15:08:53 +01:00
Don Zickus
166d751479 x86, NMI: Add priorities to handlers
In order to consolidate the NMI die_chain events, we need to setup the priorities
for the die notifiers.

I started by defining a bunch of common priorities that can be used by the
notifier blocks.  Then I modified the notifier blocks to use the newly created
priorities.

Now that the priorities are straightened out, it should be easier to remove the
event DIE_NMI_IPI.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-4-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07 15:08:52 +01:00
Don Zickus
673a6092ce x86: Convert some devices to use DIE_NMIUNKNOWN
They are a handful of places in the code that register a die_notifier
as a catch all in case no claims the NMI.  Unfortunately, they trigger
on events like DIE_NMI and DIE_NMI_IPI, which depending on when they
registered may collide with other handlers that have the ability to
determine if the NMI is theirs or not.

The function unknown_nmi_error() makes one last effort to walk the
die_chain when no one else has claimed the NMI before spitting out
messages that the NMI is unknown.

This is a better spot for these devices to execute any code without
colliding with the other handlers.

The two drivers modified are only compiled on x86 arches I believe, so
they shouldn't be affected by other arches that may not have
DIE_NMIUNKNOWN defined.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: openipmi-developer@lists.sourceforge.net
Cc: dann frazier <dannf@hp.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07 15:08:52 +01:00
Huang Ying
1c7b74d46f x86, NMI: Add NMI symbol constants and rename memory parity to PCI SERR
Replace the NMI related magic numbers with symbol constants.

Memory parity error is only valid for IBM PC-AT, newer machine use
bit 7 (0x80) of 0x61 port for PCI SERR. While memory error is usually
reported via MCE. So corresponding function name and kernel log string
is changed.

But on some machines, PCI SERR line is still used to report memory
errors. This is used by EDAC, so corresponding EDAC call is reserved.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07 15:08:51 +01:00
Ingo Molnar
1c2a48cf65 Merge branch 'linus' into x86/apic-cleanups
Conflicts:
	arch/x86/include/asm/io_apic.h

Merge reason: Resolve the conflict, update to a more recent -rc base

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07 14:14:15 +01:00
Rafael J. Wysocki
976513dbfc PM / ACPI: Move NVS saving and restoring code to drivers/acpi
The saving of the ACPI NVS area during hibernation and suspend and
restoring it during the subsequent resume is entirely specific to
ACPI, so move it to drivers/acpi and drop the CONFIG_SUSPEND_NVS
configuration option which is redundant.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-07 00:36:55 -05:00
Linus Torvalds
cb600d2f83 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mm: Initialize initial_page_table before paravirt jumps
2011-01-06 11:12:17 -08:00
Linus Torvalds
47935a731b Merge branches 'x86-alternatives-for-linus', 'x86-fpu-for-linus', 'x86-hwmon-for-linus', 'x86-paravirt-for-linus', 'core-locking-for-linus' and 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, suspend: Avoid unnecessary smp alternatives switch during suspend/resume

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-64, asm: Use fxsaveq/fxrestorq in more places

* 'x86-hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, hwmon: Add core threshold notification to therm_throt.c

* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, paravirt: Use native_halt on a halt, not native_safe_halt

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  locking, lockdep: Convert sprintf_symbol to %pS

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  irq: Better struct irqaction layout
2011-01-06 11:11:50 -08:00
Linus Torvalds
77a0dd54ba Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, UV, BAU: Extend for more than 16 cpus per socket
  x86, UV: Fix the effect of extra bits in the hub nodeid register
  x86, UV: Add common uv_early_read_mmr() function for reading MMRs
2011-01-06 11:09:57 -08:00
Linus Torvalds
d7a5a18190 Merge branch 'x86-tsc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-tsc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Check tsc available/disabled in the delayed init function
  x86: Improve TSC calibration using a delayed workqueue
  x86: Make tsc=reliable override boot time stability checks
2011-01-06 11:08:14 -08:00
Linus Torvalds
4f00b901d4 Merge branch 'x86-security-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-security-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  module: Move RO/NX module protection to after ftrace module update
  x86: Resume trampoline must be executable
  x86: Add RO/NX protection for loadable kernel modules
  x86: Add NX protection for kernel data
  x86: Fix improper large page preservation
2011-01-06 11:07:33 -08:00
Linus Torvalds
b4c6e2ea5e Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, earlyprintk: Move mrst early console to platform/ and fix a typo
  x86, apbt: Setup affinity for apb timers acting as per-cpu timer
  ce4100: Add errata fixes for UART on CE4100
  x86: platform: Move iris to x86/platform where it belongs
  x86, mrst: Check platform_device_register() return code
  x86/platform: Add Eurobraille/Iris power off support
  x86, mrst: Add explanation for using 1960 as the year offset for vrtc
  x86, mrst: Fix dependencies of "select INTEL_SCU_IPC"
  x86, mrst: The shutdown for MRST requires the SCU IPC mechanism
  x86: Ce4100: Add reboot_fixup() for CE4100
  ce4100: Add PCI register emulation for CE4100
  x86: Add CE4100 platform support
  x86: mrst: Set vRTC's IRQ to level trigger type
  x86: mrst: Add audio driver bindings
  rtc: Add drivers/rtc/rtc-mrst.c
  x86: mrst: Add vrtc driver which serves as a wall clock device
  x86: mrst: Add Moorestown specific reboot/shutdown support
  x86: mrst: Parse SFI timer table for all timer configs
  x86/mrst: Add SFI platform device parsing code
2011-01-06 11:06:31 -08:00
Linus Torvalds
6f46b120a9 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, microcode, AMD: Cleanup code a bit
  x86, microcode, AMD: Replace vmalloc+memset with vzalloc
2011-01-06 11:06:09 -08:00
Linus Torvalds
4e1db5e58a Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  apic, amd: Make firmware bug messages more meaningful
  mce, amd: Remove goto in threshold_create_device()
  mce, amd: Add helper functions to setup APIC
  mce, amd: Shorten local variables mci_misc_{hi,lo}
  mce, amd: Implement mce_threshold_block_init() helper function
2011-01-06 11:05:21 -08:00
Linus Torvalds
37d9a8c5ea Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix included-by file reference comments
  x86, cpu: Only CPU features determine NX capabilities
  x86, cpu: Call verify_cpu during 32bit CPU startup
  x86, cpu: Clear XD_DISABLED flag on Intel to regain NX
  x86, cpu: Rename verify_cpu_64.S to verify_cpu.S
2011-01-06 10:56:02 -08:00
Linus Torvalds
017892c341 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix APIC ID sizing bug on larger systems, clean up MAX_APICS confusion
  x86, acpi: Parse all SRAT cpu entries even above the cpu number limitation
  x86, acpi: Add MAX_LOCAL_APIC for 32bit
  x86: io_apic: Split setup_ioapic_ids_from_mpc()
  x86: io_apic: Fix CONFIG_X86_IO_APIC=n breakage
  x86: apic: Move probe_nr_irqs_gsi() into ioapic_init_mappings()
  x86: Allow platforms to force enable apic
2011-01-06 10:51:36 -08:00
Linus Torvalds
42cbd8efb0 Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, cacheinfo: Cleanup L3 cache index disable support
  x86, amd-nb: Cleanup AMD northbridge caching code
  x86, amd-nb: Complete the rename of AMD NB and related code
2011-01-06 10:50:28 -08:00
Huang Ying
74d91e3c6a x86, NMI: Add touch_nmi_watchdog to io_check_error delay
Prevent the long delay in io_check_error making NMI watchdog
timeout.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
LKML-Reference: <1294198689-15447-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-05 14:22:58 +01:00
Dongdong Deng
554ec06398 x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time
The spin_lock_debug/rcu_cpu_stall detector uses
trigger_all_cpu_backtrace() to dump cpu backtrace.
Therefore it is possible that trigger_all_cpu_backtrace()
could be called at the same time on different CPUs, which
triggers and 'unknown reason NMI' warning. The following case
illustrates the problem:

      CPU1                    CPU2                     ...   CPU N
                       trigger_all_cpu_backtrace()
                       set "backtrace_mask" to cpu mask
                               |
generate NMI interrupts  generate NMI interrupts       ...
    \                          |                               /
     \                         |                              /

The "backtrace_mask" will be cleaned by the first NMI interrupt
at nmi_watchdog_tick(), then the following NMI interrupts
generated by other cpus's arch_trigger_all_cpu_backtrace() will
be taken as unknown reason NMI interrupts.

This patch uses a test_and_set to avoid the problem, and stop
the arch_trigger_all_cpu_backtrace() from calling to avoid
dumping a double cpu backtrace info when there is already a
trigger_all_cpu_backtrace() in progress.

Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Reviewed-by: Bruce Ashfield <bruce.ashfield@windriver.com>
Cc: fweisbec@gmail.com
LKML-Reference: <1294198689-15447-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Don Zickus <dzickus@redhat.com>
2011-01-05 14:22:57 +01:00
Don Zickus
9ab181fa9f x86: Only call smp_processor_id in non-preempt cases
There are some paths that walk the die_chain with preemption on.
Make sure we are in an NMI call before we start doing anything.

This was triggered by do_general_protection calling notify_die
with DIE_GPF.

Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Don Zickus <dzickus@redhat.com>
LKML-Reference: <1294198689-15447-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-05 14:22:57 +01:00
Yinghai Lu
cb2ded37fd x86: Fix APIC ID sizing bug on larger systems, clean up MAX_APICS confusion
Found one x2apic pre-enabled system, x2apic_mode suddenly get
corrupted after register some cpus, when compiled
CONFIG_NR_CPUS=255 instead of 512.

It turns out that generic_processor_info() ==> phyid_set(apicid,
phys_cpu_present_map) causes the problem.

phys_cpu_present_map is sized by MAX_APICS bits, and pre-enabled
system some cpus have an apic id > 255.

The variable after phys_cpu_present_map may get corrupted
silently:

 ffffffff828e8420 B phys_cpu_present_map
 ffffffff828e8440 B apic_verbosity
 ffffffff828e8444 B local_apic_timer_c2_ok
 ffffffff828e8448 B disable_apic
 ffffffff828e844c B x2apic_mode
 ffffffff828e8450 B x2apic_disabled
 ffffffff828e8454 B num_processors
 ...

Actually phys_cpu_present_map is referenced via apic id, instead
index. We should use MAX_LOCAL_APIC instead MAX_APICS.

For 64-bit it will be 32768 in all cases. BSS will increase by 4k bytes
on 64-bit:

	text		data		bss		dec		filename
	21696943	4193748		12787712	38678403	vmlinux.before
	21696943	4193748		12791808	38682499	vmlinux.after

No change on 32bit.

Finally we can remove MAX_APCIS that was rather confusing.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
LKML-Reference: <4D23BD9C.3070102@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-05 14:09:23 +01:00
Yinghai Lu
f005fe12b9 x86-64: Move out cleanup higmap [_brk_end, _end) out of init_memory_mapping()
It is not related to init_memory_mapping(),  and init_memory_mapping() is
getting more bigger.

So make it as seperated function and call it from reserve_brk() and that is
point when _brk_end is concluded.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D1933E0.7090305@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-01-05 14:03:45 +01:00
Rusty Russell
d50d8fe192 x86, mm: Initialize initial_page_table before paravirt jumps
v2.6.36-rc8-54-gb40827f (x86-32, mm: Add an initial page table
for core bootstrapping) made x86 boot using initial_page_table
and broke lguest.

For 2.6.37 we simply cut & paste the initialization code into
lguest (da32dac101 "lguest: populate initial_page_table"), now
we fix it properly by doing that initialization before the
paravirt jump.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: lguest <lguest@ozlabs.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <201101041720.54535.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-04 09:53:50 +01:00
Ingo Molnar
bc030d6cb9 Merge commit 'v2.6.37-rc8' into x86/apic
Conflicts:
	arch/x86/include/asm/io_apic.h

Merge reason: move to a fresh -rc, resolve the conflict.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-04 09:43:42 +01:00
Thomas Renninger
25e41933b5 perf: Clean up power events by introducing new, more generic ones
Add these new power trace events:

 power:cpu_idle
 power:cpu_frequency
 power:machine_suspend

The old C-state/idle accounting events:
  power:power_start
  power:power_end

Have now a replacement (but we are still keeping the old
tracepoints for compatibility):

  power:cpu_idle

and
  power:power_frequency

is replaced with:
  power:cpu_frequency

power:machine_suspend is newly introduced.

Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.

the type= field got removed from both, it was never
used and the type is differed by the event type itself.

perf timechart userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: rjw@sisk.pl
LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
2011-01-04 08:16:54 +01:00
Ingo Molnar
cc22219699 Merge commit 'v2.6.37-rc8' into perf/core
Merge reason: pick up latest -rc.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-04 08:08:54 +01:00
R, Durgadoss
9e76a97efd x86, hwmon: Add core threshold notification to therm_throt.c
This patch adds code to therm_throt.c to notify core thermal threshold
events. These thresholds are supported by the IA32_THERM_INTERRUPT register.
The status/log for the same is monitored using the IA32_THERM_STATUS register.
The necessary #defines are in msr-index.h. A call back is added to mce.h, to
further notify the thermal stack, about the threshold events.

Signed-off-by: Durgadoss R <durgadoss.r@intel.com>
LKML-Reference: <D6D887BA8C9DFF48B5233887EF04654105C1251710@bgsmsx502.gar.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-01-03 08:30:30 -08:00
Tejun Heo
c1955b5f3a x86: Use this_cpu_inc_return for nmi counter
this_cpu_inc_return() saves us a memory access there.

Reviewed-by: Pekka Enberg <penberg@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2010-12-30 12:22:17 +01:00
Tejun Heo
7b543a5334 x86: Replace uses of current_cpu_data with this_cpu ops
Replace all uses of current_cpu_data with this_cpu operations on the
per cpu structure cpu_info.  The scala accesses are replaced with the
matching this_cpu ops which results in smaller and more efficient
code.

In the long run, it might be a good idea to remove cpu_data() macro
too and use per_cpu macro directly.

tj: updated description

Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2010-12-30 12:22:03 +01:00
Tejun Heo
0a3aee0da4 x86: Use this_cpu_ops to optimize code
Go through x86 code and replace __get_cpu_var and get_cpu_var
instances that refer to a scalar and are not used for address
determinations.

Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2010-12-30 12:20:28 +01:00
Yinghai Lu
1411e0ec31 x86-64, numa: Put pgtable to local node memory
Introduce init_memory_mapping_high(), and use it with 64bit.

It will go with every memory segment above 4g to create page table to the
memory range itself.

before this patch all page tables was on one node.

with this patch, one RED-PEN is killed

debug out for 8 sockets system after patch
[    0.000000] initial memory mapped : 0 - 20000000
[    0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff]
[    0.000000]  0000000000 - 007f600000 page 2M
[    0.000000]  007f600000 - 007f750000 page 4k
[    0.000000] kernel direct mapping tables up to 7f750000 @ [0x7f74c000-0x7f74ffff]
[    0.000000] RAMDISK: 7bc84000 - 7f745000
....
[    0.000000] Adding active range (0, 0x10, 0x95) 0 entries of 3200 used
[    0.000000] Adding active range (0, 0x100, 0x7f750) 1 entries of 3200 used
[    0.000000] Adding active range (0, 0x100000, 0x1080000) 2 entries of 3200 used
[    0.000000] Adding active range (1, 0x1080000, 0x2080000) 3 entries of 3200 used
[    0.000000] Adding active range (2, 0x2080000, 0x3080000) 4 entries of 3200 used
[    0.000000] Adding active range (3, 0x3080000, 0x4080000) 5 entries of 3200 used
[    0.000000] Adding active range (4, 0x4080000, 0x5080000) 6 entries of 3200 used
[    0.000000] Adding active range (5, 0x5080000, 0x6080000) 7 entries of 3200 used
[    0.000000] Adding active range (6, 0x6080000, 0x7080000) 8 entries of 3200 used
[    0.000000] Adding active range (7, 0x7080000, 0x8080000) 9 entries of 3200 used
[    0.000000] init_memory_mapping: [0x00000100000000-0x0000107fffffff]
[    0.000000]  0100000000 - 1080000000 page 2M
[    0.000000] kernel direct mapping tables up to 1080000000 @ [0x107ffbd000-0x107fffffff]
[    0.000000]     memblock_x86_reserve_range: [0x107ffc2000-0x107fffffff]          PGTABLE
[    0.000000] init_memory_mapping: [0x00001080000000-0x0000207fffffff]
[    0.000000]  1080000000 - 2080000000 page 2M
[    0.000000] kernel direct mapping tables up to 2080000000 @ [0x207ff7d000-0x207fffffff]
[    0.000000]     memblock_x86_reserve_range: [0x207ffc0000-0x207fffffff]          PGTABLE
[    0.000000] init_memory_mapping: [0x00002080000000-0x0000307fffffff]
[    0.000000]  2080000000 - 3080000000 page 2M
[    0.000000] kernel direct mapping tables up to 3080000000 @ [0x307ff3d000-0x307fffffff]
[    0.000000]     memblock_x86_reserve_range: [0x307ffc0000-0x307fffffff]          PGTABLE
[    0.000000] init_memory_mapping: [0x00003080000000-0x0000407fffffff]
[    0.000000]  3080000000 - 4080000000 page 2M
[    0.000000] kernel direct mapping tables up to 4080000000 @ [0x407fefd000-0x407fffffff]
[    0.000000]     memblock_x86_reserve_range: [0x407ffc0000-0x407fffffff]          PGTABLE
[    0.000000] init_memory_mapping: [0x00004080000000-0x0000507fffffff]
[    0.000000]  4080000000 - 5080000000 page 2M
[    0.000000] kernel direct mapping tables up to 5080000000 @ [0x507febd000-0x507fffffff]
[    0.000000]     memblock_x86_reserve_range: [0x507ffc0000-0x507fffffff]          PGTABLE
[    0.000000] init_memory_mapping: [0x00005080000000-0x0000607fffffff]
[    0.000000]  5080000000 - 6080000000 page 2M
[    0.000000] kernel direct mapping tables up to 6080000000 @ [0x607fe7d000-0x607fffffff]
[    0.000000]     memblock_x86_reserve_range: [0x607ffc0000-0x607fffffff]          PGTABLE
[    0.000000] init_memory_mapping: [0x00006080000000-0x0000707fffffff]
[    0.000000]  6080000000 - 7080000000 page 2M
[    0.000000] kernel direct mapping tables up to 7080000000 @ [0x707fe3d000-0x707fffffff]
[    0.000000]     memblock_x86_reserve_range: [0x707ffc0000-0x707fffffff]          PGTABLE
[    0.000000] init_memory_mapping: [0x00007080000000-0x0000807fffffff]
[    0.000000]  7080000000 - 8080000000 page 2M
[    0.000000] kernel direct mapping tables up to 8080000000 @ [0x807fdfc000-0x807fffffff]
[    0.000000]     memblock_x86_reserve_range: [0x807ffbf000-0x807fffffff]          PGTABLE
[    0.000000] Initmem setup node 0 [0000000000000000-000000107fffffff]
[    0.000000]   NODE_DATA [0x0000107ffbd000-0x0000107ffc1fff]
[    0.000000] Initmem setup node 1 [0000001080000000-000000207fffffff]
[    0.000000]   NODE_DATA [0x0000207ffbb000-0x0000207ffbffff]
[    0.000000] Initmem setup node 2 [0000002080000000-000000307fffffff]
[    0.000000]   NODE_DATA [0x0000307ffbb000-0x0000307ffbffff]
[    0.000000] Initmem setup node 3 [0000003080000000-000000407fffffff]
[    0.000000]   NODE_DATA [0x0000407ffbb000-0x0000407ffbffff]
[    0.000000] Initmem setup node 4 [0000004080000000-000000507fffffff]
[    0.000000]   NODE_DATA [0x0000507ffbb000-0x0000507ffbffff]
[    0.000000] Initmem setup node 5 [0000005080000000-000000607fffffff]
[    0.000000]   NODE_DATA [0x0000607ffbb000-0x0000607ffbffff]
[    0.000000] Initmem setup node 6 [0000006080000000-000000707fffffff]
[    0.000000]   NODE_DATA [0x0000707ffbb000-0x0000707ffbffff]
[    0.000000] Initmem setup node 7 [0000007080000000-000000807fffffff]
[    0.000000]   NODE_DATA [0x0000807ffba000-0x0000807ffbefff]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D1933D1.9020609@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-29 15:48:08 -08:00
Yinghai Lu
45635ab5e4 x86: Change get_max_mapped() to inline
Move it into head file. to prepare use it in other files.

[ hpa: added missing <linux/types.h> and changed type to phys_addr_t. ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D1933BA.8000508@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-29 15:47:55 -08:00
Yinghai Lu
32e3f2b00c x86-64, gart: Fix allocation with memblock
When trying to change alloc_bootmem with memblock to go with real top-down
Found one old system:
[    0.000000] Node 0: aperture @ ac000000 size 64 MB
[    0.000000] Aperture pointing to e820 RAM. Ignoring.
[    0.000000] Your BIOS doesn't leave a aperture memory hole
[    0.000000] Please enable the IOMMU option in the BIOS setup
[    0.000000] This costs you 64 MB of RAM
[    0.000000]     memblock_x86_reserve_range: [0x2020000000-0x2023ffffff]       aperture64
[    0.000000] Cannot allocate aperture memory hole (ffff882020000000,65536K)
[    0.000000]        memblock_x86_free_range: [0x2020000000-0x2023ffffff]
[    0.000000] Kernel panic - not syncing: Not enough memory for aperture
[    0.000000] Pid: 0, comm: swapper Not tainted 2.6.37-rc5-tip-yh-06229-gb792dc2-dirty #331
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff81cf50fe>] ? panic+0x91/0x1a3
[    0.000000]  [<ffffffff827c66b2>] ? gart_iommu_hole_init+0x3d7/0x4a3
[    0.000000]  [<ffffffff81d026a9>] ? _etext+0x0/0x3
[    0.000000]  [<ffffffff827ba940>] ? pci_iommu_alloc+0x47/0x71
[    0.000000]  [<ffffffff827c820b>] ? mem_init+0x19/0xec
[    0.000000]  [<ffffffff827b3c40>] ? start_kernel+0x20a/0x3e8
[    0.000000]  [<ffffffff827b32cc>] ? x86_64_start_reservations+0x9c/0xa0
[    0.000000]  [<ffffffff827b33e4>] ? x86_64_start_kernel+0x114/0x11b

it means __alloc_bootmem_nopanic() get too high for that aperture.

Use memblock_find_in_range() with limit directly.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0C0740.90104@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-29 14:46:54 -08:00
Jesper Juhl
5cdd2de0a7 x86/microcode: Fix double vfree() and remove redundant pointer checks before vfree()
In arch/x86/kernel/microcode_intel.c::generic_load_microcode()
we have  this:

	while (leftover) {
		...
		if (get_ucode_data(mc, ucode_ptr, mc_size) ||
		    microcode_sanity_check(mc) < 0) {
			vfree(mc);
			break;
		}
		...
	}

	if (mc)
		vfree(mc);

This will cause a double free of 'mc'. This patch fixes that by
just  removing the vfree() call in the loop since 'mc' will be
freed nicely just  after we break out of the loop.

There's also a second change in the patch. I noticed a lot of
checks for  pointers being NULL before passing them to vfree().
That's completely  redundant since vfree() deals gracefully with
being passed a NULL pointer.  Removing the redundant checks
yields a nice size decrease for the object  file.

Size before the patch:
   text    data     bss     dec     hex filename
   4578     240    1032    5850    16da arch/x86/kernel/microcode_intel.o
Size after the patch:
   text    data     bss     dec     hex filename
   4489     240     984    5713    1651 arch/x86/kernel/microcode_intel.o

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <alpine.LNX.2.00.1012251946100.10759@swampdragon.chaosbits.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-27 14:33:30 +01:00
Linus Torvalds
79534f237f Merge branches 'perf-fixes-for-linus' and 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf probe: Fix to support libdwfl older than 0.148
  perf tools: Fix lazy wildcard matching
  perf buildid-list: Fix error return for success
  perf buildid-cache: Fix symbolic link handling
  perf symbols: Stop using vmlinux files with no symbols
  perf probe: Fix use of kernel image path given by 'k' option

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, kexec: Limit the crashkernel address appropriately
2010-12-23 15:39:40 -08:00
Yinghai Lu
d3bd058826 x86, acpi: Parse all SRAT cpu entries even above the cpu number limitation
Recent Intel new system have different order in MADT, aka will list all thread0
at first, then all thread1.
But SRAT table still old order, it will list cpus in one socket all together.

If the user have compiled limited NR_CPUS or boot with nr_cpus=, could have missed
to put some cpus apic id to node mapping into apicid_to_node[].

for example for 4 sockets system with 64 cpus with nr_cpus=32 will get crash...

[    9.106288] Total of 32 processors activated (136190.88 BogoMIPS).
[    9.235021] divide error: 0000 [#1] SMP
[    9.235315] last sysfs file:
[    9.235481] CPU 1
[    9.235592] Modules linked in:
[    9.245398]
[    9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274      /Sun Fire x4800
[    9.265415] RIP: 0010:[<ffffffff81075a8f>]  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
...
[    9.645938] RIP  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
[    9.665356]  RSP <ffff88103f8d1c40>
[    9.665568] ---[ end trace 2296156d35fdfc87 ]---

So let just parse all cpu entries in SRAT.

Also add apicid checking with MAX_LOCAL_APIC, in case We could out of boundaries of
apicid_to_node[].

it fixes following bug too.
https://bugzilla.kernel.org/show_bug.cgi?id=22662

-v2: expand to 32bit according to hpa
   need to add MAX_LOCAL_APIC for 32bit

Reported-and-Tested-by: Wu Fengguang <fengguang.wu@intel.com>
Reported-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Tested-by: Myron Stowe <myron.stowe@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0AD486.9020704@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-23 13:16:18 -08:00
Yinghai Lu
56d91f132c x86, acpi: Add MAX_LOCAL_APIC for 32bit
We should use MAX_LOCAL_APIC for max apic ids and MAX_APICS as number
of local apics.

Also apic_version[] array should use MAX_LOCAL_APICs.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0AD464.2020408@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-23 13:15:53 -08:00
Ingo Molnar
26e20a108c Merge commit 'v2.6.37-rc7' into x86/security 2010-12-23 09:48:41 +01:00
Don Zickus
4a7863cc2e x86, nmi_watchdog: Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR
The x86 arch has shifted its use of the nmi_watchdog from a
local implementation to the global one provide by
kernel/watchdog.c.  This shift has caused a whole bunch of
compile problems under different config options.  I attempt to
simplify things with the patch below.

In order to simplify things, I had to come to terms with the
meaning of two terms ARCH_HAS_NMI_WATCHDOG and
CONFIG_HARDLOCKUP_DETECTOR.  Basically they mean the same thing,
the former on a local level and the latter on a global level.

With the old x86 nmi watchdog gone, there is no need to rely on
defining the ARCH_HAS_NMI_WATCHDOG variable because it doesn't
make sense any more.  x86 will now use the global
implementation.

The changes below do a few things.  First it changes the few
places that relied on ARCH_HAS_NMI_WATCHDOG to use
CONFIG_X86_LOCAL_APIC (the former was an alias for the latter
anyway, so nothing unusual here).  Those pieces of code were
relying more on local apic functionality the nmi watchdog
functionality, so the change should make sense.

Second, I removed the x86 implementation of
touch_nmi_watchdog().  It isn't need now, instead x86 will rely
on kernel/watchdog.c's implementation.

Third, I removed the #define ARCH_HAS_NMI_WATCHDOG itself from
x86.  And tweaked the include/linux/nmi.h file to tell users to
look for an externally defined touch_nmi_watchdog in the case of
ARCH_HAS_NMI_WATCHDOG _or_ CONFIG_HARDLOCKUP_DETECTOR. This
changes removes some of the ugliness in that file.

Finally, I added a Kconfig dependency for
CONFIG_HARDLOCKUP_DETECTOR that said you can't have
ARCH_HAS_NMI_WATCHDOG _and_ CONFIG_HARDLOCKUP_DETECTOR.  You can
only have one nmi_watchdog.

Tested with
ARCH=i386: allnoconfig, defconfig, allyesconfig, (various broken
configs) ARCH=x86_64: allnoconfig, defconfig, allyesconfig,
(various broken configs)

Hopefully, after this patch I won't get any more compile broken
emails. :-)

v3:
  changed a couple of 'linux/nmi.h' -> 'asm/nmi.h' to pick-up correct function
  prototypes when CONFIG_HARDLOCKUP_DETECTOR is not set.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1293044403-14117-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-22 22:15:32 +01:00
Jiri Kosina
4b7bd36470 Merge branch 'master' into for-next
Conflicts:
	MAINTAINERS
	arch/arm/mach-omap2/pm24xx.c
	drivers/scsi/bfa/bfa_fcpim.c

Needed to update to apply fixes for which the old branch was too
outdated.
2010-12-22 18:57:02 +01:00
Jack Steiner
d8850ba425 x86, UV: Fix the effect of extra bits in the hub nodeid register
UV systems can be partitioned into multiple independent SSIs.
Large partitioned systems may have extra bits in the node_id
register. These bits are used when the total memory on all SSIs
exceeds 16TB.  These extra bits need to be ignored when
calculating x2apic_extra_bits.

Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20101130195926.972776133@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-22 12:31:15 +01:00
Jack Steiner
e681041388 x86, UV: Add common uv_early_read_mmr() function for reading MMRs
Early in boot, reading MMRs from the UV hub controller require
calls to early_ioremap()/early_iounmap().  Rather than
duplicating code, add a common function to do the
map/read/unmap.

Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20101130195926.834804371@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-22 12:31:15 +01:00
Ingo Molnar
6c529a266b Merge commit 'v2.6.37-rc7' into perf/core
Merge reason: Pick up the latest -rc.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-22 11:53:23 +01:00
Linus Torvalds
55ec86f848 Merge branches 'x86-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32: Make sure we can map all of lowmem if we need to
  x86, vt-d: Handle previous faults after enabling fault handling
  x86: Enable the intr-remap fault handling after local APIC setup
  x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode
  x86, vt-d: Quirk for masking vtd spec errors to platform error handling logic
  x86, xsave: Use alloc_bootmem_align() instead of alloc_bootmem()
  bootmem: Add alloc_bootmem_align()
  x86, gcc-4.6: Use gcc -m options when building vdso
  x86: HPET: Chose a paranoid safe value for the ETIME check
  x86: io_apic: Avoid unused variable warning when CONFIG_GENERIC_PENDING_IRQ=n

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix off by one in perf_swevent_init()
  perf: Fix duplicate events with multiple-pmu vs software events
  ftrace: Have recordmcount honor endianness in fn_ELF_R_INFO
  scripts/tags.sh: Add magic for trace-events
  tracing: Fix panic when lseek() called on "trace" opened for writing
2010-12-19 10:44:54 -08:00
Linus Torvalds
46bdfe6a50 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  x86: avoid high BIOS area when allocating address space
  x86: avoid E820 regions when allocating address space
  x86: avoid low BIOS area when allocating address space
  resources: add arch hook for preventing allocation in reserved areas
  Revert "resources: support allocating space within a region from the top down"
  Revert "PCI: allocate bus resources from the top down"
  Revert "x86/PCI: allocate space from the end of a region, not the beginning"
  Revert "x86: allocate space within a region top-down"
  Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode"
  PCI: Update MCP55 quirk to not affect non HyperTransport variants
2010-12-18 10:13:24 -08:00
H. Peter Anvin
7f8595bfac x86, kexec: Limit the crashkernel address appropriately
Keep the crash kernel address below 512 MiB for 32 bits and 896 MiB
for 64 bits.  For 32 bits, this retains compatibility with earlier
kernel releases, and makes it work even if the vmalloc= setting is
adjusted.

For 64 bits, we should be able to increase this substantially once a
hard-coded limit in kexec-tools is fixed.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20101217195035.GE14502@redhat.com>
2010-12-17 15:04:00 -08:00
Bjorn Helgaas
a2c606d53a x86: avoid high BIOS area when allocating address space
This prevents allocation of the last 2MB before 4GB.

The experiment described here shows Windows 7 ignoring the last 1MB:
https://bugzilla.kernel.org/show_bug.cgi?id=23542#c27

This patch ignores the top 2MB instead of just 1MB because H. Peter Anvin
says "There will be ROM at the top of the 32-bit address space; it's a fact
of the architecture, and on at least older systems it was common to have a
shadow 1 MiB below."

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17 10:01:30 -08:00
Bjorn Helgaas
4dc2287c18 x86: avoid E820 regions when allocating address space
When we allocate address space, e.g., to assign it to a PCI device, don't
allocate anything mentioned in the BIOS E820 memory map.

On recent machines (2008 and newer), we assign PCI resources from the
windows described by the ACPI PCI host bridge _CRS.  On many Dell
machines, these windows overlap some E820 reserved areas, e.g.,

    BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved)
    pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]

If we put devices at 0xbff00000, they don't work, probably because
that's really RAM, not I/O memory.  This patch prevents that by removing
the 0xbfe4dc00-0xbfffffff area from the "available" resource.

I'm not very happy with this solution because Windows solves the problem
differently (it seems to ignore E820 reserved areas and it allocates
top-down instead of bottom-up; details at comment 45 of the bugzilla
below).  That means we're vulnerable to BIOS defects that Windows would not
trip over.  For example, if BIOS described a device in ACPI but didn't
mention it in E820, Windows would work fine but Linux would fail.

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17 10:01:24 -08:00
Bjorn Helgaas
30919b0bf3 x86: avoid low BIOS area when allocating address space
This implements arch_remove_reservations() so allocate_resource() can
avoid any arch-specific reserved areas.  This currently just avoids the
BIOS area (the first 1MB), but could be used for E820 reserved areas if
that turns out to be necessary.

We previously avoided this area in pcibios_align_resource().  This patch
moves the test from that PCI-specific path to a generic path, so *all*
resource allocations will avoid this area.

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17 10:01:17 -08:00
Bjorn Helgaas
5e52f1c5e8 Revert "x86: allocate space within a region top-down"
This reverts commit 1af3c2e45e.

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17 10:00:43 -08:00
Tejun Heo
275c8b9328 Merge branch 'this_cpu_ops' into for-2.6.38 2010-12-17 15:16:46 +01:00
Christoph Lameter
b76834bc1b kprobes: Use this_cpu_ops
Use this_cpu ops in various places to optimize per cpu data access.

Cc: Jason Baron <jbaron@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2010-12-17 15:07:19 +01:00
H. Peter Anvin
147dd5610c x86-32: Make sure we can map all of lowmem if we need to
A relocatable kernel can be anywhere in lowmem -- and in the case of a
kdump kernel, is likely to be fairly high.  Since the early page
tables map everything from address zero up we need to make sure we
allocate enough brk that we can map all of lowmem if we need to.

Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0AD3ED.8070607@kernel.org>
2010-12-16 19:11:09 -08:00
Peter Zijlstra
7639dae0ca perf, x86: Provide a PEBS capable cycle event
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16 11:36:44 +01:00
Peter Zijlstra
2e80a82a49 perf: Dynamic pmu types
Extend the perf_pmu_register() interface to allow for named and
dynamic pmu types.

Because we need to support the existing static types we cannot use
dynamic types for everything, hence provide a type argument.

If we want to enumerate the PMUs they need a name, provide one.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101117222056.259707703@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16 11:36:43 +01:00
Peter Zijlstra
4407204c5c perf, x86: Detect broken BIOSes that corrupt the PMU
Some BIOSes use PMU resources, which can cause various bugs:

 - Non-working or erratic PMU based statistics - the PMU can end up
   counting the wrong thing, resulting in misleading statistics

 - Profiling can stop working or it can profile the wrong thing

 - A non-working or erratic NMI watchdog that cannot be relied on

 - The kernel may disturb whatever thing the BIOS tries to use the
   PMU for - possibly causing hardware malfunction in extreme cases.

 - ... and other forms of potential misbehavior

Various forms of such misbehavior has been observed in practice - there are
BIOSes that just corrupt the PMU state, consequences be damned.

The PMU is a CPU resource that is handled by the kernel and the BIOS
stealing+corrupting it is not acceptable nor robust, so we detect it,
warn about it and further refuse to touch the PMU ourselves.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16 11:36:42 +01:00
Ingo Molnar
006b20fe4c Merge branch 'perf/urgent' into perf/core
Merge reason: We want to apply a dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16 11:22:27 +01:00
Rusty Russell
da32dac101 lguest: populate initial_page_table
Two x86 patches broke lguest:
1) v2.6.35-492-g72d7c3b, which changed x86 to use the memblock allocator.

In lguest, the host places linear page tables at the top of mem, which
used to be enough to get us up to the swapper_pg_dir page tables.  With
the first patch, the direct mapping tables used that memory:

Before: kernel direct mapping tables up to 4000000 @ 7000-1a000
After: kernel direct mapping tables up to 4000000 @ 3fed000-4000000

I initially fixed this by lying about the amount of memory we had, so
the kernel wouldn't blatt the lguest boot pagetables (yuk!), but then...

2) v2.6.36-rc8-54-gb40827f, which made x86 boot use initial_page_table.

This was initialized in a part of head_32.S which isn't executed by
lguest; it is then copied into swapper_pg_dir.  So we have to initialize
it; and anyway we switch to it before we blatt the old tables, so that
fixes the previous damage as well.

For the moment, I cut & pasted the code into lguest's boot code, but
next merge window I will merge them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: x86@kernel.org
2010-12-16 17:03:15 +10:30
Andres Salomon
4722d194e6 x86, of: Define irq functions to allow drivers/of/* to build on x86
- Define a stub irq_create_of_mapping for x86 as a stop-gap solution until
   drivers/of/irq is further along.
 - Define irq_dispose_mapping for x86 to appease of_i2c.c

These are needed to allow stuff in drivers/of/ to build on x86.  This stuff
will eventually get replaced; quoting Grant,

"The long term plan is to have the drivers/of/ code handling the mapping
intelligently like powerpc currently does."  But for now, just provide
these functions.

Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <20101111214526.5de7121b@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-15 17:11:16 -08:00
Kenji Kaneshige
7f7fbf45c6 x86: Enable the intr-remap fault handling after local APIC setup
Interrupt-remapping gets enabled very early in the boot, as it determines the
apic mode that the processor can use. And the current code enables the vt-d
fault handling before the setup_local_APIC(). And hence the APIC LDR registers
and data structure in the memory may not be initialized. So the vt-d fault
handling in logical xapic/x2apic modes were broken.

Fix this by enabling the vt-d fault handling in the end_local_APIC_setup()

A cleaner fix of enabling fault handling while enabling intr-remapping
will be addressed for v2.6.38. [ Enabling intr-remapping determines the
usage of x2apic mode and the apic mode determines the fault-handling
configuration. ]

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <20101201062244.541996375@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-13 16:53:32 -08:00
Kenji Kaneshige
086e8ced65 x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode
In x2apic mode, we need to set the upper address register of the fault
handling interrupt register of the vt-d hardware. Without this
irq migration of the vt-d fault handling interrupt is broken.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <1291225233.2648.39.camel@sbsiddha-MOBL3>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
Acked-by: Chris Wright <chrisw@sous-sol.org>
Tested-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-13 16:52:52 -08:00
Suresh Siddha
3fb82d56ad x86, suspend: Avoid unnecessary smp alternatives switch during suspend/resume
During suspend, we disable all the non boot cpus. And during resume we bring
them all back again. So no need to do alternatives_smp_switch() in between.

On my core 2 based laptop, this speeds up the suspend path by 15msec and the
resume path by 5 msec (suspend/resume speed up differences can be attributed
to the different P-states that the cpu is in during suspend/resume).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1290557500.4946.8.camel@sbsiddha-MOBL3.sc.intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-13 16:23:56 -08:00
Suresh Siddha
10340ae130 x86, xsave: Use alloc_bootmem_align() instead of alloc_bootmem()
Alignment of alloc_bootmem() depends on the value of
L1_CACHE_SHIFT. What we need here, however, is 64 byte alignment.  Use
alloc_bootmem_align() and explicitly specify the alignment instead.

This fixes a kernel boot crash reported by Jody when the cpu in .config
is set to MPENTIUMII but the kernel is booted on a xsave-capable CPU.

Reported-by: Jody Bruchon <jody@nctritech.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20101116212442.059967454@sbsiddha-MOBL3.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org>
2010-12-13 16:13:11 -08:00
Don Zickus
5f29805a4f x86, watchdog: Compile fix when CONFIG_LOCAL_APIC not enabled
When adjusting the code to handle removing the old nmi watchdog,
I forgot to consider the compile case when the local apic is not
enabled.

This change fixes the following build error:

  arch/x86/kernel/apic/hw_nmi.c:28:6: error: redefinition of ‘touch_nmi_watchdog’

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Rakib Mullick <rakib.mullick@gmail.com>
LKML-Reference: <20101213153719.GD18577@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-13 18:23:23 +01:00
Thomas Gleixner
f1c18071ad x86: HPET: Chose a paranoid safe value for the ETIME check
commit 995bd3bb5 (x86: Hpet: Avoid the comparator readback penalty)
chose 8 HPET cycles as a safe value for the ETIME check, as we had the
confirmation that the posted write to the comparator register is
delayed by two HPET clock cycles on Intel chipsets which showed
readback problems.

After that patch hit mainline we got reports from machines with newer
AMD chipsets which seem to have an even longer delay. See
http://thread.gmane.org/gmane.linux.kernel/1054283 and
http://thread.gmane.org/gmane.linux.kernel/1069458 for further
information.

Boris tried to come up with an ACPI based selection of the minimum
HPET cycles, but this failed on a couple of test machines. And of
course we did not get any useful information from the hardware folks.

For now our only option is to chose a paranoid high and safe value for
the minimum HPET cycles used by the ETIME check. Adjust the minimum ns
value for the HPET clockevent accordingly.

Reported-Bistected-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <alpine.LFD.2.00.1012131222420.2653@localhost6.localdomain6>
Cc: Simon Kirby <sim@hostway.ca>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com>
Cc: John Stultz <johnstul@us.ibm.com>
2010-12-13 13:42:44 +01:00
Thomas Gleixner
a8760eca6c x86: Check tsc available/disabled in the delayed init function
The delayed TSC init function does not check whether the system has no
TSC or TSC is disabled at the kernel command line, which results in a
crash in the work queue based extended calibration due to division by
zero because the basic calibration never happened.

Add the missing checks and do not touch TSC when not available or
disabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <johnstul@us.ibm.com>
2010-12-13 11:35:05 +01:00
Tejun Heo
0aa002fe60 x86: apic: Cleanup and simplify setup_local_APIC()
setup_local_APIC() is used to setup local APIC early during CPU
initialization and already assumes that preemption is disabled on
entry. However, The function unnecessarily disables and enables
preemption and uses smp_processor_id() multiple times in and out of
the nested preemption disabled section. This gives the wrong
impression that the function might be able to handle being called with
preemption enabled and/or migrated to another processor in the middle.

Make it clear that the function is always called with preemption
disabled, drop the confusing preemption disable block and call
smp_processor_id() once at the beginning of the function.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: brgerst@gmail.com
LKML-Reference: <4D00B3B9.7060702@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-10 13:46:26 +01:00
Don Zickus
5dc3055879 x86, NMI: Add back unknown_nmi_panic and nmi_watchdog sysctls
Originally adapted from Huang Ying's patch which moved the
unknown_nmi_panic to the traps.c file.  Because the old nmi
watchdog was deleted before this change happened, the
unknown_nmi_panic sysctl was lost.  This re-adds it.

Also, the nmi_watchdog sysctl was re-implemented and its
documentation updated accordingly.

Patch-inspired-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1291068437-5331-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-10 00:01:06 +01:00
Don Zickus
96a84c20d6 lockup detector: Compile fixes from removing the old x86 nmi watchdog
My patch that removed the old x86 nmi watchdog broke other
arches.  This change reverts a piece of that patch and puts the
change in the correct spot.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: fweisbec@gmail.com
Cc: yinghai@kernel.org
LKML-Reference: <1291068437-5331-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-10 00:01:06 +01:00
Feng Tang
0e3fa13f4e x86: Further simplify mp_irq info handling
assign_to_mp_irq() is copying the struct mpc_intsrc members one by
one. That's silly. Use memcpy() and let the compiler figure it out.
Same for the identical function assign_to_mpc_intsrc()

mp_irq_mpc_intsrc_cmp() is comparing the struct members one by one,
but no caller ever checks the different return codes. Use memcmp()
instead.

Remove the extra printk in MP_ioapic_info()

Signed-off-by: Feng Tang <feng.tang@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "Alan Cox <alan@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <20101208151857.212f0018@feng-i7>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 21:52:06 +01:00
Feng Tang
2d8009ba67 x86: Unify 3 similar ways of saving mp_irqs info
There are 3 places defining similar functions of saving IRQ vector
info into mp_irqs[] array: mmparse/acpi/mrst.

Replace the redundant code by a common function in io_apic.c as it's
only called when CONFIG_X86_IO_APIC=y

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20101207133204.4d913c5a@feng-i7>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 21:52:06 +01:00
Yinghai Lu
60d79fd99f x86, ioapic: Avoid writing io_apic id if already correct
For 32bit mptable path, setup_ids_from_mpc() always writes the io_apic
id register, even there is no change needed.

Skip the write, when readout and mptable match.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
LKML-Reference: <4CFDF785.7010401@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 21:52:05 +01:00
Yinghai Lu
0450193bff x86, x2apic: Don't map lapic addr for preenabled x2apic systems
If x2apic is preenabled and used by the kernel, we don't need to map
the lapic address. That mapping will never be used.

So just skip that in register_lapic_address()

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
LKML-Reference: <4CFDF69C.9070501@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 21:52:05 +01:00
Yinghai Lu
326a2e6bae x86, apic: Use register_lapic_address() in init_apic_mapping()
Remove the printk as well, we don't want to print when nothing
changed. We print in register_lapic_address() already.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
LKML-Reference: <4CFDF68A.7020902@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 21:52:05 +01:00
Yinghai Lu
f115714163 x86, apic: Remove early_init_lapic_mapping()
It is almost the same as smp_register_lapic_addr(). We just need to
let smp_read_mpc() call smp_register_lapic_addr() when early==1.

Add the apic_printk to smp_register_lapic_address()

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
LKML-Reference: <4CFDF681.3030509@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 21:52:04 +01:00
Yinghai Lu
c0104d38a7 x86, apic: Unify identical register_lapic_address() functions
They are the same, move the common function to apic.c to allow
further cleanups.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <4CFDF675.4060305@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 21:52:04 +01:00
Thomas Gleixner
51ddafcbc7 Merge branch 'x86/platform' into x86/apic-cleanups
Reason: apic cleanup series depends on x86/apic, x86/amd-nb and x86/platform

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 18:19:21 +01:00
Thomas Gleixner
d834a9dcec Merge branch 'x86/amd-nb' into x86/apic-cleanups
Reason: apic cleanup series depends on x86/apic, x86/amd-nb x86/platform

Conflicts:
	arch/x86/include/asm/io_apic.h

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 18:17:25 +01:00
Thomas Gleixner
4720dd1b38 x86: io_apic: Avoid unused variable warning when CONFIG_GENERIC_PENDING_IRQ=n
arch/x86/kernel/apic/io_apic.c: In function 'ack_apic_level':
arch/x86/kernel/apic/io_apic.c:2433: warning: unused variable 'desc'

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <201010272107.o9RL7rse018212@imap1.linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 17:43:21 +01:00
Rakib Mullick
2c6cb1053a x86: Address 'unused' warning in hw_nmi.c again
arch/x86/kernel/apic/hw_nmi.c:29: warning: backtrace_mask defined but not used

commit 0e2af2a9(x86, hw_nmi: Move backtrace_mask declaration under
ARCH_HAS_NMI_WATCHDOG) addressed this warning, but it was reintroduced
by commit 5f2b0ba4(x86, nmi_watchdog: Remove the old nmi_watchdog).

Move backtrace_mask into the #ifdef arch_trigger_all_cpu_backtrace
section again.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <AANLkTi=rcc38QzoKa6LFy4m++-p_9=Zt4_kDQE=GeKxf@mail.gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-09 16:06:52 +01:00
Peter Zijlstra
c079c791c5 perf, amd: Remove the nb lock
Since all the hotplug stuff is serialized by the hotplug mutex,
do away with the amd_nb_lock.

Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-08 20:16:30 +01:00
Linus Torvalds
6313e3c217 Merge branches 'x86-fixes-for-linus', 'perf-fixes-for-linus' and 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/pvclock: Zero last_value on resume

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf record: Fix eternal wait for stillborn child
  perf header: Don't assume there's no attr info if no sample ids is provided
  perf symbols: Figure out start address of kernel map from kallsyms
  perf symbols: Fix kallsyms kernel/module map splitting

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  nohz: Fix printk_needs_cpu() return value on offline cpus
  printk: Fix wake_up_klogd() vs cpu hotplug
2010-12-08 06:40:59 -08:00
Ingo Molnar
10a18d7dc0 Merge commit 'v2.6.37-rc5' into perf/core
Merge reason: Pick up the latest -rc.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-07 07:49:51 +01:00
Feng Tang
991cfffa7c x86, earlyprintk: Move mrst early console to platform/ and fix a typo
Move the code to arch/x86/platform/mrst/. Also fix a typo to use
the correct config option: ONFIG_EARLY_PRINTK_MRST

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: alan@linux.intel.com
LKML-Reference: <1291348298-21263-1-git-send-email-feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-06 20:52:04 +01:00
Masami Hiramatsu
f984ba4eb5 kprobes: Use text_poke_smp_batch for unoptimizing
Use text_poke_smp_batch() on unoptimization path for reducing
the number of stop_machine() issues. If the number of
unoptimizing probes is more than MAX_OPTIMIZE_PROBES(=256),
kprobes unoptimizes first MAX_OPTIMIZE_PROBES probes and kicks
optimizer for remaining probes.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20101203095434.2961.22657.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06 17:59:32 +01:00
Masami Hiramatsu
cd7ebe2298 kprobes: Use text_poke_smp_batch for optimizing
Use text_poke_smp_batch() in optimization path for reducing
the number of stop_machine() issues. If the number of optimizing
probes is more than MAX_OPTIMIZE_PROBES(=256), kprobes optimizes
first MAX_OPTIMIZE_PROBES probes and kicks optimizer for
remaining probes.

Changes in v5:
- Use kick_kprobe_optimizer() instead of directly calling
  schedule_delayed_work().
- Rescheduling optimizer outside of kprobe mutex lock.

Changes in v2:
- Allocate code buffer and parameters in arch_init_kprobes()
  instead of using static arraies.
- Merge previous max optimization limit patch into this patch.
  So, this patch introduces upper limit of optimization at
  once.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20101203095428.2961.8994.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06 17:59:31 +01:00
Masami Hiramatsu
7deb18dcf0 x86: Introduce text_poke_smp_batch() for batch-code modifying
Introduce text_poke_smp_batch(). This function modifies several
text areas with one stop_machine() on SMP. Because calling
stop_machine() is heavy task, it is better to aggregate
text_poke requests.

( Note: I've talked with Rusty about this interface, and
  he would not like to expand stop_machine() interface, since
  it is not for generic use. )

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference: <20101203095422.2961.51217.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06 17:59:31 +01:00
Masami Hiramatsu
6274de4984 kprobes: Support delayed unoptimizing
Unoptimization occurs when a probe is unregistered or disabled,
and is heavy because it recovers instructions by using
stop_machine(). This patch delays unoptimization operations and
unoptimize several probes at once by using
text_poke_smp_batch(). This can avoid unexpected system slowdown
coming from stop_machine().

Changes in v5:
- Split this patch into several cleanup patches and this patch.
- Fix some text_mutex lock miss.
- Use bool instead of int for behavior flags.
- Add additional comment for (un)optimizing path.

Changes in v2:
- Use dynamic allocated buffers and params.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference: <20101203095409.2961.82733.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-06 17:59:30 +01:00
Feng Tang
e4d2ebcab1 x86, apbt: Setup affinity for apb timers acting as per-cpu timer
Commit a5ef2e70 "x86: Sanitize apb timer interrupt handling" forgot
the affinity setup when cleaning up the code, this patch just
adds the forgotten part

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Alan Cox <alan@linux.intel.com>
LKML-Reference: <1291348298-21263-2-git-send-email-feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-06 15:58:26 +01:00
Sebastian Andrzej Siewior
a38c5380ef x86: io_apic: Split setup_ioapic_ids_from_mpc()
Sodaville needs to setup the IO_APIC ids as the boot loader leaves
them uninitialized. Split out the setter function so it can be called
unconditionally from the sodaville board code.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20101126165020.GA26361@www.tglx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-06 14:30:28 +01:00
John Stultz
08ec0c58fb x86: Improve TSC calibration using a delayed workqueue
Boot to boot the TSC calibration may vary by quite a large amount.

While normal variance of 50-100ppm can easily be seen, the quick
calibration code only requires 500ppm accuracy, which is the limit
of what NTP can correct for.

This can cause problems for systems being used as NTP servers, as
every time they reboot it can take hours for them to calculate the
new drift error caused by the calibration.

The classic trade-off here is calibration accuracy vs slow boot times,
as during the calibration nothing else can run.

This patch uses a delayed workqueue  to calibrate the TSC over the
period of a second. This allows very accurate calibration (in my
tests only varying by 1khz or 0.4ppm boot to boot). Additionally this
refined calibration step does not block the boot process, and only
delays the TSC clocksoure registration by a few seconds in early boot.
If the refined calibration strays 1% from the early boot calibration
value, the system will fall back to already calculated early boot
calibration.

Credit to Andi Kleen who suggested using a timer quite awhile back,
but I dismissed it thinking the timer calibration would be done after
the clocksource was registered (which would break things). Forgive
me for my short-sightedness.

This patch has worked very well in my testing, but TSC hardware is
quite varied so it would probably be good to get some extended
testing, possibly pushing inclusion out to 2.6.39.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1289003985-29060-1-git-send-email-johnstul@us.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@elte.hu>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: Clark Williams <williams@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
2010-12-02 16:48:37 -08:00
John Stultz
b0f969009f Merge remote branch 'tip/x86/tsc' into fortglx/2.6.38/tip/x86/tsc
Conflicts:
	Documentation/kernel-parameters.txt
2010-12-02 16:47:52 -08:00
Linus Torvalds
a9e40a2493 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix the software context switch counter
  perf, x86: Fixup Kconfig deps
  x86, perf, nmi: Disable perf if counters are not accessible
  perf: Fix inherit vs. context rotation bug
2010-11-28 12:25:02 -08:00
Jeremy Fitzhardinge
e7a3481c02 x86/pvclock: Zero last_value on resume
If the guest domain has been suspend/resumed or migrated, then the
system clock backing the pvclock clocksource may revert to a smaller
value (ie, can be non-monotonic across the migration/save-restore).

Make sure we zero last_value in that case so that the domain
continues to see clock updates.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-28 09:33:20 +01:00
Linus Torvalds
fbe6c4047f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  dmar, x86: Use function stubs when CONFIG_INTR_REMAP is disabled
  x86-64: Fix and clean up AMD Fam10 MMCONF enabling
  x86: UV: Address interrupt/IO port operation conflict
  x86: Use online node real index in calulate_tbl_offset()
  x86, asm: Fix binutils 2.15 build failure
2010-11-27 07:28:47 +09:00
Linus Torvalds
d2f30c73ab Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf symbols: Remove incorrect open-coded container_of()
  perf record: Handle restrictive permissions in /proc/{kallsyms,modules}
  x86/kprobes: Prevent kprobes to probe on save_args()
  irq_work: Drop cmpxchg() result
  perf: Fix owner-list vs exit
  x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG
  tracing: Fix recursive user stack trace
  perf,hw_breakpoint: Initialize hardware api earlier
  x86: Ignore trap bits on single step exceptions
  tracing: Force arch_local_irq_* notrace for paravirt
  tracing: Fix module use of trace_bprintk()
2010-11-27 07:28:17 +09:00
Peter Zijlstra
004417a6d4 perf, arch: Cleanup perf-pmu init vs lockup-detector
The perf hardware pmu got initialized at various points in the boot,
some before early_initcall() some after (notably arch_initcall).

The problem is that the NMI lockup detector is ran from early_initcall()
and expects the hardware pmu to be present.

Sanitize this by moving all architecture hardware pmu implementations to
initialize at early_initcall() and move the lockup detector to an explicit
initcall right after that.

Cc: paulus <paulus@samba.org>
Cc: davem <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1290707759.2145.119.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26 15:14:56 +01:00
Andi Kleen
5ef428c4b5 x86: Set cpu masks before calling CPU_STARTING notifiers
When booting up a CPU set the various topology masks before
calling the CPU_STARTING notifier. This way the notifier
can actually use the masks.

This is needed for a perf change.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1290077254-12165-2-git-send-email-andi@firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26 15:14:56 +01:00
Franck Bui-Huu
6c7e550f13 perf: Introduce is_sampling_event()
and use it when appropriate.

Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1290525705-6265-1-git-send-email-fbuihuu@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26 15:14:54 +01:00
Ingo Molnar
6c869e772c Merge branch 'perf/urgent' into perf/core
Conflicts:
	arch/x86/kernel/apic/hw_nmi.c

Merge reason: Resolve conflict, queue up dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26 15:07:02 +01:00
Ingo Molnar
e4e91ac410 Merge commit 'v2.6.37-rc3' into perf/core
Merge reason: Pick up latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26 15:04:47 +01:00
Don Zickus
33c6d6a7ad x86, perf, nmi: Disable perf if counters are not accessible
In a kvm virt guests, the perf counters are not emulated.  Instead they
return zero on a rdmsrl. The perf nmi handler uses the fact that crossing
a zero means the counter overflowed (for those counters that do not have
specific interrupt bits). Therefore on kvm guests, perf will swallow all
NMIs thinking the counters overflowed.

This causes problems for subsystems like kgdb which needs NMIs to do its
magic. This problem was discovered by running kgdb tests.

The solution is to write garbage into a perf counter during the
initialization and hopefully reading back the same number.  On kvm
guests, the value will be read back as zero and we disable perf as
a result.

Reported-by: Jason Wessel <jason.wessel@windriver.com>
Patch-inspired-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <1290462923-30734-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26 15:00:57 +01:00
Thomas Gleixner
9cdca86972 x86: platform: Move iris to x86/platform where it belongs
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-20 10:37:05 +01:00
Ingo Molnar
ae51ce9061 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core 2010-11-18 20:07:12 +01:00
Linus Torvalds
2d42dc3feb Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  kgdb,ppc: Fix regression in evr register handling
  kgdb,x86: fix regression in detach handling
  kdb: fix crash when KDB_BASE_CMD_MAX is exceeded
  kdb: fix memory leak in kdb_main.c
2010-11-18 08:24:58 -08:00
Hans Rosenfeld
f658bcfb26 x86, cacheinfo: Cleanup L3 cache index disable support
Adaptions to the changes of the AMD northbridge caching code: instead
of a bool in each l3 struct, use a flag in amd_northbridges.flags to
indicate L3 cache index disable support; use a pointer to the whole
northbridge instead of the misc device in the l3 struct; simplify the
initialisation; dynamically generate sysfs attribute array.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-18 15:53:06 +01:00
Hans Rosenfeld
9653a5c76c x86, amd-nb: Cleanup AMD northbridge caching code
Support more than just the "Misc Control" part of the northbridges.
Support more flags by turning "gart_supported" into a single bit flag
that is stored in a flags member. Clean up related code by using a set
of functions (amd_nb_num(), amd_nb_has_feature() and node_to_amd_nb())
instead of accessing the NB data structures directly. Reorder the
initialization code and put the GART flush words caching in a separate
function.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-18 15:53:05 +01:00
Hans Rosenfeld
eec1d4fa00 x86, amd-nb: Complete the rename of AMD NB and related code
Not only the naming of the files was confusing, it was even more so for
the function and variable names.

Renamed the K8 NB and NUMA stuff that is also used on other AMD
platforms. This also renames the CONFIG_K8_NUMA option to
CONFIG_AMD_NUMA and the related file k8topology_64.c to
amdtopology_64.c. No functional changes intended.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-18 15:53:04 +01:00
Soeren Sandmann Pedersen
9c0729dc80 x86: Eliminate bp argument from the stack tracing routines
The various stack tracing routines take a 'bp' argument in which the
caller is supposed to provide the base pointer to use, or 0 if doesn't
have one. Since bp is garbage whenever CONFIG_FRAME_POINTER is not
defined, this means all callers in principle should either always pass
0, or be conditional on CONFIG_FRAME_POINTER.

However, there are only really three use cases for stack tracing:

(a) Trace the current task, including IRQ stack if any
(b) Trace the current task, but skip IRQ stack
(c) Trace some other task

In all cases, if CONFIG_FRAME_POINTER is not defined, bp should just
be 0.  If it _is_ defined, then

- in case (a) bp should be gotten directly from the CPU's register, so
  the caller should pass NULL for regs,

- in case (b) the caller should should pass the IRQ registers to
  dump_trace(),

- in case (c) bp should be gotten from the top of the task's stack, so
  the caller should pass NULL for regs.

Hence, the bp argument is not necessary because the combination of
task and regs is sufficient to determine an appropriate value for bp.

This patch introduces a new inline function stack_frame(task, regs)
that computes the desired bp. This function is then called from the
two versions of dump_stack().

Signed-off-by: Soren Sandmann <ssp@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arjan van de Ven <arjan@infradead.org>,
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>,
LKML-Reference: <m3oc9rop28.fsf@dhcp-100-3-82.bos.redhat.com>>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-11-18 14:37:34 +01:00
Jan Beulich
37db6c8f1d x86-64: Fix and clean up AMD Fam10 MMCONF enabling
Candidate memory ranges were not calculated properly (start
addresses got needlessly rounded down, and end addresses didn't
get rounded up at all), address comparison for secondary CPUs
was done on only part of the address, and disabled status wasn't
tracked properly.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <4CE24DF40200007800022737@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 13:41:35 +01:00
Masami Hiramatsu
de31ec8a31 x86/kprobes: Prevent kprobes to probe on save_args()
Prevent kprobes to probe on save_args() since this function
will be called from breakpoint exception handler. That will
cause infinit loop on breakpoint handling.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
LKML-Reference: <20101118101655.2779.2816.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 13:40:19 +01:00
matthieu castet
84e1c6bb38 x86: Add RO/NX protection for loadable kernel modules
This patch is a logical extension of the protection provided by
CONFIG_DEBUG_RODATA to LKMs. The protection is provided by
splitting module_core and module_init into three logical parts
each and setting appropriate page access permissions for each
individual section:

 1. Code: RO+X
 2. RO data: RO+NX
 3. RW data: RW+NX

In order to achieve proper protection, layout_sections() have
been modified to align each of the three parts mentioned above
onto page boundary. Next, the corresponding page access
permissions are set right before successful exit from
load_module(). Further, free_module() and sys_init_module have
been modified to set module_core and module_init as RW+NX right
before calling module_free().

By default, the original section layout and access flags are
preserved. When compiled with CONFIG_DEBUG_SET_MODULE_RONX=y,
the patch will page-align each group of sections to ensure that
each page contains only one type of content and will enforce
RO/NX for each group of pages.

  -v1: Initial proof-of-concept patch.
  -v2: The patch have been re-written to reduce the number of #ifdefs
       and to make it architecture-agnostic. Code formatting has also
       been corrected.
  -v3: Opportunistic RO/NX protection is now unconditional. Section
       page-alignment is enabled when CONFIG_DEBUG_RODATA=y.
  -v4: Removed most macros and improved coding style.
  -v5: Changed page-alignment and RO/NX section size calculation
  -v6: Fixed comments. Restricted RO/NX enforcement to x86 only
  -v7: Introduced CONFIG_DEBUG_SET_MODULE_RONX, added
       calls to set_all_modules_text_rw() and set_all_modules_text_ro()
       in ftrace
  -v8: updated for compatibility with linux 2.6.33-rc5
  -v9: coding style fixes
 -v10: more coding style fixes
 -v11: minor adjustments for -tip
 -v12: minor adjustments for v2.6.35-rc2-tip
 -v13: minor adjustments for v2.6.37-rc1-tip

Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com>
Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dave Jones <davej@redhat.com>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4CE2F914.9070106@free.fr>
[ minor cleanliness edits, -v14: build failure fix ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 13:32:56 +01:00
Matthieu Castet
5bd5a45266 x86: Add NX protection for kernel data
This patch expands functionality of CONFIG_DEBUG_RODATA to set main
(static) kernel data area as NX.

The following steps are taken to achieve this:

 1. Linker script is adjusted so .text always starts and ends on a page bound
 2. Linker script is adjusted so .rodata always start and end on a page boundary
 3. NX is set for all pages from _etext through _end in mark_rodata_ro.
 4. free_init_pages() sets released memory NX in arch/x86/mm/init.c
 5. bios rom is set to x when pcibios is used.

The results of patch application may be observed in the diff of kernel page
table dumps:

pcibios:

 -- data_nx_pt_before.txt       2009-10-13 07:48:59.000000000 -0400
 ++ data_nx_pt_after.txt        2009-10-13 07:26:46.000000000 -0400
  0x00000000-0xc0000000           3G                           pmd
  ---[ Kernel Mapping ]---
 -0xc0000000-0xc0100000           1M     RW             GLB x  pte
 +0xc0000000-0xc00a0000         640K     RW             GLB NX pte
 +0xc00a0000-0xc0100000         384K     RW             GLB x  pte
 -0xc0100000-0xc03d7000        2908K     ro             GLB x  pte
 +0xc0100000-0xc0318000        2144K     ro             GLB x  pte
 +0xc0318000-0xc03d7000         764K     ro             GLB NX pte
 -0xc03d7000-0xc0600000        2212K     RW             GLB x  pte
 +0xc03d7000-0xc0600000        2212K     RW             GLB NX pte
  0xc0600000-0xf7a00000         884M     RW         PSE GLB NX pmd
  0xf7a00000-0xf7bfe000        2040K     RW             GLB NX pte
  0xf7bfe000-0xf7c00000           8K                           pte

No pcibios:

 -- data_nx_pt_before.txt       2009-10-13 07:48:59.000000000 -0400
 ++ data_nx_pt_after.txt        2009-10-13 07:26:46.000000000 -0400
  0x00000000-0xc0000000           3G                           pmd
  ---[ Kernel Mapping ]---
 -0xc0000000-0xc0100000           1M     RW             GLB x  pte
 +0xc0000000-0xc0100000           1M     RW             GLB NX pte
 -0xc0100000-0xc03d7000        2908K     ro             GLB x  pte
 +0xc0100000-0xc0318000        2144K     ro             GLB x  pte
 +0xc0318000-0xc03d7000         764K     ro             GLB NX pte
 -0xc03d7000-0xc0600000        2212K     RW             GLB x  pte
 +0xc03d7000-0xc0600000        2212K     RW             GLB NX pte
  0xc0600000-0xf7a00000         884M     RW         PSE GLB NX pmd
  0xf7a00000-0xf7bfe000        2040K     RW             GLB NX pte
  0xf7bfe000-0xf7c00000           8K                           pte

The patch has been originally developed for Linux 2.6.34-rc2 x86 by
Siarhei Liakh <sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>.

 -v1:  initial patch for 2.6.30
 -v2:  patch for 2.6.31-rc7
 -v3:  moved all code into arch/x86, adjusted credits
 -v4:  fixed ifdef, removed credits from CREDITS
 -v5:  fixed an address calculation bug in mark_nxdata_nx()
 -v6:  added acked-by and PT dump diff to commit log
 -v7:  minor adjustments for -tip
 -v8:  rework with the merge of "Set first MB as RW+NX"

Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com>
Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu>
Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: James Morris <jmorris@namei.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dave Jones <davej@redhat.com>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4CE2F82E.60601@free.fr>
[ minor cleanliness edits ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 12:52:04 +01:00
Dimitri Sivanich
8191c9f692 x86: UV: Address interrupt/IO port operation conflict
This patch for SGI UV systems addresses a problem whereby
interrupt transactions being looped back from a local IOH,
through the hub to a local CPU can (erroneously) conflict with
IO port operations and other transactions.

To workaound this we set a high bit in the APIC IDs used for
interrupts. This bit appears to be ignored by the sockets, but
it avoids the conflict in the hub.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
LKML-Reference: <20101116222352.GA8155@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
___

 arch/x86/include/asm/uv/uv_hub.h   |    4 ++++
 arch/x86/include/asm/uv/uv_mmrs.h  |   19 ++++++++++++++++++-
 arch/x86/kernel/apic/x2apic_uv_x.c |   25 +++++++++++++++++++++++--
 arch/x86/platform/uv/tlb_uv.c      |    2 +-
 arch/x86/platform/uv/uv_time.c     |    4 +++-
 5 files changed, 49 insertions(+), 5 deletions(-)
2010-11-18 10:41:25 +01:00
Ingo Molnar
fcf48a725a Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent 2010-11-18 10:37:51 +01:00
Shérab
82148d1d0b x86/platform: Add Eurobraille/Iris power off support
The Iris machines from Eurobraille do not have APM or ACPI support
to shut themselves down properly.  A special I/O sequence is
needed to do so.  This modle runs this I/O sequence at
kernel shutdown when its force parameter is set to 1.

Signed-off-by: Shérab <Sebastien.Hinderer@ens-lyon.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
[ did minor coding style edits ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 10:03:24 +01:00
Kees Cook
79250af2d5 x86: Fix included-by file reference comments
Adjust the paths for files that are including verify_cpu.S.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
LKML-Reference: <1289931004-16066-1-git-send-email-kees.cook@canonical.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 09:58:54 +01:00
Tetsuo Handa
96e612ffc3 x86, asm: Fix binutils 2.15 build failure
Add parentheses around one pushl_cfi argument.

Commit df5d1874 "x86: Use {push,pop}{l,q}_cfi in more places"
caused GNU assembler 2.15 (Debian Sarge) to fail. It is still
failing as of commit 07bd8516 "x86, asm: Restore parentheses
around one pushl_cfi argument". This patch solves build failure
with GNU assembler 2.15.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Jan Beulich <jbeulich@novell.com>
Cc: heukelum@fastmail.fm
Cc: hpa@linux.intel.com
LKML-Reference: <201011160445.oAG4jGif079860@www262.sakura.ne.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 09:25:11 +01:00
Rakib Mullick
0e2af2a9ab x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG
backtrace_mask has been used under the code context of
ARCH_HAS_NMI_WATCHDOG. So put it into that context.
We were warned by the following warning:

  arch/x86/kernel/apic/hw_nmi.c:21: warning: ‘backtrace_mask’ defined but not used

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
LKML-Reference: <1289573455-3410-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 09:15:12 +01:00
Don Zickus
072b198a4a x86, nmi_watchdog: Remove all stub function calls from old nmi_watchdog
Now that the bulk of the old nmi_watchdog is gone, remove all
the stub variables and hooks associated with it.

This touches lots of files mainly because of how the io_apic
nmi_watchdog was implemented.  Now that the io_apic nmi_watchdog
is forever gone, remove all its fingers.

Most of this code was not being exercised by virtue of
nmi_watchdog != NMI_IO_APIC, so there shouldn't be anything to
risky here.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: fweisbec@gmail.com
Cc: gorcunov@openvz.org
LKML-Reference: <1289578944-28564-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 09:08:23 +01:00
Don Zickus
5f2b0ba4d9 x86, nmi_watchdog: Remove the old nmi_watchdog
Now that we have a new nmi_watchdog that is more generic and
sits on top of the perf subsystem, we really do not need the old
nmi_watchdog any more.

In addition, the old nmi_watchdog doesn't really work if you are
using the default clocksource, hpet.  The old nmi_watchdog code
relied on local apic interrupts to determine if the cpu is still
alive.  With hpet as the clocksource, these interrupts don't
increment any more and the old nmi_watchdog triggers false
postives.

This piece removes the old nmi_watchdog code and stubs out any
variables and functions calls.  The stubs are the same ones used
by the new nmi_watchdog code, so it should be well tested.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: fweisbec@gmail.com
Cc: gorcunov@openvz.org
LKML-Reference: <1289578944-28564-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 09:08:23 +01:00
Jason Wessel
10a6e67648 kgdb,x86: fix regression in detach handling
The fix from ba773f7c51
(x86,kgdb: Fix hw breakpoint regression) was not entirely complete.

The kgdb_remove_all_hw_break() function also needs to call the
hw_break_release_slot() or else a breakpoint can get activated again
after the debugger has detached.

The kgdb test suite exposes the behavior in the form of either a hang
or repetitive failure.  The kernel config that exposes the problem
contains all of the following:

CONFIG_DEBUG_RODATA=y
CONFIG_KGDB_TESTS=y
CONFIG_KGDB_TESTS_ON_BOOT=y
CONFIG_KGDB_TESTS_BOOT_STRING="V1F100"

Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Tested-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-11-17 13:54:57 -06:00
Arnd Bergmann
451a3c24b0 BKL: remove extraneous #include <smp_lock.h>
The big kernel lock has been removed from all these files at some point,
leaving only the #include.

Remove this too as a cleanup.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-17 08:59:32 -08:00
Linus Torvalds
25a34554d6 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, pvclock: Remove leftover scale_delta() function
  x86, apic: Remove double #include
  x86: Adjust section annotations in AMD Fam10 MMCONF enabling code
  x86, UV: Update node controller MMRs
  x86: Remove unnecessary casts of void ptr returning alloc function return values
  x86: Address gcc4.6 "set but not used" warnings in apic.h
  x86, mm: Fix section mismatch in tlb.c
2010-11-12 08:40:23 -08:00
Frederic Weisbecker
6c0aca288e x86: Ignore trap bits on single step exceptions
When a single step exception fires, the trap bits, used to
signal hardware breakpoints, are in a random state.

These trap bits might be set if another exception will follow,
like a breakpoint in the next instruction, or a watchpoint in the
previous one. Or there can be any junk there.

So if we handle these trap bits during the single step exception,
we are going to handle an exception twice, or we are going to
handle junk.

Just ignore them in this case.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=21332

Reported-by: Michael Stefaniuc <mstefani@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: All since 2.6.33.x <stable@kernel.org>
2010-11-12 14:51:01 +01:00
Dirk Brandewie
37bc9f5078 x86: Ce4100: Add reboot_fixup() for CE4100
This patch adds the CE4100 reboot fixup to reboot_fixups_32.c

[ tglx: Moved PCI id to reboot_fixups_32.c ]

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
LKML-Reference: <5bdcfb4f0206fa721570504e95659a03b815bc5e.1289331834.git.dirk.brandewie@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-12 00:45:41 +01:00
Thomas Gleixner
c751e17b53 x86: Add CE4100 platform support
Add CE4100 platform support. CE4100 needs early setup like
moorestown.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
LKML-Reference: <94720fd7f5564a12ebf202cf2c4f4c0d619aab35.1289331834.git.dirk.brandewie@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-12 00:45:41 +01:00
Kees Cook
ebba638ae7 x86, cpu: Call verify_cpu during 32bit CPU startup
The XD_DISABLE-clearing side-effect needs to happen for both 32bit
and 64bit, but the 32bit init routines were not calling verify_cpu()
yet. This adds that call to gain the side-effect.

The longmode/SSE tests being performed in verify_cpu() need to happen very
early for 64bit but not for 32bit. Instead of including it in two places
for 32bit, we can just include it once in arch/x86/kernel/head_32.S.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
LKML-Reference: <1289414154-7829-4-git-send-email-kees.cook@canonical.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-11-10 15:43:09 -08:00
Kees Cook
ae84739c27 x86, cpu: Clear XD_DISABLED flag on Intel to regain NX
Intel CPUs have an additional MSR bit to indicate if the BIOS was
configured to disable the NX cpu feature. This bit was traditionally
used for operating systems that did not understand how to handle the
NX bit. Since Linux understands this, this BIOS flag should be ignored
by default.

In a review[1] of reported hardware being used by Ubuntu bug reporters,
almost 10% of systems had an incorrectly configured BIOS, leaving their
systems unable to use the NX features of their CPU.

This change will clear the MSR_IA32_MISC_ENABLE_XD_DISABLE bit so that NX
cannot be inappropriately controlled by the BIOS on Intel CPUs. If, under
very strange hardware configurations, NX actually needs to be disabled,
"noexec=off" can be used to restore the prior behavior.

[1] http://www.outflux.net/blog/archives/2010/02/18/data-mining-for-nx-bit/

Signed-off-by: Kees Cook <kees.cook@canonical.com>
LKML-Reference: <1289414154-7829-3-git-send-email-kees.cook@canonical.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-11-10 15:42:54 -08:00
Kees Cook
c5cbac6942 x86, cpu: Rename verify_cpu_64.S to verify_cpu.S
The code is 32bit already, and can be used in 32bit routines.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
LKML-Reference: <1289414154-7829-2-git-send-email-kees.cook@canonical.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-11-10 15:42:42 -08:00
Peter Zijlstra
034c6efa46 perf, amd: Use kmalloc_node(,__GFP_ZERO) for northbridge structure allocation
Jasper suggested we use the zeroing capability of the allocators
instead of calling memset ourselves. Add node affinity while we're at
it.

Reported-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-10 22:58:40 +01:00
Borislav Petkov
c7657ac0c3 x86, microcode, AMD: Cleanup code a bit
get_ucode_data is a memcpy() wrapper which always returns 0. Move it
into the header and make it an inline. Remove all code checking its
return value and turn it into a void.

There should be no functionality change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-10 14:54:54 +01:00
Jesper Juhl
1ea6be212e x86, microcode, AMD: Replace vmalloc+memset with vzalloc
We don't have to do memset() ourselves after vmalloc() when we have
vzalloc(), so change that in
arch/x86/kernel/microcode_amd.c::get_next_ucode().

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-10 14:48:57 +01:00
Kusanagi Kouichi
1f523bf367 x86, pvclock: Remove leftover scale_delta() function
Commit 92580d64e16402762e2acc3022f065397c780425
("x86: pvclock: Move scale_delta into common header")
forgot to remove scale_delta.

Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Cc: Zachary Amsden <zamsden@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Glauber Costa <glommer@redhat.com>
LKML-Reference: <20101105110444.BAF6D6FC03B@msa105.auone-net.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-10 10:32:15 +01:00
Jesper Juhl
2a8dcbd6cd x86, apic: Remove double #include
Remove the second <asm/atomic.h> inclusion.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
LKML-Reference: <alpine.LNX.2.00.1011072253360.26247@swampdragon.chaosbits.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-10 10:21:16 +01:00
Jan Beulich
2f62bf7d23 x86: Adjust section annotations in AMD Fam10 MMCONF enabling code
check_enable_amd_mmconf_dmi() gets called only for the BSP,
hence everything hanging off of it can be __init*.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CD2DE1E0200007800020990@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-10 10:08:26 +01:00
Jack Steiner
62b0cfc240 x86, UV: Update node controller MMRs
A new version of the SGI UV hub node controller is being
developed. A few of the MMRs (control registers) that exist on
the current hub no longer exist on the new hub. Fortunately,
there are alternate MMRs that are are functionally equivalent
and that exist on both hubs.

This patch changes the UV code to use MMRs that exist in BOTH
versions of the hub node controller.

Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20101106204056.GA27584@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-10 10:06:38 +01:00
Jesper Juhl
8e5e9521c1 x86: Remove unnecessary casts of void ptr returning alloc function return values
The [vk][cmz]alloc(_node) family of functions return void
pointers which it's completely unnecessary/pointless to cast to
other pointer types since that happens implicitly.

This patch removes such casts from arch/x86.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Cc: trivial@kernel.org
Cc: amd64-microcode@amd64.org
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <alpine.LNX.2.00.1011082310220.23697@swampdragon.chaosbits.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-10 09:13:00 +01:00