Commit Graph

16604 Commits

Author SHA1 Message Date
Linus Torvalds
f741656d64 Revert the PVonHVM kexec. The patch introduces a regression
with older hypervisor stacks, such as Xen 4.1.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJRHZ7eAAoJEFjIrFwIi8fJZ+sH/ieMkzdBB6aqbFMcNr7mkfBo
 i3swjO2JQI7REYIHfKEVoR3IgHfqKEuABdeEQrceE0XqDepFh84YiKGI2QpPRWEA
 903vUV4DXVdcBrypbL45tSFZ1Jxsrzx+F7WfV/f9WHyeiwOyaZTGVQH0VuOzpcum
 RvPTT7MmC7g8MJDi66SDYBaX/pBQzifQ81nMWWjXNw0w4CwWX7le1cScZEP42MR6
 jTEHzYMLDojdO+2aQM5pt/0CGI5tzBHtX5nNRl6tovlPI3ckknYYx6a7RfxkfZzF
 IkMIuGS32yLfsswPPIiMs47/Qgiq3BN6eSTJXMZKUwQokL9yEs8LodcnRDYfgyQ=
 =fqcJ
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.8-rc7-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull xen fixes from Konrad Rzeszutek Wilk:
 "Two fixes:

   - A simple bug-fix for redundant NULL check.

   - CVE-2013-0228/XSA-42: x86/xen: don't assume %ds is usable in
     xen_iret for 32-bit PVOPS

  and two reverts:

   - Revert the PVonHVM kexec.  The patch introduces a regression with
     older hypervisor stacks, such as Xen 4.1."

* tag 'stable/for-linus-3.8-rc7-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  Revert "xen PVonHVM: use E820_Reserved area for shared_info"
  Revert "xen/PVonHVM: fix compile warning in init_hvm_pv_info"
  xen: remove redundant NULL check before unregister_and_remove_pcpu().
  x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS.
2013-02-15 12:12:55 -08:00
Konrad Rzeszutek Wilk
e9daff24a2 Revert "xen PVonHVM: use E820_Reserved area for shared_info"
This reverts commit 9d02b43dee.

We are doing this b/c on 32-bit PVonHVM with older hypervisors
(Xen 4.1) it ends up bothing up the start_info. This is bad b/c
we use it for the time keeping, and the timekeeping code loops
forever - as the version field never changes. Olaf says to
revert it, so lets do that.

Acked-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-02-14 21:29:31 -05:00
Konrad Rzeszutek Wilk
5eb65be2d9 Revert "xen/PVonHVM: fix compile warning in init_hvm_pv_info"
This reverts commit a7be94ac8d.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-02-14 21:29:27 -05:00
Satoru Takeuchi
1de63d60cd efi: Clear EFI_RUNTIME_SERVICES rather than EFI_BOOT by "noefi" boot parameter
There was a serious problem in samsung-laptop that its platform driver is
designed to run under BIOS and running under EFI can cause the machine to
become bricked or can cause Machine Check Exceptions.

    Discussion about this problem:
    https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557
    https://bugzilla.kernel.org/show_bug.cgi?id=47121

    The patches to fix this problem:
    efi: Make 'efi_enabled' a function to query EFI facilities
    83e6818974

    samsung-laptop: Disable on EFI hardware
    e0094244e4

Unfortunately this problem comes back again if users specify "noefi" option.
This parameter clears EFI_BOOT and that driver continues to run even if running
under EFI. Refer to the document, this parameter should clear
EFI_RUNTIME_SERVICES instead.

Documentation/kernel-parameters.txt:
===============================================================================
...
	noefi		[X86] Disable EFI runtime services support.
...
===============================================================================

Documentation/x86/x86_64/uefi.txt:
===============================================================================
...
- If some or all EFI runtime services don't work, you can try following
  kernel command line parameters to turn off some or all EFI runtime
  services.
	noefi		turn off all EFI runtime services
...
===============================================================================

Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/511C2C04.2070108@jp.fujitsu.com
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-13 17:24:11 -08:00
Jan Beulich
13d2b4d11d x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS.
This fixes CVE-2013-0228 / XSA-42

Drew Jones while working on CVE-2013-0190 found that that unprivileged guest user
in 32bit PV guest can use to crash the > guest with the panic like this:

-------------
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/vbd-51712/block/xvda/dev
Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4
mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last
unloaded: scsi_wait_scan]

Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1
EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0
EIP is at xen_iret+0x12/0x2b
EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010
ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0
 DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069
Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000)
Stack:
 00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000
Call Trace:
Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00
8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40
10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02
EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0
general protection fault: 0000 [#2]
---[ end trace ab0d29a492dcd330 ]---
Kernel panic - not syncing: Fatal exception
Pid: 1250, comm: r Tainted: G      D    ---------------
2.6.32-356.el6.i686 #1
Call Trace:
 [<c08476df>] ? panic+0x6e/0x122
 [<c084b63c>] ? oops_end+0xbc/0xd0
 [<c084b260>] ? do_general_protection+0x0/0x210
 [<c084a9b7>] ? error_code+0x73/
-------------

Petr says: "
 I've analysed the bug and I think that xen_iret() cannot cope with
 mangled DS, in this case zeroed out (null selector/descriptor) by either
 xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT
 entry was invalidated by the reproducer. "

Jan took a look at the preliminary patch and came up a fix that solves
this problem:

"This code gets called after all registers other than those handled by
IRET got already restored, hence a null selector in %ds or a non-null
one that got loaded from a code or read-only data descriptor would
cause a kernel mode fault (with the potential of crashing the kernel
as a whole, if panic_on_oops is set)."

The way to fix this is to realize that the we can only relay on the
registers that IRET restores. The two that are guaranteed are the
%cs and %ss as they are always fixed GDT selectors. Also they are
inaccessible from user mode - so they cannot be altered. This is
the approach taken in this patch.

Another alternative option suggested by Jan would be to relay on
the subtle realization that using the %ebp or %esp relative references uses
the %ss segment.  In which case we could switch from using %eax to %ebp and
would not need the %ss over-rides. That would also require one extra
instruction to compensate for the one place where the register is used
as scaled index. However Andrew pointed out that is too subtle and if
further work was to be done in this code-path it could escape folks attention
and lead to accidents.

Reviewed-by: Petr Matousek <pmatouse@redhat.com>
Reported-by: Petr Matousek <pmatouse@redhat.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-02-13 15:40:30 -05:00
Mel Gorman
0ee364eb31 x86/mm: Check if PUD is large when validating a kernel address
A user reported the following oops when a backup process reads
/proc/kcore:

 BUG: unable to handle kernel paging request at ffffbb00ff33b000
 IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110
 [...]

 Call Trace:
  [<ffffffff811b8aaa>] read_kcore+0x17a/0x370
  [<ffffffff811ad847>] proc_reg_read+0x77/0xc0
  [<ffffffff81151687>] vfs_read+0xc7/0x130
  [<ffffffff811517f3>] sys_read+0x53/0xa0
  [<ffffffff81449692>] system_call_fastpath+0x16/0x1b

Investigation determined that the bug triggered when reading
system RAM at the 4G mark. On this system, that was the first
address using 1G pages for the virt->phys direct mapping so the
PUD is pointing to a physical address, not a PMD page.

The problem is that the page table walker in kern_addr_valid() is
not checking pud_large() and treats the physical address as if
it was a PMD.  If it happens to look like pmd_none then it'll
silently fail, probably returning zeros instead of real data. If
the data happens to look like a present PMD though, it will be
walked resulting in the oops above.

This patch adds the necessary pud_large() check.

Unfortunately the problem was not readily reproducible and now
they are running the backup program without accessing
/proc/kcore so the patch has not been validated but I think it
makes sense.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.coM>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: stable@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20130211145236.GX21389@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-13 10:02:55 +01:00
Stoney Wang
cb214ede76 x86/apic: Work around boot failure on HP ProLiant DL980 G7 Server systems
When a HP ProLiant DL980 G7 Server boots a regular kernel,
there will be intermittent lost interrupts which could
result in a hang or (in extreme cases) data loss.

The reason is that this system only supports x2apic physical
mode, while the kernel boots with a logical-cluster default
setting.

This bug can be worked around by specifying the "x2apic_phys" or
"nox2apic" boot option, but we want to handle this system
without requiring manual workarounds.

The BIOS sets ACPI_FADT_APIC_PHYSICAL in FADT table.
As all apicids are smaller than 255, BIOS need to pass the
control to the OS with xapic mode, according to x2apic-spec,
chapter 2.9.

Current code handle x2apic when BIOS pass with xapic mode
enabled:

When user specifies x2apic_phys, or FADT indicates PHYSICAL:

1. During madt oem check, apic driver is set with xapic logical
   or xapic phys driver at first.

2. enable_IR_x2apic() will enable x2apic_mode.

3. if user specifies x2apic_phys on the boot line, x2apic_phys_probe()
   will install the correct x2apic phys driver and use x2apic phys mode.
   Otherwise it will skip the driver will let x2apic_cluster_probe to
   take over to install x2apic cluster driver (wrong one) even though FADT
   indicates PHYSICAL, because x2apic_phys_probe does not check
   FADT PHYSICAL.

Add checking x2apic_fadt_phys in x2apic_phys_probe() to fix the
problem.

Signed-off-by: Stoney Wang <song-bo.wang@hp.com>
[ updated the changelog and simplified the code ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1360263182-16226-1-git-send-email-yinghai@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-11 11:13:00 +01:00
Kees Cook
e575a86fdc x86: Do not leak kernel page mapping locations
Without this patch, it is trivial to determine kernel page
mappings by examining the error code reported to dmesg[1].
Instead, declare the entire kernel memory space as a violation
of a present page.

Additionally, since show_unhandled_signals is enabled by
default, switch branch hinting to the more realistic
expectation, and unobfuscate the setting of the PF_PROT bit to
improve readability.

[1] http://vulnfactory.org/blog/2013/02/06/a-linux-memory-trick/

Reported-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Suggested-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20130207174413.GA12485@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-07 19:57:44 +01:00
H. Peter Anvin
bb9b1a834f Retract MCE-specific UAPI exports which are unused and shouldn't be
used.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ7XZeAAoJEBLB8Bhh3lVKzAAQAKJPmNM4jK3o7QixkySFOKAN
 GKDmciwBCSLDvhDU5bXSKEzhxaWq+/Wp8aTy0u28TUtIq+Dnya8+9xAMZqe56lNv
 slBNk1N2IIJBqZoul1FCY7MyBnss19nh2h/jN9PChcY9dGpgkRnKc3rkjjy4UJ10
 YibUm/wLuzNGCskt6nGO4uVSDXyv5W6PCnpOb6IW5KFywHE0ojsVwZQM5Vnakjrd
 vsyicHHaIXSnRtyDWGZSX4JxY7HDK7+v03OHNvO3FtFzNjTJ8r1qU7LVbC1yBPP6
 uFN+piBUYn6q6Gsy67H1FcufGTz0IzFwyx/akW1MkgdN4FggrLiFf0a10Fh0Dx3s
 jd2tuCv72rfOQUZAYg9kobHk1NecCZt0Wt3NWul3WjZHZC35nFgznnUiznAPhoCI
 /sBKE2VQt0CejigNTIEpaToc/kHuQy3RzmMphknROEehXdm79TqdirNNHKrDkYug
 U9SV4nsOaRFcSZDPXK52xFIEb9Uc90zSWqNblhBRXt7sBC34EqZk95jYDgOrZxHQ
 v8iNieYgOezXzIyyUAHC7dlIUbkHlBsPG+8eEoWFcvOBI3UbGvOv3abMO18yBZqr
 28An+v8pMsT9Q5sUwHFwvblD/vDaj/NIO0OP+y40Z96KnXz8Cm4mpkzBvBdVVxUs
 cmaDId47QRqdvIW2mBZZ
 =2Zr4
 -----END PGP SIGNATURE-----

Merge tag 'ras_for_3.8' into x86/urgent

Retract MCE-specific UAPI exports which are unused and shouldn't be
used.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-06 14:18:53 -08:00
Linus Torvalds
3f4e5aacf7 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Three small fixlets"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel/cacheinfo: Shut up annoying warning
  x86, doc: Boot protocol 2.12 is in 3.8
  x86-64: Replace left over sti/cli in ia32 audit exit code
2013-02-05 07:59:44 +11:00
Linus Torvalds
51c1abb95f Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Three fixlets and two small (and low risk) hw-enablement changes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix event group context move
  x86/perf: Add IvyBridge EP support
  perf/x86: Fix P6 driver section warning
  arch/x86/tools/insn_sanity.c: Identify source of messages
  perf/x86: Enable Intel Lincroft/Penwell/Cloverview Atom support
2013-02-05 07:57:09 +11:00
Borislav Petkov
f76e39c531 x86/intel/cacheinfo: Shut up annoying warning
I've been getting the following warning when doing randbuilds
since forever. Now it finally pissed me off just the perfect
amount so that I can fix it.

  arch/x86/kernel/cpu/intel_cacheinfo.c:489:27: warning: ‘cache_disable_0’ defined but not used [-Wunused-variable]
  arch/x86/kernel/cpu/intel_cacheinfo.c:491:27: warning: ‘cache_disable_1’ defined but not used [-Wunused-variable] arch/x86/kernel/cpu/intel_cacheinfo.c:524:27: warning: ‘subcaches’ defined but not used [-Wunused-variable]

It happens because in randconfigs where CONFIG_SYSFS is not set,
the whole sysfs-interface to L3 cache index disabling is
remaining unused and gcc correctly warns about it. Make it
optional, depending on CONFIG_SYSFS too, as is the case with
other sysfs-related machinery in this file.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Link: http://lkml.kernel.org/r/1359969195-27362-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-04 11:29:52 +01:00
H. Peter Anvin
b5831174f9 Linux 3.8-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJRCxWdAAoJEHm+PkMAQRiG3bAH/28D2NbRLIDo6QIzk3seCCh3
 A6vqDWbX671JRa0RO38DrWhqv6L/EisvjNZMVxBNaN635+3+yC7wsKziXYxA4+AL
 Ef9JISiwXykRWwrC8Q34YBWfWgeFTmaau71IRv45x2OTwiijd6cvjXhLDrOhHEGL
 rdAGJxIdBOfuASw1zpn9PxzfAtg1j/FGP03B4yy+JFhYzuDp3pvnDIysAnXefLml
 UheuBELdNf8Jx7NtW80PTa+TQtruWPC8otoiJevV4TrAWmAOctJiM1izL+VOZqqD
 I/v5aGf9mCN+cxLq06imwgEHWmP1I7yDdjjTHrr4wwIMws1vg8PNsJqG3AsPcwM=
 =dR8S
 -----END PGP SIGNATURE-----

Merge tag 'v3.8-rc6' into x86/urgent

Linux 3.8-rc6

Merged in order to add a documentation update versus new code in
upstream.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 20:22:57 -08:00
Jan Beulich
40a1ef95da x86-64: Replace left over sti/cli in ia32 audit exit code
For some reason they didn't get replaced so far by their
paravirt equivalents, resulting in code to be run with
interrupts disabled that doesn't expect so (causing, in the
observed case, a BUG_ON() to trigger) when syscall auditing is
enabled.

David (Cc-ed) came up with an identical fix, so likely this can
be taken to count as an ack from him.

Reported-by: Peter Moody <pmoody@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/5108E01902000078000BA9C5@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Tested-by: Peter Moody <pmoody@google.com>
2013-01-31 10:36:01 +01:00
Linus Torvalds
04c2eee5b9 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI fixes from Peter Anvin:
 "This is a collection of fixes for the EFI support.  The controversial
  bit here is a set of patches which bumps the boot protocol version as
  part of fixing some serious problems with the EFI handover protocol,
  used when booting under EFI using a bootloader as opposed to directly
  from EFI.  These changes should also make it a lot saner to support
  cross-mode 32/64-bit EFI booting in the future.  Getting these changes
  into 3.8 means we avoid presenting an inconsistent ABI to bootloaders.

  Other changes are display detection and fixing efivarfs."

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: remove attribute check from setup_efi_pci
  x86, build: Dynamically find entry points in compressed startup code
  x86, efi: Fix PCI ROM handing in EFI boot stub, in 32-bit mode
  x86, efi: Fix 32-bit EFI handover protocol entry point
  x86, efi: Fix display detection in EFI boot stub
  x86, boot: Define the 2.12 bzImage boot protocol
  x86/boot: Fix minor fd leakage in tools/relocs.c
  x86, efi: Set runtime_version to the EFI spec revision
  x86, efi: fix 32-bit warnings in setup_efi_pci()
  efivarfs: Delete dentry from dcache in efivarfs_file_write()
  efivarfs: Never return ENOENT from firmware
  efi, x86: Pass a proper identity mapping in efi_call_phys_prelog
  efivarfs: Drop link count of the right inode
2013-01-31 17:10:36 +11:00
Linus Torvalds
bdb0ae6a76 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "This is a collection of miscellaneous fixes, the most important one is
  the fix for the Samsung laptop bricking issue (auto-blacklisting the
  samsung-laptop driver); the efi_enabled() changes you see below are
  prerequisites for that fix.

  The other issues fixed are booting on OLPC XO-1.5, an UV fix, NMI
  debugging, and requiring CAP_SYS_RAWIO for MSR references, just as
  with I/O port references."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  samsung-laptop: Disable on EFI hardware
  efi: Make 'efi_enabled' a function to query EFI facilities
  smp: Fix SMP function call empty cpu mask race
  x86/msr: Add capabilities check
  x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES
  x86/olpc: Fix olpc-xo1-sci.c build errors
  arch/x86/platform/uv: Fix incorrect tlb flush all issue
  x86-64: Fix unwind annotations in recent NMI changes
  x86-32: Start out cr0 clean, disable paging before modifying cr3/4
2013-01-31 17:08:43 +11:00
H. Peter Anvin
becbd66080 Various urgent EFI fixes and some warning cleanups for v3.8
* EFI boot stub fix for Macbook Pro's from Maarten Lankhorst
   * Fix an oops in efivarfs from Lingzhu Xiang
   * 32-bit warning cleanups from Jan Beulich
   * Patch to Boot on >512GB RAM systems from Nathan Zimmer
   * Set efi.runtime_version correctly
   * efivarfs updates
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRCBrMAAoJEC84WcCNIz1VTdcP/2u3ZqohOKJAwwMkyzB3nkrQ
 1mhxKGFDitAAvGQQCOq3oIMgBZHOevKznH3hZtX+hxBxwu7AuNL+qw6Baz8GYZpz
 guFvAZjm2JX2ko1PgtNvPUFZ1krw7TObLW2YstTWhSDoOlRK5kqmA+idaJf1aHDe
 /cwV6Mr6u5N/egyBBcQI1ydKLA6ogmx1zfDsS9b2Vzavw168RGqfrpH3ybcokYND
 /E2NtcRVZagBw35eZHEDNKcoPt5z+skCA4nJyA6bLbxMsq51ZKaK0PKKaA8vd70s
 6Pc7d6zkQG/ZmaxrRfsdQUAYfJRJq/cpeTgS4YurkZB0r0gdxk6I86vYlg+xXi0X
 eqLAkUJJJasVY/1NK/c2vsJ03W9wDYkd2IJpUcl7rWz7Aa/RurY32QmT3SnLop7m
 Tzj3CgXAu/RH8FyMNMWpI85tOis7OcMUfrjmnxquQdCZpLXSsh7Rf5EgBRiv9xhH
 txDOX3y21Jnv2A5efAVWm5EbyI204Wq2nVDzSu0xTMXWkzdBg+/OeyYfzV0Sdguf
 3/MzYTn7mVXh/EZtnvsTyNjgvVxzpXW6mAf+ne9iJaC8MUJVIeSjB7xzSfuHXUBU
 aUc9OnbkHRJCdVSeKqZbLwO3X5mTXqmDMfIcRle3BPewvZ9pOEv8VrGgsNxh9ixW
 JaCpiTdxJDFtz6cLVsNa
 =QrJx
 -----END PGP SIGNATURE-----

Merge tag 'efi-for-3.8' into x86/efi

Various urgent EFI fixes and some warning cleanups for v3.8

  * EFI boot stub fix for Macbook Pro's from Maarten Lankhorst
  * Fix an oops in efivarfs from Lingzhu Xiang
  * 32-bit warning cleanups from Jan Beulich
  * Patch to Boot on >512GB RAM systems from Nathan Zimmer
  * Set efi.runtime_version correctly
  * efivarfs updates

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-30 14:43:05 -08:00
Matt Fleming
83e6818974 efi: Make 'efi_enabled' a function to query EFI facilities
Originally 'efi_enabled' indicated whether a kernel was booted from
EFI firmware. Over time its semantics have changed, and it now
indicates whether or not we are booted on an EFI machine with
bit-native firmware, e.g. 64-bit kernel with 64-bit firmware.

The immediate motivation for this patch is the bug report at,

    https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557

which details how running a platform driver on an EFI machine that is
designed to run under BIOS can cause the machine to become
bricked. Also, the following report,

    https://bugzilla.kernel.org/show_bug.cgi?id=47121

details how running said driver can also cause Machine Check
Exceptions. Drivers need a new means of detecting whether they're
running on an EFI machine, as sadly the expression,

    if (!efi_enabled)

hasn't been a sufficient condition for quite some time.

Users actually want to query 'efi_enabled' for different reasons -
what they really want access to is the list of available EFI
facilities.

For instance, the x86 reboot code needs to know whether it can invoke
the ResetSystem() function provided by the EFI runtime services, while
the ACPI OSL code wants to know whether the EFI config tables were
mapped successfully. There are also checks in some of the platform
driver code to simply see if they're running on an EFI machine (which
would make it a bad idea to do BIOS-y things).

This patch is a prereq for the samsung-laptop fix patch.

Cc: David Airlie <airlied@linux.ie>
Cc: Corentin Chary <corentincj@iksaif.net>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Peter Jones <pjones@redhat.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Steve Langasek <steve.langasek@canonical.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-30 11:51:59 -08:00
Maarten Lankhorst
739701888f x86, efi: remove attribute check from setup_efi_pci
It looks like the original commit that copied the rom contents from
efi always copied the rom, and the fixup in setup_efi_pci from commit
886d751a2e ("x86, efi: correct precedence of operators in
setup_efi_pci") broke that.

This resulted in macbook pro's no longer finding the rom images, and
thus not being able to use the radeon card any more.

The solution is to just remove the check for now, and always copy the
rom if available.

Reported-by: Vitaly Budovski <vbudovski+news@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-01-29 17:52:06 +00:00
David Woodhouse
99f857db88 x86, build: Dynamically find entry points in compressed startup code
We have historically hard-coded entry points in head.S just so it's easy
to build the executable/bzImage headers with references to them.

Unfortunately, this leads to boot loaders abusing these "known" addresses
even when they are *explicitly* told that they "should look at the ELF
header to find this address, as it may change in the future". And even
when the address in question *has* actually been changed in the past,
without fanfare or thought to compatibility.

Thus we have bootloaders doing stunningly broken things like jumping
to offset 0x200 in the kernel startup code in 64-bit mode, *hoping*
that startup_64 is still there (it has moved at least once
before). And hoping that it's actually a 64-bit kernel despite the
fact that we don't give them any indication of that fact.

This patch should hopefully remove the temptation to abuse internal
addresses in future, where sternly worded comments have not sufficed.
Instead of having hard-coded addresses and saying "please don't abuse
these", we actually pull the addresses out of the ELF payload into
zoffset.h, and make build.c shove them back into the right places in
the bzImage header.

Rather than including zoffset.h into build.c and thus having to rebuild
the tool for every kernel build, we parse it instead. The parsing code
is small and simple.

This patch doesn't actually move any of the interesting entry points, so
any offending bootloader will still continue to "work" after this patch
is applied. For some version of "work" which includes jumping into the
compressed payload and crashing, if the bzImage it's given is a 32-bit
kernel. No change there then.

[ hpa: some of the issues in the description are addressed or
  retconned by the 2.12 boot protocol.  This patch has been edited to
  only remove fixed addresses that were *not* thus retconned. ]

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
2013-01-27 20:19:37 -08:00
David Woodhouse
b607e21267 x86, efi: Fix PCI ROM handing in EFI boot stub, in 32-bit mode
The 'Attributes' argument to pci->Attributes() function is 64-bit. So
when invoking in 32-bit mode it takes two registers, not just one.

This fixes memory corruption when booting via the 32-bit EFI boot stub.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
2013-01-27 20:19:37 -08:00
David Woodhouse
f791620fa7 x86, efi: Fix 32-bit EFI handover protocol entry point
If the bootloader calls the EFI handover entry point as a standard function
call, then it'll have a return address on the stack. We need to pop that
before calling efi_main(), or the arguments will all be out of position on
the stack.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
2013-01-27 20:19:37 -08:00
David Woodhouse
70a479cbe8 x86, efi: Fix display detection in EFI boot stub
When booting under OVMF we have precisely one GOP device, and it
implements the ConOut protocol.

We break out of the loop when we look at it... and then promptly abort
because 'first_gop' never gets set. We should set first_gop *before*
breaking out of the loop. Yes, it doesn't really mean "first" any more,
but that doesn't matter. It's only a flag to indicate that a suitable
GOP was found.

In fact, we'd do just as well to initialise 'width' to zero in this
function, then just check *that* instead of first_gop. But I'll do the
minimal fix for now (and for stable@).

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
2013-01-27 20:19:37 -08:00
H. Peter Anvin
09c205afde x86, boot: Define the 2.12 bzImage boot protocol
Define the 2.12 bzImage boot protocol: add xloadflags and additional
fields to allow the command line, initramfs and struct boot_params to
live above the 4 GiB mark.

The xloadflags now communicates if this is a 64-bit kernel with the
legacy 64-bit entry point and which of the EFI handover entry points
are supported.

Avoid adding new read flags to loadflags because of claimed
bootloaders testing the whole byte for == 1 to determine bzImageness
at least until the issue can be researched further.

This is based on patches by Yinghai Lu and David Woodhouse.

Originally-by: Yinghai Lu <yinghai@kernel.org>
Originally-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org
Cc: Rob Landley <rob@landley.net>
Cc: Gokul Caushik <caushik1@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joe Millenbach <jmillenbach@gmail.com>
2013-01-27 15:56:37 -08:00
Cong Ding
65315d4889 x86/boot: Fix minor fd leakage in tools/relocs.c
The opened file should be closed.

Signed-off-by: Cong Ding <dinggnu@gmail.com>
Cc: Kusanagi Kouichi <slash@ac.auone-net.jp>
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1358183628-27784-1-git-send-email-dinggnu@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-27 10:24:28 -08:00
Matt Fleming
712ba9e9af x86, efi: Set runtime_version to the EFI spec revision
efi.runtime_version is erroneously being set to the value of the
vendor's firmware revision instead of that of the implemented EFI
specification. We can't deduce which EFI functions are available based
on the revision of the vendor's firmware since the version scheme is
likely to be unique to each vendor.

What we really need to know is the revision of the implemented EFI
specification, which is available in the EFI System Table header.

Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: stable@vger.kernel.org # 3.7.x
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-01-25 12:00:16 +00:00
Jan Beulich
bc754790f9 x86, efi: fix 32-bit warnings in setup_efi_pci()
Fix four similar build warnings on 32-bit (casts between different
size pointers and integers).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Stefan Hasko <hasko.stevo@gmail.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-01-25 10:22:53 +00:00
Alan Cox
c903f0456b x86/msr: Add capabilities check
At the moment the MSR driver only relies upon file system
checks. This means that anything as root with any capability set
can write to MSRs. Historically that wasn't very interesting but
on modern processors the MSRs are such that writing to them
provides several ways to execute arbitary code in kernel space.
Sample code and documentation on doing this is circulating and
MSR attacks are used on Windows 64bit rootkits already.

In the Linux case you still need to be able to open the device
file so the impact is fairly limited and reduces the security of
some capability and security model based systems down towards
that of a generic "root owns the box" setup.

Therefore they should require CAP_SYS_RAWIO to prevent an
elevation of capabilities. The impact of this is fairly minimal
on most setups because they don't have heavy use of
capabilities. Those using SELinux, SMACK or AppArmor rules might
want to consider if their rulesets on the MSR driver could be
tighter.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Horses <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 17:37:51 +01:00
Maarten Lankhorst
73b664ceb5 x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES
I ran out of free entries when I had CONFIG_DMA_API_DEBUG
enabled. Some other archs seem to default to 65536, so increase
this limit for x86 too.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/50A612AA.7040206@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
----
2013-01-24 17:34:18 +01:00
Youquan Song
923d8697e2 x86/perf: Add IvyBridge EP support
Running the perf utility on a Ivybridge EP server we encounter
"not supported" events:

   <not supported> L1-dcache-loads
   <not supported> L1-dcache-load-misses
   <not supported> L1-dcache-stores
   <not supported> L1-dcache-store-misses
   <not supported> L1-dcache-prefetches
   <not supported> L1-dcache-prefetch-misses

This patch adds support for this processor.

Signed-off-by: Youquan Song <youquan.song@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Youquan Song <youquan.song@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1355851223-27705-1-git-send-email-youquan.song@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 16:14:04 +01:00
yangyongqiang
9faec5be3a perf/x86: Fix P6 driver section warning
Fix a compile warning - 'a section type conflict' by removing
__initconst.

Signed-off-by: yangyongqiang <yangyongqiang01@baidu.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 16:04:56 +01:00
Randy Dunlap
ed8e47fefc x86/olpc: Fix olpc-xo1-sci.c build errors
Fix build errors when CONFIG_INPUT=m.  This is not pretty, but
all of the OLPC kconfig options are bool instead of tristate.

  arch/x86/built-in.o: In function `send_lid_state':
    olpc-xo1-sci.c:(.text+0x1d323): undefined reference to `input_event'
    olpc-xo1-sci.c:(.text+0x1d338): undefined reference to `input_event'
  ...

In the long run, fixing this driver kconfig to be tristate
instead of bool would be a very good change.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Andres Salomon <dilinger@queued.net>
Cc: Chris Ball <cjb@laptop.org>
Cc: Jon Nettleton <jon.nettleton@gmail.com>
Cc: Daniel Drake <dsd@laptop.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 16:00:23 +01:00
Alex Shi
57c4f43043 arch/x86/platform/uv: Fix incorrect tlb flush all issue
The flush tlb optimization code has logical issue on UV
platform.  It doesn't flush the full range at all, since it
simply ignores its 'end' parameter (and hence also the "all"
indicator) in uv_flush_tlb_others() function.

Cliff's notes:

 | I tested the patch on a UV.  It has the effect of either
 | clearing 1 or all TLBs in a cpu.  I added some debugging to
 | test for the cases when clearing all TLBs is overkill, and in
 | practice it happens very seldom.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Tested-by: Cliff Wickman <cpw@sgi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 15:58:54 +01:00
Andrew Morton
55a6e622e6 arch/x86/tools/insn_sanity.c: Identify source of messages
The kernel build prints:

  Building modules, stage 2.
  TEST    posttest
  MODPOST 3821 modules
  TEST    posttest
 Success: decoded and checked 1000000 random instructions with 0
 errors (seed:0xaac4bc47)   CC      arch/x86/boot/a20.o
  CC      arch/x86/boot/cmdline.o
  AS      arch/x86/boot/copy.o
  HOSTCC  arch/x86/boot/mkcpustr
  CC      arch/x86/boot/cpucheck.o
  CC      arch/x86/boot/early_serial_console.o

which is irritating because you don't know what program is
proudly pronouncing its success.

So, as described in "console mode programming user interface
guidelines version 101" which doesn't exist, change this program
to identify the source of its messages.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 15:57:53 +01:00
ShuoX Liu
0927b482ae perf/x86: Enable Intel Lincroft/Penwell/Cloverview Atom support
These three chip are based on Atom and have different model id.
So add such three id for perf HW event support.

Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
Cc: yanmin_zhang@intel.linux.com
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1356713324-12442-1-git-send-email-shuox.liu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 15:10:03 +01:00
Jan Beulich
444723dccc x86-64: Fix unwind annotations in recent NMI changes
While in one case a plain annotation is necessary, in the other
case the stack adjustment can simply be folded into the
immediately preceding RESTORE_ALL, thus getting the correct
annotation for free.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander van Heukelum <heukelum@mailshack.com>
Link: http://lkml.kernel.org/r/51010C9302000078000B9045@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 10:56:32 +01:00
Linus Torvalds
ed06ef318a perf/urgent fixes:
. revert 20b279 - require exclude_guest to use PEBS - kernel side,
   now older binaries will continue working for things like cycles:pp
   without needing to pass extra modifiers, from David Ahern.
 
 . Fix building from 'make perf-*-src-pkg' tarballs, broken by UAPI, from
   Sebastian Andrzej Siewior
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ7yQhAAoJENZQFvNTUqpAjj4P/RR+8WeXpV02yzndhry+Yjav
 L/WQH0CkRRmyUA5akTcpgFrohJCEOi+auHl7ivmDX4XWFavcAkX3H1Yz1FytCOkb
 Cbb9Lv1rGdlTno8fTVUn8mNyTG64AlWrAw3ICixWw6q6I/k6SO7EkKigCxPhmY+2
 BE2EvkZmOY/PEUXgM6HtUdifORatX48p1toS7p3CDQ31cxBN5OVNZUXa1FakJpyH
 7R+1imKLsjuyi/G7Bt061LyWQkOh7L/ITWN+5Rx4RsUwRRT3vm1H9nlqUBsPS0PW
 qfkktkCmn/cFpKbfBipuglnt16jHPMfI/pghKvzx8n2uJMNEGXbfFDJefpzcdih9
 wIRgB6a5bvA8VF6Xpcn0I5JhqLAcnWTer07JgjZevjqYCdZStpbJjvE5131JjTLw
 Dnm7UshE+VFBcA3iXNX64p/X7WDJSk+SIDsJDuNe57dktFVLw76Ibb55XG18Ex7e
 c9QcIEhD1P19VzOniDZQZNEJhqnu5Vjle/eG+JRVRCm/BgoQFyJuD3EooKPN7hHR
 Op4oqf5RhDf7XNH0+Y4rOdRMZRiumdfcEl6kdcQGPJPycxpD7xCJNzBLWK/BvQgT
 Kl0AEkRC0KE2c5LyFttW+g1Byu1rctlMz2TVJDTskTm0XGOQ9mTzsQBP5rgBVt+b
 hQsMMWNVSI+jfm8bTJwx
 =LTjZ
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

 . revert 20b279 - require exclude_guest to use PEBS - kernel side, now
   older binaries will continue working for things like cycles:pp
   without needing to pass extra modifiers, from David Ahern.

 . Fix building from 'make perf-*-src-pkg' tarballs, broken by UAPI,
   from Sebastian Andrzej Siewior

[ Pulling directly, Ingo would normally pull but has been unresponsive ]

* tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf tools: Fix building from 'make perf-*-src-pkg' tarballs
  perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel side
2013-01-22 14:32:07 -08:00
Oleg Nesterov
9899d11f65 ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL
putreg() assumes that the tracee is not running and pt_regs_access() can
safely play with its stack.  However a killed tracee can return from
ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
that debugger can actually read/modify the kernel stack until the tracee
does SAVE_REST again.

set_task_blockstep() can race with SIGKILL too and in some sense this
race is even worse, the very fact the tracee can be woken up breaks the
logic.

As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
call, this ensures that nobody can ever wakeup the tracee while the
debugger looks at it.  Not only this fixes the mentioned problems, we
can do some cleanups/simplifications in arch_ptrace() paths.

Probably ptrace_unfreeze_traced() needs more callers, for example it
makes sense to make the tracee killable for oom-killer before
access_process_vm().

While at it, add the comment into may_ptrace_stop() to explain why
ptrace_stop() still can't rely on SIGKILL and signal_pending_state().

Reported-by: Salman Qazi <sqazi@google.com>
Reported-by: Suleiman Souhlal <suleiman@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-22 10:08:00 -08:00
H. Peter Anvin
021ef050fc x86-32: Start out cr0 clean, disable paging before modifying cr3/4
Patch

  5a5a51db78 x86-32: Start out eflags and cr4 clean

... made x86-32 match x86-64 in that we initialize %eflags and %cr4
from scratch.  This broke OLPC XO-1.5, because the XO enters the
kernel with paging enabled, which the kernel doesn't expect.

Since we no longer support 386 (the source of most of the variability
in %cr0 configuration), we can simply match further x86-64 and
initialize %cr0 to a fixed value -- the one variable part remaining in
%cr0 is for FPU control, but all that is handled later on in
initialization; in particular, configuring %cr0 as if the FPU is
present until proven otherwise is correct and necessary for the probe
to work.

To deal with the XO case sanely, explicitly disable paging in %cr0
before we muck with %cr3, %cr4 or EFER -- those operations are
inherently unsafe with paging enabled.

NOTE: There is still a lot of 386-related junk in head_32.S which we
can and should get rid of, however, this is intended as a minimal fix
whereas the cleanup can be deferred to the next merge window.

Reported-by: Andres Salomon <dilinger@queued.net>
Tested-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/50FA0661.2060400@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-19 11:01:22 -08:00
Linus Torvalds
5c69bed266 Fixes:
- CVE-2013-0190/XSA-40 (or stack corruption for 32-bit PV kernels)
  - Fix racy vma access spotted by Al Viro
  - Fix mmap batch ioctl potentially resulting in large O(n) page allcations.
  - Fix vcpu online/offline BUG:scheduling while atomic..
  - Fix unbound buffer scanning for more than 32 vCPUs.
  - Fix grant table being incorrectly initialized
  - Fix incorrect check in pciback
  - Allow privcmd in backend domains.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJQ+L7qAAoJEFjIrFwIi8fJLNIH/jUsneraEggWeh0L4GGWZvWL
 cNCf0zjQt/pi1Q5drbleW2/6Wv6s6N1QA9pGRsJ+rrliC73HVTqIWFh0TjpwmCVy
 hZal7jDXOuFVIR7GbGEPn004T6mkEnYDb/O2fyojwMVg0NQYwtMYJfTBkKdjKnmV
 z6sWpQPVqO3/nZ17k2DipYRldbeiqS6LLOiUWd72b2W8bV4ySY5iVPVsqFusSEr6
 PNyW33RPs5H0jEPR1uJlLD+l/uIbENykpEPeAS2uHGlch129+xHH5h79dwYJTbw6
 x5nAOveO9VNJscUoqhpE7YbySzJmrUwxnBerZ6YTW6WCknYXrx4uiVAlfWem7uY=
 =26Sq
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen fixes from Konrad Rzeszutek Wilk:
 - CVE-2013-0190/XSA-40 (or stack corruption for 32-bit PV kernels)
 - Fix racy vma access spotted by Al Viro
 - Fix mmap batch ioctl potentially resulting in large O(n) page allcations.
 - Fix vcpu online/offline BUG:scheduling while atomic..
 - Fix unbound buffer scanning for more than 32 vCPUs.
 - Fix grant table being incorrectly initialized
 - Fix incorrect check in pciback
 - Allow privcmd in backend domains.

Fix up whitespace conflict due to ugly merge resolution in Xen tree in
arch/arm/xen/enlighten.c

* tag 'stable/for-linus-3.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests.
  Revert "xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while atomic."
  xen/gntdev: remove erronous use of copy_to_user
  xen/gntdev: correctly unmap unlinked maps in mmu notifier
  xen/gntdev: fix unsafe vma access
  xen/privcmd: Fix mmap batch ioctl.
  Xen: properly bound buffer access when parsing cpu/*/availability
  xen/grant-table: correctly initialize grant table version 1
  x86/xen : Fix the wrong check in pciback
  xen/privcmd: Relax access control in privcmd_ioctl_mmap
2013-01-18 12:02:52 -08:00
Nathan Zimmer
b8f2c21db3 efi, x86: Pass a proper identity mapping in efi_call_phys_prelog
Update efi_call_phys_prelog to install an identity mapping of all available
memory.  This corrects a bug on very large systems with more then 512 GB in
which bios would not be able to access addresses above not in the mapping.

The result is a crash that looks much like this.

BUG: unable to handle kernel paging request at 000000effd870020
IP: [<0000000078bce331>] 0x78bce330
PGD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU 0
Pid: 0, comm: swapper/0 Tainted: G        W    3.8.0-rc1-next-20121224-medusa_ntz+ #2 Intel Corp. Stoutland Platform
RIP: 0010:[<0000000078bce331>]  [<0000000078bce331>] 0x78bce330
RSP: 0000:ffffffff81601d28  EFLAGS: 00010006
RAX: 0000000078b80e18 RBX: 0000000000000004 RCX: 0000000000000004
RDX: 0000000078bcf958 RSI: 0000000000002400 RDI: 8000000000000000
RBP: 0000000078bcf760 R08: 000000effd870000 R09: 0000000000000000
R10: 0000000000000000 R11: 00000000000000c3 R12: 0000000000000030
R13: 000000effd870000 R14: 0000000000000000 R15: ffff88effd870000
FS:  0000000000000000(0000) GS:ffff88effe400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000effd870020 CR3: 000000000160c000 CR4: 00000000000006b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff81614400)
Stack:
 0000000078b80d18 0000000000000004 0000000078bced7b ffff880078b81fff
 0000000000000000 0000000000000082 0000000078bce3a8 0000000000002400
 0000000060000202 0000000078b80da0 0000000078bce45d ffffffff8107cb5a
Call Trace:
 [<ffffffff8107cb5a>] ? on_each_cpu+0x77/0x83
 [<ffffffff8102f4eb>] ? change_page_attr_set_clr+0x32f/0x3ed
 [<ffffffff81035946>] ? efi_call4+0x46/0x80
 [<ffffffff816c5abb>] ? efi_enter_virtual_mode+0x1f5/0x305
 [<ffffffff816aeb24>] ? start_kernel+0x34a/0x3d2
 [<ffffffff816ae5ed>] ? repair_env_string+0x60/0x60
 [<ffffffff816ae2be>] ? x86_64_start_reservations+0xba/0xc1
 [<ffffffff816ae120>] ? early_idt_handlers+0x120/0x120
 [<ffffffff816ae419>] ? x86_64_start_kernel+0x154/0x163
Code:  Bad RIP value.
RIP  [<0000000078bce331>] 0x78bce330
 RSP <ffffffff81601d28>
CR2: 000000effd870020
---[ end trace ead828934fef5eab ]---

Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-01-18 09:43:43 +00:00
Andrew Cooper
9174adbee4 xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests.
This fixes CVE-2013-0190 / XSA-40

There has been an error on the xen_failsafe_callback path for failed
iret, which causes the stack pointer to be wrong when entering the
iret_exc error path.  This can result in the kernel crashing.

In the classic kernel case, the relevant code looked a little like:

        popl %eax      # Error code from hypervisor
        jz 5f
        addl $16,%esp
        jmp iret_exc   # Hypervisor said iret fault
5:      addl $16,%esp
                       # Hypervisor said segment selector fault

Here, there are two identical addls on either option of a branch which
appears to have been optimised by hoisting it above the jz, and
converting it to an lea, which leaves the flags register unaffected.

In the PVOPS case, the code looks like:

        popl_cfi %eax         # Error from the hypervisor
        lea 16(%esp),%esp     # Add $16 before choosing fault path
        CFI_ADJUST_CFA_OFFSET -16
        jz 5f
        addl $16,%esp         # Incorrectly adjust %esp again
        jmp iret_exc

It is possible unprivileged userspace applications to cause this
behaviour, for example by loading an LDT code selector, then changing
the code selector to be not-present.  At this point, there is a race
condition where it is possible for the hypervisor to return back to
userspace from an interrupt, fault on its own iret, and inject a
failsafe_callback into the kernel.

This bug has been present since the introduction of Xen PVOPS support
in commit 5ead97c84 (xen: Core Xen implementation), in 2.6.23.

Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-16 16:17:42 -05:00
Linus Torvalds
2409c873be Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "This is mainly a workaround for a bug in Sandy Bridge graphics which
  causes corruption of certain memory pages."

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/Sandy Bridge: Sandy Bridge workaround depends on CONFIG_PCI
  x86/Sandy Bridge: mark arrays in __init functions as __initconst
  x86/Sandy Bridge: reserve pages when integrated graphics is present
  x86, efi: correct precedence of operators in setup_efi_pci
2013-01-16 09:11:50 -08:00
Konrad Rzeszutek Wilk
d55bf532d7 Revert "xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while atomic."
This reverts commit 41bd956de3.

The fix is incorrect and not appropiate for the latest kernels.
In fact it _causes_ the BUG: scheduling while atomic while
doing vCPU hotplug.

Suggested-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-15 22:41:27 -05:00
Konrad Rzeszutek Wilk
7bcc1ec077 Linux 3.7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJQxqj1AAoJEHm+PkMAQRiG9MQH/j21UwP2QGpdpXbWAnFMjtlv
 uE/yCFhPoqR1QjjE6oRlO6MHFA41xGDbr5RQki9Ik2AfSYiastt4ZWYvtSJKVTCr
 O0Lj+Cdt/2qBkGiARHqVEBZ4S/l/cw4/EHPb5StFyu3ggnPPQhoPIP7oAmRn0+mh
 NNb5CEcJOLqIaJSteqMP71Q899ncbLayBnimYCaC2f6r00beqNXIqxSHipcPlUsf
 ehNxqCX+5z5Q788EL33EL8GpBcy4Ueevu6nvnuVI8qIEnBnrBVngsiaQ4Hti+2eK
 A//4DYoF2N1wLjQv7hFeiwMURQ16OlxXoc/Z66sv2QQRwPxOIQlxdhWuey4KebA=
 =7LYr
 -----END PGP SIGNATURE-----

Merge tag 'v3.7' into stable/for-linus-3.8

Linux 3.7

* tag 'v3.7': (833 commits)
  Linux 3.7
  Input: matrix-keymap - provide proper module license
  Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated damage
  ipv4: ip_check_defrag must not modify skb before unsharing
  Revert "mm: avoid waking kswapd for THP allocations when compaction is deferred or contended"
  inet_diag: validate port comparison byte code to prevent unsafe reads
  inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run()
  inet_diag: validate byte code to prevent oops in inet_diag_bc_run()
  inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state
  mm: vmscan: fix inappropriate zone congestion clearing
  vfs: fix O_DIRECT read past end of block device
  net: gro: fix possible panic in skb_gro_receive()
  tcp: bug fix Fast Open client retransmission
  tmpfs: fix shared mempolicy leak
  mm: vmscan: do not keep kswapd looping forever due to individual uncompactable zones
  mm: compaction: validate pfn range passed to isolate_freepages_block
  mmc: sh-mmcif: avoid oops on spurious interrupts (second try)
  Revert misapplied "mmc: sh-mmcif: avoid oops on spurious interrupts"
  mmc: sdhci-s3c: fix missing clock for gpio card-detect
  lib/Makefile: Fix oid_registry build dependency
  ...

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Conflicts:
	arch/arm/xen/enlighten.c
	drivers/xen/Makefile

[We need to have the v3.7 base as the 'for-3.8' was based off v3.7-rc3
and there are some patches in v3.7-rc6 that we to have in our branch]
2013-01-15 15:58:25 -05:00
H. Peter Anvin
e43b3cec71 x86/Sandy Bridge: Sandy Bridge workaround depends on CONFIG_PCI
early_pci_allowed() and read_pci_config_16() are only available if
CONFIG_PCI is defined.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2013-01-13 20:58:57 -08:00
H. Peter Anvin
ab3cd8670e x86/Sandy Bridge: mark arrays in __init functions as __initconst
Mark static arrays as __initconst so they get removed when the init
sections are flushed.

Reported-by: Mathias Krause <minipli@googlemail.com>
Link: http://lkml.kernel.org/r/75F4BEE6-CB0E-4426-B40B-697451677738@googlemail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-13 20:36:39 -08:00
Jesse Barnes
a9acc5365d x86/Sandy Bridge: reserve pages when integrated graphics is present
SNB graphics devices have a bug that prevent them from accessing certain
memory ranges, namely anything below 1M and in the pages listed in the
table.  So reserve those at boot if set detect a SNB gfx device on the
CPU to avoid GPU hangs.

Stephane Marchesin had a similar patch to the page allocator awhile
back, but rather than reserving pages up front, it leaked them at
allocation time.

[ hpa: made a number of stylistic changes, marked arrays as static
  const, and made less verbose; use "memblock=debug" for full
  verbosity. ]

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-11 14:26:38 -08:00
Linus Torvalds
ccae663cd4 Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM bugfixes from Marcelo Tosatti.

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: use dynamic percpu allocations for shared msrs area
  KVM: PPC: Book3S HV: Fix compilation without CONFIG_PPC_POWERNV
  powerpc: Corrected include header path in kvm_para.h
  Add rcu user eqs exception hooks for async page fault
2013-01-10 09:05:18 -08:00
David Ahern
a706d965dc perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel side
This patch is brought to you by the letter 'H'.

Commit 20b279 breaks compatiblity with older perf binaries when run with
precise modifier (:p or :pp) by requiring the exclude_guest attribute to be
set. Older binaries default exclude_guest to 0 (ie., wanting guest-based
samples) unless host only profiling is requested (:H modifier). The workaround
for older binaries is to add H to the modifier list (e.g., -e cycles:ppH -
toggles exclude_guest to 1). This was deemed unacceptable by Linus:

https://lkml.org/lkml/2012/12/12/570

Between family in town and the fresh snow in Breckenridge there is no time left
to be working on the proper fix for this over the holidays. In the New Year I
have more pressing problems to resolve -- like some memory leaks in perf which
are proving to be elusive -- although the aforementioned snow is probably why
they are proving to be elusive. Either way I do not have any spare time to work
on this and from the time I have managed to spend on it the solution is more
difficult than just moving to a new exclude_guest flag (does not work) or
flipping the logic to include_guest (which is not as trivial as one would
think).

So, two options: silently force exclude_guest on as suggested by Gleb which
means no impact to older perf binaries or revert the original patch which
caused the breakage.

This patch does the latter -- reverts the original patch that introduced the
regression. The problem can be revisited in the future as time allows.

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Link: http://lkml.kernel.org/r/1356749767-17322-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-10 09:21:19 -03:00