Commit Graph

152 Commits

Author SHA1 Message Date
Jan Beulich
9b35589756 [PATCH] x86: simplify notify_page_fault()
Remove all parameters from this function that aren't really variable.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:23 +01:00
Rohit Seth
53fee04f31 [PATCH] x86-64: Fix fake numa for x86_64 machines with big IO hole
This patch resolves the issue of running with numa=fake=X on kernel command
line on x86_64 machines that have big IO hole.  While calculating the size
of each node now we look at the total hole size in that range.

Previously there were nodes that only had IO holes in them causing kernel
boot problems.  We now use the NODE_MIN_SIZE (64MB) as the minimum size of
memory that any node must have.  We reduce the number of allocated nodes if
the number of nodes specified on kernel command line results in any node
getting memory smaller than NODE_MIN_SIZE.

This change allows the extra memory to be incremented in NODE_MIN_SIZE
granule and uniformly distribute among as many nodes (called big nodes) as
possible.

[akpm@osdl.org: build fix]
Signed-off-by: David Rientjes <reintjes@google.com>
Signed-off-by: Paul Menage <menage@google.com>
Signed-off-by: Rohit Seth <rohitseth@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-02-13 13:26:22 +01:00
Amul Shah
54413927f0 [PATCH] x86-64: x86_64-make-the-numa-hash-function-nodemap-allocation fix fix
- Removed an extraneous debug message from allocate_cachealigned_map

- Changed extract_lsb_from_nodes to return 63 for the case where there was
  only one memory node.  The prevents the creation of the dynamic hashmap.

- Changed extract_lsb_from_nodes to use only the starting memory address of
  a node.  On an ES7000, our nodes overlap the starting and ending address,
  meaning, that we see nodes like

	00000 - 10000
	10000 - 20000

  But other systems have nodes whose start and end addresses do not overlap.
   For example:

	00000 - 0FFFF
	10000 - 1FFFF

  In this case, using the ending address will result in an LSB much lower
  than what is possible.  In this case an LSB of 1 when in reality it should
  be 16.

Cc: Andi Kleen <ak@suse.de>
Cc: Rohit Seth <rohitseth@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:20 +01:00
Amul Shah
076422d2af [PATCH] x86-64: Allocate the NUMA hash function nodemap dynamically
Remove the statically allocated memory to NUMA node hash map in favor of a
dynamically allocated memory to node hash map (it is cache aligned).

This patch has the nice side effect in that it allows the hash map to grow
for systems with large amounts of memory (256GB - 1TB), but suffer from
having small PCI space tacked onto the boot node (which is somewhere
between 192MB to 512MB on the ES7000).

Signed-off-by: Amul Shah <amul.shah@unisys.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Rohit Seth <rohitseth@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-02-13 13:26:19 +01:00
Kirill Korotaev
cefc8be824 [PATCH] Consolidate bust_spinlocks()
Part of long forgotten patch
http://groups.google.com/group/fa.linux.kernel/msg/e98e941ce1cf29f6?dmode=source
Since then, m32r grabbed two copies.

Leave s390 copy because of important absence of CONFIG_VT, but remove
references to non-existent timerlist_lock.  ia64 also loses timerlist_lock.

Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:34 -08:00
Alexey Starikovskiy
15a58ed121 ACPICA: Remove duplicate table definitions (non-conflicting), cont
Signed-off-by: Len Brown <len.brown@intel.com>
2007-02-02 21:14:29 -05:00
Ernie Petrides
103efcd9aa [PATCH] x86-64: fix perms/range of vsyscall vma in /proc/*/maps
The final line of /proc/<pid>/maps on x86_64 for native 64-bit
tasks shows an incorrect ending address and incorrect permissions.  There
is only a single page mapped in this vsyscall region, and it is accessible
for both read and execute.

The patch below fixes this.  (Since 32-bit-compat tasks have a real vma
with correct perms/range, no change is necessary for that scenario.)

Before the patch, a "cat /proc/self/maps | tail -1" shows this:

        ffffffffff600000-ffffffffffe00000 ---p 00000000 [...]

After the patch, this is the output:

        ffffffffff600000-ffffffffff601000 r-xp 00000000 [...]

Signed-off-by: Ernie Petrides <petrides@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:09 +01:00
Andi Kleen
ab2bf0c1c6 [PATCH] x86-64: Use probe_kernel_address in arch/x86_64/*
Instead of open coded __get_user

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:06 +01:00
Andi Kleen
ea7322decb [PATCH] x86-64: Speed and clean up cache flushing in change_page_attr
CLFLUSH is a lot faster than WBINVD so avoid the later if at all
possible.

Always pass the complete list of pages to other CPUs to cut down
the number of IPIs.

Minor other cleanup and sync with i386 version.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:05 +01:00
Yasunori Goto
8243229f09 [PATCH] x86_64: fix memory hotplug build with NUMA=n
This is to fix compile error of x86-64 memory hotplug without any NUMA
option.

  CC      arch/x86_64/mm/init.o
arch/x86_64/mm/init.c:501: error: redefinition of 'memory_add_physaddr_to_nid'
include/linux/memory_hotplug.h:71: error: previous definition of 'memory_add_phys
addr_to_nid' was here
arch/x86_64/mm/init.c:509: error: redefinition of 'memory_add_physaddr_to_nid'
arch/x86_64/mm/init.c:501: error: previous definition of 'memory_add_physaddr_to_
nid' was here

I confirmed compile completion with !NUMA, (NUMA & !ACPI_NUMA),
or (NUMA & ACPI_NUMA).

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-20 09:42:05 -08:00
Andi Kleen
5e58a02a8f [PATCH] x86-64: Handle reserve_bootmem_generic beyond end_pfn
This can happen on kexec kernels with some configurations, in particularly
on Unisys ES7000 systems.

Analysis by Amul Shah

Cc: Amul Shah <amul.shah@unisys.com>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-14 16:57:46 +01:00
keith mannthey
926fafebc4 [PATCH] x86-64: x86_64 hot-add memory srat.c fix
This patch corrects the logic used in srat.c to figure out what
parsing what action to take when registering hot-add areas.  Hot-add
areas should only be added to the node information for the
MEMORY_HOTPLUG_RESERVE case.  When booting MEMORY_HOTPLUG_SPARSE hot-add
areas on everything but the last node are getting include in the node
data and during kernel boot the pages are setup then the kernel dies
when the pages are used. This patch fixes this issue.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:01 +02:00
Mel Gorman
6391af174a [PATCH] mm: use symbolic names instead of indices for zone initialisation
Arch-independent zone-sizing is using indices instead of symbolic names to
offset within an array related to zones (max_zone_pfns).  The unintended
impact is that ZONE_DMA and ZONE_NORMAL is initialised on powerpc instead
of ZONE_DMA and ZONE_HIGHMEM when CONFIG_HIGHMEM is set.  As a result, the
the machine fails to boot but will boot with CONFIG_HIGHMEM turned off.

The following patch properly initialises the max_zone_pfns[] array and uses
symbolic names instead of indices in each architecture using
arch-independent zone-sizing.  Two users have successfully booted their
powerpcs with it (one an ibook G4).  It has also been boot tested on x86,
x86_64, ppc64 and ia64.  Please merge for 2.6.19-rc2.

Credit to Benjamin Herrenschmidt for identifying the bug and rolling the
first fix.  Additional credit to Johannes Berg and Andreas Schwab for
reporting the problem and testing on powerpc.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11 11:14:14 -07:00
Haavard Skinnemoen
16c564bb3c [PATCH] Generic ioremap_page_range: x86_64 conversion
Convert x86_64 to use generic ioremap_page_range()

[akpm@osdl.org: build fix]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:32 -07:00
Keith Mannthey
45e0b78b05 [PATCH] hot-add-mem x86_64: use CONFIG_MEMORY_HOTPLUG_RESERVE
The api for hot-add memory already has a construct for finding nodes based on
an address, memory_add_physaddr_to_nid.  This patch allows the fucntion to do
something besides return 0.  It uses the nodes_add infomation to lookup to
node info for a hot add event.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:18 -07:00
Keith Mannthey
8c2676a587 [PATCH] hot-add-mem x86_64: memory_add_physaddr_to_nid node fixup
In cases where the acpi memory-add event does not containe the pxm (node)
infomation allow the driver to look up node info based on the address.  The
acpi_get_node call returns -1 if it can't decode the pxm info, this causes
add_memory to panic.  acpi_get_node would have to decode the resource from the
handle (a lenghty proposition).  This seems to be the cleanist point to
interject the hook.

[kamezawa.hiroyu@jp.fujitsu.com: build fixes]
[y-goto@jp.fujitsu.com: build fixes]
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:18 -07:00
Keith Mannthey
4942e998b4 [PATCH] hot-add-mem x86_64: memory_add_physaddr_to_nid enable
The api for hot-add memory already has a construct for finding nodes based on
an address, memory_add_physaddr_to_nid.  This patch allows the fucntion to do
something besides return 0.  It uses the nodes_add infomation to lookup to
node info for a hot add event.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:18 -07:00
Keith Mannthey
71efa8fdc5 [PATCH] hot-add-mem x86_64: Enable SPARSEMEM in srat.c
Enable x86_64 srat.c to share code between both reserve and sparsemem based
add memory paths.  Both paths need the hot-add area node locality infomration
(nodes_add).  This code refactors the code path to allow this.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:18 -07:00
Sukadev Bhattiprolu
f400e198b2 [PATCH] pidspace: is_init()
This is an updated version of Eric Biederman's is_init() patch.
(http://lkml.org/lkml/2006/2/6/280).  It applies cleanly to 2.6.18-rc3 and
replaces a few more instances of ->pid == 1 with is_init().

Further, is_init() checks pid and thus removes dependency on Eric's other
patches for now.

Eric's original description:

	There are a lot of places in the kernel where we test for init
	because we give it special properties.  Most  significantly init
	must not die.  This results in code all over the kernel test
	->pid == 1.

	Introduce is_init to capture this case.

	With multiple pid spaces for all of the cases affected we are
	looking for only the first process on the system, not some other
	process that has pid == 1.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: <lxc-devel@lists.sourceforge.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:12 -07:00
Jason Baron
df67b3daea [PATCH] make PROT_WRITE imply PROT_READ
Make PROT_WRITE imply PROT_READ for a number of architectures which don't
support write only in hardware.

While looking at this, I noticed that some architectures which do not
support write only mappings already take the exact same approach.  For
example, in arch/alpha/mm/fault.c:

"
        if (cause < 0) {
                if (!(vma->vm_flags & VM_EXEC))
                        goto bad_area;
        } else if (!cause) {
                /* Allow reads even for write-only mappings */
                if (!(vma->vm_flags & (VM_READ | VM_WRITE)))
                        goto bad_area;
        } else {
                if (!(vma->vm_flags & VM_WRITE))
                        goto bad_area;
        }
"

Thus, this patch brings other architectures which do not support write only
mappings in-line and consistent with the rest.  I've verified the patch on
ia64, x86_64 and x86.

Additional discussion:

Several architectures, including x86, can not support write-only mappings.
The pte for x86 reserves a single bit for protection and its two states are
read only or read/write.  Thus, write only is not supported in h/w.

Currently, if i 'mmap' a page write-only, the first read attempt on that page
creates a page fault and will SEGV.  That check is enforced in
arch/blah/mm/fault.c.  However, if i first write that page it will fault in
and the pte will be set to read/write.  Thus, any subsequent reads to the page
will succeed.  It is this inconsistency in behavior that this patch is
attempting to address.  Furthermore, if the page is swapped out, and then
brought back the first read will also cause a SEGV.  Thus, any arbitrary read
on a page can potentially result in a SEGV.

According to the SuSv3 spec, "if the application requests only PROT_WRITE, the
implementation may also allow read access." Also as mentioned, some
archtectures, such as alpha, shown above already take the approach that i am
suggesting.

The counter-argument to this raised by Arjan, is that the kernel is enforcing
the write only mapping the best it can given the h/w limitations.  This is
true, however Alan Cox, and myself would argue that the inconsitency in
behavior, that is applications can sometimes work/sometimes fails is highly
undesireable.  If you read through the thread, i think people, came to an
agreement on the last patch i posted, as nobody has objected to it...

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Ian Molton <spyro@f2s.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:05 -07:00
Mel Gorman
fb01439c5b [PATCH] Allow an arch to expand node boundaries
Arch-independent zone-sizing determines the size of a node
(pgdat->node_spanned_pages) based on the physical memory that was
registered by the architecture.  However, when
CONFIG_MEMORY_HOTPLUG_RESERVE is set, the architecture expects that the
spanned_pages will be much larger and that mem_map will be allocated that
is used lated on memory hot-add.

This patch allows an architecture that sets CONFIG_MEMORY_HOTPLUG_RESERVE
to call push_node_boundaries() which will set the node beginning and end to
at *least* the requested boundary.

Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:12 -07:00
Mel Gorman
9c7cd6877c [PATCH] Account for holes that are outside the range of physical memory
absent_pages_in_range() made the assumption that users of the API would not
care about holes beyound the end of physical memory.  This was not the
case.  This patch will account for ranges outside of physical memory as
holes correctly.

Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:11 -07:00
Mel Gorman
0e0b864e06 [PATCH] Account for memmap and optionally the kernel image as holes
The x86_64 code accounted for memmap and some portions of the the DMA zone as
holes.  This was because those areas would never be reclaimed and accounting
for them as memory affects min watermarks.  This patch will account for the
memmap as a memory hole.  Architectures may optionally use set_dma_reserve()
if they wish to account for a portion of memory in ZONE_DMA as a hole.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:11 -07:00
Mel Gorman
5cb248abf5 [PATCH] Have x86_64 use add_active_range() and free_area_init_nodes
Size zones and holes in an architecture independent manner for x86_64.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:11 -07:00
Linus Torvalds
b278240839 Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
  [PATCH] Don't set calgary iommu as default y
  [PATCH] i386/x86-64: New Intel feature flags
  [PATCH] x86: Add a cumulative thermal throttle event counter.
  [PATCH] i386: Make the jiffies compares use the 64bit safe macros.
  [PATCH] x86: Refactor thermal throttle processing
  [PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
  [PATCH] Fix unwinder warning in traps.c
  [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
  [PATCH] x86: Move direct PCI scanning functions out of line
  [PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
  [PATCH] Don't leak NT bit into next task
  [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
  [PATCH] Fix some broken white space in ia32_signal.c
  [PATCH] Initialize argument registers for 32bit signal handlers.
  [PATCH] Remove all traces of signal number conversion
  [PATCH] Don't synchronize time reading on single core AMD systems
  [PATCH] Remove outdated comment in x86-64 mmconfig code
  [PATCH] Use string instructions for Core2 copy/clear
  [PATCH] x86: - restore i8259A eoi status on resume
  [PATCH] i386: Split multi-line printk in oops output.
  ...
2006-09-26 13:07:55 -07:00
Dave McCracken
46a82b2d55 [PATCH] Standardize pxx_page macros
One of the changes necessary for shared page tables is to standardize the
pxx_page macros.  pte_page and pmd_page have always returned the struct
page associated with their entry, while pte_page_kernel and pmd_page_kernel
have returned the kernel virtual address.  pud_page and pgd_page, on the
other hand, return the kernel virtual address.

Shared page tables needs pud_page and pgd_page to return the actual page
structures.  There are very few actual users of these functions, so it is
simple to standardize their usage.

Since this is basic cleanup, I am submitting these changes as a standalone
patch.  Per Hugh Dickins' comments about it, I am also changing the
pxx_page_kernel macros to pxx_page_vaddr to clarify their meaning.

Signed-off-by: Dave McCracken <dmccr@us.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:51 -07:00
Christoph Lameter
776ed98b84 [PATCH] reduce MAX_NR_ZONES: remove two strange uses of MAX_NR_ZONES
I keep seeing zones on various platforms that are never used and wonder why we
compile support for them into the kernel.  Counters show up for HIGHMEM and
DMA32 that are alway zero.

This patch allows the removal of ZONE_DMA32 for non x86_64 architectures and
it will get rid of ZONE_HIGHMEM for arches not using highmem (like 64 bit
architectures).  If an arch does not define CONFIG_HIGHMEM then ZONE_HIGHMEM
will not be defined.  Similarly if an arch does not define CONFIG_ZONE_DMA32
then ZONE_DMA32 will not be defined.

No current architecture uses all the 4 zones (DMA,DMA32,NORMAL,HIGH) that we
have now.  The patchset will reduce the number of zones for all platforms.

On many platforms that do not have DMA32 or HIGHMEM this will reduce the
number of zones by 50%.  F.e.  ia64 only uses DMA and NORMAL.

Large amounts of memory can be saved for larger systemss that may have a few
hundred NUMA nodes.

With ZONE_DMA32 and ZONE_HIGHMEM support optional MAX_NR_ZONES will be 2 for
many non i386 platforms and even for i386 without CONFIG_HIGHMEM set.

Tested on ia64, x86_64 and on i386 with and without highmem.

The patchset consists of 11 patches that are following this message.

One could go even further than this patchset and also make ZONE_DMA optional
because some platforms do not need a separate DMA zone and can do DMA to all
of memory.  This could reduce MAX_NR_ZONES to 1.  Such a patchset will
hopefully follow soon.

This patch:

Fix strange uses of MAX_NR_ZONES

Sometimes we use MAX_NR_ZONES - x to refer to a zone.  Make that explicit.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:46 -07:00
Andi Kleen
0637a70a5d [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
Some buggy systems can machine check when config space accesses
happen for some non existent devices.  i386/x86-64 do some early
device scans that might trigger this. Allow pci=noearly to disable
this. Also when type 1 is disabling also don't do any early
accesses which are always type1.

This moves the pci= configuration parsing to be a early parameter.
I don't think this can break anything because it only changes
a single global that is only used by PCI.

Cc: gregkh@suse.de
Cc: Trammell Hudson <hudson@osresearch.net>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
5e6b0bfe5b [PATCH] Use proper accessors to change PSE bits in change_page_attr()
Use normal pte accessors in change_page_attr() to access the PSE
bits.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
df992848f5 [PATCH] Fix pte_exec/mkexec and use it in change_page_attr()
Fix the pte_exec/mkexec page table accessor functions to really
use the NX bit. Previously they only checked the USER bit, but
weren't actually used for anything.

Then use them in change_page_attr() to manipulate the NX bit
properly.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
d3cf7f0615 [PATCH] Remove bogus warning from early_ioremap
It is correct for its only caller right now, but not for possible
future others.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andrew Morton
1164c9994f [PATCH] make numa_emulation() __init
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Keith Mannthey
6ad9165811 [PATCH] x86_64 kernel mapping fix
Fix for the x86_64 kernel mapping code.  Without this patch the update path
only inits one pmd_page worth of memory and tramples any entries on it.  now
the calling convention to phys_pmd_init and phys_init is to always pass a
[pmd/pud] page not an offset within a page.

Signed-off-by: Keith Mannthey<kmannth@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-26 10:52:36 +02:00
Andi Kleen
273819a2d9 [PATCH] make fault notifier unconditional and export it
It's needed for external debuggers and overhead is very small.

Also make the actual notifier chain they use static

Cc: jbeulich@novell.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:35 +02:00
Andi Kleen
dd2994f619 [PATCH] Add sparse annotations to quiet sparse in arch/x86_64/mm/fault.c
Fixes

linux/arch/x86_64/mm/fault.c:125:7: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/mm/fault.c:125:7:    expected void [noderef] *<noident><asn:1>
linux/arch/x86_64/mm/fault.c:125:7:    got unsigned char *[assigned] instr
linux/arch/x86_64/mm/fault.c:163:8: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/mm/fault.c:163:8:    expected void [noderef] *<noident><asn:1>
linux/arch/x86_64/mm/fault.c:163:8:    got unsigned char *[assigned] instr
linux/arch/x86_64/mm/fault.c:179:9: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/mm/fault.c:179:9:    expected void [noderef] *<noident><asn:1>
linux/arch/x86_64/mm/fault.c:179:9:    got unsigned long *<noident>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
c31fbb1ad8 [PATCH] Clean up acpi_numa variable
Move it into srat.c No need to clutter up setup.c for it

And remove use in setup.c completely - it only guarded a printk
which can be done unconditionally.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
2c8c0e6b8d [PATCH] Convert x86-64 to early param
Instead of hackish manual parsing

Requires earlier i386 patchkit, but also fixes i386 early_printk again.

I removed some obsolete really early parameters which didn't do anything useful.
Also made a few parameters that needed it early (mostly oops printing setup)

Also removed one panic check that wasn't visible without
early console anyways (the early console is now initialized after that
panic)

This cleans up a lot of code.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Jan Beulich
caff0710eb [PATCH] initialize end of memory variables as early as possible
While an earlier patch already did a small step into that direction,
this patch moves initialization of all memory end variables to as
early as possible, so that dependent code doesn't need to check
whether these variables have already been set.

Also, remove a misleading (perhaps just outdated) comment, and make
static a variable only used in a single file.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:31 +02:00
Ingo Molnar
3ac94932a2 [PATCH] lockdep: beautify x86_64 stacktraces
Beautify x86_64 stacktraces to be more readable.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:02 -07:00
Heiko Carstens
a581c2a469 [PATCH] add __[start|end]_rodata sections to asm-generic/sections.h
Add __start_rodata and __end_rodata to sections.h to avoid extern
declarations.  Needed by s390 code (see following patch).

[akpm@osdl.org: update architectures]
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:56:03 -07:00
Jörn Engel
6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Adrian Bunk
80f7228b59 typo fixes: occuring -> occurring
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 18:27:16 +02:00
Randy Dunlap
c9cf55285e [PATCH] add poison.h and patch primary users
Localize poison values into one header file for better documentation and
easier/quicker debugging and so that the same values won't be used for
multiple purposes.

Use these constants in core arch., mm, driver, and fs code.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:38 -07:00
Yasunori Goto
bc02af93dd [PATCH] pgdat allocation for new node add (specify node id)
Change the name of old add_memory() to arch_add_memory.  And use node id to
get pgdat for the node at NODE_DATA().

Note: Powerpc's old add_memory() is defined as __devinit. However,
      add_memory() is usually called only after bootup.
      I suppose it may be redundant. But, I'm not well known about powerpc.
      So, I keep it. (But, __meminit is better at least.)

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:35 -07:00
Linus Torvalds
81a07d7588 Merge branch 'x86-64'
* x86-64: (83 commits)
  [PATCH] x86_64: x86_64 stack usage debugging
  [PATCH] x86_64: (resend) x86_64 stack overflow debugging
  [PATCH] x86_64: msi_apic.c build fix
  [PATCH] x86_64: i386/x86-64 Add nmi watchdog support for new Intel CPUs
  [PATCH] x86_64: Avoid broadcasting NMI IPIs
  [PATCH] x86_64: fix apic error on bootup
  [PATCH] x86_64: enlarge window for stack growth
  [PATCH] x86_64: Minor string functions optimizations
  [PATCH] x86_64: Move export symbols to their C functions
  [PATCH] x86_64: Standardize i386/x86_64 handling of NMI_VECTOR
  [PATCH] x86_64: Fix modular pc speaker
  [PATCH] x86_64: remove sys32_ni_syscall()
  [PATCH] x86_64: Do not use -ffunction-sections for modules
  [PATCH] x86_64: Add cpu_relax to apic_wait_icr_idle
  [PATCH] x86_64: adjust kstack_depth_to_print default
  [PATCH] i386/x86-64: adjust /proc/interrupts column headings
  [PATCH] x86_64: Fix race in cpu_local_* on preemptible kernels
  [PATCH] x86_64: Fix fast check in safe_smp_processor_id
  [PATCH] x86_64: x86_64 setup.c - printing cmp related boottime information
  [PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status
  ...

Manual resolve of trivial conflict in arch/i386/kernel/Makefile
2006-06-26 10:51:09 -07:00
Chuck Ebbert
03fdc2c277 [PATCH] x86_64: enlarge window for stack growth
Allow stack growth so the 'enter' instruction works.  Also
fixes problem in compat_sys_kexec_load() which could allocate
more than 128 bytes using compat_alloc_user_space().

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:22 -07:00
Andi Kleen
2ee60e1789 [PATCH] x86_64: Move export symbols to their C functions
Only exports for assembler files are left in x8664_ksyms.c

Originally inspired by a patch from Al Viro
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:22 -07:00
Jan Beulich
5f51e139d8 [PATCH] x86_64: miscellaneous mm/init.c fixes
- fix an off-by-one error in phys_pmd_init()
 - prevent phys_pmd_init() from removing mappings established earlier
 - fix the direct mapping early printk to in fact show the end of the range
 - remove an apparently orphan comment

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:20 -07:00
Jon Mason
0dc243ae10 [PATCH] x86_64: Calgary IOMMU - IOMMU abstractions
This patch creates a new interface for IOMMUs by adding a centralized
location for IOMMU allocation (for translation tables/apertures) and
IOMMU initialization.  In creating these, code was moved around for
abstraction, uniformity, and consiceness.

Take note of the move of the iommu_setup bootarg parsing code to
__setup.  This is enabled by moving back the location of the aperture
allocation/detection to mem init (which while ugly, was already the
location of the swiotlb_init).

While a slight departure from the previous patch, I belive this provides
the true intention of the previous versions of the patch which changed
this code.  It also makes the addition of the upcoming calgary code much
cleaner than previous patches.

[AK: Removed one broken change. iommu_setup still has to be called
early]

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:18 -07:00
Andi Kleen
d2ae5b5f6a [PATCH] x86_64: Get rid of pud_offset_k / __pud_offset_k
pud_offset_k() equivalent to pud_offset() now.  Pointed out by Jan Beulich
Similar for __pud_offset_ok, which needs a small change in the callers.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:18 -07:00