Commit Graph

461 Commits

Author SHA1 Message Date
Al Viro
6450578f32 [PATCH] ia64: task_pt_regs()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:58 -08:00
Al Viro
ab03591db1 [PATCH] ia64: task_thread_info()
on ia64 thread_info is at the constant offset from task_struct and stack
is embedded into the same beast.  Set __HAVE_THREAD_FUNCTIONS, made
task_thread_info() just add a constant.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:58 -08:00
akpm@osdl.org
198e2f1811 [PATCH] scheduler cache-hot-autodetect
)

From: Ingo Molnar <mingo@elte.hu>

This is the latest version of the scheduler cache-hot-auto-tune patch.

The first problem was that detection time scaled with O(N^2), which is
unacceptable on larger SMP and NUMA systems. To solve this:

- I've added a 'domain distance' function, which is used to cache
  measurement results. Each distance is only measured once. This means
  that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT
  distances 0 and 1, and on SMP distance 0 is measured. The code walks
  the domain tree to determine the distance, so it automatically follows
  whatever hierarchy an architecture sets up. This cuts down on the boot
  time significantly and removes the O(N^2) limit. The only assumption
  is that migration costs can be expressed as a function of domain
  distance - this covers the overwhelming majority of existing systems,
  and is a good guess even for more assymetric systems.

  [ People hacking systems that have assymetries that break this
    assumption (e.g. different CPU speeds) should experiment a bit with
    the cpu_distance() function. Adding a ->migration_distance factor to
    the domain structure would be one possible solution - but lets first
    see the problem systems, if they exist at all. Lets not overdesign. ]

Another problem was that only a single cache-size was used for measuring
the cost of migration, and most architectures didnt set that variable
up. Furthermore, a single cache-size does not fit NUMA hierarchies with
L3 caches and does not fit HT setups, where different CPUs will often
have different 'effective cache sizes'. To solve this problem:

- Instead of relying on a single cache-size provided by the platform and
  sticking to it, the code now auto-detects the 'effective migration
  cost' between two measured CPUs, via iterating through a wide range of
  cachesizes. The code searches for the maximum migration cost, which
  occurs when the working set of the test-workload falls just below the
  'effective cache size'. I.e. real-life optimized search is done for
  the maximum migration cost, between two real CPUs.

  This, amongst other things, has the positive effect hat if e.g. two
  CPUs share a L2/L3 cache, a different (and accurate) migration cost
  will be found than between two CPUs on the same system that dont share
  any caches.

(The reliable measurement of migration costs is tricky - see the source
for details.)

Furthermore i've added various boot-time options to override/tune
migration behavior.

Firstly, there's a blanket override for autodetection:

	migration_cost=1000,2000,3000

will override the depth 0/1/2 values with 1msec/2msec/3msec values.

Secondly, there's a global factor that can be used to increase (or
decrease) the autodetected values:

	migration_factor=120

will increase the autodetected values by 20%. This option is useful to
tune things in a workload-dependent way - e.g. if a workload is
cache-insensitive then CPU utilization can be maximized by specifying
migration_factor=0.

I've tested the autodetection code quite extensively on x86, on 3
P3/Xeon/2MB, and the autodetected values look pretty good:

Dual Celeron (128K L2 cache):

 ---------------------
 migration cost matrix (max_cache_size: 131072, cpu: 467 MHz):
 ---------------------
           [00]    [01]
 [00]:     -     1.7(1)
 [01]:   1.7(1)    -
 ---------------------
 cacheflush times [2]: 0.0 (0) 1.7 (1784008)
 ---------------------

Here the slow memory subsystem dominates system performance, and even
though caches are small, the migration cost is 1.7 msecs.

Dual HT P4 (512K L2 cache):

 ---------------------
 migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz):
 ---------------------
           [00]    [01]    [02]    [03]
 [00]:     -     0.4(1)  0.0(0)  0.4(1)
 [01]:   0.4(1)    -     0.4(1)  0.0(0)
 [02]:   0.0(0)  0.4(1)    -     0.4(1)
 [03]:   0.4(1)  0.0(0)  0.4(1)    -
 ---------------------
 cacheflush times [2]: 0.0 (33900) 0.4 (448514)
 ---------------------

Here it can be seen that there is no migration cost between two HT
siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory
system makes inter-physical-CPU migration pretty cheap: 0.4 msecs.

8-way P3/Xeon [2MB L2 cache]:

 ---------------------
 migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
 ---------------------
           [00]    [01]    [02]    [03]    [04]    [05]    [06]    [07]
 [00]:     -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
 [01]:  19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
 [02]:  19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
 [03]:  19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1)
 [04]:  19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1)
 [05]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1)
 [06]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1)
 [07]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -
 ---------------------
 cacheflush times [2]: 0.0 (0) 19.2 (19281756)
 ---------------------

This one has huge caches and a relatively slow memory subsystem - so the
migration cost is 19 msecs.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: <wilder@us.ibm.com>
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:50 -08:00
Ingo Molnar
4dc7a0bbeb [PATCH] sched: add cacheflush() asm
Add per-arch sched_cacheflush() which is a write-back cacheflush used by
the migration-cost calibration code at bootup time.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:49 -08:00
Randy Dunlap
a941564458 [PATCH] capable/capability.h (arch/)
arch: Use <linux/capability.h> where capable() is used.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 18:42:14 -08:00
Keshavamurthy Anil S
eb3a72921c [PATCH] kprobes: fix race in recovery of reentrant probe
There is a window where a probe gets removed right after the probe is hit
on some different cpu.  In this case probe handlers can't find a matching
probe instance related to break address.  In this case we need to read the
original instruction at break address to see if that is not a break/int3
instruction and recover safely.

Previous code had a bug where we were not checking for the above race in
case of reentrant probes and the below patch fixes this race.

Tested on IA64, Powerpc, x86_64.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 18:42:12 -08:00
Linus Torvalds
ab396e91bf Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/sam/kbuild
Fix up some trivial conflicts in {i386|ia64}/Makefile
2006-01-10 08:21:33 -08:00
Anil S Keshavamurthy
e597c2984c [PATCH] kprobes: arch_remove_kprobe
Currently arch_remove_kprobes() is only implemented/required for x86_64 and
powerpc.  All other architecture like IA64, i386 and sparc64 implementes a
dummy function which is being called from arch independent kprobes.c file.

This patch removes the dummy functions and replaces it with
#define arch_remove_kprobe(p, s)	do { } while(0)

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:40 -08:00
Christoph Hellwig
e6a6d2efcb [PATCH] sanitize building of fs/compat_ioctl.c
Now that all these entries in the arch ioctl32.c files are gone [1], we can
build fs/compat_ioctl.c as a normal object and kill tons of cruft.  We need a
special do_ioctl32_pointer handler for s390 so the compat_ptr call is done.
This is not needed but harmless on all other architectures.  Also remove some
superflous includes in fs/compat_ioctl.c

Tested on ppc64.

[1] parisc still had it's PPP handler left, which is not fully correct
    for ppp and besides that ppp uses the generic SIOCPRIV ioctl so it'd
    kick in for all netdevice users.  We can introduce a proper handler
    in one of the next patch series by adding a compat_ioctl method to
    struct net_device but for now let's just kill it - parisc doesn't
    compile in mainline anyway and I don't want this to block this
    patchset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:33 -08:00
Christoph Hellwig
3a0f69d59b [PATCH] common compat_sys_timer_create
The comment in compat.c is wrong, every architecture provides a
get_compat_sigevent() for the IPC compat code already.

This basically moves the x86_64 version to common code and removes all the
others.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:32 -08:00
Bjorn Helgaas
80851ef2a5 [PATCH] /dev/mem: validate mmap requests
Add a hook so architectures can validate /dev/mem mmap requests.

This is analogous to validation we already perform in the read/write
paths.

The identity mapping scheme used on ia64 requires that each 16MB or
64MB granule be accessed with exactly one attribute (write-back or
uncacheable).  This avoids "attribute aliasing", which can cause a
machine check.

Sample problem scenario:
  - Machine supports VGA, so it has uncacheable (UC) MMIO at 640K-768K
  - efi_memmap_init() discards any write-back (WB) memory in the first granule
  - Application (e.g., "hwinfo") mmaps /dev/mem, offset 0
  - hwinfo receives UC mapping (the default, since memmap says "no WB here")
  - Machine check abort (on chipsets that don't support UC access to WB
    memory, e.g., sx1000)

In the scenario above, the only choices are
  - Use WB for hwinfo mmap.  Can't do this because it causes attribute
    aliasing with the UC mapping for the VGA MMIO space.
  - Use UC for hwinfo mmap.  Can't do this because the chipset may not
    support UC for that region.
  - Disallow the hwinfo mmap with -EINVAL.  That's what this patch does.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:02 -08:00
Andrew Morton
a136564702 [PATCH] remove gcc-2 checks
Remove various things which were checking for gcc-1.x and gcc-2.x compilers.

From: Adrian Bunk <bunk@stusta.de>

    Some documentation updates and removes some code paths for gcc < 3.2.

Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:02 -08:00
Christoph Hellwig
6b9c7ed848 [PATCH] use ptrace_get_task_struct in various places
The ptrace_get_task_struct() helper that I added as part of the ptrace
consolidation is useful in variety of places that currently opencode it.
Switch them to the common helpers.

Add a ptrace_traceme() helper that needs to be explicitly called, and simplify
the ptrace_get_task_struct() interface.  We don't need the request argument
now, and we return the task_struct directly, using ERR_PTR() for error
returns.  It's a bit more code in the callers, but we have two sane routines
that do one thing well now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:51 -08:00
Christoph Lameter
39743889aa [PATCH] Swap Migration V5: sys_migrate_pages interface
sys_migrate_pages implementation using swap based page migration

This is the original API proposed by Ray Bryant in his posts during the first
half of 2005 on linux-mm@kvack.org and linux-kernel@vger.kernel.org.

The intent of sys_migrate is to migrate memory of a process.  A process may
have migrated to another node.  Memory was allocated optimally for the prior
context.  sys_migrate_pages allows to shift the memory to the new node.

sys_migrate_pages is also useful if the processes available memory nodes have
changed through cpuset operations to manually move the processes memory.  Paul
Jackson is working on an automated mechanism that will allow an automatic
migration if the cpuset of a process is changed.  However, a user may decide
to manually control the migration.

This implementation is put into the policy layer since it uses concepts and
functions that are also needed for mbind and friends.  The patch also provides
a do_migrate_pages function that may be useful for cpusets to automatically
move memory.  sys_migrate_pages does not modify policies in contrast to Ray's
implementation.

The current code here is based on the swap based page migration capability and
thus is not able to preserve the physical layout relative to it containing
nodeset (which may be a cpuset).  When direct page migration becomes available
then the implementation needs to be changed to do a isomorphic move of pages
between different nodesets.  The current implementation simply evicts all
pages in source nodeset that are not in the target nodeset.

Patch supports ia64, i386 and x86_64.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:12:42 -08:00
Sam Ravnborg
ad14336de8 kbuild: remove GCC_VERSION
This was causing some ordering problems.  Remove the up-front evaluation
and just revaluate the compiler version each time we need it.

(The up-front evaluation was problematic because some architectures modify
the value of $(CC)).

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2006-01-08 19:58:51 +01:00
Tony Luck
38c0b2c2aa [IA64] Fix compile warnings in setup.c
arch/ia64/kernel/setup.c: In function `show_cpuinfo':
arch/ia64/kernel/setup.c:576: warning: long unsigned int format, different type arg (arg 12)
arch/ia64/kernel/setup.c:576: warning: long unsigned int format, different type arg (arg 13)

Introduced by 95235ca2c2

Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-05 13:30:52 -08:00
Tony Luck
5c3eee7912 Auto-update from upstream 2006-01-05 08:52:11 -08:00
Linus Torvalds
db9edfd7e3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
Trivial manual merge fixup for usb_find_interface clashes.
2006-01-04 18:44:12 -08:00
Linus Torvalds
0356dbb7fe Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq 2006-01-04 16:21:26 -08:00
Paul Jackson
6d20b035de [PATCH] driver kill hotplug word from sn and others fix
The first of these changes s/hotplug/uevent/ was needed to
compile sn2_defconfig (ia64/sn).  The other three files
changed are blind changes of all remaining bus_type.hotplug
references I could find to bus_type.uevent.

This patch attempts to finish similar changes made in the
gregkh-driver-kill-hotplug-word-from-driver-core Nov 22 patch.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-04 16:18:08 -08:00
Alex Williamson
408045afbd [IA64] incorrect return from ia64_pci_legacy_write()
The function ia64_pci_legacy_write() returns 0 for everything
except errors.  This return value gets sent back to the user from
pci_write_legacy_io(), making it look like every write fails.  The trivial
patch below copies the behavior of the SGI sn machvec and does what
would be expected from something implementing a write() function.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-03 11:16:17 -08:00
Christoph Lameter
dc86e88c2b [IA64] Add __read_mostly support for IA64
sparc64, i386 and x86_64 have support for a special data section dedicated
to rarely updated data that is frequently read. The section was created to
avoid false sharing of those rarely read data with frequently written kernel
data.

This patch creates such a data section for ia64 and will group rarely written
data into this section.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-16 10:52:46 -08:00
hawkes@sgi.com
d5bf3165b6 [IA64-SGI] change default_sn2 to NR_CPUS==1024
Change the NR_CPUS default for ia64/sn up to 1024.

Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: John Hesterberg <jh@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-16 10:51:29 -08:00
Jack Steiner
d74700e604 [IA64-SGI] Missed TLB flush
I see why the problem exists only on SN. SN uses a different hardware
mechanism to purge TLB entries across nodes.

It looks like there is a bug in the SN TLB flushing code. During context switch,
kernel threads inherit the mm of the task that was previously running on the
cpu. This confuses the code in sn2_global_tlb_purge().

The result is a missed TLB purge for the task that owns the "borrowed" mm.

(I hit the problem running heavy stress where kswapd was purging code pages of
a user task that woke kswapd. The user task took a SIGILL fault trying to
execute code in the page that had been ripped out from underneath it).

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-16 10:46:25 -08:00
Jes Sorensen
3bd7f01713 [IA64] uncached ref count leak
Use raw_smp_processor_id() instead of get_cpu() as we don't need the
extra features of get_cpu().

Signed-off-by: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-16 10:29:52 -08:00
John Hawkes
f5899b5d4f [IA64] disable preemption in udelay()
The udelay() inline for ia64 uses the ITC.  If CONFIG_PREEMPT is enabled
and the platform has unsynchronized ITCs and the calling task migrates
to another CPU while doing the udelay loop, then the effective delay may
be too short or very, very long.

This patch disables preemption around 100 usec chunks of the overall
desired udelay time.  This minimizes preemption-holdoffs.

udelay() is now too big to be inline, move it out of line and export it.

Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-16 10:00:24 -08:00
Al Viro
b3e5b5b227 [PATCH] ia64 sn __iomem annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-15 10:04:31 -08:00
Robin Holt
27af4cfd11 [IA64] fix for SET_PERSONALITY when CONFIG_IA32_SUPPORT is not set.
Missed this when fixing the SET_PERSONALITY change.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-14 08:52:57 -08:00
Linus Torvalds
0e67050666 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 2005-12-12 16:48:29 -08:00
Yasunori Goto
ff9569bc55 [PATCH] Fix Kconfig of DMA32 for ia64
I realized ZONE_DMA32 has a trivial bug at Kconfig for ia64.  In
include/linux/gfp.h on 2.6.15-rc5-mm1, CONFIG is define like followings.

#ifdef CONFIG_DMA_IS_DMA32
#define __GFP_DMA32	((__force gfp_t)0x01)	/* ZONE_DMA is ZONE_DMA32
*/
       :
       :

So, CONFIG_"ZONE"_DMA_IS_DMA32 is clearly wrong.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12 08:57:45 -08:00
Keshavamurthy Anil S
bf8d5c52c3 [PATCH] kprobes: increment kprobe missed count for multiprobes
When multiple probes are registered at the same address and if due to some
recursion (probe getting triggered within a probe handler), we skip calling
pre_handlers and just increment nmissed field.

The below patch make sure it walks the list for multiple probes case.
Without the below patch we get incorrect results of nmissed count for
multiple probe case.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12 08:57:45 -08:00
Venkatesh Pallipadi
95235ca2c2 [CPUFREQ] CPU frequency display in /proc/cpuinfo
What is the value shown in "cpu MHz" of /proc/cpuinfo when CPUs are capable of
changing frequency?

Today the answer is: It depends.
On i386:
SMP kernel - It is always the boot frequency
UP kernel - Scales with the frequency change and shows that was last set.

On x86_64:
There is one single variable cpu_khz that gets written by all the CPUs. So,
the frequency set by last CPU will be seen on /proc/cpuinfo of all the
CPUs in the system. What you see also depends on whether you have constant_tsc
capable CPU or not.

On ia64:
It is always boot time frequency of a particular CPU that gets displayed.

The patch below changes this to:
Show the last known frequency of the particular CPU, when cpufreq is present. If
cpu doesnot support changing of frequency through cpufreq, then boot frequency
will be shown. The patch affects i386, x86_64 and ia64 architectures.

Signed-off-by: Venkatesh Pallipadi<venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-12-06 19:35:11 -08:00
Jack Steiner
590711b7dd [IA64-SGI] Fix SN PTC deadlock recovery
The patch that added support for a new platform chipset (shub2) broke
PTC deadlock recovery on older versions of the chipset. (PTCs are the
SN platform-specific method for doing a global TLB purge). This
patch fixes deadlock recovery so that it works on both the old & new
chipsets.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-06 09:13:42 -08:00
Robin Holt
bd1d6e2451 [IA64] Change SET_PERSONALITY to comply with comment in binfmt_elf.c.
We have a customer application which trips a bug.  The problem arises
when a driver attempts to call do_munmap on an area which is mapped, but
because current->thread.task_size has been set to 0xC0000000, the call
to do_munmap fails thinking it is an unmap beyond the user's address
space.

The comment in fs/binfmt_elf.c in load_elf_library() before the call
to SET_PERSONALITY() indicates that task_size must not be changed for
the running application until flush_thread, but is for ia64 executing
ia32 binaries.

This patch moves the setting of task_size from SET_PERSONALITY() to
flush_thread() as indicated.  The customer application no longer is able
to trip the bug.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-06 09:12:34 -08:00
Jack Steiner
acb7f67280 [IA64] Limit the maximum NODEDATA_ALIGN() offset
The per-node data structures are allocated with strided offsets that are a
function of the node number. This prevents excessive cache-aliasing from
occurring.

On systems with a large number of nodes, the strided offset becomes
too large. This patch restricts the maximum offset to 32MB. This is far larger
than the size of any current L3 cache.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-06 09:10:37 -08:00
John Keller
3ec829b689 [IA64-SGI] altix: pci_window fixup
Altix only patch to add fixup code that sets up
pci_controller->window. This code is a temporary
fix until ACPI support on Altix is added.

Also, corrects the usage of pci_dev->sysdata,
which had previously been used to reference
platform specific device info, to now point to
a pci_controller struct.

Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-06 09:09:23 -08:00
Keith Owens
05f70395c6 [IA64] Allow salinfo_decode to detect signals on read
Return -EINTR instead of -ERESTARTSYS when signals are delivered during
a blocked read of /proc/sal/*/event.  This allows salinfo_decode to
detect signals when it is blocked on a read of those files.

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-05 11:49:17 -08:00
Tony Luck
885da19e80 [IA64] refresh tiger_defconfig ready for 2.6.15
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-02 16:18:42 -08:00
Robin Holt
1448652a54 [IA64] Updates to the sn2_defconfig for 2.6.15.
This updates the sn2_defconfig file for the Altix 330 hardware, enables
the AGP graphics for the SGI Prism, and removes prompts for the remainder
of the new features.  Greg Edwards reviewed the changes.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Greg Edwards <edwardsg@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-02 13:32:44 -08:00
Keshavamurthy Anil S
5a94bcfd2a [IA64] Remove getting break_num by decoding instruction
break.b always sets cr.iim to 0 and the current code tries to
get the break_num by decoding instruction. However, their
seems to be a race condition while reading the regs->cr_iip,
as on other cpu the break.b at regs->cr_iip might have been
replaced with the original instruction as a result of
unregister_kprobe() and hence decoding instruction to
obtain break_num will result in wrong value in this case.

Also includes changes to kprobes.c which now has to handle
break number zero.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-29 09:24:39 -08:00
Dean Roe
b77dae5293 [IA64] - Make pfn_valid more precise for SGI Altix systems
A single SGI Altix system can be divided into multiple partitions,
each running their own instance of the Linux kernel.  pfn_valid()
is currently not optimal for any but the first partition, since it
does not compare the pfn with min_low_pfn before calling the more
costly ia64_pfn_valid().

Signed-off-by: Dean Roe <roe@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-29 09:24:10 -08:00
Jim Keniston
8bf1101bd5 [PATCH] kprobes: Fix return probes on sys_execve
Fix a bug in kprobes that can cause an Oops or even a crash when a return
probe is installed on one of the following functions: sys_execve,
do_execve, load_*_binary, flush_old_exec, or flush_thread.  The fix is to
remove the call to kprobe_flush_task() in flush_thread().  This fix has
been tested on all architectures for which the return-probes feature has
been implemented (i386, x86_64, ppc64, ia64).  Please apply.

BACKGROUND

Up to now, we have called kprobe_flush_task() under two situations: when a
task exits, and when it execs.  Flushing kretprobe_instances on exit is
correct because (a) do_exit() doesn't return, and (b) one or more
return-probed functions may be active when a task calls do_exit().  Neither
is the case for sys_execve() and its callees.

Initially, the mistaken call to kprobe_flush_task() on exec was harmless
because we put the "real" return address of each active probed function
back in the stack, just to be safe, when we recycled its
kretprobe_instance.  When support for ppc64 and ia64 was added, this safety
measure couldn't be employed, and was eventually dropped even for i386 and
x86_64.  sys_execve() and its callees were informally blacklisted for
return probes until this fix was developed.

Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23 16:08:39 -08:00
Russ Anderson
ab2ff46a2d [IA64-SGI] bte_copy nasid_index fix
The nasid_index was not being incremented if the
pointer was null, causing an infinite loop.

Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-21 14:19:36 -08:00
hawkes@sgi.com
090de0b77c [IA64] fix bug in sn/ia64 for sparse CPU numbering
The kernel's use of the for_each_*cpu(i) macros has allowed for sparse CPU
numbering.  When I hacked the kernel to test sparse cpu_present_map[] and
cpu_possible_map[] cpumasks, I discovered one remaining spot, in
sn_hwperf_ioctl() during sn initialization, that needs to be fixed.

Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Dean Roe <roe@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-21 14:15:42 -08:00
Prarit Bhargava
9ad4f924ec [IA64] Prevent sn2 ptc code from executing on all ia64 subarches
Patch to prevent sn2_ptc_init code from attempting to load on non-sn2 systems
when sn2_smp.c is built-in to generic kernel.

Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-21 14:13:31 -08:00
Chen, Kenneth W
e8aabc4716 [IA64] polish comments for tlb fault handler in ivt.S
Polish the comments specifically in vhpt_miss and nested_dtlb_miss
handlers.  I think it's better to explicitly name each page table
level with its name instead of numerically name them.  i.e., use
pgd, pud, pmd, and pte instead of referring as L1, L2, L3 etc.
Along the line, remove some magic number in the comments like:
"PTA + (((IFA(61,63) << 7) | IFA(33,39))*8)".  No code change at
all, pure comment update.  Feel free to shoot anything you have,
darts or tomahawk cruise missile.  I will duck behind a bunker ;-)

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-17 09:48:15 -08:00
Chen, Kenneth W
fedb25fae7 [IA64] 4 level page table bug fix in vhpt_miss
From source code inspection, I think there is a bug with 4 level
page table with vhpt_miss handler.  In the code path of rechecking
page table entry against previously read value after tlb insertion,
*pte value in register r18 was overwritten with value newly read
from pud pointer, render the check of new *pte against previous
*pte completely wrong.  Though the bug is none fatal and the penalty
is to purge the entry and retry.  For functional correctness, it
should be fixed.  The fix is to use a different register so new
*pud don't trash *pte.  (btw, the comments in the cmp statement is
wrong as well, which I will address in the next patch).

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-17 09:47:18 -08:00
Chen, Kenneth W
1e185b97b4 [PATCH] ia64: cpu_idle performance bug fix
Our performance validation on 2.6.15-rc1 caught a disastrous performance
regression on ia64 with netperf (-98%) and volanomark (-58%) compares to
previous kernel version 2.6.14-git7.  See the following chart (result
group 1 & 2).

  http://kernel-perf.sourceforge.net/results.machine_id=26.html

We have root caused it to commit 64c7c8f885

This changeset broke the ia64 task resched notification.  In
sched.c:resched_task(), a reschedule IPI is conditioned upon
TIF_POLLING_NRFLAG.  However, the above changeset unconditionally set
the polling thread flag for idle tasks regardless whether pal_halt_light
is in use or not.  As a result, resched IPI is not sent from
resched_task().  And since the default behavior on ia64 is to use
pal_halt_light, we end up delaying the rescheduling task until next
timer tick, and thus cause the performance regression.

This fixes the performance bug.  I'm glad our performance suite is
turning up bad performance bug like this in time.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-15 15:50:51 -08:00
Linus Torvalds
4060994c3e Merge x86-64 update from Andi 2005-11-14 19:56:02 -08:00
Andi Kleen
d1e3dfdc2c [PATCH] x86_64: Set compatibility flag for 4GB zone on IA64
IA64 traditionally had a 4GB DMA32 zone. Set the compatibility flag
to keep old drivers working.

For new drivers it would be better to use ZONE_DMA32 now.

Cc: tony.luck@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:13 -08:00