linux-kernel-test

Author	SHA1	Message	Date
Nick Piggin	cd54e7e543	[PATCH] mm: incorrect VM_FAULT_OOM returns from drivers Some drivers are returning OOM when it is not in response to a memory shortage. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Dave Airlie <airlied@linux.ie> Cc: Jaroslav Kysela <perex@suse.cz> Cc: Takashi Iwai <tiwai@suse.de> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Nick Piggin	f2a2a7108a	[PATCH] oom: less memdie Don't cause all threads in all other thread groups to gain TIF_MEMDIE; otherwise we'll get a thundering herd eating our memory reserve. This may not be the optimal scheme, but it fits our policy of allowing just one TIF_MEMDIE in the system at once. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Nick Piggin	f3af38d30c	[PATCH] oom: cleanup messages Clean up the OOM killer messages to be more consistent. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Nick Piggin	c33e0fca35	[PATCH] oom: don't kill unkillable children or siblings Abort the kill if any of our threads have OOM_DISABLE set. Having this test here also prevents any OOM_DISABLE child of the "selected" process from being killed. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Paul Jackson	7253f4ef04	[PATCH] memory page_alloc zonelist caching reorder structure Rearrange the struct members in the 'struct zonelist_cache' structure, so as to put the readonly (once initialized) z_to_n[] array first, where it will come right after the zones[] array in struct zonelist. This pretty much eliminates the chance that the two frequently written elements of 'struct zonelist_cache', the fullzones bitmap and last_full_zap times, will end up on the same cache line as the performance sensitive, frequently read, never (after init) written zones[] array. Keeping frequently written data off frequently read cache lines is good for performance. Thanks to Rohit Seth for the suggestion. Signed-off-by: Paul Jackson <pj@sgi.com> Cc: Rohit Seth <rohitseth@google.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
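The layout idea in the message above can be sketched in C. The struct names, the 64-zone bound and the 64-byte cache line are hypothetical; only the field names z_to_n, fullzones and last_full_zap, and their roles, come from the message. The helper measures how many bytes separate the end of the hot, read-mostly zones[] array from the first frequently written field. If that gap is at least one cache line, the two can never share a line, however the structure happens to be aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ZONES 64  /* hypothetical bound on zones per zonelist */
#define SKETCH_CACHELINE 64  /* hypothetical cache line size in bytes */

struct zone; /* opaque for this sketch */

/* Read-only (after init) data first, frequently written data last. */
struct zonelist_cache_sketch {
	unsigned short z_to_n[SKETCH_MAX_ZONES]; /* zone index -> node id; never written after init */
	uint64_t fullzones;                      /* bit i set: zone i recently found full (written often) */
	unsigned long last_full_zap;             /* tick when fullzones was last cleared (written often) */
};

struct zonelist_sketch {
	struct zonelist_cache_sketch *zlcache_ptr;
	struct zone *zones[SKETCH_MAX_ZONES + 1]; /* hot, read-mostly, NULL-terminated */
	struct zonelist_cache_sketch zlcache;     /* z_to_n[] lands right after zones[] */
};

/* Bytes between the end of zones[] and the start of fullzones. */
static size_t hot_write_gap(void)
{
	size_t zones_end = offsetof(struct zonelist_sketch, zones) +
			   sizeof(((struct zonelist_sketch *)0)->zones);
	size_t fullzones_off = offsetof(struct zonelist_sketch, zlcache) +
			       offsetof(struct zonelist_cache_sketch, fullzones);

	assert(fullzones_off >= zones_end);
	return fullzones_off - zones_end;
}
```

With 64 two-byte z_to_n entries the gap is 128 bytes here, so even the last zones[] pointer and fullzones sit on different 64-byte lines.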
Paul Jackson	9276b1bc96	[PATCH] memory page_alloc zonelist caching speedup Optimize the critical zonelist scanning for free pages in the kernel memory allocator by caching the zones that were found to be full recently, and skipping them. It remembers the zones in a zonelist that were short of free memory in the last second, and it stashes a zone-to-node table in the zonelist struct to optimize that conversion (minimizing its cache footprint). Recent changes: This differs in a significant way from a similar patch that I posted a week ago. Now, instead of having a nodemask_t of recently full nodes, I have a bitmask of recently full zones. This solves a problem that last week's patch had, which on systems with multiple zones per node (such as a DMA zone) would take seeing any of these zones full as meaning that all zones on that node were full. I also changed names, from "zonelist faster" to "zonelist cache", as that seemed to better convey what we're doing here: caching some of the key zonelist state (for faster access). See below for some performance benchmark results. After all that discussion with David on why I didn't need them, I went and got some ;). I wanted to verify that I had not noticeably hurt the normal case of memory allocation. At least for my one little microbenchmark, I found that (1) the normal case wasn't affected, and (2) workloads that forced scanning across multiple nodes for memory improved, with up to 10% fewer system CPU cycles and lower elapsed clock time ('sys' and 'real'). Good. See details below. I didn't have the logic in get_page_from_freelist() for various full nodes and zone reclaim failures correct. That should be fixed up now; notice the new goto labels zonelist_scan, this_zone_full, and try_next_zone in get_page_from_freelist().
There are two reasons I pursued this alternative, over some earlier proposals that would have focused on optimizing the fake numa emulation case by caching the last useful zone: 1) Contrary to what I said before, we (SGI, on large ia64 sn2 systems) have seen real customer loads where the cost to scan the zonelist was a problem, due to many nodes being full of memory before we got to a node we could use. Or at least, I think we have. This was related to me by another engineer, based on experiences from some time past. So this is not guaranteed. Most likely, though. The following approach should help such real numa systems just as much as it helps fake numa systems, or any combination thereof. 2) The effort to distinguish fake from real numa, using node_distance, so that we could cache a fake numa node and optimize choosing it over equivalent distance fake nodes, while continuing to properly scan all real nodes in distance order, was going to require a nasty blob of zonelist and node distance munging. The following approach has no new dependency on node distances or zone sorting. See comment in the patch below for a description of what it actually does. Technical details of note (or controversy): - See the use of "zlc_active" and "did_zlc_setup" below, to delay adding any work for this new mechanism until we've looked at the first zone in the zonelist. I figured the odds of the first zone having the memory we needed were high enough that we should just look there, first, then get fancy only if we need to keep looking. - Some odd hackery was needed to add items to struct zonelist, while not tripping up the custom zonelists built by the mm/mempolicy.c code for MPOL_BIND. My usual wordy comments below explain this. Search for "MPOL_BIND". - Some per-node data in the struct zonelist is now modified frequently, with no locking. Multiple CPU cores on a node could hit and mangle this data.
The theory is that this is just performance hint data, and the memory allocator will work just fine despite any such mangling. The fields at risk are the struct 'zonelist_cache' fields 'fullzones' (a bitmask) and 'last_full_zap' (unsigned long jiffies). It should all be self-correcting after at most a one-second delay. - This still does a linear scan of the same length as before. All I've optimized is making the scan faster, not algorithmically shorter. It is now able to scan a compact array of 'unsigned short' in the case of many full nodes, so one cache line should cover quite a few nodes, rather than each node hitting another one or two new and distinct cache lines. - If both Andi and Nick don't find this too complicated, I will be (pleasantly) flabbergasted. - I removed the comment claiming we only use one cacheline's worth of zonelist. We seem, at least in the fake numa case, to have put the lie to that claim. - I pay no attention to the various watermarks and such in this performance hint. A node could be marked full for one watermark, and then skipped over when searching for a page using a different watermark. I think that's actually quite ok, as it will tend to slightly increase the spreading of memory over other nodes, away from a memory-stressed node. =============== Performance - some benchmark results and analysis: This benchmark runs a memory hog program that uses multiple threads to touch a lot of memory as quickly as it can. Multiple runs were made, touching 12, 38, 64 or 90 GBytes out of the total 96 GBytes on the system, and using 1, 19, 37, or 55 threads (on a 56 CPU system). System, user and real (elapsed) timings were recorded for each run, shown in units of seconds, in the table below. Two kernels were tested - 2.6.18-mm3 and the same kernel with this zonelist caching patch added. The table also shows the percentage by which the zonelist caching sys time improves on (is lower than) that of the stock -mm kernel.
      number      2.6.18-mm3        zonelist-cache      delta (< 0 good)   percent
 GBs       N   -----------------   -----------------   -----------------   systime
 mem threads     sys  user  real     sys  user  real     sys  user  real    better
  12       1     153    24   177     151    24   176      -2     0    -1        1%
  12      19      99    22     8      99    22     8       0     0     0        0%
  12      37     111    25     6     112    25     6       1     0     0       -0%
  12      55     115    25     5     110    23     5      -5    -2     0        4%
  38       1     502    74   576     497    73   570      -5    -1    -6        0%
  38      19     426    78    48     373    76    39     -53    -2    -9       12%
  38      37     544    83    36     547    82    36       3    -1     0       -0%
  38      55     501    77    23     511    80    24      10     3     1       -1%
  64       1     917   125  1042     890   124  1014     -27    -1   -28        2%
  64      19    1118   138   119     965   141   103    -153     3   -16       13%
  64      37    1202   151    94    1136   150    81     -66    -1   -13        5%
  64      55    1118   141    61    1072   140    58     -46    -1    -3        4%
  90       1    1342   177  1519    1275   174  1450     -67    -3   -69        4%
  90      19    2392   199   192    2116   189   176    -276   -10   -16       11%
  90      37    3313   238   175    2972   225   145    -341   -13   -30       10%
  90      55    1948   210   104    1843   213   100    -105     3    -4        5%
Notes:
1) This test ran a memory hog program that started a specified number N of threads, and had each thread allocate and touch 1/N'th of the total memory to be used in the test run in a single loop, writing a constant word to memory, one store every 4096 bytes. Watching this test during some earlier trial runs, I would see each of these threads sit down on one CPU and stay there, for the remainder of the pass, a different CPU for each thread.
2) The 'real' column is not comparable to the 'sys' or 'user' columns. The 'real' column is seconds wall clock time elapsed, from beginning to end of that test pass. The 'sys' and 'user' columns are total CPU seconds spent on that test pass. For a 19 thread test run, for example, the sum of 'sys' and 'user' could be up to 19 times the number of 'real' elapsed wall clock seconds.
3) Tests were run on a fresh, single-user boot, to minimize the amount of memory already in use at the start of the test, and to minimize the amount of background activity that might interfere.
4) Tests were done on a 56 CPU, 28 Node system with 96 GBytes of RAM.
5) Notice that the 'real' time gets large for the single thread runs, even though the measured 'sys' and 'user' times are modest. I'm not sure what that means - probably something to do with it being slow for one thread to be accessing memory a long way away. Perhaps the fake numa system, running ostensibly the same workload, would not show this substantial degradation of 'real' time for one thread on many nodes -- let's hope not.
6) The high thread count passes (one thread per CPU - on 55 of 56 CPUs) ran quite efficiently, as one might expect. Each pair of threads needed to allocate and touch the memory on the node the two threads shared, a pleasantly parallelizable workload.
7) The intermediate thread count passes, which asked for enough memory to force them onto a few neighboring nodes, improved the most with this zonelist caching patch.
Conclusions:
* This zonelist cache patch probably makes little difference one way or the other for most workloads on real numa hardware, if those workloads avoid heavy off-node allocations.
* For memory intensive workloads requiring substantial off-node allocations on real numa hardware, this patch improves both kernel and elapsed timings by up to ten percent.
* For fake numa systems, I'm optimistic, but will have to leave that up to Rohit Seth to actually test (once I get him a 2.6.18 backport).
Signed-off-by: Paul Jackson <pj@sgi.com> Cc: Rohit Seth <rohitseth@google.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: David Rientjes <rientjes@cs.washington.edu> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
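The core of the zonelist cache can be sketched as a simplified, single-threaded C model. This is not the patch itself: the 64-zone limit, the 1000-tick "second", the free-page counts and the function names are hypothetical, while zlc_active, did_zlc_setup, z_to_n, fullzones and last_full_zap are borrowed from the message. The cache is only set up after the first zone fails, zones marked full are skipped, marks older than a second are zapped, and, because the marks are only a hint, an empty hinted scan is followed by one rescan that ignores them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_ZONES 64 /* hypothetical: one uint64_t bitmap covers all zones */
#define SKETCH_HZ 1000UL    /* hypothetical ticks ("jiffies") per second */

struct zlc_sketch {
	unsigned short z_to_n[SKETCH_MAX_ZONES]; /* zone index -> node id */
	uint64_t fullzones;                      /* bit i set: zone i was full recently */
	unsigned long last_full_zap;             /* tick when fullzones was last cleared */
};

/* Forget all "full" marks once they are more than a second old. */
static void zlc_maybe_zap(struct zlc_sketch *zlc, unsigned long now)
{
	if (now - zlc->last_full_zap > SKETCH_HZ) {
		zlc->fullzones = 0;
		zlc->last_full_zap = now;
	}
}

/* Worth trying: the zone's node is allowed and the zone is not marked full. */
static bool zlc_worth_trying(const struct zlc_sketch *zlc, int i,
			     const bool *node_allowed)
{
	return node_allowed[zlc->z_to_n[i]] && !(zlc->fullzones & (1ULL << i));
}

static void zlc_mark_full(struct zlc_sketch *zlc, int i)
{
	zlc->fullzones |= 1ULL << i;
}

/*
 * Return the index of the first allowed zone with at least `need` free
 * pages, or -1 if none has enough.
 */
static int scan_zonelist(struct zlc_sketch *zlc, const long *free_pages,
			 int nzones, const bool *node_allowed, long need,
			 unsigned long now)
{
	bool zlc_active = false, did_zlc_setup = false;

	assert(nzones <= SKETCH_MAX_ZONES);
	for (int pass = 0; pass < 2; pass++) {
		for (int i = 0; i < nzones; i++) {
			if (zlc_active && !zlc_worth_trying(zlc, i, node_allowed))
				continue; /* recently full: skip cheaply */
			if (!node_allowed[zlc->z_to_n[i]])
				continue;
			if (free_pages[i] >= need)
				return i;
			if (!did_zlc_setup) {
				/* First failure: only now pay for the cache. */
				zlc_maybe_zap(zlc, now);
				zlc_active = did_zlc_setup = true;
			}
			zlc_mark_full(zlc, i);
		}
		if (!zlc_active)
			break; /* nothing was skipped, so a rescan cannot help */
		zlc_active = false; /* second pass ignores the (possibly stale) hint */
	}
	return -1;
}
```

Since a stale mark can only cause a zone to be skipped until the next zap or the fallback rescan, unlocked updates to fullzones degrade the hint rather than the allocation result.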
Christoph Lameter	89689ae7f9	[PATCH] Get rid of zone_table[] The zone table is mostly not needed. If we have a node in the page flags then we can get to the zone via NODE_DATA(), which is much more likely to be already in the cpu cache. In the case of SMP and UP, NODE_DATA() is a constant pointer, which allows us to access an exact replica of the zone table in the node_zones field. In all of the above cases there will be no need at all for the zone table. The only remaining case is a NUMA system whose node numbers do not fit into the page flags. In that case we make sparse generate a table that maps sections to nodes and use that table to figure out the node number. This table is sized to fit in a single cache line for the known 32-bit NUMA platform, which makes it very likely that the information can be obtained without a cache miss. For sparsemem the zone table seems to have been fairly large, based on the maximum possible number of sections and the number of zones per node. There is some memory saving by removing zone_table. The main benefit is to reduce the cache footprint of the VM from the frequent lookups of zones. Plus it simplifies the page allocator. [akpm@osdl.org: build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
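A minimal C sketch of the lookup path described above, assuming both the node and the zone number fit in the page flags: the zone is reached through the per-node zone array (the role NODE_DATA()->node_zones plays) instead of a global zone_table[]. The bit layout, field widths and every *_sketch name are hypothetical, and the sparse section-to-node fallback table is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: zone id in bits 62..63, node id in bits 56..61. */
#define SKETCH_ZONES_SHIFT 2
#define SKETCH_NODES_SHIFT 6
#define SKETCH_MAX_NR_ZONES (1 << SKETCH_ZONES_SHIFT)
#define SKETCH_MAX_NODES (1 << SKETCH_NODES_SHIFT)
#define SKETCH_ZONES_PGSHIFT (64 - SKETCH_ZONES_SHIFT)
#define SKETCH_NODES_PGSHIFT (SKETCH_ZONES_PGSHIFT - SKETCH_NODES_SHIFT)

struct zone_sketch { long free_pages; /* placeholder member */ };
struct pglist_data_sketch { struct zone_sketch node_zones[SKETCH_MAX_NR_ZONES]; };
struct page_sketch { uint64_t flags; };

/* Stand-in for the per-node data that NODE_DATA(nid) would return. */
static struct pglist_data_sketch node_data_sketch[SKETCH_MAX_NODES];

/* Store node and zone ids in the top bits, leaving other flag bits alone. */
static void set_page_links_sketch(struct page_sketch *page, unsigned nid, unsigned zid)
{
	assert(nid < SKETCH_MAX_NODES && zid < SKETCH_MAX_NR_ZONES);
	page->flags &= ~((((uint64_t)SKETCH_MAX_NR_ZONES - 1) << SKETCH_ZONES_PGSHIFT) |
			 (((uint64_t)SKETCH_MAX_NODES - 1) << SKETCH_NODES_PGSHIFT));
	page->flags |= ((uint64_t)zid << SKETCH_ZONES_PGSHIFT) |
		       ((uint64_t)nid << SKETCH_NODES_PGSHIFT);
}

static unsigned page_to_nid_sketch(const struct page_sketch *page)
{
	return (unsigned)((page->flags >> SKETCH_NODES_PGSHIFT) & (SKETCH_MAX_NODES - 1));
}

static unsigned page_zonenum_sketch(const struct page_sketch *page)
{
	return (unsigned)((page->flags >> SKETCH_ZONES_PGSHIFT) & (SKETCH_MAX_NR_ZONES - 1));
}

/* No global zone table: index the owning node's own zone array. */
static struct zone_sketch *page_zone_sketch(const struct page_sketch *page)
{
	return &node_data_sketch[page_to_nid_sketch(page)]
			.node_zones[page_zonenum_sketch(page)];
}
```

The lookup touches only the page and its node's data, which is the cache-footprint argument the message makes for dropping zone_table[].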
Chen, Kenneth W	c0a499c2c4	[PATCH] __unmap_hugepage_range(): add comment Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Paul Jackson	0798e5193c	[PATCH] memory page alloc minor cleanups - s/freeliest/freelist/ spelling fix - Check for NULL *z zone seems useless - even if it could happen, so what? Perhaps we should have a check later on if we are faced with an allocation request that is not allowed to fail - shouldn't that be a serious kernel error, passing an empty zonelist with a mandate to not fail? - Initializing 'z' to zonelist->zones can wait until after the first get_page_from_freelist() fails; we only use 'z' in the wakeup_kswapd() loop, so let's initialize 'z' there, in a 'for' loop. Seems clearer. - Remove superfluous braces around a break - Fix a couple of errant spaces - Adjust indentation on the cpuset_zone_allowed() check, to match the lines just before it -- seems easier to read in this case. - Add another set of braces to the zone_watermark_ok logic From: Paul Jackson <pj@sgi.com> Back out one item from a previous "memory page_alloc minor cleanups" patch. Until and unless we are certain that no one can ever pass an empty zonelist to __alloc_pages(), this check for an empty zonelist (or some BUG equivalent) is essential. The code in get_page_from_freelist() blows up if passed an empty zonelist. Signed-off-by: Paul Jackson <pj@sgi.com> Acked-by: Christoph Lameter <clameter@sgi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Andrew Morton	a2ce774096	[PATCH] uml: workqueue build fix arch/um/drivers/chan_kern.c:643: error: conflicting types for 'chan_interrupt' arch/um/include/chan_kern.h:31: error: previous declaration of 'chan_interrupt' Cc: David Howells <dhowells@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Andrey Mirkin	822191a2fa	[PATCH] skip data conversion in compat_sys_mount when data_page is NULL The OpenVZ Linux kernel team has found a problem with mounting in compat mode. The simple command "mount -t smbfs ..." on the Fedora Core 5 distro in 32-bit mode leads to an oops: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: compat_sys_mount+0xd6/0x290 Process mount (pid: 14656, veid=300, threadinfo ffff810034d30000, task ffff810034c86bc0) Call Trace: ia32_sysret+0x0/0xa The problem is that the data_page pointer can be NULL, so we should skip data conversion in this case. Signed-off-by: Andrey Mirkin <amirkin@openvz.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Andrew Morton	a1e85378ba	[PATCH] drm-sis linkage fix Fix http://bugzilla.kernel.org/show_bug.cgi?id=7606 WARNING: "drm_sman_set_manager" [drivers/char/drm/sis.ko] undefined! Cc: <daniel-silveira@gee.inatel.br> Cc: Dave Airlie <airlied@linux.ie> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Andrew Morton	676dcb8bc2	[PATCH] add bottom_half.h With CONFIG_SMP=n: drivers/input/ff-memless.c:384: warning: implicit declaration of function 'local_bh_disable' drivers/input/ff-memless.c:393: warning: implicit declaration of function 'local_bh_enable' Really linux/spinlock.h should include linux/interrupt.h. But interrupt.h includes sched.h which will need spinlock.h. So the patch breaks the _bh declarations out into a separate header and includes it in both interrupt.h and spinlock.h. Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Andi Kleen <ak@suse.de> Cc: <stable@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Russell King	05f96ef118	[ARM] Allow gcc to optimise arm_add_memory a little more For some reason, gcc was calculating meminfo.bank[meminfo.nr_banks] repeatedly. Use a pointer to it instead. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:26:16 +00:00
Pavel Pisa	86987d5bf4	[ARM] 3991/1: i.MX/MX1 high resolution time source Enhanced resolution for time measurement functions. Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:24:16 +00:00
Pavel Pisa	5c894cd1c8	[ARM] 3990/1: i.MX/MX1 more precise PLL decode The future high resolution support inclusion utilizes imx_decode_pll() in timer base frequency computation. This use requires more precise computation without discarding 10 bits by shifting left. Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:24:15 +00:00
Ben Dooks	9073341c2b	[ARM] 3986/1: H1940: suspend to RAM support Add support to suspend and resume, using the H1940's bootloader Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:17:49 +00:00
Kevin Hilman	f9a8ca1cab	[ARM] 3985/1: ixp4xx clocksource cleanup Rather than using a device_initcall() for the clocksource initialization, just call the init from the sys_timer init function. Signed-off-by: Kevin Hilman <khilman@mvista.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:17:07 +00:00
Rod Whitby	a47d08e2e3	[ARM] 3984/1: ixp4xx/nslu2: Fix disk LED numbering (take 2) This patch fixes an error in the numbering of the disk LEDs on the Linksys NSLU2. The error crept in because the physical location of the LEDs has the Disk 2 LED above the Disk 1 LED. Thanks to Gordon Farquharson for reporting this. Signed-off-by: Rod Whitby <rod@whitby.id.au> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:17:06 +00:00
Lennert Buytenhek	46156e04de	[ARM] 3994/1: ixp23xx: fix handling of pci master aborts The PCI master abort handling issue that affected ixp2000 also affects ixp23xx. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:16:19 +00:00
Nicolas Pitre	2dc20a51dc	[ARM] 3981/1: sched_clock for PXA2xx Here's a 63-bit implementation of sched_clock() for PXA2xx. The actual period depends on the value of CLOCK_TICK_RATE and whether or not reduced scaling factors were provided for it. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:06:55 +00:00
Nicolas Pitre	752bee178e	[ARM] 3980/1: extend the ARM Versatile sched_clock implementation from 32 to 63 bit period This provides a 63-bit clock counter guaranteed to be monotonic over a period of 35583 days instead of a clock wrap every 179 seconds, as long as sched_clock() is called at least once every 89 seconds. This should not be a problem in practice; if need be, a kernel timer could be scheduled, for example every 80 seconds, simply to call sched_clock() and keep the top bits synchronized. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:06:53 +00:00
Nicolas Pitre	2f1675c11a	[ARM] 3979/1: extend the SA11x0 sched_clock implementation from 32 to 63 bit period This provides a 63-bit clock counter guaranteed to be monotonic over a period of 370 days instead of a clock wrap every 19.4 minutes, as long as sched_clock() is called at least once every 9.7 minutes, which shouldn't be a problem in practice. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:06:50 +00:00
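The wrap and resync intervals quoted for Versatile and SA11x0 follow from the counter width. Assuming the Versatile counter runs at 24 MHz and the SA11x0 OS timer at 3.6864 MHz (rates not stated in these messages, inferred because they reproduce the quoted figures), a 32-bit counter wraps after 2^32/f seconds, and sched_clock() has to run at least once per half-period so that every change of the counter's top bit is observed:

```latex
T_{\text{wrap}} = \frac{2^{32}}{f}, \qquad T_{\text{sync}} = \frac{T_{\text{wrap}}}{2} = \frac{2^{31}}{f}

\text{Versatile } (f = 24\,\text{MHz}):\quad T_{\text{wrap}} = \frac{4\,294\,967\,296}{24 \times 10^{6}}\,\text{s} \approx 178.96\,\text{s},\quad T_{\text{sync}} \approx 89.48\,\text{s}

\text{SA11x0 } (f = 3.6864\,\text{MHz}):\quad T_{\text{wrap}} = \frac{4\,294\,967\,296}{3\,686\,400}\,\text{s} \approx 1165.08\,\text{s} \approx 19.42\,\text{min},\quad T_{\text{sync}} \approx 9.71\,\text{min}
```

These match the 179-second / 89-second and 19.4-minute / 9.7-minute figures in the two messages.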
Nicolas Pitre	838ccbc35e	[ARM] 3978/1: macro to provide a 63-bit value from a 32-bit hardware counter This is done in a completely lockless fashion. Bits 0 to 31 of the count are provided by the hardware while bits 32 to 62 are stored in memory. The top bit in memory is used to synchronize with the hardware count half-period. When the top bits of the two counters (hardware and in memory) differ, the memory is updated with a new value, incrementing it when the hardware counter wraps around. Because a word store in memory is atomic, the incremented value will always be in sync with the top bit, indicating to any potential concurrent reader whether or not the value in memory is up to date with respect to the needed increment. And any race in updating the value in memory is harmless, as the same value would be stored more than once. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:06:45 +00:00
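The scheme above can be sketched as a plain C function. The message describes a macro; here the stored word is passed in explicitly so the behavior is easy to test, and the names cnt32_to_63_sketch and struct cnt63_state are hypothetical. When the top bits differ, the update flips the stored top bit and adds one to the high count only on the crossing where the stored top bit was set, i.e. when the hardware has just wrapped back to zero.

```c
#include <assert.h>
#include <stdint.h>

struct cnt63_state {
	volatile uint32_t hi; /* bits 0..30: count bits 32..62; bit 31: sync bit */
};

/* Combine a 32-bit hardware sample `lo` with the word kept in memory. */
static uint64_t cnt32_to_63_sketch(struct cnt63_state *st, uint32_t lo)
{
	uint32_t hi = st->hi;

	/* Top bits differ: the hardware crossed a half-period since last time. */
	if ((hi ^ lo) & 0x80000000u) {
		/* Flip the sync bit; carry into the high count only on a wrap. */
		hi = (hi ^ 0x80000000u) + (hi >> 31);
		st->hi = hi; /* a single word store, as the message relies on */
	}
	/* Drop the sync bit so the result carries 63 meaningful bits. */
	return ((uint64_t)(hi & 0x7fffffffu) << 32) | lo;
}
```

As in the sched_clock() messages above, this only works if it is called at least once per half-period of the hardware counter; a longer gap can hide a top-bit transition.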
Nicolas Pitre	fa4adc6149	[ARM] 3611/4: optimize do_div() when divisor is constant On ARM all divisions have to be performed "manually". For 64-bit divisions, that may take more than a hundred cycles in many cases. With 32-bit divisions gcc already uses the reciprocal of constant divisors to perform a multiplication, but not with 64-bit divisions. Since the kernel is increasingly relying upon 64-bit divisions, it is worth optimizing at least those cases where the divisor is a constant. This is what this patch does, using plain C code that gets optimized away at compile time. For example, despite the amount of added C code, do_div(x, 10000) now produces the following assembly code (where x is assigned to r0-r1):
    adr r4, .L0
    ldmia r4, {r4-r5}
    umull r2, r3, r4, r0
    mov r2, #0
    umlal r3, r2, r5, r0
    umlal r3, r2, r4, r1
    mov r3, #0
    umlal r2, r3, r5, r1
    mov r0, r2, lsr #11
    orr r0, r0, r3, lsl #21
    mov r1, r3, lsr #11
    ...
.L0:
    .word 948328779
    .word 879609302
which is the fastest that can be done for any value of x in that case, many times faster than the __do_div64 code (except for the small x value space for which the result ends up being zero or a single bit). The fact that this code is generated inline produces a tiny increase in .text size, but not significant compared to the needed code around each __do_div64 call site this code is replacing. The algorithm used has been validated on a 16-bit scale for all possible values, and then recodified for 64-bit values. Furthermore I've been running it with the final BUG_ON() uncommented for over two months now with no problem. Note that this new code is compiled only with gcc versions 4.0 or later. Earlier gcc versions proved themselves too problematic and only the original code is used with them. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:06:09 +00:00
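The same multiply-and-shift can be written in portable C. This sketch is specific to the divisor 10000 and is not the patch's generic do_div() code: it rebuilds the 64-bit constant m from the two .word values in the listing (low word first, as ldmia loads them), forms the high half of the 128-bit product with four 32x32->64 multiplies, mirroring the umull/umlal sequence, and then shifts right by 11, so it computes floor(x * m / 2^75). Because m = ceil(2^75 / 10000) and m * 10000 - 2^75 = 432, the rounding error stays below 1/10000 for every 64-bit x, so the quotient is exact. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* High 64 bits of the 128-bit product a * b, from 32x32->64 multiplies. */
static uint64_t mul_hi64(uint64_t a, uint64_t b)
{
	uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
	uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
	uint64_t p0 = a_lo * b_lo;
	uint64_t p1 = a_lo * b_hi;
	uint64_t p2 = a_hi * b_lo;
	uint64_t p3 = a_hi * b_hi;
	/* Middle column; at most about 3 * 2^32, so it cannot overflow. */
	uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

	return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* x / 10000 without a divide instruction. */
static uint64_t div10000(uint64_t x)
{
	/* m = ceil(2^75 / 10000), from ".word 948328779" and ".word 879609302". */
	const uint64_t m = ((uint64_t)879609302u << 32) | 948328779u;

	return mul_hi64(x, m) >> 11; /* (x * m) >> 75 */
}
```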
Lennert Buytenhek	47d7e524b7	[ARM] 3993/1: ep93xx: add cirrus logic edb9302a support Add support for the Cirrus Logic EDB9302A Evaluation Board. Confirmed to work by Chase Douglas. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:01:56 +00:00
George G. Davis	5636810d6f	[ARM] 3982/2: Explicitly select 32-bit ARM ISA (-marm) Do not assume that the ARM GCC toolchain defaults to building for the 32-bit ARM ISA (-marm) case. Instead, explicitly select -marm in CFLAGS since the toolchain default can be for the 16-bit Thumb ISA (-mthumb) in some odd/rare cases. Signed-off-by: George G. Davis <gdavis@mvista.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:01:11 +00:00
Patrick Caulfield	ac33d07105	[DLM] Clean up lowcomms This fixes up most of the things pointed out by akpm and Pavel Machek, with comments below indicating why some things have been left:
Andrew Morton wrote:
>
>> +static struct nodeinfo *nodeid2nodeinfo(int nodeid, gfp_t alloc)
>> +{
>> +    struct nodeinfo *ni;
>> +    int r;
>> +    int n;
>> +
>> +    down_read(&nodeinfo_lock);
>
> Given that this function can sleep, I wonder if `alloc' is useful.
>
> I see lots of callers passing in a literal "0" for `alloc'. That's in fact
> a secret (GFP_ATOMIC & ~__GFP_HIGH). I doubt if that's what you really
> meant. Particularly as the code could at least have used __GFP_WAIT (aka
> GFP_NOIO) which is much, much more reliable than "0". In fact "0" is the
> least reliable mode possible.
>
> IOW, this is all bollixed up.
When 0 is passed into nodeid2nodeinfo the function does not try to allocate a new structure at all. It's an indication that the caller only wants the nodeinfo struct for that nodeid if there actually is one in existence. I've tidied the function itself so it's more obvious (and tidier!).
>> +/* Data received from remote end */
>> +static int receive_from_sock(void)
>> +{
>> +    int ret = 0;
>> +    struct msghdr msg;
>> +    struct kvec iov[2];
>> +    unsigned len;
>> +    int r;
>> +    struct sctp_sndrcvinfo *sinfo;
>> +    struct cmsghdr *cmsg;
>> +    struct nodeinfo *ni;
>> +
>> +    /* These two are marginally too big for stack allocation, but this
>> +     * function is (currently) only called by dlm_recvd so static should be
>> +     * OK.
>> +     */
>> +    static struct sockaddr_storage msgname;
>> +    static char incmsg[CMSG_SPACE(sizeof(struct sctp_sndrcvinfo))];
>
> whoa. This is globally singly-threaded code??
Yes. It is only ever run in the context of dlm_recvd.
>>
>> +static void initiate_association(int nodeid)
>> +{
>> +    struct sockaddr_storage rem_addr;
>> +    static char outcmsg[CMSG_SPACE(sizeof(struct sctp_sndrcvinfo))];
>
> Another static buffer to worry about. Globally singly-threaded code?
Yes. Only ever called by dlm_sendd.
>> +
>> +/* Send a message */
>> +static int send_to_sock(struct nodeinfo *ni)
>> +{
>> +    int ret = 0;
>> +    struct writequeue_entry *e;
>> +    int len, offset;
>> +    struct msghdr outmsg;
>> +    static char outcmsg[CMSG_SPACE(sizeof(struct sctp_sndrcvinfo))];
>
> Singly-threaded?
Yep.
>>
>> +static void dealloc_nodeinfo(void)
>> +{
>> +    int i;
>> +
>> +    for (i=1; i<=max_nodeid; i++) {
>> +        struct nodeinfo *ni = nodeid2nodeinfo(i, 0);
>> +        if (ni) {
>> +            idr_remove(&nodeinfo_idr, i);
>
> Didn't that need locking?
No. It's only ever called at DLM shutdown, after all the other threads have been stopped.
>>
>> +static int write_list_empty(void)
>> +{
>> +    int status;
>> +
>> +    spin_lock_bh(&write_nodes_lock);
>> +    status = list_empty(&write_nodes);
>> +    spin_unlock_bh(&write_nodes_lock);
>> +
>> +    return status;
>> +}
>
> This function's return value is meaningless. As soon as the lock gets
> dropped, the return value can get out of sync with reality.
>
> Looking at the caller, this _might_ happen to be OK, but it's a nasty and
> dangerous thing. Really the locking should be moved into the caller.
It's just an optimisation to allow the caller to schedule if there is no work to do. If something arrives immediately afterwards, then it will get picked up when the process re-awakes (and it will be woken by that arrival). The 'accepting' atomic has gone completely. As Andrew pointed out, it didn't really achieve much anyway. I suspect it was a plaster over some other startup or shutdown bug, to be honest. Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Morton <akpm@osdl.org> Cc: Pavel Machek <pavel@ucw.cz>	2006-12-07 09:25:13 -05:00
Steven Whitehouse	34126f9f41	[GFS2] Change gfs2_fsync() to use write_inode_now() This is a bit better than the previous version of gfs2_fsync(), although it would be better still if we were able to call a function which only wrote the inode & metadata. It's no big deal, though, that this will potentially write the data as well, since the VFS has already done that before calling gfs2_fsync(). I've also added a comment to explain what's going on here. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Morton <akpm@osdl.org>	2006-12-07 09:13:14 -05:00
Alan	fd3367af3d	[PATCH] libata: Incorrect timing computation for PIO5/6 The ata timing computation code makes some mistakes in PIO5/6 because a check was not updated correctly when I put this support into the kernel. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-07 07:37:07 -05:00
Mikael Pettersson	25b93d81b9	[PATCH] sata_promise: new EH conversion, take 2 This patch converts sata_promise to use new-style libata error handling on Promise SATA chips, for both SATA and PATA ports. * ATA_FLAG_SRST is no longer set * ->phy_reset is no longer set as it is unused when ->error_handler is present, and pdc_sata_phy_reset() has been removed * pdc_freeze() masks interrupts and halts DMA via PDC_CTLSTAT * pdc_thaw() clears interrupt status in PDC_INT_SEQMASK and then unmasks interrupts in PDC_CTLSTAT * pdc_error_handler() reinitialises the port if it isn't frozen, and then invokes ata_do_eh() with standard {s,}ata reset methods * pdc_post_internal_cmd() resets the port in case of errors * the PATA-only 20619 chip continues to use old-style EH: not by necessity but simply because I don't have documentation for it or any way to test it Since the previous version pdc_error_handler() has been rewritten and it now mostly matches ahci and sata_sil24. In case anyone wonders: the call to pdc_reset_port() isn't a heavy-duty reset, it's a light-weight reset to quickly put a port into a sane state. The discussion about the PCI flushes in pdc_freeze() and pdc_thaw() seemed to end with a consensus that the flushes are OK and not obviously redundant, so I decided to keep them for now. This patch was prepared against 2.6.19-git7, but it also applies to 2.6.19 + libata #upstream, with or without the revised sata_promise cleanup patch I recently submitted. This patch does conflict with the #promise-sata-pata patch: this patch removes pdc_sata_phy_reset() while #promise-sata-pata modifies it. The correct patch resolution is to remove the function. Tested on 2037x and 2057x chips, with PATA patches on top and disks on both SATA and PATA ports. Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-07 07:25:01 -05:00
Albert Lee	e3472cbe5c	[PATCH] libata: let ATA_FLAG_PIO_POLLING use polling pio for ATA_PROT_NODATA Even if ATA_FLAG_PIO_POLLING is set, libata uses irq pio for the ATA_PROT_NODATA protocol. This patch makes ATA_FLAG_PIO_POLLING use polling pio for the ATA_PROT_NODATA protocol. Signed-off-by: Albert Lee <albertcc@tw.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-07 07:22:28 -05:00
Mikael Pettersson	d324d4627d	[PATCH] sata_promise: cleanups, take 2 This patch performs two simple cleanups of sata_promise. * Remove board_20771 and map device id 0x3577 to board_2057x. After the recent corrections for SATAII chips, board_20771 and board_2057x were equivalent in the driver. * Remove hp->hotplug_offset and use hp->flags & PDC_FLAG_GEN_II to compute hotplug_offset in pdc_host_init(). hp->hotplug_offset was used to distinguish 1st and 2nd generation chips in one particular case, but now we have that information in a more general form in hp->flags, so hp->hotplug_offset is redundant. Changes since previous submission: rebased on libata-dev #upstream, cleaned up hotplug_offset computation based on Tejun's comments, expanded hotplug_offset removal rationale. This patch does not depend on the pending new EH conversion patch. Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-07 07:21:24 -05:00
Jeff Garzik	0ae851352a	[wireless] zd1211rw: workqueue-related build fixes Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-07 06:30:30 -05:00
Jeff Garzik	0bfdcc88df	[netdrvr] netxen: workqueue-related build fixes	2006-12-07 06:30:07 -05:00
Jeff Garzik	f1ff0fdc35	Merge tag 'r8169-upstream-20061204-00' of git://electric-eye.fr.zoreil.com/home/romieu/linux-2.6 into upstream	2006-12-07 05:05:58 -05:00
Jeff Garzik	359f2d17e3	Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream Conflicts: drivers/net/wireless/zd1211rw/zd_mac.h net/ieee80211/softmac/ieee80211softmac_assoc.c	2006-12-07 05:02:40 -05:00
Stephen Hemminger	0efdf26266	[PATCH] sky2: sparse warnings Get rid of sparse warnings in sky2 driver because of mixed enum usage. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-07 04:59:20 -05:00
Stephen Hemminger	7f4b45c526	[PATCH] skge: fix sparse warnings Fix sparse warnings from using enum as part of arithmetic expression, and comment indentation fixes Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-07 04:59:20 -05:00
Brice Goglin	e67bda55e2	[PATCH] myri10ge: write as 2 32-byte blocks in myri10ge_submit_8rx In the myri10ge_submit_8rx() routine, write the 64 byte request block as 2 32-byte blocks so that it is handled by the hardware pio write handler if write-combining is enabled. Signed-off-by: Brice Goglin <brice@myri.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-07 04:59:20 -05:00
Stephen Hemminger	c3905bc4b7	[PATCH] sky2: receive queue watermark tweak This patch makes the receive performance on some systems go from 714MB/s to 941MB/s. It adjusts the watermark of the receive queue to be lower, thereby avoiding excess hardware flow control. This is most important on the systems which have little/no additional buffering. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-07 04:58:33 -05:00
Stephen Hemminger	6771290102	[PATCH] sky2: beter ram buffer partitioning Different chips have different sizes of ram buffers, and some versions have no ram buffer at all!. Be more careful about sizing the ram usage because it maybe a problem if vendor keeps changing sizes. There is the (unlikely) possibility that some of the errors on some of the chips have been caused by partitioning not on a 1K boundary. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-07 04:58:33 -05:00
Stephen Hemminger	e5b74c7ddd	[PATCH] sky2: add comments to PCI ids Add comments to sky2 driver to show relationship between PCI id and hardware. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-07 04:58:33 -05:00
Stephen Hemminger	2a45b49c30	[PATCH] sky2: add PCI for 88ec033 Add another new/missing pci id for 88ec033 chip. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-07 04:58:32 -05:00
Andrew Victor	a3f63e4f4b	[PATCH] AT91RM9200 Ethernet: Use dev_alloc_skb() Use dev_alloc_skb() instead of alloc_skb(). It is also not necessary to adjust skb->len manually since that's already done by skb_put(). Signed-off-by: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-07 04:58:32 -05:00
Andrew Victor	51cc210457	[PATCH] AT91RM9200 Ethernet: Add netpoll / netconsole support Adds netpoll / netconsole support. Original patch from Bill Gatliff. Signed-off-by: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-07 04:58:32 -05:00
Andrew Victor	cf42553ab4	[PATCH] AT91RM9200 Ethernet: Move check_timer variable and use mod_timer() Move the global 'check_timer' variable into the private data structure. Also now use mod_timer(). Signed-off-by: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-07 04:58:32 -05:00
Andrew Victor	c57ee096b6	[PATCH] AT91RM9200 Ethernet: Remove 'at91_dev' and use netdev_priv() Remove the global 'at91_dev' variable. Use netdev_priv() instead of casting dev->priv directly. Signed-off-by: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-12-07 04:58:32 -05:00
Jeff Garzik	8d1413b280	Merge branch 'master' into upstream Conflicts: drivers/net/netxen/netxen_nic.h drivers/net/netxen/netxen_nic_main.c	2006-12-07 04:57:19 -05:00
Randy Dunlap	272491ef42	[NETFILTER]: Fix non-ANSI func. decl. Fix non-ANSI function declaration: net/netfilter/nf_conntrack_core.c:1096:25: warning: non-ANSI function declaration of function 'nf_conntrack_flush' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-07 01:17:24 -08:00

43937 Commits