linux-kernel-test

Author	SHA1	Message	Date
Johannes Weiner	527a5ec9a5	mm: memcg: per-priority per-zone hierarchy scan generations Memory cgroup limit reclaim currently picks one memory cgroup out of the target hierarchy, remembers it as the last scanned child, and reclaims all zones in it with decreasing priority levels. The new hierarchy reclaim code will pick memory cgroups from the same hierarchy concurrently from different zones and priority levels, it becomes necessary that hierarchy roots not only remember the last scanned child, but do so for each zone and priority level. Until now, we reclaimed memcgs like this: mem = mem_cgroup_iter(root) for each priority level: for each zone in zonelist: reclaim(mem, zone) But subsequent patches will move the memcg iteration inside the loop over the zones: for each priority level: for each zone in zonelist: mem = mem_cgroup_iter(root) reclaim(mem, zone) And to keep with the original scan order - memcg -> priority -> zone - the last scanned memcg has to be remembered per zone and per priority level. Furthermore, global reclaim will be switched to the hierarchy walk as well. Different from limit reclaim, which can just recheck the limit after some reclaim progress, its target is to scan all memcgs for the desired zone pages, proportional to the memcg size, and so reliably detecting a full hierarchy round-trip will become crucial. Currently, the code relies on one reclaimer encountering the same memcg twice, but that is error-prone with concurrent reclaimers. Instead, use a generation counter that is increased every time the child with the highest ID has been visited, so that reclaimers can stop when the generation changes. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:04 -08:00
Johannes Weiner	f16015fbf2	mm: vmscan: distinguish between memcg triggering reclaim and memcg being scanned Memory cgroup hierarchies are currently handled completely outside of the traditional reclaim code, which is invoked with a single memory cgroup as an argument for the whole call stack. Subsequent patches will switch this code to do hierarchical reclaim, so there needs to be a distinction between a) the memory cgroup that is triggering reclaim due to hitting its limit and b) the memory cgroup that is being scanned as a child of a). This patch introduces a struct mem_cgroup_zone that contains the combination of the memory cgroup and the zone being scanned, which is then passed down the stack instead of the zone argument. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:04 -08:00
Johannes Weiner	89b5fae536	mm: vmscan: distinguish global reclaim from global LRU scanning The traditional zone reclaim code is scanning the per-zone LRU lists during direct reclaim and kswapd, and the per-zone per-memory cgroup LRU lists when reclaiming on behalf of a memory cgroup limit. Subsequent patches will convert the traditional reclaim code to reclaim exclusively from the per-memory cgroup LRU lists. As a result, using the predicate for which LRU list is scanned will no longer be appropriate to tell global reclaim from limit reclaim. This patch adds a global_reclaim() predicate to tell direct/kswapd reclaim from memory cgroup limit reclaim and substitutes it in all places where currently scanning_global_lru() is used for that. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:04 -08:00
Johannes Weiner	9f3a0d0933	mm: memcg: consolidate hierarchy iteration primitives The memcg naturalization series: Memory control groups are currently bolted onto the side of traditional memory management in places where better integration would be preferrable. To reclaim memory, for example, memory control groups maintain their own LRU list and reclaim strategy aside from the global per-zone LRU list reclaim. But an extra list head for each existing page frame is expensive and maintaining it requires additional code. This patchset disables the global per-zone LRU lists on memory cgroup configurations and converts all its users to operate on the per-memory cgroup lists instead. As LRU pages are then exclusively on one list, this saves two list pointers for each page frame in the system: page_cgroup array size with 4G physical memory vanilla: allocated 31457280 bytes of page_cgroup patched: allocated 15728640 bytes of page_cgroup At the same time, system performance for various workloads is unaffected: 100G sparse file cat, 4G physical memory, 10 runs, to test for code bloat in the traditional LRU handling and kswapd & direct reclaim paths, without/with the memory controller configured in vanilla: 71.603(0.207) seconds patched: 71.640(0.156) seconds vanilla: 79.558(0.288) seconds patched: 77.233(0.147) seconds 100G sparse file cat in 1G memory cgroup, 10 runs, to test for code bloat in the traditional memory cgroup LRU handling and reclaim path vanilla: 96.844(0.281) seconds patched: 94.454(0.311) seconds 4 unlimited memcgs running kbuild -j32 each, 4G physical memory, 500M swap on SSD, 10 runs, to test for regressions in kswapd & direct reclaim using per-memcg LRU lists with multiple memcgs and multiple allocators within each memcg vanilla: 717.722(1.440) seconds [ 69720.100(11600.835) majfaults ] patched: 714.106(2.313) seconds [ 71109.300(14886.186) majfaults ] 16 unlimited memcgs running kbuild, 1900M hierarchical limit, 500M swap on SSD, 10 runs, to test for regressions in hierarchical memcg setups vanilla: 2742.058(1.992) seconds [ 26479.600(1736.737) majfaults ] patched: 2743.267(1.214) seconds [ 27240.700(1076.063) majfaults ] This patch: There are currently two different implementations of iterating over a memory cgroup hierarchy tree. Consolidate them into one worker function and base the convenience looping-macros on top of it. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:04 -08:00
KAMEZAWA Hiroyuki	ab936cbcd0	memcg: add mem_cgroup_replace_page_cache() to fix LRU issue Commit `ef6a3c6311` ("mm: add replace_page_cache_page() function") added a function replace_page_cache_page(). This function replaces a page in the radix-tree with a new page. WHen doing this, memory cgroup needs to fix up the accounting information. memcg need to check PCG_USED bit etc. In some(many?) cases, 'newpage' is on LRU before calling replace_page_cache(). So, memcg's LRU accounting information should be fixed, too. This patch adds mem_cgroup_replace_page_cache() and removes the old hooks. In that function, old pages will be unaccounted without touching res_counter and new page will be accounted to the memcg (of old page). WHen overwriting pc->mem_cgroup of newpage, take zone->lru_lock and avoid races with LRU handling. Background: replace_page_cache_page() is called by FUSE code in its splice() handling. Here, 'newpage' is replacing oldpage but this newpage is not a newly allocated page and may be on LRU. LRU mis-accounting will be critical for memory cgroup because rmdir() checks the whole LRU is empty and there is no account leak. If a page is on the other LRU than it should be, rmdir() will fail. This bug was added in March 2011, but no bug report yet. I guess there are not many people who use memcg and FUSE at the same time with upstream kernels. The result of this bug is that admin cannot destroy a memcg because of account leak. So, no panic, no deadlock. And, even if an active cgroup exist, umount can succseed. So no problem at shutdown. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:04 -08:00
Jason Baron	28d82dc1c4	epoll: limit paths The current epoll code can be tickled to run basically indefinitely in both loop detection path check (on ep_insert()), and in the wakeup paths. The programs that tickle this behavior set up deeply linked networks of epoll file descriptors that cause the epoll algorithms to traverse them indefinitely. A couple of these sample programs have been previously posted in this thread: https://lkml.org/lkml/2011/2/25/297. To fix the loop detection path check algorithms, I simply keep track of the epoll nodes that have been already visited. Thus, the loop detection becomes proportional to the number of epoll file descriptor and links. This dramatically decreases the run-time of the loop check algorithm. In one diabolical case I tried it reduced the run-time from 15 mintues (all in kernel time) to .3 seconds. Fixing the wakeup paths could be done at wakeup time in a similar manner by keeping track of nodes that have already been visited, but the complexity is harder, since there can be multiple wakeups on different cpus...Thus, I've opted to limit the number of possible wakeup paths when the paths are created. This is accomplished, by noting that the end file descriptor points that are found during the loop detection pass (from the newly added link), are actually the sources for wakeup events. I keep a list of these file descriptors and limit the number and length of these paths that emanate from these 'source file descriptors'. In the current implemetation I allow 1000 paths of length 1, 500 of length 2, 100 of length 3, 50 of length 4 and 10 of length 5. Note that it is sufficient to check the 'source file descriptors' reachable from the newly added link, since no other 'source file descriptors' will have newly added links. This allows us to check only the wakeup paths that may have gotten too long, and not re-check all possible wakeup paths on the system. In terms of the path limit selection, I think its first worth noting that the most common case for epoll, is probably the model where you have 1 epoll file descriptor that is monitoring n number of 'source file descriptors'. In this case, each 'source file descriptor' has a 1 path of length 1. Thus, I believe that the limits I'm proposing are quite reasonable and in fact may be too generous. Thus, I'm hoping that the proposed limits will not prevent any workloads that currently work to fail. In terms of locking, I have extended the use of the 'epmutex' to all epoll_ctl add and remove operations. Currently its only used in a subset of the add paths. I need to hold the epmutex, so that we can correctly traverse a coherent graph, to check the number of paths. I believe that this additional locking is probably ok, since its in the setup/teardown paths, and doesn't affect the running paths, but it certainly is going to add some extra overhead. Also, worth noting is that the epmuex was recently added to the ep_ctl add operations in the initial path loop detection code using the argument that it was not on a critical path. Another thing to note here, is the length of epoll chains that is allowed. Currently, eventpoll.c defines: /* Maximum number of nesting allowed inside epoll sets */ #define EP_MAX_NESTS 4 This basically means that I am limited to a graph depth of 5 (EP_MAX_NESTS + 1). However, this limit is currently only enforced during the loop check detection code, and only when the epoll file descriptors are added in a certain order. Thus, this limit is currently easily bypassed. The newly added check for wakeup paths, stricly limits the wakeup paths to a length of 5, regardless of the order in which ep's are linked together. Thus, a side-effect of the new code is a more consistent enforcement of the graph depth. Thus far, I've tested this, using the sample programs previously mentioned, which now either return quickly or return -EINVAL. I've also testing using the piptest.c epoll tester, which showed no difference in performance. I've also created a number of different epoll networks and tested that they behave as expectded. I believe this solves the original diabolical test cases, while still preserving the sane epoll nesting. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Nelson Elhage <nelhage@ksplice.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:04 -08:00
Sasha Levin	2ccd4f4d47	pipe: fail cleanly when root tries F_SETPIPE_SZ with big size When a user with the CAP_SYS_RESOURCE cap tries to F_SETPIPE_SZ a pipe with size bigger than kmalloc() can alloc it spits out an ugly warning: ------------[ cut here ]------------ WARNING: at mm/page_alloc.c:2095 __alloc_pages_nodemask+0x5d3/0x7a0() Pid: 733, comm: a.out Not tainted 3.2.0-rc1+ #4 Call Trace: warn_slowpath_common+0x75/0xb0 warn_slowpath_null+0x15/0x20 __alloc_pages_nodemask+0x5d3/0x7a0 __get_free_pages+0x12/0x50 __kmalloc+0x12b/0x150 pipe_set_size+0x75/0x120 pipe_fcntl+0xf8/0x140 do_fcntl+0x2d4/0x410 sys_fcntl+0x66/0xa0 system_call_fastpath+0x16/0x1b ---[ end trace 432f702e6db7b5ee ]--- Instead, make kcalloc() handle the overflow case and fail quietly. [akpm@linux-foundation.org: switch to sizeof(*bufs) for 80-column niceness] Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Acked-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:04 -08:00
Stanislaw Gruszka	888a214dc4	slub: document setting min order with debug_guardpage_minorder > 0 Acked-by: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:04 -08:00
Mathias Krause	15ee2d000d	parisc, exec: remove redundant set_fs(USER_DS) The address limit is already set in flush_old_exec() so those calls to set_fs(USER_DS) are redundant. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Helge Deller <deller@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:04 -08:00
Mathias Krause	01fa310cd9	ia64, exec: remove redundant set_fs(USER_DS) The address limit is already set in flush_old_exec() so this set_fs(USER_DS) is redundant. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:03 -08:00
Andrew Morton	08346bf805	drivers/video/nvidia/nvidia.c: fix warning Fix the int/bool confusion in there. drivers/video/nvidia/nvidia.c:1602: warning: return from incompatible pointer type Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:03 -08:00
Heiko Carstens	2565409fc0	mm,x86,um: move CMPXCHG_DOUBLE config option Move CMPXCHG_DOUBLE and rename it to HAVE_CMPXCHG_DOUBLE so architectures can simply select the option if it is supported. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:03 -08:00
Heiko Carstens	4156153c4d	mm,x86,um: move CMPXCHG_LOCAL config option Move CMPXCHG_LOCAL and rename it to HAVE_CMPXCHG_LOCAL so architectures can simply select the option if it is supported. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:03 -08:00
Heiko Carstens	43570fd2f4	mm,slub,x86: decouple size of struct page from CONFIG_CMPXCHG_LOCAL While implementing cmpxchg_double() on s390 I realized that we don't set CONFIG_CMPXCHG_LOCAL despite the fact that we have support for it. However setting that option will increase the size of struct page by eight bytes on 64 bit, which we certainly do not want. Also, it doesn't make sense that a present cpu feature should increase the size of struct page. Besides that it looks like the dependency to CMPXCHG_LOCAL is wrong and that it should depend on CMPXCHG_DOUBLE instead. This patch: If an architecture supports CMPXCHG_LOCAL this shouldn't result automatically in larger struct pages if the SLUB allocator is used. Instead introduce a new config option "HAVE_ALIGNED_STRUCT_PAGE" which can be selected if a double word aligned struct page is required. Also update x86 Kconfig so that it should work as before. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:03 -08:00
Joe Perches	0d259cf819	include/linux/linkage.h: remove unused ATTRIB_NORET macro The uses have been renamed so delete the unused macro. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:03 -08:00
Joe Perches	ff2d8b19a3	treewide: convert uses of ATTRIB_NORETURN to __noreturn Use the more commonly used __noreturn instead of ATTRIB_NORETURN. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Joe Perches <joe@perches.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:03 -08:00
Joe Perches	9402c95f34	treewide: remove useless NORET_TYPE macro and uses It's a very old and now unused prototype marking so just delete it. Neaten panic pointer argument style to keep checkpatch quiet. Signed-off-by: Joe Perches <joe@perches.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:03 -08:00
Joe Perches	80bf007f20	include/linux/linkage.h: remove unused NORET_AND macro The only use in kernel.h is gone so remove the macro. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:02 -08:00
Joe Perches	4da4785995	kernel.h: neaten panic prototype Use __printf macro. Convert NORET_AND to ATTRIB_NORET. Use the normal kernel style for pointer arguments. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:02 -08:00
Stephen Boyd	efeb156e72	kprobes: silence DEBUG_STRICT_USER_COPY_CHECKS=y warning Enabling DEBUG_STRICT_USER_COPY_CHECKS causes the following warning: In file included from arch/x86/include/asm/uaccess.h:573, from kernel/kprobes.c:55: In function 'copy_from_user', inlined from 'write_enabled_file_bool' at kernel/kprobes.c:2191: arch/x86/include/asm/uaccess_64.h:65: warning: call to 'copy_from_user_overflow' declared with attribute warning: copy_from_user() buffer size is not provably correct presumably due to buf_size being signed causing GCC to fail to see that buf_size can't become negative. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:02 -08:00
Xiaotian Feng	a2ef990ab5	proc: fix null pointer deref in proc_pid_permission() get_proc_task() can fail to search the task and return NULL, put_task_struct() will then bomb the kernel with following oops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff81217d34>] proc_pid_permission+0x64/0xe0 PGD 112075067 PUD 112814067 PMD 0 Oops: 0002 [#1] PREEMPT SMP This is a regression introduced by commit `0499680a` ("procfs: add hidepid= and gid= mount options"). The kernel should return -ESRCH if get_proc_task() failed. Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Stephen Wilson <wilsons@start.ca> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:02 -08:00
Bradley Peterson	91dce7ddab	pptp: Accept packet with seq zero Initialize the PPTP "seq received" value to 0xffffffff, so we don't ignore packets with seq zero. Signed-off-by: Bradley Peterson <despite@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-12 20:05:28 -08:00
Roland Dreier	5b7bf42e3d	RDS: Remove some unused iWARP code rds_iw_flush_goal() just returns a count, but it is only called in one place and its return value is ignored there. So delete all the dead code. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-12 20:05:28 -08:00
Eric Benard	8d82f219c2	net: fsl: fec: handle 10Mbps speed in RMII mode when the link is 10 Mbps and the mode is RMII, it's necessary to set FRCONT to 1 in MIIGSK_CFGR to divide the RMII source clock by 10 in order to support 10 Mbps operations. Signed-off-by: Eric Bénard <eric@eukrea.com> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-12 20:05:28 -08:00
Julia Lawall	25cecd7e35	drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c: add missing iounmap Add missing iounmap in error handling code, in a case where the function already preforms iounmap on some other execution path. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression e; statement S,S1; int ret; @@ e = $ioremap\\|ioremap_nocache$(...) ... when != iounmap(e) if (<+...e...+>) S ... when any when != iounmap(e) *if (...) { ... when != iounmap(e) return ...; } ... when any iounmap(e); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-12 20:05:28 -08:00
Julia Lawall	20d4369b68	drivers/net/ethernet/tundra/tsi108_eth.c: add missing iounmap Add missing iounmap in error handling code, in a case where the function already preforms iounmap on some other execution path. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression e; statement S,S1; int ret; @@ e = $ioremap\\|ioremap_nocache$(...) ... when != iounmap(e) if (<+...e...+>) S ... when any when != iounmap(e) *if (...) { ... when != iounmap(e) return ...; } ... when any iounmap(e); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-12 20:05:28 -08:00
Doug Kehn	83636580ad	ksz884x: fix mtu for VLAN The Ethernet header does not account for the addition of a VLAN header. Full size Ethernet frames containing VLAN header are not processed because the frame is larger than the resulting hw mtu. Signed-off-by: Doug Kehn <rdkehn@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-12 20:05:28 -08:00
Eric Dumazet	ddecf0f4db	net_sched: sfq: add optional RED on top of SFQ Adds an optional Random Early Detection on each SFQ flow queue. Traditional SFQ limits count of packets, while RED permits to also control number of bytes per flow, and adds ECN capability as well. 1) We dont handle the idle time management in this RED implementation, since each 'new flow' begins with a null qavg. We really want to address backlogged flows. 2) if headdrop is selected, we try to ecn mark first packet instead of currently enqueued packet. This gives faster feedback for tcp flows compared to traditional RED [ marking the last packet in queue ] Example of use : tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 4sec sfq \ limit 3000 headdrop flows 512 divisor 16384 \ redflowlimit 100000 min 8000 max 60000 probability 0.20 ecn qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop flows 512/16384 divisor 16384 ewma 6 min 8000b max 60000b probability 0.2 ecn prob_mark 0 prob_mark_head 4876 prob_drop 6131 forced_mark 0 forced_mark_head 0 forced_drop 0 Sent 1175211782 bytes 777537 pkt (dropped 6131, overlimits 11007 requeues 0) rate 99483Kbit 8219pps backlog 689392b 456p requeues 0 In this test, with 64 netperf TCP_STREAM sessions, 50% using ECN enabled flows, we can see number of packets CE marked is smaller than number of drops (for non ECN flows) If same test is run, without RED, we can check backlog is much bigger. qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop flows 512/16384 divisor 16384 Sent 1148683617 bytes 795006 pkt (dropped 0, overlimits 0 requeues 0) rate 98429Kbit 8521pps backlog 1221290b 841p requeues 0 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Dave Taht <dave.taht@gmail.com> Tested-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-12 20:05:28 -08:00
Manfred Rudigier	72092cc453	dp83640: Fix NOHZ local_softirq_pending 08 warning Similar problem as in `481a819914` ("can: fix NOHZ local_softirq_pending 08 warning"). This fix replaces netif_rx() with netif_rx_ni() which has to be used from process/softirq context. Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-12 15:26:01 -08:00
Manfred Rudigier	9c4886e5e6	gianfar: Fix invalid TX frames returned on error queue when time stamping When TX time stamping for PTP messages is enabled on a socket, a time stamp is returned on the socket error queue to the user space application after the frame was transmitted. The transmitted frame is also returned on the error queue so that an application knows to which frame the time stamp belongs. In the current implementation the TxFCB is immediately followed by the frame. Since the eTSEC inserts the TX time stamp 8 bytes after the TxFCB, parts of the frame have been overwritten and an invalid frame was returned on the socket error queue. This patch fixes the described problem by adding additional 16 padding bytes between the TxFCB and the frame for all messages sent from a time stamping enabled socket (other sockets are not affected). Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-12 15:26:01 -08:00
Manfred Rudigier	db83d136d7	gianfar: Fix missing sock reference when processing TX time stamps When there is not enough headroom in the skb a private copy will be made. However, the private copy had no reference to the socket and consequently no time stamp could be queued on the socket error queue during the skb_tstamp_tx function. This patch fixes this issue by also stealing the sock reference from the original skb after making the private copy. Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-12 15:26:01 -08:00
Timur Tabi	eb8a54a78e	phylib: introduce mdiobus_alloc_size() Introduce function mdiobus_alloc_size() as an alternative to mdiobus_alloc(). Most callers of mdiobus_alloc() also allocate a private data structure, and then manually point bus->priv to this object. mdiobus_alloc_size() combines the two operations into one, which simplifies memory management. The original mdiobus_alloc() now just calls mdiobus_alloc_size(0). Signed-off-by: Timur Tabi <timur@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-12 15:23:04 -08:00
Linus Torvalds	2485a4b610	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: bcm5974 - set BUTTONPAD property Input: serio_raw - return proper result when serio_raw_write fails Input: serio_raw - really signal HUP upon disconnect Input: serio_raw - remove stray semicolon Input: revert some over-zealous conversions to module_platform_driver()	2012-01-12 12:40:41 -08:00
Linus Torvalds	6733e54b66	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: FUSE: Notifying the kernel of deletion. fuse: support ioctl on directories fuse: Use kcalloc instead of kzalloc to allocate array fuse: llseek optimize SEEK_CUR and SEEK_SET	2012-01-12 12:39:21 -08:00
Linus Torvalds	bcf8a3dfcb	Merge tag 'to-linus' of git://github.com/rustyrussell/linux * tag 'to-linus' of git://github.com/rustyrussell/linux: (24 commits) lguest: Make sure interrupt is allocated ok by lguest_setup_irq lguest: move the lguest tool to the tools directory lguest: switch segment-voodoo-numbers to readable symbols virtio: balloon: Add freeze, restore handlers to support S4 virtio: balloon: Move vq initialization into separate function virtio: net: Add freeze, restore handlers to support S4 virtio: net: Move vq and vq buf removal into separate function virtio: net: Move vq initialization into separate function virtio: blk: Add freeze, restore handlers to support S4 virtio: blk: Move vq initialization to separate function virtio: console: Disable callbacks for virtqueues at start of S4 freeze virtio: console: Add freeze and restore handlers to support S4 virtio: console: Move vq and vq buf removal into separate functions virtio: pci: add PM notification handlers for restore, freeze, thaw, poweroff virtio: pci: switch to new PM API virtio_blk: fix config handler race virtio: add debugging if driver doesn't kick. virtio: expose added descriptors immediately. virtio: avoid modulus operation. virtio: support unlocked queue kick ...	2012-01-12 12:37:27 -08:00
Glauber Costa	1398eee082	net: decrement memcg jump label when limit, not usage, is changed The logic of the current code is that whenever we destroy a cgroup that had its limit set (set meaning different than maximum), we should decrement the jump_label counter. Otherwise we assume it was never incremented. But what the code actually does is test for RES_USAGE instead of RES_LIMIT. Usage being different than maximum is likely to be true most of the time. The effect of this is that the key must become negative, and since the jump_label test says: !!atomic_read(&key->enabled); we'll have jump_labels still on when no one else is using this functionality. Signed-off-by: Glauber Costa <glommer@parallels.com> CC: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-12 12:27:59 -08:00
Eric Dumazet	cf778b00e9	net: reintroduce missing rcu_assign_pointer() calls commit `a9b3cd7f32` (rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER) did a lot of incorrect changes, since it did a complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x, y). We miss needed barriers, even on x86, when y is not NULL. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-12 12:26:56 -08:00
Linus Torvalds	61bd5e5683	brcmsmac: fix reading of PCI sprom contents It appears that you can only read the sprom contents with aligned 16-bit reads: anything else causes at least some versions of the broadcom chipset to abort the PCI transaction, returning 0xff. This apparently doesn't trigger very often, because most setups don't use an external srom chip, and the OTP sprom loading doesn't have this issue. But at least the current 11" Macbook Air does trigger it, and wireless communications were broken as a result. Acked-by: Arend van Spriel <arend@broadcom.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 12:19:34 -08:00
David S. Miller	9ee6045f09	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless	2012-01-12 12:10:00 -08:00
Anton Vorontsov	bccd17294a	x86: Get rid of 'dubious one-bit signed bitfield' sprase warning This very noisy sparse warning appears on almost every file in the kernel: CHECK init/main.c arch/x86/include/asm/thread_info.h:43:55: error: dubious one-bit signed bitfield arch/x86/include/asm/thread_info.h:44:46: error: dubious one-bit signed bitfield This patch changes sig_on_uaccess_error and uaccess_err flags to unsigned type and thus fixes the warning. Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com> Acked-by: Andy Lutomirski <luto@mit.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 09:32:21 -08:00
Linus Torvalds	a429638cac	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (526 commits) ASoC: twl6040 - Add method to query optimum PDM_DL1 gain ALSA: hda - Fix the lost power-setup of seconary pins after PM resume ALSA: usb-audio: add Yamaha MOX6/MOX8 support ALSA: virtuoso: add S/PDIF input support for all Xonars ALSA: ice1724 - Support for ooAoo SQ210a ALSA: ice1724 - Allow card info based on model only ALSA: ice1724 - Create capture pcm only for ADC-enabled configurations ALSA: hdspm - Provide unique driver id based on card serial ASoC: Dynamically allocate the rtd device for a non-empty release() ASoC: Fix recursive dependency due to select ATMEL_SSC in SND_ATMEL_SOC_SSC ALSA: hda - Fix the detection of "Loopback Mixing" control for VIA codecs ALSA: hda - Return the error from get_wcaps_type() for invalid NIDs ALSA: hda - Use auto-parser for HP laptops with cx20459 codec ALSA: asihpi - Fix potential Oops in snd_asihpi_cmode_info() ALSA: hdsp - Fix potential Oops in snd_hdsp_info_pref_sync_ref() ALSA: hda/cirrus - support for iMac12,2 model ASoC: cx20442: add bias control over a platform provided regulator ALSA: usb-audio - Avoid flood of frame-active debug messages ALSA: snd-usb-us122l: Delete calls to preempt_disable mfd: Put WM8994 into cache only mode when suspending ... Fix up trivial conflicts in: - arch/arm/mach-s3c64xx/mach-crag6410.c: renamed speyside_wm8962 to tobermory, added littlemill right next to it - drivers/base/regmap/{regcache.c,regmap.c}: duplicate diff that had already come in with other changes in the regmap tree	2012-01-12 08:00:30 -08:00
Bjorn Helgaas	5cf9a4e69c	x86/PCI: build amd_bus.o only when CONFIG_AMD_NB=y We only need amd_bus.o for AMD systems with PCI. arch/x86/pci/Makefile already depends on CONFIG_PCI=y, so this patch just adds the dependency on CONFIG_AMD_NB. Cc: Yinghai Lu <yinghai@kernel.org> Cc: stable@kernel.org # 2.6.34+ (needs adjustment for k8 -> amd rename) Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 07:54:18 -08:00
Sujith Manoharan	8232736dab	ath6kl: Fix listen interval handling This patch addresses a few problems with the commit: "ath6kl: Implement support for listen interval from userspace" * The debugfs file required for reading/writing the listen interval wasn't created. Fix this. * The interface index was being hardcoded to zero. Fix this. * Two separate parameters, "listen_interval_time and listen_interval_beacons" were being used. This fails to work as expected because the FW assigns higher precedence to "listen_interval_beacons" and "listen_interval_time" ends up being never used at all. To handle this, fix the host driver to exclusively use listen interval based on units of beacon intervals. To set the listen interval, a user would now do something like this: echo "10" > /sys/kernel/debug/ieee80211/*/ath6kl/listen_interval kvalo: fix two checkpatch warnings Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>	2012-01-12 13:36:37 +02:00
Sujith Manoharan	cbec267a51	ath6kl: Initialize a variable properly This prevents 'comp_pktq' from being used in an incorrect manner. Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>	2012-01-12 13:36:37 +02:00
Sujith Manoharan	4a8ce2fd05	ath6kl: Remove redundant pointer check 'params' is already used earlier and there is no point in checking for a NULL condition again. Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>	2012-01-12 13:36:37 +02:00
Sujith Manoharan	866dc88607	ath6kl: Fix SDIO error path sdio_release_host() would be called twice if sdio_set_block_size() fails for some reason, which would result in the following warning. WARNING: at /home/sujith/dev/wireless-testing/drivers/mmc/core/core.c:828 mmc_release_host+0x42/0x50 [mmc_core]() Call Trace: [<ffffffff81064fdf>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8106503a>] warn_slowpath_null+0x1a/0x20 [<ffffffffa03beb42>] mmc_release_host+0x42/0x50 [mmc_core] [<ffffffffa03c917e>] sdio_release_host+0x1e/0x30 [mmc_core] [<ffffffffa053fac7>] ath6kl_sdio_config+0xc7/0x110 [ath6kl_sdio] [<ffffffffa053fd2c>] ath6kl_sdio_probe+0x21c/0x320 [ath6kl_sdio] [<ffffffffa03beb2a>] ? mmc_release_host+0x2a/0x50 [mmc_core] [<ffffffffa03c7d2a>] sdio_bus_probe+0xfa/0x130 [mmc_core] [<ffffffff813015ae>] driver_probe_device+0x7e/0x1b0 [<ffffffff8130178b>] __driver_attach+0xab/0xb0 [<ffffffff813016e0>] ? driver_probe_device+0x1b0/0x1b0 [<ffffffff813016e0>] ? driver_probe_device+0x1b0/0x1b0 [<ffffffff81300504>] bus_for_each_dev+0x64/0xa0 [<ffffffff8130123e>] driver_attach+0x1e/0x20 [<ffffffff81300e80>] bus_add_driver+0x1b0/0x280 [<ffffffffa0065000>] ? 0xffffffffa0064fff [<ffffffff81301d06>] driver_register+0x76/0x140 [<ffffffffa0065000>] ? 0xffffffffa0064fff [<ffffffffa03c7b71>] sdio_register_driver+0x21/0x30 [mmc_core] [<ffffffffa0065012>] ath6kl_sdio_init+0x12/0x35 [ath6kl_sdio] [<ffffffff81002042>] do_one_initcall+0x42/0x180 [<ffffffff810b025f>] sys_init_module+0x8f/0x200 [<ffffffff81425ac2>] system_call_fastpath+0x16/0x1b Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>	2012-01-12 13:36:37 +02:00
Takashi Iwai	9e4ce164ee	Merge branch 'topic/hda' into for-linus	2012-01-12 09:59:18 +01:00
Takashi Iwai	627b79628f	Merge branch 'topic/misc' into for-linus	2012-01-12 09:59:14 +01:00
Takashi Iwai	29abceb67f	Merge branch 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into topic/asoc	2012-01-12 09:48:20 +01:00
Linus Torvalds	4c4d285ad5	Merge tag 'rmobile-for-linus' of git://github.com/pmundt/linux-sh SH/R-Mobile updates for 3.3 merge window. * tag 'rmobile-for-linus' of git://github.com/pmundt/linux-sh: (32 commits) arm: mach-shmobile: add a resource name for shdma ARM: mach-shmobile: r8a7779 SMP support V3 ARM: mach-shmobile: Add kota2 defconfig. ARM: mach-shmobile: Add marzen defconfig. ARM: mach-shmobile: r8a7779 power domain support V2 ARM: mach-shmobile: Fix up marzen build for recent GIC changes. ARM: mach-shmobile: r8a7779 PFC function support ARM: mach-shmobile: Flush caches in platform_cpu_die() ARM: mach-shmobile: Allow SoC specific CPU kill code ARM: mach-shmobile: Fix headsmp.S code to use CPUINIT ARM: mach-shmobile: clock-r8a7779: clkz/clkzs support ARM: mach-shmobile: clock-r8a7779: add DIV4 clock support ARM: mach-shmobile: Marzen LAN89218 support ARM: mach-shmobile: Marzen SCIF2/SCIF4 support ARM: mach-shmobile: r8a7779 PFC GPIO-only support V2 ARM: mach-shmobile: r8a7779 and Marzen base support V2 sh: pfc: Unlock register support sh: pfc: Variable bitfield width config register support sh: pfc: Add config_reg_helper() function sh: pfc: Convert index to field and value pair ...	2012-01-11 23:29:20 -08:00

... 8 9 10 11 12 ...

284845 Commits