Commit Graph

324050 Commits

Author SHA1 Message Date
Ingo Molnar
f74eb72868 Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

 * Convert the trace builtins to use the growing evsel/evlist
   tracepoint infrastructure, removing several open coded constructs
   like switch like series of strcmp to dispatch events, etc.
   Basically what had already been showcased in 'perf sched'.

 * Add evsel constructor for tracepoints, that uses libtraceevent
   just to parse the /format events file, use it in a new 'perf test'
   to make sure the libtraceevent format parsing regressions can
   be more readily caught.

 * Some strange errors were happening in some builds, but not on the
   next, reported by several people, problem was some parser related
   files, generated during the build, didn't had proper make deps,
   fix from Eric Sandeen.

 * Fix some compiling errors on 32-bit, from Feng Tang.

 * Don't use sscanf extension %as, not available on bionic, reimplementation
   by Irina Tirdea.

 * Fix bfd.h/libbfd detection with recent binutils, from Markus Trippelsdorf.

 * Introduce struct and cache information about the environment where a
   perf.data file was captured, from Namhyung Kim.

 * Fix several error paths in libtraceevent, from Namhyung Kim.

   Print event causing perf_event_open() to fail in 'perf record',
   from Stephane Eranian.

 * New 'kvm' analysis tool, from Xiao Guangrong.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 20:55:26 +02:00
Mark Salter
b02d617585 c6x: use asm-generic/barrier.h
A recent patch in the linux-next tree caused a build failure on
C6X because C6X didn't define a read_barrier_depends() macro. C6X
does not support SMP and the architecture doesn't provide any
special memory ordering instructions, so it makes sense to just
use the generic barrier.h rather than patching the existing c6x
specific header.

Signed-off-by: Mark Salter <msalter@redhat.com>
2012-09-24 14:39:36 -04:00
Ezequiel Garcia
8781915ad2 trace: Move trace event enable from fs_initcall to core_initcall
This patch splits trace event initialization in two stages:
 * ftrace enable
 * sysfs event entry creation

This allows to capture trace events from an earlier point
by using 'trace_event' kernel parameter and is important
to trace boot-up allocations.

Note that, in order to enable events at core_initcall,
it's necessary to move init_ftrace_syscalls() from
core_initcall to early_initcall.

Link: http://lkml.kernel.org/r/1347461277-25302-1-git-send-email-elezegarcia@gmail.com

Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-24 14:13:02 -04:00
Mandeep Singh Baines
5224c3a315 tracing: Add an option for disabling markers
In our application, we have trace markers spread through user-space.
We have markers in GL, X, etc. These are super handy for Chrome's
about:tracing feature (Chrome + system + kernel trace view), but
can be very distracting when you're trying to debug a kernel issue.

I normally, use "grep -v tracing_mark_write" but it would be nice
if I could just temporarily disable markers all together.

Link: http://lkml.kernel.org/r/1347066739-26285-1-git-send-email-msb@chromium.org

CC: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-24 14:10:44 -04:00
Namhyung Kim
b1ac754b67 tools lib traceevent: Handle alloc_arg failure
Now alloc_arg returns NULL if memory allocation failed, it should be
handled on callsites properly.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/87k3vpzbqo.fsf_-_@sejong.aot.lge.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 12:31:52 -03:00
Arnaldo Carvalho de Melo
6a6cd11d4e perf test: Add test for the sched tracepoint format fields
So that we make sure the routines that do event format parsing are
working on at least two well know scheduler tracepoints.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-g3rm9b3wtim4djx3z8dkftrj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 12:29:19 -03:00
Arnaldo Carvalho de Melo
efd2b924d3 perf evsel: Provide a new constructor for tracepoints
The existing constructor receives a perf_event_attr filled with the
event type and the config.

To reduce the boilerplate for tracepoints, provide a new constructor,
perf_evsel__newtp() that receives the tracepoint name and will open
the debugfs file, call into libtraceevent new pevent_parse_format file
to fill its ->tp_format member, so that users can then just call
perf_evsel__field() to access its fields.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-6du8dl1hz0y5l4cybodye7hn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 12:26:59 -03:00
Arnaldo Carvalho de Melo
2b29175d2b tools lib traceevent: Carve out events format parsing routine
The pevent_parse_event() routine will parse a events/sys/tp/format file
and add an event_format instance to the pevent struct.

This patch introduces a pevent_parse_format() routine with just the bits
needed to parse the event/sys/tp/format file and just return the
event_format instance, useful for when all we want is to parse the
format file, without requiring the pevent struct.

Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-lge0afl47arh86om0m6a5bqr@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 12:26:31 -03:00
Arnaldo Carvalho de Melo
a6d2a61ac6 tools lib traceevent: Remove some die() calls
Cleaned event-parse.c this time, just propagate the errors and in handle
them the call sites.

Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-9ebpr2vgfk2qs2841i99sa8y@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 12:13:35 -03:00
Arnaldo Carvalho de Melo
b85119200d tools lib traceevent: Fix afterlife gotos
Instead of dying, just use do_warning and let the goto that is there to
take place.

Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-aoaus46ngnt9oc2pt7ckot5d@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 12:12:59 -03:00
Arnaldo Carvalho de Melo
87162d816f tools lib traceevent: Use calloc were applicable
Replacing the equivalent open coded malloc + memset bits.

Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-598fjtjbzal4wxh7fp0yv0q1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 12:12:31 -03:00
Arnaldo Carvalho de Melo
0dbca1e364 tools lib traceevent: Use asprintf were applicable
Replacing the equivalent open coded malloc + sprintf bits.

Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-ghokwtdw2hgmmmn7oa9s03r4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 12:10:34 -03:00
Markus Trippelsdorf
3ce711a6ab perf tools: bfd.h/libbfd detection fails with recent binutils
With recent binutils I get:

 perf % make
Makefile:668: No bfd.h/libbfd found, install binutils-dev[el]/zlib-static to gain symbol demanglin

That happens because bfd.h now contains:

I've reopened a bug in the hope that this check will be deleted:
http://sourceware.org/bugzilla/show_bug.cgi?id=14243

But in the meantime, the following patch fixes the problem

Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Paul Mackerras <paulus@samba.org>
Link: http://lkml.kernel.org/r/20120919072902.GA262@x4
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 12:05:02 -03:00
Namhyung Kim
70d9304475 tools lib traceevent: Free field if an error occurs on process_flags/symbols
The field should be freed on error paths.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1348037924-17568-5-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 12:03:18 -03:00
Namhyung Kim
f8c49d2645 tools lib traceevent: Free field if an error occurs on process_fields
The field should be freed on error paths.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1348037924-17568-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 12:02:38 -03:00
Namhyung Kim
41e51a289b tools lib traceevent: Make sure that arg->op.right is set properly
When process_op failed, @arg will be freed on a caller with type of
PRINT_OP.  Thus free_arg() will try to free ->op.right field which can
have stale value if something bad happens in the middle.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1348037924-17568-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 12:01:21 -03:00
Namhyung Kim
1bce6e0fec tools lib traceevent: Fix error path on process_array()
free_token() under out_free should be called with 'token' and no need
to set *tok to NULL since it's set already.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1348037924-17568-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 12:00:15 -03:00
Irina Tirdea
bcbd004020 perf tools: remove sscanf extension %as
perf uses sscanf extension %as to read and allocate a string in the same
step.  This is a non-standard extension only present in new versions of
glibc.

Replacing the use of sscanf and %as with strtok_r calls in order to
parse a given string into its components.  This is needed in Android
since bionic does not support
%as extension for sscanf.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Irina Tirdea <irina.tirdea@intel.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1348173470-4936-1-git-send-email-irina.tirdea@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 11:49:31 -03:00
Namhyung Kim
37e9d750e6 perf header: Remove perf_header__read_feature
Because its only user builtin-kvm::get_cpu_isa() has gone, It can be
removed safely.  In general, we have the feature information in
perf_session_env already, no need to read it again.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Dong Hao <haodong@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1348474503-15070-7-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 11:47:46 -03:00
Namhyung Kim
2f9e97aa8b perf kvm: Use perf_session_env for reading cpuid
We have processed and saved cpuid information to perf_session_env so
reuse it for get_cpu_isa().

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Dong Hao <haodong@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1348474503-15070-6-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 11:47:27 -03:00
Namhyung Kim
3d7eb86b9d perf header: Remove unused @feat arg from ->process callback
As the @feat arg is not used anywhere, get rid of it from the signature.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1348474503-15070-5-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 11:47:09 -03:00
Namhyung Kim
7e94cfcc9d perf header: Use pre-processed session env when printing
From now on each feature information is processed and saved in perf
header so that it can be used for printing.  The event desc and branch
stack features are not touched since they're not saved.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1348474503-15070-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 11:45:53 -03:00
Namhyung Kim
a1ae565528 perf header: Add ->process callbacks to most of features
From now on each feature information is processed and saved in perf
header so that it can be used wherever needed.  The BRANCH_STACK feature
is an exception since it needs nothing to be done.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1348474503-15070-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 11:45:22 -03:00
Namhyung Kim
e93699b3c7 perf header: Add struct perf_session_env
The struct perf_session_env will preserve environment information at the
time of perf record.  It can be accessed anytime after parsing a
perf.data file if needed.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1348474503-15070-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 11:44:05 -03:00
Arnaldo Carvalho de Melo
e0dcd6fb25 perf timechart: Use zalloc and fix a couple leaks
Use zalloc for the malloc+memset open coded sequence.

Fix leak on the #ifdef'ed C state handling and when detecting invalid
data in p_state_change().

Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-v9x3q9rv4caxtox7wtjpchq5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 11:43:38 -03:00
Arnaldo Carvalho de Melo
746f16ec6a perf lock: Use perf_evsel__intval and perf_session__set_tracepoints_handlers
Following the model of 'perf sched':

. raw_field_value searches first on the common fields, that are unused
  in this tool

. Leave using perf_evsel__intval to the actual handlers, some may not
  need to incur some of the cost because they may not need all the
  fields values.

. Using perf_session__set_tracepoints_handlers will save all those
  strcmp to find the right handler at sample processing time, do it just
  once and get the handler from evsel->handler.func.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-v9x3q9rv4caxtox7wtjpchq5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 10:52:12 -03:00
Arnaldo Carvalho de Melo
0f7d2f1b65 perf kmem: Use perf_evsel__intval and perf_session__set_tracepoints_handlers
Following the model of 'perf sched':

. raw_field_value searches first on the common fields, that are unused
  in this tool

. Using perf_session__set_tracepoints_handlers will save all those
  strcmp to find the right handler at sample processing time, do it just
  once and get the handler from evsel->handler.func.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-v9x3q9rv4caxtox7wtjpchq5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 10:52:03 -03:00
Arnaldo Carvalho de Melo
14907e7383 perf kvm: Use perf_evsel__intval
Using plain raw_field_value(evsel->tp_format) will look at the common
fields as well, and since this tool doesn't need those, speed it up a
bit by looking at just the event specific fields.

Also in general use just evsel and sample, just like was done in 'perf
sched'.

v2: Fixed up test against evsel->name, that contains the subsys name
too, by David Ahern.

Cc: David Ahern <dsahern@gmail.com>
Cc: Dong Hao <haodong@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-v9x3q9rv4caxtox7wtjpchq5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 10:17:32 -03:00
Konrad Rzeszutek Wilk
8d54db795d xen/boot: Disable NUMA for PV guests.
The hypervisor is in charge of allocating the proper "NUMA" memory
and dealing with the CPU scheduler to keep them bound to the proper
NUMA node. The PV guests (and PVHVM) have no inkling of where they
run and do not need to know that right now. In the future we will
need to inject NUMA configuration data (if a guest spans two or more
NUMA nodes) so that the kernel can make the right choices. But those
patches are not yet present.

In the meantime, disable the NUMA capability in the PV guest, which
also fixes a bootup issue. Andre says:

"we see Dom0 crashes due to the kernel detecting the NUMA topology not
by ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).

This will detect the actual NUMA config of the physical machine, but
will crash about the mismatch with Dom0's virtual memory. Variation of
the theme: Dom0 sees what it's not supposed to see.

This happens with the said config option enabled and on a machine where
this scanning is still enabled (K8 and Fam10h, not Bulldozer class)

We have this dump then:
NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
Scanning NUMA topology in Northbridge 24
Number of physical nodes 4
Node 0 MemBase 0000000000000000 Limit 0000000040000000
Node 1 MemBase 0000000040000000 Limit 0000000138000000
Node 2 MemBase 0000000138000000 Limit 00000001f8000000
Node 3 MemBase 00000001f8000000 Limit 0000000238000000
Initmem setup node 0 0000000000000000-0000000040000000
  NODE_DATA [000000003ffd9000 - 000000003fffffff]
Initmem setup node 1 0000000040000000-0000000138000000
  NODE_DATA [0000000137fd9000 - 0000000137ffffff]
Initmem setup node 2 0000000138000000-00000001f8000000
  NODE_DATA [00000001f095e000 - 00000001f0984fff]
Initmem setup node 3 00000001f8000000-0000000238000000
Cannot find 159744 bytes in node 3
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
RIP: e030:[<ffffffff81d220e6>]  [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
.. snip..
  [<ffffffff81d23024>] sparse_early_usemaps_alloc_node+0x64/0x178
  [<ffffffff81d23348>] sparse_init+0xe4/0x25a
  [<ffffffff81d16840>] paging_init+0x13/0x22
  [<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
  [<ffffffff81683954>] ? printk+0x3c/0x3e
  [<ffffffff81d01a38>] start_kernel+0xe5/0x468
  [<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
  [<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
  [<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
"

so we just disable NUMA scanning by setting numa_off=1.

CC: stable@vger.kernel.org
Reported-and-Tested-by: Andre Przywara <andre.przywara@amd.com>
Acked-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-09-24 08:47:20 -04:00
Benjamin Marzinski
2216db70c9 GFS2: Write out dirty inode metadata in delayed deletes
If a dirty GFS2 inode was being deleted but was in use by another node, its
metadata was not getting written out before GFS2 checked for dirty buffers in
gfs2_ail_flush().  GFS2 was relying on inode_go_sync() to write out the
metadata when the other node tried to free the file, but it failed the error
check before it got that far. This patch writes out the metadata before calling
gfs2_ail_flush()

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:30 +01:00
Eric Sandeen
a0b4df2943 GFS2: fix s_writers.counter imbalance in gfs2_ail_empty_gl
gfs2_ail_empty_gl() contains an "inline version" of gfs2_trans_begin(),
so it needs an explicit sb_start_intwrite() as well, to balance the
sb_end_intwrite() which will be called by gfs2_trans_end().

With this, xfstest 068 passes on lock_nolock local gfs2.
Without it, we reach a writer count of -1 and get stuck.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:29 +01:00
Bob Peterson
3701530aed GFS2: Fix infinite loop in rbm_find
This patch fixes an infinite loop in gfs2_rbm_find that was introduced
by the previous patch. The problem occurred when the length was less
than 3 but the rbm block was byte-aligned, causing it to improperly
return a extent length of zero, which caused it to spin.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Tested-by: Barry Marson <bmarson@redhat.com>
2012-09-24 10:47:27 +01:00
Steven Whitehouse
ff7f4cb461 GFS2: Consolidate free block searching functions
With the recently added block reservation code, an additional function
was added to search for free blocks. This had a restriction of only being
able to search for aligned extents of free blocks. As a result the
allocation patterns when reserving blocks were suboptimal when the
existing allocation of blocks for an inode was not aligned to the same
boundary.

This patch resolves that problem by adding the ability for gfs2_rbm_find
to search for extents of a particular minimum size. We can then use
gfs2_rbm_find for both looking for reservations, and also looking for
free blocks on an individual basis when we actually come to do the
allocation later on. As a result we only need a single set of code
to deal with both situations.

The function gfs2_rbm_from_block() is moved up rgrp.c so that it
occurs before all of its callers.

Many thanks are due to Bob for helping track down the final issue in
this patch. That fix to the rb_tree traversal and to not share
block reservations from a dirctory to its children is included here.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2012-09-24 10:47:26 +01:00
Jan Kara
56aa72d0fc GFS2: Get rid of I_MUTEX_QUOTA usage
GFS2 uses i_mutex on its system quota inode to synchronize writes to
quota file. Since this is an internal inode to GFS2 (not part of directory
hiearchy or visible by user) we are safe to define locking rules for it. So
let's just get it its own locking class to make it clear.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:24 +01:00
Bob Peterson
0688a5ecea GFS2: Stop block extents at the end of bitmaps
This patch stops multiple block allocations if a nonzero
return code is received from gfs2_rbm_from_block. Without
this patch, if enough pressure is put on the file system,
you get a kernel warning quickly followed by:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa04f47e8>] gfs2_alloc_blocks+0x2c8/0x880 [gfs2]
With this patch, things run normally.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:23 +01:00
Steven Whitehouse
c743ffd09f GFS2: Fix unclaimed_blocks() wrapping bug and clean up
When rgd->rd_free_clone is less than rgd->rd_reserved, the
unclaimed_blocks() calculation would wrap and produce
incorrect results. This patch checks for this condition
when this function is called from gfs2_mblk_search()

In addition, the use of this particular function in other
places in the code has been dropped by means of a general
clean up of gfs2_inplace_reserve(). This function is now
much easier to follow.

Also the setting of the rgd->rd_last_alloc field is corrected.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:21 +01:00
Steven Whitehouse
9e733d3923 GFS2: Improve block reservation tracing
This patch improves the tracing of block reservations by
removing some corner cases and also providing more useful
detail in the traces.

A new field is added to the reservation structure to contain
the inode number. This is used since in certain contexts it is
not possible to access the inode itself to obtain this information.
As a result we can then display the inode number for all tracepoints
and also in case we dump the resource group.

The "del" tracepoint operation has been removed. This could be called
with the reservation rgrp set to NULL. That resulted in not printing
the device number, and thus making the information largely useless
anyway. Also, the conditional on the rgrp being NULL can then be
removed from the tracepoint. After this change, all the block
reservation tracepoint calls will be called with the rgrp information.

The existing ins,clm and tdel calls to the block reservation tracepoint
are sufficient to track the entire life of the block reservation.

In gfs2_block_alloc() the error detection is updated to print out
the inode number of the problematic inode. This can then be compared
against the information in the glock dump,tracepoints, etc.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:20 +01:00
Steven Whitehouse
137834a696 GFS2: Fall back to ignoring reservations, if there are no other blocks left
When we get to the stage of allocating blocks, we know that the
resource group in question must contain enough free blocks, otherwise
gfs2_inplace_reserve() would have failed. So if we are left with only
free blocks which are reserved, then we must use those. This can happen
if another node has sneeked in and use some blocks reserved on this
node, for example. Generally this will happen very rarely and only
when the resouce group is nearly full.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:19 +01:00
Steven Whitehouse
2b9731e8bb GFS2: Fix ->show_options() for statfs slow
The ->show_options() function for GFS2 was not correctly displaying
the value when statfs slow in in use.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Milos Jakubicek <xjakub@fi.muni.cz>
2012-09-24 10:47:17 +01:00
Steven Whitehouse
3e6339dd28 GFS2: Use rbm for gfs2_setbit()
Use the rbm structure for gfs2_setbit() in order to simplify the
arguments to the function. We have to add a bool to control whether
the clone bitmap should be updated (if it exists) but otherwise it
is a more or less direct substitution.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:16 +01:00
Steven Whitehouse
c04a2ef3a8 GFS2: Use rbm for gfs2_testbit()
Change the arguments to gfs2_testbit() so that it now just takes an
rbm specifying the position of the two bit entry to return.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:14 +01:00
Bob Peterson
29c05b205d GFS2: Eliminate unnecessary check for state > 3 in bitfit
Function gfs2_bitfit was checking for state > 3, but that's
impossible since it is only called from rgblk_search, which receives
only GFS2_BLKST_ constants.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:13 +01:00
Bob Peterson
e5dc76b9af GFS2: Eliminate redundant calls to may_grant
Function add_to_queue was checking may_grant for the passed-in
holder for every iteration of its gh2 loop. Now it only checks it
once at the beginning to see if a try lock is futile.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:12 +01:00
Bob Peterson
81e1d45061 GFS2: Combine functions gfs2_glock_dq_wait and wait_on_demote
Function gfs2_glock_dq_wait called two-line function wait_on_demote,
so they were combined.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:10 +01:00
Bob Peterson
07a7904942 GFS2: Combine functions gfs2_glock_wait and wait_on_holder
Function gfs2_glock_wait only called function wait_on_holder and
returned its return code, so they were combined for readability.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:09 +01:00
Bob Peterson
4abb6ad9ea GFS2: inline __gfs2_glock_schedule_for_reclaim
Since function gfs2_glock_schedule_for_reclaim is only two
significant lines, we can eliminate it, simplifying the code
and making it more readable.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:07 +01:00
Bob Peterson
8e711e100f GFS2: change function gfs2_direct_IO to use a normal gfs2_glock_dq
This patch changes function gfs2_direct_IO so that it uses a normal
call to gfs2_glock_dq rather than a call to a multiple-dq of one item.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:06 +01:00
Bob Peterson
8d8b752a0f GFS2: rbm code cleanup
This patch fixes a few small rbm related things. First, it fixes
a corner case where the rbm needs to switch bitmaps and wasn't
adjusting its buffer pointer. Second, there's a white space issue
fixed. Third, the logic in function gfs2_rbm_from_block was optimized
a bit. Lastly, a check for goal block overflows was added to function
gfs2_alloc_blocks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:04 +01:00
Steven Whitehouse
5d50d53246 GFS2: Fix case where reservation finished at end of rgrp
One corner case which the original patch failed to take into
account was when there is a reservation which ended such that
the following block was one beyond the end of the rgrp in
question. This extra test fixes that case.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Bob Peterson <rpeterso@redhat.com>
Tested-by: Bob Peterson <rpeterso@redhat.com>
2012-09-24 10:47:03 +01:00
Michel Lespinasse
24d634e8f3 GFS2: Use RB_CLEAR_NODE() rather than rb_init_node()
gfs2 calls RB_EMPTY_NODE() to check if nodes are not on an rbtree.
The corresponding initialization function is RB_CLEAR_NODE().
rb_init_node() was never clearly defined and is going away.

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:02 +01:00