A couple of fixes for the rwtop script:
- Printing the totals and clearing the hashes in the signal handler
eventually leads to various random and serious problems when running
the rwtop script continuously. Moving the print_totals() calls to
the event handlers solves that problem, and the event handlers are
invoked frequently enough that it doesn't affect the timeliness of
the output.
- Fix nuisance 'use of uninitialized value' warnings
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Message-Id: <1273466820-9330-4-git-send-email-tzanussi@gmail.com>
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In cbbc79a we introduced support for multiple events by introducing a
new "event_stat_id" struct and then made several perf_session methods
receive a pointer to it instead of a pointer to perf_session, and kept the
event_stats and hists rb_tree in perf_session.
While working on the new newt based browser, I realised that it would be
better to introduce a new class, "hists" (short for "histograms"),
renaming the "event_stat_id" struct and the perf_session methods that
were really "hists" methods, as they manipulate only struct hists
members, not touching anything in the other perf_session members.
Other optimizations, such as calculating the maximum length of a symbol
name present in a hists instance, will be possible as we add entries,
avoiding a re-traversal just to find that information.
The rationale for the name "hists" to replace "event_stat_id" is that we
may have multiple sets of hists for the same event_stat id, as, for
instance, the 'perf diff' tool has, so event stat id is not what
characterizes what this struct and the functions that manipulate it do.
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Using machines__create_kernel_maps(..., HOST_KERNEL_ID) would create
another machine instance for the host machine, which since 1f626bc is
kept outside the machines rb_tree.
Fix it by using machine__create_kernel_maps(&self->host_machine)
directly.
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Instead of newtAddComponents(just-one-entry, NULL), which is not needed
if, like in this browser, we're adding just one component at a time.
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Works by adding a third parameter to the '-g' argument, after the graph
type and minimum percentage, for example:
[root@doppio linux-2.6-tip]# perf report -g fractal,0.5,2
Will show only the first two symbols where at least 0.5% of the samples
took place.
All the other symbols that fall outside these constraints will be
put together in the last entry, prefixed with "[...]" and the total
percentage for them.
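As a rough illustration of the argument format only, here is a minimal
sketch of splitting "type,min_percent,print_limit" into its three
fields. The struct and function names are hypothetical and are not the
ones used in perf; omitted fields are simply left at zero.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the three '-g' fields; not perf's own struct. */
struct callchain_arg {
	char         mode[16];     /* graph type, e.g. "fractal" */
	double       min_percent;  /* minimum percentage, 0 if omitted */
	unsigned int print_limit;  /* max symbols shown, 0 if omitted */
};

/* Parse "type[,min_percent[,print_limit]]"; returns 0 on success, -1 on error. */
static int parse_callchain_arg(const char *arg, struct callchain_arg *out)
{
	int n;

	assert(arg && out);
	memset(out, 0, sizeof(*out));
	/* %15[^,] reads the type up to the first comma, bounded by mode[]. */
	n = sscanf(arg, "%15[^,],%lf,%u",
		   out->mode, &out->min_percent, &out->print_limit);
	/* At least the graph type must be present. */
	return n >= 1 ? 0 : -1;
}
```

With "fractal,0.5,2" this yields the type "fractal", a minimum of 0.5
and a limit of 2 entries.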
Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We have just one host on a given session, and that is the most common
setup right now, so embed a ->host_machine struct machine instance
directly in the perf_session class and check whether we're looking
for it before going to the rb_tree.
This also fixes a problem found when we try to process old perf.data
files where we didn't have MMAP events for the kernel and modules and
thus didn't create the kernel maps: create them in
event__preprocess_sample if that wasn't already done.
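The lookup order described above can be sketched roughly as follows.
This is a simplified model, not the perf code: the guest machines are
kept in a plain linked list instead of an rb_tree, the struct layouts
and the function name are hypothetical, and the value of
HOST_KERNEL_ID is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define HOST_KERNEL_ID (-1)	/* value assumed for this sketch */

struct machine {
	int pid;
	struct machine *next;	/* list link; perf keeps guests in an rb_tree */
};

struct session {
	struct machine host_machine;	/* embedded, always present */
	struct machine *guests;		/* stand-in for the machines rb_tree */
};

static struct machine *session_find_machine(struct session *s, int pid)
{
	struct machine *m;

	assert(s);
	/* Common case first: the host needs no tree lookup at all. */
	if (pid == HOST_KERNEL_ID)
		return &s->host_machine;

	for (m = s->guests; m; m = m->next)
		if (m->pid == pid)
			return m;
	return NULL;
}
```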
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Which can happen when processing old files that had no fake kernel
MMAP events.
That shouldn't result in perf_session__create_kernel_maps not being
called; this will be fixed in a followup patch. For now do these
checks to avoid segfaulting.
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch drops "-a" from the default arguments passed to
perf record by perf lock.
If a user wants to do a system wide record of lock events,
perf lock record -a <program> <argument> ...
is enough for this purpose.
This can reduce the size of the perf.data file.
% sudo ./perf lock record whoami
root
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.439 MB perf.data (~19170 samples) ]
% sudo ./perf lock record -a whoami # with -a option
root
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 48.962 MB perf.data (~2139197 samples) ]
Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
LKML-Reference: Message-Id: <1273306229-5216-1-git-send-email-mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
And with that fix at least one bug:
The first hit for an entry, the one that calls malloc to create a new
instance in __perf_session__add_hist_entry, wasn't adding the count to
the per cpumode (PERF_RECORD_MISC_USER, etc) total variable.
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some events, such as the PERF_RECORD_FINISHED_ROUND event, consist of
only an event header and no data. In this case, a 0-length payload
will be read, and the 0 return value will be wrongly interpreted as an
'unexpected end of event stream'.
This patch allows for proper handling of data-less events by skipping
0-length reads.
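A minimal sketch of the idea, using an in-memory stream in place of
the perf.data file descriptor. The header layout, the stream helper
and the function names are assumptions made for illustration; only
the "skip the read when the payload is empty" step reflects the
change described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an event header (layout is illustrative). */
struct ev_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;		/* total size, header included */
};

/* Hypothetical in-memory stream used instead of a file descriptor. */
struct stream {
	const unsigned char *data;
	size_t len, pos;
};

/* Returns the number of bytes copied; 0 means end of stream, like read(2). */
static size_t stream_read(struct stream *s, void *dst, size_t n)
{
	size_t left = s->len - s->pos;

	if (n > left)
		n = left;
	memcpy(dst, s->data + s->pos, n);
	s->pos += n;
	return n;
}

/* Returns the number of events seen, or -1 on a malformed or truncated stream. */
static int count_events(struct stream *s)
{
	struct ev_header hdr;
	unsigned char payload[256];
	int events = 0;

	while (stream_read(s, &hdr, sizeof(hdr)) == sizeof(hdr)) {
		size_t want;

		if (hdr.size < sizeof(hdr))
			return -1;
		want = hdr.size - sizeof(hdr);
		if (want > sizeof(payload))
			return -1;
		/*
		 * Header-only event: skip the read entirely, so that a
		 * return value of 0 is never mistaken for end of stream.
		 */
		if (want && stream_read(s, payload, want) != want)
			return -1;
		events++;
	}
	return events;
}
```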
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
LKML-Reference: <1273038527.6383.51.camel@tropicana>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
When a lock is acquired after being contended, we update the
wait time statistics for the given lock.
But if the min wait time is updated, we don't check the max wait
time. This is wrong because the first time we update the wait time,
we want to update both min and max wait time.
Before:
  Name  acquired  contended  total wait (ns)  max wait (ns)  min wait (ns)
  key          8          1            21656              0          21656
After:
  Name  acquired  contended  total wait (ns)  max wait (ns)  min wait (ns)
  key          8          1            21656          21656          21656
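The intended behaviour can be sketched as below. The struct and
function names are hypothetical, not the ones in builtin-lock.c; the
point is only that min and max are checked independently, so the first
sample updates both.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-lock wait statistics. */
struct lock_wait_stat {
	unsigned int nr_contended;
	uint64_t wait_time_total;
	uint64_t wait_time_min;
	uint64_t wait_time_max;
};

static void lock_wait_stat_init(struct lock_wait_stat *st)
{
	assert(st);
	st->nr_contended = 0;
	st->wait_time_total = 0;
	st->wait_time_min = UINT64_MAX;	/* any first sample is smaller */
	st->wait_time_max = 0;		/* any first sample is larger or equal */
}

static void lock_wait_stat_update(struct lock_wait_stat *st, uint64_t wait)
{
	assert(st);
	st->nr_contended++;
	st->wait_time_total += wait;
	/* Two independent checks, not if/else: the first sample sets both. */
	if (wait < st->wait_time_min)
		st->wait_time_min = wait;
	if (wait > st->wait_time_max)
		st->wait_time_max = wait;
}
```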
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
This adds the "info" subcommand to perf lock which can be used
to dump metadata like threads or addresses of lock instances.
"map" was removed because info should do the work for it.
This will be useful not only for debugging but also for ordinary
analysis.
v2: adding example of usage
% sudo ./perf lock info -t
| Thread ID: comm
| 0: swapper
| 1: init
| 18: migration/5
| 29: events/2
| 32: events/5
| 33: events/6
...
% sudo ./perf lock info -m
| Address of instance: name of class
| 0xffff8800b95adae0: &(&sighand->siglock)->rlock
| 0xffff8800bbb41ae0: &(&sighand->siglock)->rlock
| 0xffff8800bf165ae0: &(&sighand->siglock)->rlock
| 0xffff8800b9576a98: &p->cred_guard_mutex
| 0xffff8800bb890a08: &(&p->alloc_lock)->rlock
| 0xffff8800b9522a08: &(&p->alloc_lock)->rlock
| 0xffff8800bb8aaa08: &(&p->alloc_lock)->rlock
| 0xffff8800bba72a08: &(&p->alloc_lock)->rlock
| 0xffff8800bf18ea08: &(&p->alloc_lock)->rlock
| 0xffff8800b8a0d8a0: &(&ip->i_lock)->mr_lock
| 0xffff88009bf818a0: &(&ip->i_lock)->mr_lock
| 0xffff88004c66b8a0: &(&ip->i_lock)->mr_lock
| 0xffff8800bb6478a0: &(shost->host_lock)->rlock
v3: fixed some problems Frederic pointed out
* better rbtree tracking in dump_threads()
* removed printf() and used pr_info() and pr_debug()
Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
LKML-Reference: <1272863520-16179-1-git-send-email-mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
The current events reordering algorithm is based on a heuristic that
gets broken once we deal with a very fast flow of events.
Indeed the time period based flushing is not suitable anymore
in the following case, assuming we have a flush period of two
seconds.
CPU 0 | CPU 1
|
cnt1 timestamps | cnt1 timestamps
|
0 | 0
1 | 1
2 | 2
3 | 3
[...] | [...]
4 seconds later
If we spend too much time reading the buffers (because each buffer
holds a lot of events, or because there are a lot of CPU buffers to
read), then in the next pass the CPU 0 buffer could contain a slice
of several seconds of events. We'll read them all and notice we've
reached the period to flush. In the above example we flush the first
half of the CPU 0 buffer, then we read the CPU 1 buffer, which still
holds events belonging to the slice we just flushed, and the
reordering fails.
It's simple to reproduce with:
perf lock record perf bench sched messaging
To solve this, we use a new solution that no longer relies on a
heuristic time slice period but on a deterministic basis derived
from how perf record does its job.
perf record saves the buffers in passes. A pass is a tour of every
buffer of every CPU. This is done in order: for each CPU we read the
buffers of every counter. So the more buffers we visit, the later
the timestamps of their events will be.
When perf record finishes a pass it records a
PERF_RECORD_FINISHED_ROUND pseudo event.
We record the max timestamp t found in the pass n. Assuming these
timestamps are monotonic across cpus, we know that if a buffer
still has events with timestamps below t, they will be all available
and then read in the pass n + 1.
Hence when we start to read the pass n + 2, we can safely flush every
event with a timestamp below t.
============ PASS n =================
CPU 0 | CPU 1
|
cnt1 timestamps | cnt2 timestamps
1 | 2
2 | 3
- | 4 <--- max recorded
============ PASS n + 1 ==============
CPU 0 | CPU 1
|
cnt1 timestamps | cnt2 timestamps
3 | 5
4 | 6
5 | 7 <---- max recorded
Flush every event below timestamp 4
============ PASS n + 2 ==============
CPU 0 | CPU 1
|
cnt1 timestamps | cnt2 timestamps
6 | 8
7 | 9
- | 10
Flush every event below timestamp 7
etc...
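The passes above can be modelled with the following sketch. It is a
simplified model under stated assumptions, not the perf
implementation: the queue is a fixed-size array sorted on every flush,
the names are hypothetical, and the limit is applied strictly
("below"), as in the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_EVENTS 64

struct reorder {
	uint64_t queued[MAX_EVENTS];	/* timestamps read but not yet flushed */
	size_t   nr_queued;
	uint64_t out[MAX_EVENTS];	/* timestamps in the order they were flushed */
	size_t   nr_out;
	uint64_t max_timestamp;		/* highest timestamp read so far */
	uint64_t next_flush;		/* max timestamp seen up to the previous pass */
};

static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

static void queue_event(struct reorder *r, uint64_t ts)
{
	assert(r->nr_queued < MAX_EVENTS);
	r->queued[r->nr_queued++] = ts;
	if (ts > r->max_timestamp)
		r->max_timestamp = ts;
}

/* Call on each PERF_RECORD_FINISHED_ROUND; returns how many events were flushed. */
static size_t finished_round(struct reorder *r)
{
	size_t i, kept = 0, flushed = 0;

	qsort(r->queued, r->nr_queued, sizeof(r->queued[0]), cmp_u64);
	for (i = 0; i < r->nr_queued; i++) {
		if (r->queued[i] < r->next_flush) {
			assert(r->nr_out < MAX_EVENTS);
			r->out[r->nr_out++] = r->queued[i];
			flushed++;
		} else {
			/* Keep it; kept <= i, so this never overwrites unread data. */
			r->queued[kept++] = r->queued[i];
		}
	}
	r->nr_queued = kept;
	/* Events below this pass's max become safe to flush one pass later. */
	r->next_flush = r->max_timestamp;
	return flushed;
}
```

Feeding it the three passes from the diagram flushes nothing after
pass n, everything below 4 after pass n + 1, and everything below 7
after pass n + 2.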
It also works on perf.data versions that don't have
PERF_RECORD_FINISHED_ROUND pseudo events. The difference is that
the events will only be flushed at the end of the perf.data
processing. It will then consume more memory and scale worse with
large perf.data files.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
In order to provide a more robust and deterministic reordering
algorithm, we need to know when we reach a point where we have just
completed a pass over every counter buffer, reading everything they
had.
This patch introduces a new PERF_RECORD_FINISHED_ROUND pseudo event
that consists only of an event header and doesn't need to contain
anything else.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Steven reported:
|
| I'm getting:
|
| Pid: 3477, comm: perf Not tainted 2.6.34-rc6 #2727
| Call Trace:
| [<ffffffff811c7565>] debug_smp_processor_id+0xd5/0xf0
| [<ffffffff81019874>] p4_hw_config+0x2b/0x15c
| [<ffffffff8107acbc>] ? trace_hardirqs_on_caller+0x12b/0x14f
| [<ffffffff81019143>] hw_perf_event_init+0x468/0x7be
| [<ffffffff810782fd>] ? debug_mutex_init+0x31/0x3c
| [<ffffffff810c68b2>] T.850+0x273/0x42e
| [<ffffffff810c6cab>] sys_perf_event_open+0x23e/0x3f1
| [<ffffffff81009e6a>] ? sysret_check+0x2e/0x69
| [<ffffffff81009e32>] system_call_fastpath+0x16/0x1b
|
| When running perf record in latest tip/perf/core
|
Because P4 counters are shared between HT threads, we synthetically
divide the whole set of counters into two non-intersecting subsets.
While we're "borrowing" counters from these subsets we should not be
preempted (strictly speaking, in p4_hw_config we just pre-set a
reference to the subset, which saves some cycles in the schedule
routine if it happens on the same cpu). So use a get_cpu/put_cpu
pair.
Also p4_pmu_schedule_events should use smp_processor_id rather than
the raw_ version. This allows us to catch preemption issues (should
there ever be any).
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <20100508112716.963478928@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 6bde9b6ce0 ("perf: Add group scheduling transactional
APIs") added code to allow a
group to be scheduled in a single transaction. However, it
introduced a bug in handling events whose pmu does not implement
transactions -- at the end of scheduling in the events in the
group, in the non-transactional case the code now falls through
to the group_error label, and proceeds to unschedule all the
events in the group and return failure.
This fixes it by returning 0 (success) in the non-transactional
case.
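The control flow of the fix can be sketched with a mock pmu as below.
This is a simplified userspace model, not the kernel function: the
struct, the callbacks and the event accounting are hypothetical, and
only the early "return 0" for the non-transactional case mirrors the
change described above.

```c
#include <assert.h>
#include <stddef.h>

/* Mock pmu: the callbacks are NULL when transactions are unsupported. */
struct mock_pmu {
	void (*start_txn)(void);
	int  (*commit_txn)(void);	/* 0 when the whole group is schedulable */
	void (*cancel_txn)(void);
};

/* Trivial callbacks used to build mock pmus. */
static void txn_noop(void) { }
static int commit_ok(void) { return 0; }
static int commit_fail(void) { return -1; }

/* Returns 0 on success, -1 after rolling the group back. */
static int group_sched_in(const struct mock_pmu *pmu, int nr_events,
			  int *nr_active)
{
	int txn;

	assert(pmu && nr_active);
	txn = pmu->start_txn != NULL;
	if (txn)
		pmu->start_txn();

	*nr_active = nr_events;	/* pretend each event was scheduled in */

	if (!txn)
		return 0;	/* the fix: success, no fall-through to rollback */

	if (pmu->commit_txn() == 0)
		return 0;

	/* Error path: unschedule everything and report failure. */
	*nr_active = 0;
	pmu->cancel_txn();
	return -1;
}
```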
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: eranian@gmail.com
LKML-Reference: <20100508105800.GB10650@brick.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add group scheduling transactional APIs to struct pmu.
These APIs will be implemented in arch code, based on Peter's idea,
quoted below:
> the idea behind hw_perf_group_sched_in() is to not perform
> schedulability tests on each event in the group, but to add the group
> as a whole and then perform one test.
>
> Of course, when that test fails, you'll have to roll-back the whole
> group again.
>
> So start_txn (or a better name) would simply toggle a flag in the pmu
> implementation that will make pmu::enable() not perform the
> schedulablilty test.
>
> Then commit_txn() will perform the schedulability test (so note the
> method has to have a !void return value.
>
> This will allow us to use the regular
> kernel/perf_event.c::group_sched_in() and all the rollback code.
> Currently each hw_perf_group_sched_in() implementation duplicates all
> the rolllback code (with various bugs).
->start_txn:
Start a group events scheduling transaction: set a flag that makes
pmu::enable() skip the schedulability test, which will be performed
at commit time instead.
->commit_txn:
Commit the group events scheduling transaction: perform the
schedulability test on the group as a whole.
->cancel_txn:
Stop the group events scheduling transaction: clear the flag so that
pmu::enable() performs the schedulability test again.
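How the three callbacks interact can be sketched with a toy pmu whose
only constraint is a fixed number of counters. All names and the
"schedulability test" itself are assumptions made for illustration;
real pmu implementations have far more involved constraints.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy pmu: schedulable as long as enabled events fit in the counters. */
struct toy_pmu {
	int  nr_counters;
	int  nr_enabled;
	bool in_txn;		/* set between start_txn and cancel_txn */
};

static void toy_start_txn(struct toy_pmu *p)  { p->in_txn = true; }
static void toy_cancel_txn(struct toy_pmu *p) { p->in_txn = false; }

/* Schedulability test on everything currently enabled. */
static int toy_commit_txn(struct toy_pmu *p)
{
	return p->nr_enabled <= p->nr_counters ? 0 : -1;
}

static int toy_enable(struct toy_pmu *p)
{
	/* Outside a transaction, test each event as it is added. */
	if (!p->in_txn && p->nr_enabled + 1 > p->nr_counters)
		return -1;
	p->nr_enabled++;
	return 0;
}

static void toy_disable(struct toy_pmu *p) { p->nr_enabled--; }

/* Add n events as one group: one test at commit, full rollback on failure. */
static int toy_group_sched_in(struct toy_pmu *p, int n)
{
	int i, ret;

	assert(p && n >= 0);
	toy_start_txn(p);
	for (i = 0; i < n; i++)
		toy_enable(p);		/* cannot fail while the flag is set */
	ret = toy_commit_txn(p);
	toy_cancel_txn(p);		/* clear the flag either way */
	if (ret)
		while (i-- > 0)
			toy_disable(p);	/* roll the whole group back */
	return ret;
}
```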
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1272002160.5707.60.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rename perf_event_attr::precise to perf_event_attr::precise_ip and
widen it to 2 bits. This new field describes the required precision of
the PERF_SAMPLE_IP field:
0 - SAMPLE_IP can have arbitrary skid
1 - SAMPLE_IP must have constant skid
2 - SAMPLE_IP requested to have 0 skid
3 - SAMPLE_IP must have 0 skid
And modify the Intel PEBS code accordingly. The PEBS implementation
now supports up to precise_ip == 2, where we perform the IP fixup.
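As a small illustration of the four levels and of a pmu that only
supports up to level 2, consider the sketch below. The enum names, the
mock attr struct and the check function are hypothetical; only the
2-bit width and the meaning of the values come from the description
above.

```c
#include <assert.h>

/* Hypothetical names for the four precise_ip levels listed above. */
enum precise_level {
	PRECISE_ARBITRARY_SKID = 0,
	PRECISE_CONSTANT_SKID  = 1,
	PRECISE_ZERO_SKID_REQ  = 2,	/* 0 skid requested */
	PRECISE_ZERO_SKID_MUST = 3	/* 0 skid required */
};

/* Mock attr carrying only the 2-bit field. */
struct mock_attr {
	unsigned int precise_ip : 2;
};

#define MOCK_PEBS_MAX_PRECISE 2	/* highest level this mock pmu can honour */

/* Returns 0 if the requested precision can be provided, -1 otherwise. */
static int check_precise_ip(const struct mock_attr *attr,
			    unsigned int max_supported)
{
	assert(attr);
	return attr->precise_ip <= max_supported ? 0 : -1;
}
```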
Also s/PERF_RECORD_MISC_EXACT/&_IP/ to clarify its meaning, this bit
should be set for each PERF_SAMPLE_IP field known to match the actual
instruction triggering the event.
This new scheme allows for a PEBS mode that uses the buffer for more
than a single event.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It's broken; we really should get PERF_SAMPLE_REGS sorted.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There may exist constraints with a cmask set to zero. In this case
for_each_event_constraint() will not work properly, since it stops at
the first zero cmask. Now weight is used instead of the cmask for
loop exit detection. Weight is always nonzero for a valid constraint:
the default is the HWEIGHT of the counter mask, and in the other
cases a value of zero would not make sense either.
This is in preparation for IBS event constraints that won't have a
cmask.
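The difference between the two termination conditions can be seen in
the following sketch. The struct, the macros and the table are
simplified stand-ins, not the kernel definitions; the second table
entry plays the role of a constraint without a cmask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an event constraint. */
struct mock_constraint {
	uint64_t code;
	uint64_t cmask;
	int      weight;	/* number of usable counters, never 0 if valid */
};

/* Walk a table until the terminating entry; two possible end markers. */
#define for_each_by_weight(c, table) for ((c) = (table); (c)->weight; (c)++)
#define for_each_by_cmask(c, table)  for ((c) = (table); (c)->cmask; (c)++)

static const struct mock_constraint mock_table[] = {
	{ 0xc0, 0xff, 2 },
	{ 0x00, 0x00, 1 },	/* valid constraint with no cmask */
	{ 0x12, 0xff, 1 },
	{ 0, 0, 0 },		/* terminator */
};

static size_t count_by_weight(const struct mock_constraint *table)
{
	const struct mock_constraint *c;
	size_t n = 0;

	assert(table);
	for_each_by_weight(c, table)
		n++;
	return n;
}

static size_t count_by_cmask(const struct mock_constraint *table)
{
	const struct mock_constraint *c;
	size_t n = 0;

	assert(table);
	for_each_by_cmask(c, table)
		n++;		/* stops early at the zero-cmask entry */
	return n;
}
```

Terminating on weight visits all three valid entries, while
terminating on cmask stops after the first one.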
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1271190201-25705-7-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The perfctr setup calls are in the corresponding .hw_config()
functions now. This makes it possible to introduce config functions
for other pmu events that are not perfctr specific.
Also, all of a sudden the code looks much nicer.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1271190201-25705-4-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>