Introduce kprobes jump optimization arch-independent parts.
Kprobes uses breakpoint instruction for interrupting execution
flow, on some architectures, it can be replaced by a jump
instruction and interruption emulation code. This gains kprobs'
performance drastically.
To enable this feature, set CONFIG_OPTPROBES=y (default y if the
arch supports OPTPROBE).
Changes in v9:
- Fix a bug to optimize probe when enabling.
- Check nearby probes can be optimize/unoptimize when disarming/arming
kprobes, instead of registering/unregistering. This will help
kprobe-tracer because most of probes on it are usually disabled.
Changes in v6:
- Cleanup coding style for readability.
- Add comments around get/put_online_cpus().
Changes in v5:
- Use get_online_cpus()/put_online_cpus() for avoiding text_mutex
deadlock.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Anders Kaseorg <andersk@ksplice.com>
Cc: Tim Abbott <tabbott@ksplice.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
LKML-Reference: <20100225133407.6725.81992.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Be more clear about DSO long names and tell from which file
kernel symbols were obtained, all in --verbose mode:
[root@mica ~]# perf report -v > /dev/null
Looking at the vmlinux_path (5 entries long)
Using /lib/modules/2.6.33-rc8-tip-00777-g0918527-dirty/build/vmlinux for symbols
[root@mica ~]# mv /lib/modules/2.6.33-rc8-tip-00777-g0918527-dirty/build/vmlinux /tmp/dd
[root@mica ~]# perf report -v > /dev/null
Looking at the vmlinux_path (5 entries long)
Using /proc/kallsyms for symbols
[root@mica ~]#
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1266866139-6361-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
syscall_name() helper, which resolves a syscall arch number to
its name, is not yet available as we first need to implement
event injection for it to work.
Remove it from the documentation or tag its references as
unavailable yet. Once it's implemented, we can just revert
the current patch.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Keiichi KII <k-keiichi@bx.jp.nec.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
The check-perf-trace script only checks Perl functionality, and
doesn't really need to be listed as as user script anyway.
This only removes the '-report' shell script, so although it doesn't
appear in the listing, the '-record' shell script and the check perf
trace perl script itself is still available and can still be run
manually as such:
$ libexec/perf-core/scripts/perl/bin/check-perf-trace-record
$ perf trace -s libexec/perf-core/scripts/perl/check-perf-trace.pl
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Keiichi KII <k-keiichi@bx.jp.nec.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1264580883-15324-6-git-send-email-tzanussi@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
As the parent comm then is worthless, confusing users about the
thread where the sample really happened, leading to think that
the sample happened in the parent, not where it really happened,
in the children of a thread for which a PERF_RECORD_COMM event
was not received.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1266627727-19715-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When 'perf record -g' a existing process, even with debuginfo
packages, still cannnot get symbol from 'perf report'.
try:
perf record -g -p `pidof xxx` -f
perf report
68.26% :1181 b74870f2 [.] 0x000000b74870f2
|
|--32.09%-- 0xb73b5b44
| 0xb7487102
| 0xb748a4e2
| 0xb748633d
| 0xb73b41cd
| 0xb73b4467
| 0xb747d531
The reason is: for existing process, in __cmd_record(),
the pid is 0 rather than the existing process id.
Signed-off-by: Austin Zhang <austin_zhang@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4710.10.255.24.35.1265389362.squirrel@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fixes these warnings:
arch/x86/kernel/alternative.c: In function 'alternatives_text_reserved':
arch/x86/kernel/alternative.c:402: warning: comparison of distinct pointer types lacks a cast
arch/x86/kernel/alternative.c:402: warning: comparison of distinct pointer types lacks a cast
arch/x86/kernel/alternative.c:405: warning: comparison of distinct pointer types lacks a cast
arch/x86/kernel/alternative.c:405: warning: comparison of distinct pointer types lacks a cast
Caused by:
2cfa197: ftrace/alternatives: Introducing *_text_reserved functions
Changes in v2:
- Use local variables to compare, instead of type casts.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
LKML-Reference: <20100205171647.15750.37221.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Because we may have aliases, like __GI___strcoll_l in
/lib64/libc-2.10.2.so that appears in objdump as:
$ objdump --start-address=0x0000003715a86420 \
--stop-address=0x0000003715a872dc -dS /lib64/libc-2.10.2.so
0000003715a86420 <__strcoll_l>:
3715a86420: 55 push %rbp
3715a86421: 48 89 e5 mov %rsp,%rbp
3715a86424: 41 57 push %r15
[root@doppio linux-2.6-tip]#
So look for the address exactly at the start of the line instead
so that annotation can work for in these cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1265550376-12665-2-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
First, for programs and prelinked libraries, annotate code was
fooled by objdump output IPs (src->eip in the code) being
wrongly converted to absolute IPs. In such case there were no
conversion needed, but in
src->eip = strtoull(src->line, NULL, 16);
src->eip = map->unmap_ip(map, src->eip); // = eip + map->start - map->pgoff
we were reading absolute address from objdump (e.g. 8048604) and
then almost doubling it, because eip & map->start are
approximately close for small programs.
Needless to say, that later, in record_precise_ip() there was no
matching with real runtime IPs.
And second, like with `perf annotate` the problem with
non-prelinked *.so was that we were doing rip -> objdump address
conversion wrong.
Also, because unlike `perf annotate`, `perf top` code does
annotation based on absolute IPs for performance reasons(*), new
helper for mapping objdump addresse to IP is introduced.
(*) we get samples info in absolute IPs, and since we do lots of
hit-testing on absolute IPs at runtime in record_precise_ip(), it's
better to convert objdump addresses to IPs once and do no conversion
at runtime.
I also had to fix how objdump output is parsed (with hardcoded
8/16 characters format, which was inappropriate for ET_DYN dsos
with small addresses like '4ac')
Also note, that not all objdump output lines has associtated
IPs, e.g. look at source lines here:
000004ac <my_strlen>:
extern "C"
int my_strlen(const char *s)
4ac: 55 push %ebp
4ad: 89 e5 mov %esp,%ebp
4af: 83 ec 10 sub $0x10,%esp
{
int len = 0;
4b2: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%ebp)
4b9: eb 08 jmp 4c3 <my_strlen+0x17>
while (*s) {
++len;
4bb: 83 45 fc 01 addl $0x1,-0x4(%ebp)
++s;
4bf: 83 45 08 01 addl $0x1,0x8(%ebp)
So we mark them with eip=0, and ignore such lines in annotate
lookup code.
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
[ Note: one hunk of this patch was applied by Mike in 57d8188 ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1265550376-12665-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We cannot assume that because hwc->idx == assign[i], we can avoid
reprogramming the counter in hw_perf_enable().
The event may have been scheduled out and another event may have been
programmed into this counter. Thus, we need a more robust way of
verifying if the counter still contains config/data related to an event.
This patch adds a generation number to each counter on each cpu. Using
this mechanism we can verify reliabilty whether the content of a counter
corresponds to an event.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4b66dc67.0b38560a.1635.ffffae18@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pretty much all of the calls do perf_disable/perf_enable cycles, pull
that out to cut back on hardware programming.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
By relying on logic in dso__load_kernel_sym(), we can
automatically load vmlinux.
The only thing which needs to be adjusted, is how --sym-annotate
option is handled - now we can't rely on vmlinux been loaded
until full successful pass of dso__load_vmlinux(), but that's
not the case if we'll do sym_filter_entry setup in
symbol_filter().
So move this step right after event__process_sample() where we
know the whole dso__load_kernel_sym() pass is done.
By the way, though conceptually similar `perf top` still can't
annotate userspace - see next patches with fixes.
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1265223128-11786-9-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The problem was we were incorrectly calculating objdump
addresses for sym->start and sym->end, look:
For simple ET_DYN type DSO (*.so) with one function, objdump -dS
output is something like this:
000004ac <my_strlen>:
int my_strlen(const char *s)
4ac: 55 push %ebp
4ad: 89 e5 mov %esp,%ebp
4af: 83 ec 10 sub $0x10,%esp
{
i.e. we have relative-to-dso-mapping IPs (=RIP) there.
For ET_EXEC type and probably for prelinked libs as well (sorry
can't test - I don't use prelink) objdump outputs absolute IPs,
e.g.
08048604 <zz_strlen>:
extern "C"
int zz_strlen(const char *s)
8048604: 55 push %ebp
8048605: 89 e5 mov %esp,%ebp
8048607: 83 ec 10 sub $0x10,%esp
{
So, if sym->start is always relative to dso mapping(*), we'll
have to unmap it for ET_EXEC like cases, and leave as is for
ET_DYN cases.
(*) and it is - we've explicitely made it relative. Look for
adjust_symbols handling in dso__load_sym()
Previously we were always unmapping sym->start and for ET_DYN
dsos resulting addresses were wrong, and so objdump output was
empty.
The end result was that perf annotate output for symbols from
non-prelinked *.so had always 0.00% percents only, which is
wrong.
To fix it, let's introduce a helper for converting rip to
objdump address, and also let's document what map_ip() and
unmap_ip() do -- I had to study sources for several hours to
understand it.
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1265223128-11786-8-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We want to stream events as fast as possible to perf.data, and
also in the future we want to have splice working, when no
interception will be possible.
Using build_id__mark_dso_hit_ops to create the list of DSOs that
back MMAPs we also optimize disk usage in the build-id cache by
only caching DSOs that had hits.
Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1265223128-11786-6-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We can check using strcmp, most DSOs don't start with '[' so the
test is cheap enough and we had to test it there anyway since
when reading perf.data files we weren't calling the routine that
created this global variable and thus weren't setting it as
"loaded", which was causing a bogus:
Failed to open [vdso], continuing without symbols
Message as the first line of 'perf report'.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1265223128-11786-3-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
While debugging a problem reported by Pekka Enberg by printing
the IP and all the maps for a thread when we don't find a map
for an IP I noticed that dso__load_sym needs to fixup these
extra maps it creates to hold symbols in different ELF sections
than the main kernel one.
Now we're back showing things like:
[root@doppio linux-2.6-tip]# perf report | grep vsyscall
0.02% mutt [kernel.kallsyms].vsyscall_fn [.] vread_hpet
0.01% named [kernel.kallsyms].vsyscall_fn [.] vread_hpet
0.01% NetworkManager [kernel.kallsyms].vsyscall_fn [.] vread_hpet
0.01% gconfd-2 [kernel.kallsyms].vsyscall_0 [.] vgettimeofday
0.01% hald-addon-rfki [kernel.kallsyms].vsyscall_fn [.] vread_hpet
0.00% dbus-daemon [kernel.kallsyms].vsyscall_fn [.] vread_hpet
[root@doppio linux-2.6-tip]#
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1265223128-11786-2-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I noticed while writing the first test in 'perf regtest' that to
just test the symbol handling routines one needs to create a
perf session, that is a layer centered on a perf.data file,
events, etc, so I untied these layers.
This reduces the complexity for the users as the number of
parameters to most of the symbols and session APIs now was
reduced while not adding more state to all the map instances by
only having data that is needed to split the kernel (kallsyms
and ELF symtab sections) maps and do vmlinux relocation on the
main kernel map.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1265223128-11786-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Open perf data file with O_LARGEFILE flag since its size is
easily larger that 2G.
For example:
# rm -rf perf.data
# ./perf kmem record sleep 300
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 3142.147 MB perf.data
(~137282513 samples) ]
# ll -h perf.data
-rw------- 1 root root 3.1G .....
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <4B68F32A.9040203@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>