tracing: consolidate documents
Move kmemtrace.txt, tracepoints.txt, ftrace.txt and mmiotrace.txt to the
new trace/ directory.

I didn't find any references to those documents in either source files or
other documents, so no extra work needs to be done.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Pekka Paalanen <pq@iki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
LKML-Reference: <49DD6E2B.6090200@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1828
Documentation/trace/ftrace.txt
Normal file
File diff suppressed because it is too large
126
Documentation/trace/kmemtrace.txt
Normal file
@@ -0,0 +1,126 @@
			kmemtrace - Kernel Memory Tracer

			  by Eduard - Gabriel Munteanu
			     <eduard.munteanu@linux360.ro>

I. Introduction
===============

kmemtrace helps kernel developers figure out two things:
1) how different allocators (SLAB, SLUB etc.) perform
2) how kernel code allocates memory and how much

To do this, we trace every allocation and export information to userspace
through the relay interface. We export things such as the number of requested
bytes, the number of bytes actually allocated (i.e. including internal
fragmentation), whether this is a slab allocation or a plain kmalloc() and so
on.

The actual analysis is performed by a userspace tool (see section III for
details on where to get it from). It logs the data exported by the kernel,
processes it and (as of writing this) can provide the following information:
- the total amount of memory allocated and fragmentation per call-site
- the amount of memory allocated and fragmentation per allocation
- total memory allocated and fragmentation in the collected dataset
- number of cross-CPU allocations and frees (makes sense in NUMA environments)

Moreover, it can potentially find inconsistent and erroneous behavior in
kernel code, such as using slab free functions on kmalloc'ed memory or
allocating less memory than requested (but not truly failed allocations).

kmemtrace also makes provisions for tracing on one arch and analysing the
data on another.

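The per-call-site totals mentioned in the introduction come down to simple
bookkeeping over (call site, bytes requested, bytes allocated) triples. The
sketch below illustrates that arithmetic only. The AllocEvent record and its
field names are hypothetical: they do not reflect kmemtrace's binary ABI or
the way the userspace tool is actually implemented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AllocEvent:
    """Hypothetical, simplified allocation record (not the kmemtrace ABI)."""
    call_site: int    # address of the code that requested the memory
    bytes_req: int    # bytes the caller asked for
    bytes_alloc: int  # bytes the allocator actually handed out


def fragmentation_per_call_site(events):
    """Return {call_site: (requested, allocated, wasted)} totals."""
    totals = defaultdict(lambda: [0, 0])
    for ev in events:
        totals[ev.call_site][0] += ev.bytes_req
        totals[ev.call_site][1] += ev.bytes_alloc
    # "wasted" is the internal fragmentation: allocated minus requested.
    return {site: (req, alloc, alloc - req)
            for site, (req, alloc) in totals.items()}


# Made-up sample data, for illustration only.
events = [
    AllocEvent(call_site=0xC0100000, bytes_req=100, bytes_alloc=128),
    AllocEvent(call_site=0xC0100000, bytes_req=20, bytes_alloc=32),
    AllocEvent(call_site=0xC0200000, bytes_req=64, bytes_alloc=64),
]
report = fragmentation_per_call_site(events)
for site, (req, alloc, wasted) in sorted(report.items()):
    print(f"{site:#x}: requested={req} allocated={alloc} wasted={wasted}")
```

Summing the third element over all call sites would give the dataset-wide
fragmentation figure from the same list.
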
II. Design and goals
====================

kmemtrace was designed to handle rather large amounts of data. Thus, it uses
the relay interface to export whatever is logged to userspace, which then
stores it. Analysis and reporting are done asynchronously, that is, after the
data is collected and stored. By design, it allows one to log and analyse
on different machines and different arches.

As of writing this, the ABI is not considered stable, though it might not
change much. However, no guarantees are made about compatibility yet. When
deemed stable, the ABI should still allow easy extension while maintaining
backward compatibility. This is described further in Documentation/ABI.

Summary of design goals:
	- allow logging and analysis to be done across different machines
	- be fast and anticipate usage in high-load environments (*)
	- be reasonably extensible
	- make it possible for GNU/Linux distributions to have kmemtrace
	  included in their repositories

(*) - one of the reasons Pekka Enberg's original userspace data analysis
    tool's code was rewritten from Perl to C (although this is more than a
    simple conversion)


III. Quick usage guide
======================

1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable
CONFIG_KMEMTRACE).

2) Get the userspace tool and build it:
$ git clone git://repo.or.cz/kmemtrace-user.git		# current repository
$ cd kmemtrace-user/
$ ./autogen.sh
$ ./configure
$ make

3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the
'single' runlevel (so that relay buffers don't fill up easily), and run
kmemtrace:
# '$' does not mean user, but root here.
$ mount -t debugfs none /sys/kernel/debug
$ mount -t proc none /proc
$ cd path/to/kmemtrace-user/
$ ./kmemtraced
Wait a bit, then stop it with CTRL+C.
$ cat /sys/kernel/debug/kmemtrace/total_overruns	# Should be zero,
							# i.e. no overruns.
(Optional) Run kmemtrace_check separately on each cpu[0-9]*.out file to
check its correctness.
$ ./kmemtrace-report

Now you should have a nice and short summary of how the allocator performs.

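The overrun check in step 3 can also be scripted. The following is a minimal
sketch, assuming the total_overruns file holds a single decimal integer (the
guide only says it "should be zero"); the helper names are made up for this
example and are not part of kmemtrace-user.

```python
# Path taken from the guide above; it only exists on a kmemtrace-enabled
# kernel with debugfs mounted on /sys/kernel/debug.
OVERRUNS_PATH = "/sys/kernel/debug/kmemtrace/total_overruns"


def parse_overruns(text):
    """Parse the file contents, assumed to be one decimal integer."""
    return int(text.strip())


def overrun_message(count):
    """Turn the overrun count into a short verdict."""
    if count == 0:
        return "no overruns: the trace is complete"
    return (f"{count} overruns: accuracy is reduced; consider a higher "
            f"kmemtrace.subbufs=N kernel parameter")


def read_overruns(path=OVERRUNS_PATH):
    """Return the overrun count, or None if the file cannot be read."""
    try:
        with open(path) as f:
            return parse_overruns(f.read())
    except OSError:
        return None


count = read_overruns()
if count is None:
    print("counter not readable (kmemtrace kernel? debugfs mounted? root?)")
else:
    print(overrun_message(count))
```
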
IV. FAQ and known issues
========================

Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix
this? Should I worry?
A: If it's non-zero, this affects kmemtrace's accuracy, depending on how
large the number is. You can fix it by supplying a higher
'kmemtrace.subbufs=N' kernel parameter.
---

Q: kmemtrace_check reports errors, how do I fix this? Should I worry?
A: This is a bug and should be reported. It can occur for a variety of
reasons:
	- possible bugs in relay code
	- possible misuse of relay by kmemtrace
	- timestamps being collected out of order
Or you may fix it yourself and send us a patch.
---

Q: kmemtrace_report shows many errors, how do I fix this? Should I worry?
A: This is a known issue and I'm working on it. These might be true errors
in kernel code, which may have inconsistent behavior (e.g. allocating memory
with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed
out this behavior may work with SLAB, but may fail with other allocators.

It may also be due to lack of tracing in some unusual allocator functions.

We don't want bug reports regarding this issue yet.
---

V. See also
===========

Documentation/kernel-parameters.txt
Documentation/ABI/testing/debugfs-kmemtrace

163
Documentation/trace/mmiotrace.txt
Normal file
@@ -0,0 +1,163 @@
		In-kernel memory-mapped I/O tracing


Home page and links to optional user space tools:

	http://nouveau.freedesktop.org/wiki/MmioTrace

MMIO tracing was originally developed by Intel around 2003 for their Fault
Injection Test Harness. In Dec 2006 - Jan 2007, using the code from Intel,
Jeff Muizelaar created a tool for tracing MMIO accesses with the Nouveau
project in mind. Since then many people have contributed.

Mmiotrace was built for reverse engineering any memory-mapped IO device with
the Nouveau project as the first real user. Only x86 and x86_64 architectures
are supported.

Out-of-tree mmiotrace was originally modified for mainline inclusion and
the ftrace framework by Pekka Paalanen <pq@iki.fi>.


Preparation
-----------

The mmiotrace feature is compiled in by the CONFIG_MMIOTRACE option. Tracing
is disabled by default, so it is safe to have this set to yes. SMP systems are
supported, but tracing is unreliable and may miss events if more than one CPU
is on-line, therefore mmiotrace takes all but one CPU off-line during run-time
activation. You can re-enable CPUs by hand, but be warned: there is no way to
automatically detect if you are losing events due to CPUs racing.


Usage Quick Reference
---------------------

$ mount -t debugfs debugfs /debug
$ echo mmiotrace > /debug/tracing/current_tracer
$ cat /debug/tracing/trace_pipe > mydump.txt &
Start X or whatever.
$ echo "X is up" > /debug/tracing/trace_marker
$ echo nop > /debug/tracing/current_tracer
Check for lost events.


Usage
-----

Make sure debugfs is mounted on /debug. If not, mount it (requires root
privileges):
$ mount -t debugfs debugfs /debug

Check that the driver you are about to trace is not loaded.

Activate mmiotrace (requires root privileges):
$ echo mmiotrace > /debug/tracing/current_tracer

Start storing the trace:
$ cat /debug/tracing/trace_pipe > mydump.txt &
The 'cat' process should stay running (sleeping) in the background.

Load the driver you want to trace and use it. Mmiotrace will only catch MMIO
accesses to areas that are ioremapped while mmiotrace is active.

During tracing you can place comments (markers) into the trace by
$ echo "X is up" > /debug/tracing/trace_marker
This makes it easier to see which part of the (huge) trace corresponds to
which action. It is recommended to place descriptive markers about what you
do.

Shut down mmiotrace (requires root privileges):
$ echo nop > /debug/tracing/current_tracer
The 'cat' process exits. If it does not, kill it by issuing the 'fg' command
and pressing ctrl+c.

Check that mmiotrace did not lose events due to a buffer filling up. Either
$ grep -i lost mydump.txt
which tells you exactly how many events were lost, or use
$ dmesg
to view your kernel log and look for the "mmiotrace has lost events" warning.
If events were lost, the trace is incomplete. You should enlarge the buffers
and try again. Buffers are enlarged by first seeing how large the current
buffers are:
$ cat /debug/tracing/buffer_size_kb
gives you a number. Approximately double this number and write it back, for
instance:
$ echo 128000 > /debug/tracing/buffer_size_kb
Then start again from the top.

If you are doing a trace for a driver project, e.g. Nouveau, you should also
do the following before sending your results:
$ lspci -vvv > lspci.txt
$ dmesg > dmesg.txt
$ tar zcf pciid-nick-mmiotrace.tar.gz mydump.txt lspci.txt dmesg.txt
and then send the .tar.gz file. The trace compresses considerably. Replace
"pciid" and "nick" with the PCI ID or model name of your piece of hardware
under investigation and your nickname.


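The lost-event check and the buffer doubling described in the Usage section
can be scripted as well. The sketch below only mirrors 'grep -i lost' and the
"approximately double" rule; the function names are made up, and the sample
dump lines are invented for illustration rather than copied from a real
trace.

```python
def lost_event_lines(dump_lines):
    """Return the lines that mention 'lost', like `grep -i lost`."""
    return [line for line in dump_lines if "lost" in line.lower()]


def next_buffer_size_kb(current_kb):
    """Suggest a new buffer_size_kb value: double the current one."""
    return 2 * current_kb


# Invented sample standing in for the contents of mydump.txt.
sample_dump = [
    "R 4 1.000000 1 0xfb000000 0x0 0x0 0",
    "MARK 0.000000 Lost 12 events.",
    "W 4 1.000100 1 0xfb000004 0x1 0x0 0",
]

lost = lost_event_lines(sample_dump)
if lost:
    print("trace is incomplete:", lost)
    print("try buffer_size_kb =", next_buffer_size_kb(64000))
```

To run this on a real trace, pass the lines of mydump.txt instead of
sample_dump, and the number printed by 'cat /debug/tracing/buffer_size_kb'
instead of 64000.
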
How Mmiotrace Works
-------------------

Access to hardware IO-memory is gained by mapping addresses from the PCI bus
by calling one of the ioremap_*() functions. Mmiotrace is hooked into the
__ioremap() function and gets called whenever a mapping is created. Mapping is
an event that is recorded into the trace log. Note that ISA range mappings
are not caught, since the mapping always exists and is returned directly.

MMIO accesses are recorded via page faults. Just before __ioremap() returns,
the mapped pages are marked as not present. Any access to the pages causes a
fault. The page fault handler calls mmiotrace to handle the fault. Mmiotrace
marks the page present, sets the TF flag to achieve single stepping and exits
the fault handler. The instruction that faulted is executed and the debug trap
is entered. Here mmiotrace again marks the page as not present. The
instruction is decoded to get the type of operation (read/write), data width
and the value read or written. These are stored to the trace log.

Setting the page present in the page fault handler has a race condition on SMP
machines. During the single stepping other CPUs may run freely on that page
and events can be missed without notice. Re-enabling other CPUs during
tracing is discouraged.


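The fault / single-step cycle described above can be pictured with a small
user-space toy model. This is only an analogy written for this document: the
class and its attributes are invented, and nothing here corresponds to actual
kernel data structures or to how mmiotrace decodes instructions.

```python
class TracedPage:
    """Toy model of one ioremapped page under mmiotrace (not kernel code)."""

    def __init__(self):
        self.present = False      # marked not present before __ioremap() returns
        self.single_step = False  # stands in for the CPU's TF flag
        self.memory = {}          # offset -> value, fake device registers
        self.log = []             # stands in for the trace log

    def access(self, op, offset, value=None):
        """Perform one MMIO read ('R') or write ('W') and record it."""
        # 1. The page is not present, so the access faults.
        assert not self.present
        # 2. Fault handler: mark the page present, arm single stepping.
        self.present = True
        self.single_step = True
        # 3. The faulting instruction runs again and now succeeds.
        if op == "W":
            self.memory[offset] = value
        else:
            value = self.memory.get(offset, 0)
        # 4. Debug trap after that one instruction: hide the page again
        #    and store the decoded access in the log.
        self.single_step = False
        self.present = False
        self.log.append((op, offset, value))
        return value


page = TracedPage()
page.access("W", 0x10, 0xDEADBEEF)
value = page.access("R", 0x10)
print(page.log)
```

The SMP race mentioned above corresponds to the window between steps 2 and 4:
while the page is present, another CPU could touch it without faulting, so
its access would never reach the log.
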
Trace Log Format
----------------

The raw log is text and easily filtered with e.g. grep and awk. One record is
one line in the log. A record starts with a keyword, followed by
keyword-dependent arguments. Arguments are separated by a space, or continue
until the end of line. The format for version 20070824 is as follows:

Explanation      Keyword  Space separated arguments
---------------------------------------------------------------------------

read event       R        width, timestamp, map id, physical, value, PC, PID
write event      W        width, timestamp, map id, physical, value, PC, PID
ioremap event    MAP      timestamp, map id, physical, virtual, length, PC, PID
iounmap event    UNMAP    timestamp, map id, PC, PID
marker           MARK     timestamp, text
version          VERSION  the string "20070824"
info for reader  LSPCI    one line from lspci -v
PCI address map  PCIDEV   space separated /proc/bus/pci/devices data
unk. opcode      UNKNOWN  timestamp, map id, physical, data, PC, PID

Timestamp is in seconds with decimals. Physical is a PCI bus address, virtual
is a kernel virtual address. Width is the data width in bytes and value is the
data value. Map id is an arbitrary id number identifying the mapping that was
used in an operation. PC is the program counter and PID is process id. PC is
zero if it is not recorded. PID is always zero as tracing MMIO accesses
originating in user space memory is not yet supported.

For instance, the following awk filter will pass all 32-bit writes that target
physical addresses in the range [0xfb73ce40, 0xfb800000):

$ awk '/W 4 / { adr=strtonum($5); if (adr >= 0xfb73ce40 &&
					adr < 0xfb800000) print; }'


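For filters that outgrow a one-line awk script, the record layout in the
table above can be parsed into named fields. The following is a minimal
sketch under two assumptions that the text does not spell out: addresses are
printed in hexadecimal, and fields never contain embedded spaces except for
the free-text records. The function names and the sample lines are invented
for this example.

```python
# Field names per keyword, following the table above.
RW_FIELDS = ["width", "timestamp", "map_id", "physical", "value", "pc", "pid"]
FIELDS = {
    "R": RW_FIELDS,
    "W": RW_FIELDS,
    "MAP": ["timestamp", "map_id", "physical", "virtual", "length",
            "pc", "pid"],
    "UNMAP": ["timestamp", "map_id", "pc", "pid"],
    "UNKNOWN": ["timestamp", "map_id", "physical", "data", "pc", "pid"],
}


def parse_record(line):
    """Split one log line into a dict of strings keyed by field name."""
    keyword, _, rest = line.strip().partition(" ")
    if keyword == "MARK":
        timestamp, _, text = rest.partition(" ")
        return {"keyword": keyword, "timestamp": timestamp, "text": text}
    names = FIELDS.get(keyword)
    if names is None:
        # VERSION, LSPCI, PCIDEV: keep the remainder of the line as is.
        return {"keyword": keyword, "text": rest}
    record = dict(zip(names, rest.split()))
    record["keyword"] = keyword
    return record


def is_32bit_write_in_range(record, lo=0xFB73CE40, hi=0xFB800000):
    """Same intent as the awk filter: 32-bit writes, lo <= physical < hi."""
    return (record["keyword"] == "W"
            and record["width"] == "4"
            and lo <= int(record["physical"], 16) < hi)


# Invented sample lines in the format described above.
sample = [
    "W 4 12.345678 1 0xfb73ce44 0x1 0x0 0",
    "W 4 12.345700 1 0xfb900000 0x1 0x0 0",
    "R 4 12.345800 1 0xfb73ce44 0x1 0x0 0",
    "W 2 12.345900 1 0xfb73ce48 0x1 0x0 0",
    "MARK 12.400000 X is up",
]
hits = [line for line in sample if is_32bit_write_in_range(parse_record(line))]
print(hits)
```

Unlike the awk pattern, which matches "W 4 " anywhere in the line, this
version checks the keyword and width fields explicitly.
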
Tools for Developers
--------------------

The user space tools include utilities for:
- replacing numeric addresses and values with hardware register names
- replaying MMIO logs, i.e., re-executing the recorded writes

116
Documentation/trace/tracepoints.txt
Normal file
@@ -0,0 +1,116 @@
		     Using the Linux Kernel Tracepoints

			    Mathieu Desnoyers


This document introduces Linux Kernel Tracepoints and their use. It
provides examples of how to insert tracepoints in the kernel and
connect probe functions to them and provides some examples of probe
functions.


* Purpose of tracepoints

A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty
(checking a condition for a branch) and space penalty (adding a few
bytes for the function call at the end of the instrumented function
and adding a data structure in a separate section). When a tracepoint
is "on", the function you provide is called each time the tracepoint
is executed, in the execution context of the caller. When the function
provided ends its execution, it returns to the caller (continuing from
the tracepoint site).

You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters,
whose prototypes are described in a tracepoint declaration placed in a
header file.

They can be used for tracing and performance accounting.


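The "on" / "off" behaviour described above can be illustrated with a small
user-space toy model. It is only an analogy: the Tracepoint class, its method
names and the trace_demo_event name are invented for this sketch and are not
the kernel's tracepoint API.

```python
class Tracepoint:
    """User-space toy model of a tracepoint (not the kernel implementation)."""

    def __init__(self, name):
        self.name = name
        self._probes = []

    @property
    def enabled(self):
        # "on" means at least one probe is connected.
        return bool(self._probes)

    def register_probe(self, probe):
        self._probes.append(probe)

    def unregister_probe(self, probe):
        self._probes.remove(probe)

    def __call__(self, *args):
        # In the "off" case the only cost is this check.
        if not self._probes:
            return
        for probe in list(self._probes):
            probe(*args)  # runs in the caller's context, then returns here


trace_demo_event = Tracepoint("demo_event")
seen = []


def my_probe(firstarg, task):
    """Example probe: just remember what it was called with."""
    seen.append((firstarg, task))


trace_demo_event(1, "init")            # off: nothing happens
trace_demo_event.register_probe(my_probe)
trace_demo_event(2, "kthreadd")        # on: the probe is called
trace_demo_event.unregister_probe(my_probe)
trace_demo_event(3, "bash")            # off again
print(seen)
```

Only the call made while the probe was registered is recorded, which is the
point of the "off has no effect" guarantee.
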
* Usage

Two elements are required for tracepoints :

- A tracepoint definition, placed in a header file.
- The tracepoint statement, in C code.

In order to use tracepoints, you should include linux/tracepoint.h.

In include/trace/subsys.h :

#include <linux/tracepoint.h>

DECLARE_TRACE(subsys_eventname,
	TP_PROTO(int firstarg, struct task_struct *p),
	TP_ARGS(firstarg, p));

In subsys/file.c (where the tracing statement must be added) :

#include <trace/subsys.h>

DEFINE_TRACE(subsys_eventname);

void somefct(void)
{
	...
	trace_subsys_eventname(arg, task);
	...
}

Where :
- subsys_eventname is an identifier unique to your event
    - subsys is the name of your subsystem.
    - eventname is the name of the event to trace.

- TP_PROTO(int firstarg, struct task_struct *p) is the prototype of the
  function called by this tracepoint.

- TP_ARGS(firstarg, p) are the parameter names, same as found in the
  prototype.

Connecting a function (probe) to a tracepoint is done by providing a
probe (function to call) for the specific tracepoint through
register_trace_subsys_eventname(). Removing a probe is done through
unregister_trace_subsys_eventname().

tracepoint_synchronize_unregister() must be called before the end of
the module exit function to make sure there is no caller left using
the probe. This, and the fact that preemption is disabled around the
probe call, make sure that probe removal and module unload are safe.
See the "Probe / tracepoint example" section below for a sample probe
module.

The tracepoint mechanism supports inserting multiple instances of the
same tracepoint, but a single definition must be made of a given
tracepoint name across the whole kernel to make sure no type conflict
will occur. Name mangling of the tracepoints is done using the prototypes
to make sure typing is correct. Verification of probe type correctness
is done at the registration site by the compiler. Tracepoints can be
put in inline functions, inlined static functions, and unrolled loops
as well as regular functions.

The naming scheme "subsys_event" is suggested here as a convention
intended to limit collisions. Tracepoint names are global to the
kernel: they are considered as being the same whether they are in the
core kernel image or in modules.

If the tracepoint has to be used in kernel modules, an
EXPORT_TRACEPOINT_SYMBOL_GPL() or EXPORT_TRACEPOINT_SYMBOL() can be
used to export the defined tracepoints.

* Probe / tracepoint example

See the examples provided in samples/tracepoints.

Compile them with your kernel. They are built during 'make' (not
'make modules') when CONFIG_SAMPLE_TRACEPOINTS=m.

Run, as root :
modprobe tracepoint-sample (insmod order is not important)
modprobe tracepoint-probe-sample
cat /proc/tracepoint-sample (returns an expected error)
rmmod tracepoint-sample tracepoint-probe-sample
dmesg