tracing: consolidate documents

Move kmemtrace.txt, tracepoints.txt, ftrace.txt and mmiotrace.txt to
the new trace/ directory.

I didnt find any references to those documents in both source
files and documents, so no extra work needs to be done.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Pekka Paalanen <pq@iki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
LKML-Reference: <49DD6E2B.6090200@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This commit is contained in:
Li Zefan
2009-04-09 11:40:27 +08:00
committed by Ingo Molnar
parent 9eb85125ce
commit 66bb74888e
4 changed files with 0 additions and 0 deletions

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,126 @@
kmemtrace - Kernel Memory Tracer
by Eduard - Gabriel Munteanu
<eduard.munteanu@linux360.ro>
I. Introduction
===============
kmemtrace helps kernel developers figure out two things:
1) how different allocators (SLAB, SLUB etc.) perform
2) how kernel code allocates memory and how much
To do this, we trace every allocation and export information to the userspace
through the relay interface. We export things such as the number of requested
bytes, the number of bytes actually allocated (i.e. including internal
fragmentation), whether this is a slab allocation or a plain kmalloc() and so
on.
The actual analysis is performed by a userspace tool (see section III for
details on where to get it from). It logs the data exported by the kernel,
processes it and (as of writing this) can provide the following information:
- the total amount of memory allocated and fragmentation per call-site
- the amount of memory allocated and fragmentation per allocation
- total memory allocated and fragmentation in the collected dataset
- number of cross-CPU allocation and frees (makes sense in NUMA environments)
Moreover, it can potentially find inconsistent and erroneous behavior in
kernel code, such as using slab free functions on kmalloc'ed memory or
allocating less memory than requested (but not truly failed allocations).
kmemtrace also makes provisions for tracing on some arch and analysing the
data on another.
II. Design and goals
====================
kmemtrace was designed to handle rather large amounts of data. Thus, it uses
the relay interface to export whatever is logged to userspace, which then
stores it. Analysis and reporting is done asynchronously, that is, after the
data is collected and stored. By design, it allows one to log and analyse
on different machines and different arches.
As of writing this, the ABI is not considered stable, though it might not
change much. However, no guarantees are made about compatibility yet. When
deemed stable, the ABI should still allow easy extension while maintaining
backward compatibility. This is described further in Documentation/ABI.
Summary of design goals:
- allow logging and analysis to be done across different machines
- be fast and anticipate usage in high-load environments (*)
- be reasonably extensible
- make it possible for GNU/Linux distributions to have kmemtrace
included in their repositories
(*) - one of the reasons Pekka Enberg's original userspace data analysis
tool's code was rewritten from Perl to C (although this is more than a
simple conversion)
III. Quick usage guide
======================
1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable
CONFIG_KMEMTRACE).
2) Get the userspace tool and build it:
$ git-clone git://repo.or.cz/kmemtrace-user.git # current repository
$ cd kmemtrace-user/
$ ./autogen.sh
$ ./configure
$ make
3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the
'single' runlevel (so that relay buffers don't fill up easily), and run
kmemtrace:
# '$' does not mean user, but root here.
$ mount -t debugfs none /sys/kernel/debug
$ mount -t proc none /proc
$ cd path/to/kmemtrace-user/
$ ./kmemtraced
Wait a bit, then stop it with CTRL+C.
$ cat /sys/kernel/debug/kmemtrace/total_overruns # Check if we didn't
# overrun, should
# be zero.
$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to
check its correctness]
$ ./kmemtrace-report
Now you should have a nice and short summary of how the allocator performs.
IV. FAQ and known issues
========================
Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix
this? Should I worry?
A: If it's non-zero, this affects kmemtrace's accuracy, depending on how
large the number is. You can fix it by supplying a higher
'kmemtrace.subbufs=N' kernel parameter.
---
Q: kmemtrace_check reports errors, how do I fix this? Should I worry?
A: This is a bug and should be reported. It can occur for a variety of
reasons:
- possible bugs in relay code
- possible misuse of relay by kmemtrace
- timestamps being collected unorderly
Or you may fix it yourself and send us a patch.
---
Q: kmemtrace_report shows many errors, how do I fix this? Should I worry?
A: This is a known issue and I'm working on it. These might be true errors
in kernel code, which may have inconsistent behavior (e.g. allocating memory
with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed
out this behavior may work with SLAB, but may fail with other allocators.
It may also be due to lack of tracing in some unusual allocator functions.
We don't want bug reports regarding this issue yet.
---
V. See also
===========
Documentation/kernel-parameters.txt
Documentation/ABI/testing/debugfs-kmemtrace

View File

@@ -0,0 +1,163 @@
In-kernel memory-mapped I/O tracing
Home page and links to optional user space tools:
http://nouveau.freedesktop.org/wiki/MmioTrace
MMIO tracing was originally developed by Intel around 2003 for their Fault
Injection Test Harness. In Dec 2006 - Jan 2007, using the code from Intel,
Jeff Muizelaar created a tool for tracing MMIO accesses with the Nouveau
project in mind. Since then many people have contributed.
Mmiotrace was built for reverse engineering any memory-mapped IO device with
the Nouveau project as the first real user. Only x86 and x86_64 architectures
are supported.
Out-of-tree mmiotrace was originally modified for mainline inclusion and
ftrace framework by Pekka Paalanen <pq@iki.fi>.
Preparation
-----------
Mmiotrace feature is compiled in by the CONFIG_MMIOTRACE option. Tracing is
disabled by default, so it is safe to have this set to yes. SMP systems are
supported, but tracing is unreliable and may miss events if more than one CPU
is on-line, therefore mmiotrace takes all but one CPU off-line during run-time
activation. You can re-enable CPUs by hand, but you have been warned, there
is no way to automatically detect if you are losing events due to CPUs racing.
Usage Quick Reference
---------------------
$ mount -t debugfs debugfs /debug
$ echo mmiotrace > /debug/tracing/current_tracer
$ cat /debug/tracing/trace_pipe > mydump.txt &
Start X or whatever.
$ echo "X is up" > /debug/tracing/trace_marker
$ echo nop > /debug/tracing/current_tracer
Check for lost events.
Usage
-----
Make sure debugfs is mounted to /debug. If not, (requires root privileges)
$ mount -t debugfs debugfs /debug
Check that the driver you are about to trace is not loaded.
Activate mmiotrace (requires root privileges):
$ echo mmiotrace > /debug/tracing/current_tracer
Start storing the trace:
$ cat /debug/tracing/trace_pipe > mydump.txt &
The 'cat' process should stay running (sleeping) in the background.
Load the driver you want to trace and use it. Mmiotrace will only catch MMIO
accesses to areas that are ioremapped while mmiotrace is active.
During tracing you can place comments (markers) into the trace by
$ echo "X is up" > /debug/tracing/trace_marker
This makes it easier to see which part of the (huge) trace corresponds to
which action. It is recommended to place descriptive markers about what you
do.
Shut down mmiotrace (requires root privileges):
$ echo nop > /debug/tracing/current_tracer
The 'cat' process exits. If it does not, kill it by issuing 'fg' command and
pressing ctrl+c.
Check that mmiotrace did not lose events due to a buffer filling up. Either
$ grep -i lost mydump.txt
which tells you exactly how many events were lost, or use
$ dmesg
to view your kernel log and look for "mmiotrace has lost events" warning. If
events were lost, the trace is incomplete. You should enlarge the buffers and
try again. Buffers are enlarged by first seeing how large the current buffers
are:
$ cat /debug/tracing/buffer_size_kb
gives you a number. Approximately double this number and write it back, for
instance:
$ echo 128000 > /debug/tracing/buffer_size_kb
Then start again from the top.
If you are doing a trace for a driver project, e.g. Nouveau, you should also
do the following before sending your results:
$ lspci -vvv > lspci.txt
$ dmesg > dmesg.txt
$ tar zcf pciid-nick-mmiotrace.tar.gz mydump.txt lspci.txt dmesg.txt
and then send the .tar.gz file. The trace compresses considerably. Replace
"pciid" and "nick" with the PCI ID or model name of your piece of hardware
under investigation and your nick name.
How Mmiotrace Works
-------------------
Access to hardware IO-memory is gained by mapping addresses from PCI bus by
calling one of the ioremap_*() functions. Mmiotrace is hooked into the
__ioremap() function and gets called whenever a mapping is created. Mapping is
an event that is recorded into the trace log. Note, that ISA range mappings
are not caught, since the mapping always exists and is returned directly.
MMIO accesses are recorded via page faults. Just before __ioremap() returns,
the mapped pages are marked as not present. Any access to the pages causes a
fault. The page fault handler calls mmiotrace to handle the fault. Mmiotrace
marks the page present, sets TF flag to achieve single stepping and exits the
fault handler. The instruction that faulted is executed and debug trap is
entered. Here mmiotrace again marks the page as not present. The instruction
is decoded to get the type of operation (read/write), data width and the value
read or written. These are stored to the trace log.
Setting the page present in the page fault handler has a race condition on SMP
machines. During the single stepping other CPUs may run freely on that page
and events can be missed without a notice. Re-enabling other CPUs during
tracing is discouraged.
Trace Log Format
----------------
The raw log is text and easily filtered with e.g. grep and awk. One record is
one line in the log. A record starts with a keyword, followed by keyword
dependant arguments. Arguments are separated by a space, or continue until the
end of line. The format for version 20070824 is as follows:
Explanation Keyword Space separated arguments
---------------------------------------------------------------------------
read event R width, timestamp, map id, physical, value, PC, PID
write event W width, timestamp, map id, physical, value, PC, PID
ioremap event MAP timestamp, map id, physical, virtual, length, PC, PID
iounmap event UNMAP timestamp, map id, PC, PID
marker MARK timestamp, text
version VERSION the string "20070824"
info for reader LSPCI one line from lspci -v
PCI address map PCIDEV space separated /proc/bus/pci/devices data
unk. opcode UNKNOWN timestamp, map id, physical, data, PC, PID
Timestamp is in seconds with decimals. Physical is a PCI bus address, virtual
is a kernel virtual address. Width is the data width in bytes and value is the
data value. Map id is an arbitrary id number identifying the mapping that was
used in an operation. PC is the program counter and PID is process id. PC is
zero if it is not recorded. PID is always zero as tracing MMIO accesses
originating in user space memory is not yet supported.
For instance, the following awk filter will pass all 32-bit writes that target
physical addresses in the range [0xfb73ce40, 0xfb800000[
$ awk '/W 4 / { adr=strtonum($5); if (adr >= 0xfb73ce40 &&
adr < 0xfb800000) print; }'
Tools for Developers
--------------------
The user space tools include utilities for:
- replacing numeric addresses and values with hardware register names
- replaying MMIO logs, i.e., re-executing the recorded writes

View File

@@ -0,0 +1,116 @@
Using the Linux Kernel Tracepoints
Mathieu Desnoyers
This document introduces Linux Kernel Tracepoints and their use. It
provides examples of how to insert tracepoints in the kernel and
connect probe functions to them and provides some examples of probe
functions.
* Purpose of tracepoints
A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty
(checking a condition for a branch) and space penalty (adding a few
bytes for the function call at the end of the instrumented function
and adds a data structure in a separate section). When a tracepoint
is "on", the function you provide is called each time the tracepoint
is executed, in the execution context of the caller. When the function
provided ends its execution, it returns to the caller (continuing from
the tracepoint site).
You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters,
which prototypes are described in a tracepoint declaration placed in a
header file.
They can be used for tracing and performance accounting.
* Usage
Two elements are required for tracepoints :
- A tracepoint definition, placed in a header file.
- The tracepoint statement, in C code.
In order to use tracepoints, you should include linux/tracepoint.h.
In include/trace/subsys.h :
#include <linux/tracepoint.h>
DECLARE_TRACE(subsys_eventname,
TP_PROTO(int firstarg, struct task_struct *p),
TP_ARGS(firstarg, p));
In subsys/file.c (where the tracing statement must be added) :
#include <trace/subsys.h>
DEFINE_TRACE(subsys_eventname);
void somefct(void)
{
...
trace_subsys_eventname(arg, task);
...
}
Where :
- subsys_eventname is an identifier unique to your event
- subsys is the name of your subsystem.
- eventname is the name of the event to trace.
- TP_PROTO(int firstarg, struct task_struct *p) is the prototype of the
function called by this tracepoint.
- TP_ARGS(firstarg, p) are the parameters names, same as found in the
prototype.
Connecting a function (probe) to a tracepoint is done by providing a
probe (function to call) for the specific tracepoint through
register_trace_subsys_eventname(). Removing a probe is done through
unregister_trace_subsys_eventname(); it will remove the probe.
tracepoint_synchronize_unregister() must be called before the end of
the module exit function to make sure there is no caller left using
the probe. This, and the fact that preemption is disabled around the
probe call, make sure that probe removal and module unload are safe.
See the "Probe example" section below for a sample probe module.
The tracepoint mechanism supports inserting multiple instances of the
same tracepoint, but a single definition must be made of a given
tracepoint name over all the kernel to make sure no type conflict will
occur. Name mangling of the tracepoints is done using the prototypes
to make sure typing is correct. Verification of probe type correctness
is done at the registration site by the compiler. Tracepoints can be
put in inline functions, inlined static functions, and unrolled loops
as well as regular functions.
The naming scheme "subsys_event" is suggested here as a convention
intended to limit collisions. Tracepoint names are global to the
kernel: they are considered as being the same whether they are in the
core kernel image or in modules.
If the tracepoint has to be used in kernel modules, an
EXPORT_TRACEPOINT_SYMBOL_GPL() or EXPORT_TRACEPOINT_SYMBOL() can be
used to export the defined tracepoints.
* Probe / tracepoint example
See the example provided in samples/tracepoints
Compile them with your kernel. They are built during 'make' (not
'make modules') when CONFIG_SAMPLE_TRACEPOINTS=m.
Run, as root :
modprobe tracepoint-sample (insmod order is not important)
modprobe tracepoint-probe-sample
cat /proc/tracepoint-sample (returns an expected error)
rmmod tracepoint-sample tracepoint-probe-sample
dmesg