diff options
-rw-r--r-- | documentation/profile-manual/profile-manual-usage.xml | 353 |
1 files changed, 353 insertions, 0 deletions
diff --git a/documentation/profile-manual/profile-manual-usage.xml b/documentation/profile-manual/profile-manual-usage.xml index 71feb418fd..cfe916a795 100644 --- a/documentation/profile-manual/profile-manual-usage.xml +++ b/documentation/profile-manual/profile-manual-usage.xml @@ -981,10 +981,363 @@ </para> </section> + <section id='system-wide-tracing-and-profiling'> + <title>System-Wide Tracing and Profiling</title> + <para> + The examples so far have focused on tracing a particular program or + workload - in other words, every profiling run has specified the + program to profile in the command-line e.g. 'perf record wget ...'. + </para> + + <para> + It's also possible, and more interesting in many cases, to run a + system-wide profile or trace while running the workload in a + separate shell. + </para> + + <para> + To do system-wide profiling or tracing, you typically use + the -a flag to 'perf record'. + </para> + + <para> + To demonstrate this, open up one window and start the profile + using the -a flag (press Ctrl-C to stop tracing): + <literallayout class='monospaced'> + root@crownbay:~# perf record -g -a + ^C[ perf record: Woken up 6 times to write data ] + [ perf record: Captured and wrote 1.400 MB perf.data (~61172 samples) ] + </literallayout> + In another window, run the wget test: + <literallayout class='monospaced'> + root@crownbay:~# wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink> + Connecting to downloads.yoctoproject.org (140.211.169.59:80) + linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA + </literallayout> + Here we see entries not only for our wget load, but for other + processes running on the system as well: + </para> + + <para> + <imagedata fileref="figures/perf-systemwide.png" width="6in" depth="7in" align="center" scalefit="1" /> + </para> + + <para> + In the snapshot above, we can see callchains that originate in + libc, and a callchain from Xorg that demonstrates that we're + using a proprietary X driver in userspace (notice the presence + of 'PVR' and some other unresolvable symbols in the expanded + Xorg callchain). + </para> + + <para> + Note also that we have both kernel and userspace entries in the + above snapshot. We can also tell perf to focus on userspace but + providing a modifier, in this case 'u', to the 'cycles' hardware + counter when we record a profile: + <literallayout class='monospaced'> + root@crownbay:~# perf record -g -a -e cycles:u + ^C[ perf record: Woken up 2 times to write data ] + [ perf record: Captured and wrote 0.376 MB perf.data (~16443 samples) ] + </literallayout> + </para> + + <para> + <imagedata fileref="figures/perf-report-cycles-u.png" width="6in" depth="7in" align="center" scalefit="1" /> + </para> + + <para> + Notice in the screenshot above, we see only userspace entries ([.]) + </para> + + <para> + Finally, we can press 'enter' on a leaf node and select the 'Zoom + into DSO' menu item to show only entries associated with a + specific DSO. In the screenshot below, we've zoomed into the + 'libc' DSO which shows all the entries associated with the + libc-xxx.so DSO. + </para> + + <para> + <imagedata fileref="figures/perf-systemwide-libc.png" width="6in" depth="7in" align="center" scalefit="1" /> + </para> + + <para> + We can also use the system-wide -a switch to do system-wide + tracing. Here we'll trace a couple of scheduler events: + <literallayout class='monospaced'> + root@crownbay:~# perf record -a -e sched:sched_switch -e sched:sched_wakeup + ^C[ perf record: Woken up 38 times to write data ] + [ perf record: Captured and wrote 9.780 MB perf.data (~427299 samples) ] + </literallayout> + We can look at the raw output using 'perf script' with no + arguments: + <literallayout class='monospaced'> + root@crownbay:~# perf script + + perf 1383 [001] 6171.460045: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 + perf 1383 [001] 6171.460066: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120 + kworker/1:1 21 [001] 6171.460093: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120 + swapper 0 [000] 6171.468063: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000 + swapper 0 [000] 6171.468107: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 + kworker/0:3 1209 [000] 6171.468143: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 + perf 1383 [001] 6171.470039: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 + perf 1383 [001] 6171.470058: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120 + kworker/1:1 21 [001] 6171.470082: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120 + perf 1383 [001] 6171.480035: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 + </literallayout> + </para> + <section id='perf-filtering'> + <title>Filtering</title> + <para> + Notice that there are a lot of events that don't really have + anything to do with what we're interested in, namely events + that schedule 'perf' itself in and out or that wake perf up. + We can get rid of those by using the '--filter' option - + for each event we specify using -e, we can add a --filter + after that to filter out trace events that contain fields + with specific values: + <literallayout class='monospaced'> + root@crownbay:~# perf record -a -e sched:sched_switch --filter 'next_comm != perf && prev_comm != perf' -e sched:sched_wakeup --filter 'comm != perf' + ^C[ perf record: Woken up 38 times to write data ] + [ perf record: Captured and wrote 9.688 MB perf.data (~423279 samples) ] + + root@crownbay:~# perf script + + swapper 0 [000] 7932.162180: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 + kworker/0:3 1209 [000] 7932.162236: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 + perf 1407 [001] 7932.170048: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 + perf 1407 [001] 7932.180044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 + perf 1407 [001] 7932.190038: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 + perf 1407 [001] 7932.200044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 + perf 1407 [001] 7932.210044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 + perf 1407 [001] 7932.220044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 + swapper 0 [001] 7932.230111: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 + swapper 0 [001] 7932.230146: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/1:1 next_pid=21 next_prio=120 + kworker/1:1 21 [001] 7932.230205: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120 + swapper 0 [000] 7932.326109: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000 + swapper 0 [000] 7932.326171: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 + kworker/0:3 1209 [000] 7932.326214: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 + </literallayout> + In this case, we've filtered out all events that have 'perf' + in their 'comm' or 'comm_prev' or 'comm_next' fields. Notice + that there are still events recorded for perf, but notice + that those events don't have values of 'perf' for the filtered + fields. To completely filter out anything from perf will + require a bit more work, but for the purpose of demonstrating + how to use filters, it's close enough. + </para> + + <note> + Tying It Together: These are exactly the same set of event + filters defined by the trace event subsystem. See the + ftrace/tracecmd/kernelshark section for more discussion about + these event filters. + </note> + + <note> + Tying It Together: These event filters are implemented by a + special-purpose pseudo-interpreter in the kernel and are an + integral and indispensable part of the perf design as it + relates to tracing. kernel-based event filters provide a + mechanism to precisely throttle the event stream that appears + in user space, where it makes sense to provide bindings to real + programming languages for postprocessing the event stream. + This architecture allows for the intelligent and flexible + partitioning of processing between the kernel and user space. + Contrast this with other tools such as SystemTap, which does + all of its processing in the kernel and as such requires a + special project-defined language in order to accommodate that + design, or LTTng, where everything is sent to userspace and + as such requires a super-efficient kernel-to-userspace + transport mechanism in order to function properly. While + perf certainly can benefit from for instance advances in + the design of the transport, it doesn't fundamentally depend + on them. Basically, if you find that your perf tracing + application is causing buffer I/O overruns, it probably + means that you aren't taking enough advantage of the + kernel filtering engine. + </note> + </section> + </section> + + <section id='using-dynamic-tracepoints'> + <title>Using Dynamic Tracepoints</title> + + <para> + perf isn't restricted to the fixed set of static tracepoints + listed by 'perf list'. Users can also add their own 'dynamic' + tracepoints anywhere in the kernel. For instance, suppose we + want to define our own tracepoint on do_fork(). We can do that + using the 'perf probe' perf subcommand: + <literallayout class='monospaced'> + root@crownbay:~# perf probe do_fork + Added new event: + probe:do_fork (on do_fork) + + You can now use it in all perf tools, such as: + + perf record -e probe:do_fork -aR sleep 1 + </literallayout> + Adding a new tracepoint via 'perf probe' results in an event + with all the expected files and format in + /sys/kernel/debug/tracing/events, just the same as for static + tracepoints (as discussed in more detail in the trace events + subsystem section: + <literallayout class='monospaced'> + root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# ls -al + drwxr-xr-x 2 root root 0 Oct 28 11:42 . + drwxr-xr-x 3 root root 0 Oct 28 11:42 .. + -rw-r--r-- 1 root root 0 Oct 28 11:42 enable + -rw-r--r-- 1 root root 0 Oct 28 11:42 filter + -r--r--r-- 1 root root 0 Oct 28 11:42 format + -r--r--r-- 1 root root 0 Oct 28 11:42 id + + root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# cat format + name: do_fork + ID: 944 + format: + field:unsigned short common_type; offset:0; size:2; signed:0; + field:unsigned char common_flags; offset:2; size:1; signed:0; + field:unsigned char common_preempt_count; offset:3; size:1; signed:0; + field:int common_pid; offset:4; size:4; signed:1; + field:int common_padding; offset:8; size:4; signed:1; + + field:unsigned long __probe_ip; offset:12; size:4; signed:0; + + print fmt: "(%lx)", REC->__probe_ip + </literallayout> + We can list all dynamic tracepoints currently in existence: + <literallayout class='monospaced'> + root@crownbay:~# perf probe -l + probe:do_fork (on do_fork) + probe:schedule (on schedule) + </literallayout> + Let's record system-wide ('sleep 30' is a trick for recording + system-wide but basically do nothing and then wake up after + 30 seconds): + <literallayout class='monospaced'> + root@crownbay:~# perf record -g -a -e probe:do_fork sleep 30 + [ perf record: Woken up 1 times to write data ] + [ perf record: Captured and wrote 0.087 MB perf.data (~3812 samples) ] + </literallayout> + Using 'perf script' we can see each do_fork event that fired: + <literallayout class='monospaced'> + root@crownbay:~# perf script + + # ======== + # captured on: Sun Oct 28 11:55:18 2012 + # hostname : crownbay + # os release : 3.4.11-yocto-standard + # perf version : 3.4.11 + # arch : i686 + # nrcpus online : 2 + # nrcpus avail : 2 + # cpudesc : Intel(R) Atom(TM) CPU E660 @ 1.30GHz + # cpuid : GenuineIntel,6,38,1 + # total memory : 1017184 kB + # cmdline : /usr/bin/perf record -g -a -e probe:do_fork sleep 30 + # event : name = probe:do_fork, type = 2, config = 0x3b0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern + = 0, id = { 5, 6 } + # HEADER_CPU_TOPOLOGY info available, use -I to display + # ======== + # + matchbox-deskto 1197 [001] 34211.378318: do_fork: (c1028460) + matchbox-deskto 1295 [001] 34211.380388: do_fork: (c1028460) + pcmanfm 1296 [000] 34211.632350: do_fork: (c1028460) + pcmanfm 1296 [000] 34211.639917: do_fork: (c1028460) + matchbox-deskto 1197 [001] 34217.541603: do_fork: (c1028460) + matchbox-deskto 1299 [001] 34217.543584: do_fork: (c1028460) + gthumb 1300 [001] 34217.697451: do_fork: (c1028460) + gthumb 1300 [001] 34219.085734: do_fork: (c1028460) + gthumb 1300 [000] 34219.121351: do_fork: (c1028460) + gthumb 1300 [001] 34219.264551: do_fork: (c1028460) + pcmanfm 1296 [000] 34219.590380: do_fork: (c1028460) + matchbox-deskto 1197 [001] 34224.955965: do_fork: (c1028460) + matchbox-deskto 1306 [001] 34224.957972: do_fork: (c1028460) + matchbox-termin 1307 [000] 34225.038214: do_fork: (c1028460) + matchbox-termin 1307 [001] 34225.044218: do_fork: (c1028460) + matchbox-termin 1307 [000] 34225.046442: do_fork: (c1028460) + matchbox-deskto 1197 [001] 34237.112138: do_fork: (c1028460) + matchbox-deskto 1311 [001] 34237.114106: do_fork: (c1028460) + gaku 1312 [000] 34237.202388: do_fork: (c1028460) + </literallayout> + And using 'perf report' on the same file, we can see the + callgraphs from starting a few programs during those 30 seconds: + </para> + + <para> + <imagedata fileref="figures/perf-probe-do_fork-profile.png" width="6in" depth="7in" align="center" scalefit="1" /> + </para> + + <note> + Tying It Together: The trace events subsystem accomodate static + and dynamic tracepoints in exactly the same way - there's no + difference as far as the infrastructure is concerned. See the + ftrace section for more details on the trace event subsystem. + </note> + + <note> + Tying It Together: Dynamic tracepoints are implemented under the + covers by kprobes and uprobes. kprobes and uprobes are also used + by and in fact are the main focus of SystemTap. + </note> + </section> + + <section id='perf-documentation'> + <title>Documentation</title> + + <para> + Online versions of the man pages for the commands discussed in this + section can be found here: + <itemizedlist> + <listitem><para>The <ulink url='http://linux.die.net/man/1/perf-stat'>'perf stat' manpage</ulink>. + </para></listitem> + <listitem><para>The <ulink url='http://linux.die.net/man/1/perf-record'>'perf record' manpage</ulink>. + </para></listitem> + <listitem><para>The <ulink url='http://linux.die.net/man/1/perf-report'>'perf report' manpage</ulink>. + </para></listitem> + <listitem><para>The <ulink url='http://linux.die.net/man/1/perf-probe'>'perf probe' manpage</ulink>. + </para></listitem> + <listitem><para>The <ulink url='http://linux.die.net/man/1/perf-script'>'perf script' manpage</ulink>. + </para></listitem> + <listitem><para>Documentation on using the + <ulink url='http://linux.die.net/man/1/perf-script-python'>'perf script' python binding</ulink>. + </para></listitem> + <listitem><para>The top-level + <ulink url='http://linux.die.net/man/1/perf'>perf(1) manpage</ulink>. + </para></listitem> + </itemizedlist> + </para> + + <para> + Normally, you should be able to invoke the man pages via perf + itself e.g. 'perf help' or 'perf help record'. + </para> + + <para> + However, by default Yocto doesn't install man pages, but perf + invokes the man pages for most help functionality. This is a bug + and is being addressed by a Yocto bug: + <ulink url='https://bugzilla.yoctoproject.org/show_bug.cgi?id=3388'>Bug 3388 - perf: enable man pages for basic 'help' functionality</ulink>. + </para> + + <para> + The man pages in text form, along with some other files, such as + a set of examples, can be found in the 'perf' directory of the + kernel tree: + <literallayout class='monospaced'> + tools/perf/Documentation + </literallayout> + There's also a nice perf tutorial on the perf wiki that goes + into more detail than we do here in certain areas: + <ulink url='https://perf.wiki.kernel.org/index.php/Tutorial'>Perf Tutorial</ulink> + </para> + </section> </section> </chapter> <!-- |