author | Scott Rifenbark <scott.m.rifenbark@intel.com> | 2013-01-16 12:49:45 -0800 |
committer | Richard Purdie <richard.purdie@linuxfoundation.org> | 2013-01-27 13:56:02 +0000 |
commit | 86478ff9486516ed428e470afbe76f3cd857888c (patch) | |
tree | b883e2138166f3a9daf986931779a8f4058169bf /documentation/profile-manual | |
parent | 487a76f7c7bfdb47ddec76659794834e84ed04eb (diff) | |
download | openembedded-core-contrib-86478ff9486516ed428e470afbe76f3cd857888c.tar.gz |
profile-manual: Updated indentation and sectioning
The sectioning was off and it was preventing the manual from
building. I also added more indentation as needed throughout
most of the manual up to the error.
(From yocto-docs rev: 5de3544593a63e96b349babc177970d8605e0098)
Signed-off-by: Scott Rifenbark <scott.m.rifenbark@intel.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Diffstat (limited to 'documentation/profile-manual')
-rw-r--r-- | documentation/profile-manual/profile-manual-usage.xml | 1257 |
1 file changed, 629 insertions(+), 628 deletions(-)
diff --git a/documentation/profile-manual/profile-manual-usage.xml b/documentation/profile-manual/profile-manual-usage.xml index cfe916a795..51c483b700 100644 --- a/documentation/profile-manual/profile-manual-usage.xml +++ b/documentation/profile-manual/profile-manual-usage.xml @@ -108,28 +108,28 @@ top System profiling tool. See 'perf help COMMAND' for more information on a specific command. - </literallayout> - </para> + </literallayout> + </para> - <section id='using-perf-to-do-basic-profiling'> - <title>Using perf to do Basic Profiling</title> + <section id='using-perf-to-do-basic-profiling'> + <title>Using perf to do Basic Profiling</title> - <para> - As a simple test case, we'll profile the 'wget' of a fairly large - file, which is a minimally interesting case because it has both - file and network I/O aspects, and at least in the case of standard - Yocto images, it's implemented as part of busybox, so the methods - we use to analyze it can be used in a very similar way to the whole - host of supported busybox applets in Yocto. - <literallayout class='monospaced'> + <para> + As a simple test case, we'll profile the 'wget' of a fairly large + file, which is a minimally interesting case because it has both + file and network I/O aspects, and at least in the case of standard + Yocto images, it's implemented as part of busybox, so the methods + we use to analyze it can be used in a very similar way to the whole + host of supported busybox applets in Yocto. + <literallayout class='monospaced'> root@crownbay:~# rm linux-2.6.19.2.tar.bz2; \ wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 - </literallayout> - The quickest and easiest way to get some basic overall data about - what's going on for a particular workload is to profile it using - 'perf stat'.
'perf stat' basically profiles using a few default - counters and displays the summed counts at the end of the run: - <literallayout class='monospaced'> + </literallayout> + The quickest and easiest way to get some basic overall data about + what's going on for a particular workload is to profile it using + 'perf stat'. 'perf stat' basically profiles using a few default + counters and displays the summed counts at the end of the run: + <literallayout class='monospaced'> root@crownbay:~# perf stat wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 Connecting to downloads.yoctoproject.org (140.211.169.59:80) linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA @@ -148,19 +148,19 @@ 19550329 branch-misses # 11.82% of all branches 59.836627620 seconds time elapsed - </literallayout> - Many times such a simple-minded test doesn't yield much of - interest, but sometimes it does (see Real-world Yocto bug - (slow loop-mounted write speed)). - </para> + </literallayout> + Many times such a simple-minded test doesn't yield much of - interest, but sometimes it does (see Real-world Yocto bug + (slow loop-mounted write speed)). + </para> - <para> - Also, note that 'perf stat' isn't restricted to a fixed set of - counters - basically any event listed in the output of 'perf list' - can be tallied by 'perf stat'.
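The summed counters that 'perf stat' prints are plain text, so they are easy to post-process when comparing runs. The sketch below is not part of the manual; it assumes the line shape shown in the sample output above (count, event name, optional '# ...' annotation) and pulls the counts into a dict:

```python
import re

def parse_perf_stat(text):
    """Parse human-readable 'perf stat' counter lines into {event: count}.

    Assumes lines shaped like the sample output above, e.g.
    '  19550329 branch-misses   # 11.82% of all branches'.
    The '# ...' annotation and the elapsed-time line are ignored.
    """
    counts = {}
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()   # drop the '# ...' comment
        m = re.match(r'([\d,]+)\s+([\w:<>/-]+)$', line)
        if m:
            counts[m.group(2)] = int(m.group(1).replace(',', ''))
    return counts

sample = """
  19550329 branch-misses    # 11.82% of all branches
   2917740 cache-misses     # 4.365 % of all cache refs
 59.836627620 seconds time elapsed
"""
print(parse_perf_stat(sample))
```

For machine consumption, 'perf stat' also has a CSV-style output mode, which would avoid this kind of ad-hoc parsing entirely.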
For example, suppose we wanted to + see a summary of all the events related to kernel memory + allocation/freeing along with cache hits and misses: + <literallayout class='monospaced'> root@crownbay:~# perf stat -e kmem:* -e cache-references -e cache-misses wget http:// downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 Connecting to downloads.yoctoproject.org (140.211.169.59:80) linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA @@ -183,28 +183,28 @@ 2917740 cache-misses # 4.365 % of all cache refs 44.831023415 seconds time elapsed - </literallayout> - So 'perf stat' gives us a nice easy way to get a quick overview of - what might be happening for a set of events, but normally we'd - need a little more detail in order to understand what's going on - in a way that we can act on in a useful way. - </para> + </literallayout> + So 'perf stat' gives us a nice easy way to get a quick overview of + what might be happening for a set of events, but normally we'd + need a little more detail in order to understand what's going on + in a way that we can act on in a useful way. + </para> - <para> - To dive down into the next level of detail, we can use 'perf - record'/'perf report' which will collect profiling data and - present it to us using an interactive text-based UI (or - simply as text if we specify --stdio to 'perf report'). - </para> + <para> + To dive down into the next level of detail, we can use 'perf + record'/'perf report' which will collect profiling data and + present it to us using an interactive text-based UI (or + simply as text if we specify --stdio to 'perf report'). + </para> - <para> - As our first attempt at profiling this workload, we'll simply - run 'perf record', handing it the workload we want to profile - (everything after 'perf record' and any perf options we hand - it - here none - will be executed in a new shell).
perf collects - samples until the process exits and records them in a file named - 'perf.data' in the current working directory. - <literallayout class='monospaced'> + <para> + As our first attempt at profiling this workload, we'll simply + run 'perf record', handing it the workload we want to profile + (everything after 'perf record' and any perf options we hand + it - here none - will be executed in a new shell). perf collects + samples until the process exits and records them in a file named + 'perf.data' in the current working directory. + <literallayout class='monospaced'> root@crownbay:~# perf record wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 Connecting to downloads.yoctoproject.org (140.211.169.59:80) @@ -215,44 +215,44 @@ To see the results in a 'text-based UI' (tui), simply run 'perf report', which will read the perf.data file in the current working directory and display the results in an interactive UI: - <literallayout class='monospaced'> + <literallayout class='monospaced'> root@crownbay:~# perf report - </literallayout> - </para> + </literallayout> + </para> - <para> - <imagedata fileref="figures/perf-wget-flat-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> - </para> + <para> + <imagedata fileref="figures/perf-wget-flat-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> + </para> - <para> - The above screenshot displays a 'flat' profile, one entry for - each 'bucket' corresponding to the functions that were profiled - during the profiling run, ordered from the most popular to the - least (perf has options to sort in various orders and keys as - well as display entries only above a certain threshold and so - on - see the perf documentation for details). Note that this - includes both userspace functions (entries containing a [.]) and - kernel functions accounted to the process (entries containing - a [k]).
(perf has command-line modifiers that can be used to - restrict the profiling to kernel or userspace, among others). - </para> + <para> + The above screenshot displays a 'flat' profile, one entry for + each 'bucket' corresponding to the functions that were profiled + during the profiling run, ordered from the most popular to the + least (perf has options to sort in various orders and keys as + well as display entries only above a certain threshold and so + on - see the perf documentation for details). Note that this + includes both userspace functions (entries containing a [.]) and + kernel functions accounted to the process (entries containing + a [k]). (perf has command-line modifiers that can be used to + restrict the profiling to kernel or userspace, among others). + </para> - <para> - Notice also that the above report shows an entry for 'busybox', - which is the executable that implements 'wget' in Yocto, but that - instead of a useful function name in that entry, it displays - a not-so-friendly hex value instead. The steps below will show - how to fix that problem. - </para> + <para> + Notice also that the above report shows an entry for 'busybox', - which is the executable that implements 'wget' in Yocto, but that + instead of a useful function name in that entry, it displays + a not-so-friendly hex value instead. The steps below will show + how to fix that problem. + </para> - <para> - Before we do that, however, let's try running a different profile, - one which shows something a little more interesting.
The only + difference between the new profile and the previous one is that + we'll add the -g option, which will record not just the address + of a sampled function, but the entire callchain to the sampled + function as well: + <literallayout class='monospaced'> root@crownbay:~# perf record -g wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 Connecting to downloads.yoctoproject.org (140.211.169.59:80) linux-2.6.19.2.tar.b 100% |************************************************| 41727k 0:00:00 ETA @@ -261,217 +261,217 @@ root@crownbay:~# perf report - </literallayout> - </para> + </literallayout> + </para> - <para> - <imagedata fileref="figures/perf-wget-g-copy-to-user-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> - </para> + <para> + <imagedata fileref="figures/perf-wget-g-copy-to-user-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> + </para> - <para> - Using the callgraph view, we can actually see not only which - functions took the most time, but we can also see a summary of - how those functions were called and learn something about how the - program interacts with the kernel in the process. - </para> + <para> + Using the callgraph view, we can actually see not only which + functions took the most time, but we can also see a summary of + how those functions were called and learn something about how the + program interacts with the kernel in the process. + </para> - <para> - Notice that each entry in the above screenshot now contains a '+' - on the left-hand side. This means that we can expand the entry and - drill down into the callchains that feed into that entry. - Pressing 'enter' on any one of them will expand the callchain - (you can also press 'E' to expand them all at the same time or 'C' - to collapse them all). - </para> + <para> + Notice that each entry in the above screenshot now contains a '+' + on the left-hand side. 
This means that we can expand the entry and + drill down into the callchains that feed into that entry. + Pressing 'enter' on any one of them will expand the callchain + (you can also press 'E' to expand them all at the same time or 'C' + to collapse them all). + </para> - <para> - In the screenshot above, we've toggled the __copy_to_user_ll() - entry and several subnodes all the way down. This lets us see - which callchains contributed to the profiled __copy_to_user_ll() - function which contributed 1.77% to the total profile. - </para> + <para> + In the screenshot above, we've toggled the __copy_to_user_ll() + entry and several subnodes all the way down. This lets us see + which callchains contributed to the profiled __copy_to_user_ll() + function which contributed 1.77% to the total profile. + </para> - <para> - As a bit of background explanation for these callchains, think - about what happens at a high level when you run wget to get a file - out on the network. Basically what happens is that the data comes - into the kernel via the network connection (socket) and is passed - to the userspace program 'wget' (which is actually a part of - busybox, but that's not important for now), which takes the buffers - the kernel passes to it and writes it to a disk file to save it. - </para> + <para> + As a bit of background explanation for these callchains, think + about what happens at a high level when you run wget to get a file + out on the network. Basically what happens is that the data comes + into the kernel via the network connection (socket) and is passed + to the userspace program 'wget' (which is actually a part of + busybox, but that's not important for now), which takes the buffers + the kernel passes to it and writes it to a disk file to save it. + </para> - <para> - The part of this process that we're looking at in the above call - stacks is the part where the kernel passes the data it's read from - the socket down to wget i.e. a copy-to-user. 
- </para> + <para> + The part of this process that we're looking at in the above call + stacks is the part where the kernel passes the data it's read from + the socket down to wget i.e. a copy-to-user. + </para> - <para> - Notice also that there's a case here where a hex value - is displayed in the callstack, here in the expanded - sys_clock_gettime() function. Later we'll see it resolve to a - userspace function call in busybox. - </para> + <para> + Notice also that there's a case here where a hex value + is displayed in the callstack, here in the expanded + sys_clock_gettime() function. Later we'll see it resolve to a + userspace function call in busybox. + </para> - <para> - <imagedata fileref="figures/perf-wget-g-copy-from-user-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> - </para> + <para> + <imagedata fileref="figures/perf-wget-g-copy-from-user-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> + </para> - <para> - The above screenshot shows the other half of the journey for the - data - from the wget program's userspace buffers to disk. To get - the buffers to disk, the wget program issues a write(2), which - does a copy-from-user to the kernel, which then takes care via - some circuitous path (probably also present somewhere in the - profile data), to get it safely to disk.
+ </para> - <para> - Now that we've seen the basic layout of the profile data and the - basics of how to extract useful information out of it, let's get - back to the task at hand and see if we can get some basic idea - about where the time is spent in the program we're profiling, - wget. Remember that wget is actually implemented as an applet - in busybox, so while the process name is 'wget', the executable - we're actually interested in is busybox. So let's expand the - first entry containing busybox: - </para> + <para> + Now that we've seen the basic layout of the profile data and the + basics of how to extract useful information out of it, let's get + back to the task at hand and see if we can get some basic idea + about where the time is spent in the program we're profiling, + wget. Remember that wget is actually implemented as an applet + in busybox, so while the process name is 'wget', the executable + we're actually interested in is busybox. So let's expand the + first entry containing busybox: + </para> - <para> - <imagedata fileref="figures/perf-wget-busybox-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> - </para> + <para> + <imagedata fileref="figures/perf-wget-busybox-expanded-stripped.png" width="6in" depth="7in" align="center" scalefit="1" /> + </para> - <para> - Again, before we expanded we saw that the function was labeled - with a hex value instead of a symbol as with most of the kernel - entries. Expanding the busybox entry doesn't make it any better. - </para> + <para> + Again, before we expanded we saw that the function was labeled + with a hex value instead of a symbol as with most of the kernel + entries. Expanding the busybox entry doesn't make it any better. + </para> - <para> - The problem is that perf can't find the symbol information for the - busybox binary, which is actually stripped out by the Yocto build - system. 
- </para> + <para> + The problem is that perf can't find the symbol information for the + busybox binary, which is actually stripped out by the Yocto build + system. + </para> - <para> - One way around that is to put the following in your local.conf - when you build the image: - <literallayout class='monospaced'> + <para> + One way around that is to put the following in your local.conf + when you build the image: + <literallayout class='monospaced'> INHIBIT_PACKAGE_STRIP = "1" - </literallayout> - However, we already have an image with the binaries stripped, - so what can we do to get perf to resolve the symbols? Basically - we need to install the debuginfo for the busybox package. - </para> + </literallayout> + However, we already have an image with the binaries stripped, + so what can we do to get perf to resolve the symbols? Basically + we need to install the debuginfo for the busybox package. + </para> - <para> - To generate the debug info for the packages in the image, we need - to add dbg-pkgs to EXTRA_IMAGE_FEATURES in local.conf. For example: - <literallayout class='monospaced'> + <para> + To generate the debug info for the packages in the image, we need + to add dbg-pkgs to EXTRA_IMAGE_FEATURES in local.conf. For example: + <literallayout class='monospaced'> EXTRA_IMAGE_FEATURES = "debug-tweaks tools-profile dbg-pkgs" - </literallayout> - Additionally, in order to generate the type of debuginfo that - perf understands, we also need to add the following to local.conf: - <literallayout class='monospaced'> + </literallayout> + Additionally, in order to generate the type of debuginfo that + perf understands, we also need to add the following to local.conf: + <literallayout class='monospaced'> PACKAGE_DEBUG_SPLIT_STYLE = 'debug-file-directory' - </literallayout> - Once we've done that, we can install the debuginfo for busybox. - The debug packages once built can be found in - build/tmp/deploy/rpm/* on the host system.
Find the - busybox-dbg-...rpm file and copy it to the target. For example: - <literallayout class='monospaced'> + </literallayout> + Once we've done that, we can install the debuginfo for busybox. + The debug packages once built can be found in + build/tmp/deploy/rpm/* on the host system. Find the + busybox-dbg-...rpm file and copy it to the target. For example: + <literallayout class='monospaced'> [trz@empanada core2]$ scp /home/trz/yocto/crownbay-tracing-dbg/build/tmp/deploy/rpm/core2/busybox-dbg-1.20.2-r2.core2.rpm root@192.168.1.31: root@192.168.1.31's password: busybox-dbg-1.20.2-r2.core2.rpm 100% 1826KB 1.8MB/s 00:01 - </literallayout> - Now install the debug rpm on the target: - <literallayout class='monospaced'> + </literallayout> + Now install the debug rpm on the target: + <literallayout class='monospaced'> root@crownbay:~# rpm -i busybox-dbg-1.20.2-r2.core2.rpm - </literallayout> - Now that the debuginfo is installed, we see that the busybox - entries now display their functions symbolically: - </para> + </literallayout> + Now that the debuginfo is installed, we see that the busybox + entries now display their functions symbolically: + </para> - <para> - <imagedata fileref="figures/perf-wget-busybox-debuginfo.png" width="6in" depth="7in" align="center" scalefit="1" /> - </para> + <para> + <imagedata fileref="figures/perf-wget-busybox-debuginfo.png" width="6in" depth="7in" align="center" scalefit="1" /> + </para> - <para> - If we expand one of the entries and press 'enter' on a leaf node, - we're presented with a menu of actions we can take to get more - information related to that entry: - </para> + <para> + If we expand one of the entries and press 'enter' on a leaf node, + we're presented with a menu of actions we can take to get more + information related to that entry: + </para> - <para> - <imagedata fileref="figures/perf-wget-busybox-dso-zoom-menu.png" width="6in" depth="7in" align="center" scalefit="1" /> - </para> + <para> + <imagedata 
fileref="figures/perf-wget-busybox-dso-zoom-menu.png" width="6in" depth="7in" align="center" scalefit="1" /> + </para> - <para> - One of these actions allows us to show a view that displays a - busybox-centric view of the profiled functions (in this case we've - also expanded all the nodes using the 'E' key): - </para> + <para> + One of these actions allows us to show a view that displays a + busybox-centric view of the profiled functions (in this case we've + also expanded all the nodes using the 'E' key): + </para> - <para> - <imagedata fileref="figures/perf-wget-busybox-dso-zoom.png" width="6in" depth="7in" align="center" scalefit="1" /> - </para> + <para> + <imagedata fileref="figures/perf-wget-busybox-dso-zoom.png" width="6in" depth="7in" align="center" scalefit="1" /> + </para> - <para> - Finally, we can see that now that the busybox debuginfo is - installed, the previously unresolved symbol in the - sys_clock_gettime() entry mentioned previously is now resolved, - and shows that the sys_clock_gettime system call that was the - source of 6.75% of the copy-to-user overhead was initiated by - the handle_input() busybox function: - </para> + <para> + Finally, we can see that now that the busybox debuginfo is + installed, the previously unresolved symbol in the + sys_clock_gettime() entry mentioned previously is now resolved, + and shows that the sys_clock_gettime system call that was the + source of 6.75% of the copy-to-user overhead was initiated by + the handle_input() busybox function: + </para> - <para> - <imagedata fileref="figures/perf-wget-g-copy-to-user-expanded-debuginfo.png" width="6in" depth="7in" align="center" scalefit="1" /> - </para> + <para> + <imagedata fileref="figures/perf-wget-g-copy-to-user-expanded-debuginfo.png" width="6in" depth="7in" align="center" scalefit="1" /> + </para> - <para> - At the lowest level of detail, we can dive down to the assembly - level and see which instructions caused the most overhead in a - function. 
Pressing 'enter' on the 'udhcpc_main' function, we're - again presented with a menu: - </para> + <para> + At the lowest level of detail, we can dive down to the assembly + level and see which instructions caused the most overhead in a + function. Pressing 'enter' on the 'udhcpc_main' function, we're + again presented with a menu: + </para> - <para> - <imagedata fileref="figures/perf-wget-busybox-annotate-menu.png" width="6in" depth="7in" align="center" scalefit="1" /> - </para> + <para> + <imagedata fileref="figures/perf-wget-busybox-annotate-menu.png" width="6in" depth="7in" align="center" scalefit="1" /> + </para> - <para> - Selecting 'Annotate udhcpc_main', we get a detailed listing of - percentages by instruction for the udhcpc_main function. From the - display, we can see that over 50% of the time spent in this - function is taken up by a couple tests and the move of a - constant (1) to a register: - </para> + <para> + Selecting 'Annotate udhcpc_main', we get a detailed listing of + percentages by instruction for the udhcpc_main function. From the + display, we can see that over 50% of the time spent in this + function is taken up by a couple tests and the move of a + constant (1) to a register: + </para> - <para> - <imagedata fileref="figures/perf-wget-busybox-annotate-udhcpc.png" width="6in" depth="7in" align="center" scalefit="1" /> - </para> + <para> + <imagedata fileref="figures/perf-wget-busybox-annotate-udhcpc.png" width="6in" depth="7in" align="center" scalefit="1" /> + </para> - <para> - As a segue into tracing, let's try another profile using a - different counter, something other than the default 'cycles'. - </para> + <para> + As a segue into tracing, let's try another profile using a + different counter, something other than the default 'cycles'. 
+ </para> - <para> - The tracing and profiling infrastructure in Linux has become - unified in a way that allows us to use the same tool with a - completely different set of counters, not just the standard - hardware counters that traditionally tools have had to restrict - themselves to (of course the traditional tools can also make use - of the expanded possibilities now available to them, and in some - cases have, as mentioned previously). - </para> + <para> + The tracing and profiling infrastructure in Linux has become + unified in a way that allows us to use the same tool with a + completely different set of counters, not just the standard + hardware counters that traditionally tools have had to restrict + themselves to (of course the traditional tools can also make use + of the expanded possibilities now available to them, and in some + cases have, as mentioned previously). + </para> - <para> - We can get a list of the available events that can be used to - profile a workload via 'perf list': - <literallayout class='monospaced'> + <para> + We can get a list of the available events that can be used to + profile a workload via 'perf list': + <literallayout class='monospaced'> root@crownbay:~# perf list List of pre-defined events (to be used in -e): @@ -602,23 +602,23 @@ syscalls:sys_exit_unshare [Tracepoint event] raw_syscalls:sys_enter [Tracepoint event] raw_syscalls:sys_exit [Tracepoint event] - </literallayout> - </para> + </literallayout> + </para> - <note> - Tying It Together: These are exactly the same set of events defined - by the trace event subsystem and exposed by - ftrace/tracecmd/kernelshark as files in - /sys/kernel/debug/tracing/events, by SystemTap as - kernel.trace("tracepoint_name") and (partially) accessed by LTTng. 
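The "Tying It Together" note above says these are the same events the trace event subsystem exposes as files under /sys/kernel/debug/tracing/events. The mapping from a perf-style event name to that directory can be sketched with a small hypothetical helper (not from the manual; the path layout is taken from the note, and the tracefs mount point can differ between systems, e.g. /sys/kernel/tracing on newer kernels):

```python
def tracepoint_dir(event, tracefs="/sys/kernel/debug/tracing"):
    """Map a perf event name like 'skb:kfree_skb' to the directory the
    trace event subsystem exposes for it under tracefs.

    Whether the directory actually exists depends on the running kernel
    and its configuration; this only constructs the conventional path.
    """
    subsystem, _, name = event.partition(":")
    if not name:
        raise ValueError("expected 'subsystem:tracepoint', got %r" % event)
    return "%s/events/%s/%s" % (tracefs, subsystem, name)

print(tracepoint_dir("skb:kfree_skb"))
# /sys/kernel/debug/tracing/events/skb/kfree_skb
```

The same subsystem/name split is what the wildcarded '-e skb:*' style arguments in the 'perf stat' run below rely on.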
- </note> + <note> + Tying It Together: These are exactly the same set of events defined + by the trace event subsystem and exposed by + ftrace/tracecmd/kernelshark as files in + /sys/kernel/debug/tracing/events, by SystemTap as + kernel.trace("tracepoint_name") and (partially) accessed by LTTng. + </note> - <para> - Only a subset of these would be of interest to us when looking at - this workload, so let's choose the most likely subsystems - (identified by the string before the colon in the Tracepoint events) - and do a 'perf stat' run using only those wildcarded subsystems: - <literallayout class='monospaced'> + <para> + Only a subset of these would be of interest to us when looking at + this workload, so let's choose the most likely subsystems + (identified by the string before the colon in the Tracepoint events) + and do a 'perf stat' run using only those wildcarded subsystems: + <literallayout class='monospaced'> root@crownbay:~# perf stat -e skb:* -e net:* -e napi:* -e sched:* -e workqueue:* -e irq:* -e syscalls:* wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 Performance counter stats for 'wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2': @@ -675,26 +675,26 @@ . 
58.029710972 seconds time elapsed - </literallayout> - Let's pick one of these tracepoints and tell perf to do a profile - using it as the sampling event: - <literallayout class='monospaced'> + </literallayout> + Let's pick one of these tracepoints and tell perf to do a profile + using it as the sampling event: + <literallayout class='monospaced'> root@crownbay:~# perf record -g -e sched:sched_wakeup wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 - </literallayout> - </para> + </literallayout> + </para> - <para> - <imagedata fileref="figures/sched-wakeup-profile.png" width="6in" depth="7in" align="center" scalefit="1" /> - </para> + <para> + <imagedata fileref="figures/sched-wakeup-profile.png" width="6in" depth="7in" align="center" scalefit="1" /> + </para> - <para> - The screenshot above shows the results of running a profile using - the sched:sched_wakeup tracepoint, which shows the relative costs of - various paths to sched_wakeup (note that sched_wakeup is the - name of the tracepoint - it's actually defined just inside - ttwu_do_wakeup(), which accounts for the function name actually - displayed in the profile): - <literallayout class='monospaced'> + <para> + The screenshot above shows the results of running a profile using + the sched:sched_wakeup tracepoint, which shows the relative costs of + various paths to sched_wakeup (note that sched_wakeup is the + name of the tracepoint - it's actually defined just inside + ttwu_do_wakeup(), which accounts for the function name actually + displayed in the profile): + <literallayout class='monospaced'> /* * Mark the task runnable and perform wakeup-preemption. */ @@ -706,55 +706,55 @@ . . } - </literallayout> - A couple of the more interesting callchains are expanded and - displayed above, basically some network receive paths that - presumably end up waking up wget (busybox) when network data is - ready.
- </para> + </literallayout> + A couple of the more interesting callchains are expanded and + displayed above, basically some network receive paths that + presumably end up waking up wget (busybox) when network data is + ready. + </para> - <para> - Note that because tracepoints are normally used for tracing, - the default sampling period for tracepoints is 1 i.e. for - tracepoints perf will sample on every event occurrence (this - can be changed using the -c option). This is in contrast to - hardware counters such as for example the default 'cycles' - hardware counter used for normal profiling, where sampling - periods are much higher (in the thousands) because profiling should - have as low an overhead as possible and sampling on every cycle - would be prohibitively expensive. - </para> - </section> + <para> + Note that because tracepoints are normally used for tracing, + the default sampling period for tracepoints is 1 i.e. for + tracepoints perf will sample on every event occurrence (this + can be changed using the -c option). This is in contrast to + hardware counters such as for example the default 'cycles' + hardware counter used for normal profiling, where sampling + periods are much higher (in the thousands) because profiling should + have as low an overhead as possible and sampling on every cycle + would be prohibitively expensive. + </para> + </section> - <section id='using-perf-to-do-basic-tracing'> - <title>Using perf to do Basic Tracing</title> + <section id='using-perf-to-do-basic-tracing'> + <title>Using perf to do Basic Tracing</title> - <para> - Profiling is a great tool for solving many problems or for - getting a high-level view of what's going on with a workload or - across the system. It is however by definition an approximation, - as suggested by the most prominent word associated with it, - 'sampling'.
On the one hand, it allows a representative picture of - what's going on in the system to be cheaply taken, but on the other - hand, that cheapness limits its utility when that data suggests a - need to 'dive down' more deeply to discover what's really going - on. In such cases, the only way to see what's really going on is - to be able to look at (or summarize more intelligently) the - individual steps that go into the higher-level behavior exposed - by the coarse-grained profiling data. - </para> + <para> + Profiling is a great tool for solving many problems or for + getting a high-level view of what's going on with a workload or + across the system. It is however by definition an approximation, + as suggested by the most prominent word associated with it, + 'sampling'. On the one hand, it allows a representative picture of + what's going on in the system to be cheaply taken, but on the other + hand, that cheapness limits its utility when that data suggests a + need to 'dive down' more deeply to discover what's really going + on. In such cases, the only way to see what's really going on is + to be able to look at (or summarize more intelligently) the + individual steps that go into the higher-level behavior exposed + by the coarse-grained profiling data. 
+ </para> - <para> - As a concrete example, we can trace all the events we think might - be applicable to our workload: - <literallayout class='monospaced'> + <para> + As a concrete example, we can trace all the events we think might + be applicable to our workload: + <literallayout class='monospaced'> root@crownbay:~# perf record -g -e skb:* -e net:* -e napi:* -e sched:sched_switch -e sched:sched_wakeup -e irq:* -e syscalls:sys_enter_read -e syscalls:sys_exit_read -e syscalls:sys_enter_write -e syscalls:sys_exit_write wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2 - </literallayout> - We can look at the raw trace output using 'perf script' with no - arguments: - <literallayout class='monospaced'> + </literallayout> + We can look at the raw trace output using 'perf script' with no + arguments: + <literallayout class='monospaced'> root@crownbay:~# perf script perf 1262 [000] 11624.857082: sys_exit_read: 0x0 @@ -779,90 +779,90 @@ wget 1262 [001] 11624.859888: sys_exit_read: 0x400 wget 1262 [001] 11624.859935: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400 wget 1262 [001] 11624.859944: sys_exit_read: 0x400 - </literallayout> - This gives us a detailed timestamped sequence of events that - occurred within the workload with respect to those events. - </para> + </literallayout> + This gives us a detailed timestamped sequence of events that + occurred within the workload with respect to those events. + </para> - <para> - In many ways, profiling can be viewed as a subset of tracing - - theoretically, if you have a set of trace events that's sufficient - to capture all the important aspects of a workload, you can derive - any of the results or views that a profiling run can. 
- </para> + <para> + In many ways, profiling can be viewed as a subset of tracing - + theoretically, if you have a set of trace events that's sufficient + to capture all the important aspects of a workload, you can derive + any of the results or views that a profiling run can. + </para> - <para> - Another aspect of traditional profiling is that while powerful in - many ways, it's limited by the granularity of the underlying data. - Profiling tools offer various ways of sorting and presenting the - sample data, which make it much more useful and amenable to user - experimentation, but in the end it can't be used in an open-ended - way to extract data that just isn't present as a consequence of - the fact that conceptually, most of it has been thrown away. - </para> + <para> + Another aspect of traditional profiling is that while powerful in + many ways, it's limited by the granularity of the underlying data. + Profiling tools offer various ways of sorting and presenting the + sample data, which make it much more useful and amenable to user + experimentation, but in the end it can't be used in an open-ended + way to extract data that just isn't present as a consequence of + the fact that conceptually, most of it has been thrown away. + </para> - <para> - Full-blown detailed tracing data does however offer the opportunity - to manipulate and present the information collected during a - tracing run in an infinite variety of ways. - </para> + <para> + Full-blown detailed tracing data does however offer the opportunity + to manipulate and present the information collected during a + tracing run in an infinite variety of ways. + </para> - <para> - Another way to look at it is that there are only so many ways that - the 'primitive' counters can be used on their own to generate - interesting output; to get anything more complicated than simple - counts requires some amount of additional logic, which is typically - very specific to the problem at hand. 
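To make the idea concrete in code (an editor-added illustration, not part of perf itself): given a list of already-parsed trace events, both a 'perf stat'-style event tally and a derived metric such as the 'wakeup_latency' discussed in this section can be computed in a few lines of Python. The dict-based event representation and the field names ('pid', 'next_pid', 'ts') are assumptions of this sketch, chosen to mirror the sched tracepoint fields shown elsewhere in this chapter.

```python
# Sketch: deriving profiling-style summaries from raw trace events.
# Event representation is hypothetical (a list of dicts); the event
# and field names mirror the sched tracepoints discussed in the text.
from collections import defaultdict

def summarize(events):
    """Tally occurrences per event type - a 'perf stat'-style summary."""
    counts = defaultdict(int)
    for ev in events:
        counts[ev["name"]] += 1
    return dict(counts)

def wakeup_latency(events):
    """Derive a per-pid wakeup latency: the delta between a pid's
    sched_wakeup and the sched_switch that actually runs it."""
    pending = {}                     # pid -> timestamp of its wakeup
    latencies = defaultdict(list)
    for ev in events:
        if ev["name"] == "sched_wakeup":
            pending[ev["pid"]] = ev["ts"]
        elif ev["name"] == "sched_switch" and ev.get("next_pid") in pending:
            pid = ev["next_pid"]
            latencies[pid].append(ev["ts"] - pending.pop(pid))
    return latencies
```

Neither function exists as such in perf; the point is only that once the individual events are available, any such view can be computed after the fact.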
For example, if we wanted to - make use of a 'counter' that maps to the value of the time - difference between when a process was scheduled to run on a - processor and the time it actually ran, we wouldn't expect such - a counter to exist on its own, but we could derive one called say - 'wakeup_latency' and use it to extract a useful view of that metric - from trace data. Likewise, we really can't figure out from standard - profiling tools how much data every process on the system reads and - writes, along with how many of those reads and writes fail - completely. If we have sufficient trace data, however, we could - with the right tools easily extract and present that information, - but we'd need something other than pre-canned profiling tools to - do that. - </para> + <para> + Another way to look at it is that there are only so many ways that + the 'primitive' counters can be used on their own to generate + interesting output; to get anything more complicated than simple + counts requires some amount of additional logic, which is typically + very specific to the problem at hand. For example, if we wanted to + make use of a 'counter' that maps to the value of the time + difference between when a process was scheduled to run on a + processor and the time it actually ran, we wouldn't expect such + a counter to exist on its own, but we could derive one called say + 'wakeup_latency' and use it to extract a useful view of that metric + from trace data. Likewise, we really can't figure out from standard + profiling tools how much data every process on the system reads and + writes, along with how many of those reads and writes fail + completely. If we have sufficient trace data, however, we could + with the right tools easily extract and present that information, + but we'd need something other than pre-canned profiling tools to + do that. + </para> - <para> - Luckily, there is general-purpose way to handle such needs, - called 'programming languages'. 
Making programming languages
- easily available to apply to such problems given the specific
- format of data is called a 'programming language binding' for
- that data and language. Perf supports two programming language
- bindings, one for Python and one for Perl.
- </para>
+ <para>
+ Luckily, there is a general-purpose way to handle such needs,
+ called 'programming languages'. Making programming languages
+ easily available to apply to such problems given the specific
+ format of data is called a 'programming language binding' for
+ that data and language. Perf supports two programming language
+ bindings, one for Python and one for Perl.
+ </para>

- <note>
- Tying It Together: Language bindings for manipulating and
- aggregating trace data are of course not a new
- idea. One of the first projects to do this was IBM's DProbes
- dpcc compiler, an ANSI C compiler which targeted a low-level
- assembly language running on an in-kernel interpreter on the
- target system. This is exactly analagous to what Sun's DTrace
- did, except that DTrace invented its own language for the purpose.
- Systemtap, heavily inspired by DTrace, also created its own
- one-off language, but rather than running the product on an
- in-kernel interpreter, created an elaborate compiler-based
- machinery to translate its language into kernel modules written
- in C.
- </note>
+ <note>
+ Tying It Together: Language bindings for manipulating and
+ aggregating trace data are of course not a new
+ idea. One of the first projects to do this was IBM's DProbes
+ dpcc compiler, an ANSI C compiler which targeted a low-level
+ assembly language running on an in-kernel interpreter on the
+ target system. This is exactly analogous to what Sun's DTrace
+ did, except that DTrace invented its own language for the purpose.
+ Systemtap, heavily inspired by DTrace, also created its own + one-off language, but rather than running the product on an + in-kernel interpreter, created an elaborate compiler-based + machinery to translate its language into kernel modules written + in C. + </note> - <para> - Now that we have the trace data in perf.data, we can use - 'perf script -g' to generate a skeleton script with handlers - for the read/write entry/exit events we recorded: - <literallayout class='monospaced'> + <para> + Now that we have the trace data in perf.data, we can use + 'perf script -g' to generate a skeleton script with handlers + for the read/write entry/exit events we recorded: + <literallayout class='monospaced'> root@crownbay:~# perf script -g python generated Python script: perf-script.py - </literallayout> - The skeleton script simply creates a python function for each - event type in the perf.data file. The body of each function simply - prints the event name along with its parameters. For example: - <literallayout class='monospaced'> + </literallayout> + The skeleton script simply creates a python function for each + event type in the perf.data file. The body of each function simply + prints the event name along with its parameters. 
For example: + <literallayout class='monospaced'> def net__netif_rx(event_name, context, common_cpu, common_secs, common_nsecs, common_pid, common_comm, skbaddr, len, name): @@ -870,10 +870,10 @@ common_pid, common_comm) print "skbaddr=%u, len=%u, name=%s\n" % (skbaddr, len, name), - </literallayout> - We can run that script directly to print all of the events - contained in the perf.data file: - <literallayout class='monospaced'> + </literallayout> + We can run that script directly to print all of the events + contained in the perf.data file: + <literallayout class='monospaced'> root@crownbay:~# perf script -s perf-script.py in trace_begin @@ -899,39 +899,39 @@ syscalls__sys_exit_read 1 11624.859888770 1262 wget nr=3, ret=1024 syscalls__sys_enter_read 1 11624.859935140 1262 wget nr=3, fd=3, buf=3077701632, count=1024 syscalls__sys_exit_read 1 11624.859944032 1262 wget nr=3, ret=1024 - </literallayout> - That in itself isn't very useful; after all, we can accomplish - pretty much the same thing by simply running 'perf script' - without arguments in the same directory as the perf.data file. - </para> + </literallayout> + That in itself isn't very useful; after all, we can accomplish + pretty much the same thing by simply running 'perf script' + without arguments in the same directory as the perf.data file. + </para> - <para> - We can however replace the print statements in the generated - function bodies with whatever we want, and thereby make it - infinitely more useful. - </para> + <para> + We can however replace the print statements in the generated + function bodies with whatever we want, and thereby make it + infinitely more useful. + </para> - <para> - As a simple example, let's just replace the print statements in - the function bodies with a simple function that does nothing but - increment a per-event count. When the program is run against a - perf.data file, each time a particular event is encountered, - a tally is incremented for that event. 
For example:
- <literallayout class='monospaced'>
+ <para>
+ As a simple example, let's just replace the print statements in
+ the function bodies with a simple function that does nothing but
+ increment a per-event count. When the program is run against a
+ perf.data file, each time a particular event is encountered,
+ a tally is incremented for that event. For example:
+ <literallayout class='monospaced'>
def net__netif_rx(event_name, context, common_cpu,
common_secs, common_nsecs, common_pid, common_comm,
skbaddr, len, name):
inc_counts(event_name)
- </literallayout>
- Each event handler function in the generated code is modified
- to do this. For convenience, we define a common function called
- inc_counts() that each handler calls; inc_counts simply tallies
- a count for each event using the 'counts' hash, which is a
- specialized has function that does Perl-like autovivification, a
- capability that's extremely useful for kinds of multi-level
- aggregation commonly used in processing traces (see perf's
- documentation on the Python language binding for details):
- <literallayout class='monospaced'>
+ </literallayout>
+ Each event handler function in the generated code is modified
+ to do this. For convenience, we define a common function called
+ inc_counts() that each handler calls; inc_counts simply tallies
+ a count for each event using the 'counts' hash, which is a
+ specialized hash that does Perl-like autovivification, a
+ capability that's extremely useful for the kinds of multi-level
+ aggregation commonly used in processing traces (see perf's
+ documentation on the Python language binding for details):
+ <literallayout class='monospaced'>
counts = autodict()

def inc_counts(event_name):
@@ -939,18 +939,18 @@
counts[event_name] += 1
except TypeError:
counts[event_name] = 1
- </literallayout>
- Finally, at the end of the trace processing run, we want to
- print the result of all the per-event tallies.
For that, we - use the special 'trace_end()' function: - <literallayout class='monospaced'> + </literallayout> + Finally, at the end of the trace processing run, we want to + print the result of all the per-event tallies. For that, we + use the special 'trace_end()' function: + <literallayout class='monospaced'> def trace_end(): for event_name, count in counts.iteritems(): print "%-40s %10s\n" % (event_name, count) - </literallayout> - The end result is a summary of all the events recorded in the - trace: - <literallayout> + </literallayout> + The end result is a summary of all the events recorded in the + trace: + <literallayout> skb__skb_copy_datagram_iovec 13148 irq__softirq_entry 4796 irq__irq_handler_exit 3805 @@ -968,112 +968,112 @@ net__netif_rx 2 napi__napi_poll 1877 syscalls__sys_exit_write 8990 - </literallayout> - Note that this is pretty much exactly the same information we get - from 'perf stat', which goes a little way to support the idea - mentioned previously that given the right kind of trace data, - higher-level profiling-type summaries can be derived from it. - </para> + </literallayout> + Note that this is pretty much exactly the same information we get + from 'perf stat', which goes a little way to support the idea + mentioned previously that given the right kind of trace data, + higher-level profiling-type summaries can be derived from it. + </para> - <para> - Documentation on using the - <ulink url='http://linux.die.net/man/1/perf-script-python'>'perf script' python binding</ulink>. - </para> - </section> + <para> + Documentation on using the + <ulink url='http://linux.die.net/man/1/perf-script-python'>'perf script' python binding</ulink>. 
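For readers who want to see what the autodict used above actually does, here is an illustrative stand-in (perf ships its own implementation as part of the Python binding documented at the link above; this version is only a sketch of the autovivification behavior, not perf's code):

```python
# Stand-in sketch of the 'autodict' helper the perf Python binding
# provides. The real one ships with perf; this version only
# illustrates the Perl-style autovivification behavior described above.
class autodict(dict):
    def __missing__(self, key):
        # Accessing a missing key creates a nested autodict on the fly,
        # so multi-level tallies like d['irq']['softirq'] 'just work'.
        value = self[key] = autodict()
        return value

counts = autodict()

def inc_counts(event_name):
    # Mirrors the manual's inc_counts(): the first access to a new key
    # returns an empty autodict, the '+= 1' raises TypeError, and the
    # except clause initializes the tally to 1.
    try:
        counts[event_name] += 1
    except TypeError:
        counts[event_name] = 1
```

The try/except pattern in the manual's script exists precisely because a first access autovivifies an empty nested dict rather than a number.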
+ </para> + </section> - <section id='system-wide-tracing-and-profiling'> - <title>System-Wide Tracing and Profiling</title> + <section id='system-wide-tracing-and-profiling'> + <title>System-Wide Tracing and Profiling</title> - <para> - The examples so far have focused on tracing a particular program or - workload - in other words, every profiling run has specified the - program to profile in the command-line e.g. 'perf record wget ...'. - </para> + <para> + The examples so far have focused on tracing a particular program or + workload - in other words, every profiling run has specified the + program to profile in the command-line e.g. 'perf record wget ...'. + </para> - <para> - It's also possible, and more interesting in many cases, to run a - system-wide profile or trace while running the workload in a - separate shell. - </para> + <para> + It's also possible, and more interesting in many cases, to run a + system-wide profile or trace while running the workload in a + separate shell. + </para> - <para> - To do system-wide profiling or tracing, you typically use - the -a flag to 'perf record'. - </para> + <para> + To do system-wide profiling or tracing, you typically use + the -a flag to 'perf record'. 
+ </para> - <para> - To demonstrate this, open up one window and start the profile - using the -a flag (press Ctrl-C to stop tracing): - <literallayout class='monospaced'> + <para> + To demonstrate this, open up one window and start the profile + using the -a flag (press Ctrl-C to stop tracing): + <literallayout class='monospaced'> root@crownbay:~# perf record -g -a ^C[ perf record: Woken up 6 times to write data ] [ perf record: Captured and wrote 1.400 MB perf.data (~61172 samples) ] - </literallayout> - In another window, run the wget test: - <literallayout class='monospaced'> + </literallayout> + In another window, run the wget test: + <literallayout class='monospaced'> root@crownbay:~# wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink> Connecting to downloads.yoctoproject.org (140.211.169.59:80) linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA - </literallayout> - Here we see entries not only for our wget load, but for other - processes running on the system as well: - </para> + </literallayout> + Here we see entries not only for our wget load, but for other + processes running on the system as well: + </para> - <para> - <imagedata fileref="figures/perf-systemwide.png" width="6in" depth="7in" align="center" scalefit="1" /> - </para> + <para> + <imagedata fileref="figures/perf-systemwide.png" width="6in" depth="7in" align="center" scalefit="1" /> + </para> - <para> - In the snapshot above, we can see callchains that originate in - libc, and a callchain from Xorg that demonstrates that we're - using a proprietary X driver in userspace (notice the presence - of 'PVR' and some other unresolvable symbols in the expanded - Xorg callchain). 
- </para>
+ <para>
+ In the snapshot above, we can see callchains that originate in
+ libc, and a callchain from Xorg that demonstrates that we're
+ using a proprietary X driver in userspace (notice the presence
+ of 'PVR' and some other unresolvable symbols in the expanded
+ Xorg callchain).
+ </para>

- <para>
- Note also that we have both kernel and userspace entries in the
- above snapshot. We can also tell perf to focus on userspace but
- providing a modifier, in this case 'u', to the 'cycles' hardware
- counter when we record a profile:
- <literallayout class='monospaced'>
+ <para>
+ Note also that we have both kernel and userspace entries in the
+ above snapshot. We can also tell perf to focus on userspace by
+ providing a modifier, in this case 'u', to the 'cycles' hardware
+ counter when we record a profile:
+ <literallayout class='monospaced'>
root@crownbay:~# perf record -g -a -e cycles:u
^C[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.376 MB perf.data (~16443 samples) ]
- </literallayout>
- </para>
+ </literallayout>
+ </para>

- <para>
- <imagedata fileref="figures/perf-report-cycles-u.png" width="6in" depth="7in" align="center" scalefit="1" />
- </para>
+ <para>
+ <imagedata fileref="figures/perf-report-cycles-u.png" width="6in" depth="7in" align="center" scalefit="1" />
+ </para>

- <para>
- Notice in the screenshot above, we see only userspace entries ([.])
- </para>
+ <para>
+ Notice that in the screenshot above, we see only userspace entries ([.]).
+ </para>

- <para>
- Finally, we can press 'enter' on a leaf node and select the 'Zoom
- into DSO' menu item to show only entries associated with a
- specific DSO. In the screenshot below, we've zoomed into the
- 'libc' DSO which shows all the entries associated with the
- libc-xxx.so DSO.
- </para>
+ <para>
+ Finally, we can press 'enter' on a leaf node and select the 'Zoom
+ into DSO' menu item to show only entries associated with a
+ specific DSO.
In the screenshot below, we've zoomed into the + 'libc' DSO which shows all the entries associated with the + libc-xxx.so DSO. + </para> - <para> - <imagedata fileref="figures/perf-systemwide-libc.png" width="6in" depth="7in" align="center" scalefit="1" /> - </para> + <para> + <imagedata fileref="figures/perf-systemwide-libc.png" width="6in" depth="7in" align="center" scalefit="1" /> + </para> - <para> - We can also use the system-wide -a switch to do system-wide - tracing. Here we'll trace a couple of scheduler events: - <literallayout class='monospaced'> + <para> + We can also use the system-wide -a switch to do system-wide + tracing. Here we'll trace a couple of scheduler events: + <literallayout class='monospaced'> root@crownbay:~# perf record -a -e sched:sched_switch -e sched:sched_wakeup ^C[ perf record: Woken up 38 times to write data ] [ perf record: Captured and wrote 9.780 MB perf.data (~427299 samples) ] - </literallayout> - We can look at the raw output using 'perf script' with no - arguments: - <literallayout class='monospaced'> + </literallayout> + We can look at the raw output using 'perf script' with no + arguments: + <literallayout class='monospaced'> root@crownbay:~# perf script perf 1383 [001] 6171.460045: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 @@ -1086,21 +1086,21 @@ perf 1383 [001] 6171.470058: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120 kworker/1:1 21 [001] 6171.470082: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120 perf 1383 [001] 6171.480035: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 - </literallayout> - </para> - - <section id='perf-filtering'> - <title>Filtering</title> + </literallayout> + </para> - <para> - Notice that there are a lot of events that don't really have - anything to do with what we're interested in, namely 
events - that schedule 'perf' itself in and out or that wake perf up. - We can get rid of those by using the '--filter' option - - for each event we specify using -e, we can add a --filter - after that to filter out trace events that contain fields - with specific values: - <literallayout class='monospaced'> + <section id='perf-filtering'> + <title>Filtering</title> + + <para> + Notice that there are a lot of events that don't really have + anything to do with what we're interested in, namely events + that schedule 'perf' itself in and out or that wake perf up. + We can get rid of those by using the '--filter' option - + for each event we specify using -e, we can add a --filter + after that to filter out trace events that contain fields + with specific values: + <literallayout class='monospaced'> root@crownbay:~# perf record -a -e sched:sched_switch --filter 'next_comm != perf && prev_comm != perf' -e sched:sched_wakeup --filter 'comm != perf' ^C[ perf record: Woken up 38 times to write data ] [ perf record: Captured and wrote 9.688 MB perf.data (~423279 samples) ] @@ -1122,59 +1122,59 @@ swapper 0 [000] 7932.326109: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000 swapper 0 [000] 7932.326171: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 kworker/0:3 1209 [000] 7932.326214: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 - </literallayout> - In this case, we've filtered out all events that have 'perf' - in their 'comm' or 'comm_prev' or 'comm_next' fields. Notice - that there are still events recorded for perf, but notice - that those events don't have values of 'perf' for the filtered - fields. To completely filter out anything from perf will - require a bit more work, but for the purpose of demonstrating - how to use filters, it's close enough. 
- </para>
-
- <note>
- Tying It Together: These are exactly the same set of event
- filters defined by the trace event subsystem. See the
- ftrace/tracecmd/kernelshark section for more discussion about
- these event filters.
- </note>
-
- <note>
- Tying It Together: These event filters are implemented by a
- special-purpose pseudo-interpreter in the kernel and are an
- integral and indispensable part of the perf design as it
- relates to tracing. kernel-based event filters provide a
- mechanism to precisely throttle the event stream that appears
- in user space, where it makes sense to provide bindings to real
- programming languages for postprocessing the event stream.
- This architecture allows for the intelligent and flexible
- partitioning of processing between the kernel and user space.
- Contrast this with other tools such as SystemTap, which does
- all of its processing in the kernel and as such requires a
- special project-defined language in order to accommodate that
- design, or LTTng, where everything is sent to userspace and
- as such requires a super-efficient kernel-to-userspace
- transport mechanism in order to function properly. While
- perf certainly can benefit from for instance advances in
- the design of the transport, it doesn't fundamentally depend
- on them. Basically, if you find that your perf tracing
- application is causing buffer I/O overruns, it probably
- means that you aren't taking enough advantage of the
- kernel filtering engine.
- </note>
+ </literallayout>
+ In this case, we've filtered out all events that have 'perf'
+ in their 'comm', 'prev_comm' or 'next_comm' fields. Notice
+ that there are still events recorded for perf, but those
+ events don't have values of 'perf' for the filtered
+ fields. To completely filter out anything from perf will
+ require a bit more work, but for the purpose of demonstrating
+ how to use filters, it's close enough.
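As an aside (an editor-added sketch, not part of the manual's example), the predicate expressed by the kernel-side filter above can equally be applied during post-processing to already-parsed events. The dict-based event representation is an assumption of the sketch; the field names mirror the sched_switch/sched_wakeup output shown earlier.

```python
# Sketch: the predicate of the kernel-side filter
# 'next_comm != perf && prev_comm != perf' (plus 'comm != perf' for
# sched_wakeup) applied in userspace post-processing. Events are
# hypothetical dicts whose keys mirror the trace fields shown above.
def involves_perf(event):
    # True if any of the comm fields names the 'perf' process itself.
    return "perf" in (event.get("comm"),
                      event.get("prev_comm"),
                      event.get("next_comm"))

def drop_perf_events(events):
    # Keep only events where none of the comm fields is 'perf'.
    return [ev for ev in events if not involves_perf(ev)]
```

The difference in practice is where the work happens: the kernel-side filter discards events before they ever reach the buffer, while a post-processing filter like this one pays the full cost of collecting them first.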
+ </para> + + <note> + Tying It Together: These are exactly the same set of event + filters defined by the trace event subsystem. See the + ftrace/tracecmd/kernelshark section for more discussion about + these event filters. + </note> + + <note> + Tying It Together: These event filters are implemented by a + special-purpose pseudo-interpreter in the kernel and are an + integral and indispensable part of the perf design as it + relates to tracing. kernel-based event filters provide a + mechanism to precisely throttle the event stream that appears + in user space, where it makes sense to provide bindings to real + programming languages for postprocessing the event stream. + This architecture allows for the intelligent and flexible + partitioning of processing between the kernel and user space. + Contrast this with other tools such as SystemTap, which does + all of its processing in the kernel and as such requires a + special project-defined language in order to accommodate that + design, or LTTng, where everything is sent to userspace and + as such requires a super-efficient kernel-to-userspace + transport mechanism in order to function properly. While + perf certainly can benefit from for instance advances in + the design of the transport, it doesn't fundamentally depend + on them. Basically, if you find that your perf tracing + application is causing buffer I/O overruns, it probably + means that you aren't taking enough advantage of the + kernel filtering engine. + </note> + </section> </section> - </section> - <section id='using-dynamic-tracepoints'> - <title>Using Dynamic Tracepoints</title> + <section id='using-dynamic-tracepoints'> + <title>Using Dynamic Tracepoints</title> - <para> - perf isn't restricted to the fixed set of static tracepoints - listed by 'perf list'. Users can also add their own 'dynamic' - tracepoints anywhere in the kernel. For instance, suppose we - want to define our own tracepoint on do_fork(). 
We can do that
+ using the 'perf probe' subcommand:
+ <literallayout class='monospaced'>
root@crownbay:~# perf probe do_fork
Added new event:
probe:do_fork (on do_fork)
@@ -1182,13 +1182,13 @@
You can now use it in all perf tools, such as:

perf record -e probe:do_fork -aR sleep 1
- </literallayout>
- Adding a new tracepoint via 'perf probe' results in an event
- with all the expected files and format in
- /sys/kernel/debug/tracing/events, just the same as for static
- tracepoints (as discussed in more detail in the trace events
- subsystem section:
- <literallayout class='monospaced'>
+ </literallayout>
+ Adding a new tracepoint via 'perf probe' results in an event
+ with all the expected files and format in
+ /sys/kernel/debug/tracing/events, just the same as for static
+ tracepoints (as discussed in more detail in the trace events
+ subsystem section):
+ <literallayout class='monospaced'>
root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# ls -al
drwxr-xr-x 2 root root 0 Oct 28 11:42 .
drwxr-xr-x 3 root root 0 Oct 28 11:42 ..
@@ -1210,23 +1210,23 @@ field:unsigned long __probe_ip; offset:12; size:4; signed:0; print fmt: "(%lx)", REC->__probe_ip - </literallayout> - We can list all dynamic tracepoints currently in existence: - <literallayout class='monospaced'> + </literallayout> + We can list all dynamic tracepoints currently in existence: + <literallayout class='monospaced'> root@crownbay:~# perf probe -l probe:do_fork (on do_fork) probe:schedule (on schedule) - </literallayout> - Let's record system-wide ('sleep 30' is a trick for recording - system-wide but basically do nothing and then wake up after - 30 seconds): - <literallayout class='monospaced'> + </literallayout> + Let's record system-wide ('sleep 30' is a trick for recording + system-wide but basically do nothing and then wake up after + 30 seconds): + <literallayout class='monospaced'> root@crownbay:~# perf record -g -a -e probe:do_fork sleep 30 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.087 MB perf.data (~3812 samples) ] - </literallayout> - Using 'perf script' we can see each do_fork event that fired: - <literallayout class='monospaced'> + </literallayout> + Using 'perf script' we can see each do_fork event that fired: + <literallayout class='monospaced'> root@crownbay:~# perf script # ======== @@ -1265,27 +1265,28 @@ matchbox-deskto 1197 [001] 34237.112138: do_fork: (c1028460) matchbox-deskto 1311 [001] 34237.114106: do_fork: (c1028460) gaku 1312 [000] 34237.202388: do_fork: (c1028460) - </literallayout> - And using 'perf report' on the same file, we can see the - callgraphs from starting a few programs during those 30 seconds: - </para> + </literallayout> + And using 'perf report' on the same file, we can see the + callgraphs from starting a few programs during those 30 seconds: + </para> - <para> - <imagedata fileref="figures/perf-probe-do_fork-profile.png" width="6in" depth="7in" align="center" scalefit="1" /> - </para> + <para> + <imagedata 
fileref="figures/perf-probe-do_fork-profile.png" width="6in" depth="7in" align="center" scalefit="1" />
+ </para>

- <note>
- Tying It Together: The trace events subsystem accomodate static
- and dynamic tracepoints in exactly the same way - there's no
- difference as far as the infrastructure is concerned. See the
- ftrace section for more details on the trace event subsystem.
- </note>
-
- <note>
- Tying It Together: Dynamic tracepoints are implemented under the
- covers by kprobes and uprobes. kprobes and uprobes are also used
- by and in fact are the main focus of SystemTap.
- </note>
+ <note>
+ Tying It Together: The trace events subsystem accommodates static
+ and dynamic tracepoints in exactly the same way - there's no
+ difference as far as the infrastructure is concerned. See the
+ ftrace section for more details on the trace event subsystem.
+ </note>
+
+ <note>
+ Tying It Together: Dynamic tracepoints are implemented under the
+ covers by kprobes and uprobes. kprobes and uprobes are also used
+ by, and are in fact the main focus of, SystemTap.
+ </note>
+ </section>
 </section>

 <section id='perf-documentation'>
|