
flamegraph's Introduction

Flame Graphs visualize profiled code

Main Website: http://www.brendangregg.com/flamegraphs.html

[Example flame graph image (click to zoom)]

Click a box to zoom the Flame Graph to this stack frame only. To search and highlight all stack frames matching a regular expression, click the search button in the upper right corner or press Ctrl-F. By default, search is case sensitive, but this can be toggled by pressing Ctrl-I or by clicking the ic button in the upper right corner.


Flame graphs can be created in three steps:

  1. Capture stacks
  2. Fold stacks
  3. flamegraph.pl

1. Capture stacks

Stack samples can be captured using Linux perf_events, FreeBSD pmcstat (hwpmc), DTrace, SystemTap, and many other profilers. See the stackcollapse-* converters.

Linux perf_events

Using Linux perf_events (aka "perf") to capture 60 seconds of 99 Hertz stack samples, both user- and kernel-level stacks, all processes:

# perf record -F 99 -a -g -- sleep 60
# perf script > out.perf

Now only capturing PID 181:

# perf record -F 99 -p 181 -g -- sleep 60
# perf script > out.perf

DTrace

Using DTrace to capture 60 seconds of kernel stacks at 997 Hertz:

# dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' -o out.kern_stacks

Using DTrace to capture 60 seconds of user-level stacks for PID 12345 at 97 Hertz:

# dtrace -x ustackframes=100 -n 'profile-97 /pid == 12345 && arg1/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks

60 seconds of user-level stacks, including time spent in-kernel, for PID 12345 at 97 Hertz:

# dtrace -x ustackframes=100 -n 'profile-97 /pid == 12345/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks

Switch ustack() for jstack() if the application has a ustack helper to include translated frames (e.g., node.js frames; see: http://dtrace.org/blogs/dap/2012/01/05/where-does-your-node-program-spend-its-time/). The rate for user-level stack collection is deliberately slower than for kernel stacks, which is especially important when using jstack(), as it performs additional work to translate frames.

2. Fold stacks

Use the stackcollapse programs to fold stack samples into single lines. The programs provided are:

  • stackcollapse.pl: for DTrace stacks
  • stackcollapse-perf.pl: for Linux perf_events "perf script" output
  • stackcollapse-pmc.pl: for FreeBSD pmcstat -G stacks
  • stackcollapse-stap.pl: for SystemTap stacks
  • stackcollapse-instruments.pl: for XCode Instruments
  • stackcollapse-vtune.pl: for Intel VTune profiles
  • stackcollapse-ljp.awk: for Lightweight Java Profiler
  • stackcollapse-jstack.pl: for Java jstack(1) output
  • stackcollapse-gdb.pl: for gdb(1) stacks
  • stackcollapse-go.pl: for Golang pprof stacks
  • stackcollapse-vsprof.pl: for Microsoft Visual Studio profiles
  • stackcollapse-wcp.pl: for wallClockProfiler output

Usage example:

For perf_events:
$ ./stackcollapse-perf.pl out.perf > out.folded

For DTrace:
$ ./stackcollapse.pl out.kern_stacks > out.kern_folded

The output looks like this:

unix`_sys_sysenter_post_swapgs 1401
unix`_sys_sysenter_post_swapgs;genunix`close 5
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf 85
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;c2audit`audit_closef 26
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;c2audit`audit_setf 5
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`audit_getstate 6
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`audit_unfalloc 2
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`closef 48
[...]
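The folding step itself is simple, and a few lines of Python show the idea. This is an illustrative sketch, not one of the provided stackcollapse programs: it assumes each sample has already been parsed into a list of frame names ordered root first (DTrace and perf print stacks leaf first, so a real converter reverses them), then merges identical stacks and counts them.

```python
from collections import Counter
from typing import Iterable, Sequence


def fold_stacks(samples: Iterable[Sequence[str]]) -> list[str]:
    """Fold stack samples into 'root;...;leaf count' lines.

    Each sample lists frame names from the root (outermost caller) to the
    leaf (the function that was on-CPU). Identical stacks are merged and
    counted; sorting puts stacks with a common prefix next to each other.
    """
    counts = Counter(";".join(frames) for frames in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


samples = [
    ["unix`_sys_sysenter_post_swapgs", "genunix`close"],
    ["unix`_sys_sysenter_post_swapgs", "genunix`close", "genunix`closeandsetf"],
    ["unix`_sys_sysenter_post_swapgs", "genunix`close"],
]
for line in fold_stacks(samples):
    print(line)
# unix`_sys_sysenter_post_swapgs;genunix`close 2
# unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf 1
```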

3. flamegraph.pl

Use flamegraph.pl to render an SVG.

$ ./flamegraph.pl out.kern_folded > kernel.svg

An advantage of having the folded input file (and why this step is separate from flamegraph.pl) is that you can use grep for functions of interest. For example:

$ grep cpuid out.kern_folded | ./flamegraph.pl > cpuid.svg

Provided Examples

Linux perf_events

An example output from Linux "perf script" is included, gzip'd, as example-perf-stacks.txt.gz. The resulting flame graph is example-perf.svg:


You can create this using:

$ gunzip -c example-perf-stacks.txt.gz | ./stackcollapse-perf.pl --all | ./flamegraph.pl --color=java --hash > example-perf.svg

This shows my typical workflow: I'll gzip profiles on the target, then copy them to my laptop for analysis. Since I have hundreds of profiles, I leave them gzip'd!

Since this profile included Java, I used the flamegraph.pl --color=java palette. I've also used stackcollapse-perf.pl --all, which includes all annotations that help flamegraph.pl use separate colors for kernel and user level code. The resulting flame graph uses: green == Java, yellow == C++, red == user-mode native, orange == kernel.

This profile was from an analysis of vert.x performance. The benchmark client, wrk, is also visible in the flame graph.
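The color key above can be approximated by a small classifier. This is a rough sketch for illustration, not flamegraph.pl's actual palette code: it assumes the "_[k]" (kernel) and "_[j]" (JIT) suffixes that stackcollapse-perf.pl --all adds, plus a guessed list of Java package prefixes for frames without annotations.

```python
import re

# Package prefixes treated as Java when a frame has no annotation
# (a guess for illustration, e.g. "Ljava/io/PrintStream;::print").
JAVA_PREFIX = re.compile(r"^L?(java|javax|jdk|sun|org|com|io|net)/")


def palette_class(frame: str) -> str:
    """Roughly map a folded frame name to the color key described above."""
    if frame.endswith("_[k]"):
        return "orange"  # kernel annotation added by stackcollapse-perf.pl --all
    if frame.endswith("_[j]") or JAVA_PREFIX.match(frame):
        return "green"   # Java: JIT annotation or Java-style class path
    if "::" in frame:
        return "yellow"  # C++: namespace/class separator
    return "red"         # other user-mode native code


for f in ["do_syscall_64_[k]", "Ljava/io/PrintStream;::print",
          "JavaCalls::call_helper", "start_thread"]:
    print(f, palette_class(f))
```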

DTrace

An example output from DTrace is also included, example-dtrace-stacks.txt, and the resulting flame graph, example-dtrace.svg:


You can generate this using:

$ ./stackcollapse.pl example-dtrace-stacks.txt | ./flamegraph.pl > example-dtrace.svg

This was from a particular performance investigation: the Flame Graph identified that CPU time was spent in the lofs module, and quantified that time.

Options

See the USAGE message (--help) for options:

USAGE: ./flamegraph.pl [options] infile > outfile.svg

--title TEXT     # change title text
--subtitle TEXT  # second level title (optional)
--width NUM      # width of image (default 1200)
--height NUM     # height of each frame (default 16)
--minwidth NUM   # omit smaller functions. In pixels or use "%" for 
                 # percentage of time (default 0.1 pixels)
--fonttype FONT  # font type (default "Verdana")
--fontsize NUM   # font size (default 12)
--countname TEXT # count type label (default "samples")
--nametype TEXT  # name type label (default "Function:")
--colors PALETTE # set color palette. choices are: hot (default), mem,
                 # io, wakeup, chain, java, js, perl, red, green, blue,
                 # aqua, yellow, purple, orange
--bgcolors COLOR # set background colors. gradient choices are yellow
                 # (default), blue, green, grey; flat colors use "#rrggbb"
--hash           # colors are keyed by function name hash
--cp             # use consistent palette (palette.map)
--reverse        # generate stack-reversed flame graph
--inverted       # icicle graph
--flamechart     # produce a flame chart (sort by time, do not merge stacks)
--negate         # switch differential hues (blue<->red)
--notes TEXT     # add notes comment in SVG (for debugging)
--help           # this message

eg,
./flamegraph.pl --title="Flame Graph: malloc()" trace.txt > graph.svg

As suggested in the example, flame graphs can process traces of any event, such as malloc()s, provided stack traces are gathered.
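The --minwidth option accepts either pixels or a percentage. A minimal sketch of how each form translates into a sample-count cutoff, assuming the frame area is the image width minus a hypothetical 10-pixel padding on each side (the real padding may differ):

```python
def min_samples_drawn(minwidth: str, total_samples: int,
                      image_width: int = 1200, xpad: int = 10) -> float:
    """Return the smallest sample count a frame needs to be drawn.

    minwidth is a --minwidth value: pixels ("0.1") or a percentage of
    all samples ("0.5%"). xpad, the padding on each side of the frame
    area, is an assumed value.
    """
    if minwidth.endswith("%"):
        return total_samples * float(minwidth[:-1]) / 100
    pixels_per_sample = (image_width - 2 * xpad) / total_samples
    return float(minwidth) / pixels_per_sample


print(min_samples_drawn("0.5", 1180))  # 0.5: here one sample is exactly 1 pixel wide
print(min_samples_drawn("1%", 5000))   # 50.0
```

Frames with fewer samples than the returned value would be omitted, which is why very large profiles can lose many small code paths at the default setting.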

Consistent Palette

If you use the --cp option, it will use the --colors selection and randomly generate the palette as usual. Any future flame graphs created using the --cp option will use the same palette map. Any new symbols from future flame graphs will have their colors randomly generated using the --colors selection.

If you don't like the palette, just delete the palette.map file.

This allows you to change your color scheme between flame graphs to make the differences REALLY stand out.
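Conceptually, a consistent palette is a persistent name-to-color map. A sketch under stated assumptions: the "name->rgb(r,g,b)" line format and the color ranges below are illustrative, not a specification of flamegraph.pl's palette.map.

```python
import random


def color_for(name: str, palette: dict[str, str], rng: random.Random) -> str:
    """Return name's color, creating and recording one if it is new.

    Names already in the palette keep their color across graphs; new
    names get a random warm color (ranges assumed, similar in spirit to
    the default "hot" palette).
    """
    if name not in palette:
        r = 205 + rng.randrange(50)
        g = rng.randrange(230)
        b = rng.randrange(55)
        palette[name] = f"rgb({r},{g},{b})"
    return palette[name]


def dump_palette(palette: dict[str, str]) -> str:
    """Serialize one 'name->rgb(r,g,b)' line per name (assumed format)."""
    return "".join(f"{name}->{palette[name]}\n" for name in sorted(palette))


def parse_palette(text: str) -> dict[str, str]:
    """Inverse of dump_palette; rpartition tolerates names like 'operator->'."""
    palette = {}
    for line in text.splitlines():
        name, sep, color = line.rpartition("->")
        if sep:
            palette[name] = color
    return palette


rng = random.Random()
working = {}
color_for("main", working, rng)
saved = dump_palette(working)   # e.g. written to palette.map
broken = parse_palette(saved)   # e.g. read back for the next graph
print(color_for("main", broken, rng) == working["main"])  # True
```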

Example:

Say we have 2 captures, one with a problem, and one when it was working (whatever "it" is):

cat working.folded | ./flamegraph.pl --cp > working.svg
# this generates a palette.map, as per the normal random generated look.

cat broken.folded | ./flamegraph.pl --cp --colors mem > broken.svg
# this svg will use the same palette.map for the same events, but a very
# different colorscheme for any new events.

Take a look at the demo directory for an example:

palette-example-working.svg
palette-example-broken.svg

flamegraph's People

Contributors

actinium, agentzh, awreece, brendangregg, cpaxton, danielcompton, ekacnet, emaste, fpavageau, hassec, iyangsj, jan-konczak-cs-put, knittl, mattdudys, matthew-olson-intel, milianw, mjguzik, mlauter, nitsanw, psanford, randomstuff, rhysh, saruspete, simoneismann, skarcha, timbunce, tjake, toddlipcon, unpluggedcoder, versable


flamegraph's Issues

Do we need VBox debuginfo for the package?

Hi Brendan,

I am running your script on CentOS 6 and found the following issues.

Do I need VBox debuginfo? I couldn't find an el6 version.

Thanks,

Y.H. Chang

[root@localhost FlameGraph]# perf script | ./stackcollapse-perf.pl > out.perf-folded
no symbols found in /opt/VBoxGuestAdditions-4.2.4/sbin/VBoxService, maybe install a debug package?
Failed to open /tmp/perf-13664.map, continuing without symbols
Failed to open /tmp/perf-13644.map, continuing without symbols
No kallsyms or vmlinux with build-id 109273b95e150142afb31a6b1ecd0053a6147f79 was found
[vboxguest] with build id 109273b95e150142afb31a6b1ecd0053a6147f79 not found, continuing without symbols
Failed to open /tmp/perf-13356.map, continuing without symbols

[root@localhost FlameGraph]#

-rw-------. 1 root root 10080492 Apr 16 14:03 perf.data

Flamegraph error when passing it what seems like a valid folded file

I seem to have a valid out-perf.folded file but the flamegraph.pl program complains when run on this file:
Ignored 25155 lines with invalid format
ERROR: No stack counts found

The way I made the file was not the usual way. I run on an ARM target that cannot run perl. So what I do is store the contents of perf script to a file (say perf.file) and then copy said file to an x86 machine (where I can run perl).

So something like this:
#On ARM machine
perf record -F 99 -p my_pid -- sleep 20
perf script > perf_script_output_file

#On x86 machine
#Get hold of the perf_script_output_file from the ARM machine.
cat perf_script_output_file >  FlameGraph/stackcollapse-perf.pl > out.perf-folded
# This runs fine without errors and when I view the out.perf-folded file I see all my stack traces.

FlameGraph/flamegraph.pl out.perf-folded > perf-kernel.svg
Ignored 25155 lines with invalid format
ERROR: No stack counts found

What should I be looking for to figure out why this is happening?
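Two things may be worth checking. In the command above, "cat perf_script_output_file > FlameGraph/stackcollapse-perf.pl > out.perf-folded" uses ">" where a pipe was presumably intended. The shell then truncates stackcollapse-perf.pl to an empty file and writes the raw perf script text into out.perf-folded. That would explain why the step ran without errors, why the file still shows the raw stack traces, and why flamegraph.pl rejects its lines. After restoring the script, the usual form is:

$ ./FlameGraph/stackcollapse-perf.pl perf_script_output_file > out.perf-folded

To check whether a file looks folded at all, here is a small Python sketch. Its pattern (stack text, a space, an integer count) is an approximation, not flamegraph.pl's own regular expression.

```python
import re

# Approximate shape of a folded line: stack text, a space, an integer count.
FOLDED_LINE = re.compile(r"^(.+) (\d+)$")


def check_folded(lines):
    """Count lines that do and do not look like folded stacks."""
    valid = invalid = 0
    for line in lines:
        if FOLDED_LINE.match(line.rstrip("\n")):
            valid += 1
        else:
            invalid += 1
    return valid, invalid


print(check_folded(["main;foo 5\n", "java 19983 cycles:\n"]))  # (1, 1)
```

On a real file, pass the open file object, e.g. "with open('out.perf-folded') as f: print(check_folded(f))".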

Features to aid embedding flame graphs in other tools

Hello Brendan.

I'm working on integrating flame graphs into Perl's Devel::NYTProf profiler.
I'd like to suggest a few features that would help me with that integration.

  • Add option(s) to make the svg areas be links. Something like --linkfmt "../foo/%s/src" where the %s gets replaced with the function name. So when
  • Options to change the wordings: the title, "Function", "sample" etc. For my use "Function" would be "Subroutine" and "samples" would be "microseconds".
  • Add thousands separators to the count, so 18459905 would be shown as 18,459,905.

If you like the ideas I'll develop them and send you pull requests.
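The separator request is a one-liner in most languages. For example, in Python (shown only as an illustration, since flamegraph.pl itself is Perl):

```python
def format_count(n: int) -> str:
    """Format a count with comma thousands separators."""
    return f"{n:,}"


print(format_count(18459905))  # 18,459,905
```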

Loved it - but needed it to work with pstack output

Hi Brendan,

I used your flamegraph scripts and found them great! However my stack samples were in a slightly different format than those supported, as they were produced by repeatedly calling pstack at intervals on a process running a gcc-compiled executable coded in C++.
I have the modified version of the script which I am willing to contribute, as well as examples of the stack format if you're interested. Please just let me know.

Regards,

Laurent.
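For reference, folding pstack-style output could look like the following sketch. The frame-line format ("#N  0x... in func (args) ...") is assumed from gdb-based pstack, and snapshots are assumed to be separated by "Thread" headers or blank lines; function names that themselves contain " (" would be cut short.

```python
import re
from collections import Counter

# gdb/pstack frame line, e.g. "#1  0x0000000000400536 in work (n=3) at t.c:7"
# or "#0  main () at t.c:12". The name is taken up to the first " (".
FRAME = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(.+?) \(")


def fold_pstack(text: str) -> Counter:
    """Fold concatenated pstack snapshots: one stack per thread block."""
    folded = Counter()
    frames = []

    def flush():
        if frames:
            # pstack lists the innermost frame (#0) first; reverse to root-first.
            folded[";".join(reversed(frames))] += 1
            frames.clear()

    for line in text.splitlines():
        m = FRAME.match(line.strip())
        if m:
            frames.append(m.group(1))
        else:
            flush()  # "Thread ..." headers and blank lines end a stack
    flush()
    return folded
```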

Non-default height parameter causes overlap with "Reset Zoom" and page title

Hi,

When creating a flame graph with --height=32, the resulting SVG does contain a flame graph with taller rows, but the "Reset Zoom" button and the page title "Flame Graph" at the top are now partially hidden underneath the flame graph itself.

Thanks for the great tool and for looking into this issue.

test suite

Need a test suite to ensure fixes for one profile output (eg, golang) don't break another.

Should use --hash to ensure repeatable output. No random numbers.
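A deterministic, name-keyed color is what makes byte-for-byte output comparisons possible. A sketch of the idea, using CRC-32 as a stand-in hash (not the hash that flamegraph.pl --hash uses). Python's built-in hash() is avoided because string hashing is randomized per process by default.

```python
import zlib


def hash_color(name: str) -> str:
    """Derive a warm rgb() color from the function name alone.

    The same name always yields the same color, across runs and machines,
    so rendering the same profile twice produces identical SVG text.
    """
    h = zlib.crc32(name.encode("utf-8"))
    r = 205 + (h & 0xFF) % 50
    g = ((h >> 8) & 0xFF) % 230
    b = ((h >> 16) & 0xFF) % 55
    return f"rgb({r},{g},{b})"


print(hash_color("main") == hash_color("main"))  # True
```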

Re-licencing with MIT

We are working on some stap utilities for nodejs and go and would like to bundle the perl scripts. This would allow us to generate an SVG file from a PID in one step and would greatly help debugging CPU issues.

Would it be possible to re-licence these scripts as MIT ?

Reduce memory usage

I'm trying to use FG for the first time, and it gets OOM-killed...
I've got 4GB ram + 2GB swap and this is apparently not sufficient :-/

$ dmesg
[...]
[25783.912839] Out of memory: Kill process 26662 (stackcollapse-p) score 788 or sacrifice child
[25783.912844] Killed process 26072 (stackcollapse-p) total-vm:4768696kB, anon-rss:3372340kB, file-rss:0kB
$ free
              total        used        free      shared  buff/cache   available
Mem:        4035936      634612     3239928       39348      161396     3312444
Swap:       2000088      672080     1328008
$ ls -lh perf.data
-rw------- 1 vince vince 107M Jun 27 13:27 perf.data

Is that really too much perf data to process or is there something that leaks memory in FG ?
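If folding happens while reading, memory use depends on the number of distinct stacks rather than on the size of the input. A minimal Python sketch of that streaming approach for perf script text; it is not a drop-in replacement for stackcollapse-perf.pl and assumes the layout shown elsewhere on this page (a header line, indented "address symbol (dso)" frame lines listed leaf first, and a blank line after each sample). Process names containing spaces are not handled.

```python
import re
from collections import Counter
from typing import Iterable

# "    ffffffff810631d4 native_write_msr+0x4 ([kernel.kallsyms])"
FRAME_LINE = re.compile(r"^\s+[0-9a-fA-F]+ (.+) \((.*)\)$")


def fold_perf_script(lines: Iterable[str]) -> Counter:
    """Fold `perf script` output while reading it line by line.

    Only the current sample's frames and one counter entry per distinct
    stack are kept in memory.
    """
    folded: Counter = Counter()
    comm = None
    frames: list[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if comm is not None:
                # perf prints the leaf first; this sketch puts the process name at the root.
                folded[";".join([comm] + frames[::-1])] += 1
            comm, frames = None, []
        elif line[0].isspace():
            m = FRAME_LINE.match(line)
            if m:
                frames.append(m.group(1))
        else:
            comm = line.split()[0]  # header line: process name comes first
    if comm is not None:  # input that does not end with a blank line
        folded[";".join([comm] + frames[::-1])] += 1
    return folded
```

Passing an open file (for example "with open('out.perf') as f: fold_perf_script(f)") reads it incrementally instead of loading it whole.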

[unknown] functions

I frequently see entries like this in perf script output:

httpd 18294 [008] 205996.782023:   10101010 cpu-clock:
                  2b20e5 [unknown] (/opt/remi/php56/root/usr/lib64/httpd/modules/libphp5.so)
            2b5e1de3f698 [unknown] ([unknown])
            2b5e00000013 [unknown] ([unknown])
        4c20ec834853fd89 [unknown] ([unknown])
mysqld 18314 [003] 205996.822418:   10101010 cpu-clock:
                  39a7c4 prev_record_reads(st_position*, unsigned int, unsigned long long) (/usr/libexec/mysqld)
                  39c65c best_access_path(JOIN*, st_join_table*, unsigned long long, unsigned int, bool, double, st_position*, st_position*) (/usr/libexec/mysqld)
                  39e997 [unknown] (/usr/libexec/mysqld)
                  39ed0f [unknown] (/usr/libexec/mysqld)
                  39ed0f [unknown] (/usr/libexec/mysqld)
                  39ed0f [unknown] (/usr/libexec/mysqld)
...

The flame graph is then full of [unknown] functions, even though some of the unknown frames have a binary associated with them. Could you modify stackcollapse-perf.pl so that, when it sees [unknown] but the binary is known, it shows the binary name instead, for example as [/path/bin]? Or add an option to do such replacements.
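The requested replacement could look like this sketch. The rule (the binary's file name in brackets in place of "[unknown]") is hypothetical; the full path could be used instead, as suggested.

```python
import os
import re
from typing import Optional

# perf script frame line: "<hex address> <symbol> (<dso>)". The greedy (.+)
# splits at the last " (", so parentheses inside C++ symbols are kept.
FRAME_LINE = re.compile(r"^\s*([0-9a-fA-F]+) (.+) \((.*)\)$")


def frame_name(line: str) -> Optional[str]:
    """Return a display name for one frame line, or None if it doesn't parse.

    Hypothetical rule: if the symbol is "[unknown]" but the binary is
    known, show the binary's file name in brackets instead.
    """
    m = FRAME_LINE.match(line)
    if not m:
        return None
    _addr, sym, dso = m.groups()
    if sym == "[unknown]" and dso and dso != "[unknown]":
        return f"[{os.path.basename(dso)}]"
    return sym


print(frame_name("  2b20e5 [unknown] (/opt/remi/php56/root/usr/lib64/httpd/modules/libphp5.so)"))
# [libphp5.so]
```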

Failed to open [vsyscall], continuing without symbols

Hi all,
I used perf record to record a CPU profile of my Node.js app (with flag --perf_basic_prof_only_functions, node 5.1.0).
perf works fine, but when I try to fold stacks with the script
perf script | ./stackcollapse-perf.pl > out.perf-folded

it generates this error:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_CTYPE = "UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Failed to open [vsyscall], continuing without symbols

What's wrong? Please help me fix this error.

PS: despite the above error, I generated the flame graph SVG, but it does not show source code detail with CPU clock time.


In the perf script output, I see unknown symbols:

node 29327 22196701.429206:   10101010 cpu-clock:
             cb0e7d935e7 [unknown] (/tmp/perf-29327.map)

Still Unknown Functions!

Hi, I use the '--perf-basic-prof' option while profiling Node.js, and I also get a file named '/tmp/perf-PID.map' in the /tmp directory, but when I generate a flame graph using 'perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > MAIN.svg', many 'unknown' labels remain in my graph. How could that be?
Thank you!

The perf script output was attached as a screenshot.

FlameGraph has poor usability and performance with large multiprocess application.

Hello, thank you very much for FlameGraph. It works very well. However, it is a bit awkward to use with multiprocess applications. That isn't too bad, though. What is bad is its performance in Firefox 34.0. Maybe this is a Firefox bug, or maybe you usually develop on another browser such as Google Chrome. As a side note, Emacs also performs poorly with it, but I wouldn't expect great performance from Emacs anyway. gThumb has performance trouble with it as well. My application is at https://gitorious.org/linted/linted/source/c5e68486a16c58959d9f7d777016e62746d0b25d: and a gist of the SVG is at https://gist.github.com/sstewartgallus/c1b90061c81eb5a1fb5b

Include scheduling events (or syscall usually invoking the scheduler)

After watching your recent (fantastic) presentation at SCaLE13, I decided to try out a few of the things you showcased, especially the flame graph. Proudly, I showed my colleagues a nice way to lay out where a binary or system is spending its time.

One of my colleagues threw me a curve-ball though. He said, trace me this program:

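#include <unistd.h>

/* Busy loop: pure on-CPU work. Unsigned arithmetic wraps on overflow
   instead of invoking undefined behavior. */
unsigned compute(unsigned start) {
  for (unsigned i = 0; i < 10000000; ++i) {
     start += i;
  }
  return start;
}

int main(int argc, char **argv) {
  (void)argv;  /* unused; only the argument count matters */
  unsigned ret = 0;
  for (int i = 0; i != 5; ++i) {
     sleep(argc == 1 ? 0 : 1);  /* off-CPU: 1 s per iteration when given an argument */
     ret += compute((unsigned)argc);
  }
  return (int)(ret & 0xff);  /* exit statuses are 8 bits wide */
}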

So I traced it with:

sudo perf record -F 500 -g --call-graph dwarf -- ./test 1 && sudo perf script | ~/github/FlameGraph/stackcollapse-perf.pl | ~/github/FlameGraph/flamegraph.pl > test.svg

Naturally, I found that the nanosleep system calls were not shown in the flame graph, even though this is where the binary spends most of its time. As you illustrate in this article about speeding up vim, sometimes nanosleep is indeed the culprit. It's not really obvious from the flame graph (I don't even see that it calls do_nanosleep, let alone that it spends a lot of time in it). I assumed this is because the scheduler is simply running some other binary instead, so perf can't see it. That is fine for whole-system profiling, but worse when looking at one binary (or a group of binaries) in isolation, where "wait" time matters, especially when it is introduced by locking or sleeping.

I looked around and found some things to try on https://perf.wiki.kernel.org/index.php/Tutorial, namely:

$ ./perf record -e sched:sched_stat_sleep -e sched:sched_switch  -e sched:sched_process_exit -g -o ~/perf.data.raw ~/foo
$ ./perf inject -v -s -i ~/perf.data.raw -o ~/perf.data
$ ./perf report --stdio --show-total-period -i ~/perf.data

This is nice, and indeed shows the sleep times. I can't convert this into a flamegraph though (I guess I was foolish to expect this to work, event tracing must be wholly different from sampling).

Is a flamegraph the wrong tool to illustrate this with? I actually believe it could be useful.

I don't expect you to solve this issue for me, but if you could point me in the right direction, perhaps I could whip up the tool and show my colleague that I can succinctly illustrate where the program is spending its time (showing both the actual on-CPU stack traces and "other" areas like sleeping/locking/... in one visual representation).

Once again, thanks for the fantastic tools.
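Since the count column of a folded file is just a number, it can carry time instead of samples. A sketch, assuming the sleep events have already been extracted as (stack, duration in microseconds) pairs; that extraction from perf's sched events is not shown and is the harder part. The result could then be rendered with flamegraph.pl --countname=us.

```python
from collections import defaultdict


def fold_weighted(events):
    """Fold (frames, duration_us) records into 'a;b;c total_us' lines.

    frames are ordered root first. The count column holds total blocked
    time in microseconds rather than a number of samples.
    """
    totals = defaultdict(int)
    for frames, duration_us in events:
        totals[";".join(frames)] += duration_us
    return [f"{stack} {us}" for stack, us in sorted(totals.items())]


# Hypothetical records for the test program run with an argument:
events = [(["test", "main", "sleep", "nanosleep"], 1_000_000)] * 5
print(fold_weighted(events))  # ['test;main;sleep;nanosleep 5000000']
```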

perf overwhelms stack trace

First off -- many thanks, and especially for the work on getting combined Java/C++ stack traces. That's a big win for me, as I work a lot with C++ code hosted in a Java-based FIX engine.

The issue I'm running into is that the flame graph ends up with a single huge spike which resolves to native_write_msr_safe inside perf code -- and that completely throws off the scaling for the rest of the graph.

I tried looking for a way to exclude certain symbols from the graph, but didn't see anything. Any suggestions would be most welcome.
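Because the graph is drawn from the folded file, unwanted stacks can be removed before rendering, for example:

$ grep -v native_write_msr_safe out.folded | ./flamegraph.pl > out.svg

The same idea as a Python sketch that compares whole frame names rather than substrings (hypothetical helper; with stackcollapse-perf.pl annotations the kernel frame may appear as native_write_msr_safe_[k]):

```python
def exclude_frames(folded_lines, unwanted):
    """Keep only folded stacks that contain none of the `unwanted` frames.

    Like `grep -v` on the folded file, but compares whole frame names,
    so "foo" does not also remove "foo_bar".
    """
    unwanted = set(unwanted)
    kept = []
    for line in folded_lines:
        stack, _, _count = line.rstrip("\n").rpartition(" ")
        if unwanted.isdisjoint(stack.split(";")):
            kept.append(line)
    return kept
```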

The delta percentage numbers in the differential flame graphs look confusing.

Hi Brendan,

I'm very excited about the new differential flame graphs, especially the negated ones for assessing optimization results. Thank you for doing it! But I've found the delta percentage numbers rather confusing; they are not what I'd expect.

For example, if a specific tower in the "before" flame graph disappears completely in the "after" graph, I'd expect the function frames in the negated graph show deltas like "-100%" but I'm getting ridiculously small numbers like "-0.8%" due to the way this delta percentage is calculated (which IMHO does not make much sense).

Please consider the following minimal example:

main;foo 1 2
main;foo;bar 0 4

The resulting negated diff flame graph is shown here: http://agentzh.org/misc/flamegraph/tiny-negated-diff.svg

For these diff folded backtraces, the total reduction in foo()'s time should be (1-(4+2))/(4+2) = -0.83 (or -83%), but in the SVG, the "foo" frame shows -16.67%. On the other hand, the bar function should show (0-4)/4 = -1 (or -100%), but the graph actually shows -66.67% at the bar frame. These numbers are quite far from my intuition.

I can understand that the current calculation gives the delta percentage relative to the whole sample space, but for optimizations I only care about how radically a particular tower changes, even though its overall ratio is not that big. And even for the whole-sample-space approach, the current results still do not look right ;)

What's your opinion on this?
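The numbers in this report can be reproduced with a short calculation, reading the first column as "after" and the second as "before", as the arithmetic above does. Dividing each frame's own (self) sample change by the 6-sample total of the second column gives the -16.67% and -66.67% seen in the SVG; each frame's inclusive change relative to its own earlier count gives the expected -83.33% and -100%. This sketch only illustrates the two definitions; it is not flamegraph.pl's code.

```python
from collections import defaultdict


def diff_percentages(lines):
    """Return {frame path: (self change % of total, inclusive relative change %)}.

    Lines are 'a;b;c COUNT1 COUNT2'. As in the report's arithmetic, COUNT1
    is read as "after" and COUNT2 as "before".
    """
    self_counts = defaultdict(lambda: [0, 0])  # path -> [after, before]
    incl_counts = defaultdict(lambda: [0, 0])  # same, including descendants
    total_before = 0
    for line in lines:
        stack, after, before = line.rsplit(" ", 2)
        after, before = int(after), int(before)
        frames = tuple(stack.split(";"))
        total_before += before
        self_counts[frames][0] += after
        self_counts[frames][1] += before
        for depth in range(1, len(frames) + 1):
            incl_counts[frames[:depth]][0] += after
            incl_counts[frames[:depth]][1] += before

    result = {}
    for path, (after, before) in incl_counts.items():
        s_after, s_before = self_counts.get(path, (0, 0))
        self_pct = 100 * (s_after - s_before) / total_before
        rel_pct = 100 * (after - before) / before if before else float("nan")
        result[";".join(path)] = (round(self_pct, 2), round(rel_pct, 2))
    return result


for path, pcts in diff_percentages(["main;foo 1 2", "main;foo;bar 0 4"]).items():
    print(path, pcts)
# main (0.0, -83.33)
# main;foo (-16.67, -83.33)
# main;foo;bar (-66.67, -100.0)
```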

Does FlameGraph support Java 8 features such as lambda expressions?

If so, are there differences how to use the tool in combination with perf?

I suppose the answer mostly depends on JVM support for frame pointers with lambda expressions for perf. I can provide a small example showing that, when the tooling is used with Java 8 features, the stack elements on the graph are presented differently from how, for example, an anonymous inner class implementation of Runnable appears with Java 7.

Non-ascii characters are not treated right

This code
substr($text, -2, 2) = ".." if $chars < length $func;
breaks the handling of non-ASCII characters, such as German umlauts.

I just removed it; now it works, just without the "..".
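A likely explanation, assuming the names reach that code as undecoded UTF-8 bytes: substr then counts bytes, and overwriting the last two can split a multi-byte character such as "ö". The difference, sketched in Python:

```python
def truncate_label(name: str, max_chars: int) -> str:
    """Shorten a label to max_chars characters, ending in '..'.

    Works on decoded text, so multi-byte UTF-8 characters such as
    German umlauts are never split.
    """
    if len(name) <= max_chars:
        return name
    if max_chars < 3:
        return ""  # too narrow for any text plus the ".." marker
    return name[: max_chars - 2] + ".."


def truncate_bytes(name: str, max_bytes: int) -> str:
    """The failure mode: cut the UTF-8 bytes, then decode."""
    raw = name.encode("utf-8")[: max_bytes - 2] + b".."
    return raw.decode("utf-8", errors="replace")


print(truncate_label("Größenänderung", 6))  # Größ..
print(truncate_bytes("Größenänderung", 5))  # "Gr" + U+FFFD replacement char + ".."
```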

Find a way to share UI features between our libs instead of implementing them twice

I maintain a very similar flamegraph lib. Basically the perl scripts were ported from here to JavaScript to run entirely in the browser.
Someone asked me to implement a zoom feature since you have one, so I added one, except that mine pans, which is a bit different from how you did it.

That got me thinking that it'd be nice to collaborate on the UI layer more. It shouldn't matter that we generate the svg in different ways, as long as it was compatible we could have a UI that consumes that svg and adds features to it.
Just a thought in order to get more features by implementing them once instead of in two places.

Let me know what you think about that idea and if you agree we can think about the steps to get things compatible again (I add some data-elements mainly to do the zoom) and then figure out how to pull out the UI features and make them work with either lib.

no way to recover full function name when it is truncated because of length

I am running FlameGraph on a c++ project that makes heavy use of templates.
As a result, function names can be very, very long, and they are truncated with no way to recover the full name.

A solution could be showing the full name in the tooltip while leaving the truncated name in the graph itself.
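In SVG, a <title> child of an element is shown by browsers as a tooltip, so one approach is to keep the full name there and shorten only the visible text. A sketch with hypothetical layout parameters (it is not flamegraph.pl's rendering code):

```python
from xml.sax.saxutils import escape


def frame_svg(name: str, samples: int, pct: float, x: float, y: float,
              width: float, height: float, max_chars: int) -> str:
    """Render one frame: full name in <title>, shortened name as visible text."""
    if len(name) <= max_chars:
        label = name
    elif max_chars >= 3:
        label = name[: max_chars - 2] + ".."
    else:
        label = ""  # too narrow to show any text
    title = f"{name} ({samples:,} samples, {pct:.2f}%)"
    return (
        f"<g><title>{escape(title)}</title>"
        f'<rect x="{x}" y="{y}" width="{width}" height="{height}"/>'
        f'<text x="{x + 3}" y="{y + height - 4}">{escape(label)}</text></g>'
    )


print(frame_svg("std::vector<int>::push_back(int const&)", 1234, 12.5,
                10, 100, 50, 16, 10))
```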

Option to sort by number of samples

Are functions always sorted by name in the resulting graph?
If yes, could you add an option to sort them so that the most sampled are on the left side and the less sampled on the right?

Java coloring and new output format of perf-map-agent

Since this recent commit the class signature of entries is cleaned up to look like a proper class name. This breaks the Java coloring scheme which depends on slashes being found in the name. Can we come up with a more distinctive naming scheme which can be enabled on perf-map-agent and which is then recognized by the flame graph script? E.g. prefixing Java JIT entries with jit: or similar?

White/black differentials

Differential flame graphs are, I think, still not quite solved. My red/blue approach, http://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html, has a problem: code paths missing entirely in the second profile will be missing from the flame graph.

There's been various ideas of how to fix this (like flamegraphdiff). I thought of another, which aims to be intuitive and leverage as much as possible of the existing flame graph organization:

The idea is like a chess board: there are pieces on the board in play, and then pieces on the side that are out of play. This takes my red/blue differential flame graph, and has an extended region on the right for code paths out of play. Like the side of a chess board. Mock up:

[Mock-up image: zfs-flamegraph-diff2]

So in the included (white) area, red shows code paths that grew, and blue shows code paths that shrunk.

In the missing (black) area, only some blue code paths will be shown: those that were missing from the 2nd profile entirely.

This missing area may contain a lot, very little, or nothing (just the title "Missing", with nothing there), and would need to grow or shrink (down to the title size) as appropriate.

(Note: I tried a few things, and settled on the flat color backgrounds.)
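The split between the two areas can be computed directly from two folded profiles. A sketch that works at the level of whole folded stacks (an approximation: a stack whose prefix survives but whose leaf is gone still counts as missing):

```python
def parse_folded(lines):
    """Map each folded stack to its total count."""
    counts = {}
    for line in lines:
        stack, _, count = line.rstrip("\n").rpartition(" ")
        counts[stack] = counts.get(stack, 0) + int(count)
    return counts


def split_for_diff(before_lines, after_lines):
    """Split stacks into those present in the second profile (white area,
    with before/after counts) and those missing from it entirely (black area)."""
    before = parse_folded(before_lines)
    after = parse_folded(after_lines)
    in_play = {s: (before.get(s, 0), after[s]) for s in after}
    missing = {s: n for s, n in before.items() if s not in after}
    return in_play, missing
```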

Suggestion: merge normal and "--reverse" output into a single SVG

I'm not much of an SVG or JS expert so I'm probably not well set up to attempt this myself. It's just an idea I had so I thought I'd post it here.

One thing I love about flamegraph SVGs is the interactivity -- being able to zoom around makes dissecting even complicated traces very fast.

However, I often find myself generating both a normal and a --reverse flamegraph and alternating between them. Wouldn't it be great if there were just a JS button that switched between them immediately? Especially combined with the existing drill-down functionality: drill into a function that is causing a problem and immediately be able to look at its other call stacks.
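The data transformation behind a stack-reversed graph is a per-stack reversal of the folded input, so an in-SVG toggle would need both layouts computed from the same data. A sketch of the reversal (assumed to match --reverse in spirit, not its implementation):

```python
def reverse_folded(folded_lines):
    """Reverse each folded stack so the leaf becomes the root."""
    out = []
    for line in folded_lines:
        stack, _, count = line.rstrip("\n").rpartition(" ")
        out.append(";".join(reversed(stack.split(";"))) + " " + count)
    return out


print(reverse_folded(["main;foo;bar 3\n"]))  # ['bar;foo;main 3']
```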

R implementation

Spur-of-the-moment question: do you think a fork that codes this in R would be useful? The codebase only processes the data, right?
Not an issue, but I didn't know the appropriate forum to ask this in.

fiddle purple palette

The purple palette needs to be fiddled to be a little bit more distant to the search color

[Linux 4.8.6] New perf script output format

I have installed kernel 4.8.6, and doing

perf record -F 99 -a -g -- sleep 10

followed by

perf script

gives

swapper     0 [000]   147.374682:          1 cycles:ppp: 
        ffffffff810631d4 native_write_msr+0x4 ([kernel.kallsyms])
        ffffffff8100b720 intel_pmu_enable_all+0x10 ([kernel.kallsyms])
        ffffffff81007403 x86_pmu_enable+0x263 ([kernel.kallsyms])
        ffffffff8117dcc2 perf_pmu_enable+0x22 ([kernel.kallsyms])
        ffffffff8117eba1 ctx_resched+0x51 ([kernel.kallsyms])
        ffffffff8117ed90 __perf_event_enable+0x1e0 ([kernel.kallsyms])
        ffffffff81177726 event_function+0xa6 ([kernel.kallsyms])
        ffffffff81178c1f remote_function+0x3f ([kernel.kallsyms])
        ffffffff8110887b flush_smp_call_function_queue+0x7b ([kernel.kallsyms])
        ffffffff81109363 generic_smp_call_function_single_interrupt+0x13 ([kernel.kallsyms])
        ffffffff81050d87 smp_call_function_single_interrupt+0x27 ([kernel.kallsyms])
        ffffffff8173ceec call_function_single_interrupt+0x8c ([kernel.kallsyms])
        ffffffff815b3427 cpuidle_enter+0x17 ([kernel.kallsyms])
        ffffffff810c6f6b cpu_startup_entry+0x2cb ([kernel.kallsyms])
        ffffffff8172dd47 rest_init+0x77 ([kernel.kallsyms])
        ffffffff81da5191 start_kernel+0x4a7 ([kernel.kallsyms])
        ffffffff81da45d6 x86_64_start_reservations+0x2a ([kernel.kallsyms])
        ffffffff81da4724 x86_64_start_kernel+0x14c ([kernel.kallsyms])

For example, x86_64_start_kernel+0x14c instead of x86_64_start_kernel.

Should stackcollapse-perf.pl automatically remove this, or should there be a new option for it?

RHEL 7.2 w/ kernel-ml-4.8.6 from ELRepo.
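Stripping the offsets is a small regular-expression substitution. A sketch, assuming offsets always appear as a trailing "+0x<hex>" on the symbol:

```python
import re

# A trailing "+0x<hex>" offset, as in the perf script output above.
OFFSET = re.compile(r"\+0x[0-9a-fA-F]+$")


def strip_offset(symbol: str) -> str:
    """Turn 'x86_64_start_kernel+0x14c' into 'x86_64_start_kernel'."""
    return OFFSET.sub("", symbol)


print(strip_offset("x86_64_start_kernel+0x14c"))  # x86_64_start_kernel
```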

relative percentage in flamegraph

I don't know if it's possible, but it would be great if, when you select a subpart of the stack, the inner function calls also showed a relative %, with 100% being the selected part.

stackcollapse-jstack.pl is not generating correct files

After running
stackcollapse-jstack.pl 29072.stack.log > tmp.log

I only got this one line in tmp.log:
DestroyJavaVM 59

I'm using JDK 1.7 on Ubuntu.
I cannot paste the stack trace right now. Is there any chance to fix the Perl script without it?

Why does flamegraph.pl filter the data?

Dear Brendan,

I've attached here
https://drive.google.com/open?id=0B3DMXMfcPWF3ZTVpRGpDblFrVmc

the trace file of a size of 38MB which I've got after profiling Hadoop cluster using StatsD JVM Profiler (actually my fork of it).

The file sortedx.txt in the attachment has a well formed structure:
stacktrace1 count
stacktrace2 count
...

etc.

No errors. Besides, I sorted stacktraces alphabetically.

But when I run ./flamegraph.pl sortedx.txt > sortedx.svg, the output is just 300KB in size. Not what I expected, of course.

I see that there are stacktraces which have enormous count (contrary to the rest), so I later launched a command with awk that filtered unnecessary strings and did this:

awk '!/EPollArrayWrapper|org.apache.hadoop.net.unix.DomainSocketWatcher|java.lang.UNIXProcess.waitForProcessExit|org.apache.hadoop.util.Shell$1.run:509;java.io.BufferedReader.readLine:382;java.io.BufferedReader.readLine:317;java.io.BufferedReader.fill:154;java.io.InputStreamReader.read:184;sun.nio.cs.StreamDecoder.read:177;sun.nio.cs.StreamDecoder.implRead:325;sun.nio.cs.StreamDecoder.readBytes:283;java.io.BufferedInputStream.read:334;java.io.BufferedInputStream.read1:273;java.io.FileInputStream.read:272;java.io.FileInputStream.readBytes/' sortedx.txt | ./flamegraph.pl >sortedx_after_filtering.svg

AND.... sortedx_after_filtering.svg is already 3MB in size! MUCH BETTER.

So my question is: why do you filter some stack traces? Could you please add an option to flamegraph.pl to REMOVE any filtering?

I am asking because many processes in the cluster are running those "waiting" threads like "EPollWait" etc., but I don't care about them - I will simply drop down to where I need.

Could you please make a fix?

Thanks

"java 19983 cycles:" not matched as the event header - perf v3.2.28

Environment:
hostname : xguan-ubuntu
os release : 3.2.0-31-generic
perf version : 3.2.28
arch : x86_64
cmdline : /usr/bin/perf_3.2.0-31 record -F 99 -p 19982 -g sleep 30

...

Contents of perf.out:

java 19983 cycles:
7f7241fbe6d8 arrayOopDesc::base(BasicType) const (/home/xguan/opt/jdk1.8.0_112/jre/lib/amd64/server/libjvm.so)
7f7241239aec writeBytes (/home/xguan/opt/jdk1.8.0_112/jre/lib/amd64/libjava.so)
7f724123176f Java_java_io_FileOutputStream_writeBytes (/home/xguan/opt/jdk1.8.0_112/jre/lib/amd64/libjava.so)
7f722d119786 Ljava/io/FileOutputStream;::writeBytes (/tmp/perf-19982.map)
7f722d142778 Ljava/io/PrintStream;::print (/tmp/perf-19982.map)
7f722d12e1d4 LBusy;::main (/tmp/perf-19982.map)
7f722d0004e7 call_stub (/tmp/perf-19982.map)
7f72421d2d76 JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*) (/home/xguan/opt/jdk1.8.0_112/jre/lib/amd64/server/libjvm.so)
7f72421ed566 jni_invoke_static(JNIEnv_*, JavaValue*, jobject, JNICallType, jmethodID, JNI_ArgumentPusher*, Thread*) (/home/xguan/opt/jdk1.8.0_112/jre/lib/amd64/server/libjvm.so)
7f72421f987a jni_CallStaticVoidMethod (/home/xguan/opt/jdk1.8.0_112/jre/lib/amd64/server/libjvm.so)
7f724309ebdf JavaMain (/home/xguan/opt/jdk1.8.0_112/lib/amd64/jli/libjli.so)
7f72432b4e9a start_thread (/lib/x86_64-linux-gnu/libpthread-2.15.so)

When I ran:

./stackcollapse-perf.pl perf.out

I got:

Unrecognized line: java 19983 cycles: at ./stackcollapse-perf.pl line 264, <> line 3609.
Use of uninitialized value $pname in string eq at ./stackcollapse-perf.pl line 246, <> line 3610.

The fix is to adjust the regular expression on line 177.
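The header line from perf 3.2 has no timestamp or sample period, so a pattern that requires them rejects it. A hypothetical, more tolerant pattern makes those fields optional. This is a Python sketch, not the regular expression in stackcollapse-perf.pl, and the field layout it accepts (optional `pid/tid`, `[cpu]`, timestamp and period) is an assumption about common `perf script` output.

```python
import re

# Hypothetical header pattern, not the one in stackcollapse-perf.pl.
# Only the command name, pid and event are required, so both of these parse:
#   perf 3.2:         "java 19983 cycles:"
#   newer (assumed):  "java 19983 12345.678901: 1010101 cycles:"
HEADER_RE = re.compile(
    r"^(?P<comm>\S.*?)\s+"                # command name, may contain spaces
    r"(?P<pid>\d+)(?:/(?P<tid>\d+))?\s+"  # pid, or pid/tid
    r"(?:\[\d+\]\s+)?"                    # optional [cpu]
    r"(?:(?P<ts>\d+\.\d+):\s+)?"          # optional timestamp
    r"(?:(?P<period>\d+)\s+)?"            # optional sample period
    r"(?P<event>\S+?):\s*$"               # event name, ending in ':'
)


def parse_header(line):
    """Return the header fields as a dict, or None for non-header lines."""
    m = HEADER_RE.match(line)
    return m.groupdict() if m else None


print(parse_header("java 19983 cycles:"))
```

Because every field after the command name is optional, a command name ending in a number can be misparsed when the timestamp is missing; the pattern trades strictness for tolerance of older output.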

stackcollapse truncates function names when they contain brackets

Hello, I am using perf and FlameGraph to profile a heavily templated C++ application. Sometimes a template type contains extra parentheses, (), and in that case stackcollapse-perf.pl truncates the function name at the first opening parenthesis. A reproducer is attached: void exec<void is displayed instead of void exec<void (*)()>(void (*)()), which perf and perf script display correctly.

The problem seems to be that the regex that should match the function name in stackcollapse-perf.pl is too greedy, but I don't know Perl well enough to fix it myself. Any help would be greatly appreciated.

reproducer.txt
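The tidy step appears to cut everything from the first `(` onward (the `$func =~ s/\(.*//;` line visible in the patch in the next report). A narrower alternative, sketched here in Python as a hypothetical helper, removes only one balanced parenthesized group at the end of the name, so the argument list is dropped but parentheses inside template arguments survive. It keeps `void exec<void (*)()>` rather than the full `void exec<void (*)()>(void (*)())`; if the goal is to show exactly what perf shows, the stripping could be skipped entirely.

```python
def strip_trailing_args(func):
    """Remove one balanced '(...)' group at the end of a function name.

    Hypothetical alternative to deleting everything after the first '(':
    parentheses inside template arguments are kept, only the trailing
    argument list is dropped. Unbalanced input is returned unchanged.
    """
    if not func.endswith(")"):
        return func
    depth = 0
    for i in range(len(func) - 1, -1, -1):
        if func[i] == ")":
            depth += 1
        elif func[i] == "(":
            depth -= 1
            if depth == 0:
                # i == 0 means the whole name is one group; keep it as-is.
                return func[:i] if i > 0 else func
    return func


print(strip_trailing_args("void exec<void (*)()>(void (*)())"))
# void exec<void (*)()>
```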

`<>` characters in C++ types get munged

C++ template types contain many < and > characters, but stackcollapse-perf.pl (and possibly other scripts; I didn't check) filters them out, and the resulting SVG shows things like

std::vectorint, std::allocatorint ::push_back

instead of

std::vector<int, std::allocator<int> >::push_back

The following patch fixes this, but I assume it has unintended consequences:

diff --git a/flamegraph.pl b/flamegraph.pl
index def5ed0..ede4514 100755
--- a/flamegraph.pl
+++ b/flamegraph.pl
@@ -517,8 +517,6 @@ foreach (sort @Data) {
                $maxdelta = abs($delta) if abs($delta) > $maxdelta;
        }

-       $stack =~ tr/<>/()/;
-
        # merge frames and populate %Node:
        $last = flow($last, [ '', split ";", $stack ], $time, $delta);

diff --git a/stackcollapse-perf.pl b/stackcollapse-perf.pl
index 16b47f0..fdca6cc 100755
--- a/stackcollapse-perf.pl
+++ b/stackcollapse-perf.pl
@@ -135,7 +135,6 @@ foreach (<STDIN>) {
                next if $func =~ /^\(/;         # skip process names
                if ($tidy_generic) {
                        $func =~ s/;/:/g;
-                       $func =~ tr/<>//d;
                        $func =~ s/\(.*//;
                        # now tidy this horrible thing:
                        # 13a80b608e0a RegExp:[&<>\"\'] (/tmp/perf-7539.map)
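A plausible reason for the `tr/<>/()/` and `tr/<>//d` lines is that raw `<` and `>` would produce invalid XML in the SVG. Escaping them as entities keeps the characters visible without breaking the file. This Python sketch uses the standard library's `xml.sax.saxutils.escape`; the `svg_title` helper and its title format are hypothetical, modeled on titles such as `foo (1 samples, 0.08%)`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def svg_title(func, samples, pct):
    """Build a hypothetical <title> element for one frame.

    escape() turns &, < and > into &amp;, &lt; and &gt;, so template
    arguments stay readable and the SVG remains well-formed XML.
    """
    return f"<title>{escape(func)} ({samples} samples, {pct:.2f}%)</title>"


name = "std::vector<int, std::allocator<int> >::push_back"
title = svg_title(name, 3, 0.5)
print(title)
# An XML parser decodes the entities back into the original characters.
print(ET.fromstring(title).text)
```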

Make interpreter location more platform-independent

The shebang line of stackcollapse.pl is #!/usr/bin/perl. On FreeBSD, Perl is installed in /usr/local/bin, so the script fails to run.

Is there any reason not to use the more portable form #!/usr/bin/env perl?
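One caveat with a blind replacement: on Linux everything after the interpreter path in a shebang is passed as a single argument, so `#!/usr/bin/env perl -w` would make `env` look for a program literally named `perl -w`. The hypothetical Python helper below rewrites the shebang and returns any flags it dropped, so they can be moved into the script body instead.

```python
def portable_shebang(first_line):
    """Rewrite '#!/usr/bin/perl [flags]' to '#!/usr/bin/env perl'.

    Returns (new_line, dropped_flags). Flags are not carried over because on
    Linux everything after the interpreter path reaches it as ONE argument,
    so '#!/usr/bin/env perl -w' would ask env to run a program named 'perl -w'.
    """
    prefix = "#!/usr/bin/perl"
    if not first_line.startswith(prefix):
        return first_line, []
    rest = first_line[len(prefix):]
    if rest and not rest[0].isspace():
        return first_line, []  # e.g. '#!/usr/bin/perl5.30' is left untouched
    return "#!/usr/bin/env perl\n", rest.split()


print(portable_shebang("#!/usr/bin/perl -w\n"))
```

A dropped `-w` can be replaced, roughly, by `use warnings;` near the top of the script.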

stackcollapse-perf.pl borking on CUPSManager line

I have this line in my script file:

CUPSManager cup 11320 186448.846888: 1008373 cycles:

When I run:

cat first.script | ./stackcollapse-perf.pl > first.folded

I get:

Unrecognized line: CUPSManager cup 11320 186448.846888: 1008373 cycles: at ./stackcollapse-perf.pl line 228, <> line 133186.

and it seems that most of my data is not being processed.
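The failure seems to come from the command name `CUPSManager cup`, which contains a space, so a parser expecting a single-word name sees `cup` where the PID should be. When a timestamp is present, one hypothetical way to avoid guessing is to anchor on the timestamp token and work leftward: the token before it (after an optional `[cpu]` column) is the PID, and everything before that is the command name. This Python sketch is an assumption about the header layout, not stackcollapse-perf.pl's logic.

```python
import re

TS_RE = re.compile(r"^\d+\.\d+:$")      # timestamp token, e.g. "186448.846888:"
PID_RE = re.compile(r"^\d+(?:/\d+)?$")  # "11320" or "pid/tid"
CPU_RE = re.compile(r"^\[\d+\]$")       # optional CPU column, e.g. "[003]"


def split_header(line):
    """Split a perf script header into (comm, pid, remaining_tokens).

    Hypothetical helper: anchors on the first timestamp token and works
    leftward, so the command name may contain spaces and even digits.
    Returns None if there is no timestamp (e.g. perf 3.2 output).
    Runs of whitespace inside the command name collapse to one space.
    """
    tokens = line.split()
    for i, tok in enumerate(tokens):
        if TS_RE.match(tok):
            j = i - 1
            if j >= 0 and CPU_RE.match(tokens[j]):
                j -= 1  # skip the CPU column
            if j >= 1 and PID_RE.match(tokens[j]):
                return " ".join(tokens[:j]), tokens[j], tokens[i + 1:]
            return None
    return None


print(split_header("CUPSManager cup 11320 186448.846888: 1008373 cycles:"))
# ('CUPSManager cup', '11320', ['1008373', 'cycles:'])
```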

Search does not match filenames

Node.js frames can include a filename in the function name. For example, from the SVG:

<title>LazyCompile:~foo /apps/XXX/node_modules/once/once.js:22 (1 samples, 0.08%)</title>

But search only matches against the first term. It should match the second term, the filename, as well.

It can be fixed by removing these lines (shown commented out below):

        function g_to_func(e) {
                var func = g_to_text(e);
//              if (func != null)
//                      func = func.replace(/ .*/, "");
                return (func);
        }

But the question is, what else does that break? From memory, I was probably doing this to filter out args on Java signatures...
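Removing the replacement entirely would also make the `(N samples, P%)` annotation searchable, assuming the search text comes from the `<title>` element as in the example above. A middle ground is to strip only that trailing annotation. The sketch below models the string handling in Python (the real code is JavaScript inside the SVG); the suffix pattern is an assumption based on the title shown above.

```python
import re

# Hypothetical model of the SVG's search-name handling (the real code is
# JavaScript). Assumed title text, taken from the report:
#   "LazyCompile:~foo /apps/XXX/node_modules/once/once.js:22 (1 samples, 0.08%)"
SUFFIX_RE = re.compile(r"\s\(\d[\d,]* samples?, [\d.]+%\)$")


def name_first_word(title):
    """Behavior of the commented-out lines: drop everything from the first space."""
    return title.split(" ", 1)[0]


def name_without_suffix(title):
    """Alternative: drop only the trailing '(N samples, P%)' annotation."""
    return SUFFIX_RE.sub("", title)


title = "LazyCompile:~foo /apps/XXX/node_modules/once/once.js:22 (1 samples, 0.08%)"
pattern = re.compile(r"once\.js")
print(bool(pattern.search(name_first_word(title))))      # False
print(bool(pattern.search(name_without_suffix(title))))  # True
```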
