koute / not-perf
A sampling CPU profiler for Linux
License: Apache License 2.0
First of all, thank you for this amazing work.
It's a nightmare to cross-compile perf for ARM/AArch64 with libdwarf support.
not-perf is really useful for embedded Linux profiling, and the built-in flamegraph generator is a nice feature. :)
Currently, it seems not-perf only supports profiling a specific process (PID) via the -p/-P arguments.
I am wondering whether it would be easy to add support for a system-wide profiling option, just like perf's -a/--all-cpus.
Thanks again for this great work.
Typically I use nperf to render a flamegraph like so:
nperf flamegraph --merge-threads perf.data > perf.svg
htop shows that it uses just a single CPU core.
It takes a long while to render the flamegraph on a machine with 8 CPUs and 16 GB of RAM.
Would it be possible to add an option to increase the concurrency of flamegraph generation? I'm not too familiar with the internals of nperf.
I will definitely be more than happy to investigate, and try implementing it, if it's a good feature to have.
Currently nwind does not support PowerPC. From what I could see, to support a new architecture, src/arch/${arch}.rs and src/arch/${arch}_get_regs.s are the main components.
Here is the ABI specification.
For example, if I want to flamegraph the last 30 seconds of a program, but I don't know exactly how long it ran. Perhaps we could allow negative values for from and to? Happy to make a PR.
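A minimal sketch of how the proposed semantics could work, resolving a negative offset against the total recording length (the function name and shape here are hypothetical, not part of nperf's actual API):

```rust
// Hypothetical sketch: interpret negative --from/--to values as offsets from
// the end of the recording, so "--from -30" would mean "the last 30 seconds".
fn resolve_offset(total_secs: f64, value: f64) -> f64 {
    if value < 0.0 {
        // Clamp so a large negative offset doesn't go before the start.
        (total_secs + value).max(0.0)
    } else {
        value
    }
}

fn main() {
    let total = 120.0; // suppose the program ran for 120 seconds
    println!("from = {}", resolve_offset(total, -30.0)); // last 30 seconds
    println!("to   = {}", resolve_offset(total, 45.0)); // absolute timestamp
}
```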
There need to be many more examples of how to use this tool beyond the single one in the README.md, which just shows how to profile into a flamegraph (which is not very useful on its own).
What are all the other options about?
Is the tool's output compatible with any existing parsing tools made for "perf"?
How do you get a source-code-annotated sampled histogram?
Even with the built-in flamegraph support and full debug symbols in the binary I'm profiling, I just get a bunch of 0x123124125 addresses, and the flamegraph SVG is cut off and unreadable at the smaller slice sizes.
Hi,
Thanks for the useful tool! I had trouble building this project, though, because the Cargo.lock file is missing, so the build defaulted to using gimli 0.16.1 for some of the dependencies. This caused the replace to not work correctly.
Thanks!
Hi! There are some suggestions in the README that this perf alternative was designed with cross-compilation in mind, and there are some very brief instructions on how to cross-compile it. Could you please expand this into a working example?
For example, on Ubuntu 20.04 you can install gcc-9-arm-linux-gnueabihf, and in it you can find a gcc (the "linker" for Rust's .cargo/config), but where do you find the "sysroot"?
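For reference, a minimal sketch of what such a working example might contain, assuming Ubuntu 20.04's gcc-9-arm-linux-gnueabihf package and the 32-bit ARM glibc target (the target triple and linker name here are assumptions, not taken from the README):

```toml
# Hypothetical .cargo/config.toml for cross-compiling to 32-bit ARM.
[target.armv7-unknown-linux-gnueabihf]
linker = "arm-linux-gnueabihf-gcc-9"
```

On Ubuntu the cross toolchain typically installs its libraries under /usr/arm-linux-gnueabihf, which usually serves as the sysroot, though whether nperf's build needs it passed explicitly is a question for the maintainer.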
I would like to profile running benchmarks instead of a target bin file directly. What would be the correct CLI syntax for that? I've tried the following unsuccessfully (the output datafile is not generated):
% cargo run record -P $(cargo bench) -w -o datafile
Compiling htsget-benchmarks v0.1.0 (/Users/rvalls/dev/umccr/htsget-rs/htsget-benchmarks)
Finished bench [optimized + debuginfo] target(s) in 7.37s
Running benches/refserver_benchmarks.rs (/Users/rvalls/dev/umccr/htsget-rs/target/release/deps/refserver_benchmarks-fecd9ace2aeca2e9)
Running benches/request_benchmarks.rs (/Users/rvalls/dev/umccr/htsget-rs/target/release/deps/request_benchmarks-f78a77d95f70b5d1)
Running benches/search_benchmarks.rs (/Users/rvalls/dev/umccr/htsget-rs/target/release/deps/search_benchmarks-407b85d5c1d86b5d)
Benchmarking Queries/[LIGHT] Bam query all
Benchmarking Queries/[LIGHT] Bam query all: Warming up for 3.0000 s
Benchmarking Queries/[LIGHT] Bam query all: Collecting 50 samples in estimated 30.048 s (487k iterations)
Benchmarking Queries/[LIGHT] Bam query all: Analyzing
Benchmarking Queries/[LIGHT] Bam query specific
Benchmarking Queries/[LIGHT] Bam query specific: Warming up for 3.0000 s
Benchmarking Queries/[LIGHT] Bam query specific: Collecting 50 samples in estimated 30.260 s (66k iterations)
Benchmarking Queries/[LIGHT] Bam query specific: Analyzing
Benchmarking Queries/[LIGHT] Bam query header
Benchmarking Queries/[LIGHT] Bam query header: Warming up for 3.0000 s
Benchmarking Queries/[LIGHT] Bam query header: Collecting 50 samples in estimated 30.096 s (282k iterations)
Benchmarking Queries/[LIGHT] Bam query header: Analyzing
error: a bin target must be available for `cargo run`
/cc @mmalenic
When parsing something into the Chromium trace format, the JSON that is output is broken: it has a missing field, and each element in the array starts with a comma, which is illegal JSON (and the Chrome tracing loader complains):
[,{"name": "0x0000000000431FD7 [main]","ph": "B","ts": 3627138130963.042,"pid": 10357,"tid": 10357}
This is after running: # ./nperf trace-events -o main_prof.trace --granularity line main_prof
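Until the emitter is fixed, a rough sketch of a workaround that patches the leading-comma symptom described above so Chrome's tracing viewer can load the file (this only handles the stray comma after the opening bracket, not the missing field):

```rust
// Drop the stray comma right after the opening '[' of the trace file.
fn fix_trace(broken: &str) -> String {
    let trimmed = broken.trim_start();
    if let Some(rest) = trimmed.strip_prefix('[') {
        let rest = rest.trim_start();
        // Remove one leading comma if present; otherwise leave untouched.
        let rest = rest.strip_prefix(',').unwrap_or(rest);
        format!("[{}", rest)
    } else {
        broken.to_string()
    }
}

fn main() {
    let broken = r#"[,{"name": "0x0000000000431FD7 [main]","ph": "B"}]"#;
    println!("{}", fix_trace(broken));
}
```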
We have an occasional issue where using not-perf crashes the profiled process on AArch64 hardware.
I don't have much data yet, but the call stack is as follows:
- tid: 23863 # --------------------------------------------------
proc_dump: ~
user_time: 4.170000
system_time: 2.450000
registers: [
0x0000000000000000, 0x0000007ff4cdfdf0, 0x0000000000000000, 0x0000000000000008,
0x0000000000000000, 0x0000007ff4cdfdf0, 0xffffffffffffffff, 0xffffffffffffffff,
0x0000000000000087, 0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff,
0xffffffffffffffff, 0xffffffffffffffff, 0x0000000000000000, 0x0000000000000035,
0x0000007f7d2b2a60, 0x0000007f78d08ec8, 0x0000007f78e3a7f4, 0x0000000000000006,
0x0000007f783b2020, 0x0000007f783b2720, 0x0000000000000100, 0x0000007f77f66028,
0x0000007ff4ce0618, 0x0000007ff4ce06b0, 0x0000007ff4ce0638, 0x0000000000000000,
0x0000000000000000, 0x0000007ff4cdfdd0, 0x0000007f78d07f0c, 0x0000007ff4cdfdd0,
0x0000007f78d07f0c, 0x0000000000000000 ]
backtrace: [
{ a: 0000007f78d07f0c, s: gsignal, o: 0x9c, l: 0xcc, e: 0, S: 0, f: "/usr/lib64/libc-2.28.so" },
{ a: 0000007f78d09000, s: abort, o: 0x138, l: 0x22c, e: 0, S: 0, f: "/usr/lib64/libc-2.28.so" },
{ a: 0000007f7d1b5e1c, s: _ZN5nwind15local_unwinding5abort17h255a5769eb294e0dE, o: 0xcc, l: 0xd0, e: 0, S: 0, f: "/opt/memprof/aarch64/libmemory_profiler.so" },
{ a: 0000007f7d1b6a4c, s: nwind_on_exception_through_trampoline, o: 0x444, l: 0x448, e: 0, S: 0, f: "/opt/memprof/aarch64/libmemory_profiler.so" },
{ a: 0000007f7d27c1f8, s: nwind_ret_trampoline, o: 0x44, l: 0x50, e: 1, S: 0, f: "/opt/memprof/aarch64/libmemory_profiler.so" } ]
Unfortunately, I don't know yet if it's the first or second abort() in said function. :(
Has this happened before? Could it be an issue with the application (e.g. memory corruption), or some corner-case bug in not-perf, as this seems to happen during exception handling?
Hi, I am trying to run not-perf on an armv7 machine, but when I try to run nperf record -p XXX
it says perf_event_open failed. Maybe it's some silly mistake of mine and I have to edit some other file that I don't know about, but I haven't found more information on the internet.
This is the machine:
# lscpu
Architecture: armv7l
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
Model name: ARMv7 Processor rev 10 (v7l)
The perf_event_paranoid file:
# cat /proc/sys/kernel/perf_event_paranoid
0
This is the output from nperf:
[2001-01-08T22:20:43Z INFO nperf_core::profiler] Opening "20010108_222043_00223_web_server.nperf" for writing...
[2001-01-08T22:20:43Z INFO nperf_core::cmd_record] Opening perf events for process with PID 223...
[2001-01-08T22:20:43Z ERROR perf_event_open::perf] The perf_event_open syscall failed for PID 223: Operation not permitted (os error 1)
[2001-01-08T22:20:43Z WARN nwind::frame_descriptions] No .eh_frame section found for '[vdso]'
[2001-01-08T22:20:43Z INFO nperf_core::cmd_record] Enabling perf events...
[2001-01-08T22:20:43Z INFO nperf_core::cmd_record] Running...
[2001-01-08T22:20:43Z INFO nperf_core::profiler] Finished output file initialization
[2001-01-08T22:20:52Z INFO nperf_core::profiler] Collected 0 samples in total!
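For context, here is a rough summary of what the perf_event_paranoid levels control, per the perf_event_open(2) man page. Note that even at level 0, opening events on a process owned by another user still requires privileges (e.g. running as root or having CAP_SYS_PTRACE), which may be why the syscall returned EPERM above. An illustrative sketch:

```rust
// Rough mapping of /proc/sys/kernel/perf_event_paranoid levels to their
// meaning, as documented in perf_event_open(2). Each level also implies
// the restrictions of the levels below it.
fn describe_paranoid(level: i32) -> &'static str {
    match level {
        i32::MIN..=-1 => "allow (almost) all events to all users",
        0 => "disallow raw tracepoint access without CAP_SYS_ADMIN",
        1 => "additionally disallow CPU event access without CAP_SYS_ADMIN",
        _ => "additionally disallow kernel profiling without CAP_SYS_ADMIN",
    }
}

fn main() {
    for level in [-1, 0, 1, 2] {
        println!("{}: {}", level, describe_paranoid(level));
    }
}
```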
What could be the possible cause of an inode mismatch?
I'm trying to profile the startup of my application, which should be done within the 100 ms that nperf waits between attempts at finding the process. Would there be interest in such a feature? Maybe some guidance on where to start? I'd be willing to help implement it; I'm just making sure I didn't misread the docs and the feature doesn't already exist, and I'd like to avoid putting in time and effort only to learn it was left out on purpose.
Hello everybody!
This is a feature request for supporting the profiling format of pprof:
https://github.com/google/pprof/blob/main/proto/profile.proto
Because the native output format does not seem to be compatible with perf's profiling data, it is currently not possible to use the pprof tool. Correct me if I am wrong here.
We have an app which requires the nightly channel and uses this crate as a dependency. lru at version "0.2.0" on 1.48.0-nightly is probably outdated, so we encounter many panics.
running 47 tests
test cmd_trace_events::test_emit_events_4 ... ok
test cmd_trace_events::test_emit_events_5 ... ok
test cmd_trace_events::test_emit_events_1 ... ok
test cmd_trace_events::test_emit_events_2 ... ok
test cmd_trace_events::test_emit_events_3 ... ok
test cmd_trace_events::test_emit_events_8 ... ok
test cmd_trace_events::test_emit_events_6 ... ok
test cmd_trace_events::test_emit_events_7 ... ok
test data_reader::test::collate_aarch64_hot_spot_usleep_in_a_loop_fp ... FAILED
test data_reader::test::collate_aarch64_perfect_unwinding_usleep_in_a_loop_no_fp ... FAILED
test data_reader::test::collate_aarch64_perfect_unwinding_usleep_in_a_loop_fp ... FAILED
test data_reader::test::collate_aarch64_hot_spot_usleep_in_a_loop_no_fp ... FAILED
test data_reader::test::collate_aarch64_noreturn ... FAILED
test data_reader::test::collate_aarch64_perfect_unwinding_floating_point ... FAILED
test data_reader::test::collate_amd64_hot_spot_usleep_in_a_loop_fp ... FAILED
test data_reader::test::collate_amd64_hot_spot_usleep_in_a_loop_no_fp ... FAILED
test data_reader::test::collate_amd64_hot_spot_usleep_in_a_loop_no_fp_online ... FAILED
test data_reader::test::collate_amd64_inline_functions ... FAILED
test data_reader::test::collate_amd64_perfect_unwinding_floating_point ... FAILED
test data_reader::test::collate_amd64_noreturn ... FAILED
test data_reader::test::collate_amd64_perfect_unwinding_pthread_cond_wait ... FAILED
test data_reader::test::collate_amd64_perfect_unwinding_usleep_in_a_loop_fp ... FAILED
test data_reader::test::collate_amd64_perfect_unwinding_usleep_in_a_loop_fp_only_eh_frame_hdr ... FAILED
test data_reader::test::collate_amd64_perfect_unwinding_usleep_in_a_loop_fp_only_loaded_eh_frame ... FAILED
test data_reader::test::collate_arm_hot_spot_usleep_in_a_loop_fp ... FAILED
test data_reader::test::collate_arm_hot_spot_usleep_in_a_loop_no_fp ... FAILED
test data_reader::test::collate_arm_inline_functions ... FAILED
test data_reader::test::collate_arm_perfect_unwinding_floating_point ... FAILED
test data_reader::test::collate_mips64_inline_functions ... ignored
test data_reader::test::collate_arm_noreturn ... FAILED
test data_reader::test::collate_amd64_pthread_cond_wait ... FAILED
test data_reader::test::collate_amd64_perfect_unwinding_usleep_in_a_loop_no_fp ... FAILED
test data_reader::test::collate_arm_perfect_unwinding_usleep_in_a_loop_fp ... FAILED
test data_reader::test::collate_arm_perfect_unwinding_usleep_in_a_loop_no_fp ... FAILED
test mount_info::test_parse_mountinfo ... ok
test mount_info::test_path_resolver ... ok
test profiler::tests::reload_which_clears_base_address_does_not_panic ... ok
test data_reader::test::collate_mips64_hot_spot_usleep_in_a_loop_fp ... FAILED
test profiler::tests::spurious_reload_with_no_base_address_does_not_panic ... ok
test profiler::tests::test_update_maps_basic ... ok
test data_reader::test::collate_mips64_perfect_unwinding_usleep_in_a_loop_fp ... FAILED
test data_reader::test::collate_mips64_perfect_unwinding_floating_point ... FAILED
test data_reader::test::collate_mips64_noreturn ... FAILED
test data_reader::test::collate_mips64_hot_spot_usleep_in_a_loop_no_fp ... FAILED
test data_reader::test::collate_mips64_perfect_unwinding_usleep_in_a_loop_no_fp ... FAILED
test data_reader::test::collate_mips64_pthread_cond_wait ... FAILED
test profiler::tests::reloading_never_panics ... ok
failures:
---- data_reader::test::collate_aarch64_hot_spot_usleep_in_a_loop_fp stdout ----
thread 'data_reader::test::collate_aarch64_hot_spot_usleep_in_a_loop_fp' panicked at 'attempted to leave type `lru::LruEntry<u64, nwind::frame_descriptions::CachedUnwindInfo>` uninitialized, which is invalid', /home/mzwolins/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/mem/mod.rs:658:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
.
.
.
<more same panics>
It looks like nothing beyond bumping the version is needed. The minimal working lru version is 0.4. Would you mind updating it?
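Assuming the fix really is just the version bump described above, the change would amount to something like this in nwind's Cargo.toml (a sketch, not a tested patch):

```toml
[dependencies]
lru = "0.4"
```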
Executing repro.sh.txt produces a compilation failure with SIGSEGV; logs: log.txt
I'm trying to profile an application that has compressed debug sections, but the resulting flamegraph does not show the function names.
Decompressing the symbol file prior to analysis works, but it would be nice if we could support compressed debug sections. Here are the reproduction steps:
mkdir /demo
mkdir /usr/lib/debug/demo/
cd /demo
wget -O demo.c https://gist.githubusercontent.com/thiagovice/d0f65a8b5e8cc254e840e70477cff77e/raw/1eb029244b60b6d52b373a87eb8cb8dd7984539d/demo.c
gcc -ggdb -o demo -Wl,--compress-debug-sections=zlib demo.c
objcopy --only-keep-debug demo demo.sym
strip -s demo
mv demo.sym /usr/lib/debug/demo/demo
./demo &
timeout 10s ./nperf record -F 997 -p `pgrep -f 'demo'` -o datafileDemo
./nperf flamegraph datafileDemo -d /usr/lib/debug/ > flame.svg
The resulting flame.svg won't show the function names properly, only addresses.
Removing the linker option "-Wl,--compress-debug-sections=zlib" and repeating the process produces a correct flamegraph.
I think at the very least we should add git tags and GitHub releases, to make keeping up with changes more straightforward for users. Each release could hold a short description of the changes in the exposed modules and what adjustments are required.
I think it could also be uploaded to crates.io, but I don't know if there are some cons (maybe nwind should be separated into a different repository, or maybe it can be embedded in the nperf crate?).
Would it be easy to add demangling support based on a filter program, so users can plug in e.g. c++filt?
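A rough sketch of what such a filter-program hook could look like: pipe each symbol through an external command and read back the demangled name. The function name and overall shape are hypothetical, not part of nperf's actual API; the demo below uses `cat` as a trivial pass-through filter in place of c++filt.

```rust
use std::io::{Read, Write};
use std::process::{Command, Stdio};

// Spawn the user-supplied filter program, feed it one mangled symbol on
// stdin, and return whatever it prints on stdout (trailing newline trimmed).
fn demangle_with_filter(filter: &str, symbol: &str) -> std::io::Result<String> {
    let mut child = Command::new(filter)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    // Write the symbol, then drop stdin so the filter sees EOF.
    child.stdin.take().unwrap().write_all(symbol.as_bytes())?;
    let mut output = String::new();
    child.stdout.take().unwrap().read_to_string(&mut output)?;
    child.wait()?;
    Ok(output.trim_end().to_string())
}

fn main() -> std::io::Result<()> {
    // With a real toolchain installed this could be "c++filt" instead of "cat".
    let name = demangle_with_filter("cat", "_ZN4core3fmt5Debug3fmtE")?;
    println!("{}", name);
    Ok(())
}
```

Spawning one process per symbol would be slow for large profiles; a real implementation would likely keep a single long-lived filter process and stream symbols through it.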
Example message from the command nperf record -P <command name> -w:
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/execution_queue.rs:28:60
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Aborted
Compare against perf's handling of this case:
Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
I know little to nothing about Rust, so I'm unable to compile this.
$ cargo build --release
Compiling vec_map v0.8.0
Compiling lazy_static v1.0.0
Compiling sc v0.2.2
Compiling unicode-width v0.1.4
Compiling regex v0.2.11
Compiling regex-syntax v0.5.6
Compiling time v0.1.39
Compiling rand v0.4.2
Compiling memmap v0.6.2
Compiling num_cpus v1.8.0
Compiling memchr v2.0.1
Compiling atty v0.2.10
Compiling num-integer v0.1.36
error[E0554]: #![feature] may not be used on the stable release channel
--> /home/arastogi/.cargo/registry/src/github.com-1ecc6299db9ec823/sc-0.2.2/src/lib.rs:15:1
|
15 | #![feature(asm)]
| ^^^^^^^^^^^^^^^^
error: aborting due to previous error
error: Could not compile `sc`.
warning: build failed, waiting for other jobs to finish...
error: build failed
$ cargo --version
cargo 0.26.0
Hi!
I am looking for a profiling tool for rust-analyzer, and I wonder if not-perf
could be of help. I have some very specific requirements, but I am not a profiling expert, so I don't know if what I ask is at all possible, hence this feature-request/support issue :) Feel free to just close with "out of scope" if I ask for something silly!
rust-analyzer relies heavily on incremental computation, and I'd love to profile the incremental case. The interesting benchmark looks like this:
load_data_from_disk(); // 200ms of IO
compute_completion(); // triggers initial analysis, 10 seconds
{
change_single_file();
compute_completion(); // re computation after a change, 300 ms
}
I am only interested in profiling the block. Although the running time of the benchmark is dominated by the initial analysis, I explicitly don't care much about its performance.
So, what I'd like to know is this: is this possible at least in theory (i.e., are there sampling tools that can give such data)? Could not-perf be of help here? My dream API would look like this:
load_data_from_disk();
compute_completion();
{
let _p = not_perf::record("~/tmp/profile_data.perf")
.append(); // append to the data file, so that I can run this in a loop
change_single_file();
compute_completion();
// stops profiling on Drop
}
Hi,
Currently, with this tool, after profiling a process we can only get the details in a flame graph in SVG format. A flame graph is a good option, but the main issue is that it is really hard to read and to analyze rapidly, because the graphs are nearly unreadable.
So if we could provide another good way to inspect the profile after generating the data, it would be really helpful.
Best,
Vibhoothi
First of all, amazing work. I just had a chance to test it, and it works wonderfully and addresses an important use case for me.
While the collated output works great with flamegraph.pl
, I was also hoping to be able to use it with https://github.com/Netflix/flamescope.
Flamescope seems to be unable to parse the timestamps needed to divide the perf output into discrete time intervals. Is it because this information is missing from the output, unlike what normal perf output would contain?
Thanks.
Looking at the following flamegraph:
flaminggraph.zip
The follow_finalized_head function appears twice as part of tokio-runtime-w [THREAD=96]. The one appearing after std::sys::unix::thread::Thread::new::thread_start [khala-node] also has many more samples. I would have assumed that the one originating from the raw poll would have more samples.
So, my question: is everything working as expected, or is there maybe some bug?
Hi, any chance of adding Windows support / removing Unix dependencies?
Hi,
So I was trying to profile rav1e using not-perf; after a few seconds it stops, saying thread 'main' panicked at 'attempt to subtract with overflow'
here:
vibhoothiiaanand@coneBox:~/not-perf$ RUST_BACKTRACE=1 sudo /home/vibhoothiiaanand/not-perf/target/debug/nperf record -P rav1e -w -o datafile
[2019-08-16T05:54:14Z INFO nperf::ps] Waiting for process named 'rav1e'...
[2019-08-16T05:54:14Z INFO nperf::ps] Process 'rav1e' found with PID 4032!
[2019-08-16T05:54:14Z INFO nperf::profiler] Opening "datafile" for writing...
[2019-08-16T05:54:14Z INFO nperf::cmd_record] Opening perf events for 4032...
[2019-08-16T05:54:14Z INFO nperf::profiler] Ready to write profiling data!
[2019-08-16T05:54:15Z INFO nperf::cmd_record] Enabling perf events...
[2019-08-16T05:54:15Z INFO nperf::cmd_record] Running...
thread 'main' panicked at 'attempt to subtract with overflow', /home/vibhoothiiaanand/not-perf/nwind/src/dwarf.rs:179:56
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
[2019-08-16T05:54:42Z INFO nperf::profiler] Collected 27445 samples in total!
vibhoothiiaanand@coneBox:~/not-perf$
Device Specs:
Device: Raspberry Pi 3 B+
RAM: 1 GB
Arch: aarch64
Processor: Cortex-A53 (ARMv8) 64-bit SoC @ 1.4GHz
OS: Ubuntu 18.04.2 LTS
Hi
During memory profiling with bytehound I encountered a crash with a backtrace containing 2 entries:
#0 0x7faf1993fc in gsignal+0xcc from /usr/lib64/libc.so.6+0x323fc
#1 0x7fb33ada8c from /opt/memprof/aarch64/libbytehound.so+0xf5a8c
This leads to:
addr2line -e libbytehound.so 0xf5a8c
/root/.cargo/git/checkouts/not-perf-af1a46759dd83df9/51003a4/nwind/src/arch/aarch64_trampoline.s:22
aarch64_trampoline.s is from this revision.
Revision 51003a4 is quite old IMO, but maybe you're aware of some bugs related to that part of the code, or could give a hint as to how I should debug such issues?