
IncProf

Incremental profiling based on gprof, with analysis tools

How to Use Libipr

Libipr is an "incremental profiler": it generates gprof profiles at a regular interval during application execution, rather than a single profile at exit. It does this by locating the hidden function inside the gprof instrumentation that writes out the raw "gmon.out" profile data, and calling that function from a signal handler driven by an interval timer signal (SIGVTALRM).
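
A rough sketch of the difference this makes, assuming a gcc toolchain (the build commands and file names here are illustrative):

 # a plain -pg build writes a single gmon.out only when the program exits
 gcc -pg -o testpr testpr.c
 ./testpr
 ls gmon.out
 # with libipr preloaded and configured (see the steps below), a profile
 # snapshot is also written at every timer interval during the run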

Several Python scripts help set up Libipr and post-process the data it generates. To use all of the capabilities, follow these steps:

  1. Build libipr.so
  2. Compile your application with "-pg" so that gprof profiling instrumentation is generated
  3. Run "findwrgmon.py" with the full path to the system-installed GNU libc shared library file. It might be something like "/lib/x86_64-linux-gnu/libc-2.23.so"
  4. Use the offset it reports as the value of the environment variable IPR_GMONOFFSET (see the sketch after this list)
  5. Set up other IPR environment variables as you wish, or not (use defaults)
  6. Set the environment variable LD_PRELOAD to point to your libipr.so file
  7. Run your application
  8. Use "gensvm.py" to post-process the sampled profile files; it currently needs you to supply a regular expression matching the file names you want to process, so do an "ls" of the directory holding your sample files and build your expression from that. Redirect the stdout output to a file for step 9
  9. Use "cluster.py" to run clustering on the output file from step 8. This script needs the Python sklearn package installed; it creates two csv files (bestk cluster and elbow cluster)
  10. Use "gendata.py" the same way as "gensvm.py" in step 8; this script finds the per-function call counts
  11. Use "findmostused.py" to process the data file and get the most-used functions, with or without the function call counts. It takes a 0 or 1 flag as input: 0 uses time differences only, 1 uses both time and call counts
  12. Use "findinstr.py" to process either the bestk cluster or the elbow cluster. This script finds the best instrumentation points in the application
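
The offset in steps 3 and 4 is needed because libc's internal gmon-writing function is not an exported symbol: libipr reaches it by adding IPR_GMONOFFSET to the address of the exported "moncontrol" symbol (see the comments in the sample script below). A rough illustration of what "findwrgmon.py" starts from, with a hypothetical libc path:

 # confirm your libc exports moncontrol, the anchor symbol libipr uses
 nm -D /lib/x86_64-linux-gnu/libc-2.23.so | grep ' moncontrol'
 # the distance from moncontrol to the internal writer differs between
 # libc builds, so always recompute it with findwrgmon.py per machine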

Individual steps:

----------------

Sample run script for steps 4-7:

 #!/bin/sh
 # IPR_DATADIR -- directory for sample data files; default none (MUST EXIST!)
 export IPR_DATADIR=gdata

 # IPR_GMONOFFSET -- offset (hex or dec) from "moncontrol" symbol to write_gmon begin
 #export IPR_GMONOFFSET=-1360
 # Home NUC
 #export IPR_GMONOFFSET=-1328
 export IPR_GMONOFFSET=-1264

 # IPR_SECONDS -- seconds between profile sample writes (added to useconds)
 # IPR_USECONDS -- microseconds between profile sample writes (added to seconds)
 export IPR_SECONDS=1
 #export IPR_USECONDS=250000

 # IPR_APPNAME -- the application name (the executable name)
 export IPR_APPNAME='testpr'

 # IPR_DEBUG -- 1 if want debug messages
 export IPR_DEBUG=1

 rm -f gmon* gprof-*.out gdata/g*.out ipr-err.out ipr.log cluster.*out* elb_distance.csv svmfmap.txt result.* gmon.* cluster.bestk cluster.elbowk
 export LD_PRELOAD=./libipr.so
 # run the instrumented application; stderr (including IPR_DEBUG output) goes to ipr-err.out
 ./testpr 230 2> ipr-err.out
 #---------end-sample-run-script---------
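
If the run worked, the data directory should contain one profile snapshot per sampling interval (assuming IPR_DATADIR=gdata as in the script above):

 ls gdata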

Sample run script for steps 8-12:

 # clear LD_PRELOAD so the post-processing tools run without libipr
 export LD_PRELOAD=""
 # all samples from one run share a process-id suffix; grab it from the first file
 proc_num=$(ls gmon-* | head -n 1 | cut -d "." -f2)
 python gensvm.py ./testpr "gmon-*.${proc_num}" > gmon.svm
 python cluster.py gmon.svm svmfmap.txt flip > cluster.out
 python gendata.py ./testpr "gmon-*.${proc_num}" > gmon.data
 python findmostused.py gmon.data 1 > gmon.count.svm
 python findinstr.py cluster.elbowk gmon.count.svm svmfmap.txt > result.elbowk

All-in-one script:

-----------------

Sample steps 8-12 as an all-in-one script:

 export LD_PRELOAD=""
 ./DiscoverInst.sh ./testpr

Issues

Name gmon files with timestamp?

Instead of (or in addition to) naming gmon data files with an interval number, perhaps include a timestamp with microsecond resolution, perhaps in the form of wall-clock time since the start of execution. The total amount of profiled time in each data file could then be compared against this, and the difference could serve as a surrogate measure of I/O (blocked) time.
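
A sketch of the proposed naming, with hypothetical values (the current names follow a gmon-<interval#>.<pid> pattern, per the processing commands above):

 # current:  gmon-<interval#>.<pid>
 # proposed: gmon-<seconds>.<useconds>.<pid>, wall clock since exec start
 # e.g., gmon-12.250000.4711 = the snapshot written 12.25 s into the run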

Need a generic name mapping for SVM indices

In our clustering we've used ID*10 as an SVM data index for function time and (ID*10)+1 as an index for calls; we need some sort of generic index->name mapping. The name map file should already be mapped to the correct SVM indices, so the clustering code can use it as is. The names can then include annotations like "calls" or "time" on whatever name is there.

Extend tools to process data from multiple runs and multiple ranks

Reading in multiple sequences of gprof data files, from either different runs or different ranks (process copies), would be extremely useful. The resulting SVM file would just contain more rows for clustering. Some backend tracking of which rows come from which sequence might be useful, but the clustering itself wouldn't care.

Need exact data definitions for processing pipeline

We should separate at least three computations: 1) reducing profile data to feed into clustering; 2) clustering; 3) cluster/phase attribute identification (instrumentation site). So 1->2 and 2->3 need exact (and generic) data input/output definitions. There may also be a 1->3 data pipe.

1->2: We use libSVM format for the clustering data, plus an attribute->name map. This map should use exact attribute ids rather than an implied mapping (such as id*10), and every attribute should have its own name. The map is JSON; both ids and names are strings.
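
A minimal sketch of such a map, with hypothetical ids and a hypothetical file name (note the explicit per-attribute names with "time"/"calls" annotations, rather than an implied id*10 scheme):

 echo '{ "17": "cg_solve:time", "18": "cg_solve:calls" }' > svmfmap.json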

2->3: We should output a canonical attribute weighting for each cluster (e.g., the centroid), the interval->cluster mapping, and perhaps the size (count) of each cluster.

1->3: output a (reduced) call graph, perhaps with weighted edges/nodes (numcalls)? Such information could help step 3 select the most appropriate cluster attributes (functions) for instrumentation.

Bug in whole-execution coverage calculation

In the miniFE results, the function 'cg_solve' is marked as an instrumentation point for two clusters; however, in the total coverage calculation (not the phase coverage), one cluster reports 19.5% and the other reports 20.5%. Since it is the same function, the coverage must be the same, so there is a bug in the calculation.

Capture communication time in profile?

Can we somehow capture communication time? We could run under gprof and mpiP (or PMPI?) and capture communication statistics, but that would not be per-interval. Could we assume that any non-CPU time in an interval is I/O time? E.g., if we are sampling at one second and the profile only adds up to 0.6 seconds, then we have 0.4 seconds of I/O? We could do some experiments to validate this hypothesis...
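
One quick way to test the hypothesis on existing data, assuming the standard gprof flat-profile layout where column 3 is self-seconds (the file name is hypothetical but follows the gprof-*.out pattern used above):

 # sum the profiled time in one interval's report and compare it
 # against the sampling interval (IPR_SECONDS)
 awk '$1 ~ /^[0-9]+\.[0-9]+$/ { t += $3 } END { print "profiled:", t, "s" }' gprof-12.out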

Find real data point that is closest to cluster centroid

KMeans produces a centroid that is a true average of the cluster, meaning it is not an actual interval data point. Thus, it could have a non-zero value for a function that is zero in every real data point. I created a routine in cluster.py called findClosestRealDatapoint that works but is not fully integrated yet.

Explore DBSCAN as an alternative clustering method

I've created code in cluster.py that invokes the DBSCAN clustering algorithm, but it needs some integration and some way to print out meaningful results. DBSCAN can find odd-shaped clusters (not sure we need that, though), and it has a tuning parameter that needs some method of selection.

Function indices in gprof output are not consistent across intervals

Investigating my data outputs, I find that function indices are generally the same across intervals but NOT ALWAYS. This is critical because I think we are using the function IDs reported by gprof to keep track of functions, and that does not work. Instead, we need to create our own map and reuse it every time we process another gprof report file.
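
A minimal sketch of the proposed fix, assuming gprof flat-profile reports (file names hypothetical): assign our own index to each function name the first time it is seen, and reuse that map for every report file.

 # data lines in a flat profile start with a %-time value like "33.34",
 # and the function name is the last field
 awk '$1 ~ /^[0-9]+\.[0-9]+$/ && !($NF in idx) { idx[$NF] = ++n; print n, $NF }' \
     gprof-*.out > funcmap.txt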
