
IncProf

Incremental profiling based on gprof, with analysis tools

How to Use Libipr

Libipr is an "incremental profiler": it generates gprof profiles at a regular interval during application execution, rather than a single profile at exit. It does this by locating the hidden function inside the gprof instrumentation that writes out the raw "gmon.out" profile data, and calling that function from a signal handler driven by an interval timer signal (SIGVTALRM).
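
A rough sketch of the difference this makes, assuming a gcc toolchain (the build commands and file names here are illustrative):

 # a plain -pg build writes a single gmon.out only when the program exits
 gcc -pg -o testpr testpr.c
 ./testpr
 ls gmon.out
 # with libipr preloaded and configured (see the steps below), a profile
 # snapshot is also written at every timer interval during the run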

Several Python scripts help set up Libipr and post-process the data it generates. To use all of the capabilities, follow these steps:

  1. Build libipr.so
  2. Compile your application with "-pg" so that gprof profiling instrumentation is generated
  3. Run "findwrgmon.py" with the full path to the system-installed GNU libc shared library file. It might be something like "/lib/x86_64-linux-gnu/libc-2.23.so"
  4. Use the offset it reports as the value of the environment variable IPR_GMONOFFSET (see the sketch after this list)
  5. Set up other IPR environment variables as you wish, or not (use defaults)
  6. Set the environment variable LD_PRELOAD to point to your libipr.so file
  7. Run your application
  8. Use "gensvm.py" to post-process the sampled profile files; it currently needs you to supply a regular expression matching the file names you want to process, so do an "ls" of the directory holding your sample files and build your expression from that. Redirect the stdout output to a file for step 9
  9. Use "cluster.py" to run clustering on the output file from step 8. This script needs the Python sklearn package installed; it creates two csv files (bestk cluster and elbow cluster)
  10. Use "gendata.py" the same way as "gensvm.py" in step 8; this script finds the per-function call counts
  11. Use "findmostused.py" to process the data file and get the most-used functions, with or without the function call counts. It takes a 0 or 1 flag as input: 0 uses time differences only, 1 uses both time and call counts
  12. Use "findinstr.py" to process either the bestk cluster or the elbow cluster. This script finds the best instrumentation points in the application
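
The offset in steps 3 and 4 is needed because libc's internal gmon-writing function is not an exported symbol: libipr reaches it by adding IPR_GMONOFFSET to the address of the exported "moncontrol" symbol (see the comments in the sample script below). A rough illustration of what "findwrgmon.py" starts from, with a hypothetical libc path:

 # confirm your libc exports moncontrol, the anchor symbol libipr uses
 nm -D /lib/x86_64-linux-gnu/libc-2.23.so | grep ' moncontrol'
 # the distance from moncontrol to the internal writer differs between
 # libc builds, so always recompute it with findwrgmon.py per machine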

Individual steps:

----------------

Sample run script for steps 4-7:

 #!/bin/sh
 # IPR_DATADIR -- directory for sample data files; default none (MUST EXIST!)
 export IPR_DATADIR=gdata

 # IPR_GMONOFFSET -- offset (hex or dec) from "moncontrol" symbol to write_gmon begin
 #export IPR_GMONOFFSET=-1360
 # Home NUC
 #export IPR_GMONOFFSET=-1328
 export IPR_GMONOFFSET=-1264

 # IPR_SECONDS -- seconds between profile sample writes (added to useconds)
 # IPR_USECONDS -- microseconds between profile sample writes (added to seconds)
 export IPR_SECONDS=1
 #export IPR_USECONDS=250000

 # IPR_APPNAME -- the application name (the executable name)
 export IPR_APPNAME='testpr'

 # IPR_DEBUG -- 1 if want debug messages
 export IPR_DEBUG=1

 rm -f gmon* gprof-*.out gdata/g*.out ipr-err.out ipr.log cluster.*out* elb_distance.csv svmfmap.txt result.* gmon.* cluster.bestk cluster.elbowk
 export LD_PRELOAD=./libipr.so
 # run the instrumented application; stderr (including IPR_DEBUG output) goes to ipr-err.out
 ./testpr 230 2> ipr-err.out
 #---------end-sample-run-script---------
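
If the run worked, the data directory should contain one profile snapshot per sampling interval (assuming IPR_DATADIR=gdata as in the script above):

 ls gdata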

Sample run script for steps 8-12:

 # clear LD_PRELOAD so the post-processing tools run without libipr
 export LD_PRELOAD=""
 # all samples from one run share a process-id suffix; grab it from the first file
 proc_num=$(ls gmon-* | head -n 1 | cut -d "." -f2)
 python gensvm.py ./testpr "gmon-*.${proc_num}" > gmon.svm
 python cluster.py gmon.svm svmfmap.txt flip > cluster.out
 python gendata.py ./testpr "gmon-*.${proc_num}" > gmon.data
 python findmostused.py gmon.data 1 > gmon.count.svm
 python findinstr.py cluster.elbowk gmon.count.svm svmfmap.txt > result.elbowk

All-in-one script:

-----------------

Sample steps 8-12 as an all-in-one script:

 export LD_PRELOAD=""
 ./DiscoverInst.sh ./testpr

Issues

Name gmon files with timestamp?

Instead of (or in addition to) naming gmon data files with an interval number, perhaps include a timestamp with microsecond resolution, perhaps in the form of wall-clock time since the start of execution. The total amount of profiled time in each data file could then be compared against this, and the difference could serve as a surrogate measure of I/O (blocked) time.
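
A sketch of the proposed naming, with hypothetical values (the current names follow a gmon-<interval#>.<pid> pattern, per the processing commands above):

 # current:  gmon-<interval#>.<pid>
 # proposed: gmon-<seconds>.<useconds>.<pid>, wall clock since exec start
 # e.g., gmon-12.250000.4711 = the snapshot written 12.25 s into the run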

Need a generic name mapping for SVM indices

In our clustering we've used ID*10 as an SVM data index for function time and (ID*10)+1 as an index for calls; we need some sort of generic index->name mapping. The name map file should already be mapped to the correct SVM indices, so the clustering code can use it as is. The names can then include annotations like "calls" or "time" on whatever name is there.

Extend tools to process data from multiple runs and multiple ranks

Reading in multiple sequences of gprof data files, from either different runs or different ranks (process copies), would be extremely useful. The resulting SVM file would just contain more rows for clustering. Some backend tracking of which rows come from which sequence might be useful, but the clustering itself wouldn't care.

Need exact data definitions for processing pipeline

We should separate at least three computations: 1) reducing profile data to feed into clustering; 2) clustering; 3) cluster/phase attribute identification (instrumentation site). So 1->2 and 2->3 need exact (and generic) data input/output definitions. There may also be a 1->3 data pipe.

1->2: We use libSVM format for the clustering data, plus an attribute->name map. This map should use exact attribute ids rather than an implied mapping (such as id*10), and every attribute should have its own name. The map is JSON; both ids and names are strings.
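
A minimal sketch of such a map, with hypothetical ids and a hypothetical file name (note the explicit per-attribute names with "time"/"calls" annotations, rather than an implied id*10 scheme):

 echo '{ "17": "cg_solve:time", "18": "cg_solve:calls" }' > svmfmap.json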

2->3: We should output a canonical attribute weighting for each cluster (e.g., the centroid), the interval->cluster mapping, and perhaps the size (count) of each cluster.

1->3: output a (reduced) call graph, perhaps with weighted edges/nodes (numcalls)? Such information could help step 3 select the most appropriate cluster attributes (functions) for instrumentation.

Bug in whole-execution coverage calculation

In the miniFE results, the function 'cg_solve' is marked as an instrumentation point for two clusters; however, in the total coverage calculation (not the phase coverage), one cluster reports 19.5% and the other reports 20.5%. Since it is the same function, the coverage must be the same, so there is a bug in the calculation.

Capture communication time in profile?

Can we somehow capture communication time? We could run under gprof and mpiP (or PMPI?) and capture communication statistics, but that would not be per-interval. Could we assume that any non-CPU time in an interval is I/O time? E.g., if we are sampling at one second and the profile only adds up to 0.6 seconds, then we have 0.4 seconds of I/O? We could do some experiments to validate this hypothesis...
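
One quick way to test the hypothesis on existing data, assuming the standard gprof flat-profile layout where column 3 is self-seconds (the file name is hypothetical but follows the gprof-*.out pattern used above):

 # sum the profiled time in one interval's report and compare it
 # against the sampling interval (IPR_SECONDS)
 awk '$1 ~ /^[0-9]+\.[0-9]+$/ { t += $3 } END { print "profiled:", t, "s" }' gprof-12.out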

Find real data point that is closest to cluster centroid

KMeans produces a centroid that is a true average of the cluster, meaning it is not an actual interval data point. Thus, it could have a non-zero value for a function that is zero in every real data point. I created a routine in cluster.py called findClosestRealDatapoint that works but is not fully integrated yet.

Explore DBSCAN as an alternative clustering method

I've created code in cluster.py that invokes the DBSCAN clustering algorithm, but it needs some integration and some way to print out meaningful results. DBSCAN can find odd-shaped clusters (not sure we need that, though), and it has a tuning parameter that needs some method of selection.

Function indices in gprof output are not consistent across intervals

Investigating my data outputs, I find that function indices are generally the same across intervals but NOT ALWAYS. This is critical because I think we are using the function IDs reported by gprof to keep track of functions, and that does not work. Instead, we need to create our own map and reuse it every time we process another gprof report file.
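
A minimal sketch of the proposed fix, assuming gprof flat-profile reports (file names hypothetical): assign our own index to each function name the first time it is seen, and reuse that map for every report file.

 # data lines in a flat profile start with a %-time value like "33.34",
 # and the function name is the last field
 awk '$1 ~ /^[0-9]+\.[0-9]+$/ && !($NF in idx) { idx[$NF] = ++n; print n, $NF }' \
     gprof-*.out > funcmap.txt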
