gvprof / gvprof

GVProf: A Value Profiler for GPU-based Clusters

License: BSD 3-Clause "New" or "Revised" License

Makefile 1.50% Cuda 23.23% C++ 7.44% C 3.66% Python 51.71% Shell 12.47%
profiler gpu redundancy instrumentation cuda gpu-optimization machine-learning binary-analysis value-profiler data-flow

gvprof's People

Contributors

findhao, jokeren

gvprof's Issues

Record executed functions and their modules (cubins)

Use -ck HPCRUN_FUNCTION_RECORD=<on/off> to turn function recording on or off. Once we know which functions actually run during execution, we can analyze only the cubins that contain those functions, which greatly reduces analysis overhead.
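
The filtering idea can be sketched as follows. This is only an illustration, not GVProf's implementation: the record layout (a mapping from cubin name to the functions it defines) and the names select_cubins, cubin_functions, and executed are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def select_cubins(cubin_functions: Dict[str, Set[str]],
                  executed: Iterable[str]) -> List[str]:
    """Return the cubins that define at least one executed function.

    Both arguments are hypothetical stand-ins for whatever the
    function record actually stores.
    """
    executed_set = set(executed)
    return sorted(
        cubin for cubin, functions in cubin_functions.items()
        if functions & executed_set  # non-empty intersection: cubin was used
    )


# Hypothetical data: three cubins, but only two kernels ran.
cubin_functions = {
    "a.cubin": {"vecAdd", "vecMul"},
    "b.cubin": {"reduce"},
    "c.cubin": {"upsample_kernel"},
}
executed = ["vecAdd", "upsample_kernel"]

selected = select_cubins(cubin_functions, executed)
print(selected)
```

In this toy input, b.cubin is skipped because none of its functions ran, so it never needs to be analyzed.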

[MemoryProfiler] unordered map bug

include/analysis/memory_profile.h:118:6: error: ‘unordered_map’ in namespace ‘std’ does not name a template type
std::unordered_map<u64, u64> _addresses_map;
^~~~~~~~~~~~~
include/analysis/memory_profile.h:118:1: note: ‘std::unordered_map’ is defined in header ‘<unordered_map>’; did you forget to ‘#include <unordered_map>’?
include/analysis/memory_profile.h:1:1:

@Lin-Mao

Reduce GPU memory usage

Add an option that serializes the streams in a context by default. In this case, all streams share a single device-side buffer. This feature benefits monitoring TensorFlow, in which many CPU threads launch kernels.
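
A host-side toy model of the proposal, assuming that serialization means a per-context lock around each launch. The class SerializedContext, its launch method, and the worker function are hypothetical and only illustrate why one shared buffer suffices once launches cannot overlap; the real buffer would live on the device.

```python
import threading
from typing import List, Tuple


class SerializedContext:
    """Toy model: every stream in a context shares one record buffer."""

    def __init__(self) -> None:
        # One lock per context serializes launches from all streams.
        self._lock = threading.Lock()
        # The single shared buffer (a Python list stands in for device memory).
        self.buffer: List[Tuple[int, str]] = []

    def launch(self, stream_id: int, kernel: str) -> None:
        # Only one launch at a time may write to the shared buffer.
        with self._lock:
            self.buffer.append((stream_id, kernel))


def worker(ctx: SerializedContext, stream_id: int, launches: int) -> None:
    # Each CPU thread drives its own stream, as in the TensorFlow case.
    for i in range(launches):
        ctx.launch(stream_id, f"kernel_{i}")


ctx = SerializedContext()
threads = [
    threading.Thread(target=worker, args=(ctx, stream_id, 100))
    for stream_id in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(ctx.buffer))
```

The trade-off in this model is the usual one: memory use no longer grows with the number of streams, at the cost of concurrency between launches.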

GVProf shows nothing after execution

Hi dear authors,
I have started using GVProf. After I run gvprof -e redundancy ./vectorAdd, it prints the messages below but no profiling information. Where can I find the profiling information?

[image: screenshot of the messages]

thank you
best regards
William

Multi-threading problems

  1. The device-to-host (DtoH) copy in cudaMalloc failed.
  2. hpcprof has recursive struct nodes.
  3. Freeing a shared_ptr causes a problem.

Needs a minimal reproducer.

Installation error with latest spack

==> Installing mxm-3.6.3104-tddpbewqvkjglnydag44jh6s7b7orzms
==> No binary for mxm-3.6.3104-tddpbewqvkjglnydag44jh6s7b7orzms found: installing from source
==> No patches needed for mxm
==> mxm: Executing phase: 'install'
==> Error: InstallError: mxm is not installable, you need to specify it as an external package in packages.yaml

/home/yhao24/opt/gvprof_nov7/spack/var/spack/repos/builtin/packages/mxm/package.py:30, in install:
         29    def install(self, spec, prefix):
  >>     30        raise InstallError(
         31            self.spec.format(
         32                "{name} is not installable, you need to specify "
         33                "it as an external package in packages.yaml"

See build log for details:
  /tmp/yhao24/spack-stage/spack-stage-mxm-3.6.3104-tddpbewqvkjglnydag44jh6s7b7orzms/spack-build-out.txt

==> Warning: Skipping build of openmpi-4.1.4-r2dz4oekb6piettcebzfz5lxbmaxdgjk since mxm-3.6.3104-tddpbewqvkjglnydag44jh6s7b7orzms failed
==> Warning: Skipping build of boost-1.80.0-skyztn6i6bfwclmhqxmmrkusswxbixqr since openmpi-4.1.4-r2dz4oekb6piettcebzfz5lxbmaxdgjk failed
==> Warning: Skipping build of dyninst-master-gwhlcipxktc2st5y6elgj7nun5rfa4u3 since boost-1.80.0-skyztn6i6bfwclmhqxmmrkusswxbixqr failed

yolov4 missing calling context

TYPE: KERNEL
COUNT: 1
DUPLICATE:
CONTEXT:

  1. darknet.c:498 test_detector [darknet]
  2. detector.c:1669 network_predict_gpu [darknet]
  3. network_kernels.cu:693 forward_network_gpu [darknet]
  4. network_kernels.cu:87 upsample_gpu [darknet]
  5. blas_kernels.cu:1427 _device_stub__Z15upsample_kernelmPfiiiiiifS(unsigned long, float*, int, int, int, int, int, int, float, float*) [darknet]
  6. tmpxft_002e072b_00000000-6_blas_kernels.cudafe1.stub.c:1 cudaLaunchKernel
  7. cuda_runtime.h:209 gpu_op_kernel

Between entries 4 and 5 of the context above, there should be more details.
