GVProf: A Value Profiler for GPU-based Clusters

License: BSD 3-Clause "New" or "Revised" License

Makefile 1.50% Cuda 23.23% C++ 7.44% C 3.66% Python 51.71% Shell 12.47%

profiler gpu redundancy instrumentation cuda gpu-optimization machine-learning binary-analysis value-profiler data-flow

gvprof's People

findhao zangcq zhaojp-frank dongxiao92 xaswq elio-yang lyrics-wangkl mistobaan gpu-profiler

gvprof's Issues

Support GCC > 9.0

OpenMP Target with Multiple CPU threads

hpcrun fails when OMP_NUM_THREADS is set to a value greater than 1.

Test case: qmcpack.

Record executed functions and their modules (cubins)

Use -ck HPCRUN_FUNCTION_RECORD=<on/off> to turn function recording on or off. Once we know which functions are actually executed, we can analyze only the cubins that contain them, which greatly reduces analysis overhead.
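The filtering idea can be illustrated with a minimal sketch. It assumes the recorded function names and a per-cubin list of defined functions are already available in memory; `CubinIndex` and `select_cubins` are hypothetical names chosen for illustration and are not part of GVProf or HPCToolkit.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical index: cubin path -> names of the functions it defines.
using CubinIndex = std::map<std::string, std::set<std::string>>;

// Return the cubins that define at least one executed function.
// Cubins with no executed function are skipped, so they never reach analysis.
std::vector<std::string> select_cubins(const CubinIndex &cubins,
                                       const std::set<std::string> &executed) {
  std::vector<std::string> selected;
  for (const auto &entry : cubins) {
    for (const auto &function : entry.second) {
      if (executed.count(function) != 0) {
        selected.push_back(entry.first);
        break;  // one executed function is enough to keep this cubin
      }
    }
  }
  return selected;
}
```

Only the cubins returned by `select_cubins` would then be passed to the (much more expensive) analysis step.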

[MemoryProfiler] Don't declare non built-in type variables in the header file

The problem appears in every new analysis header file.

Unified Testing & Benchmark Framework

Use YAML for sample configurations

[MemoryProfiler] unordered map bug

include/analysis/memory_profile.h:118:6: error: ‘unordered_map’ in namespace ‘std’ does not name a template type
std::unordered_map<u64, u64> _addresses_map;
^~~~~~~~~~~~~
include/analysis/memory_profile.h:118:1: note: ‘std::unordered_map’ is defined in header ‘<unordered_map>’; did you forget to ‘#include <unordered_map>’?
include/analysis/memory_profile.h:1:1:
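As the compiler note suggests, the header most likely just needs to include `<unordered_map>` itself rather than rely on another header pulling it in. Below is a minimal sketch of a self-contained header; `MemoryProfileSketch` is a hypothetical stand-in for the real class, and the `u64` alias is an assumption about how the project defines it.

```cpp
#include <cstdint>
#include <unordered_map>  // the include the compiler note asks for

using u64 = std::uint64_t;  // assumed definition of the project's u64 alias

// Hypothetical stand-in for the class in include/analysis/memory_profile.h.
class MemoryProfileSketch {
 public:
  // Remember the size associated with an address.
  void record(u64 address, u64 size) { _addresses_map[address] = size; }

  // Return the recorded size, or 0 if the address is unknown.
  u64 lookup(u64 address) const {
    auto it = _addresses_map.find(address);
    return it == _addresses_map.end() ? 0 : it->second;
  }

 private:
  std::unordered_map<u64, u64> _addresses_map;
};
```

Including every standard header a file uses keeps the header compilable on its own, regardless of include order in the translation units that use it.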

@Lin-Mao

DtoH copy in cudaMalloc failed.
hpcprof has recursive struct nodes.
shared_ptr free incurs a problem.

Needs a minimal reproducer.

==> Installing mxm-3.6.3104-tddpbewqvkjglnydag44jh6s7b7orzms
==> No binary for mxm-3.6.3104-tddpbewqvkjglnydag44jh6s7b7orzms found: installing from source
==> No patches needed for mxm
==> mxm: Executing phase: 'install'
==> Error: InstallError: mxm is not installable, you need to specify it as an external package in packages.yaml

/home/yhao24/opt/gvprof_nov7/spack/var/spack/repos/builtin/packages/mxm/package.py:30, in install:
         29    def install(self, spec, prefix):
  >>     30        raise InstallError(
         31            self.spec.format(
         32                "{name} is not installable, you need to specify "
         33                "it as an external package in packages.yaml"

See build log for details:
  /tmp/yhao24/spack-stage/spack-stage-mxm-3.6.3104-tddpbewqvkjglnydag44jh6s7b7orzms/spack-build-out.txt

==> Warning: Skipping build of openmpi-4.1.4-r2dz4oekb6piettcebzfz5lxbmaxdgjk since mxm-3.6.3104-tddpbewqvkjglnydag44jh6s7b7orzms failed
==> Warning: Skipping build of boost-1.80.0-skyztn6i6bfwclmhqxmmrkusswxbixqr since openmpi-4.1.4-r2dz4oekb6piettcebzfz5lxbmaxdgjk failed
==> Warning: Skipping build of dyninst-master-gwhlcipxktc2st5y6elgj7nun5rfa4u3 since boost-1.80.0-skyztn6i6bfwclmhqxmmrkusswxbixqr failed

yolov4 missing calling context

TYPE: KERNEL
COUNT: 1
DUPLICATE:
CONTEXT:

darknet.c:498 test_detector [darknet]
detector.c:1669 network_predict_gpu [darknet]
network_kernels.cu:693 forward_network_gpu [darknet]
network_kernels.cu:87 upsample_gpu [darknet]
blas_kernels.cu:1427 __device_stub__Z15upsample_kernelmPfiiiiiifS_(unsigned long, float*, int, int, int, int, int, int, float, float*) [darknet]
tmpxft_002e072b_00000000-6_blas_kernels.cudafe1.stub.c:1 cudaLaunchKernel
cuda_runtime.h:209 gpu_op_kernel

Between line 4 and line 5, there should be more details.