daerduocarey / pytorchemd
PyTorch Wrapper for Earth-Mover-Distance (EMD) for 3D point cloud regression
Hi! Thank you for your great work!
I have some questions and need your help.
Could you please give some advice?
Thanks!
Is there any way to install this with CUDA 11.8 and PyTorch 2.2? I've followed the tips in #6 and was able to install but when running I get the following error:
match = emd_cuda.approxmatch_forward(xyz1, xyz2)
RuntimeError: Unknown layout
match = emd_cuda.approxmatch_forward(xyz1, xyz2)
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
This is the error I get with the following input:
EMD = earth_mover_distance().cuda()
np1 = np.asarray(TSS.points)
np2 = np.asarray(RSS.points)
T1 = torch.from_numpy(np1).type(torch.float32).cuda()
T2 = torch.from_numpy(np2).type(torch.float32).cuda()
l1 = EMD(T2, T2)
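The IndexError above is consistent with passing 2-D tensors of shape (N, 3); the other snippets on this page call the function with 3-D inputs such as torch.rand(1, 50000, 3). A minimal sketch of the shape fix, assuming the wrapper expects (batch, num_points, 3). `with_batch_dim` is a hypothetical helper, not part of the repo, and it works on plain shape tuples so it runs without a GPU:

```python
def with_batch_dim(shape):
    """Return the shape an (N, 3) or (B, N, 3) point cloud should have as EMD input.

    Assumption: the wrapper expects (batch, num_points, 3).
    """
    if len(shape) == 2 and shape[1] == 3:
        return (1,) + tuple(shape)      # add a leading batch dimension
    if len(shape) == 3 and shape[2] == 3:
        return tuple(shape)             # already batched
    raise ValueError(f"expected (N, 3) or (B, N, 3), got {tuple(shape)}")

print(with_batch_dim((2048, 3)))        # (1, 2048, 3)
```

With tensors, the equivalent step would be calling `T1.unsqueeze(0)` and `T2.unsqueeze(0)` before passing them in.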
hi,
Thanks for your excellent work!
However, when the number of points increases, I run into a problem:
'''
p1 = torch.rand(1,50000, 3).cuda()
p2 = torch.rand(1,50000, 3).cuda()
d = earth_mover_distance(p1, p2, transpose=False)
print(d)
'''
RuntimeError: CUDA error: an illegal memory access was encountered
Can you provide any suggestions?
Hey, I'm using a Mac. Since this repository depends on CUDA, I'm unable to set up the project. It would really help if someone knows of an alternative for Mac. Thanks.
Hello, @daerduoCarey
I read and tested the EMD distance code. It seems that the value is a sum over n*m terms, so if n and m are large (e.g. >= 10000), the EMD distance is also very large.
Should we normalize it by dividing by n*m if EMD is used as a loss? Thanks!
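To illustrate why an unnormalized EMD grows with the point count, here is a minimal sketch using the exact definition for two equal-size sets (the minimum summed distance over all one-to-one matchings). This is brute force for tiny inputs and is not the approximate matching the CUDA kernel computes; whether the right divisor for this repo's output is n or n*m depends on how the kernel scales its matching, which is not confirmed here. `exact_emd_sum` is a hypothetical name.

```python
from itertools import permutations
from math import dist

def exact_emd_sum(a, b):
    """Exact EMD for two equal-size point sets: minimum over all one-to-one
    matchings of the summed Euclidean distances (brute force, tiny inputs only)."""
    assert len(a) == len(b)
    return min(
        sum(dist(p, q) for p, q in zip(a, perm))
        for perm in permutations(b)
    )

a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
b = [(x, 1.0, z) for (x, _, z) in a]    # every point shifted by 1 along y

total = exact_emd_sum(a, b)             # grows with the number of points
per_point = total / len(a)              # comparable across cloud sizes
print(total, per_point)                 # 3.0 1.0
```

Under this definition, dividing by the number of points gives a mean per-point transport distance, which is closer in scale to a Chamfer-style average.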
Hello, I am interested in this code, but I have CUDA 10.1 and PyTorch 1.5. Is there a way to modify the setup accordingly?
Thank you in advance
How can I get the matching matrix from the Python code?
If you are looking for EMD of dense point clouds (with over 10,000 points) and large batch size, please check our implementation: https://github.com/Colin97/MSN-Point-Cloud-Completion
THCudaCheck FAIL file=/home/user/hdd/github/PyTorchEMD/cuda/emd_kernel.cu line=190 error=98 : invalid device function
Traceback (most recent call last):
  File "test_emd_loss.py", line 35, in <module>
    d = earth_mover_distance(p1, p2, transpose=False)
  File "/home/user/hdd/github/PyTorchEMD/emd.py", line 44, in earth_mover_distance
    cost = EarthMoverDistanceFunction.apply(xyz1, xyz2)
  File "/home/user/hdd/github/PyTorchEMD/emd.py", line 11, in forward
    match = emd_cuda.approxmatch_forward(xyz1, xyz2)
RuntimeError: cuda runtime error (98) : invalid device function at /home/user/hdd/github/PyTorchEMD/cuda/emd_kernel.cu:190
Segmentation fault (core dumped)
I hit this error when running test_emd_loss.py after a successful build. The environment is PyTorch 1.7, CUDA 10.2, Ubuntu 18.04.
After some searching, I realized that the CUDA versions were inconsistent between my conda env (cudatoolkit=10.2) and the system CUDA (/usr/local/cuda-10.0).
After installing CUDA 10.2 on the local machine, it works.
Recording this here in case it helps anyone.
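A quick way to spot this kind of mismatch before building is to compare the CUDA version PyTorch was built with against what the local `nvcc` reports. A minimal sketch; the helper names are hypothetical, and the sample string only mimics the usual `nvcc --version` output:

```python
import re

def nvcc_release(nvcc_output):
    """Extract the 'major.minor' release from `nvcc --version` output, or None."""
    m = re.search(r"release (\d+\.\d+)", nvcc_output)
    return m.group(1) if m else None

def cuda_versions_match(torch_cuda, nvcc_output):
    """True when PyTorch's CUDA version equals the local toolkit's release."""
    return torch_cuda is not None and torch_cuda == nvcc_release(nvcc_output)

sample = "Cuda compilation tools, release 10.0, V10.0.130"
print(cuda_versions_match("10.2", sample))   # False
```

In a real environment the two inputs would come from `torch.version.cuda` and from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`.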
Thank you for your nice code! However, I ran into an installation problem and cannot use it. Could you help me figure out how to fix it?
I use Ubuntu 16.04, PyTorch 1.1, GCC 5.5, CUDA 9.0. Should I change the GCC version?
Here's the installation log:
running install
running bdist_egg
running egg_info
writing emd_ext.egg-info/PKG-INFO
writing dependency_links to emd_ext.egg-info/dependency_links.txt
writing top-level names to emd_ext.egg-info/top_level.txt
reading manifest file 'emd_ext.egg-info/SOURCES.txt'
writing manifest file 'emd_ext.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'emd_cuda' extension
creating build
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/cuda
gcc -pthread -B /home/pm/anaconda3/envs/PyTorchEMD/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/pm/anaconda3/envs/PyTorchEMD/lib/python3.6/site-packages/torch/include -I/home/pm/anaconda3/envs/PyTorchEMD/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/pm/anaconda3/envs/PyTorchEMD/lib/python3.6/site-packages/torch/include/TH -I/home/pm/anaconda3/envs/PyTorchEMD/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-9.0/include -I/home/pm/anaconda3/envs/PyTorchEMD/include/python3.6m -c cuda/emd.cpp -o build/temp.linux-x86_64-3.6/cuda/emd.o -g -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=emd_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda-9.0/bin/nvcc -I/home/pm/anaconda3/envs/PyTorchEMD/lib/python3.6/site-packages/torch/include -I/home/pm/anaconda3/envs/PyTorchEMD/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/pm/anaconda3/envs/PyTorchEMD/lib/python3.6/site-packages/torch/include/TH -I/home/pm/anaconda3/envs/PyTorchEMD/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-9.0/include -I/home/pm/anaconda3/envs/PyTorchEMD/include/python3.6m -c cuda/emd_kernel.cu -o build/temp.linux-x86_64-3.6/cuda/emd_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -O2 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=emd_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9220): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9231): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9244): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9255): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9268): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9279): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9292): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9303): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9316): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9327): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9340): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9352): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9365): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9376): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9389): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9401): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9410): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9419): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9428): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9437): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9445): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9454): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9463): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9472): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9481): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9490): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9499): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9508): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9517): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9526): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9535): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9544): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(55): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(63): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(73): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(81): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(91): error: argument of type "void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(100): error: argument of type "void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(109): error: argument of type "void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(117): error: argument of type "void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(127): error: argument of type "void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(136): error: argument of type "void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(145): error: argument of type "void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(153): error: argument of type "void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10799): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10811): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10823): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10835): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10847): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10859): error: argument of type "const void *" is incompatible with parameter of type "const float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10871): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10883): error: argument of type "const void *" is incompatible with parameter of type "const double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10895): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10907): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10919): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10931): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10943): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10955): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10967): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10979): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(10989): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11000): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11009): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11020): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11029): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11040): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11049): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11060): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11069): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11080): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11089): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11100): error: argument of type "void *" is incompatible with parameter of type "float *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11109): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11120): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11129): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11140): error: argument of type "void *" is incompatible with parameter of type "double *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11149): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11160): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11169): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11180): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11189): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11200): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11209): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11220): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11229): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11240): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11249): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11260): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11269): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11280): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11289): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11300): error: argument of type "void *" is incompatible with parameter of type "long long *"
/home/pm/anaconda3/envs/PyTorchEMD/lib/python3.6/site-packages/torch/include/ATen/cuda/NumericLimits.cuh(83): warning: calling a constexpr host function("from_bits") from a host device function("lowest") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/pm/anaconda3/envs/PyTorchEMD/lib/python3.6/site-packages/torch/include/ATen/cuda/NumericLimits.cuh(84): warning: calling a constexpr host function("from_bits") from a host device function("max") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/pm/anaconda3/envs/PyTorchEMD/lib/python3.6/site-packages/torch/include/ATen/cuda/NumericLimits.cuh(85): warning: calling a constexpr host function("from_bits") from a host device function("lower_bound") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/pm/anaconda3/envs/PyTorchEMD/lib/python3.6/site-packages/torch/include/ATen/cuda/NumericLimits.cuh(86): warning: calling a constexpr host function("from_bits") from a host device function("upper_bound") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
92 errors detected in the compilation of "/tmp/tmpxft_0000464e_00000000-6_emd_kernel.cpp1.ii".
error: command '/usr/local/cuda-9.0/bin/nvcc' failed with exit status 1
Hi,
Similar to some of the other issues posted, I'm getting a very large EMD. I divided by the number of points but ended up with an EMD of around 26 for a Chamfer Distance of around 0.1. I'm working with n = 22000 points.
In addition, if I use EMD as my loss and backpropagate, the loss ends up increasing, whereas it went down with Chamfer Distance. Any advice?
Thanks!
Hi, I get "RuntimeError: CUDA error: an illegal memory access was encountered" when I specify a GPU device other than device 0. For example, on a machine with more than one GPU, I want to use GPU 2 to train a model.
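One workaround that does not touch the extension, assuming the kernels only behave correctly on the default device, is to hide the other GPUs from the process so that the chosen card is seen as device 0. `CUDA_VISIBLE_DEVICES` is a standard CUDA environment variable; the index 2 is just the example from the question. A minimal sketch:

```python
import os

# Expose only physical GPU 2 to this process; inside the process it becomes cuda:0.
# Must run before the first CUDA call (safest: before `import torch`).
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

# import torch                      # would now see a single device
# device = torch.device("cuda:0")   # refers to physical GPU 2
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Calling `torch.cuda.set_device(2)` before using the extension may have a similar effect, but that is untested here.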
Sorry to bother you, but the installation failed for me.
These are the error messages; could you please help me find what is wrong?
Ubuntu 16.04, PyTorch 1.6.1, CUDA 9.0
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
1.10.1
g++ -pthread -shared -B /home/gy/anaconda3/envs/pvc/compiler_compat -L/home/gy/anaconda3/envs/pvc/lib -Wl,-rpath=/home/gy/anaconda3/envs/pvc/lib -Wl,--no-as-needed -Wl,--sysroot=/ /home/gy/pvc0922/PyTorchEMD/build/temp.linux-x86_64-3.8/cuda/emd.o /home/gy/pvc0922/PyTorchEMD/build/temp.linux-x86_64-3.8/cuda/emd_kernel.o -L/home/gy/anaconda3/envs/pvc/lib/python3.8/site-packages/torch/lib -L//usr/local/cuda-9.0/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-3.8/emd_cuda.cpython-38-x86_64-linux-gnu.so
g++: error: /home/gy/pvc0922/PyTorchEMD/build/temp.linux-x86_64-3.8/cuda/emd.o: No such file or directory
g++: error: /home/gy/pvc0922/PyTorchEMD/build/temp.linux-x86_64-3.8/cuda/emd_kernel.o: No such file or directory
error: command 'g++' failed with exit status 1
Hi,
I wanted to run the EMD; however, the value is extremely large. For example, the Chamfer Distance is about 0.07, but the EMD generated by this code is around 60. Is there any normalization that we should take care of?
Thanks!
Your implementation fails gradcheck.
import torch as T
from torch.autograd import gradcheck
from emd import earth_mover_distance  # wrapper defined in this repo's emd.py
x = T.rand(2, 4, 3).cuda().double().requires_grad_(True)
y = T.rand(2, 5, 3).cuda().double()
print(gradcheck(earth_mover_distance, (x, y)))
One bug is perhaps here and here. You probably want to cast a value to scalar_t before the division.
I am not familiar with CUDA, so I couldn't get any further. Do you have any idea how to solve this?
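For context, gradcheck compares the analytic gradient returned by backward against a numerical finite-difference estimate. A minimal pure-Python sketch of that idea on a toy loss (not the EMD kernel); all names here are hypothetical:

```python
def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at list x."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def sq_dist(x):                       # toy loss: squared distance to (1, 2, 3)
    return sum((xi - ti) ** 2 for xi, ti in zip(x, (1.0, 2.0, 3.0)))

def sq_dist_grad(x):                  # its hand-written analytic gradient
    return [2 * (xi - ti) for xi, ti in zip(x, (1.0, 2.0, 3.0))]

x = [0.5, 0.0, 4.0]
ok = all(abs(n - a) < 1e-4 for n, a in zip(numeric_grad(sq_dist, x), sq_dist_grad(x)))
print(ok)
```

A backward pass that disagrees with the finite-difference estimate beyond tolerance is what makes gradcheck fail, which is why it is usually run in double precision as in the snippet above.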
PyTorchEMD/cuda/emd_kernel.cu(181): error: identifier "AT_CHECK" is undefined when running setup
(base) server1@server1:/data/PythonCodes/XiaoYuan/PyTorchEMD$ python setup.py install
running install
running bdist_egg
running egg_info
creating emd_ext.egg-info
writing emd_ext.egg-info/PKG-INFO
writing dependency_links to emd_ext.egg-info/dependency_links.txt
writing top-level names to emd_ext.egg-info/top_level.txt
writing manifest file 'emd_ext.egg-info/SOURCES.txt'
reading manifest file 'emd_ext.egg-info/SOURCES.txt'
writing manifest file 'emd_ext.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py:782: UserWarning: The detected CUDA version (11.1) has a minor version mismatch with the version that was used to compile PyTorch (11.3). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
building 'emd_cuda' extension
creating /data/PythonCodes/XiaoYuan/PyTorchEMD/build
creating /data/PythonCodes/XiaoYuan/PyTorchEMD/build/temp.linux-x86_64-3.7
creating /data/PythonCodes/XiaoYuan/PyTorchEMD/build/temp.linux-x86_64-3.7/cuda
Emitting ninja build file /data/PythonCodes/XiaoYuan/PyTorchEMD/build/temp.linux-x86_64-3.7/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /home/server1/home/cuda11.1/bin/nvcc -I/home/server1/anaconda3/lib/python3.7/site-packages/torch/include -I/home/server1/anaconda3/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/server1/anaconda3/lib/python3.7/site-packages/torch/include/TH -I/home/server1/anaconda3/lib/python3.7/site-packages/torch/include/THC -I/home/server1/home/cuda11.1/include -I/home/server1/anaconda3/include/python3.7m -c -c /data/PythonCodes/XiaoYuan/PyTorchEMD/cuda/emd_kernel.cu -o /data/PythonCodes/XiaoYuan/PyTorchEMD/build/temp.linux-x86_64-3.7/cuda/emd_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=emd_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
FAILED: /data/PythonCodes/XiaoYuan/PyTorchEMD/build/temp.linux-x86_64-3.7/cuda/emd_kernel.o
/home/server1/home/cuda11.1/bin/nvcc -I/home/server1/anaconda3/lib/python3.7/site-packages/torch/include -I/home/server1/anaconda3/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/server1/anaconda3/lib/python3.7/site-packages/torch/include/TH -I/home/server1/anaconda3/lib/python3.7/site-packages/torch/include/THC -I/home/server1/home/cuda11.1/include -I/home/server1/anaconda3/include/python3.7m -c -c /data/PythonCodes/XiaoYuan/PyTorchEMD/cuda/emd_kernel.cu -o /data/PythonCodes/XiaoYuan/PyTorchEMD/build/temp.linux-x86_64-3.7/cuda/emd_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=emd_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
/data/PythonCodes/XiaoYuan/PyTorchEMD/cuda/emd_kernel.cu(181): error: identifier "AT_CHECK" is undefined
/data/PythonCodes/XiaoYuan/PyTorchEMD/cuda/emd_kernel.cu(268): error: identifier "AT_CHECK" is undefined
/data/PythonCodes/XiaoYuan/PyTorchEMD/cuda/emd_kernel.cu(385): error: identifier "AT_CHECK" is undefined
3 errors detected in the compilation of "/data/PythonCodes/XiaoYuan/PyTorchEMD/cuda/emd_kernel.cu".
[2/2] c++ -MMD -MF /data/PythonCodes/XiaoYuan/PyTorchEMD/build/temp.linux-x86_64-3.7/cuda/emd.o.d -pthread -B /home/server1/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/server1/anaconda3/lib/python3.7/site-packages/torch/include -I/home/server1/anaconda3/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/server1/anaconda3/lib/python3.7/site-packages/torch/include/TH -I/home/server1/anaconda3/lib/python3.7/site-packages/torch/include/THC -I/home/server1/home/cuda11.1/include -I/home/server1/anaconda3/include/python3.7m -c -c /data/PythonCodes/XiaoYuan/PyTorchEMD/cuda/emd.cpp -o /data/PythonCodes/XiaoYuan/PyTorchEMD/build/temp.linux-x86_64-3.7/cuda/emd.o -g -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=emd_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1723, in _run_ninja_build
env=env)
File "/home/server1/anaconda3/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "setup.py", line 26, in <module>
'build_ext': BuildExtension
File "/home/server1/anaconda3/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/home/server1/anaconda3/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/server1/anaconda3/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/home/server1/anaconda3/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/server1/anaconda3/lib/python3.7/site-packages/setuptools/command/install.py", line 67, in run
self.do_egg_install()
File "/home/server1/anaconda3/lib/python3.7/site-packages/setuptools/command/install.py", line 109, in do_egg_install
self.run_command('bdist_egg')
File "/home/server1/anaconda3/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/server1/anaconda3/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/server1/anaconda3/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 164, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "/home/server1/anaconda3/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 150, in call_command
self.run_command(cmdname)
File "/home/server1/anaconda3/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/server1/anaconda3/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/server1/anaconda3/lib/python3.7/site-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
File "/home/server1/anaconda3/lib/python3.7/distutils/command/install_lib.py", line 107, in build
self.run_command('build_ext')
File "/home/server1/anaconda3/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/server1/anaconda3/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/server1/anaconda3/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/home/server1/anaconda3/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/home/server1/anaconda3/lib/python3.7/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 735, in build_extensions
build_ext.build_extensions(self)
File "/home/server1/anaconda3/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/home/server1/anaconda3/lib/python3.7/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/home/server1/anaconda3/lib/python3.7/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/home/server1/anaconda3/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
_build_ext.build_extension(self, ext)
File "/home/server1/anaconda3/lib/python3.7/distutils/command/build_ext.py", line 534, in build_extension
depends=ext.depends)
File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 565, in unix_wrap_ninja_compile
with_cuda=with_cuda)
File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1404, in _write_ninja_file_and_compile_objects
error_prefix='Error compiling objects for extension')
File "/home/server1/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
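The `AT_CHECK` macro was removed in newer PyTorch releases in favor of `TORCH_CHECK`, so a common fix for this kind of build error is to rename the macro in the CUDA source. A minimal sketch of that rename as a text substitution; `replace_at_check` is a hypothetical helper, and the sample line is illustrative rather than copied from the repo:

```python
import re

def replace_at_check(source):
    """Replace the removed AT_CHECK macro with TORCH_CHECK in C++/CUDA source text."""
    return re.sub(r"\bAT_CHECK\b", "TORCH_CHECK", source)

# Illustrative line only, not taken from emd_kernel.cu.
line = 'AT_CHECK(xyz1.is_cuda(), "xyz1 must be a CUDA tensor");'
print(replace_at_check(line))
```

To apply it to the file named in the log, one could read cuda/emd_kernel.cu with `pathlib.Path.read_text()`, pass the text through the helper, write it back with `write_text()`, and then rerun `python setup.py install`.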
Is it hard to port the CUDA code to Metal so that Apple M1 machines can also use it?