
Comments (20)

MultiPath commented on June 28, 2024

It seems that the code was not compiled successfully. Could you post your system, CUDA version, PyTorch version, etc.?

from nsvf.

NNNNAI commented on June 28, 2024

Thanks for your reply! The configuration is shown below:
Driver Version: 440.44, CUDA Version: 10.2, PyTorch: 1.4.0, GPU: RTX 2080 Ti 12G

(NSVF) gdp@gdp:~/harddisk/Data4/lny/NSVF$ CUDA_VISIBLE_DEVICES=2 python -u train.py /home/gdp/harddisk/Data4/lny/NSVFDATASET/Synthetic_NSVF/Robot \
--user-dir fairnr \
--task single_object_rendering \
--train-views "0..100" --view-resolution "800x800" \
--max-sentences 1 --view-per-batch 4 --pixel-per-view 2048 \
--no-preload \
--sampling-on-mask 1.0 --no-sampling-at-reader \
--valid-views "100..200" --valid-view-resolution "400x400" \
--valid-view-per-batch 1 \
--transparent-background "1.0,1.0,1.0" --background-stop-gradient \
--arch nsvf_base \
--initial-boundingbox /home/gdp/harddisk/Data4/lny/NSVFDATASET/Synthetic_NSVF/Robot/bbox.txt \
--use-octree \
--raymarching-stepsize-ratio 0.125 \
--discrete-regularization \
--color-weight 128.0 --alpha-weight 1.0 \
--optimizer "adam" --adam-betas "(0.9, 0.999)" \
--lr 0.001 --lr-scheduler "polynomial_decay" --total-num-update 150000 \
--criterion "srn_loss" --clip-norm 0.0 \
--num-workers 0 \
--seed 2 \
--save-interval-updates 500 --max-update 150000 \
--virtual-epoch-steps 5000 --save-interval 1 \
--half-voxel-size-at "5000,25000,75000" \
--reduce-step-size-at "5000,25000,75000" \
--pruning-every-steps 2500 \
--keep-interval-updates 5 --keep-last-epochs 5 \
--log-format simple --log-interval 1 \
--save-dir checkpoints/robot \
--tensorboard-logdir checkpoints/robot/tensorboard \
| tee -a checkpoints/robot/train.log


MultiPath commented on June 28, 2024

I don't think PyTorch 1.4.0 was ever compiled with CUDA 10.2. The CUDA version used to compile this code should match the CUDA version PyTorch was built with. You can check the latter with python -c "import torch; print(torch.version.cuda)".
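
For anyone who prefers a script to the one-liner, here is a minimal sketch of the same check. Besides `torch.version.cuda` it reports `CUDA_HOME` from `torch.utils.cpp_extension`, the toolkit location that PyTorch's extension builder resolves; this is only relevant on the assumption that the project's setup.py builds through that module. The function name `describe_torch_cuda` is made up for illustration.

```python
def describe_torch_cuda():
    """Return PyTorch's CUDA build info, or None if PyTorch is not installed."""
    try:
        import torch
        from torch.utils.cpp_extension import CUDA_HOME
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        # CUDA release the installed PyTorch binary was built with
        "torch_cuda": torch.version.cuda,
        # Toolkit directory the extension builder resolved (None if not found)
        "cuda_home": CUDA_HOME,
    }


if __name__ == "__main__":
    print(describe_torch_cuda())
```

If `torch_cuda` and the toolkit under `cuda_home` report different releases, that is worth resolving before rebuilding.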


NNNNAI commented on June 28, 2024

Sorry for the mistake. The CUDA version is exactly 10.1.


MultiPath commented on June 28, 2024

OK. Just in case, did you run python setup.py build_ext --inplace, and did everything complete without errors?


NNNNAI commented on June 28, 2024

(NSVF) gdp@gdp:~/harddisk/Data4/lny/NSVF$ python setup.py build_ext --inplace
running build_ext
copying build/lib.linux-x86_64-3.7/fairnr/clib/_ext.cpython-37m-x86_64-linux-gnu.so -> fairnr/clib


MultiPath commented on June 28, 2024

To force a recompile, I think you need to delete the build folder under NSVF first.
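
A small sketch of that cleanup step in Python, under stated assumptions: the repository layout matches the paths in the build output in this thread (`build/` at the root and the copied `_ext*.so` under `fairnr/clib`). Removing the copied `.so` as well goes beyond what is suggested above and is optional; the helper name `clean_extension_build` is hypothetical.

```python
import shutil
from pathlib import Path


def clean_extension_build(repo_root, also_remove_lib=True):
    """Delete build/ (and optionally the copied _ext*.so) under repo_root.

    Returns the list of paths that were removed.
    """
    root = Path(repo_root)
    removed = []

    build_dir = root / "build"
    if build_dir.is_dir():
        shutil.rmtree(build_dir)
        removed.append(build_dir)

    # Assumption: the in-place build copies the library into fairnr/clib.
    clib_dir = root / "fairnr" / "clib"
    if also_remove_lib and clib_dir.is_dir():
        for lib in sorted(clib_dir.glob("_ext*.so")):
            lib.unlink()
            removed.append(lib)

    return removed
```

After cleaning, rerun python setup.py build_ext --inplace from the repository root so that every source file is compiled again.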


NNNNAI commented on June 28, 2024

I got the following output after deleting the build folder under NSVF and running python setup.py build_ext --inplace.

running build_ext
building 'fairnr.clib.ext' extension
creating build
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/fairnr
creating build/temp.linux-x86_64-3.7/fairnr/clib
creating build/temp.linux-x86_64-3.7/fairnr/clib/src
gcc -pthread -B /home/gdp/.conda/envs/NSVF/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include -I/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include/TH -I/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/home/gdp/.conda/envs/NSVF/include/python3.7m -c fairnr/clib/src/binding.cpp -o build/temp.linux-x86_64-3.7/fairnr/clib/src/binding.o -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=ext -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
gcc -pthread -B /home/gdp/.conda/envs/NSVF/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include -I/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include/TH -I/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/home/gdp/.conda/envs/NSVF/include/python3.7m -c fairnr/clib/src/intersect.cpp -o build/temp.linux-x86_64-3.7/fairnr/clib/src/intersect.o -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=ext -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
gcc -pthread -B /home/gdp/.conda/envs/NSVF/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include -I/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include/TH -I/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/home/gdp/.conda/envs/NSVF/include/python3.7m -c fairnr/clib/src/octree.cpp -o build/temp.linux-x86_64-3.7/fairnr/clib/src/octree.o -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=ext -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda-10.0/bin/nvcc -I/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include -I/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include/TH -I/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/home/gdp/.conda/envs/NSVF/include/python3.7m -c fairnr/clib/src/intersect_gpu.cu -o build/temp.linux-x86_64-3.7/fairnr/clib/src/intersect_gpu.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O2 -Ifairnr/clib/include -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75 -std=c++11
/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

/home/gdp/.conda/envs/NSVF/lib/python3.7/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/fairnr
creating build/lib.linux-x86_64-3.7/fairnr/clib
g++ -pthread -shared -B /home/gdp/.conda/envs/NSVF/compiler_compat -L/home/gdp/.conda/envs/NSVF/lib -Wl,-rpath=/home/gdp/.conda/envs/NSVF/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/fairnr/clib/src/binding.o build/temp.linux-x86_64-3.7/fairnr/clib/src/intersect.o build/temp.linux-x86_64-3.7/fairnr/clib/src/octree.o build/temp.linux-x86_64-3.7/fairnr/clib/src/intersect_gpu.o -L/usr/local/cuda-10.0/lib64 -lcudart -o build/lib.linux-x86_64-3.7/fairnr/clib/_ext.cpython-37m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.7/fairnr/clib/_ext.cpython-37m-x86_64-linux-gnu.so -> fairnr/clib


MultiPath commented on June 28, 2024

It looks OK. I tried this data on my side just now and it did not show errors. Could you try reducing --view-per-batch to 1 to see whether the failure is caused by running out of memory?


NNNNAI commented on June 28, 2024

I still got 'CUDA kernel failed : invalid device function' even after reducing --view-per-batch to 1. It does not seem to be an OOM issue.


MultiPath commented on June 28, 2024

Do you have the full log file?


MultiPath commented on June 28, 2024

Also, could you try removing "--use-octree" and see whether the other method works?


NNNNAI commented on June 28, 2024

train.log
Here is the full log file. I also removed "--use-octree" and ran the code again, but I still got the same error.
The error report (after removing "--use-octree") is shown below:
2020-10-21 13:52:27 | INFO | fairseq.utils | CUDA enviroments for all 1 workers
2020-10-21 13:52:27 | INFO | fairnr_cli.train | training on 1 GPUs
2020-10-21 13:52:27 | INFO | fairnr_cli.train | max tokens per GPU = None and max sentences per GPU = 1
2020-10-21 13:52:27 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/robot/checkpoint_last.pt
2020-10-21 13:52:27 | INFO | fairseq.trainer | loading train data for epoch 1
2020-10-21 13:52:27 | INFO | fairseq.trainer | NOTE: your device may support faster training with --fp16
CUDA kernel failed : invalid device function
void aabb_intersect_point_kernel_wrapper(int, int, int, float, int, const float*, const float*, const float*, int*, float*, float*) at L:371 in fairnr/clib/src/intersect_gpu.cu


MultiPath commented on June 28, 2024

I searched for your error a bit; it seems to be a CUDA or GPU configuration issue, although I don't know exactly what caused it.
What does nvcc --version show?


NNNNAI commented on June 28, 2024

It shows:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130


MultiPath commented on June 28, 2024

So the CUDA version of your nvcc is 10.0 instead of 10.1?
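
The build output posted earlier in the thread points the same way: it invokes /usr/local/cuda-10.0/bin/nvcc and links against /usr/local/cuda-10.0/lib64. As a rough sketch, a saved build log can be scanned for such paths. This assumes the toolkit lives in a versioned directory of the form `cuda-X.Y`, which is not guaranteed on every system, and the function name is hypothetical.

```python
import re


def cuda_versions_in_log(log_text):
    """Return the sorted, de-duplicated versions found in paths like /usr/local/cuda-10.0/."""
    # Assumption: the toolkit directory is named cuda-<major>.<minor>.
    return sorted(set(re.findall(r"/cuda-(\d+\.\d+)/", log_text)))


sample = (
    "/usr/local/cuda-10.0/bin/nvcc -I/usr/local/cuda-10.0/include "
    "-c fairnr/clib/src/intersect_gpu.cu\n"
    "g++ -shared -L/usr/local/cuda-10.0/lib64 -lcudart"
)
print(cuda_versions_in_log(sample))
```

More than one version in the result, or a single version that differs from the one PyTorch reports, suggests the extension was built with a different toolkit than PyTorch expects.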


NNNNAI commented on June 28, 2024

Yes, nvcc --version indicates that CUDA is 10.0. But when I ran python -c "import torch; print(torch.version.cuda)", it printed 10.1.


MultiPath commented on June 28, 2024

python -c "import torch; print(torch.version.cuda)" shows your PyTorch CUDA version: it was compiled with CUDA 10.1. Your machine also needs CUDA 10.1 installed so that the toolkit matches PyTorch's version.
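
The comparison described here can be sketched as a small check that parses the text printed by nvcc --version and compares its major.minor release with the string from `torch.version.cuda`. The helper names are illustrative only, and the parsing assumes the "release X.Y" wording seen in the nvcc output quoted in this thread.

```python
import re


def nvcc_release(nvcc_output):
    """Extract (major, minor) from 'nvcc --version' text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def major_minor(version):
    """Turn a version string such as '10.1' or '10.1.243' into (10, 1)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def toolkit_matches_torch(nvcc_output, torch_cuda):
    """True when nvcc and PyTorch report the same CUDA major.minor release."""
    release = nvcc_release(nvcc_output)
    return release is not None and release == major_minor(torch_cuda)


# Output quoted in this thread, compared against PyTorch's reported 10.1.
thread_output = "Cuda compilation tools, release 10.0, V10.0.130"
print(toolkit_matches_torch(thread_output, "10.1"))
```

In practice the two inputs would come from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout` and `torch.version.cuda`.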


NNNNAI commented on June 28, 2024

Sorry for the late reply. The problem was solved after I installed CUDA 10.1 on my machine. Thanks for your help!


MultiPath commented on June 28, 2024

Glad to see it was solved!

