Comments (15)
Sorry, but we modified the official Caffe for our project, so it relies on OpenMPI if you want to use multiple GPUs (we use multiple GPUs only for testing, not for training).
We also highly recommend installing cuDNN v5. After downloading and extracting it, replace /path/to/cudnn in the cmake command with your own directory path. For example, if you copy the cuDNN files to /usr/local/cuda, the cmake command should be
cmake .. -DUSE_MPI=ON -DCUDNN_INCLUDE=/usr/local/cuda/include -DCUDNN_LIBRARY=/usr/local/cuda/lib64/libcudnn.so
from person_search.
Thanks. My environment is a single server with 4 GPUs. Can I still use OpenMPI?
from person_search.
Sure. You can change these two lines to
mpirun -n 4 python2 tools/eval_test.py \
--gpu 0,1,2,3 \
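For other GPU counts, the same pattern can be generated instead of edited by hand. Below is a minimal sketch in Python, assuming one MPI process per GPU as in the command above. `build_eval_command` is a hypothetical helper, not part of person_search, and the remaining arguments of eval_test.py (cut off above) still have to be appended by the caller. With a single GPU it drops the mpirun prefix, following the README's single-GPU advice.

```python
def build_eval_command(gpu_ids):
    """Return the leading argv of an evaluation run for the given GPU ids.

    Hypothetical helper: it mirrors
    `mpirun -n 4 python2 tools/eval_test.py --gpu 0,1,2,3`
    and omits mpirun when only one GPU is requested.
    """
    gpu_ids = list(gpu_ids)
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    gpu_arg = ",".join(str(g) for g in gpu_ids)
    cmd = ["python2", "tools/eval_test.py", "--gpu", gpu_arg]
    if len(gpu_ids) > 1:
        # One MPI process per GPU.
        cmd = ["mpirun", "-n", str(len(gpu_ids))] + cmd
    return cmd


if __name__ == "__main__":
    print(" ".join(build_eval_command(range(4))))
```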
from person_search.
Um, thanks.
"boost >= 1.55 (A tip for Ubuntu 14.04: sudo apt-get autoremove libboost1.54* then sudo apt-get install libboost1.55-all-dev)"
Must it be >= 1.55?
from person_search.
Yes. It should be >= 1.55.
from person_search.
jxd@amax-1080:~/person_search-master$ experiments/scripts/eval_test.sh resnet50 50000 resnet50
[amax-1080:00334] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[amax-1080:00334] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[amax-1080:00334] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[amax-1080:00334] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[amax-1080:00334] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_crs_none: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.
Host: amax-1080
Framework: crs
Component: none
[amax-1080:00334] *** Process received signal ***
[amax-1080:00334] Signal: Segmentation fault (11)
[amax-1080:00334] Signal code: Address not mapped (1)
[amax-1080:00334] Failing at address: 0x28
[amax-1080:00334] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330) [0x7f65b2b0a330]
[amax-1080:00334] [ 1] /usr/lib/libmpi.so.1(mca_base_select+0x11e) [0x7f652bf16f1e]
[amax-1080:00334] [ 2] /usr/lib/libmpi.so.1(opal_crs_base_select+0x7e) [0x7f652beff28e]
[amax-1080:00334] [ 3] /usr/lib/libmpi.so.1(opal_cr_init+0x3fc) [0x7f652bf1ff1c]
[amax-1080:00334] [ 4] /usr/lib/libmpi.so.1(opal_init+0x1d0) [0x7f652bf28810]
[amax-1080:00334] [ 5] /usr/lib/libmpi.so.1(orte_init+0x37) [0x7f652beb86e7]
[amax-1080:00334] [ 6] /usr/lib/libmpi.so.1(ompi_mpi_init+0x174) [0x7f652be78024]
[amax-1080:00334] [ 7] /usr/lib/libmpi.so.1(PMPI_Init_thread+0xd4) [0x7f652be8f7f4]
[amax-1080:00334] [ 8] /usr/local/lib/python2.7/dist-packages/mpi4py/MPI.so(initMPI+0x4716) [0x7f652c27d0a6]
[amax-1080:00334] [ 9] python2(_PyImport_LoadDynamicModule+0x9b) [0x427992]
[amax-1080:00334] [10] python2() [0x55642f]
[amax-1080:00334] [11] python2() [0x4e2dec]
[amax-1080:00334] [12] python2() [0x556cf1]
[amax-1080:00334] [13] python2() [0x569c08]
[amax-1080:00334] [14] python2(PyEval_CallObjectWithKeywords+0x6b) [0x4c8c8b]
[amax-1080:00334] [15] python2(PyEval_EvalFrameEx+0x2958) [0x5264a8]
[amax-1080:00334] [16] python2() [0x567d14]
[amax-1080:00334] [17] python2(PyRun_FileExFlags+0x92) [0x465bf4]
[amax-1080:00334] [18] python2(PyRun_SimpleFileExFlags+0x2ee) [0x46612d]
[amax-1080:00334] [19] python2(Py_Main+0xb5e) [0x466d92]
[amax-1080:00334] [20] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f65b2756f45]
[amax-1080:00334] [21] python2() [0x577c2e]
[amax-1080:00334] *** End of error message ***
jxd@amax-1080:~/person_search-master$
I did not do the pretraining and directly use the trained model.
As you say, the test can run without MPI, so I followed "use only one GPU, remove the mpirun -n 8 in L14 and change L16 to --gpu 0", but it shows the error above. How can I solve it? Thanks.
In addition, when I use MPI as you advised, it shows the same kind of errors.
from person_search.
It seems that you have different versions of OpenMPI installed. Say you compiled OpenMPI and installed it into a local directory such as /home/jxd/openmpi. Then please add the following lines to your ~/.bashrc:
export PATH=/home/jxd/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/home/jxd/openmpi/lib:$LD_LIBRARY_PATH
Restart the terminal, run rm -rf build, and compile Caffe again.
from person_search.
Hello, I have successfully installed OpenMPI and verified that it works. I then ran cmake for Caffe successfully, but the problem above remains. I also tried training, and it hits the same problem.
Thanks!
jxd@amax-1080:~/person_search-master$ experiments/scripts/train.sh 0 --set EXP_DIR resnet50
+ set -e
+ export PYTHONUNBUFFERED=True
+ PYTHONUNBUFFERED=True
+ GPU_ID=0
+ NET=resnet50
+ DATASET=psdb
+ array=($@)
+ len=4
+ EXTRA_ARGS='--set EXP_DIR resnet50'
+ EXTRA_ARGS_SLUG=--set_EXP_DIR_resnet50
+ case $DATASET in
+ TRAIN_IMDB=psdb_train
+ TEST_IMDB=psdb_test
+ PT_DIR=psdb
+ ITERS=50000
++ date +%Y-%m-%d_%H-%M-%S
+ LOG=experiments/logs/psdb_train_resnet50_--set_EXP_DIR_resnet50.txt.2017-03-08_08-49-53
+ exec
++ tee -a experiments/logs/psdb_train_resnet50_--set_EXP_DIR_resnet50.txt.2017-03-08_08-49-53
+ echo Logging output to experiments/logs/psdb_train_resnet50_--set_EXP_DIR_resnet50.txt.2017-03-08_08-49-53
Logging output to experiments/logs/psdb_train_resnet50_--set_EXP_DIR_resnet50.txt.2017-03-08_08-49-53
+ python2 tools/train_net.py --gpu 0 --solver models/psdb/resnet50/solver.prototxt --weights data/imagenet_models/resnet50.caffemodel --imdb psdb_train --iters 50000 --cfg experiments/cfgs/resnet50.yml --rand --set EXP_DIR resnet50
[amax-1080:22914] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[amax-1080:22914] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[amax-1080:22914] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[amax-1080:22914] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[amax-1080:22914] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_crs_none: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.
Host: amax-1080
Framework: crs
Component: none
[amax-1080:22914] *** Process received signal ***
[amax-1080:22914] Signal: Segmentation fault (11)
[amax-1080:22914] Signal code: Address not mapped (1)
[amax-1080:22914] Failing at address: 0x28
[amax-1080:22914] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330) [0x7f0a35507330]
[amax-1080:22914] [ 1] /usr/lib/libmpi.so.1(mca_base_select+0x11e) [0x7f09acb8bf1e]
[amax-1080:22914] [ 2] /usr/lib/libmpi.so.1(opal_crs_base_select+0x7e) [0x7f09acb7428e]
[amax-1080:22914] [ 3] /usr/lib/libmpi.so.1(opal_cr_init+0x3fc) [0x7f09acb94f1c]
[amax-1080:22914] [ 4] /usr/lib/libmpi.so.1(opal_init+0x1d0) [0x7f09acb9d810]
[amax-1080:22914] [ 5] /usr/lib/libmpi.so.1(orte_init+0x37) [0x7f09acb2d6e7]
[amax-1080:22914] [ 6] /usr/lib/libmpi.so.1(ompi_mpi_init+0x174) [0x7f09acaed024]
[amax-1080:22914] [ 7] /usr/lib/libmpi.so.1(PMPI_Init_thread+0xd4) [0x7f09acb047f4]
[amax-1080:22914] [ 8] /usr/local/lib/python2.7/dist-packages/mpi4py/MPI.so(initMPI+0x4716) [0x7f09acef20a6]
[amax-1080:22914] [ 9] python2(_PyImport_LoadDynamicModule+0x9b) [0x427992]
[amax-1080:22914] [10] python2() [0x55642f]
[amax-1080:22914] [11] python2() [0x4e2dec]
[amax-1080:22914] [12] python2() [0x556cf1]
[amax-1080:22914] [13] python2() [0x569c08]
[amax-1080:22914] [14] python2(PyEval_CallObjectWithKeywords+0x6b) [0x4c8c8b]
[amax-1080:22914] [15] python2(PyEval_EvalFrameEx+0x2958) [0x5264a8]
[amax-1080:22914] [16] python2() [0x567d14]
[amax-1080:22914] [17] python2(PyRun_FileExFlags+0x92) [0x465bf4]
[amax-1080:22914] [18] python2(PyRun_SimpleFileExFlags+0x2ee) [0x46612d]
[amax-1080:22914] [19] python2(Py_Main+0xb5e) [0x466d92]
[amax-1080:22914] [20] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f0a35153f45]
[amax-1080:22914] [21] python2() [0x577c2e]
[amax-1080:22914] *** End of error message ***
experiments/scripts/train.sh: line 47: 22914 Segmentation fault (core dumped) python2 tools/train_net.py --gpu ${GPU_ID} --solver models/${PT_DIR}/${NET}/solver.prototxt --weights data/imagenet_models/${NET}.caffemodel --imdb ${TRAIN_IMDB} --iters ${ITERS} --cfg experiments/cfgs/${NET}.yml --rand ${EXTRA_ARGS}
from person_search.
Could you please check the output of the following commands:
which mpirun
ldd $(which mpirun) | grep mpi
ldd caffe/build/install/bin/caffe | grep mpi
from person_search.
Yeah, maybe Caffe was not built successfully, since the last command returns no information about it?
ldd: caffe/build/install/bin/caffe: No such file or directory
jxd@amax-1080:~$ which mpirun
/usr/local/openmpi/bin/mpirun
jxd@amax-1080:~$ ldd $(which mpirun) | grep mpi
libopen-rte.so.12 => /usr/local/openmpi/lib/libopen-rte.so.12 (0x00007f75c7edc000)
libopen-pal.so.13 => /usr/local/openmpi/lib/libopen-pal.so.13 (0x00007f75c7bfe000)
jxd@amax-1080:~$ ldd caffe/build/install/bin/caffe | grep mpi
ldd: caffe/build/install/bin/caffe: No such file or directory
from person_search.
OK. You have another self-compiled OpenMPI installed at /usr/local/openmpi, so you need to add these lines to ~/.bashrc:
export PATH=/usr/local/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
Restart the terminal, remove the build directory under caffe, and recompile it following the steps in the README file.
from person_search.
Yes, I added these lines to ~/.bashrc and recompiled yesterday. Are there two OpenMPI installations on the system?
Now I will remove the build directory and recompile again. Thanks.
from person_search.
Right. Your previous log complains:
mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
So you have a system-installed OpenMPI under /usr/lib and a self-installed one at /usr/local/openmpi.
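One way to confirm such a mix-up is to compare the directories that the Open MPI libraries resolve to in the ldd output of different binaries, for example mpirun and the mpi4py extension named in the stack trace. Below is a minimal sketch in Python. `mpi_library_dirs` is a hypothetical helper, and the sample text is the ldd output for mpirun posted earlier in this thread. If two binaries yield different directory sets, they are probably linked against different OpenMPI installations.

```python
import posixpath
import re

# Matches ldd lines such as:
#   libopen-rte.so.12 => /usr/local/openmpi/lib/libopen-rte.so.12 (0x...)
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(/\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")
# Library names that belong to Open MPI (libmpi, libopen-rte, libopen-pal).
_MPI_NAME = re.compile(r"^lib(mpi|open-rte|open-pal)[._]")


def mpi_library_dirs(ldd_output):
    """Return the set of directories that Open MPI libraries resolve to."""
    dirs = set()
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        if match and _MPI_NAME.match(match.group(1)):
            dirs.add(posixpath.dirname(match.group(2)))
    return dirs


if __name__ == "__main__":
    mpirun_ldd = (
        "libopen-rte.so.12 => /usr/local/openmpi/lib/libopen-rte.so.12 (0x00007f75c7edc000)\n"
        "libopen-pal.so.13 => /usr/local/openmpi/lib/libopen-pal.so.13 (0x00007f75c7bfe000)\n"
    )
    print(sorted(mpi_library_dirs(mpirun_ldd)))
```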
from person_search.
Thanks a lot!
I found the issue. I added this line to ~/.bashrc:
export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so
all detection:
recall = 79.37%
ap = 74.82%
labeled only detection:
recall = 97.76%
search ranking:
mAP = 75.41%
top- 1 = 78.48%
top- 5 = 90.07%
top-10 = 92.34%
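To double-check that an LD_PRELOAD setting like the one above takes effect, the memory map of the Python process can be inspected after mpi4py has been imported. Below is a minimal sketch in Python. `mapped_mpi_libraries` is a hypothetical helper, and reading /proc/self/maps assumes Linux. If the only path reported lies under /usr/local/openmpi/lib, the system copy in /usr/lib is no longer being loaded.

```python
import os


def mapped_mpi_libraries(maps_text):
    """Return sorted paths of libmpi shared objects in a /proc/<pid>/maps dump."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            name = path.rsplit("/", 1)[-1]
            if name.startswith("libmpi.so"):
                paths.add(path)
    return sorted(paths)


if __name__ == "__main__":
    # In real use, run `from mpi4py import MPI` first so that libmpi is
    # loaded; it is left out here so the sketch runs without mpi4py.
    if os.path.exists("/proc/self/maps"):
        with open("/proc/self/maps") as maps_file:
            print(mapped_mpi_libraries(maps_file.read()))
```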
from person_search.
Good to hear that! Will close the issue for now, and please feel free to reopen it if there are further problems.
from person_search.