Comments (12)
Did you modify the code? For the first training iteration, the output should look something like this:
I1113 15:51:24.800622 32170 solver.cpp:240] Iteration 0, loss = 6.22973
I1113 15:51:24.800657 32170 solver.cpp:255] Train net output #0: det_accuracy = 0.078125
I1113 15:51:24.800668 32170 solver.cpp:255] Train net output #1: det_loss = 0.706399 (* 1 = 0.706399 loss)
I1113 15:51:24.800671 32170 solver.cpp:255] Train net output #2: id_accuracy = 0
I1113 15:51:24.800676 32170 solver.cpp:255] Train net output #3: id_loss = 9.26615 (* 1 = 9.26615 loss)
I1113 15:51:24.800681 32170 solver.cpp:255] Train net output #4: loss_bbox = 1.04062e-05 (* 1 = 1.04062e-05 loss)
I1113 15:51:24.800685 32170 solver.cpp:255] Train net output #5: rpn_bbox_loss = 0.188907 (* 1 = 0.188907 loss)
I1113 15:51:24.800689 32170 solver.cpp:255] Train net output #6: rpn_cls_loss = 0.693245 (* 1 = 0.693245 loss)
I1113 15:51:24.800700 32170 solver.cpp:640] Iteration 0, lr = 0.001
from person_search.
I have not modified the code. Should I modify it?
No, that won't be necessary. Running the training script directly should be fine. Could you please provide a full training log (by uploading it to BaiduYun / GoogleDrive / Dropbox) so that I can analyze it further?
Also could you please evaluate our trained model by following the instructions in the README, to see if it works properly?
Yes, I can evaluate your trained model, and there is no error.
The training log is here: https://drive.google.com/file/d/0Bz7UoqmY26NkeWphcnZYckNKUU0/view?usp=sharing
That's quite weird. Could you please:
- Remove this line of randomness
- Run the training script with a specified random seed:
experiments/scripts/train.sh 0 --set EXP_DIR resnet50 RNG_SEED 1
On my machine, this consistently produces the following loss at iteration 0:
I0412 10:00:41.251739 29112 solver.cpp:240] Iteration 0, loss = 11.4016
I0412 10:00:41.251796 29112 solver.cpp:255] Train net output #0: det_accuracy = 0.804688
I0412 10:00:41.251809 29112 solver.cpp:255] Train net output #1: det_loss = 0.681872 (* 1 = 0.681872 loss)
I0412 10:00:41.251818 29112 solver.cpp:255] Train net output #2: id_accuracy = 0
I0412 10:00:41.251827 29112 solver.cpp:255] Train net output #3: id_loss = 9.40343 (* 1 = 9.40343 loss)
I0412 10:00:41.251835 29112 solver.cpp:255] Train net output #4: loss_bbox = 0.522466 (* 1 = 0.522466 loss)
I0412 10:00:41.251844 29112 solver.cpp:255] Train net output #5: rpn_bbox_loss = 0.123584 (* 1 = 0.123584 loss)
I0412 10:00:41.251876 29112 solver.cpp:255] Train net output #6: rpn_cls_loss = 0.693231 (* 1 = 0.693231 loss)
I0412 10:00:41.251895 29112 solver.cpp:640] Iteration 0, lr = 0.001
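Fixing RNG_SEED makes iteration 0 reproducible because every random choice (image shuffling, RoI sampling) is then drawn from the same pseudo-random stream on every run. Below is a minimal, stdlib-only sketch of that idea. sample_roi_indices is a hypothetical stand-in for minibatch sampling and is not code from the repository. The real training script may seed numpy and Caffe rather than Python's random module.

```python
import random


def sample_roi_indices(num_proposals, batch_size, seed=None):
    """Hypothetical stand-in for RoI minibatch sampling (not the repo's code).

    With seed=None each call draws a different subset, like leaving the
    randomness in place; with a fixed seed the subset is identical every run.
    """
    rng = random.Random(seed)  # independent generator, seeded when seed is given
    return sorted(rng.sample(range(num_proposals), batch_size))


if __name__ == "__main__":
    run_a = sample_roi_indices(2000, 128, seed=1)
    run_b = sample_roi_indices(2000, 128, seed=1)
    # Same seed -> same sampled RoIs, so (assuming the rest of the pipeline is
    # deterministic) the iteration-0 losses should match across runs.
    print(run_a == run_b)  # True
```

Leaving the seed unset corresponds to keeping the randomness. In that case two runs sample different RoIs, and their iteration-0 losses are not directly comparable.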
Sorry, when I first ran the training script without any modifications, there was an error:
experiments/scripts/train.sh 0 --set EXP_DIR resnet50
Normalizing targets
done
Traceback (most recent call last):
  File "tools/train_net.py", line 130, in <module>
    max_iters=args.max_iters)
  File "/home/cy/PycharmProjects/person_search-master/tools/../lib/fast_rcnn/train.py", line 121, in train_net
    pretrained_model=pretrained_model)
  File "/home/cy/PycharmProjects/person_search-master/tools/../lib/fast_rcnn/train.py", line 50, in __init__
    pb2.text_format.Merge(f.read(), self.solver_param)
AttributeError: 'module' object has no attribute 'text_format'
After some googling, I solved it by adding import google.protobuf.text_format to /lib/fast_rcnn/train.py.
After that, I got the NaN loss error.
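The AttributeError above arises because importing a Python package does not automatically import its submodules. pb2.text_format only exists as an attribute once something has executed import google.protobuf.text_format, and that is exactly what the added line does. Below is a minimal sketch of the patched pattern. It assumes pb2 is bound via import google.protobuf as pb2, which the traceback's pb2.text_format.Merge call suggests but does not show. It uses protobuf's own FileDescriptorProto as a stand-in for Caffe's solver parameter message so that it runs without Caffe. parse_text_proto is a hypothetical helper name.

```python
import google.protobuf as pb2
# Explicitly importing the submodule also sets the attribute pb2.text_format.
import google.protobuf.text_format
from google.protobuf import descriptor_pb2


def parse_text_proto(text, message):
    """Hypothetical helper: merge a text-format protobuf string into `message`.

    In train.py the message would be the solver parameter read from the solver
    prototxt; FileDescriptorProto is only a stand-in so this runs without Caffe.
    """
    pb2.text_format.Merge(text, message)
    return message


if __name__ == "__main__":
    msg = parse_text_proto('name: "solver_demo.proto" package: "demo"',
                           descriptor_pb2.FileDescriptorProto())
    print(msg.name, msg.package)  # solver_demo.proto demo
```

An equivalent, arguably clearer alternative is from google.protobuf import text_format followed by text_format.Merge(...).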
I have now followed steps 1 and 2 as you suggested, and I still get the NaN loss:
I0412 11:58:24.734537 15281 solver.cpp:240] Iteration 0, loss = 45.384
I0412 11:58:24.734563 15281 solver.cpp:255] Train net output #0: det_accuracy = 0.03125
I0412 11:58:24.734571 15281 solver.cpp:255] Train net output #1: det_loss = 0.693147 (* 1 = 0.693147 loss)
I0412 11:58:24.734575 15281 solver.cpp:255] Train net output #2: id_accuracy = -nan
I0412 11:58:24.734578 15281 solver.cpp:255] Train net output #3: id_loss = 0 (* 1 = 0 loss)
I0412 11:58:24.734582 15281 solver.cpp:255] Train net output #4: loss_bbox = 0.0592934 (* 1 = 0.0592934 loss)
I0412 11:58:24.734586 15281 solver.cpp:255] Train net output #5: rpn_bbox_loss = 0.00123454 (* 1 = 0.00123454 loss)
I0412 11:58:24.734591 15281 solver.cpp:255] Train net output #6: rpn_cls_loss = 0.693147 (* 1 = 0.693147 loss)
I0412 11:58:24.734596 15281 solver.cpp:640] Iteration 0, lr = 0.001
I0412 11:58:48.375877 15281 solver.cpp:240] Iteration 20, loss = nan
I0412 11:58:48.376101 15281 solver.cpp:255] Train net output #0: det_accuracy = 0.929688
I0412 11:58:48.376142 15281 solver.cpp:255] Train net output #1: det_loss = 0.620077 (* 1 = 0.620077 loss)
I0412 11:58:48.376157 15281 solver.cpp:255] Train net output #2: id_accuracy = -nan
I0412 11:58:48.376165 15281 solver.cpp:255] Train net output #3: id_loss = 0 (* 1 = 0 loss)
I0412 11:58:48.376170 15281 solver.cpp:255] Train net output #4: loss_bbox = nan (* 1 = nan loss)
I0412 11:58:48.376176 15281 solver.cpp:255] Train net output #5: rpn_bbox_loss = 0.185238 (* 1 = 0.185238 loss)
I0412 11:58:48.376183 15281 solver.cpp:255] Train net output #6: rpn_cls_loss = 0.680902 (* 1 = 0.680902 loss)
I0412 11:58:48.376190 15281 solver.cpp:640] Iteration 20, lr = 0.001
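When the total loss turns into nan, it helps to find the first iteration where this happens and the specific loss output that went bad. In the log above, that is loss_bbox at iteration 20, while id_loss stays at 0 and id_accuracy is -nan from the start. Below is a minimal sketch that scans Caffe solver log lines of the form shown above. The regular expressions are written against these log lines only; they are an assumption about the format, not a Caffe API. first_nan is a hypothetical helper name.

```python
import math
import re

# Matches "Iteration 20, loss = nan" but not "Iteration 20, lr = 0.001".
ITER_RE = re.compile(r"Iteration (\d+), loss = (\S+)")
# Matches only loss outputs, e.g. "loss_bbox = nan (* 1 = nan loss)";
# accuracy outputs such as "id_accuracy = -nan" have no "(* w = x loss)" suffix.
LOSS_RE = re.compile(r"Train net output #\d+: (\w+) = (\S+) \(\* \S+ = \S+ loss\)")


def first_nan(log_lines):
    """Return (iteration, [NaN loss output names]) for the first iteration
    whose total loss is NaN, or None if there is no such iteration."""
    current_iter, total_is_nan, nan_losses = None, False, []
    for line in log_lines:
        m = ITER_RE.search(line)
        if m:
            if total_is_nan:  # the previous iteration block is complete
                return current_iter, nan_losses
            current_iter = int(m.group(1))
            total_is_nan = math.isnan(float(m.group(2)))  # float() accepts "nan" and "-nan"
            nan_losses = []
            continue
        m = LOSS_RE.search(line)
        if m and math.isnan(float(m.group(2))):
            nan_losses.append(m.group(1))
    return (current_iter, nan_losses) if total_is_nan else None
```

For example, running first_nan(open("train.log")) on a log containing the lines above should return (20, ['loss_bbox']); the file name train.log is hypothetical.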
@Cysu First, thank you very much for your excellent work. This is not an issue, but I have a question: have you tried YOLO9000 for pedestrian detection? YOLO v2 is faster and more accurate than Faster R-CNN for object detection. In your current work, does the detection accuracy affect the person search mAP?
Thank you very much for the suggestion. I really appreciate recent advances in object detection, e.g., YOLO v2, FPN, etc., and would like to try them if I have time in the future. But currently I may not have enough spare time for it, and YOLO v2 seems to be implemented only in darknet, which is not as popular as caffe / tf / pytorch.
By the way, do you still suffer from nan loss? If not, how did you solve it?
There is now a TensorFlow version of YOLO: https://github.com/thtrieu/darkflow
I still get the NaN loss. I think it is caused by my machine environment, but I am not sure.
Thank you very much for the link. I will check about it.
The NaN problem is quite weird. Sorry, but currently I have no idea why it happens.
@andongchen @Cysu When training, I got "id_accuracy = -nan". Is this normal?
@duanLH, id_accuracy = -nan is possible, because in some cases the proposals do not contain any ground-truth person, especially in the early stage of training.
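Suppose the identity accuracy is computed only over proposals that carry a person label, dividing the number of correct predictions by the number of labeled proposals. Then a minibatch with no such proposal gives 0/0, which is NaN in floating point, and the training log prints it as nan or -nan. Below is a minimal sketch of that computation. The IGNORE label value and the function name are hypothetical, and the actual layer in the repository may handle the empty case differently.

```python
import math

IGNORE = -1  # hypothetical label for proposals that match no labeled person


def id_accuracy(pred_ids, true_ids, ignore=IGNORE):
    """Fraction of correctly classified identities among labeled proposals.

    Returns NaN when no proposal has a valid label, mirroring a floating-point
    0/0 in a C++ layer.
    """
    labeled = [(p, t) for p, t in zip(pred_ids, true_ids) if t != ignore]
    if not labeled:
        # Python raises ZeroDivisionError on 0/0, so return NaN explicitly.
        return float("nan")
    correct = sum(1 for p, t in labeled if p == t)
    return correct / len(labeled)


if __name__ == "__main__":
    print(id_accuracy([3, 5], [IGNORE, IGNORE]))  # nan: no labeled proposals
    print(id_accuracy([3, 5, 7], [3, IGNORE, 8]))  # 0.5
```

A NaN accuracy in such batches is therefore expected and harmless. A NaN in a loss term, as in the loss_bbox output earlier in this thread, is a different problem.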