Comments (12)

Cysu avatar Cysu commented on June 5, 2024

Did you modify the code? For the first training iteration, it should be something like

I1113 15:51:24.800622 32170 solver.cpp:240] Iteration 0, loss = 6.22973
I1113 15:51:24.800657 32170 solver.cpp:255]     Train net output #0: det_accuracy = 0.078125
I1113 15:51:24.800668 32170 solver.cpp:255]     Train net output #1: det_loss = 0.706399 (* 1 = 0.706399 loss)
I1113 15:51:24.800671 32170 solver.cpp:255]     Train net output #2: id_accuracy = 0
I1113 15:51:24.800676 32170 solver.cpp:255]     Train net output #3: id_loss = 9.26615 (* 1 = 9.26615 loss)
I1113 15:51:24.800681 32170 solver.cpp:255]     Train net output #4: loss_bbox = 1.04062e-05 (* 1 = 1.04062e-05 loss)
I1113 15:51:24.800685 32170 solver.cpp:255]     Train net output #5: rpn_bbox_loss = 0.188907 (* 1 = 0.188907 loss)
I1113 15:51:24.800689 32170 solver.cpp:255]     Train net output #6: rpn_cls_loss = 0.693245 (* 1 = 0.693245 loss)
I1113 15:51:24.800700 32170 solver.cpp:640] Iteration 0, lr = 0.001

from person_search.

andongchen avatar andongchen commented on June 5, 2024

I have not modified the code. Do I need to modify it?

Cysu avatar Cysu commented on June 5, 2024

No, that won't be necessary. Running the training script directly should be fine. Could you please provide a full training log (uploaded to BaiduYun / GoogleDrive / Dropbox) so that I can analyze it further?

Also could you please evaluate our trained model by following the instructions in the README, to see if it works properly?

andongchen avatar andongchen commented on June 5, 2024

Yes, I can run the evaluation with your trained model, and there is no error.
The training log is here: https://drive.google.com/file/d/0Bz7UoqmY26NkeWphcnZYckNKUU0/view?usp=sharing

Cysu avatar Cysu commented on June 5, 2024

That's quite weird. Could you please

  1. Remove this line of randomness
  2. Run the training script with specified random seed
experiments/scripts/train.sh 0 --set EXP_DIR resnet50 RNG_SEED 1

On my machine, this reproducibly gives the following loss at iteration 0:

I0412 10:00:41.251739 29112 solver.cpp:240] Iteration 0, loss = 11.4016
I0412 10:00:41.251796 29112 solver.cpp:255]     Train net output #0: det_accuracy = 0.804688
I0412 10:00:41.251809 29112 solver.cpp:255]     Train net output #1: det_loss = 0.681872 (* 1 = 0.681872 loss)
I0412 10:00:41.251818 29112 solver.cpp:255]     Train net output #2: id_accuracy = 0
I0412 10:00:41.251827 29112 solver.cpp:255]     Train net output #3: id_loss = 9.40343 (* 1 = 9.40343 loss)
I0412 10:00:41.251835 29112 solver.cpp:255]     Train net output #4: loss_bbox = 0.522466 (* 1 = 0.522466 loss)
I0412 10:00:41.251844 29112 solver.cpp:255]     Train net output #5: rpn_bbox_loss = 0.123584 (* 1 = 0.123584 loss)
I0412 10:00:41.251876 29112 solver.cpp:255]     Train net output #6: rpn_cls_loss = 0.693231 (* 1 = 0.693231 loss)
I0412 10:00:41.251895 29112 solver.cpp:640] Iteration 0, lr = 0.001
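
Fixing the seed is what makes the iteration 0 numbers comparable between two runs. The sketch below illustrates the idea, assuming the data order and sampling are driven by a seeded pseudo-random generator. `sample_minibatch` is a hypothetical stand-in for illustration, not a function from the repository.

```python
import random

def sample_minibatch(num_images, batch_size, seed=None):
    """Pick image indices for one minibatch (stand-in for data shuffling)."""
    rng = random.Random(seed)  # private generator, seeded when a seed is given
    return rng.sample(range(num_images), batch_size)

# The same seed yields the same minibatch, so the first-iteration loss
# of two runs can be compared line by line.
first = sample_minibatch(1000, 4, seed=1)
second = sample_minibatch(1000, 4, seed=1)
print(first == second)
```

If two runs with the same seed still disagree at iteration 0, the difference probably comes from something other than random sampling, for example the environment or library versions.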

andongchen avatar andongchen commented on June 5, 2024

Sorry, when I first ran the training script without any modification, there was one error:
experiments/scripts/train.sh 0 --set EXP_DIR resnet50

Normalizing targets
done
Traceback (most recent call last):
  File "tools/train_net.py", line 130, in <module>
    max_iters=args.max_iters)
  File "/home/cy/PycharmProjects/person_search-master/tools/../lib/fast_rcnn/train.py", line 121, in train_net
    pretrained_model=pretrained_model)
  File "/home/cy/PycharmProjects/person_search-master/tools/../lib/fast_rcnn/train.py", line 50, in __init__
    pb2.text_format.Merge(f.read(), self.solver_param)
AttributeError: 'module' object has no attribute 'text_format'

I then googled the error and solved it by adding import google.protobuf.text_format in /lib/fast_rcnn/train.py.
After that I got the nan loss error.
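
An AttributeError of this kind usually means that a package was imported but one of its submodules was not. The sketch below reproduces that behavior with a throwaway package, so that it runs without protobuf installed. The names `demo_pkg` and its `text_format` submodule are hypothetical and exist only for this illustration.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package "demo_pkg" with one submodule "text_format".
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")  # the package itself imports nothing
with open(os.path.join(pkg_dir, "text_format.py"), "w") as f:
    f.write("def Merge(text, message):\n    message.append(text)\n")
sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_pkg

# Importing the package alone does not load its submodules.
has_before = hasattr(demo_pkg, "text_format")

# An explicit submodule import binds it as an attribute of the package.
import demo_pkg.text_format
has_after = hasattr(demo_pkg, "text_format")

print(has_before, has_after)
```

The same mechanism would explain why adding the explicit `import google.protobuf.text_format` line makes `pb2.text_format` resolvable.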

Now I have followed your steps 1 and 2, and I still get the nan loss:

I0412 11:58:24.734537 15281 solver.cpp:240] Iteration 0, loss = 45.384
I0412 11:58:24.734563 15281 solver.cpp:255]     Train net output #0: det_accuracy = 0.03125
I0412 11:58:24.734571 15281 solver.cpp:255]     Train net output #1: det_loss = 0.693147 (* 1 = 0.693147 loss)
I0412 11:58:24.734575 15281 solver.cpp:255]     Train net output #2: id_accuracy = -nan
I0412 11:58:24.734578 15281 solver.cpp:255]     Train net output #3: id_loss = 0 (* 1 = 0 loss)
I0412 11:58:24.734582 15281 solver.cpp:255]     Train net output #4: loss_bbox = 0.0592934 (* 1 = 0.0592934 loss)
I0412 11:58:24.734586 15281 solver.cpp:255]     Train net output #5: rpn_bbox_loss = 0.00123454 (* 1 = 0.00123454 loss)
I0412 11:58:24.734591 15281 solver.cpp:255]     Train net output #6: rpn_cls_loss = 0.693147 (* 1 = 0.693147 loss)
I0412 11:58:24.734596 15281 solver.cpp:640] Iteration 0, lr = 0.001
I0412 11:58:48.375877 15281 solver.cpp:240] Iteration 20, loss = nan
I0412 11:58:48.376101 15281 solver.cpp:255]     Train net output #0: det_accuracy = 0.929688
I0412 11:58:48.376142 15281 solver.cpp:255]     Train net output #1: det_loss = 0.620077 (* 1 = 0.620077 loss)
I0412 11:58:48.376157 15281 solver.cpp:255]     Train net output #2: id_accuracy = -nan
I0412 11:58:48.376165 15281 solver.cpp:255]     Train net output #3: id_loss = 0 (* 1 = 0 loss)
I0412 11:58:48.376170 15281 solver.cpp:255]     Train net output #4: loss_bbox = nan (* 1 = nan loss)
I0412 11:58:48.376176 15281 solver.cpp:255]     Train net output #5: rpn_bbox_loss = 0.185238 (* 1 = 0.185238 loss)
I0412 11:58:48.376183 15281 solver.cpp:255]     Train net output #6: rpn_cls_loss = 0.680902 (* 1 = 0.680902 loss)
I0412 11:58:48.376190 15281 solver.cpp:640] Iteration 20, lr = 0.001
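
To see which output goes to nan first, a log in this format can be scanned with a short script. This is a rough sketch that assumes every line follows the solver.cpp layout shown above. `find_nan_outputs` is a hypothetical helper, not part of the repository.

```python
import re

# Matches e.g. "Train net output #4: loss_bbox = nan (* 1 = nan loss)"
OUTPUT_RE = re.compile(r"Train net output #\d+: (\w+) = (\S+)")
# Matches e.g. "Iteration 20, loss = nan" (but not the "lr = ..." line)
ITER_RE = re.compile(r"Iteration (\d+), loss = (\S+)")

def find_nan_outputs(log_lines):
    """Return (iteration, output_name) pairs whose value is nan or -nan."""
    hits = []
    iteration = None
    for line in log_lines:
        m = ITER_RE.search(line)
        if m:
            iteration = int(m.group(1))
            continue
        m = OUTPUT_RE.search(line)
        if m and m.group(2).lstrip("-") == "nan":
            hits.append((iteration, m.group(1)))
    return hits

sample = [
    "I0412 11:58:48.375877 15281 solver.cpp:240] Iteration 20, loss = nan",
    "I0412 11:58:48.376157 15281 solver.cpp:255]     Train net output #2: id_accuracy = -nan",
    "I0412 11:58:48.376165 15281 solver.cpp:255]     Train net output #3: id_loss = 0 (* 1 = 0 loss)",
    "I0412 11:58:48.376170 15281 solver.cpp:255]     Train net output #4: loss_bbox = nan (* 1 = nan loss)",
]
print(find_nan_outputs(sample))  # [(20, 'id_accuracy'), (20, 'loss_bbox')]
```

In the log above, `loss_bbox` is the only loss term that becomes nan at iteration 20, which suggests looking at the bounding-box regression targets first.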

andongchen avatar andongchen commented on June 5, 2024

@Cysu First, thank you very much for your excellent work. This is not an issue, but I have a question: have you tried YOLO9000 for pedestrian detection? YOLO v2 is faster and more precise than Faster R-CNN for object detection. In your current work, does the detection accuracy influence person_search's mAP?

Cysu avatar Cysu commented on June 5, 2024

Thank you very much for the suggestion. I really appreciate recent advances in object detection, e.g., YOLO v2, FPN, etc., and would like to give it a try if I have some time in the future. But currently I may not have enough spare time for it, and YOLO v2 seems to be implemented only in darknet, which is not that popular, compared with caffe / tf / pytorch.

By the way, do you still suffer from nan loss? If not, how did you solve it?

andongchen avatar andongchen commented on June 5, 2024

There is now a TensorFlow version of YOLO: https://github.com/thtrieu/darkflow
I still suffer from the nan loss. I think it is caused by my machine environment, but I am not sure.

Cysu avatar Cysu commented on June 5, 2024

Thank you very much for the link. I will check it out.

The nan problem is quite weird. Sorry, but I currently have no idea why it happens.

duanLH avatar duanLH commented on June 5, 2024

@andongchen @Cysu When training, I got "id_accuracy = -nan". Is that normal?

Cysu avatar Cysu commented on June 5, 2024

@duanLH, id_accuracy = -nan is possible, because there are cases where the proposals do not contain any ground truth person, especially at the beginning of training.
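
A minimal sketch of how such a value can arise, assuming the accuracy is computed as correct predictions divided by the number of labeled proposals, with unlabeled proposals skipped. The ignore label `-1` and the function `id_accuracy` are assumptions made for illustration, not taken from the repository.

```python
import math

IGNORE_LABEL = -1  # assumed marker for proposals with no labeled identity

def id_accuracy(predictions, labels, ignore_label=IGNORE_LABEL):
    """Fraction of correct predictions among non-ignored labels."""
    valid = [(p, y) for p, y in zip(predictions, labels) if y != ignore_label]
    if not valid:
        # 0 correct out of 0 counted: floating-point 0/0 is NaN.
        return math.nan
    correct = sum(1 for p, y in valid if p == y)
    return correct / len(valid)

print(id_accuracy([3, 7], [-1, -1]))  # no labeled proposal in the batch
print(id_accuracy([3, 7], [3, -1]))   # one labeled proposal, predicted correctly
```

Under this assumption, a nan accuracy only says that the batch contained no labeled person. It is a different situation from a loss term such as loss_bbox becoming nan.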
