
avod's People

Contributors

fmigneault, kujason, melfm, villanuevab


avod's Issues

question regarding 2D IoU in BEV

Hello,

Thank you for releasing the code to your paper!

"Background anchors are determined by calculating the 2D IoU in BEV between the anchors
and the ground truth bounding boxes. For the car class, anchors with IoU less than 0.3 are considered background anchors, while ones with IoU greater than 0.5"

I am trying to figure out how you overcame the problem of computing IoU for non-axis-aligned rectangles when determining negative and positive anchors. The calculation uses two box_list objects.
Could you please point me to where the box_list for the ground truth labels is generated, or briefly explain what these box_lists contain?
Is it an IoU calculation between the axis-aligned bounding boxes around the ground truth boxes and the anchor boxes?

Regards,
Johannes
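(For reference: if the computation is indeed an axis-aligned 2D IoU in BEV, as the question speculates, it could look like the minimal sketch below. The box layout [x_min, y_min, x_max, y_max] and the function name are illustrative only, not the repository's box_list API.)

import numpy as np

def axis_aligned_iou(boxes_a, boxes_b):
    """Pairwise IoU between two sets of axis-aligned BEV boxes.

    boxes_a: (N, 4) array of [x_min, y_min, x_max, y_max]
    boxes_b: (M, 4) array of [x_min, y_min, x_max, y_max]
    Returns an (N, M) matrix of IoU values.
    """
    # Intersection rectangle for every anchor/ground-truth pair (broadcast)
    x_min = np.maximum(boxes_a[:, None, 0], boxes_b[None, :, 0])
    y_min = np.maximum(boxes_a[:, None, 1], boxes_b[None, :, 1])
    x_max = np.minimum(boxes_a[:, None, 2], boxes_b[None, :, 2])
    y_max = np.minimum(boxes_a[:, None, 3], boxes_b[None, :, 3])
    intersection = (np.clip(x_max - x_min, 0, None) *
                    np.clip(y_max - y_min, 0, None))

    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    union = area_a[:, None] + area_b[None, :] - intersection
    return intersection / union

Anchors whose maximum IoU against all ground-truth boxes falls below 0.3 would then be labelled background, and those above 0.5 positive, matching the thresholds quoted above.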

Running the code remotely

If I run this code remotely over SSH, do I need to change anything in the code or configuration?

ValueError: could not convert string to float: "b'0.00'"

(py35) yanchao@yanchao:~/avod$ python scripts/preprocessing/gen_mini_batches.py
Clustering labels 1 / 3712
Traceback (most recent call last):
  File "scripts/preprocessing/gen_mini_batches.py", line 199, in <module>
    main()
  File "scripts/preprocessing/gen_mini_batches.py", line 120, in main
    car_dataset_config_path)
  File "/home/yanchao/MyProjects/avod/avod/builders/dataset_builder.py", line 154, in load_dataset_from_config
    use_defaults=False)
  File "/home/yanchao/MyProjects/avod/avod/builders/dataset_builder.py", line 191, in build_kitti_dataset
    return KittiDataset(cfg_copy)
  File "/home/yanchao/MyProjects/avod/avod/datasets/kitti/kitti_dataset.py", line 131, in __init__
    self.kitti_utils = KittiUtils(self)
  File "/home/yanchao/MyProjects/avod/avod/datasets/kitti/kitti_utils.py", line 59, in __init__
    self.label_cluster_utils.get_clusters()
  File "/home/yanchao/MyProjects/avod/avod/core/label_cluster_utils.py", line 194, in get_clusters
    img_idx)
  File "/home/yanchao/MyProjects/avod/wavedata/wavedata/tools/obj_detection/obj_utils.py", line 125, in read_labels
    obj.truncation = float(p[1])
ValueError: could not convert string to float: "b'0.00'"
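(A note on the error: the offending string is literally b'0.00', i.e. a Python 3 byte string that was stringified before the float conversion. A common cause is the label file being parsed into byte strings, for example by a numpy text reader with a bytes dtype. A minimal defensive conversion, assuming p holds the parsed label fields as in the traceback; the helper name is illustrative:)

def to_float(field):
    """Convert a KITTI label field to float, tolerating byte strings."""
    if isinstance(field, bytes):
        # Decode explicitly instead of relying on str(), which would
        # produce the literal text "b'0.00'" seen in the error above.
        field = field.decode('utf-8')
    return float(field)

# e.g. obj.truncation = to_float(p[1])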

How the planes is generated?

Hi

The project provides a planes directory as a model input. I'm a little curious how it was generated; it doesn't seem to be described in the paper. Could you please suggest some related resources? Thanks!

Questions about continue training from the last checkpoint

First of all, thank you for sharing your brilliant work.

If I understand correctly, setting "overwrite_checkpoints: False" in the config means that if you stop training and later resume with more iterations, training starts from the last checkpoint you saved.

Here are my questions,

  1. How do you decide the order of the input data?
  2. When continuing training from the last checkpoint, is the input data order re-initialized?

How to generate testing results?

I can run all of the instructions; however, I'm not sure how to generate results on the test set for the KITTI benchmark submission.
Could you tell me how to do it?

SyntaxError when executing gen_mini_batches.py

Hi,

Thank you for making your code available.

I followed the instructions on the front page but encountered a syntax error while invoking the gen_mini_batches.py script, as shown below.

Would you mind letting me know what I might be doing wrong?

tuan@mypc:~/avod$ python scripts/preprocessing/gen_mini_batches.py
Traceback (most recent call last):
  File "scripts/preprocessing/gen_mini_batches.py", line 6, in <module>
    from avod.builders.dataset_builder import DatasetBuilder
  File "avod/avod/builders/dataset_builder.py", line 169
    new_cfg=None) -> KittiDataset:
                  ^
SyntaxError: invalid syntax

Code retrieved on 5/11/2018.

Python version

Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
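(A note on the error: the failing line in dataset_builder.py uses a Python 3 function return annotation, "-> KittiDataset", which is not valid syntax in Python 2.7, the interpreter version reported above. A minimal illustration, with a hypothetical function name:)

# Valid in Python 3, but a SyntaxError at the arrow under Python 2.x:
def make_dataset(cfg) -> 'KittiDataset':
    ...

So the scripts need to be run with a Python 3 interpreter (other issues here use a py35 environment).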

KITTI data structure, set up as instructed:

Download the data and place it in your home folder at ~/Kitti/object

tuan@mypc:~/Kitti/object$ tree -L 2
.
├── training
│   ├── calib -> /opt/dataset/KITTI_3D/calib/training/calib
│   ├── image_2 -> /opt/dataset/KITTI_3D/image_2/training/image_2
│   ├── planes
│   └── velodyne -> /opt/dataset/KITTI_3D/velodyne/training/velodyne
├── train.txt
├── trainval.txt
└── val.txt

Ubuntu 16.04 LTS
GPU: nVidia 1080 Ti

Regards,
Tuan

Which version for protoc?

I encounter the error "avod/protos/kitti_utils.proto:24:5: Expected "required", "optional", or "repeated"" when I execute "sh avod/protos/run_protoc.sh". I'm not familiar with protoc. Is there something wrong with my protoc version? (It is 2.5.0.)

Validation result doesn't converge for "deep fusion"

Hi, thank you for sharing your code, we are doing relative research and it's very helpful for us!

We are now training the network using pyramid_cars_with_aug_example.config with fusion type "deep". After validation, the best car_detection_3D result appears at around the 52500-iteration checkpoint; the top checkpoints are:
car_detection_3D : [52500, 120000, 70000, 100000, 102500]

At the 52500 checkpoint, the result is:
car_detection_3D AP: 84.555695 74.843224 68.156281

Moreover, the result fluctuates at later checkpoints, even decreasing to (car_detection_3D AP: 77.260490 68.040474 67.316147) at the 117500 checkpoint, as shown in the attached figure.

Since this is our first time training such a large network, my question is: is it normal to get the best performance at the 52500 checkpoint? Can we say the result has already converged? Thanks, and looking forward to your reply!
[attached figure: car_detection_3D AP over training checkpoints]

preprocessing time

The preprocessing time looks like this:
Feed dict time:
Min: 0.12029
Max: 0.24133
Mean: 0.14371
Median: 0.14227
Why is this so much larger than the 0.02 s reported?

Question, how to run without camera

Hi!

Thank you for sharing your excellent work. In the paper there is a row in Table III where you use BEV-only features (RPN BEV Only). I am interested in using this network; is there a configuration file available for it?

Also, would it be possible for you to share the trained models?

Which configurations to use to get the results in the paper ?

I am trying to reproduce the results in the paper (the ones in Table 1), but I can't; I keep getting values up to 8% lower than the ones in the paper. Which configuration files (for preprocessing and training) and which thresholds in the evaluation script should I use to reproduce the values in the paper? Or are there modifications to the existing configurations (preprocessing and training) needed to do that?

Your help would be much appreciated; I'm working on my master's thesis :)

Recursive git clone fails

This command fails with an "unknown publickey" error from the wavedata submodule:

git clone https://github.com/kujason/avod --recurse-submodules

But the submodule can be cloned correctly inside the freshly cloned AVOD repo with:

cd avod
git clone https://github.com/kujason/wavedata

Maybe some setting is incorrect somewhere?

Only recall objects in the right side

Hi, thanks for sharing. I have trained and validated your network on the KITTI object train/val split, and it works great.
However, when I test the network on the KITTI raw dataset, it gives the results below: only objects on the right side of the image are detected in the whole sequence.
[attached image: detections appear only on the right side]
Any clues as to the possible reason?

Can we use fewer than 12000 iterations?

I am training this model on a slow GPU. I wonder if I will get the same network performance if I cut the training to 9000 or 8000 iterations with a step of 50. Is there any harm in that?
Training + evaluation takes almost 54 hours for me.

Another question: why is there no max element-wise fusion option for the RPN model, and how do the "late" and "deep" fusion types perform in comparison to "early" fusion?
Thank you

Result on validation set

  1. I checked the training split used for validation; is it the same as the one released by the MV3D paper?
  2. With that split, on the validation set, I ran "avod_cars_example.config" and got the following results after 120,000 iterations, in "avod_cars_example_results_05_iou_0.1.txt":
    "car_detection_3D_AP: 89.85737 80.741768 80.542282"
    Are these reasonable results?
  3. Looking at the MV3D paper, it seems they report better results on the same validation split?

How to generate anchor_info for model's input?

  1. The model takes anchor_info as an input, which, as I understand it, relies on the ground-truth bounding boxes.
  2. I see from the code that gen_mini_batches.py generates anchor_info for both training and validation data using ground-truth labels. However, how can we generate anchor_info for the test data, which has no ground-truth labels?

It would be great if you could clarify that procedure.

thanks!

Performance on validation set not aligned to the report in paper

"car_detection_3D AP: 82.047119 67.536583 66.807381" top performance on iteration 39000, this is the top ranked by the given script, there is still a gap on the moderate, I leave the config by default to run "avod_cars_examples.config", is there anything I am missing ?

thanks

thank you for sharing your code

How to remove undetected bounding boxes

Hi Team,

Thanks to your instructions, I am able to run your code to train, evaluate, and run inference on the KITTI dataset.

After running the 2D image demo generation, demos/show_predictions_2d.py, I see lots of green bounding boxes, as in the image below. Would you mind letting me know the color-coding convention you are using? What is the difference between yellow and red?

And, importantly, how could I disable those green boxes?
[image 000134]

In addition, I am wondering if you also have a script to generate the demo for the 3D point cloud. Any hint would be greatly appreciated.

[image 000078]

Thank you,

Questions about 2D detection performance

First of all, congratulations on your good paper and code release.

According to your paper, many problems in existing 2D box recognition have been solved. But I am curious: why is the 2D detection performance not ranked at the top on KITTI? (Compared to methods that do not use point clouds.)

I am curious whether there is a fundamental limit of this lidar-based technique for 2D detection.

Inference Time

Thanks for sharing your code!

I have followed the procedure in the README and trained the model for 53000 steps, so I ran an experiment using the evaluator; below is the output.

Step 53000: 450 / 3769, Inference on sample 001021
Step 53000: Eval RPN Loss: objectness 0.149, regression 0.095, total 0.244
Step 53000: Eval AVOD Loss: classification 0.038, regression 1.809, total 2.091
Step 53000: Eval AVOD Loss: localization 1.310, orientation 0.499
Step 53000: RPN Objectness Accuracy: 0.95703125
Step 53000: AVOD Classification Accuracy: 0.9880478087649402
Step 53000: Total time 0.577916145324707 s
Step 53000: 451 / 3769, Inference on sample 001022
Step 53000: Eval RPN Loss: objectness 0.026, regression 0.094, total 0.119
Step 53000: Eval AVOD Loss: classification 0.019, regression 0.942, total 1.080
Step 53000: Eval AVOD Loss: localization 0.897, orientation 0.045
Step 53000: RPN Objectness Accuracy: 0.9921875
Step 53000: AVOD Classification Accuracy: 0.9970443349753695
Step 53000: Total time 0.24765753746032715 s
Step 53000: 452 / 3769, Inference on sample 001025
Step 53000: Eval RPN Loss: objectness 0.172, regression 0.175, total 0.347
Step 53000: Eval AVOD Loss: classification 0.044, regression 3.892, total 4.282
Step 53000: Eval AVOD Loss: localization 3.537, orientation 0.354
Step 53000: RPN Objectness Accuracy: 0.970703125
Step 53000: AVOD Classification Accuracy: 0.9950884086444007
Step 53000: Total time 0.2989237308502197 s
Step 53000: 453 / 3769, Inference on sample 001026

I think the inference time (around 0.3 s) is slow compared with the 100 ms claimed in the paper. Any suggestions? I'm using a 1080 Ti GPU.
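(One thing worth checking, offered as a hedged suggestion: if the reported total time includes building the feed dict (CPU-side preprocessing) as well as the forward pass, it will read higher than pure network inference. A minimal sketch for timing the two stages separately with a generic TF1-style session; create_feed_dict appears in the repository's models, but the surrounding names here are illustrative:)

import time

def timed_inference(sess, model, prediction_op, num_samples):
    """Report feed-dict construction time and forward-pass time separately."""
    for _ in range(num_samples):
        start = time.time()
        feed_dict = model.create_feed_dict()           # CPU preprocessing
        middle = time.time()
        sess.run(prediction_op, feed_dict=feed_dict)   # network forward pass
        end = time.time()
        print('feed dict: {:.3f} s, network: {:.3f} s'.format(
            middle - start, end - middle))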

How to detect vehicle and people simultaneously

  1. As described in the paper, the network detects vehicles and people separately, so how can both of them be detected simultaneously?
  2. Does the config file avod/configs/pyramid_cars_with_aug_example.config correspond to the AVOD-FPN entry ranked first on KITTI?
    Thx!

training loss

@kujason
Hi, thank you for sharing the code. I followed all the instructions and training for the car class has now begun. Without modifying any code and using the default config, the training loss doesn't look right: it fluctuates, with some losses like the ones below. Do you have any idea why this is happening?

Step 500, Total Loss 2.645, Time Elapsed 6.096 s
Step 550, Total Loss 3.583, Time Elapsed 5.431 s
Step 560, Total Loss 11.722, Time Elapsed 5.333 s
Step 570, Total Loss 3.723, Time Elapsed 5.495 s
Step 580, Total Loss 1.895, Time Elapsed 5.473 s
Step 590, Total Loss 15.548, Time Elapsed 5.860 s
Step 600, Total Loss 2.417, Time Elapsed 5.647 s
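(Raw per-step losses with a batch size of one are noisy, so isolated spikes like those above are hard to interpret on their own; a smoothed curve is usually more informative. A small illustrative sketch of an exponential moving average over logged losses, not part of the repository:)

def smooth_losses(losses, alpha=0.9):
    """Exponential moving average of a sequence of per-step losses."""
    smoothed, running = [], None
    for loss in losses:
        running = loss if running is None else alpha * running + (1 - alpha) * loss
        smoothed.append(running)
    return smoothed

# Using the values logged above:
print(smooth_losses([2.645, 3.583, 11.722, 3.723, 1.895, 15.548, 2.417]))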

Protoc compile

Does anyone know how to solve these problems when I compile the protos from the avod folder? Thanks.

avod/protos/kitti_utils.proto:24:5: Expected "required", "optional", or "repeated".
avod/protos/kitti_utils.proto:24:25: Missing field number.
avod/protos/kitti_dataset.proto: Import "avod/protos/kitti_utils.proto" was not found or had errors.
avod/protos/kitti_dataset.proto:39:14: "KittiUtilsConfig" is not defined.

protoc version

Which version of "protoc" are you using in this project?

I got the error below:

avod/protos/kitti_utils.proto:24:5: Expected "required", "optional", or "repeated".
avod/protos/kitti_utils.proto:24:25: Missing field number.
avod/protos/kitti_dataset.proto: Import "avod/protos/kitti_utils.proto" was not found or had errors.
avod/protos/kitti_dataset.proto:39:14: "KittiUtilsConfig" is not defined.

I can't find the results

After training and running the evaluation, I can't find the results file:

FileNotFoundError: [Errno 2] No such file or directory: 'results/avod_cars_example_results_0.1.txt'

So I am wondering what the problem is. Your help will be much appreciated, as I am working on my master's thesis.

avod-ssd

Where can I find the avod-ssd version of the code?
thanks

Error in the Evaluator & Inference scripts

Hi, I successfully trained AVOD to 120000 iterations, but when I ran the evaluator and inference scripts, they both stopped when processing sample 002908, with the following error:

libpng error: Read Error
Traceback (most recent call last):
  File "avod/experiments/run_evaluation.py", line 130, in <module>
    tf.app.run()
  File "/home/prp/anaconda2/envs/py35tf13/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "avod/experiments/run_evaluation.py", line 126, in main
    evaluate(model_config, eval_config, dataset_config)
  File "avod/experiments/run_evaluation.py", line 83, in evaluate
    model_evaluator.repeated_checkpoint_run()
  File "/home/prp/chrisli/myavod/avod/avod/core/evaluator.py", line 460, in repeated_checkpoint_run
    self.run_checkpoint_once(checkpoint_to_restore)
  File "/home/prp/chrisli/myavod/avod/avod/core/evaluator.py", line 199, in run_checkpoint_once
    feed_dict = self.model.create_feed_dict()
  File "/home/prp/chrisli/myavod/avod/avod/core/models/avod_model.py", line 655, in create_feed_dict
    feed_dict = self._rpn_model.create_feed_dict()
  File "/home/prp/chrisli/myavod/avod/avod/core/models/rpn_model.py", line 643, in create_feed_dict
    shuffle=False)
  File "/home/prp/chrisli/myavod/avod/avod/datasets/kitti/kitti_dataset.py", line 424, in next_batch
    samples_in_batch.extend(self.load_samples(np.arange(start, end)))
  File "/home/prp/chrisli/myavod/avod/avod/datasets/kitti/kitti_dataset.py", line 277, in load_samples
    rgb_image = cv_bgr_image[..., :: -1]
TypeError: 'NoneType' object is not subscriptable

The system is Ubuntu 16.04 with Python 3.5 and TensorFlow 1.3.0.

Thanks for your help.
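(The libpng read error together with the NoneType subscript suggests cv2.imread returned None for sample 002908, i.e. the PNG on disk is truncated or corrupt. A small sketch for finding images OpenCV cannot decode; the directory path is an assumption based on the setup described in other issues:)

import glob
import os

import cv2

def find_unreadable_images(image_dir):
    """Return paths of PNG images that cv2.imread fails to decode."""
    bad = []
    for path in sorted(glob.glob(os.path.join(image_dir, '*.png'))):
        if cv2.imread(path) is None:
            bad.append(path)
    return bad

# e.g. find_unreadable_images(os.path.expanduser('~/Kitti/object/training/image_2'))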

Project lidar to camera coordinate?

Hi, kitti_utils.get_point_cloud returns the lidar points projected into camera coordinates. Why not just return the raw lidar data? Thanks~

To determine which model is used during testing

During testing, how can we determine the best model to use?
For example, to reproduce the results on the KITTI leaderboard, how do you determine which model to use? Do you just use the final model after 120,000 iterations?

Results on testing split

Hi,
Thanks for your great code.
I ran your code with the pyramid_cars_with_aug_example configuration and finally got an AP of about (84.0, 74.0, 67.7) on the val split, which is comparable with the results in your paper.
But when I use the best model on the test split, I only get an AP of 56.4 on the moderate setting from the official test server.
Do you know what the problem might be?
Thanks very much.

Preprocessing time

Read disk (image, calib, ground, pc) time:
Min: 0.01494
Max: 0.03513
Mean: 0.02009
Median: 0.02044


Create bev time:
Min: 0.01963
Max: 0.09319
Mean: 0.0378
Median: 0.034

Load sample time:
Min: 0.05957
Max: 0.18219
Mean: 0.08599
Median: 0.07644


Fill anchor time:
Min: 0.0688
Max: 0.16987
Mean: 0.0908
Median: 0.07938


Feed dict time:
Min: 0.12845
Max: 0.29517
Mean: 0.17686
Median: 0.15515

Inference time:
Min: 0.08431
Max: 2.92182
Mean: 0.16493
Median: 0.09333

The preprocessing time profiled above is much larger than 0.02 s. I don't think my CPU is particularly weak; do you have any suggestions? For example, why is the fill-anchor time so expensive?

calculate AP

The KITTI test set does not provide labels, so how did you get the AP values in your ablation experiments?

Question about AP in Validation set

Hi

Firstly, much thanks for your code release.

I successfully trained the avod-cars network to 120000 iterations; however, when I run the evaluation command:

python avod/experiments/run_evaluation.py --pipeline_config=avod/configs/pyramid_cars_with_aug_example.config --device='0' --data_split='val'

the result at the 120000 iteration is as follows:

120000
done.
car_detection AP: 22.552376 24.332737 25.962851
car_detection_BEV AP: 22.371897 23.603966 25.545847
car_heading_BEV AP: 22.340508 23.495865 25.304886
car_detection_3D AP: 21.757717 19.721174 20.458136
car_heading_3D AP: 21.725494 19.660765 20.347580

which is much lower than the results in the paper (basically more than 70%). I also ran the evaluation on the 110000-iteration checkpoint, which gives (26.233753 23.378477 27.482744) for car_detection_3D AP.

Why VGG 16 and not ResNet ?

Hello, thank you for sharing the code of this great work.
Why did you choose to use VGG16 as the encoder instead of ResNet?
Are you using batch normalization?
Another question: how can we calculate the distance to obstacles using the lidar point representation?
Thank you

Question on card with low VRAM

Hey, my graphics card only has 2 GB of VRAM; is there any way I can change the batch size so that training works? I always get an error saying TensorFlow ran out of memory. I've changed some settings in the configs but can't seem to get it to work. Or could someone please upload a pretrained model? I just want to test a few things.

Thanks

New dataset

I was wondering if anyone has tried AVOD with another dataset. Also, if anyone could give instructions for setting up AVOD with a new dataset, that would be great. Thank you.

GPU memory usage when training

Hi,
How much GPU memory do we need to train with your code?
When training, the number of anchors differs between images due to the distribution of lidar points, so does the GPU memory usage change from batch to batch?
Actually, when I train, I find the GPU memory usage is always about 4 GB, which I find curious.
Thanks very much.
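(One possible explanation for the constant reading, offered without having checked this repository's session setup: TensorFlow 1.x pre-allocates most of the visible GPU memory by default, so nvidia-smi shows a roughly fixed number regardless of the per-batch anchor count. Enabling allow_growth makes the allocation track actual usage; a minimal sketch:)

import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing (nearly) all of it
# up front, so the reported usage reflects what the graph actually needs.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    pass  # build and run the model with this session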

Questions about the comparison with MV3D

Firstly, thanks for sharing your work and code; it definitely helps me a lot. After carefully comparing your work with MV3D, I have a few questions about the comparison results:

  1. How did you get the 0.7 3D-IoU AP on the validation set (i.e. 83.87% 72.35% 64.56% in Table I) for MV3D? I did not find these results in the original MV3D paper.

  2. MV3D also uses 2x or 4x deconv operations to upsample the last feature map to handle very small objects, though it does not recover the full resolution as you do. So, apart from the different feature-map upsampling methods, could you point out the major differences between the two works?

How to run with our own data

Hi, thanks for sharing your code. I just finished training and wonder how to run inference on our own data. I am looking into evaluator.py, but I would be grateful if you could give me some hints. Thanks~
