
avod-ssd's People

Contributors

dependabot[bot], melfm


avod-ssd's Issues

understanding of avod-ssd framework

Thank you so much for making avod-ssd public; it helps a lot.
I have roughly read the code in avod_ssd_model.py. Please check whether my understanding is correct.

  1. AVOD-SSD is not a "real" SSD.
    Because
    (1) It still needs to generate 3D anchors in 3D space before training, as AVOD-FPN does, whereas in SSD the anchors are generated from feature maps such as conv4_3, fc_7 and so on.
    (2) AVOD-SSD only uses the last FPN feature map for box generation, whereas SSD utilizes multiple scales of feature maps.

  2. AVOD-SSD is called SSD only because the RPN of AVOD-FPN is gone.
    AVOD-SSD extracts FPN feature maps from BEV and RGB, and then directly connects the maps to the FC pipelines of (2048, 2048, 2048), which form the second fusion part of AVOD-FPN.

  3. Because of the above two points, the inference time of AVOD-SSD is similar to AVOD's,
    since (1) the FPN feature map generation and (2) the (2048, 2048, 2048) fusion part are unchanged, and the (256, 256) fusion does not take much time compared with those two factors.

If any misunderstanding exists, I would appreciate anyone pointing it out.
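The data flow described in points 1–3 above can be sketched as follows. All shapes, the mean-fusion choice, and the random weights are illustrative assumptions for the sketch, not code from the repo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative per-anchor ROI features from the BEV and RGB feature
# extractors (e.g. 3x3 crops with 32 channels each).
num_anchors = 4
bev_feats = rng.standard_normal((num_anchors, 3, 3, 32))
img_feats = rng.standard_normal((num_anchors, 3, 3, 32))

# Fusion by element-wise mean of the two modalities.
fused = 0.5 * (bev_feats + img_feats)          # (4, 3, 3, 32)
flat = fused.reshape(num_anchors, -1)          # (4, 288)

def fc(x, out_dim):
    """Toy fully-connected layer with random weights (no training)."""
    w = rng.standard_normal((x.shape[-1], out_dim)) * 0.01
    return np.maximum(x @ w, 0.0)              # ReLU

# The (2048, 2048, 2048) FC pipeline fed directly by the fused
# features, i.e. the second fusion stage with the RPN removed.
h = flat
for _ in range(3):
    h = fc(h, 2048)

# Separate output heads; sizes (2 classes, 10 box offsets, 2 angle
# components) follow the numbers quoted in the issues below.
cls_logits = h @ (rng.standard_normal((2048, 2)) * 0.01)
box_offsets = h @ (rng.standard_normal((2048, 10)) * 0.01)
angle_vec = h @ (rng.standard_normal((2048, 2)) * 0.01)

print(cls_logits.shape, box_offsets.shape, angle_vec.shape)
```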

how to run test set

@melfm Hi, thanks for sharing your code. I have just finished training and wonder how to run inference on the KITTI test set. How do I set the ground planes for the test set? Thanks.

How to upload KITTI results?

Hi, I have registered an account on KITTI. How do I upload the results? Do we need to upload only the txt files containing the test results? How does KITTI measure the speed?
How do we distinguish between 2D and 3D object entries on the KITTI leaderboard?
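For reference, the KITTI benchmark expects one `.txt` file per test frame, each line holding one detection in the KITTI label format with a confidence score appended; runtime is, to my knowledge, self-reported on the submission form rather than measured by the server. A minimal formatting sketch, with all values purely illustrative:

```python
# One detection per line, in the KITTI results format: type, truncated,
# occluded, alpha, 2D bbox (x1 y1 x2 y2), 3D dims (h w l), 3D location
# (x y z in camera coordinates), rotation_y, and the score appended.
det = {
    "type": "Car", "truncated": -1, "occluded": -1, "alpha": -1.62,
    "bbox": (614.24, 181.78, 727.31, 284.77),
    "dims": (1.57, 1.73, 4.15),            # h, w, l (metres)
    "loc": (1.00, 1.75, 13.22),            # x, y, z (metres)
    "rotation_y": -1.59, "score": 0.92,
}

line = ("{type} {truncated} {occluded} {alpha:.2f} "
        "{b[0]:.2f} {b[1]:.2f} {b[2]:.2f} {b[3]:.2f} "
        "{d[0]:.2f} {d[1]:.2f} {d[2]:.2f} "
        "{l[0]:.2f} {l[1]:.2f} {l[2]:.2f} "
        "{rotation_y:.2f} {score:.2f}").format(
            b=det["bbox"], d=det["dims"], l=det["loc"], **det)
print(line)
```

The official devkit readme bundled with the KITTI object dataset describes the exact field semantics and the zip layout expected at upload time.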

channel numbers of FC layers?

Hi @melfm,
Thank you for sharing; it is really great work.

Here is my question.
The feature extractor output channel numbers are both 32 for the BEV and image branches, and after the fusion the channel count is still 32. (Is that right?)
However, the FC layer widths are 512, 1024 and 1024 for classification, offset, and orientation regression. At 32 channels per target, that would imply 16 categories and 32-channel encodings for both the offset and the orientation. (Is that right?)
In fact, those three numbers are set in your thesis as 2, 10 and 2 for categories, offsets and orientation.

So this really confuses me.
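One likely resolution of the confusion above: the FC widths (512, 1024, 1024) are hidden-layer hyperparameters and are not tied to the channel count or the number of targets; only each head's final output layer is sized by the task. A hedged arithmetic sketch — the box_4c and angle-vector interpretations below are assumptions based on the AVOD paper, not verified against this repo:

```python
# Hidden FC widths are free design choices; only the *final* layer of
# each head is determined by the prediction targets.
num_classes = 1                      # 'Car' only, in the cars config
cls_out = num_classes + 1            # + background class -> 2
box_out = 4 * 2 + 2                  # assumed box_4c encoding:
                                     # 4 (x, z) corners + 2 heights -> 10
ang_out = 2                          # assumed (cos, sin) angle vector

print(cls_out, box_out, ang_out)     # 2 10 2
```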

training loss convergence problem

@melfm
Hi, sorry to bother you.
I have tried to train/validate avod_ssd using the provided data, and the loss looks like the following.

(loss curve screenshot: 20180426_avod_ssd_loss)

That said, by iteratively evaluating the checkpoints, the best AP (at iter=140000) is as follows, which is very close to yours:
83.17% | 73.49% | 67.58%

Why is the loss so unstable, and how could I improve it?
By the way, the original avod_ssd_cars_example.config and training data were used; no changes were made.
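On reading a noisy curve: per-step detection losses are often very jumpy (small batches, hard-example mining), so a smoothed view such as TensorBoard-style exponential smoothing is a better convergence indicator than the raw trace. A generic sketch, nothing repo-specific:

```python
import random

def ema(values, beta=0.98):
    """Exponential moving average with bias correction, similar to
    TensorBoard's scalar smoothing, to read trends out of a noisy loss."""
    smoothed, avg = [], 0.0
    for t, v in enumerate(values, start=1):
        avg = beta * avg + (1.0 - beta) * v
        smoothed.append(avg / (1.0 - beta ** t))  # bias-corrected
    return smoothed

# Toy loss curve: decreasing trend buried under heavy per-step noise.
random.seed(0)
raw = [1.0 / (1 + 0.001 * i) + random.uniform(-0.3, 0.3)
       for i in range(5000)]
sm = ema(raw)
print(round(sm[0], 3), round(sm[-1], 3))  # smoothed trace reveals the decrease
```

If the smoothed curve still trends down (and the evaluated AP is stable, as it is here), the raw instability is usually not a problem in itself.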

Inference Speed?

Hi, thanks for your excellent work! From the results in the README, it seems your SSD version of AVOD costs more time than the original AVOD while achieving lower AP. I wonder what the pros and cons of the SSD version are. Sorry, I haven't read all the code in the repo. Also, when will you release your Master's thesis? The link in the README does not seem to be working.

Multiclass detection

Thanks for making the code public.

I noticed none of the example config files is set for detecting multiple classes (i.e. car, pedestrian and cyclist as 3 separate classes). I tried modifying one of the included configs by appending to "classes" and "num_clusters" in dataset_config but it crashed building the dataset. I think the related lines are 178-179 of avod.datasets.kitti.kitti_dataset:

            elif self.classes == ['Car', 'Pedestrian', 'Cyclist']:
                self.classes_name = 'All'

then 'All' is passed to get_anchor_info in avod.core.mini_batch_utils, which tries to find the preprocessed dataset files for 'All'. Those are never created during preprocessing; instead it creates three directories, for 'car', 'pedestrian' and 'cyclist'. I think there is either a mismatch between the preprocessing and training code, or I'm missing something unclear in the config file.

Is it possible to include one of the config files from your thesis work where you did multiclass detection on KITTI?
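The mismatch described above can be reproduced in miniature. The function names below are hypothetical stand-ins for the repo's logic, written only to make the naming conflict concrete:

```python
# Hypothetical reconstruction of the reported mismatch (names are
# illustrative, not the repo's actual functions).
def classes_name(classes):
    # Mirrors the quoted kitti_dataset.py logic: a combined 3-class
    # config collapses to the single name 'All'.
    if classes == ['Car', 'Pedestrian', 'Cyclist']:
        return 'All'
    return classes[0]

def preprocessed_dirs(classes):
    # Mirrors the described preprocessor behavior: one output
    # directory per individual class.
    return [c.lower() for c in classes]

cfg = ['Car', 'Pedestrian', 'Cyclist']
print(classes_name(cfg))        # 'All'
print(preprocessed_dirs(cfg))   # ['car', 'pedestrian', 'cyclist']
# Training looks up anchor info under 'All', but only per-class
# directories exist, hence the crash when building the dataset.
```

If this reading is right, the fix would be to run the mini-batch preprocessor with the same combined-class config so it writes an 'All' set, or to make the lookup fall back to the per-class directories.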
