
linjieyangsc / densecap


Dense captioning with joint inference and visual context

License: Other

CMake 2.17% Makefile 0.55% Jupyter Notebook 15.37% Python 12.94% Shell 0.45% C++ 63.00% Cuda 4.73% MATLAB 0.74% Dockerfile 0.05%

densecap's People

Contributors

blgene, bogger, cypof, dgolden1, ducha-aiki, eelstork, erictzeng, flx42, jamt9000, jeffdonahue, jyegerlehner, kloudkl, longjon, lukeyeager, mavenlin, mohomran, mtamburrano, netheril96, philkr, qipeng, rbgirshick, ronghanghu, sergeyk, sguada, shelhamer, ste-m5s, tnarihi, vsubhashini, yangqing, yosinski


densecap's Issues

How to get object classes like Faster R-CNN does (object names, e.g. person or car) along with the other information

Hi
Dear Linjie,

I want to get object class names (e.g. woman, man, etc.) before captioning. Could you show me an example of how to do that in this project?
Is there a variable in ./lib/tools/demo.py (or fast_rcnn/test.py) in the densecap project that holds this, or do I need to integrate py-faster-rcnn into this project?

I would appreciate it if you could help me get object classes in the densecap project.

Thank you so much.

How to process the phrases accordingly when multiple regions are merged during training

Hi! This may be a minor question. The paper mentions that bounding boxes with an IoU higher than 0.7 are merged into one. In such cases, how do you merge the captions of the individual boxes during training? If I understand correctly, each box originally has exactly one phrase.

Or did I get it wrong, and there was no merging of bounding boxes during the training phase at all?
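For readers unfamiliar with the 0.7 criterion, the sketch below shows how the IoU (intersection over union) of two boxes is computed, plus one possible greedy way to group overlapping boxes while remembering which original boxes, and therefore which phrases, ended up in each group. The functions `iou` and `group_overlapping` are hypothetical illustrations, not code from the densecap repository. Boxes are assumed to be (x1, y1, x2, y2) in continuous coordinates; Fast R-CNN-style code often adds 1 to widths and heights for pixel-inclusive boxes. How densecap actually assigns captions to merged boxes is exactly what the question asks, and this sketch does not answer it.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    Assumes continuous coordinates (no +1 pixel convention)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def group_overlapping(boxes, thresh=0.7):
    """Hypothetical greedy grouping: each unassigned box seeds a group and
    absorbs every later unassigned box whose IoU with the seed exceeds thresh.

    Returns lists of indices into `boxes`, so the phrase attached to each
    original box can still be looked up after grouping."""
    assigned = [False] * len(boxes)
    groups = []
    for i, seed in enumerate(boxes):
        if assigned[i]:
            continue
        assigned[i] = True
        group = [i]
        for j in range(i + 1, len(boxes)):
            if not assigned[j] and iou(seed, boxes[j]) > thresh:
                assigned[j] = True
                group.append(j)
        groups.append(group)
    return groups


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (0, 0, 10, 9), (20, 20, 30, 30)]
    print(group_overlapping(boxes))
```

Keeping index groups rather than a single merged box makes it explicit that one merged region can be associated with several phrases.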

Unable to run demo.py due to Caffe error

Hey,

I've been trying to get the system to work for a couple of days now, but keep on running into trouble with Caffe.

When running demo.py in lib/tools, I get the following error message:

[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 426:18: Message type "caffe.LayerParameter" has no field named "reshape_param".

WARNING: Logging before InitGoogleLogging() is written to STDERR
F0314 06:48:26.013850 25088 upgrade_proto.cpp:928] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: models/dense_cap/vgg_region_global_feature.prototxt

Could you provide a link to the specific Caffe version or fork that you currently use for the project? I understand that Ross Girshick's fork for Fast R-CNN is needed to enable ROI pooling. My system has CUDA 8.0 and cuDNN 7.1.

Predict captions for existing bounding boxes

Hi,

We would like to use densecap to predict captions for precomputed bounding boxes.
I tried using the im_detect() function in /lib/fast_rcnn/test.py, which has a boxes argument.

I would expect the function to output the same number of boxes that I put in, but this does not happen. Instead, the network appears to predict new boxes with the RPN.

I tried setting the cfg.TEST.HAS_RPN parameter to False so that the rois blob is loaded in the _get_blobs() function. The boxes are loaded into the blobs, but this has no effect on the output. Are they used at all in this case?

Do I need to adjust the feature_net (vgg_region_global_feature.prototxt) in some way or set some other parameters for the network to work as expected? Or did I miss something else?

Thanks

Error caffe: No module named 'caffe._caffe'

I installed Caffe on Ubuntu 18.04 using sudo apt install caffe-cuda.
Importing it in python3 works, but when I run
python3 ./lib/tools/demo.py --image images/ --gpu 0 --net VGG_ILSVRC_16_layers.caffemodel
the following error occurs:
ModuleNotFoundError: No module named 'caffe._caffe'
How can I fix this?
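One way to narrow down this error is to check which caffe package Python actually picks up when the demo runs. The hypothetical helper `describe_module` below uses the standard library's `importlib.util.find_spec` to report where a module would be loaded from. For a top-level name such as `caffe`, this locates the package without executing it. For the dotted name `caffe._caffe`, `find_spec` must import the parent `caffe` package first, so a broken installation shows up as `None` instead of a traceback. One plausible but unconfirmed explanation is that the demo script adds a different Caffe `python` directory to `sys.path` than the apt-installed one. Running the check from the same working directory and `sys.path` setup as the demo would reveal whether that is the case.

```python
import importlib.util


def describe_module(name):
    """Return the file a module would be loaded from, or None if it can't be found.

    For dotted names, find_spec imports the parent package first, so an
    ImportError raised by a broken parent package is also reported as None."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:  # includes ModuleNotFoundError
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Compare where 'caffe' resolves to with whether its compiled extension loads.
    for mod in ("caffe", "caffe._caffe"):
        print(mod, "->", describe_module(mod))
```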

Training model: Missing protobuf file

When following the instructions in the README to train a model, dense_cap_train.sh throws an error.

[...]
models/dense_cap/solver_joint_inference.prototxt
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1026 16:31:58.601896  5156 io.cpp:36] Check failed: fd != -1 (-1 vs. -1) File not found: models/dense_cap/solver_joint_inference.prototxt

The file models/dense_cap/solver_joint_inference.prototxt appears to be missing. To train the model, I renamed the solver_joint_inference_finetune.prototxt file, but I am not sure whether this is the correct approach.

Is it possible to use the solver_joint_inference_finetune.prototxt file? If not, could you publish the correct file?
