
oscar_scripts's Introduction

oscar_scripts's People

Contributors

alcoholrithm


oscar_scripts's Issues

Using Oscar with Bottom-up features

Hi thanks for this amazing work!

I've tried to use your code to generate captions on raw images with bottom-up features, but I'm getting bad results. I moved the notebook to a Python script and ran inference on one image with the 'OursL(XE)' model you posted in your MODEL ZOO. I print the detected features and classes and they seem fine, but the generated caption has nothing to do with the image.

This is the image I'm using:

perro_frisbee

This is the result I get after running the code.

python test.py
Config 'workspace/detectron2/configs/VG-Detection/faster_rcnn_R_101_C4_caffe.yaml' has no VERSION. Assuming it to be compatible with latest v2.
Modifications for VG in RPN (modeling/proposal_generator/rpn.py):
Use hidden dim 512 instead fo the same dim as Res4 (1024).

Modifications for VG in RoI heads (modeling/roi_heads/roi_heads.py):
1. Change the stride of conv1 and shortcut in Res5.Block1 from 2 to 1.
2. Modifying all conv2 with (padding: 1 --> 2) and (dilation: 1 --> 2).
For more details, please check 'https://github.com/peteanderson80/bottom-up-attention/blob/master/models/vg/ResNet-101/faster_rcnn_end2end_final/test.prototxt'.

/home/jhurtado/anaconda3/envs/Alcoholrithm/lib/python3.7/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
[{'class': 'dog'}, {'class': 'frisbee'}, {'class': 'leg'}, {'class': 'leg'}, {'class': 'head'}, {'class': 'shadow'}, {'class': 'eye'}, {'class': 'eyes'}, {'class': 'beach'}, {'class': 'snow'}, {'class': 'dog'}, {'class': 'ear'}, {'class': 'eye'}, {'class': 'paw'}, {'class': 'legs'}, {'class': 'water'}, {'class': 'nose'}, {'class': 'ground'}, {'class': 'ground'}, {'class': 'water'}, {'class': 'shadow'}, {'class': 'ground'}, {'class': 'snow'}, {'class': 'ground'}, {'class': 'snow'}, {'class': 'head'}, {'class': 'mouth'}, {'class': 'sand'}]
[[array([[0.36432788, 0.23231444, 0. , ..., 0.99799365, 0.31319785,
0.9006105 ],
[0. , 0.01907516, 0. , ..., 0.337672 , 0.15087502,
0.07700826],
[0. , 0. , 0. , ..., 0.9307157 , 0.08133312,
0.28387186],
...,
[0.02281303, 0.04437782, 0.0105911 , ..., 0.37423557, 0.34131297,
0.36085242],
[0. , 0.00111979, 0. , ..., 0.31540072, 0.06328667,
0.08102156],
[0. , 0.93097776, 0.01253132, ..., 0.99827766, 0.82969713,
0.52183276]], dtype=float32), array([117, 123, 808, 808, 191, 683, 467, 546, 62, 176, 117, 274, 467,
786, 829, 183, 391, 465, 465, 183, 683, 465, 176, 465, 176, 191,
452, 326])]]
/home/jhurtado/image_captioning/Pruebas/MaravillasDeSergio/Oscar_Scripts/Oscar/oscar/modeling/modeling_utils.py:506: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
beam_id = idx // vocab_size
['a bunch of tools sitting on top of a pile of pens and pencils.']
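The `floordiv` warning in the log above is about how `idx // vocab_size` rounds negative values. As a rough sketch in plain Python (not the repository's code; `split_flat_index`, `trunc_div`, and the vocabulary size of 30522 are illustrative assumptions), truncating and flooring division give the same result for non-negative indices. Assuming `idx` is a non-negative flattened (beam, token) index, as is usual in beam search, this warning is unlikely to be what produces the unrelated caption:

```python
import math
from typing import Tuple

VOCAB_SIZE = 30522  # assumed, illustrative value; not taken from the log


def split_flat_index(idx: int, vocab_size: int) -> Tuple[int, int]:
    """Split a flattened (beam, token) index into its two parts."""
    beam_id = idx // vocab_size   # which beam the candidate came from
    token_id = idx % vocab_size   # which vocabulary entry was chosen
    return beam_id, token_id


def trunc_div(a: int, b: int) -> int:
    """Integer division that rounds toward zero (the 'trunc' behaviour)."""
    return math.trunc(a / b)


# For non-negative indices, floor and trunc division agree.
for idx in (0, 30521, 30522, 91570):
    assert idx // VOCAB_SIZE == trunc_div(idx, VOCAB_SIZE)

# They only differ once a negative operand is involved.
floor_result = -7 // 2
trunc_result = trunc_div(-7, 2)
print(floor_result, trunc_result)
```

The two roundings diverge only for negative operands, which is the case the warning describes.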

I think there might be a mismatch between the labels and the features, but I don't know how to fix it.
Do you think you can help me with this?

Thanks a lot in advance
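One way to narrow down the suspected mismatch is to check that the printed class names and class ids agree with each other. The sketch below uses only the values from the log above; `check_one_to_one` is a hypothetical helper, not part of the repository. If the pairing is consistent, the detector's own names and ids line up, and any remaining mismatch would more likely sit between the detector's vocabulary and the labels the captioning checkpoint expects (an assumption; the log does not confirm it).

```python
from typing import Dict, List

# Class names and class ids copied from the log output above.
names: List[str] = [
    "dog", "frisbee", "leg", "leg", "head", "shadow", "eye", "eyes",
    "beach", "snow", "dog", "ear", "eye", "paw", "legs", "water",
    "nose", "ground", "ground", "water", "shadow", "ground", "snow",
    "ground", "snow", "head", "mouth", "sand",
]
ids: List[int] = [
    117, 123, 808, 808, 191, 683, 467, 546, 62, 176, 117, 274, 467,
    786, 829, 183, 391, 465, 465, 183, 683, 465, 176, 465, 176, 191,
    452, 326,
]


def check_one_to_one(names: List[str], ids: List[int]) -> Dict[int, str]:
    """Return an id -> name mapping, raising if the pairing is inconsistent."""
    if len(names) != len(ids):
        raise ValueError(f"{len(names)} names but {len(ids)} ids")
    id_to_name: Dict[int, str] = {}
    name_to_id: Dict[str, int] = {}
    for name, class_id in zip(names, ids):
        # The same id must always carry the same name, and vice versa.
        if id_to_name.setdefault(class_id, name) != name:
            raise ValueError(f"id {class_id} maps to more than one name")
        if name_to_id.setdefault(name, class_id) != class_id:
            raise ValueError(f"name {name!r} maps to more than one id")
    return id_to_name


mapping = check_one_to_one(names, ids)
print(len(mapping), "distinct classes, pairing is consistent")
```

For the values in this log the check passes, so the detector's names and ids are at least internally consistent.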

Pretrained OSCAR Model

Hi,
I was trying to get the image captioning code from your repository running. Could you point me to the location of the 'checkpoint-59-554820' model that you use?

Using Oscar+ original repo models produces bad captions

Hi again!

I have been using your code and models with VinVL detections and getting good results so far, but when I try to use a checkpoint from the original Oscar repository I get terrible captions.

I'm trying the model trained on COCO for image captioning, fine-tuned with cross entropy, that they make available in their VinVLModelZoo.md:

https://biglmdiag.blob.core.windows.net/vinvl/model_ckpts/image_captioning/coco_captioning_large_xe.zip

I used this image again:
perro_frisbee

This is the caption I got from your model fine-tuned with CIDEr optimization:
['a small brown bear standing on top of a sandy beach.']

This is the caption I get with the model from Oscar's repo:
['a large number of lights that are on a building.']

I also manually checked which detections VinVL finds on this same image using their original repository, and the results were really good:
perrofrisbee_x152c4 attr

perrofrisbee_x152c4.attr.txt

I should also mention that I made a few changes to the code to run it on the CPU instead of CUDA, since the model doesn't fit on the GPU I have available.

I want to see what results I can get with the best-performing model from their repository (41.0 BLEU4 score), but I think I'm missing something.

Do you think you can help me with this?

Thanks for the help once again.
