alcoholrithm / oscar_scripts
Scripts for running inference with Oscar on Image Captioning and VQA tasks
License: MIT License
Hi, thanks for this amazing work!
I've tried to use your code to generate captions on raw images with bottom-up features, but I'm getting bad results. I moved the notebook into a Python script and ran inference on one image with the 'OursL(XE)' model you posted in your MODEL ZOO. The features and detected classes I print look fine, but the generated caption has nothing to do with the image.
This is the image I'm using:
This is the result I get after running the code:
python test.py
Config 'workspace/detectron2/configs/VG-Detection/faster_rcnn_R_101_C4_caffe.yaml' has no VERSION. Assuming it to be compatible with latest v2.
Modifications for VG in RPN (modeling/proposal_generator/rpn.py):
Use hidden dim 512 instead fo the same dim as Res4 (1024).
Modifications for VG in RoI heads (modeling/roi_heads/roi_heads.py):
1. Change the stride of conv1 and shortcut in Res5.Block1 from 2 to 1.
2. Modifying all conv2 with (padding: 1 --> 2) and (dilation: 1 --> 2).
For more details, please check 'https://github.com/peteanderson80/bottom-up-attention/blob/master/models/vg/ResNet-101/faster_rcnn_end2end_final/test.prototxt'.
/home/jhurtado/anaconda3/envs/Alcoholrithm/lib/python3.7/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
[{'class': 'dog'}, {'class': 'frisbee'}, {'class': 'leg'}, {'class': 'leg'}, {'class': 'head'}, {'class': 'shadow'}, {'class': 'eye'}, {'class': 'eyes'}, {'class': 'beach'}, {'class': 'snow'}, {'class': 'dog'}, {'class': 'ear'}, {'class': 'eye'}, {'class': 'paw'}, {'class': 'legs'}, {'class': 'water'}, {'class': 'nose'}, {'class': 'ground'}, {'class': 'ground'}, {'class': 'water'}, {'class': 'shadow'}, {'class': 'ground'}, {'class': 'snow'}, {'class': 'ground'}, {'class': 'snow'}, {'class': 'head'}, {'class': 'mouth'}, {'class': 'sand'}]
[[array([[0.36432788, 0.23231444, 0. , ..., 0.99799365, 0.31319785,
0.9006105 ],
[0. , 0.01907516, 0. , ..., 0.337672 , 0.15087502,
0.07700826],
[0. , 0. , 0. , ..., 0.9307157 , 0.08133312,
0.28387186],
...,
[0.02281303, 0.04437782, 0.0105911 , ..., 0.37423557, 0.34131297,
0.36085242],
[0. , 0.00111979, 0. , ..., 0.31540072, 0.06328667,
0.08102156],
[0. , 0.93097776, 0.01253132, ..., 0.99827766, 0.82969713,
0.52183276]], dtype=float32), array([117, 123, 808, 808, 191, 683, 467, 546, 62, 176, 117, 274, 467,
786, 829, 183, 391, 465, 465, 183, 683, 465, 176, 465, 176, 191,
452, 326])]]
/home/jhurtado/image_captioning/Pruebas/MaravillasDeSergio/Oscar_Scripts/Oscar/oscar/modeling/modeling_utils.py:506: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
beam_id = idx // vocab_size
['a bunch of tools sitting on top of a pile of pens and pencils.']
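The floordiv UserWarning in the output above comes from the beam-search indexing line it quotes (`beam_id = idx // vocab_size`). A minimal sketch of the deprecation-safe replacement the warning itself suggests, using hypothetical stand-in values for the beam-search tensors:

```python
import torch

# Hypothetical flat top-k indices over (num_beams * vocab_size);
# the real tensors come from the beam-search step in modeling_utils.py.
vocab_size = 30522
idx = torch.tensor([45, 61050, 92000])

# Deprecated on tensors: idx // vocab_size. The explicit form below
# keeps the intended behavior (indices are non-negative, so floor
# and trunc agree here):
beam_id = torch.div(idx, vocab_size, rounding_mode='floor')
token_id = idx % vocab_size

print(beam_id.tolist())   # which beam each candidate came from: [0, 2, 3]
print(token_id.tolist())  # token index within the vocabulary: [45, 6, 434]
```

This only silences the warning; it should not change the generated captions, since the indices involved are non-negative.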
I'm thinking there might be a mismatch between the labels and the features, but I don't know how to fix it.
Do you think you can help me with this?
Thanks a lot in advance.
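One place a label/feature mismatch like this often hides is the region-feature preprocessing: Oscar-style captioning models typically consume the 2048-d region features concatenated with a few normalized box coordinates, and the feature rows must stay in the same order as the label list. A hedged sketch of that preprocessing, using random stand-in data (the exact layout in this repo's script may differ; `build_region_features` is a hypothetical helper):

```python
import numpy as np

def build_region_features(feats, boxes, im_h, im_w):
    """Concatenate 2048-d region features with normalized box geometry
    (x1, y1, x2, y2, w, h) -> 2054-d rows, VinVL-style. The exact
    layout is an assumption; check it against the repo's own code."""
    boxes = boxes.astype(np.float32)
    x1, y1, x2, y2 = boxes.T
    geom = np.stack([
        x1 / im_w, y1 / im_h, x2 / im_w, y2 / im_h,
        (x2 - x1) / im_w, (y2 - y1) / im_h,
    ], axis=1)
    return np.concatenate([feats, geom], axis=1)

# Stand-in detector output: one feature row and one box per region.
feats = np.random.rand(28, 2048).astype(np.float32)
boxes = np.tile(np.array([10, 20, 110, 220], dtype=np.float32), (28, 1))
labels = ['dog', 'frisbee'] + ['leg'] * 26  # parallel list of class names

out = build_region_features(feats, boxes, im_h=480, im_w=640)
assert out.shape[0] == len(labels), "feature/label count mismatch"
print(out.shape)  # (28, 2054)
```

If the count assertion fails, or the rows and labels are sorted differently by the two code paths, the caption model sees features attached to the wrong class names, which would produce captions unrelated to the image.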
Hi,
I was trying to get the image captioning code running from your repository. Could you point me to the location of the 'checkpoint-59-554820' model that you use?
Hi again!
I have been using your code and models with VinVL detections and getting good results so far, but when I try to use a checkpoint from the original Oscar repository I get terrible captions.
I'm trying the model trained on COCO for image captioning, finetuned with cross entropy, that they have available in their VinVLModelZoo.md.
This is the caption I got from your model finetuned with CIDEr optimization:
['a small brown bear standing on top of a sandy beach.']
This is the caption I get with the model from Oscar's repo:
['a large number of lights that are on a building.']
I also manually checked which detections VinVL was finding on this same image using their original repository, and got really good results:
I should also mention that I made a few changes to the code to run it on CPU instead of CUDA, since the model can't fit in the GPU I have available.
I want to check what results I can get with the best-performing model from their repository (41.0 BLEU-4 score), but I think I'm missing something.
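For reference, the usual CPU-fallback pattern in PyTorch is to pass `map_location` when loading the checkpoint, so CUDA-saved tensors land on CPU. A minimal self-contained sketch, using a tiny stand-in model and a temporary file in place of the real captioning checkpoint:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Tiny stand-in model; the real captioning model is much larger.
model = nn.Linear(4, 2)

# Simulate a saved checkpoint (in practice this is the downloaded file).
ckpt_path = os.path.join(tempfile.mkdtemp(), 'checkpoint.pth')
torch.save(model.state_dict(), ckpt_path)

# map_location moves any CUDA tensors in the file onto CPU at load time.
device = torch.device('cpu')
state_dict = torch.load(ckpt_path, map_location=device)
model.load_state_dict(state_dict)
model.to(device).eval()

x = torch.randn(1, 4)
with torch.no_grad():
    y = model(x)
print(y.shape)  # torch.Size([1, 2])
```

As long as every `.cuda()` / `.to('cuda')` call in the script is also redirected to `device`, this change should not affect the captions themselves, only the runtime.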
Do you think you can help me with this?
Thanks for the help once again.