foundationvision / generateu

[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

Python 90.82% C++ 2.96% Cuda 6.20% Shell 0.02%
mllm multimodality object-detection open-vocabulary open-vocabulary-detection open-world

generateu's Issues

Some weights of T5ForConditionalGeneration were not initialized from the model checkpoint at google/flan-t5-base and are newly initialized: ['temp']

Thank you very much for your outstanding work. However, when I was training vg_swinT.yaml, I encountered the following issue:

Some weights of T5ForConditionalGeneration were not initialized from the model checkpoint at google/flan-t5-base and are newly initialized: ['temp']
You should probably TRAIN this model on a downstream task to be able to use it for predictions and inference.

Moreover, the training results were only:

AP     AP50   AP75   APs    APm    APl    APr    APc    APf
0.013  0.027  0.010  0.001  0.007  0.057  0.000  0.001  0.025

Is this issue caused by the flan-t5-base model not loading correctly? I hope to get your advice on this matter. Thank you.

About multi-modal large model initialization.

Thanks for your excellent work. Due to limited resources, I would like to study the part that trains the detection head starting from a multi-modal large model initialization. Could you please share that code when you have time? It would be used for learning and academic research only.

Evaluation results of the model.

This is interesting work. When I evaluate the pretrained model provided by the authors, I get lower results:
[image: evaluation results]

Is there anything wrong with my testing process? Could the authors provide the Python code for "DDETRSVLUniWithTTA"? Were the results in the paper obtained with the "DDETRSVLUniWithTTA" process?

Thanks!

torch.cuda.OutOfMemoryError

File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 496.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 235.06 MiB is free. Process 20398 has 14.52 GiB memory in use. Of the allocated memory 14.06 GiB is allocated by PyTorch, and 330.42 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
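
The error message above points to `max_split_size_mb` in `PYTORCH_CUDA_ALLOC_CONF`. A minimal sketch of setting it from Python, assuming the variable is set before PyTorch makes its first CUDA allocation. The helper name `configure_cuda_allocator` and the value 128 are illustrative assumptions, not settings taken from the repository. This only targets fragmentation; it will not help if the model simply needs more memory than the GPU has.

```python
import os


def configure_cuda_allocator(max_split_size_mb: int = 128) -> str:
    """Set PYTORCH_CUDA_ALLOC_CONF and return the value that was written.

    Hypothetical helper: call it before the first CUDA allocation,
    ideally before importing torch.
    """
    if max_split_size_mb <= 0:
        raise ValueError("max_split_size_mb must be positive")
    value = f"max_split_size_mb:{max_split_size_mb}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value


# 128 is an arbitrary illustrative value, not a tuned recommendation.
configure_cuda_allocator(128)
# Import torch and start training only after the variable is set.
```

The same setting can be exported in the shell before launching the training script. If memory is still exhausted, a smaller batch size or input resolution is the more likely remedy.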

Run error

When I run pip3 install -r requirements.txt, I get the following error:
ERROR: Invalid requirement: -e . --user
pip3: error: no such option: --user

I would appreciate a solution, if possible. Thanks.
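
The error above suggests that `requirements.txt` contains the line `-e . --user`. pip accepts `-e` inside a requirements file, but `--user` is a command-line option and is rejected there. A minimal sketch of one workaround, assuming `--user` is the only offending option: the hypothetical helper `strip_user_flag` removes it from each requirement line, so the cleaned lines can be written to a new file and installed with `pip3 install -r`.

```python
def strip_user_flag(lines):
    """Return requirement lines with any standalone `--user` token removed.

    Lines that do not contain `--user` are returned unchanged.
    """
    cleaned = []
    for line in lines:
        tokens = line.split()
        if "--user" in tokens:
            tokens = [tok for tok in tokens if tok != "--user"]
            cleaned.append(" ".join(tokens))
        else:
            cleaned.append(line)
    return cleaned


# Illustrative input; the real file contents may differ.
example = ["torch", "-e . --user"]
print(strip_user_flag(example))
```

If a per-user install is still wanted, `--user` can be passed on the pip command line instead of inside the file.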

Inference code for user-defined data

I am interested in your work; thank you very much for it.
I would like to make a qualitative comparison with the results of other models on user-defined data.
Do you plan to release inference code for this?

COCO zero-shot

@clin1223
Hi, thanks for your significant work!
We want to reproduce the COCO zero-shot results in Table 3.
We generate the text embeddings with clip-vit-large-patch14-336 and replace the ZERO_SHOT_WEIGHT with the generated embeddings.
Unfortunately, the results are 0.
Could you please give us some pointers? Could you also provide the corresponding COCO-80 embeddings?
Thanks! Have a nice day!

By the way, we generate the COCO-80-embeddings as follows.

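from transformers import AutoTokenizer, CLIPTextModel

model_path = "clip-vit-large-patch14-336"
model = CLIPTextModel.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
# class_name: one COCO category name (`class` is a reserved word in Python)
inputs = tokenizer(['a ' + class_name], padding=True, return_tensors="pt")
outputs = model(**inputs)
text_features = outputs.pooler_output  # pooled text feature for this class

We obtain a NumPy array of shape 80 × 768.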

About t5_loss

Thank you for your fascinating work. I noticed in the code that t5_loss is not included in the weight_dict, which implies that t5_loss does not get optimized, meaning the T5 model does not get updated. Is this an oversight, or is there a specific reason for this configuration?
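
In DETR-style training code, the criterion returns a dictionary of losses, and the training step sums only the entries whose keys appear in `weight_dict`. A minimal sketch of that convention with plain floats, assuming GenerateU follows it (the issue does not confirm this). The helper `total_loss`, the loss names other than `t5_loss`, and all values are made up for illustration.

```python
def total_loss(loss_dict, weight_dict):
    """Weighted sum over losses that have an entry in weight_dict.

    Losses without a weight are skipped, so they would receive no gradient.
    """
    return sum(loss_dict[k] * weight_dict[k] for k in loss_dict if k in weight_dict)


# Illustrative values only, not numbers from the repository.
loss_dict = {"loss_ce": 0.5, "loss_bbox": 0.25, "t5_loss": 3.0}
weight_dict = {"loss_ce": 2.0, "loss_bbox": 4.0}

without_t5 = total_loss(loss_dict, weight_dict)  # t5_loss is ignored here

weight_dict["t5_loss"] = 1.0
with_t5 = total_loss(loss_dict, weight_dict)  # t5_loss now contributes

print(without_t5, with_t5)
```

Under this convention, a loss missing from `weight_dict` can still be computed and logged, but it does not affect the optimized objective unless it is added to the total elsewhere in the code.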
