sarahESL / PubMedCLIP
Fine-tuning CLIP using the ROCO dataset, which contains image-caption pairs from PubMed articles.
License: MIT License
Hello, me again.
When I ran main.py in PubMedCLIP with vision encoder ViT-B/32, something went wrong:
-------Loading CLIP with vision encoder ViT-B/32 -------
-------Training started-------
Traceback (most recent call last):
File "C:/Users/fhdu/PycharmProjects/PubMedCLIP/main/main.py", line 51, in <module>
train(cfg, train_loader, val_loader, device)
File "C:\Users\fhdu\PycharmProjects\PubMedCLIP\main\train.py", line 65, in train
for i, (image, caption) in enumerate(train_loader):
File "E:\SoftwareFile\Anaconda3\envs\med_vqa\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
data = self._next_data()
File "E:\SoftwareFile\Anaconda3\envs\med_vqa\lib\site-packages\torch\utils\data\dataloader.py", line 561, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "E:\SoftwareFile\Anaconda3\envs\med_vqa\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "E:\SoftwareFile\Anaconda3\envs\med_vqa\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\fhdu\PycharmProjects\PubMedCLIP\lib\dataset\ROCOdataset.py", line 80, in __getitem__
image = self._load_image(index)
File "C:\Users\fhdu\PycharmProjects\PubMedCLIP\lib\dataset\ROCOdataset.py", line 70, in _load_image
image = read_image(path, mode=ImageReadMode.RGB)
File "E:\SoftwareFile\Anaconda3\envs\med_vqa\lib\site-packages\torchvision\io\image.py", line 222, in read_image
data = read_file(path)
File "E:\SoftwareFile\Anaconda3\envs\med_vqa\lib\site-packages\torchvision\io\image.py", line 42, in read_file
data = torch.ops.image.read_file(path)
File "E:\SoftwareFile\Anaconda3\envs\med_vqa\lib\site-packages\torch\_ops.py", line 63, in __getattr__
op = torch._C._jit_get_operation(qualified_op_name)
RuntimeError: No such operator image::read_file
Could the wrong version of torch be causing this issue? My versions are torch 1.10.0 and torchvision 0.11.1, as specified in your requirement.txt.
Looking forward to your answer; thanks in advance.
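The traceback points at torchvision's native image::read_file operator, which goes missing when the installed torch and torchvision builds don't match. One workaround is to have _load_image decode via PIL instead of torchvision.io. This is a sketch, not the repo's code; load_image_rgb is a hypothetical name:

```python
import numpy as np
import torch
from PIL import Image

def load_image_rgb(path):
    """Decode an image file to a (3, H, W) uint8 tensor, mimicking
    read_image(path, mode=ImageReadMode.RGB) without torchvision's
    native image ops."""
    with Image.open(path) as img:
        arr = np.asarray(img.convert("RGB"))  # (H, W, 3) uint8
    return torch.from_numpy(arr).permute(2, 0, 1).contiguous()
```

Alternatively, reinstalling torch and torchvision from the same build matrix (e.g. both CPU-only, or both built against the same CUDA version) usually restores the native operator.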
Hello, thank you so much for your work!
I tried to reproduce your results but so far haven't been able to. I followed the instructions for QCR on the SLAKE dataset, and my results show almost the same overall accuracy for CLIP and PubMedCLIP (using your checkpoints), even slightly lower for PubMedCLIP; both are around 79% overall accuracy. Any advice?
Also, I tried to train my own PubMedCLIP using your code (following the instructions in the PubMedCLIP subdirectory), and it seems to overfit within the first 5 epochs for every model. Was it similar in your experiments? These models also performed similarly to CLIP and to your PubMedCLIP checkpoints on QCR-SLAKE.
Thanks
Hi,
After following the instructions you provided in the 'QCR_PubMedCLIP' folder, we got the following error:
/content/PubMedCLIP/QCR_PubMedCLIP
loading dictionary from ./data/data_rad/dictionary.pkl
loading DAE image data from file: ./data/data_rad/images128x128.pkl
loading CLIP image data from file: ./data/data_rad/images250x250.pkl
loading DAE image data from file: ./data/data_rad/images128x128.pkl
loading CLIP image data from file: ./data/data_rad/images250x250.pkl
Traceback (most recent call last):
File "main/main.py", line 85, in <module>
question_classify.load_state_dict(pretrained_model)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for classify_model:
size mismatch for w_emb.emb.weight: copying a param with shape torch.Size([1178, 300]) from checkpoint, the shape in current model is torch.Size([1260, 300]).
We found that this bug is caused by the newly generated dictionary: its size differs from the one your saved_model expects. Could you help me with it?
Hello. Thank you for this research.
I have a question about this work. I tried to use PubMedCLIP as a pretrained model for extracting features from medical images and questions, but I couldn't get it to work. Can you help me solve this problem? How can I use this model as a pretrained model?
Thanks
Hi,
Very interesting work.
For some reason, GDrive does not give access to your trained models.
Can you please fix that?
Thanks,
Hi, I have the same problem as #8 (comment), and I cannot solve it by re-running the script.
Traceback (most recent call last):
File "main/main.py", line 85, in <module>
question_classify.load_state_dict(pretrained_model)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for classify_model:
size mismatch for w_emb.emb.weight: copying a param with shape torch.Size([1178, 300]) from checkpoint, the shape in current model is torch.Size([1260, 300]).
In https://github.com/sarahESL/PubMedCLIP/blob/main/QCR_PubMedCLIP/lib/utils/create_dictionary.py, the create_dictionary function uses both the train and test files to build the dictionary (nvocab = 1260). But in the training code, the tf-idf loading module uses only the train set (nvocab = 1178). I suspect the problem comes from the difference between the question set used to create the dictionary and the question set used for the tf-idf calculation. Could you please fix this?
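A toy illustration (not the repo's code) of why the sizes diverge: a vocabulary built from train+test questions is larger than one built from train alone, so an embedding table saved for one cannot be loaded into a model sized for the other.

```python
def build_vocab(questions):
    """Assign each new lowercase token the next integer id."""
    vocab = {}
    for q in questions:
        for tok in q.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

# Hypothetical question sets, for illustration only
train_qs = ["is there a fracture", "what organ is shown"]
test_qs = ["is the lesion malignant"]

train_vocab = build_vocab(train_qs)           # built from train only
full_vocab = build_vocab(train_qs + test_qs)  # built from train + test
# An embedding of shape (len(train_vocab), 300) cannot be loaded into a
# model built with len(full_vocab) rows: the same size-mismatch error.
```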
This repository is great! I was able to reproduce the results for the SLAKE dataset and the model seems to work well. However, I was wondering about the validation data. It seems like in line 65 in main.py for QCR_PubMedCLIP, the test dataset is used for validation and best model selection. Also in the setup script, it seems like the img2idx dictionary is created using train and test data instead of validation. Is this supposed to be the case?
Hi, do you know how to generate the CSV file?
Hello,
I'm having trouble running the script that generates the input files (dictionaries and pickles). One problem: lib/utils/run.sh runs python create_dictionary.py "../../data/data_slake" instead of /data_rad/. However, after changing the directory to data_rad, more issues pop up, including df columns not matching, so I suspect the script was not fully adapted to generate inputs for both datasets.
It took a while to get everything working, but it would be nice to see a fix.
How do I save a model on my local machine?
Hi there, I previously trained MEVF on my local machine. As MEVF's README.md says:
All data should be downloaded via link. The downloaded file should be extracted to data_RAD/ directory.
So I'd really like to know how run.sh in your repository creates the dataset files and dictionary files (I'd expect it to contain a download URL at least?). This is probably a basic question :)
Thanks in advance for your answer.
How is imgid2idx.json generated?
Where can I find the two files pretrained_ae.pth and pretrained_maml.pth? Thank you for telling me.
Hello,
First off, thanks for this contribution and for making the code public.
I have some questions regarding the validation of your model on VQA-Rad:
First, I wasn’t able to find anything about a validation set. Is the current setup that the model is validated and tested on the same test set?
Second, as I understand it the problem is set up as classification of all the possible answers present in the dataset. However, I noticed that there are many answers in the test set that are not present in the train set. Wouldn’t this mean that it’s impossible (as long as we don’t include any textual embedding of the answers) for the model to predict these answers correctly?
Thanks!
Hi, Thank you for sharing the wonderful work in the medical field. I have one question.
In QCR_PubMedCLIP/lib/utils/create_label.py (line 200):
for answer in open_answers:
    if answer in ans2label:
        ans = answer + "#"
What is the reason for appending the "#" string to the answer for an open answer? (for the Slake dataset)
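One plausible reading (an assumption on my part, not confirmed by the authors) is that the suffix keeps an open-set answer distinct from a closed-set answer with identical text when both feed a single label map. A toy sketch, with build_label_map as a hypothetical name:

```python
def build_label_map(closed_answers, open_answers):
    """Toy sketch (an assumption about intent, not the repo's logic):
    append "#" to an open answer that collides with a closed answer,
    so both survive as distinct labels in a single map."""
    ans2label = {}
    for a in closed_answers:
        ans2label.setdefault(a, len(ans2label))
    for a in open_answers:
        key = a + "#" if a in ans2label else a
        ans2label.setdefault(key, len(ans2label))
    return ans2label

labels = build_label_map(["yes", "no"], ["yes", "left lung"])
# "yes" yields two distinct labels: "yes" (closed) and "yes#" (open)
```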
Where can I find the file embed_tfidf_weights? Thank you for telling me.