
pubmedclip's Introduction

PubMedCLIP in Medical Visual Question Answering

This repository includes PubMedCLIP, a fine-tuned version of CLIP trained on ROCO image–caption pairs. We also provide pipelines for incorporating PubMedCLIP as an alternative pre-trained visual encoder in the MEVF and QCR medical visual question answering pipelines. Our experiments show that PubMedCLIP yields up to a 3% improvement in medical visual question answering.
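As a rough illustration of the fine-tuning setup, the sketch below shows a CLIP-style contrastive training step on image–caption pairs using the openai/CLIP package. This is a minimal sketch: the optimizer, learning rate, and the shape of the (images, captions) batch are assumptions for illustration, and the training code under the PubMedCLIP subdirectory remains the authoritative version.

import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device, jit=False)
model = model.float()  # fine-tune in fp32 to avoid fp16 gradient issues

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

def train_step(images, captions):
    """One contrastive step on a batch of preprocessed images and caption strings."""
    images = images.to(device)
    texts = clip.tokenize(captions, truncate=True).to(device)
    logits_per_image, logits_per_text = model(images, texts)
    labels = torch.arange(len(images), device=device)
    # symmetric cross-entropy over matching image-caption pairs
    loss = (F.cross_entropy(logits_per_image, labels)
            + F.cross_entropy(logits_per_text, labels)) / 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()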

Citation

If you use this work in an academic publication, please cite the arXiv paper by Sedigheh Eslami, Gerard de Melo, and Christoph Meinel:

Sedigheh Eslami, Gerard de Melo, Christoph Meinel (2021). 
Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain?
arXiv e-prints 2112.13906, 2021.

BibTeX entry:

@inproceedings{eslami2023pubmedclip,
  title={PubMedCLIP: How Much Does CLIP Benefit Visual Question Answering in the Medical Domain?},
  author={Eslami, Sedigheh and Meinel, Christoph and De Melo, Gerard},
  booktitle={Findings of the Association for Computational Linguistics: EACL 2023},
  pages={1151--1163},
  year={2023}
}


pubmedclip's Issues

Questions about reproducibility

Hello, thank you so much for your work!

I have tried to reproduce your results but have not been able to so far. I followed the instructions for QCR on the SLAKE dataset, and my results show almost the same overall accuracy for CLIP and PubMedCLIP (using your checkpoints), even a little lower for PubMedCLIP, both around 79% overall accuracy. Any advice?

Also, I tried to train my own PubMedCLIP using your code (following the instructions in the PubMedCLIP subdirectory), and every model seems to be overfitting within the first 5 epochs. Was it similar in your experiments? These models also performed similarly to CLIP and to your PubMedCLIP checkpoints on QCR-SLAKE.

Thanks

Validation set & test/train answer differences

Hello,

First off, thanks for this contribution and for making the code public.

I have some questions regarding the validation of your model on VQA-Rad:

First, I wasn’t able to find anything about a validation set. Is the current setup that the model is validated and tested on the same test set?

Second, as I understand it, the problem is set up as classification over all the possible answers present in the dataset. However, I noticed that many answers in the test set are not present in the train set. Wouldn't this mean that it is impossible (as long as we don't include any textual embedding of the answers) for the model to predict these answers correctly?
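For reference, a quick way to check this would be something like the following (a minimal sketch; the file names trainset.json / testset.json and the "answer" field are assumptions about the VQA-RAD JSON layout):

import json

with open("data/data_rad/trainset.json") as f:
    train = json.load(f)
with open("data/data_rad/testset.json") as f:
    test = json.load(f)

train_answers = {str(q["answer"]).lower() for q in train}
test_answers = [str(q["answer"]).lower() for q in test]
unseen = [a for a in test_answers if a not in train_answers]
print(f"{len(unseen)} of {len(test_answers)} test answers never appear in the train set")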

Thanks!

Use PubMedCLIP as a pretrained model

Hello. Thank you for this research.
I have a question about this work. I am trying to use PubMedCLIP as a pretrained model for extracting features from medical images and questions, but I have not been able to. Can you help me solve this problem? How can I use this model as a pretrained model?
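For context, this is roughly what I am attempting: loading the released weights into the official CLIP ViT-B/32 model and calling encode_image / encode_text. This is only a sketch; the checkpoint file name and the "state_dict" key are assumptions about how the released checkpoint is packaged.

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device, jit=False)

# load the fine-tuned PubMedCLIP weights (file name and key layout are assumptions)
checkpoint = torch.load("PubMedCLIP_ViT32.pth", map_location=device)
state_dict = checkpoint.get("state_dict", checkpoint)
model.load_state_dict(state_dict)
model.eval()

image = preprocess(Image.open("example_xray.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["is there a fracture in this image?"]).to(device)
with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)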
Thanks

pretrained_ae.pth

Where can I find the two files pretrained_ae.pth and pretrained_maml.pth? Thank you for telling me.

Validation Set for SLAKE dataset

This repository is great! I was able to reproduce the results for the SLAKE dataset, and the model seems to work well. However, I was wondering about the validation data. It seems that in line 65 of main.py for QCR_PubMedCLIP, the test dataset is used for validation and best-model selection. Also, in the setup script, it seems that the img2idx dictionary is created using train and test data instead of validation data. Is this supposed to be the case?

Issues with lib/utils/run.sh for input files

Hello,
I'm having trouble running the script for generating the input files (dictionaries and pickles). One problem is that lib/utils/run.sh runs python create_dictionary.py "../../data/data_slake" instead of pointing to /data_rad/. However, after changing the directory to data_rad, more issues pop up, including DataFrame columns not matching, so I suspect the script was not fully adapted to generate the inputs for both datasets.

It took a while to get them all working, but it would be nice to see a fix.

why can run.sh create datasets files?

Hi there, I previously trained MEVF on my local machine. As MEVF's README.md says:

All data should be downloaded via link. The downloaded file should be extracted to data_RAD/ directory.

So I would really like to know how run.sh in your repository can create the dataset files and dictionary files (I thought it would need a URL at least?). This is probably a basic question. :)
Thanks in advance for your answer.

Issue with VQA RAD training

Hi, I have the same problem as #8 (comment), and I cannot solve it by re-running the script.

Traceback (most recent call last):
  File "main/main.py", line 85, in <module>
    question_classify.load_state_dict(pretrained_model)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for classify_model:
        size mismatch for w_emb.emb.weight: copying a param with shape torch.Size([1178, 300]) from checkpoint, the shape in current model is torch.Size([1260, 300]).

In https://github.com/sarahESL/PubMedCLIP/blob/main/QCR_PubMedCLIP/lib/utils/create_dictionary.py, the create_dictionary function uses both the train and test files to create the dictionary (nvocab = 1260), but in the training code the TF-IDF loading module uses only the train set (nvocab = 1178). I suspect the problem comes from the difference between the question set used to create the dictionary and the question set used in the TF-IDF calculation. Could you please fix this?
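To make the mismatch concrete, the sketch below counts the vocabulary obtained from the train questions alone versus from train plus test. The JSON file names, the "question" field, and the crude tokenizer are assumptions for illustration; the repository's Dictionary class tokenizes differently, but the point is that the two vocabularies differ in size, which is what produces the 1178 vs. 1260 embedding rows.

import json
import re

def vocab_from(paths):
    words = set()
    for path in paths:
        with open(path) as f:
            for entry in json.load(f):
                # crude lowercase tokenization, for illustration only
                words.update(re.findall(r"[a-z0-9]+", entry["question"].lower()))
    return words

train_vocab = vocab_from(["data/data_rad/trainset.json"])
full_vocab = vocab_from(["data/data_rad/trainset.json", "data/data_rad/testset.json"])
print(len(train_vocab), len(full_vocab))  # the train+test vocabulary is larger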

Question about answer+"#"

Hi, thank you for sharing this wonderful work in the medical field. I have one question.

In QCR_PubMedCLIP/lib/utils/create_label.py (line 200):

for answer in open_answers:
    if answer in ans2label:
        ans = answer + "#"

What is the reason for appending the "#" string to the answer for open answers (for the SLAKE dataset)?

No such operator image::read_file

Hello, me again.
When I ran main.py in PubMedCLIP with the vision encoder ViT-B/32, something went wrong:

-------Loading CLIP with vision encoder ViT-B/32 -------
-------Training started-------
Traceback (most recent call last):
  File "C:/Users/fhdu/PycharmProjects/PubMedCLIP/main/main.py", line 51, in <module>
    train(cfg, train_loader, val_loader, device)
  File "C:\Users\fhdu\PycharmProjects\PubMedCLIP\main\train.py", line 65, in train
    for i, (image, caption) in enumerate(train_loader):
  File "E:\SoftwareFile\Anaconda3\envs\med_vqa\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
    data = self._next_data()
  File "E:\SoftwareFile\Anaconda3\envs\med_vqa\lib\site-packages\torch\utils\data\dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "E:\SoftwareFile\Anaconda3\envs\med_vqa\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "E:\SoftwareFile\Anaconda3\envs\med_vqa\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\fhdu\PycharmProjects\PubMedCLIP\lib\dataset\ROCOdataset.py", line 80, in __getitem__
    image = self._load_image(index)
  File "C:\Users\fhdu\PycharmProjects\PubMedCLIP\lib\dataset\ROCOdataset.py", line 70, in _load_image
    image = read_image(path, mode=ImageReadMode.RGB)
  File "E:\SoftwareFile\Anaconda3\envs\med_vqa\lib\site-packages\torchvision\io\image.py", line 222, in read_image
    data = read_file(path)
  File "E:\SoftwareFile\Anaconda3\envs\med_vqa\lib\site-packages\torchvision\io\image.py", line 42, in read_file
    data = torch.ops.image.read_file(path)
  File "E:\SoftwareFile\Anaconda3\envs\med_vqa\lib\site-packages\torch\_ops.py", line 63, in __getattr__
    op = torch._C._jit_get_operation(qualified_op_name)
RuntimeError: No such operator image::read_file

I'm not sure whether the wrong version of torch leads to this issue. My versions, according to your requirement.txt:
torch 1.10.0, torchvision 0.11.1
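As a possible workaround (not a confirmed fix), replacing torchvision.io.read_image with PIL in ROCOdataset.py avoids the image::read_file operator entirely. A sketch of what _load_image could do instead (the self.images attribute is an assumption about how the dataset stores its paths):

from PIL import Image
from torchvision.transforms.functional import pil_to_tensor

def _load_image(self, index):
    path = self.images[index]  # assumed attribute holding the image path
    img = Image.open(path).convert("RGB")
    # uint8 CxHxW tensor, matching read_image(path, mode=ImageReadMode.RGB)
    return pil_to_tensor(img)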

Looking forward to your answer, thanks in advance.

VQA_RAD preprocessing dictionary dimension

Hi,
After following the instructions you provided in the 'QCR_PubMedCLIP' folder, we got the following error:
/content/PubMedCLIP/QCR_PubMedCLIP
loading dictionary from ./data/data_rad/dictionary.pkl
loading DAE image data from file: ./data/data_rad/images128x128.pkl
loading CLIP image data from file: ./data/data_rad/images250x250.pkl
loading DAE image data from file: ./data/data_rad/images128x128.pkl
loading CLIP image data from file: ./data/data_rad/images250x250.pkl
Traceback (most recent call last):
  File "main/main.py", line 85, in <module>
    question_classify.load_state_dict(pretrained_model)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for classify_model:
        size mismatch for w_emb.emb.weight: copying a param with shape torch.Size([1178, 300]) from checkpoint, the shape in current model is torch.Size([1260, 300]).

We found that this bug is caused by the newly generated dictionary; its size differs from the one in your saved model. Could you help me with this?
