hyn2028 / llm-cxr
Official code for "LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation"
Home Page: https://arxiv.org/abs/2305.11490
License: Apache License 2.0
Hello, thank you for the wonderful project.
I have a few questions. You mentioned that the training was conducted in two stages, and I'm curious if there is a significant difference in performance compared to not using this two-stage approach.
Additionally, when dividing the data into two stages, you mentioned using a higher volume of lower-quality data in one stage and a smaller, higher-quality pruned dataset in the other.
I'm interested in understanding the criteria used to make this distinction. For example, did you make this judgment yourself, or were there specific criteria involved?
Hi @hyn2028,
I found that it takes about 5 seconds to evaluate each sample using "generate_llmcxr.py".
The test set of MIMIC-CXR includes about 3k–7k images,
so running inference on the whole test set takes roughly 4–9 hours.
Is there another way to speed up the inference process?
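One generic option (not from the authors' code; a sketch under the assumption that the model and tokenizer support padched batch generation) is to process prompts in batches instead of one at a time. Here `generate_fn` is a hypothetical wrapper around a batched `model.generate` call:

```python
from typing import Callable, Iterable, List

def batched(items: List[str], batch_size: int) -> Iterable[List[str]]:
    """Yield fixed-size chunks so many prompts share one forward pass."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def run_inference(prompts: List[str],
                  generate_fn: Callable[[List[str]], List[str]],
                  batch_size: int = 8) -> List[str]:
    """Run a (hypothetical) batched generation function over all prompts.

    generate_fn stands in for a call that tokenizes a list of prompts with
    padding and decodes the model's batched outputs.
    """
    outputs: List[str] = []
    for batch in batched(prompts, batch_size):
        outputs.extend(generate_fn(batch))
    return outputs
```

Whether this helps depends on GPU memory and on the generation loop in `generate_llmcxr.py`; other common levers are half-precision weights and a smaller maximum generation length.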
Hi @hyn2028 ,
Thank you for your amazing work! The idea is enlightening!
I am curious about when your code and pre-trained model weights for CXR-to-report generation will be released.
Best
Thank you very much for your nice work! I would like to know whether the code for this project can run directly in a Linux PyTorch environment without requiring the Gradio environment.
Also, when I downloaded and loaded the pretrained LLM, I got "ValueError: Tokenizer class GPTNeoXTokenizer does not exist or is not currently imported." Can you help me with this issue?
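Not an official answer, but this error usually means the installed `transformers` cannot resolve the tokenizer class named in the checkpoint's `tokenizer_config.json`: GPT-NeoX ships only a *fast* tokenizer (`GPTNeoXTokenizerFast`), so an install that predates GPT-NeoX support (or lacks the `tokenizers` package) fails this way. A small diagnostic sketch:

```python
import importlib.util

def gptneox_tokenizer_status() -> str:
    """Check whether this environment can load the GPT-NeoX tokenizer.

    If the check reports an outdated install, the usual fix is
        pip install -U transformers tokenizers
    and then loading via AutoTokenizer.from_pretrained(...).
    """
    if importlib.util.find_spec("transformers") is None:
        return "transformers not installed"
    import transformers
    if hasattr(transformers, "GPTNeoXTokenizerFast"):
        return "ok: GPTNeoXTokenizerFast available"
    return "outdated: upgrade transformers and tokenizers"
```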
Hi @hyn2028 ,
Thank you for your amazing work! The idea is enlightening!
I wonder what model size do you use in your approach? Is it a 7B LLaMA or 13B one? I cannot find any illustration in the paper. Please correct me if I have some misunderstanding.
Best
Cannot create models for VQ-GAN. There is no information about the VQ-GAN model in taming/modules, and importing VQModel from taming.models.vqgan fails with an error.
Hi @hyn2028,
I meet this error when I load the finetuned model.
Traceback (most recent call last):
File "test.py", line 347, in
main()
File "test.py", line 197, in main
model, tokenizer = load_model_tokenizer_for_generate_separate(args.config_path, args.model_path)
File "./llm-cxr/training/generate.py", line 57, in load_model_tokenizer_for_generate_separate
model = AutoModelForCausalLM.from_pretrained(
File "../transformers/models/auto/auto_factory.py", line 471, in from_pretrained
return model_class.from_pretrained(
File "../transformers/modeling_utils.py", line 2405, in from_pretrained
raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory
./checkpoints/llmcxr_origin_report__2023-06-19_00-13-51.
These are my saved files. Did the training code save the model?
Thank you in advance.
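That OSError means `from_pretrained` found none of the weight filenames it recognizes in the checkpoint directory. A quick diagnostic (a generic sketch, not from the repo) is to list which known weight files, including common sharded and safetensors variants, are actually present:

```python
from pathlib import Path
from typing import List

# Filenames from_pretrained looks for, plus common sharded/safetensors
# variants (assumption: these cover the usual save formats).
WEIGHT_PATTERNS = [
    "pytorch_model.bin", "pytorch_model-*.bin",
    "model.safetensors", "model-*.safetensors",
    "tf_model.h5", "model.ckpt.index", "flax_model.msgpack",
]

def find_weight_files(checkpoint_dir: str) -> List[str]:
    """List recognizable model-weight files in a checkpoint directory,
    to tell whether training wrote the weights or only auxiliary state
    such as optimizer checkpoints."""
    root = Path(checkpoint_dir)
    found: List[str] = []
    for pattern in WEIGHT_PATTERNS:
        found.extend(sorted(p.name for p in root.glob(pattern)))
    return found
```

If this returns an empty list for the directory in the traceback, the training run never saved the model weights there, and the saving step (not the loading code) is what needs investigating.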
Thank you for your nice work! I want to ask why you choose dolly as the LLM, instead of some famous model like LLaMA2. Is there any other consideration?
Hi,
Thank you for sharing your code.
Can you provide the code for quantizing the latent vectors of MIMIC-CXR images?
Thanks!
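While waiting for the official code, the core of VQ-GAN quantization is just a nearest-neighbor lookup of each encoder output against the learned codebook. A minimal NumPy sketch (assumptions: latents are flattened to an (N, D) array and the codebook is a (K, D) array taken from the trained model):

```python
import numpy as np

def quantize_latents(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each latent vector to the index of its nearest codebook entry
    (Euclidean distance), i.e. the token ids a VQ-GAN encoder produces.

    latents:  (N, D) encoder outputs (flattened spatial grid)
    codebook: (K, D) learned code vectors
    returns:  (N,) integer code indices
    """
    # ||z - e||^2 = ||z||^2 - 2 z.e + ||e||^2, minimized over codebook rows.
    d2 = (np.sum(latents ** 2, axis=1, keepdims=True)
          - 2.0 * latents @ codebook.T
          + np.sum(codebook ** 2, axis=1))
    return np.argmin(d2, axis=1)
```

In the actual repo this step would be performed by the taming-transformers VQModel's quantizer; the sketch only illustrates the operation.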
Hello @hyn2028,
I have downloaded your pretrained LLM weights, but when I run "python generate_llmcxr.py --model_path ./weights/llmcxr_checkpoint-v3-1e+v2-2e.tar.gz", it shows the error: huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/llm-cxr-main/weights/llmcxr_checkpoint-v3-1e+v2-2e.tar.gz'. Use `repo_type` argument if needed.
Can you help me solve the problem?
Thanks in advance.
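This HFValidationError typically means the path given to `from_pretrained` was not an existing local directory, so huggingface_hub fell back to treating the string as a Hub repo id. A `.tar.gz` archive must be extracted first; a small sketch using only the standard library:

```python
import tarfile
from pathlib import Path

def extract_checkpoint(archive_path: str, dest_dir: str) -> str:
    """Unpack a .tar.gz checkpoint into a plain directory.

    from_pretrained expects a local directory (or a Hub repo id); handing it
    the archive path makes huggingface_hub parse the string as a repo id,
    which raises the HFValidationError shown above.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
    return str(dest)
```

Then point `--model_path` at the extracted directory rather than the archive (the exact directory name depends on what the archive unpacks to).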
First of all, congratulations on the work; I believe this network has much higher zero-shot learning potential than the diffusion-based ones.
Now the question: what would it take to generalize this network so that it generates images from any domain? Would it be the same process as for CXR?
And, for the network to generate multiple latent spaces (e.g. VQ-VAE-2), would it be simple to align them in the dataset so that it generates both in sequence?