hyn2028 / llm-cxr
Official code for "LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation"
Home Page: https://arxiv.org/abs/2305.11490
License: Apache License 2.0
Hello, thank you for the wonderful project.
I have a few questions. You mentioned that the training was conducted in two stages, and I'm curious if there is a significant difference in performance compared to not using this two-stage approach.
Additionally, when dividing the data into two stages, you mentioned using a higher volume of lower-quality data in one stage and a smaller, higher-quality pruned dataset in the other.
I'm interested in understanding the criteria used to make this distinction. For example, did you make this judgment yourself, or were there specific criteria involved?
Hi @hyn2028,
I found that it takes about 5 seconds to evaluate each sample using "generate_llmcxr.py".
The test set of MIMIC-CXR includes about 3k–7k images,
so running inference on the whole test set takes roughly 4–9 hours.
Is there another way to speed up the inference process?
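One generic option (not from the authors' code; a sketch under the assumption that the model and tokenizer support padched batch generation) is to process prompts in batches instead of one at a time. Here `generate_fn` is a hypothetical wrapper around a batched `model.generate` call:

```python
from typing import Callable, Iterable, List

def batched(items: List[str], batch_size: int) -> Iterable[List[str]]:
    """Yield fixed-size chunks so many prompts share one forward pass."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def run_inference(prompts: List[str],
                  generate_fn: Callable[[List[str]], List[str]],
                  batch_size: int = 8) -> List[str]:
    """Run a (hypothetical) batched generation function over all prompts.

    generate_fn stands in for a call that tokenizes a list of prompts with
    padding and decodes the model's batched outputs.
    """
    outputs: List[str] = []
    for batch in batched(prompts, batch_size):
        outputs.extend(generate_fn(batch))
    return outputs
```

Whether this helps depends on GPU memory and on the generation loop in `generate_llmcxr.py`; other common levers are half-precision weights and a smaller maximum generation length.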
Hi @hyn2028 ,
Thank you for your amazing work! The idea is enlightening!
I am curious about when your code and pre-trained model weights for CXR-to-report generation will be released.
Best
Thank you very much for your nice work! I would like to know whether the code for this project can run directly in a Linux PyTorch environment without requiring the Gradio environment.
Also, when I downloaded and loaded the pretrained LLM, I got "ValueError: Tokenizer class GPTNeoXTokenizer does not exist or is not currently imported." Can you help me with this issue?
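Not an official answer, but this error usually means the installed `transformers` cannot resolve the tokenizer class named in the checkpoint's `tokenizer_config.json`: GPT-NeoX ships only a *fast* tokenizer (`GPTNeoXTokenizerFast`), so an install that predates GPT-NeoX support (or lacks the `tokenizers` package) fails this way. A small diagnostic sketch:

```python
import importlib.util

def gptneox_tokenizer_status() -> str:
    """Check whether this environment can load the GPT-NeoX tokenizer.

    If the check reports an outdated install, the usual fix is
        pip install -U transformers tokenizers
    and then loading via AutoTokenizer.from_pretrained(...).
    """
    if importlib.util.find_spec("transformers") is None:
        return "transformers not installed"
    import transformers
    if hasattr(transformers, "GPTNeoXTokenizerFast"):
        return "ok: GPTNeoXTokenizerFast available"
    return "outdated: upgrade transformers and tokenizers"
```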
Hi @hyn2028 ,
Thank you for your amazing work! The idea is enlightening!
I wonder what model size do you use in your approach? Is it a 7B LLaMA or 13B one? I cannot find any illustration in the paper. Please correct me if I have some misunderstanding.
Best
Cannot create models for VQ-GAN. There is no information about the VQ-GAN model in taming/modules, and importing VQModel from taming.models.vqgan fails with an error.
Hi @hyn2028,
I meet this error when I load the finetuned model.
Traceback (most recent call last):
File "test.py", line 347, in
main()
File "test.py", line 197, in main
model, tokenizer = load_model_tokenizer_for_generate_separate(args.config_path, args.model_path)
File "./llm-cxr/training/generate.py", line 57, in load_model_tokenizer_for_generate_separate
model = AutoModelForCausalLM.from_pretrained(
File "../transformers/models/auto/auto_factory.py", line 471, in from_pretrained
return model_class.from_pretrained(
File "../transformers/modeling_utils.py", line 2405, in from_pretrained
raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory
./checkpoints/llmcxr_origin_report__2023-06-19_00-13-51.
These are my saved files. Did the training code save the model?
Thank you in advance.
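That OSError means `from_pretrained` found none of the weight filenames it recognizes in the checkpoint directory. A quick diagnostic (a generic sketch, not from the repo) is to list which known weight files, including common sharded and safetensors variants, are actually present:

```python
from pathlib import Path
from typing import List

# Filenames from_pretrained looks for, plus common sharded/safetensors
# variants (assumption: these cover the usual save formats).
WEIGHT_PATTERNS = [
    "pytorch_model.bin", "pytorch_model-*.bin",
    "model.safetensors", "model-*.safetensors",
    "tf_model.h5", "model.ckpt.index", "flax_model.msgpack",
]

def find_weight_files(checkpoint_dir: str) -> List[str]:
    """List recognizable model-weight files in a checkpoint directory,
    to tell whether training wrote the weights or only auxiliary state
    such as optimizer checkpoints."""
    root = Path(checkpoint_dir)
    found: List[str] = []
    for pattern in WEIGHT_PATTERNS:
        found.extend(sorted(p.name for p in root.glob(pattern)))
    return found
```

If this returns an empty list for the directory in the traceback, the training run never saved the model weights there, and the saving step (not the loading code) is what needs investigating.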
Thank you for your nice work! I want to ask why you choose dolly as the LLM, instead of some famous model like LLaMA2. Is there any other consideration?
Hi,
Thank you for sharing your code.
Can you provide the code for quantizing the latent vectors of MIMIC-CXR images?
Thanks!
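While waiting for the official code, the core of VQ-GAN quantization is just a nearest-neighbor lookup of each encoder output against the learned codebook. A minimal NumPy sketch (assumptions: latents are flattened to an (N, D) array and the codebook is a (K, D) array taken from the trained model):

```python
import numpy as np

def quantize_latents(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each latent vector to the index of its nearest codebook entry
    (Euclidean distance), i.e. the token ids a VQ-GAN encoder produces.

    latents:  (N, D) encoder outputs (flattened spatial grid)
    codebook: (K, D) learned code vectors
    returns:  (N,) integer code indices
    """
    # ||z - e||^2 = ||z||^2 - 2 z.e + ||e||^2, minimized over codebook rows.
    d2 = (np.sum(latents ** 2, axis=1, keepdims=True)
          - 2.0 * latents @ codebook.T
          + np.sum(codebook ** 2, axis=1))
    return np.argmin(d2, axis=1)
```

In the actual repo this step would be performed by the taming-transformers VQModel's quantizer; the sketch only illustrates the operation.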
Hello @hyn2028,
I have downloaded your pretrained LLM weights, but when I run "python generate_llmcxr.py --model_path ./weights/llmcxr_checkpoint-v3-1e+v2-2e.tar.gz", it shows the error: huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/llm-cxr-main/weights/llmcxr_checkpoint-v3-1e+v2-2e.tar.gz'. Use `repo_type` argument if needed.
Can you help me solve the problem?
Thanks in advance.
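This HFValidationError typically means the path given to `from_pretrained` was not an existing local directory, so huggingface_hub fell back to treating the string as a Hub repo id. A `.tar.gz` archive must be extracted first; a small sketch using only the standard library:

```python
import tarfile
from pathlib import Path

def extract_checkpoint(archive_path: str, dest_dir: str) -> str:
    """Unpack a .tar.gz checkpoint into a plain directory.

    from_pretrained expects a local directory (or a Hub repo id); handing it
    the archive path makes huggingface_hub parse the string as a repo id,
    which raises the HFValidationError shown above.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
    return str(dest)
```

Then point `--model_path` at the extracted directory rather than the archive (the exact directory name depends on what the archive unpacks to).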
First of all, congratulations on the work; I believe this network has much higher zero-shot learning potential than the diffusion-based ones.
Now the question: what would it take to generalize this network so that it generates images from any domain? Would it be the same process as for CXR?
And, for the network to generate multiple latent spaces (e.g. VQ-VAE-2), would it be simple to align them in the dataset so that it generates both in sequence?