hyn2028 / llm-cxr

Official code for "LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation"

Home Page: https://arxiv.org/abs/2305.11490

License: Apache License 2.0

Python 97.31% Shell 2.69%

llm-cxr's People

Contributors

hyn2028, one-june


llm-cxr's Issues

Two Stage Tuning

Hello, thank you for the wonderful project.
I have a few questions. You mentioned that training was conducted in two stages; I'm curious whether there is a significant performance difference compared to not using this two-stage approach.
Additionally, when dividing the data into the two stages, you mentioned using a higher volume of lower-quality data first and a smaller, higher-quality pruned dataset second.
I'm interested in the criteria used to make this distinction. For example, did you make this judgment yourselves, or were there specific criteria involved?

Speed up the inference time

Hi @hyn2028,

I found that it takes about 5 seconds to evaluate each sample using "generate_llmcxr.py".
The MIMIC-CXR test set contains roughly 3k–7k images, so running inference over the whole test set takes about 4–9 hours.

Is there another way to speed up the inference process?
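One common way to cut wall-clock time here is to batch prompts through model.generate instead of generating one sample at a time. A minimal sketch, assuming a standard Hugging Face causal-LM checkpoint; the path and prompts are illustrative placeholders, not generate_llmcxr.py's actual inputs:

    # Hedged sketch: batched greedy decoding with Hugging Face transformers.
    # model_path and prompts are placeholders; generate_llmcxr.py would need
    # adapting to assemble batches of its own evaluation prompts.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "./checkpoints/llmcxr"  # hypothetical local checkpoint dir
    tokenizer = AutoTokenizer.from_pretrained(model_path, padding_side="left")
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # decoder-only LMs often lack one

    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
    model.to("cuda").eval()

    prompts = ["<evaluation prompt 1>", "<evaluation prompt 2>"]  # one batch
    with torch.inference_mode():
        inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
        out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    reports = tokenizer.batch_decode(out, skip_special_tokens=True)

Half-precision weights and left padding are the two details that usually matter: fp16 roughly doubles throughput on most GPUs, and left padding keeps each prompt contiguous with its generated tokens.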

Questions about the Model Size of LLM

Hi @hyn2028 ,

Thank you for your amazing work! The idea is enlightening!
I wonder what model size you use in your approach. Is it a 7B LLaMA or a 13B one? I could not find this stated in the paper. Please correct me if I have misunderstood something.

Best

Error about loading finetuned model

Hi @hyn2028,

I get this error when loading the finetuned model.

Traceback (most recent call last):
  File "test.py", line 347, in <module>
    main()
  File "test.py", line 197, in main
    model, tokenizer = load_model_tokenizer_for_generate_separate(args.config_path, args.model_path)
  File "./llm-cxr/training/generate.py", line 57, in load_model_tokenizer_for_generate_separate
    model = AutoModelForCausalLM.from_pretrained(
  File "../transformers/models/auto/auto_factory.py", line 471, in from_pretrained
    return model_class.from_pretrained(
  File "../transformers/modeling_utils.py", line 2405, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory ./checkpoints/llmcxr_origin_report__2023-06-19_00-13-51.

These are my saved files. Did the training code save the model?
[screenshot: contents of the saved checkpoint directory]

Thank you in advance.
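For reference, one common cause of this error is the trainer writing a DeepSpeed ZeRO checkpoint (a global_step*/ subfolder, usually alongside a zero_to_fp32.py helper) instead of a consolidated pytorch_model.bin. A hedged sketch of the usual fix, assuming that is what the directory contains (whether this training run actually used DeepSpeed is an assumption):

    # Hedged sketch: consolidate a DeepSpeed ZeRO checkpoint into the single
    # pytorch_model.bin file that from_pretrained looks for. Assumes the
    # directory holds ZeRO shards under a global_step*/ folder; the signature
    # shown is that of 2023-era DeepSpeed releases.
    from deepspeed.utils.zero_to_fp32 import convert_zero_checkpoint_to_fp32_state_dict

    ckpt_dir = "./checkpoints/llmcxr_origin_report__2023-06-19_00-13-51"
    convert_zero_checkpoint_to_fp32_state_dict(ckpt_dir, f"{ckpt_dir}/pytorch_model.bin")

The same conversion can also be run from the command line via the zero_to_fp32.py script that DeepSpeed places inside the checkpoint directory.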

The selection of LLM

Thank you for your nice work! I want to ask why you chose Dolly as the LLM instead of a better-known model like LLaMA 2. Were there other considerations?

Question about loading pretrained LLM models

Hello @hyn2028,
I have downloaded your pretrained LLM weights, but when I run

    python generate_llmcxr.py --model_path ./weights/llmcxr_checkpoint-v3-1e+v2-2e.tar.gz

it fails with:

    huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/llm-cxr-main/weights/llmcxr_checkpoint-v3-1e+v2-2e.tar.gz'. Use repo_type argument if needed.
Can you help me solve the problem?
Thanks in advance.
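A note on the error itself: from_pretrained treats any argument that is not an existing local directory as a Hub repo id, and a .tar.gz archive is neither. A minimal sketch of the likely fix, extracting the archive first (the name of the extracted folder is an assumption):

    # Hedged sketch: extract the downloaded archive, then point --model_path
    # at the extracted directory rather than at the .tar.gz file itself.
    import tarfile

    archive = "./weights/llmcxr_checkpoint-v3-1e+v2-2e.tar.gz"
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall("./weights")

    # Then, assuming the archive unpacks to a folder of the same name:
    #   python generate_llmcxr.py --model_path ./weights/llmcxr_checkpoint-v3-1e+v2-2e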

Generalize the image domain and generate multi-latent images

First of all, congratulations on the work. I believe this network has much higher zero-shot learning potential than the diffusion-based ones.

Now the question: what would it take to generalize this network so that it generates images from any domain? Would the process be the same as for CXR?
And for the network to generate multiple latent spaces (e.g., VQ-VAE-2), would it be simple to align them in the dataset so that it generates both in sequence?
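On the second question, the usual trick for hierarchical codes is to serialize every latent level into one token stream with disjoint id ranges, so the model emits the coarse level and then the fine level in sequence. A rough sketch of that idea; the vocabulary sizes and grid shapes are illustrative assumptions, not LLM-CXR's actual tokenization:

    # Hedged sketch: flatten VQ-VAE-2-style top (coarse) and bottom (fine)
    # codes into one sequence, offsetting each level past the text vocabulary
    # so the three id ranges never collide.
    import numpy as np

    TEXT_VOCAB = 50_000   # assumed text vocabulary size
    TOP_CODES = 512       # assumed codebook size of the coarse level
    BOTTOM_CODES = 512    # total vocab = TEXT_VOCAB + TOP_CODES + BOTTOM_CODES

    def serialize_latents(top_ids, bottom_ids):
        """Emit coarse codes first, then fine codes, each in its own id range."""
        top = (np.ravel(top_ids) + TEXT_VOCAB).tolist()
        bottom = (np.ravel(bottom_ids) + TEXT_VOCAB + TOP_CODES).tolist()
        return top + bottom

    # e.g. an 8x8 top grid followed by a 16x16 bottom grid -> 320 image tokens
    tokens = serialize_latents(np.zeros((8, 8), dtype=np.int64),
                               np.zeros((16, 16), dtype=np.int64))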
