ceruleandeep / comfyui-llava-captioner


A ComfyUI extension for chatting with your images using LLaVA. Runs locally; no external services, no filter.

License: GNU General Public License v3.0

Python 100.00%

comfyui-llava-captioner's People

Contributors

ceruleandeep

comfyui-llava-captioner's Issues

failing to install llama

When I run install.py, I get the following error:

Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml): started
Building wheel for llama-cpp-python (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error

Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
exit code: 1

Everything after that then fails, saying llama-cpp-python doesn't exist.
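
A possible workaround (my assumption; not confirmed by the maintainer): this build error usually means pip is compiling llama-cpp-python from source on a machine without a working CMake/C++ toolchain. llama-cpp-python publishes prebuilt wheels, so pointing pip at its wheel index should skip the compile step entirely, e.g. for a CPU-only build:

pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu

If a source build is actually needed, re-running pip with --verbose will surface the underlying CMake error.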

[Bug] LLaVA Captioner appears to leak VRAM

Hi - thank you for this excellent node. I use it as part of a two-stage SDXL workflow: an image prompt is generated using a grammar, the resulting image is analysed by LLaVA to extract a derived prompt, and that derived prompt is used for a second generation pass.

I've noticed that when running long batches of queued prompts, this node appears to leak VRAM. I enclose a small sample workflow that generates a random 16x16 tile and interrogates it.

After freshly booting ComfyUI, nvidia-smi shows about 1.4 GB of VRAM usage:

Fri Mar 15 19:34:42 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.23                 Driver Version: 551.23         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060      WDDM  |   00000000:01:00.0  On |                  N/A |
|  0%   48C    P8             25W /  170W |    1405MiB /  12288MiB |     21%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

After queuing the above workflow 30 times, VRAM usage rises to about 8 GB:

Fri Mar 15 19:37:50 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.23                 Driver Version: 551.23         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060      WDDM  |   00000000:01:00.0  On |                  N/A |
| 30%   45C    P8             25W /  170W |    7897MiB /  12288MiB |     27%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

As you can guess, in real use this eventually starts to crowd out my SDXL checkpoints, which are then forced into low-VRAM mode. Restarting Comfy releases the memory as expected.

It seems likely to me that the problem is actually in llama-cpp-python; there have been a number of threads online about potential VRAM leaks there, though I haven't been able to work out conclusively whether this is one of them. One suggestion online has been to offload the inference task into a separate process (using the multiprocessing module or a separate Python script, I guess), but this looked non-trivial, and I wanted to get your thoughts before attempting to build a solution and PR it.

In the meantime, is there some way to force llama-cpp-python to run on CPU only without rebuilding the library, and to expose that as a toggle on the node itself?
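
On the CPU-only question: the Llama constructor takes an n_gpu_layers parameter, and setting it to 0 keeps every layer on the CPU without rebuilding the library (the tracebacks in the issues below show the node currently passing -1, i.e. offload everything). A minimal sketch of what a toggle could pass through - the helper name and wiring are my assumptions about the node's internals, not its actual code:

from llama_cpp import Llama

def load_llava(model_path: str, use_gpu: bool) -> Llama:
    """Hypothetical loader: map a node toggle onto llama-cpp-python's GPU offload."""
    return Llama(
        model_path=model_path,
        n_gpu_layers=-1 if use_gpu else 0,  # -1 = offload all layers, 0 = CPU only
    )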

No module named 'llama_cpp'

After installing the node and moving the models to the right folder, I still get this when starting ComfyUI:

Traceback (most recent call last):
  File "D:\AI-Programmer\ComfyUI\ComfyUI\nodes.py", line 1813, in load_custom_node
    module_spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "D:\AI-Programmer\ComfyUI\ComfyUI\custom_nodes\ComfyUI-LLaVA-Captioner\__init__.py", line 1, in <module>
    from .llava import NODE_CLASS_MAPPINGS, NODE_DISPLAY_NAME_MAPPINGS
  File "D:\AI-Programmer\ComfyUI\ComfyUI\custom_nodes\ComfyUI-LLaVA-Captioner\llava.py", line 12, in <module>
    from llama_cpp import Llama
ModuleNotFoundError: No module named 'llama_cpp'

Cannot import D:\AI-Programmer\ComfyUI\ComfyUI\custom_nodes\ComfyUI-LLaVA-Captioner module for custom nodes: No module named 'llama_cpp'
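
A likely cause (my assumption; the report doesn't confirm it) is that llama-cpp-python was installed into a different Python than the one ComfyUI actually runs, so the import fails at startup. Installing with ComfyUI's own interpreter usually fixes this; for the Windows portable build that looks like:

python_embeded\python.exe -m pip install llama-cpp-python

For other setups, substitute whichever python.exe launches ComfyUI.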

error, failed to create llama_context

I'm not too familiar with Python, but this looks like an async/loading issue?
It was working and then suddenly it wasn't.
I am using two of these nodes in my project.
The error disappears if I reload the server: I can then run at least one prompt successfully (the count varies, but it is fewer than five each time) before the error returns.

  File "/home/altruios/AI/ComfyUI/execution.py", line 155, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/altruios/AI/ComfyUI/execution.py", line 85, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/altruios/AI/ComfyUI/execution.py", line 78, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/altruios/AI/ComfyUI/custom_nodes/ComfyUI-LLaVA-Captioner/llava.py", line 229, in caption
    llava = wait_for_async(lambda: get_llava(model, mm_proj, -1))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/altruios/AI/ComfyUI/custom_nodes/ComfyUI-LLaVA-Captioner/llava.py", line 168, in wait_for_async
    loop.run_until_complete(run_async())
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/altruios/AI/ComfyUI/custom_nodes/ComfyUI-LLaVA-Captioner/llava.py", line 158, in run_async
    r = await async_fn()
        ^^^^^^^^^^^^^^^^
  File "/home/altruios/AI/ComfyUI/custom_nodes/ComfyUI-LLaVA-Captioner/llava.py", line 88, in get_llava
    llm = Llama(
          ^^^^^^
  File "/home/altruios/AI/ComfyUI/lib/python3.11/site-packages/llama_cpp/llama.py", line 325, in __init__
    self._ctx = _LlamaContext(
                ^^^^^^^^^^^^^^
  File "/home/altruios/AI/ComfyUI/lib/python3.11/site-packages/llama_cpp/_internals.py", line 265, in __init__
    raise ValueError("Failed to create llama_context")
ValueError: Failed to create llama_context
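
For what it's worth - this is a guess from the traceback, not a confirmed diagnosis - llama.cpp raises "Failed to create llama_context" when it cannot allocate the context, which is most often an out-of-memory condition. That would be consistent with the VRAM-leak report above, where usage climbs with each run until a new context no longer fits. A defensive sketch (make_llava is a hypothetical helper, and the mm_proj/chat-handler wiring the real node needs is omitted):

import gc
from llama_cpp import Llama

def make_llava(model_path: str, n_gpu_layers: int) -> Llama:
    """Create a Llama instance, retrying on CPU if context allocation fails."""
    try:
        return Llama(model_path=model_path, n_gpu_layers=n_gpu_layers)
    except ValueError:
        # "Failed to create llama_context" usually means the context would not
        # fit in (V)RAM; drop dangling references and retry without GPU offload.
        gc.collect()
        return Llama(model_path=model_path, n_gpu_layers=0)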

Error when using load img list

Error occurred when executing LlavaCaptioner: Failed to create llama_context

  File "D:\StableSwarmUI\dlbackend\comfy\ComfyUI\execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\StableSwarmUI\dlbackend\comfy\ComfyUI\execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\StableSwarmUI\dlbackend\comfy\ComfyUI\execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\StableSwarmUI\dlbackend\comfy\ComfyUI\custom_nodes\ComfyUI-LLaVA-Captioner\llava.py", line 229, in caption
    llava = wait_for_async(lambda: get_llava(model, mm_proj, -1))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\StableSwarmUI\dlbackend\comfy\ComfyUI\custom_nodes\ComfyUI-LLaVA-Captioner\llava.py", line 168, in wait_for_async
    loop.run_until_complete(run_async())
  File "asyncio\base_events.py", line 653, in run_until_complete
  File "D:\StableSwarmUI\dlbackend\comfy\ComfyUI\custom_nodes\ComfyUI-LLaVA-Captioner\llava.py", line 158, in run_async
    r = await async_fn()
        ^^^^^^^^^^^^^^^^
  File "D:\StableSwarmUI\dlbackend\comfy\ComfyUI\custom_nodes\ComfyUI-LLaVA-Captioner\llava.py", line 88, in get_llava
    llm = Llama(
          ^^^^^^
  File "D:\StableSwarmUI\dlbackend\comfy\python_embeded\Lib\site-packages\llama_cpp\llama.py", line 327, in __init__
    self._ctx = _LlamaContext(
                ^^^^^^^^^^^^^^
  File "D:\StableSwarmUI\dlbackend\comfy\python_embeded\Lib\site-packages\llama_cpp\_internals.py", line 265, in __init__
    raise ValueError("Failed to create llama_context")

It also looks like this workflow creates a contextual relationship between consecutive images, with each caption influenced by the one before. Does that mean only manual single-image processing is possible?

Copied models to ComfyUI\models\llama but they are not found

I copied the models to the indicated folder, models\llama, but when ComfyUI loads they cannot be found in the nodes. Error message:

Prompt outputs failed validation: Required input is missing: model
Required input is missing: mm_proj
Required input is missing: model
Required input is missing: mm_proj
Required input is missing: model
Required input is missing: mm_proj
LlavaCaptioner:
- Required input is missing: model
- Required input is missing: mm_proj
LlavaCaptioner:
- Required input is missing: model
- Required input is missing: mm_proj
LlavaCaptioner:
- Required input is missing: model
- Required input is missing: mm_proj
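
For context, here is a sketch of how ComfyUI custom nodes typically populate model dropdowns - not necessarily this node's exact code. The choices come from scanning a registered model folder at import time; if the scan finds nothing (wrong folder, a different ComfyUI root than the one running, or an unexpected file extension), the dropdown is empty and prompt validation fails with exactly this "Required input is missing" message:

import os
import folder_paths

# Assumption: the node registers <ComfyUI>/models/llama as a model folder.
folder_paths.add_model_folder_path("llama", os.path.join(folder_paths.models_dir, "llama"))

class LlavaCaptioner:
    @classmethod
    def INPUT_TYPES(cls):
        files = folder_paths.get_filename_list("llama")
        # An empty `files` list renders empty dropdowns, so the prompt cannot
        # supply `model` or `mm_proj` and validation rejects it.
        return {"required": {"model": (files,), "mm_proj": (files,)}}

So the first things to check are that the files sit under models\llama of the ComfyUI installation that is actually running, and that they have the extension the node expects.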

Llava Next

Now that 1.6 is out, I have (as a non-programmer) been attempting to update to LLaVA-Next, but I have made a right mess of it:
https://github.com/jjohare/ComfyUI-LLaVA-Next/blob/main/llava_next.py

I had a version that made a spirited attempt at loading GGUF conversions of 1.6 from Hugging Face, but it hit UTF-8 parsing errors.

I'm giving up here but reporting what progress I made in case it helps anyone.

Feature request: Output list of strings

Love this node!

It would be a great boon if, when captioning a list of images, the node output a list of strings, one per image.

As of now the strings are concatenated into one multiline string, which results in a single (long) txt file instead of multiple files when used together with the LoRA Caption nodes.
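
For reference, ComfyUI supports this natively: a node can declare OUTPUT_IS_LIST and return one string per image, and downstream nodes that handle lists then receive the items individually. This is a sketch of the mechanism rather than a patch to this node; the class and helper names are hypothetical:

def caption_image(image) -> str:
    # Placeholder for the node's real LLaVA inference call.
    return "a caption"

class LlavaCaptionerList:
    """Hypothetical variant of the captioner that emits one string per image."""

    RETURN_TYPES = ("STRING",)
    OUTPUT_IS_LIST = (True,)  # tell ComfyUI the first output is a list, not one value
    FUNCTION = "caption"
    CATEGORY = "image"

    def caption(self, images):
        captions = [caption_image(image) for image in images]
        return (captions,)  # list inside the usual one-element result tuple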

Feature Request

Is it possible to have a toggle to keep the model loaded in memory, for captioning large amounts of data?

model reading question

The program reads the GGML model file every time I execute the workflow. Can this behave like image generation, which doesn't repeatedly re-read the model?
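
Both this and the previous feature request come down to caching the loaded model between executions instead of constructing it per run. A minimal sketch of the usual pattern, using llama-cpp-python's documented LLaVA chat handler; the function and cache names are mine, not the node's:

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

_LLAVA_CACHE: dict[tuple[str, str], Llama] = {}

def get_cached_llava(model_path: str, mm_proj_path: str) -> Llama:
    """Load the model once, then reuse it across workflow executions."""
    key = (model_path, mm_proj_path)
    if key not in _LLAVA_CACHE:
        _LLAVA_CACHE[key] = Llama(
            model_path=model_path,
            chat_handler=Llava15ChatHandler(clip_model_path=mm_proj_path),
            n_ctx=2048,       # context window; enough for caption-length outputs
            logits_all=True,  # required by the LLaVA chat handler
        )
    return _LLAVA_CACHE[key]

The trade-off is exactly the VRAM pressure described in the leak issue above, which is why a toggle (and an eviction path) would be preferable to caching unconditionally.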
