autodistill / autodistill-llava

LLaVA base model for use with Autodistill.

Home Page: https://docs.autodistill.com

License: Apache License 2.0

Languages: Makefile 10.05%, Python 89.95%
Topics: autodistill, computer-vision, llava, multimodal-llm

autodistill-llava's Introduction

Autodistill uses big, slower foundation models to train small, faster supervised models. Using autodistill, you can go from unlabeled images to inference on a custom model running at the edge with no human intervention in between.

Tip

You can use Autodistill on your own hardware, or use the Roboflow hosted version of Autodistill to label images in the cloud.

Currently, autodistill supports vision tasks like object detection and instance segmentation, but in the future it can be expanded to support language (and other) models.

🔗 Quicklinks

Tutorial | Docs | Supported Models | Contribute

👀 Example Output

Here are example predictions of a Target Model detecting milk bottles and bottlecaps after being trained on an auto-labeled dataset using Autodistill (see the Autodistill YouTube video for a full walkthrough).

🚀 Features

  • 🔌 Pluggable interface to connect models together
  • 🤖 Automatically label datasets
  • 🐰 Train fast supervised models
  • 🔒 Own your model
  • 🚀 Deploy distilled models to the cloud or the edge

📚 Basic Concepts

To use autodistill, you input unlabeled data into a Base Model which uses an Ontology to label a Dataset that is used to train a Target Model which outputs a Distilled Model fine-tuned to perform a specific Task.

Autodistill defines several basic primitives:

  • Task - A Task defines what a Target Model will predict. The Task for each component (Base Model, Ontology, and Target Model) of an autodistill pipeline must match for them to be compatible with each other. Object Detection and Instance Segmentation are currently supported through the detection task. Classification support will be added soon.
  • Base Model - A Base Model is a large foundation model that knows a lot about a lot. Base models are often multimodal and can perform many tasks. They're large, slow, and expensive. Examples of Base Models are GroundedSAM and GPT-4's upcoming multimodal variant. We use a Base Model (along with unlabeled input data and an Ontology) to create a Dataset.
  • Ontology - an Ontology defines how your Base Model is prompted, what your Dataset will describe, and what your Target Model will predict. A simple Ontology is the CaptionOntology which prompts a Base Model with text captions and maps them to class names. Other Ontologies may, for instance, use a CLIP vector or example images instead of a text caption; see the sketch after this list for a minimal CaptionOntology example.
  • Dataset - a Dataset is a set of auto-labeled data that can be used to train a Target Model. It is the output generated by a Base Model.
  • Target Model - a Target Model is a supervised model that consumes a Dataset and outputs a distilled model that is ready for deployment. Target Models are usually small, fast, and fine-tuned to perform a specific task very well (but they don't generalize well beyond the information described in their Dataset). Examples of Target Models are YOLOv8 and DETR.
  • Distilled Model - a Distilled Model is the final output of the autodistill process; it's a set of weights fine-tuned for your task that can be deployed to get predictions.
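
To make the Ontology idea concrete, here is a minimal sketch of a CaptionOntology (the prompts and class names below are illustrative placeholders, not taken from the official examples):

from autodistill.detection import CaptionOntology

# A CaptionOntology maps {caption: class}: the caption is the text prompt sent to
# the Base Model, and the class is the label written to the generated Dataset.
# The prompts and class names here are hypothetical.
ontology = CaptionOntology({
    "milk bottle": "bottle",
    "blue bottlecap": "cap",
})

# The class names become the classes of the Dataset and, later, of the Target Model.
print(ontology.classes())  # ['bottle', 'cap']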

💡 Theory and Limitations

Human labeling is one of the biggest barriers to broad adoption of computer vision. It can take thousands of hours to craft a dataset suitable for training a production model. The process of distillation for training supervised models is not new; in fact, traditional human labeling is just another form of distillation from an extremely capable Base Model (the human brain 🧠).

Foundation models know a lot about a lot, but for production we need models that know a lot about a little.

As foundation models get better and better they will increasingly be able to augment or replace humans in the labeling process. We need tools for steering, utilizing, and comparing these models. Additionally, these foundation models are big, expensive, and often gated behind private APIs. For many production use cases, we need models that can run cheaply and in real time at the edge.

Autodistill's Base Models can already create datasets for many common use-cases (and through creative prompting and few-shotting we can expand their utility to many more), but they're not perfect yet. There's still a lot of work to do; this is just the beginning and we'd love your help testing and expanding the capabilities of the system!

💿 Installation

Autodistill is modular. You'll need to install the autodistill package (which defines the interfaces for the above concepts) along with Base Model and Target Model plugins (which implement specific models).

By packaging these separately as plugins, dependency and licensing incompatibilities are minimized and new models can be implemented and maintained by anyone.

Example:

pip install autodistill autodistill-grounded-sam autodistill-yolov8
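
Since this repository provides the LLaVA Base Model plugin, a comparable install for this plugin (assuming the published autodistill-llava package) pairs it with a Target Model plugin:

pip install autodistill autodistill-llava autodistill-yolov8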
Install from source

You can also clone the project from GitHub for local development:

git clone https://github.com/roboflow/autodistill
cd autodistill
pip install -e .

Additional Base and Target models are enumerated below.

🚀 Quickstart

See the demo Notebook for a quick introduction to autodistill. This notebook walks through building a milk container detection model with no labeling.

Below, we have condensed key parts of the notebook for a quick introduction to autodistill.

You can also run Autodistill in one command. First, install autodistill:

pip install autodistill

Then, run:

autodistill images --base="grounding_dino" --target="yolov8" --ontology '{"prompt": "label"}' --output="./dataset"

This command will label all images in a directory called images with Grounding DINO and use the labeled images to train a YOLOv8 model. Grounding DINO is prompted with each "prompt" in the ontology, and matching detections are saved under the corresponding "label" class name. You can specify as many prompt-label pairs as you want. The resulting dataset will be saved in a folder called dataset.
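
For example, to label two classes in a single pass (the prompts, labels, and paths below are illustrative), the ontology JSON can hold multiple entries:

autodistill images --base="grounding_dino" --target="yolov8" --ontology '{"milk bottle": "bottle", "blue bottlecap": "cap"}' --output="./dataset"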

Install Packages

For this example, we'll show how to distill GroundedSAM into a small YOLOv8 model using autodistill-grounded-sam and autodistill-yolov8.

pip install autodistill autodistill-grounded-sam autodistill-yolov8

Distill a Model

from autodistill_grounded_sam import GroundedSAM
from autodistill.detection import CaptionOntology
from autodistill_yolov8 import YOLOv8

# define an ontology to map class names to our GroundingDINO prompt
# the ontology dictionary has the format {caption: class}
# where caption is the prompt sent to the base model, and class is the label that will
# be saved for that caption in the generated annotations
base_model = GroundedSAM(ontology=CaptionOntology({"shipping container": "container"}))

# label all images in a folder called `./images`
base_model.label(
  input_folder="./images",
  output_folder="./dataset"
)

target_model = YOLOv8("yolov8n.pt")
target_model.train("./dataset/data.yaml", epochs=200)

# run inference on the new model
pred = target_model.predict("./dataset/valid/your-image.jpg", confidence=0.5)
print(pred)

# optional: upload your model to Roboflow for deployment
from roboflow import Roboflow

rf = Roboflow(api_key="API_KEY")
project = rf.workspace().project("PROJECT_ID")
project.version(DATASET_VERSION).deploy(model_type="yolov8", model_path=f"./runs/detect/train/")
Visualize Predictions

To plot the annotations for a single image using autodistill, you can use the code below. This code is helpful for visualizing the annotations generated by your base model (e.g., GroundedSAM) and the results from your target model (e.g., YOLOv8).

import supervision as sv
import cv2

img_path = "./images/your-image.jpeg"

image = cv2.imread(img_path)

detections = base_model.predict(img_path)
# annotate image with detections
box_annotator = sv.BoxAnnotator()

labels = [
    f"{base_model.ontology.classes()[class_id]} {confidence:0.2f}"
    for _, _, confidence, class_id, _, _ in detections
]

annotated_frame = box_annotator.annotate(
    scene=image.copy(), detections=detections, labels=labels
)

sv.plot_image(annotated_frame, (16, 16))
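
The same pattern should also work for Target Model results; the sketch below assumes the trained target_model from the quickstart above and that its class indices line up with the ontology classes:

# visualize Target Model predictions with the same annotator (sketch; assumes
# `target_model` from the quickstart and ontology-aligned class indices)
detections = target_model.predict(img_path, confidence=0.5)

labels = [
    f"{base_model.ontology.classes()[class_id]} {confidence:0.2f}"
    for _, _, confidence, class_id, _, _ in detections
]

annotated_frame = box_annotator.annotate(
    scene=image.copy(), detections=detections, labels=labels
)

sv.plot_image(annotated_frame, (16, 16))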

📍 Available Models

Our goal is for autodistill to support using all foundation models as Base Models and most SOTA supervised models as Target Models. We focused on object detection and segmentation tasks first but plan to launch classification support soon! In the future, we hope autodistill will also be used for models beyond computer vision.

  • ✅ - complete (click row/column header to go to repo)
  • 🚧 - work in progress

object detection

base / target YOLOv8 YOLO-NAS YOLOv5 DETR YOLOv6 YOLOv7 MT-YOLOv6
DETIC ✅ ✅ ✅ ✅ 🚧
GroundedSAM ✅ ✅ ✅ ✅ 🚧
GroundingDINO ✅ ✅ ✅ ✅ 🚧
OWL-ViT ✅ ✅ ✅ ✅ 🚧
SAM-CLIP ✅ ✅ ✅ ✅ 🚧
LLaVA-1.5 ✅ ✅ ✅ ✅ 🚧
Kosmos-2 ✅ ✅ ✅ ✅ 🚧
OWLv2 ✅ ✅ ✅ ✅ 🚧
Roboflow Universe Models (50k+ pre-trained models) ✅ ✅ ✅ ✅ 🚧
CoDet ✅ ✅ ✅ ✅ 🚧
Azure Custom Vision ✅ ✅ ✅ ✅ 🚧
AWS Rekognition ✅ ✅ ✅ ✅ 🚧
Google Vision ✅ ✅ ✅ ✅ 🚧

instance segmentation

base / target YOLOv8 YOLO-NAS YOLOv5 YOLOv7 Segformer
GroundedSAM ✅ 🚧 🚧
SAM-CLIP ✅ 🚧 🚧
SegGPT ✅ 🚧 🚧
FastSAM 🚧 🚧 🚧

classification

base / target ViT YOLOv8 YOLOv5
CLIP ✅ ✅ 🚧
MetaCLIP ✅ ✅ 🚧
DINOv2 ✅ ✅ 🚧
BLIP ✅ ✅ 🚧
ALBEF ✅ ✅ 🚧
FastViT ✅ ✅ 🚧
AltCLIP ✅ ✅ 🚧
EvaCLIP (contributed by a community member) ✅ ✅ 🚧
Fuyu 🚧 🚧 🚧
Open Flamingo 🚧 🚧 🚧
GPT-4
PaLM-2

Roboflow Model Deployment Support

You can optionally deploy some Target Models trained using Autodistill on Roboflow. Deploying on Roboflow allows you to use a range of concise SDKs for using your model on the edge, from roboflow.js for web deployment to NVIDIA Jetson devices.

The following Autodistill Target Models are supported by Roboflow for deployment:

model name Supported?
YOLOv8 Object Detection ✅
YOLOv8 Instance Segmentation ✅
YOLOv5 Object Detection ✅
YOLOv5 Instance Segmentation ✅
YOLOv8 Classification

🎬 Video Guides

Autodistill: Train YOLOv8 with ZERO Annotations

Published: 8 June 2023

In this video, we will show you how to use a new library to train a YOLOv8 model to detect bottles moving on a conveyor line. Yes, that's right - zero annotation hours are required! We dive deep into Autodistill's functionality, covering topics from setting up your Python environment and preparing your images, to the thrilling automatic annotation of images.

💡 Community Resources

🗺️ Roadmap

Apart from adding new models, there are several areas we plan to explore with autodistill including:

  • 💡 Ontology creation & prompt engineering
  • 👩‍💻 Human in the loop support
  • 🤔 Model evaluation
  • 🔄 Active learning
  • 💬 Language tasks

🏆 Contributing

We love your input! Please see our contributing guide to get started. Thank you 🙏 to all our contributors!

👩‍⚖️ License

The autodistill package is licensed under the Apache 2.0 license. Each Base or Target Model plugin may use its own license corresponding to the license of its underlying model. Please refer to the license in each plugin repo for more information.

Frequently Asked Questions ❓

What causes the "PytorchStreamReader failed reading zip archive: failed finding central directory" error?

This error occurs when PyTorch cannot load the weights for a model, usually because the download was incomplete or corrupted. Go into the ~/.cache/autodistill directory and delete the folder associated with the model you are trying to load, then run your code again. The model weights will be downloaded from scratch; leave the download uninterrupted this time.
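
For example, if the cached weights for the model you are loading are corrupted, you can remove that model's cache folder and re-run (the folder name is a placeholder; check ~/.cache/autodistill for the actual directory name):

rm -rf ~/.cache/autodistill/<model-folder>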

💻 Explore more Roboflow open source projects

Project Description
supervision General-purpose utilities for use in computer vision projects, from predictions filtering and display to object tracking to model evaluation.
Autodistill (this project) Automatically label images for use in training computer vision models.
Inference An easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
Notebooks Tutorials for computer vision tasks, from training state-of-the-art models to tracking objects to counting objects in a zone.
Collect Automated, intelligent data collection powered by CLIP.

autodistill-llava's People

Contributors

0asa, capjamesg


Forkers

0asa

autodistill-llava's Issues

Errors on Apple M1

Hello,

While running the code on an Apple M1, I get the following errors:

/Users/csv610/Projects/CompVis/ObjectDetection/AutoDistill/autodistillenv/lib/python3.11/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
Traceback (most recent call last):
File "/Users/csv610/Projects/CompVis/ObjectDetection/AutoDistill/LLAVA-1.5/autodistill-llava/genlabels.py", line 2, in <module>
from autodistill_llava import LLaVA
File "/Users/csv610/Projects/CompVis/ObjectDetection/AutoDistill/LLAVA-1.5/autodistill-llava/autodistill_llava/__init__.py", line 1, in <module>
from autodistill_llava.model import LLaVA
File "/Users/csv610/Projects/CompVis/ObjectDetection/AutoDistill/LLAVA-1.5/autodistill-llava/autodistill_llava/model.py", line 46, in <module>
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN, DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN
File "/Users/csv610/.autodistill/LLaVA/llava/__init__.py", line 1, in <module>
from .model import LlavaLlamaForCausalLM
File "/Users/csv610/.autodistill/LLaVA/llava/model/__init__.py", line 2, in <module>
from .language_model.llava_mpt import LlavaMPTForCausalLM, LlavaMPTConfig
File "/Users/csv610/.autodistill/LLaVA/llava/model/language_model/llava_mpt.py", line 26, in <module>
from .mpt.modeling_mpt import MPTConfig, MPTForCausalLM, MPTModel
File "/Users/csv610/.autodistill/LLaVA/llava/model/language_model/mpt/modeling_mpt.py", line 19, in <module>
from .hf_prefixlm_converter import add_bidirectional_mask_if_missing, convert_hf_causal_lm_to_prefix_lm
File "/Users/csv610/.autodistill/LLaVA/llava/model/language_model/mpt/hf_prefixlm_converter.py", line 15, in <module>
from transformers.models.bloom.modeling_bloom import _expand_mask as _expand_mask_bloom
ImportError: cannot import name '_expand_mask' from 'transformers.models.bloom.modeling_bloom' (/Users/csv610/Projects/CompVis/ObjectDetection/AutoDistill/autodistillenv/lib/python3.11/site-packages/transformers/models/bloom/modeling_bloom.py)

Error in Kaggle

Hello!
I wanna test LLaVa for auto distillation, but I got this error:

TypeError: 'NoneType' object is not subscriptable

Minimal code to reproduce the error:

from autodistill.detection import CaptionOntology

ontology = CaptionOntology({
    "car": "small_car",
    "motorbike": "bike",
    "bus": "bus"
})

from autodistill_llava import LLaVA

base_model = LLaVA(ontology=ontology)
dataset = base_model.label(
    input_folder='/kaggle/input/distillation-test/traffic_dataset/test',
    extension=".jpg",
    output_folder='/kaggle/working/images'
)

Full error:

TypeError                                 Traceback (most recent call last)
Cell In[7], line 4
      1 from autodistill_llava import LLaVA
      3 base_model = LLaVA(ontology=ontology)
----> 4 dataset = base_model.label(
      5     input_folder='/kaggle/input/distillation-test/traffic_dataset/test',
      6     extension=".jpg",
      7     output_folder='/kaggle/working/images'
      8 )

File /opt/conda/lib/python3.10/site-packages/autodistill/detection/detection_base_model.py:52, in DetectionBaseModel.label(self, input_folder, extension, output_folder, human_in_the_loop, roboflow_project, roboflow_tags)
     50     f_path_short = os.path.basename(f_path)
     51     images_map[f_path_short] = image.copy()
---> 52     detections = self.predict(f_path)
     53     detections_map[f_path_short] = detections
     55 dataset = sv.DetectionDataset(
     56     self.ontology.classes(), images_map, detections_map
     57 )

File /opt/conda/lib/python3.10/site-packages/autodistill_llava/model.py:140, in LLaVA.predict(self, input)
    137 streamer = TextStreamer(self.tokenizer, skip_prompt=True, skip_special_tokens=True)
    139 with torch.inference_mode():
--> 140     output_ids = self.model.generate(
    141         input_ids,
    142         images=image_tensor,
    143         do_sample=True,
    144         temperature=0.2,
    145         max_new_tokens=512,
    146         streamer=streamer,
    147         use_cache=True,
    148         stopping_criteria=[stopping_criteria])
    150 outputs = self.tokenizer.decode(output_ids[0, input_ids.shape[1]:]).strip()
    152 self.conv.messages[-1][-1] = outputs

File /opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py:1588, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, **kwargs)
   1580     input_ids, model_kwargs = self._expand_inputs_for_generation(
   1581         input_ids=input_ids,
   1582         expand_size=generation_config.num_return_sequences,
   1583         is_encoder_decoder=self.config.is_encoder_decoder,
   1584         **model_kwargs,
   1585     )
   1587     # 13. run sample
-> 1588     return self.sample(
   1589         input_ids,
   1590         logits_processor=logits_processor,
   1591         logits_warper=logits_warper,
   1592         stopping_criteria=stopping_criteria,
   1593         pad_token_id=generation_config.pad_token_id,
   1594         eos_token_id=generation_config.eos_token_id,
   1595         output_scores=generation_config.output_scores,
   1596         return_dict_in_generate=generation_config.return_dict_in_generate,
   1597         synced_gpus=synced_gpus,
   1598         streamer=streamer,
   1599         **model_kwargs,
   1600     )
   1602 elif is_beam_gen_mode:
   1603     if generation_config.num_return_sequences > generation_config.num_beams:

File /opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py:2642, in GenerationMixin.sample(self, input_ids, logits_processor, stopping_criteria, logits_warper, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, streamer, **model_kwargs)
   2639 model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
   2641 # forward pass to get next token
-> 2642 outputs = self(
   2643     **model_inputs,
   2644     return_dict=True,
   2645     output_attentions=output_attentions,
   2646     output_hidden_states=output_hidden_states,
   2647 )
   2649 if synced_gpus and this_peer_finished:
   2650     continue  # don't waste resources running the code we don't need

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File /opt/conda/lib/python3.10/site-packages/accelerate/hooks.py:165, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
    163         output = old_forward(*args, **kwargs)
    164 else:
--> 165     output = old_forward(*args, **kwargs)
    166 return module._hf_hook.post_forward(module, output)

File ~/.autodistill/LLaVA/llava/model/language_model/llava_llama.py:79, in LlavaLlamaForCausalLM.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, images, return_dict)
     56 def forward(
     57     self,
     58     input_ids: torch.LongTensor = None,
   (...)
     68     return_dict: Optional[bool] = None,
     69 ) -> Union[Tuple, CausalLMOutputWithPast]:
     71     if inputs_embeds is None:
     72         (
     73             input_ids,
     74             position_ids,
     75             attention_mask,
     76             past_key_values,
     77             inputs_embeds,
     78             labels
---> 79         ) = self.prepare_inputs_labels_for_multimodal(
     80             input_ids,
     81             position_ids,
     82             attention_mask,
     83             past_key_values,
     84             labels,
     85             images
     86         )
     88     return super().forward(
     89         input_ids=input_ids,
     90         attention_mask=attention_mask,
   (...)
     98         return_dict=return_dict
     99     )

File ~/.autodistill/LLaVA/llava/model/llava_arch.py:121, in LlavaMetaForCausalLM.prepare_inputs_labels_for_multimodal(self, input_ids, position_ids, attention_mask, past_key_values, labels, images)
    119     image_features = [x.flatten(0, 1).to(self.device) for x in image_features]
    120 else:
--> 121     image_features = self.encode_images(images).to(self.device)
    123 # TODO: image start / end is not implemented here to support pretraining.
    124 if getattr(self.config, 'tune_mm_mlp_adapter', False) and getattr(self.config, 'mm_use_im_start_end', False):

File ~/.autodistill/LLaVA/llava/model/llava_arch.py:96, in LlavaMetaForCausalLM.encode_images(self, images)
     94 def encode_images(self, images):
     95     image_features = self.get_model().get_vision_tower()(images)
---> 96     image_features = self.get_model().mm_projector(image_features)
     97     return image_features

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File /opt/conda/lib/python3.10/site-packages/accelerate/hooks.py:165, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
    163         output = old_forward(*args, **kwargs)
    164 else:
--> 165     output = old_forward(*args, **kwargs)
    166 return module._hf_hook.post_forward(module, output)

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py:217, in Sequential.forward(self, input)
    215 def forward(self, input):
    216     for module in self:
--> 217         input = module(input)
    218     return input

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File /opt/conda/lib/python3.10/site-packages/accelerate/hooks.py:165, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
    163         output = old_forward(*args, **kwargs)
    164 else:
--> 165     output = old_forward(*args, **kwargs)
    166 return module._hf_hook.post_forward(module, output)

File /opt/conda/lib/python3.10/site-packages/bitsandbytes/nn/modules.py:441, in Linear8bitLt.forward(self, x)
    438 if self.bias is not None and self.bias.dtype != x.dtype:
    439     self.bias.data = self.bias.data.to(x.dtype)
--> 441 out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
    443 if not self.state.has_fp16_weights:
    444     if self.state.CB is not None and self.state.CxB is not None:
    445         # we converted 8-bit row major to turing/ampere format in the first inference pass
    446         # we no longer need the row-major weight

File /opt/conda/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:563, in matmul(A, B, out, state, threshold, bias)
    561 if threshold > 0.0:
    562     state.threshold = threshold
--> 563 return MatMul8bitLt.apply(A, B, out, bias, state)

File /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py:506, in Function.apply(cls, *args, **kwargs)
    503 if not torch._C._are_functorch_transforms_active():
    504     # See NOTE: [functorch vjp and autograd interaction]
    505     args = _functorch.utils.unwrap_dead_wrappers(args)
--> 506     return super().apply(*args, **kwargs)  # type: ignore[misc]
    508 if cls.setup_context == _SingleLevelFunction.setup_context:
    509     raise RuntimeError(
    510         'In order to use an autograd.Function with functorch transforms '
    511         '(vmap, grad, jvp, jacrev, ...), it must override the setup_context '
    512         'staticmethod. For more details, please see '
    513         'https://pytorch.org/docs/master/notes/extending.func.html')

File /opt/conda/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:384, in MatMul8bitLt.forward(ctx, A, B, out, bias, state)
    382     outliers = F.extract_outliers(state.CxB, state.SB, state.idx.int())
    383 else:
--> 384     outliers = state.CB[:, state.idx.long()].clone()
    386 state.subB = (outliers * state.SCB.view(-1, 1) / 127.0).t().contiguous().to(A.dtype)
    387 CA[:, state.idx.long()] = 0

TypeError: 'NoneType' object is not subscriptable

Running on M1

Hello,

I get the following error on the Apple M1. I tried changing fp16 to fp32 and cuda to cpu in the code.
File "/Users/Projects/CompVis/ObjectDetection/AutoDistill/autodistillenv/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'

xyxy coordinates are not absolute

First of all, thanks for creating the repo!

While playing around with autodistill I noticed that LLaVA always returns relative coordinates, which can't be displayed by supervision out of the box. I guess these just need to be converted to absolute values.

This should do the job:

from PIL import Image

image = Image.open(SAMPLE_IMAGE)

# Transform detection bounding boxes from relative (percentage) to absolute pixel coordinates
for detection in results.xyxy:
    detection[0] *= image.width
    detection[1] *= image.height
    detection[2] *= image.width
    detection[3] *= image.height
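
Since supervision stores xyxy as an (N, 4) NumPy array, the same rescaling can also be done in one vectorized step (a sketch assuming results is an sv.Detections object and image is the PIL image opened above):

import numpy as np

# scale relative [0, 1] xyxy coordinates to absolute pixel coordinates in place
results.xyxy *= np.array([image.width, image.height, image.width, image.height])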
