
sdcat

Sliced Detection and Clustering Analysis Toolkit

This repository processes images using a sliced detection and clustering workflow. If your images look something like the image below, and you want to detect objects in them, and optionally cluster the detections, then this repository may be useful to you. The repository is designed to be run from the command line, and can be run in a Docker container, with or without a GPU (a GPU is recommended).


Detection

Detection can be done with a fine-grained saliency-based detection model, and/or one of the following models run with the SAHI algorithm. Both detection algorithms are run by default and combined to produce the final detections; a sketch of sliced inference with SAHI follows the model table below.

Model                           Description
yolov8s                         YOLOv8s model from Ultralytics
hustvl/yolos-small              YOLOS model, a Vision Transformer (ViT)
hustvl/yolos-tiny               YOLOS model, a Vision Transformer (ViT)
MBARI/megamidwater (default)    MBARI midwater YOLOv5x for general detection in midwater images
MBARI/uav-yolov5                MBARI UAV YOLOv5x for general detection in UAV images
FathomNet/MBARI-315k-yolov5     MBARI YOLOv5x for general detection in benthic images
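
These models are run slice-by-slice with the SAHI library. The snippet below is a minimal sketch of what sliced inference with SAHI looks like when driving a YOLOv5-style model directly; the weights path, thresholds, and image name are placeholders, and sdcat's internal pipeline may differ.

from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Sketch of sliced detection with the SAHI library (not sdcat's internal code);
# the weights path, thresholds, and image name below are placeholders.
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov5",
    model_path="yolov5x.pt",      # placeholder weights file
    confidence_threshold=0.1,
    device="cuda:0",              # or "cpu"
)

result = get_sliced_prediction(
    "DSC01833.jpg",               # image to slice and run detection on
    detection_model,
    slice_height=900,             # matches --slice-size-height 900
    slice_width=900,              # matches --slice-size-width 900
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)

for pred in result.object_prediction_list:
    print(pred.category.name, pred.score.value, pred.bbox.to_xyxy())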

To skip saliency detection, use the --skip-saliency option.

sdcat detect --skip-saliency --image-dir <image-dir> --save-dir <save-dir> --model <model> --slice-size-width 900 --slice-size-height 900

To skip using the SAHI algorithm, use --skip-sahi.

sdcat detect --skip-sahi --image-dir <image-dir> --save-dir <save-dir> --model <model> --slice-size-width 900 --slice-size-height 900

ViTS + HDBSCAN Clustering

Once the detections are generated, they can be clustered. Alternatively, detections can be clustered from a collection of images by providing the detections in a folder to the roi command.

sdcat cluster roi --roi <roi> --save-dir <save-dir> --model <model> 

The clustering is done with a Vision Transformer (ViT) model, and a cosine similarity metric with the HDBSCAN algorithm. The ViT model is used to generate embeddings for the detections, and the HDBSCAN algorithm is used to cluster the detections. What is an embedding? An embedding is a vector representation of an object in an image.
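
As a rough illustration of that workflow (a sketch only, not sdcat's exact implementation), the snippet below embeds detection crops with a Hugging Face ViT model and clusters the embeddings with HDBSCAN over precomputed cosine distances; the model name, file glob, and clustering parameters are example choices.

from pathlib import Path

import hdbscan
import numpy as np
import torch
from PIL import Image
from sklearn.metrics.pairwise import cosine_distances
from transformers import AutoImageProcessor, AutoModel

# Sketch only: the model name and parameters below are example choices.
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = AutoModel.from_pretrained("google/vit-base-patch16-224")

def embed(crop_paths):
    """Return one embedding per crop (mean-pooled last hidden state)."""
    embeddings = []
    for path in crop_paths:
        inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
        with torch.no_grad():
            outputs = model(**inputs)
        embeddings.append(outputs.last_hidden_state.mean(dim=1).squeeze(0).numpy())
    return np.stack(embeddings)

crop_paths = sorted(Path("crops").glob("*.png"))        # crops from the detection step
embeddings = embed(crop_paths)
distances = cosine_distances(embeddings).astype(np.float64)
clusterer = hdbscan.HDBSCAN(metric="precomputed", min_cluster_size=2, min_samples=1)
labels = clusterer.fit_predict(distances)               # -1 marks unclustered (noise) detections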

The defaults are set to produce fine-grained clusters, but the parameters can be adjusted to produce coarser clusters.

Vision Transformer (ViT) Model          Description
google/vit-base-patch16-224 (default)   16 block size, trained on ImageNet21k with 21k classes
facebook/dino-vits8                     trained on ImageNet, which contains 1.3M images with labels from 1000 classes
facebook/dino-vits16                    trained on ImageNet, which contains 1.3M images with labels from 1000 classes

A smaller block_size means more patches and more accurate fine-grained clustering on smaller objects, so ViT models with a block size of 8 are recommended for fine-grained clustering on small objects, and a block size of 16 for coarser clustering on larger objects. We recommend running with multiple models to see which works best for your data, and experimenting with the --min_samples and --min-cluster-size options to get good clustering results.
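
For intuition on block (patch) size, the arithmetic below (illustrative only, assuming the usual 224x224 ViT input resolution) shows why an 8-pixel block yields a much finer-grained representation than a 16-pixel block:

# Illustrative arithmetic: patch counts for a 224x224 ViT input
patches_8 = (224 // 8) ** 2    # 28 x 28 = 784 patches (e.g. dino-vits8)
patches_16 = (224 // 16) ** 2  # 14 x 14 = 196 patches (e.g. vit-base-patch16-224)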

Installation

Pip install the sdcat package with:

pip install sdcat

Alternatively, Docker can be used to run the code. A pre-built Docker image with the latest version of the code is available on Docker Hub.

Detection

docker run -it -v $(pwd):/data mbari/sdcat detect --image-dir /data/images --save-dir /data/detections --model MBARI-org/uav-yolov5

Followed by clustering

docker run -it -v $(pwd):/data mbari/sdcat cluster detections --det-dir /data/detections/ --save-dir /data/detections --model MBARI-org/uav-yolov5

A GPU is recommended for clustering and detection. If you don't have a GPU, you can still run the code, but it will be slower. If running on a CPU, multiple cores are recommended and will speed up processing.

docker run -it --gpus all -v $(pwd):/data mbari/sdcat:cuda124 detect --image-dir /data/images --save-dir /data/detections --model MBARI-org/uav-yolov5

Commands

To get all options available, use the --help option. For example:

sdcat --help

which will print out the following:

Usage: sdcat [OPTIONS] COMMAND [ARGS]...

  Process images from a command line.

Options:
  -V, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  cluster  Cluster detections.
  detect   Detect objects in images

To get details on a particular command, use the --help option with the command. For example, with the cluster command:

 sdcat  cluster --help 

which will print out the following:

Usage: sdcat cluster [OPTIONS] COMMAND [ARGS]...

  Commands related to clustering images

Options:
  -h, --help  Show this message and exit.

Commands:
  detections  Cluster detections.
  roi         Cluster roi.

File organization

The sdcat toolkit generates data in the following folders. Here, we assume both detection and clustering output to the same root folder:

/data/20230504-MBARI/
└── detections
    └── hustvl
        └── yolos-small                         # The model used to generate the detections
            ├── det_raw                         # The raw detections from the model
            │   └── csv                    
            │       ├── DSC01833.csv
            │       ├── DSC01859.csv
            │       ├── DSC01861.csv
            │       └── DSC01922.csv
            ├── det_filtered                    # The filtered detections from the model
            ├── det_filtered_clustered          # Clustered detections from the model
            │   ├── crops                       # Crops of the detections
            │   ├── dino_vits8...date           # The clustering results - one folder per run of the clustering algorithm
            │   ├── dino_vits8..exemplars.csv   # Exemplar embeddings - examples with the highest cosine similarity within a cluster
            │   └── dino_vits8..detections.csv  # The detections with the cluster id
            ├── stats.txt                       # Statistics of the detections
            └── vizresults                      # Visualizations of the detections (boxes overlaid on images)
                ├── DSC01833.jpg
                ├── DSC01859.jpg
                ├── DSC01861.jpg
                └── DSC01922.jpg

Process images creating bounding box detections with the YOLOv5 model.

The YOLOv5s model is not as accurate as the other models, but it is fast, good for detecting larger objects in images, and useful for experiments and quick results. Slice size is the size of the detection window. By default, the SAHI algorithm determines the slice size; a smaller slice size takes longer to process.

sdcat detect --image-dir <image-dir> --save-dir <save-dir> --model yolov5s --slice-size-width 900 --slice-size-height 900

Cluster detections from the YOLOv5 model

Cluster the detections from the YOLOv5 model. The detections are clustered using cosine similarity and embedding features from a Facebook Vision Transformer (ViT) model.

sdcat cluster detections --det-dir <det-dir> --save-dir <save-dir> --model yolov5s


sdcat's Issues

Automated HDBSCAN parameter selection

Picking parameters can be challenging.

This paper has an interesting automated choice for DBSCAN.

Something like the following could be used to find epsilon for a given minpts/k (untested):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors
from kneed import KneeLocator


def calculate_eps_minpts(data, k):
    # Fit NearestNeighbors model
    neighbors = NearestNeighbors(n_neighbors=k)
    neighbors_fit = neighbors.fit(data)

    # Compute the k-distance for each point
    distances, indices = neighbors_fit.kneighbors(data)

    # Take the k-th nearest distance (i.e., the k-distance)
    k_distances = distances[:, k - 1]

    # Sort the k-distances in ascending order
    k_distances = np.sort(k_distances)

    # Plot the k-distances to visualize the knee
    plt.plot(k_distances)
    plt.xlabel("Points sorted by distance")
    plt.ylabel(f"Distance to {k}-th nearest neighbor")
    plt.title(f"{k}-distance Graph")
    plt.show()

    # Use KneeLocator to find the knee point
    kneedle = KneeLocator(range(len(k_distances)), k_distances, curve="convex", direction="increasing")

    # KneeLocator returns None when no knee is found
    if kneedle.knee is None:
        return None

    # The knee point corresponds to the optimal epsilon
    eps = k_distances[kneedle.knee]

    return eps
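
Hypothetical usage, assuming embeddings is an (N, D) NumPy array of feature vectors and k matches the intended min_samples:

# embeddings is a hypothetical (N, D) array of feature vectors
eps = calculate_eps_minpts(embeddings, k=5)
print(f"Estimated eps for k=5: {eps}")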

pip install

We must upgrade the build to a pip install to ease workflow integration. A Poetry build is probably the easiest path to do this.

Proposed MBARI package name: sdcat

Zero cluster fails

The following error occurs when clustering returns zero clusters. This can occasionally happen with, e.g., ROIs with just a few examples, i.e. the rare classes.

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Relevant stack dump

File "/venv/lib/python3.10/site-packages/sdcat/cluster/cluster.py", line 164, in _run_hdbscan_assign
    similarity_scores = cosine_similarity(image_emb[i].reshape(1, -1), exemplar_emb)
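
One possible guard (a sketch only, not necessarily the fix adopted in sdcat) is to check for an empty exemplar set before computing the similarity scores:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Sketch of a defensive check; exemplar_emb, image_emb and i are the variables from the stack trace above
exemplar_emb = np.atleast_2d(exemplar_emb)
if exemplar_emb.size == 0:
    similarity_scores = np.zeros((1, 0))   # nothing to compare the detection embedding against
else:
    similarity_scores = cosine_similarity(image_emb[i].reshape(1, -1), exemplar_emb)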

hdbscan plots lambda errors

From @flecaros-mbari

 /usr/local/lib/python3.10/dist-packages/hdbscan/plots.py:383: UserWarning: Infinite lambda values encountered in chosen clusters. This might be due to duplicates in the data.
  warn('Infinite lambda values encountered in chosen clusters.'

03/11/2024 19:07:13 - ERROR - sdcat -   Exiting. Error: Linkage 'Z' contains negative distances.
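
Since the warning points at duplicates, one mitigation (a sketch with a hypothetical embeddings array, not necessarily the fix used in sdcat) is to drop exact duplicate embedding rows before clustering:

import numpy as np

# embeddings is a hypothetical (N, D) array; keep only unique rows before running HDBSCAN
unique_embeddings, inverse_idx = np.unique(embeddings, axis=0, return_inverse=True)
# cluster unique_embeddings, then map labels back to the original rows with labels[inverse_idx]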

Add support to cluster ROIs

In cases where the regions of interest (ROIs) are already determined, add support for skipping the detection step.

Handle bad image crops

Zero-length image crops throw exceptions when computing embeddings. They need to be skipped.
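
A minimal sketch of such a guard (using PIL, and not necessarily the check sdcat ends up using) is to filter out unreadable or zero-sized crops before computing embeddings:

from PIL import Image

def is_valid_crop(path):
    """Return True if the crop can be opened and has non-zero width and height."""
    try:
        with Image.open(path) as img:
            width, height = img.size
        return width > 0 and height > 0
    except OSError:   # unreadable or truncated image file
        return False

crop_paths = [p for p in crop_paths if is_valid_crop(p)]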
