fudan-zvg / semantic-segment-anything

Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).

License: Apache License 2.0

Python 100.00%

semantic-segment-anything's Introduction

SSA Icon

Semantic Segment Anything
Jiaqi Chen, Zeyu Yang, and Li Zhang
Zhang Vision Group, Fudan University

SAM is a powerful model for arbitrary object segmentation, and SA-1B is the largest segmentation dataset to date. However, SAM lacks the ability to predict a semantic category for each mask. (I) To address this limitation, we propose a pipeline built on top of SAM that predicts a semantic category for each mask, called Semantic Segment Anything (SSA). (II) Moreover, SSA can serve as an automated dense open-vocabulary annotation engine, called the Semantic Segment Anything labeling engine (SSA-engine), providing rich semantic category annotations for SA-1B or any other dataset. This engine significantly reduces the need for manual annotation and the associated costs.

Web demo and API

  • Try the Web Demo and API here: Replicate

🤔 Why do we need the SSA project?

  • SAM is a highly generalizable object segmentation algorithm that can provide precise masks. SA-1B is the largest image segmentation dataset to date, providing fine mask segmentation annotations. However, neither SAM nor SA-1B provides category predictions or annotations for each mask. This makes it difficult for researchers to use the powerful SAM algorithm to directly solve semantic segmentation tasks or to utilize SA-1B to train their own models.
  • Advanced close-set segmenters like Segformer, Oneformer, open-set segmenters like CLIPSeg, and image caption methods like BLIP can provide rich semantic annotations. However, their mask segmentation predictions may not be as comprehensive and accurate as those generated by SAM, which has highly precise and detailed boundaries.
  • Therefore, by combining the fine image segmentation masks from SAM and SA-1B with the rich semantic annotations provided by these advanced models, we can generate semantic segmentation models with stronger generalization ability, as well as a large-scale densely categorized image segmentation dataset.

๐Ÿ‘ What SSA project can do?

  • SSA: This is the first open framework that utilizes SAM for the semantic segmentation task. It lets users seamlessly integrate their existing semantic segmenters with SAM, without retraining or fine-tuning SAM's weights, to achieve better generalization and more precise mask boundaries.
  • SSA-engine: SSA-engine provides dense open-vocabulary category annotations for the large-scale SA-1B dataset. After manual review and refinement, these annotations can be used to train segmentation models or fine-grained CLIP models.

โœˆ๏ธ SSA: Semantic segment anything

Before the introduction of SAM, most semantic segmentation application scenarios already had their own models. These models can provide rough category classifications for regions, but their predictions are blurry and imprecise at the edges, lacking accurate masks. To address this issue, we propose an open framework called SSA that leverages SAM to enhance the performance of existing models. Specifically, the original semantic segmentation model provides category predictions while the powerful SAM provides masks.

If you have already trained a semantic segmentation model on your dataset, you don't need to retrain a new SAM-based model for more accurate segmentation. Instead, you can continue to use the existing model as the Semantic branch. SAM's strong generalization and image segmentation abilities can improve the performance of the original model. It is worth noting that SSA is suited to scenarios where the mask boundaries predicted by the original segmentor are not highly accurate. If the original model's segmentation is already very accurate, SSA may not provide a significant improvement.

SSA consists of two branches, the Mask branch and the Semantic branch, as well as a voting module that determines the category for each mask.

  • (I) Mask branch (blue). SAM serves as the Mask branch and provides a set of masks with clear boundaries.

  • (II) Semantic branch (purple). This branch provides the category for each pixel. It is implemented by a semantic segmentor that users can customize in terms of the segmentor's architecture and the categories of interest. The segmentor does not need to produce highly detailed boundaries, but it should classify each region as accurately as possible.

  • (III) Semantic Voting module (red). This module crops out the corresponding pixel categories based on the mask's position. The top-1 category among these pixels is taken as the classification result for that mask.
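
For reference, here is a minimal sketch of the voting step, assuming the Semantic branch has already produced a per-pixel class-ID map and the Mask branch has produced boolean masks. The array names and the vote_mask_category helper are illustrative, not the repository's actual API.

import numpy as np

def vote_mask_category(class_id_map, mask):
    """Top-1 (most frequent) class ID among the pixels covered by `mask`.

    class_id_map: (H, W) integer array from the Semantic branch.
    mask:         (H, W) boolean array from the Mask branch (SAM).
    """
    pixel_classes = class_id_map[mask]        # crop the per-pixel categories inside the mask
    counts = np.bincount(pixel_classes)       # histogram over class IDs
    return int(np.argmax(counts))             # the majority class becomes the mask's category

# Usage: one voted category per SAM mask
# mask_categories = [vote_mask_category(class_id_map, m) for m in sam_masks]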

🚄 SSA-engine: Semantic segment anything labeling engine

SSA-engine is an automated annotation engine that provides the initial semantic labeling for the SA-1B dataset, although human review and refinement may be required for more accurate labels. Thanks to the combined architecture of close-set and open-vocabulary segmentation, SSA-engine produces satisfactory labels for most samples and can provide more detailed annotations using an image caption method.

This tool fills the gap in SA-1B's limited fine-grained semantic labeling, while also significantly reducing the need for manual annotation and associated costs. It has the potential to serve as a foundation for training large-scale visual perception models and more fine-grained CLIP models.

The SSA-engine consists of three components:

  • (I) Close-set semantic segmentor (green). Two close-set semantic segmentation models trained on COCO and ADE20K datasets respectively are used to segment the image and obtain rough category information. The predicted categories only include simple and basic categories to ensure that each mask receives a relevant label.
  • (II) Open-vocabulary classifier (blue). An image captioning model is utilized to describe the cropped image patch corresponding to each mask. Nouns or phrases are then extracted as candidate open-vocabulary categories. This process provides more diverse category labels.
  • (III) Final decision module (orange). The SSA-engine uses a Class proposal filter (i.e., CLIP) to select the top-k most reasonable predictions from the mixed class list. Finally, the Open-vocabulary Segmentor predicts the most suitable category within the mask region based on the top-k classes and the image patch.
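
As a rough sketch of components (II) and (III), the snippet below captions a cropped patch with BLIP, extracts noun phrases with spaCy as open-vocabulary candidates, and lets CLIP keep the top-k proposals from the mixed candidate list. The Hugging Face checkpoint names are public releases, and the function only illustrates the idea; it is not the repository's exact implementation (which additionally runs an open-vocabulary segmentor such as CLIPSeg over the top-k classes).

import spacy
import torch
from transformers import (BlipForConditionalGeneration, BlipProcessor,
                          CLIPModel, CLIPProcessor)

nlp = spacy.load("en_core_web_sm")
blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

def top_k_class_proposals(patch, close_set_classes, k=3):
    """patch: a PIL image cropped around one mask; close_set_classes: labels from the close-set segmentors."""
    # (II) caption the cropped patch and extract noun phrases as open-vocabulary candidates
    caption_ids = blip.generate(**blip_proc(images=patch, return_tensors="pt"))
    caption = blip_proc.decode(caption_ids[0], skip_special_tokens=True)
    candidates = sorted({c.text.lower() for c in nlp(caption).noun_chunks} | set(close_set_classes))

    # (III) class proposal filter: CLIP scores every candidate against the patch, keep the top-k
    inputs = clip_proc(text=candidates, images=patch, return_tensors="pt", padding=True)
    with torch.no_grad():
        scores = clip(**inputs).logits_per_image[0]
    return [candidates[i] for i in scores.topk(min(k, len(candidates))).indices.tolist()]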

📖 News

🔥 2023/04/14: SSA benchmarks semantic segmentation on ADE20K and Cityscapes.
🔥 2023/04/10: Semantic Segment Anything (SSA and SSA-engine) is released.
🔥 2023/04/05: SAM and SA-1B are released.

Results

All results were tested on a single NVIDIA A6000 GPU.

1. Inference time

| Dataset | Model | Inference time per image (s) | Inference time per mask (s) |
| --- | --- | --- | --- |
| SA-1B | SSA (Close set) | 1.149 | 0.012 |
| SA-1B | SSA-engine (Open-vocabulary) | 33.333 | 0.334 |

2. Memory usage

SSA (with SAM)

| Dataset | Model | GPU memory (MB) |
| --- | --- | --- |
| ADE20K | SSA | 8798 |
| Cityscapes | SSA | 19012 |

SSA-engine

| Dataset | Model | GPU memory without SAM (MB) | GPU memory with SAM (MB) |
| --- | --- | --- | --- |
| SA-1B | SSA-engine-small | 11914 | 28024 |
| SA-1B | SSA-engine-base | 14466 | 30576 |

3. Close-set semantic segmentation on ADE20K and Cityscapes dataset

For convenience, we used different versions of Segformer from Hugging Face, with varying parameter counts and accuracy levels (B0, B2, and B5), to simulate Semantic branches with less accurate masks. The results show that when the accuracy of the original Semantic branch is NOT very high, SSA can improve mIoU.

ADE20K

| Model | Semantic branch | mIoU of Semantic branch | mIoU of SSA |
| --- | --- | --- | --- |
| SSA | Segformer-B0 | 31.78 | 33.60 |
| SSA | Segformer-B2 | 41.38 | 42.92 |
| SSA | Segformer-B5 | 45.92 | 47.14 |

Cityscapes

| Model | Semantic branch | mIoU of Semantic branch | mIoU of SSA |
| --- | --- | --- | --- |
| SSA | Segformer-B0 | 52.52 | 55.14 |
| SSA | Segformer-B2 | 59.76 | 62.25 |
| SSA | Segformer-B5 | 71.67 | 72.99 |

Note that all Segformer checkpoints and the data pipeline are sourced from the Hugging Face releases by NVIDIA, which show lower mIoU than the checkpoints in the official repository.
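
For reference, below is a minimal sketch of using one of these Hugging Face Segformer checkpoints as the Semantic branch, i.e. producing the per-pixel class-ID map that the voting module consumes. The checkpoint name is NVIDIA's public ADE20K B0 release; the wrapper function is illustrative rather than the repository's code.

import torch
from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor

processor = SegformerImageProcessor.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512").eval()

def semantic_branch(image):
    """Return an (H, W) class-ID map predicted by Segformer for a PIL image."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits                      # (1, num_classes, H/4, W/4)
    logits = torch.nn.functional.interpolate(                # upsample back to the input resolution
        logits, size=image.size[::-1], mode="bilinear", align_corners=False)
    return logits.argmax(dim=1)[0].cpu().numpy()             # per-pixel category IDs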

4. Cross-domain segmentation on Foggy Driving

We also evaluate the performance of SSA on the Foggy Driving dataset, with OneFormer as the Semantic branch. The weights and data pipeline of OneFormer are sourced from Hugging Face.

| Model | Training dataset | Validation dataset | mIoU |
| --- | --- | --- | --- |
| SSA | Cityscapes | Foggy Driving | 55.61 |

Examples

Open-vocabulary prediction on SA-1B

  • Additional examples of open-vocabulary annotations

Close-set semantic segmentation on Cityscapes

Close-set semantic segmentation on ADE20K

Cross-domain segmentation on Foggy Driving

💻 Requirements

  • Python 3.7+
  • CUDA 11.1+

๐Ÿ› ๏ธ Installation

git clone git@github.com:fudan-zvg/Semantic-Segment-Anything.git
cd Semantic-Segment-Anything
conda env create -f environment.yaml
conda activate ssa
python -m spacy download en_core_web_sm
# install segment-anything
cd ..
git clone git@github.com:facebookresearch/segment-anything.git
cd segment-anything; pip install -e .; cd ../Semantic-Segment-Anything

🚀 Quick Start

1. SSA

1.1 Preparation

Download the ADE20K or Cityscapes dataset and unzip it to the data folder.

Folder structure:

├── Semantic-Segment-Anything
├── data
│   ├── ade
│   │   ├── ADEChallengeData2016
│   │   │   ├── images
│   │   │   │   ├── training
│   │   │   │   ├── validation
│   │   │   │   │   ├── ADE_val_00002000.jpg
│   │   │   │   │   ├── ...
│   │   │   │   ├── test
│   │   │   ├── annotations
│   │   │   │   ├── training
│   │   │   │   ├── validation
│   │   │   │   │   ├── ADE_val_00002000.png
│   │   │   │   │   ├── ...
│   ├── cityscapes
│   │   ├── leftImg8bit
│   │   │   ├── train
│   │   │   ├── val
│   │   │   │   ├── frankfurt
│   │   │   │   ├── lindau
│   │   │   │   ├── munster
│   │   │   │   │   ├── munster_000173_000019_leftImg8bit.png
│   │   ├── gtFine
│   │   │   ├── train
│   │   │   ├── val
│   │   │   │   ├── frankfurt
│   │   │   │   ├── lindau
│   │   │   │   ├── munster
│   │   │   │   │   ├── munster_000173_000019_gtFine_labelTrainIds.png
│   │   ├── ...

Download the SAM checkpoint and put it in the ckp folder.

mkdir ckp && cd ckp
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
cd ..

1.2 SSA inference

Run our SSA on ADE20K with 8 GPUs:

python scripts/main_ssa.py --ckpt_path ./ckp/sam_vit_h_4b8939.pth --save_img --world_size 8 --dataset ade20k --data_dir data/ade20k/ADEChallengeData2016/images/validation/ --gt_path data/ade20k/ADEChallengeData2016/annotations/validation/ --out_dir output_ade20k

Run our SSA on Cityscapes with 8 GPUs:

python scripts/main_ssa.py --ckpt_path ./ckp/sam_vit_h_4b8939.pth --save_img --world_size 8 --dataset cityscapes --data_dir data/cityscapes/leftImg8bit/val/ --gt_path data/cityscapes/gtFine/val/ --out_dir output_cityscapes

Run our SSA on Foggy Driving with 8 GPUs:

python scripts/main_ssa.py --data_dir data/Foggy_Driving/leftImg8bit/test/ --ckpt_path ckp/sam_vit_h_4b8939.pth --out_dir output_foggy_driving --save_img --world_size 8 --dataset foggy_driving --eval --gt_path data/Foggy_Driving/gtFine/test/ --model oneformer

1.3 SSA evaluation (after inference)

Get the evaluation results for ADE20K:

python scripts/evaluation.py --gt_path data/ade20k/ADEChallengeData2016/annotations/validation --result_path output_ade20k/ --dataset ade20k

Get the evaluation results for Cityscapes:

python scripts/evaluation.py --gt_path data/cityscapes/gtFine/val/ --result_path output_cityscapes/ --dataset cityscapes

Get the evaluation results for Foggy Driving:

# if you haven't downloaded the Foggy Driving dataset, you can run the following command to download it.
wget -P data https://data.vision.ee.ethz.ch/csakarid/shared/SFSU_synthetic/Downloads/Foggy_Driving.zip && unzip data/Foggy_Driving.zip -d data/

python scripts/evaluation.py --gt_path data/Foggy_Driving/gtFine/test/ --result_path output_foggy_driving/ --dataset foggy_driving

2. SSA-engine

Automatic annotation for your own dataset

Organize your dataset as follows:

├── Semantic-Segment-Anything
├── data
│   ├── <The name of your dataset>
│   │   ├── img_name_1.jpg
│   │   ├── img_name_2.jpg
│   │   ├── ...

Run our SSA-engine-base with 8 GPUs (the GPU memory needed depends on the size of the input images):

python scripts/main_ssa_engine.py --data_dir=data/<The name of your dataset> --out_dir=output --world_size=8 --save_img --sam --ckpt_path=ckp/sam_vit_h_4b8939.pth

If you want to run the SSA-engine-small, you can use the following command (add the --light_mode flag):

python scripts/main_ssa_engine.py --data_dir=data/<The name of your dataset> --out_dir=output --world_size=8 --save_img --sam --ckpt_path=ckp/sam_vit_h_4b8939.pth --light_mode

Automatic annotation for SA-1B

Download the SA-1B dataset and unzip it to the data/sa_1b folder.
Alternatively, use your own dataset.

Folder structure:

├── Semantic-Segment-Anything
├── data
│   ├── sa_1b
│   │   ├── sa_223775.jpg
│   │   ├── sa_223775.json
│   │   ├── ...

Run our SSA-engine-base with 8 GPUs:

python scripts/main_ssa_engine.py --data_dir=data/sa_1b --out_dir=output --world_size=8 --save_img

Run the SSA-engine-small with 8 GPUs (add the --light_mode flag):

python scripts/main_ssa_engine.py --data_dir=data/sa_1b --out_dir=output --world_size=8 --save_img --light_mode

For each mask, we add two new fields (e.g. 'class_name': 'face' and 'class_proposals': ['face', 'person', 'sun glasses']). The class name is the most likely category for the mask, and the class proposals are the top-k most likely categories from the Class proposal filter. k is set to 3 by default.

{
    'bbox': [81, 21, 434, 666],
    'area': 128047,
    'segmentation': {
        'size': [1500, 2250],
        'counts': 'kYg38l[18oeN8mY14aeN5\\Z1>'
    }, 
    'predicted_iou': 0.9704002737998962,
    'point_coords': [[474.71875, 597.3125]],
    'crop_box': [0, 0, 1381, 1006],
    'id': 1229599471,
    'stability_score': 0.9598413705825806,
    'class_name': 'face',
    'class_proposals': ['face', 'person', 'sun glasses']
}
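
For reference, a small sketch of consuming these output files. It assumes one JSON file per image holding a list of such mask records (the file name and the 'building' filter below are only examples), and it uses pycocotools to decode the compressed RLE in the 'segmentation' field.

import json
from pycocotools import mask as mask_utils

with open("output/sa_223775_semantic.json") as f:          # hypothetical output file name
    records = json.load(f)
if isinstance(records, dict):                              # some outputs wrap the list in an 'annotations' key
    records = records.get("annotations", records)

building_masks = []
for ann in records:
    if ann.get("class_name") == "building":
        rle = dict(ann["segmentation"])
        if isinstance(rle["counts"], str):
            rle["counts"] = rle["counts"].encode("utf-8")  # pycocotools expects bytes for compressed RLE
        building_masks.append(mask_utils.decode(rle).astype(bool))  # (H, W) boolean mask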

📈 Future work

We hope that researchers in the community will come up with new improvements and ideas building on SSA. Some of our ideas are as follows:

  • (I) The masks in SA-1B often come at three levels: whole, part, and subpart, and SSA-engine often cannot provide accurate descriptions for very small part or subpart regions; instead, it falls back to broad categories. For example, SSA-engine may predict "person" for body parts like the neck or a hand. Therefore, an architecture for more detailed semantic prediction is needed.
  • (II) SSA and SSA-engine are ensembles of multiple models, which makes their inference slower than that of end-to-end models. We look forward to more efficient designs in the future.
  • (III) For semantic segmentation models with poor boundary segmentation, SSA can utilize SAM and the semantic voting mechanism to provide more accurate masks. However, for models that already have excellent segmentation performance, SSA cannot bring about a significant improvement. On the other hand, if the original segmentation model is too poor and misses many semantic categories, SSA cannot help it recall those categories either. Exploring better ways to utilize SAM is worth further investigation.

😄 Acknowledgement

📜 Citation

If you find this work useful for your research, please cite our GitHub repo:

@misc{chen2023semantic,
    title = {Semantic Segment Anything},
    author = {Chen, Jiaqi and Yang, Zeyu and Zhang, Li},
    howpublished = {\url{https://github.com/fudan-zvg/Semantic-Segment-Anything}},
    year = {2023}
}

semantic-segment-anything's People

Contributors

avivsham, chenxwh, jiaqi-chen-00, lzrobots


semantic-segment-anything's Issues

ๆ˜พๅญ˜ๅ ็”จ

May I ask how much GPU memory your card had during inference? I ran out of memory when running inference on a 4090 with 24 GB. At minimum, how many GPUs, and with how much memory each, are needed?

About the visualization.

Thanks for this great work and the open-sourced repo. I want to know how to visualize the results after inference, like the examples shown at the end of the repo.

Can't find model 'en_core_web_sm'

python scripts/main.py --data=image_20230408/ --out_dir output --world_size=1 --save_img
Traceback (most recent call last):
File "scripts/main.py", line 4, in
from pipeline import semantic_annotation_pipeline
File "/data2/queenie_2023/Semantic-Segment-Anything/scripts/pipeline.py", line 13, in
from blip import open_vocabulary_classification_blip
File "/data2/queenie_2023/Semantic-Segment-Anything/scripts/blip.py", line 3, in
from utils import get_noun_phrases
File "/data2/queenie_2023/Semantic-Segment-Anything/scripts/utils.py", line 2, in
nlp = spacy.load('en_core_web_sm')
File "/data2/queenie/anaconda3/envs/mmTrans/lib/python3.7/site-packages/spacy/init.py", line 60, in load
config=config,
File "/data2/queenie/anaconda3/envs/mmTrans/lib/python3.7/site-packages/spacy/util.py", line 449, in load_model
raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.

Get each segment from the output of semantic segmentation

Thank you for sharing a wonderful code!

I want to get the segments of a specific class from the result.

Segmentation results have multiple classes, but I only want to see the segmentation results for a specific class. However, the resulting data (json) does not provide information on the coordinates of the segments or the index for each class, making it difficult to access the desired information.

I tried extracting the unique colors of the segments, but the number of unique colors and the number of segments are different, so it failed.

For example, I just want the masks whose pixels have class_name 'building'.

What can be done?
Thanks

Mask/Label quality

It seems there are too many masks and labels for some simple images; is it possible to use dense CRF to improve the mask/label quality?

ๆŽจ็†่€—ๆ—ถ้žๅธธๆ…ข

I use main_ssa.py for single-image inference on a single A100 GPU with 40 GB of memory, and I find the Semantic Voting module takes a long time, almost 1 s per mask, which differs from the numbers in the repo. Is this normal?

่ฟ่กŒไธ€ไผšๅŽๆŠฅ้”™

  File "/media/admin1/envs/anaconda3/envs/leng_lip/lib/python3.7/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 1458, in forward
    conditional_pixel_values=conditional_pixel_values,
  File "/media/admin1/envs/anaconda3/envs/leng_lip/lib/python3.7/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 1360, in get_conditional_embeddings
    raise ValueError("Make sure to pass as many prompt texts as there are query images")
ValueError: Make sure to pass as many prompt texts as there are query images

่ƒฝๆˆๅŠŸ100ๅคšๅผ ๅ›พ็‰‡๏ผŒ็„ถๅŽๅฐฑไผšๅ‡บ็Žฐ่ฟ™ๆ ท็š„ๆŠฅ้”™ๅœๆญขใ€‚
ไฝฟ็”จๅ‘ฝไปคๅฆ‚ไธ‹

python scripts/main_ssa_engine.py --data_dir=data/UCM_Captions --out_dir=output --world_size=4 --save_img --sam --ckpt_path=../../mydata/sam_vit_h_4b8939.pth --light_mode

pycocotools ImportError undefined symbol: __intel_sse2_strchr

Thank you very much for your contribution.

I have installed all the requirements and tried to run it on my own data using the following command in miniconda3 environment on a Linux HPC:
python scripts/main_ssa_engine.py --data_dir=data/ --out_dir=output --world_size=8 --save_img --sam --ckpt_path=ckp/sam_vit_h_4b8939.pth

However I keep getting the following error at import pycocotools.mask as maskUtils
import pycocotools._mask as _mask
ImportError: ssa/lib/python3.8/site-packages/pycocotools/_mask.cpython-38-x86_64-linux-gnu.so: undefined symbol: __intel_sse2_strchr

Can I ask if this is a dependency issue and is there any way to solve it?

Thank you in advance.

messy category generation data

For the messy generated category names, such as "the word '50' in white letters", "three blue plastic rabbits", "three blue plastic snowflakes", "some very pretty blue and black items", and "1 ultra blue", are there any other methods to further clean them?

fine tune the model

Hello,
Is fine-tuning on a custom dataset with custom labels possible? Thank you

ModuleNotFoundError: No module named 'cog'

Hi, thanks for your work.
I just built the environment as you suggested, as follows:

git clone git@github.com:fudan-zvg/Semantic-Segment-Anything.git
cd Semantic-Segment-Anything
conda env create -f environment.yaml
conda activate ssa
python -m spacy download en_core_web_sm
# install segment-anything
cd ..
git clone git@github.com:facebookresearch/segment-anything.git
cd segment-anything; pip install -e .; cd ../Semantic-Segment-Anything

Then I try to run python predict.py and it shows the following error:

Traceback (most recent call last):
  File "predict.py", line 19, in <module>
    from cog import BasePredictor, Input, Path, BaseModel
ModuleNotFoundError: No module named 'cog'

Can you give me some suggestions on how to solve this?

Simple Python code available?

Hello!
I love this project and the impressive results! But I would like to use it simply, with a few lines of Python code. Is there anything available? I tried to filter out the relevant parts of your project scripts but failed.
Simply giving the path to a single image and in return receiving an array or a list with labels, coordinates, etc. Is that possible?

Best regards
Marc

Apply SSA on a single Image

Hi, I was wondering whether it is possible to take a single image as input, apply the Segment Anything Model from Meta, and then use this tool to get the actual labels for the predicted masks.

Don't get any output after inference

Hi All,
Thank you for your amazing work and repo!
I'm trying to run inference with the open-vocabulary model on some random images.
I followed the installation instructions and completed them without any errors, then I tried to run inference as explained in the README file (see the attached photo). The inference completed without errors, just warnings, but the output path provided when calling main.py was empty.
What am I doing wrong?
image

Cheers,

Can I use my own dataset

Hello, thank you for your work !

I want to use SSA on custom categories and datasets. I saw you mentioned that users can customize the segmentor's architecture and the categories of interest. But can I use my own datasets?
I've made some attempts, but I don't understand what "semantic_branch_processor" is. I tried to use one of them directly, but an error is reported: ValueError: You have to specify the task_input. Found None. I guess it has something to do with the fact that I didn't set "semantic_branch_processor" correctly.
I want to know how to set up "semantic_branch_processor" if I want to use my own dataset.

Thank you very much for any reply.

Appendix
command:
python scripts/main_ssa.py --ckpt_path ./ckp/sam_vit_h_4b8939.pth --save_img --world_size 1 --dataset VOC2012 --data_dir /media/guo/DATA/chen/lraspp/data/VOCdevkit/VOC2012/JPEGImages --gt_path /media/guo/DATA/chen/lraspp/data/VOCdevkit/VOC2012/Annotations --out_dir output_VOC2012
Complete error report:
Traceback (most recent call last):
File "scripts/main_ssa_try.py", line 269, in
main(0, args)
File "scripts/main_ssa_try.py", line 248, in main
semantic_segment_anything_inference(file_name, args.out_dir, rank, img=img, save_img=args.save_img,
File "/media/guo/DATA/chen/SSA/Semantic-Segment-Anything/scripts/pipeline.py", line 168, in semantic_segment_anything_inference
class_ids = segformer_func(img, semantic_branch_processor, semantic_branch_model, rank)
File "/media/guo/DATA/chen/SSA/Semantic-Segment-Anything/scripts/segformer.py", line 5, in segformer_segmentation
inputs = processor(images=image, return_tensors="pt").to(rank)
File "/home/guo/anaconda3/envs/ssa/lib/python3.8/site-packages/transformers/models/oneformer/processing_oneformer.py", line 112, in call
raise ValueError("You have to specify the task_input. Found None.")
ValueError: You have to specify the task_input. Found None.

torch and transformers version mismatch error

Hi, when I tried to run SSA inference, I met the following error:

Traceback (most recent call last):
File "scripts/main_ssa.py", line 122, in
main(0, args)
File "scripts/main_ssa.py", line 66, in main
from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation
File "", line 1039, in _handle_fromlist
File "/home/rxu37/.local/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1117, in getattr
value = getattr(module, name)
File "/home/rxu37/.local/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1116, in getattr
module = self._get_module(self._class_to_module[name])
File "/home/rxu37/.local/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1128, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.models.segformer.modeling_segformer because of the following error (look up to see its traceback):
No module named 'torch.distributed.algorithms.join'

I installed my environment using the exact provided environment.yaml, so my torch version is 1.9.1+cu111 and my transformers version is 4.27.1. I looked up the PyTorch docs and found that only PyTorch 2.0 has the module torch.distributed.algorithms.join. Thus I upgraded PyTorch to 2.0.1+cu118, but now I get this error:

Traceback (most recent call last):
File "scripts/main_ssa.py", line 6, in
from pipeline import semantic_segment_anything_inference, eval_pipeline, img_load
File "/mnt/d/GitHub/Semantic-Segment-Anything/scripts/pipeline.py", line 8, in
from mmdet.core.visualization.image import imshow_det_bboxes
File "/home/rxu37/anaconda3/envs/ssa/lib/python3.8/site-packages/mmdet/core/init.py", line 3, in
from .bbox import * # noqa: F401, F403
File "/home/rxu37/anaconda3/envs/ssa/lib/python3.8/site-packages/mmdet/core/bbox/init.py", line 8, in
from .samplers import (BaseSampler, CombinedSampler,
File "/home/rxu37/anaconda3/envs/ssa/lib/python3.8/site-packages/mmdet/core/bbox/samplers/init.py", line 12, in
from .score_hlr_sampler import ScoreHLRSampler
File "/home/rxu37/anaconda3/envs/ssa/lib/python3.8/site-packages/mmdet/core/bbox/samplers/score_hlr_sampler.py", line 3, in
from mmcv.ops import nms_match
File "/home/rxu37/anaconda3/envs/ssa/lib/python3.8/site-packages/mmcv/ops/init.py", line 2, in
from .assign_score_withk import assign_score_withk
File "/home/rxu37/anaconda3/envs/ssa/lib/python3.8/site-packages/mmcv/ops/assign_score_withk.py", line 5, in
ext_module = ext_loader.load_ext(
File "/home/rxu37/anaconda3/envs/ssa/lib/python3.8/site-packages/mmcv/utils/ext_loader.py", line 13, in load_ext
ext = importlib.import_module('mmcv.' + name)
File "/home/rxu37/anaconda3/envs/ssa/lib/python3.8/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory

I have tried to rebuild the mmcv module after I upgrade torch, but it does not seem to work. Would greatly appreciate any pointer/help on this issue. Thank you!

Release model weights

Hi

Really appreciate this work. Do you plan on releasing model weights/checkpoints for SSA? Would really appreciate it.

Thanks

Is it possible to define our own labels?

Thanks for sharing this great work!

Can we define our own labels or apply transfer learning to this project? I did not figure out how to run this project if our dataset is not the SA-1B dataset.
E.g. I want to apply it to an indoor scene where the labels are mainly furniture. Would simply adding config files under configs work?

Appreciate your help!

Inference demo

Great work. I'm wondering why not just provide a simple inference demo for a single image?

missing stable_two_stage_multi_segmenter_clip_seg.py file in scripts

I cloned the repo and ran this command:
python scripts/stable_two_stage_multi_segmenter_clip_seg.py --data_dir=data/examples --out_dir=output --world_size=8 --save_img
I got:
python3: can't open file '/content/Semantic-Segment-Anything/scripts/stable_two_stage_multi_segmenter_clip_seg.py': [Errno 2] No such file or directory

cannot import print_log from mmcv.utils on Google Colab

Traceback (most recent call last):
File "/content/Semantic-Segment-Anything/scripts/main_ssa_engine.py", line 5, in
from pipeline import semantic_annotation_pipeline
File "/content/Semantic-Segment-Anything/scripts/pipeline.py", line 8, in
from mmcv.utils import print_log
ImportError: cannot import name 'print_log' from 'mmcv.utils' (/usr/local/lib/python3.10/dist-packages/mmcv/utils/init.py)

Is it possible to be used in Google Colab?

Hi! I am new to CV and want to use this model. I am wondering if it is possible to run this in Google Colab. Specifically, assume that I have a bunch of images in a folder "images"; how could I import the library and use it? Thank you and I look forward to your reply :)

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM, except for a change to the image encoder; therefore, it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

image

image

Repository license needed

This work looks quite valuable! I would like to explore SSA, but I noticed there is no license.

Would you please add a license that helps clarify the terms under which this work can or cannot be used, edited, extended, and/or redistributed?

About env

Great job!!!
How about creating an environment without Conda, for example using Python from virtualenv? What libraries are needed?

Apple Silicon Support

The Requirements section lists CUDA 11.1+ as a requirement. Is it possible to run Semantic-Segment-Anything on Apple Silicon hardware (M1, M2)?

Thank you

I must say bravo and thank you for doing exactly what I would like to start doing now.

Change Semantic Branch

Can I switch the Semantic branch from Segformer to other semantic segmentation models?

Human class

Can we get a mask showing only the human body, not all the other elements in the image/video?

Bug in `scripts/pipeline.py`

In scripts/pipeline.py, there is some code like this:

patch_small = mmcv.imcrop(img, np.array(
    [ann['bbox'][0], ann['bbox'][1], ann['bbox'][0] + ann['bbox'][2], ann['bbox'][1] + ann['bbox'][3]]),
    scale=scale_small)
patch_large = mmcv.imcrop(img, np.array(
    [ann['bbox'][0], ann['bbox'][1], ann['bbox'][0] + ann['bbox'][2], ann['bbox'][1] + ann['bbox'][3]]),
    scale=scale_large)
patch_huge = mmcv.imcrop(img, np.array(
    [ann['bbox'][0], ann['bbox'][1], ann['bbox'][0] + ann['bbox'][2], ann['bbox'][1] + ann['bbox'][3]]),
    scale=scale_large)

Shouldn't patch_huge use scale=scale_huge?

TypeError: list indices must be integers or slices, not str

Thank you very much for creating this project & publishing it so soon after the segment-anything release!

I got this error when running SSA inference on a directory of images:

python scripts/main_ssa_engine.py --data_dir=data/examples --out_dir=data/output --save_img --world_size 1

Before running this command, I first ran segment-anything on my directory of images, in "coco_rle" output mode to generate the JSONs:

python scripts/amg.py --checkpoint models/sam_vit_h_4b8939.pth --model-type default --input ../Semantic-Segment-Anything/data/examples --output ../Semantic-Segment-Anything/data/examples/ --convert-to-rle

main_ssa_engine.py threw this error:

torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/laurens/anaconda3/envs/ssa/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/laurens/git/OSS/Semantic-Segment-Anything/scripts/main_ssa_engine.py", line 46, in main
    semantic_annotation_pipeline(file_name, args.data_dir, args.out_dir, rank, save_img=args.save_img,
  File "/home/laurens/git/OSS/Semantic-Segment-Anything/scripts/pipeline.py", line 69, in semantic_annotation_pipeline
    for ann in anns['annotations']:
TypeError: list indices must be integers or slices, not str

I figured this error is because of this part: https://github.com/fudan-zvg/Semantic-Segment-Anything/blob/main/scripts/pipeline.py#L61-L69

If the pipeline has a mask_generator, the segmentations are put in a dict with an "annotations" key - not sure why.

In my case, I'm not using the mask_generator since the JSONs are already generated, so when I remove the "annotations" key lookup in the for loop on line 69, it works.

ๅฏไปฅ่‡ช่กŒ่ฐƒๆ•ดๆ ‡็ญพไนˆ

ๅ›พ็‰‡ๅค„็†ๅŽๅ‘็Žฐๅพˆๅคšmaskๅ‡บ็Žฐ้‡ๅ ็Žฐ่ฑก๏ผŒ้ข็งฏๅŠ ๅ’Œ่ถ…่ถŠไบ†ๅŽŸๆœฌๅ›พ็‰‡็š„ๅฐบๅฏธๅคงๅฐ๏ผŒ่€Œไธ”ๆˆ‘ไนŸไธ้œ€่ฆๅˆ†็ฑปๆ ‡็ญพ่ฟ™ไนˆไธฐๅฏŒ๏ผŒๆ‰€ไปฅ่ฏท้—ฎๅฏไปฅๆŒ‰็…ง่‡ชๅทฑ็š„ๆ„ๆ„ฟๆƒณๆณ•ไฟฎๆ”น่ƒฝ่ขซๆ ‡ๆณจ็š„ๆ ‡็ญพ็ง็ฑปๅ—

Run without GPU

Hi,
thanks for the amazing code! Is there any chance to get it running without a CUDA-capable GPU? Maybe on the CPU (as for SAM)?
Thanks!

out of memory๏ผ๏ผ๏ผ

cuda:11.1 torch:1.10 single A6000
Command: python scripts/main_ssa_engine.py --data_dir=/mnt/usb/gxy/dataset_path/images --out_dir=output --world_size=1 --save_img --sam --checkpoint-path=checkpoint-path/sam_vit_h_4b8939.pth
Error: RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 47.54 GiB total capacity; 537.07 MiB already allocated; 12.06 MiB free; 580.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
help!!!help!!!

OneFormerProcessor

When loading the downloaded model, OneFormer errors out with: huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name':

oneformer_ade20k_processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_ade20k_swin_large")
oneformer_ade20k_model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_ade20k_swin_large").to(rank)

evaluation results available?

It looks like, with SSA, it becomes possible to compare the performance of SAM against SOTA on popular benchmark datasets. Would you report the validation results of SSA for semantic segmentation or instance segmentation on ADE20K or COCO?
