
Segment Anything in High Quality [NeurIPS 2023]

Home Page: https://arxiv.org/abs/2306.01567

License: Apache License 2.0

Python 90.78% C++ 0.92% Cuda 8.21% Shell 0.09%
sam segmentation segment-anything zero-shot-segmentation high-quality segment-anything-model

sam-hq's Introduction

Segment Anything in High Quality


Segment Anything in High Quality
NeurIPS 2023
ETH Zurich & HKUST

We propose HQ-SAM to upgrade SAM for high-quality zero-shot segmentation. Refer to our paper for more details.

Updates

🔥🔥 SAM in 3D: Interested in combining SAM and 3D Gaussian Splatting? See our new work Gaussian Grouping!

2023/12/15: HQ-SAM is adopted in Osprey to provide fine-grained mask annotation, and also in the CaR method.

2023/11/06: HQ-SAM is adopted to annotate the Grounding-anything Dataset proposed by GLaMM.

2023/10/15: HQ-SAM is supported in the OpenMMLab PlayGround for annotation with Label-Studio.

2023/09/28: HQ-SAM is used in ENIGMA-51 for annotating egocentric industrial data, with a comparison to SAM in their paper.

2023/08/16: HQ-SAM is in segment-geospatial for segmenting geospatial data, and in the mask annotation tool ISAT!

2023/08/11: Added a Python package for easier pip installation.

2023/07/25: Light HQ-SAM is in the EfficientSAM series, combined with Grounded SAM!

2023/07/21: HQ-SAM is also in OpenXLab apps, thanks to their support!

🚀🚀 2023/07/17: We released Light HQ-SAM, using TinyViT as its backbone, for fast and high-quality zero-shot segmentation at 41.2 FPS. Refer to Light HQ-SAM vs. MobileSAM for more details.

πŸ†πŸ₯‡ 2023/07/14: Grounded HQ-SAM obtains the first placeπŸ₯‡ in the Segmentation in the Wild competition on zero-shot track (hosted in CVPR 2023 workshop), outperforming Grounded SAM. Refer to our SGinW evaluation for more details.

2023/07/05: We released the SAM tuning instructions and the HQSeg-44K data.

2023/07/04: HQ-SAM is adopted in SAM-PT to improve the SAM-based zero-shot video segmentation performance. Also, HQ-SAM is used in Grounded-SAM, Inpaint Anything and HQTrack (2nd in VOTS 2023).

2023/06/28: We released the ONNX export script and a colab notebook for exporting and using the ONNX model.

2023/06/23: Play with the HQ-SAM demo on Hugging Face, which supports point, box and text prompts.

2023/06/14: We released the colab demo and the automatic mask generator notebook.

2023/06/13: We released the model checkpoints and demo visualization codes.

Visual comparison between SAM and HQ-SAM

SAM vs. HQ-SAM

image

Introduction

The recent Segment Anything Model (SAM) represents a big leap in scaling up segmentation models, allowing for powerful zero-shot capabilities and flexible prompting. Despite being trained with 1.1 billion masks, SAM's mask prediction quality falls short in many cases, particularly when dealing with objects that have intricate structures. We propose HQ-SAM, equipping SAM with the ability to accurately segment any object, while maintaining SAM's original promptable design, efficiency, and zero-shot generalizability. Our careful design reuses and preserves the pre-trained model weights of SAM, while only introducing minimal additional parameters and computation. We design a learnable High-Quality Output Token, which is injected into SAM's mask decoder and is responsible for predicting the high-quality mask. Instead of only applying it on mask-decoder features, we first fuse them with early and final ViT features for improved mask details. To train the introduced learnable parameters, we compose a dataset of 44K fine-grained masks from several sources. HQ-SAM is only trained on this introduced dataset of 44K masks, which takes only 4 hours on 8 GPUs. We show the efficacy of HQ-SAM on a suite of 9 diverse segmentation datasets across different downstream tasks, 7 of which are evaluated with a zero-shot transfer protocol.

image

Quantitative comparison between SAM and HQ-SAM

Note: For box-prompting-based evaluation, we feed SAM, MobileSAM and our HQ-SAM with the same image/video bounding boxes and adopt the single mask output mode of SAM.

We provide a comprehensive comparison of performance, model size and speed across SAM variants: image

Various ViT backbones on COCO:

Note: For the COCO dataset, we use the SOTA detector FocalNet-DINO trained on the COCO dataset as our box prompt generator.

YTVIS and HQ-YTVIS

Note: Using the ViT-L backbone. We adopt the SOTA detector Mask2Former trained on the YouTubeVIS 2019 dataset as our video box prompt generator, while reusing its object association prediction.

DAVIS

Note: Using the ViT-L backbone. We adopt the SOTA model XMem as our video box prompt generator, while reusing its object association prediction.

Quick Installation via pip

pip install segment-anything-hq

from segment_anything_hq import sam_model_registry
model_type = "<model_type>"  # one of "vit_l", "vit_b", "vit_h", "vit_tiny"
sam_checkpoint = "<path/to/checkpoint>"
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)

See a specific usage example (such as vit_l) by running the following commands:

export PYTHONPATH=$(pwd)
python demo/demo_hqsam_pip_example.py
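
For a quick end-to-end check of the pip package, here is a minimal sketch using a box prompt; the checkpoint path, image file and box coordinates are illustrative placeholders, so adjust them to your setup:

import cv2
import numpy as np
from segment_anything_hq import SamPredictor, sam_model_registry

# illustrative paths; replace with your own checkpoint and image
sam = sam_model_registry["vit_l"](checkpoint="pretrained_checkpoint/sam_hq_vit_l.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

box = np.array([50, 50, 400, 300])  # XYXY box prompt in pixel coordinates
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
print(masks.shape, scores)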

Standard Installation

The code requires python>=3.8, as well as pytorch>=1.7 and torchvision>=0.8. Please follow the instructions here to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.

Clone the repository locally and install with

git clone https://github.com/SysCV/sam-hq.git
cd sam-hq; pip install -e .

The following optional dependencies are necessary for mask post-processing, saving masks in COCO format, the example notebooks, and exporting the model in ONNX format. jupyter is also required to run the example notebooks.

pip install opencv-python pycocotools matplotlib onnxruntime onnx timm

Example conda environment setup

conda create --name sam_hq python=3.8 -y
conda activate sam_hq
conda install pytorch==1.10.0 torchvision==0.11.0 cudatoolkit=11.1 -c pytorch -c nvidia
pip install opencv-python pycocotools matplotlib onnxruntime onnx timm

# under your working directory
git clone https://github.com/SysCV/sam-hq.git
cd sam-hq
pip install -e .
export PYTHONPATH=$(pwd)
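
Since the demo scripts load CUDA checkpoints by default, it is worth confirming that this environment actually sees a GPU before running them; a minimal check:

import torch

print("torch:", torch.__version__, "built for CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())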

Model Checkpoints

Three versions of the HQ-SAM model are available with different backbone sizes. These models can be instantiated by running

from segment_anything import sam_model_registry
sam = sam_model_registry["<model_type>"](checkpoint="<path/to/checkpoint>")

Download the provided trained models below and put them into the pretrained_checkpoint folder:

mkdir pretrained_checkpoint

Click the links below to download the checkpoint for the corresponding model type. We also provide alternative model download links here or on Hugging Face.
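
For example, a minimal sketch of instantiating the ViT-L variant from the pretrained_checkpoint folder and moving it to the available device (the filename follows the checkpoint naming used elsewhere in this repo; adjust it if yours differs):

import torch
from segment_anything import sam_model_registry

device = "cuda" if torch.cuda.is_available() else "cpu"
sam = sam_model_registry["vit_l"](checkpoint="pretrained_checkpoint/sam_hq_vit_l.pth")
sam.to(device)
sam.eval()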

Getting Started

First download a model checkpoint. Then the model can be used in just a few lines to get masks from a given prompt:

from segment_anything import SamPredictor, sam_model_registry
sam = sam_model_registry["<model_type>"](checkpoint="<path/to/checkpoint>")
predictor = SamPredictor(sam)
predictor.set_image(<your_image>)
masks, _, _ = predictor.predict(<input_prompts>)
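
To make the placeholders concrete, the sketch below runs a single foreground point prompt; the image path and click location are illustrative, and the call mirrors the original SAM usage:

import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_l"](checkpoint="pretrained_checkpoint/sam_hq_vit_l.pth")
predictor = SamPredictor(sam)
predictor.set_image(cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB))

# one foreground click at (x=500, y=375); label 1 = foreground, 0 = background
input_point = np.array([[500, 375]])
input_label = np.array([1])
masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True,
)
print(masks.shape)  # (number_of_masks, H, W)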

Additionally, see the usage examples in our demo, colab notebook and automatic mask generator notebook.

To obtain HQ-SAM's visual result:

python demo/demo_hqsam.py

To obtain baseline SAM's visual result (note that you need to download the original SAM checkpoint from the baseline-SAM-L model link and put it into the pretrained_checkpoint folder):

python demo/demo_sam.py

To obtain Light HQ-SAM's visual result:

python demo/demo_hqsam_light.py

HQ-SAM Tuning and HQSeg-44K Data

We provide detailed training, evaluation, visualization and data downloading instructions in HQ-SAM training. You can also replace our training data with your own to obtain a domain-specific SAM (e.g. for medical imaging, OCR or remote sensing).

Please change the current folder path to:

cd train

and then refer to the detailed readme instructions.

Grounded HQ-SAM vs Grounded SAM on SegInW

Grounded HQ-SAM wins first place 🥇 on the SegInW benchmark (consisting of 25 public zero-shot in-the-wild segmentation datasets), outperforming Grounded SAM using the same Grounding-DINO detector.

| Model Name | Encoder | GroundingDINO | Mean AP | Evaluation Script | Log | Output Json |
| --- | --- | --- | --- | --- | --- | --- |
| Grounded SAM | vit-h | swin-b | 48.7 | script | log | result |
| Grounded HQ-SAM | vit-h | swin-b | 49.6 | script | log | result |

Please change the current folder path to:

cd seginw

We provide detailed evaluation instructions and metrics on SegInW in Grounded-HQ-SAM evaluation.

Light HQ-SAM vs MobileSAM on COCO

We propose Light HQ-SAM based on the TinyViT image encoder provided by MobileSAM. We provide a quantitative comparison of zero-shot COCO performance, speed and memory below. Try Light HQ-SAM here.

| Model | Encoder | AP | AP@L | AP@M | AP@S | Model Params (MB) | FPS | Memory (GB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MobileSAM | TinyViT | 44.3 | 61.8 | 48.1 | 28.8 | 38.6 | 44.8 | 3.7 |
| Light HQ-SAM | TinyViT | 45.0 | 62.8 | 48.8 | 29.2 | 40.3 | 41.2 | 3.7 |

Note: For the COCO dataset, we use the same SOTA detector FocalNet-DINO trained on the COCO dataset as the box prompt generator for both our model and MobileSAM.

ONNX export

HQ-SAM's lightweight mask decoder can be exported to ONNX format so that it can be run in any environment that supports ONNX runtime. Export the model with

python scripts/export_onnx_model.py --checkpoint <path/to/checkpoint> --model-type <model_type> --output <path/to/output>

See the example notebook for details on how to combine image preprocessing via HQ-SAM's backbone with mask prediction using the ONNX model. It is recommended to use the latest stable version of PyTorch for ONNX export.
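
If you only want to verify the exported file, a minimal check with onnxruntime is to open a session and print its input/output signatures rather than hard-coding them, since they can vary with export options:

import onnxruntime as ort

session = ort.InferenceSession("<path/to/output>", providers=["CPUExecutionProvider"])
print("inputs: ", [(i.name, i.shape) for i in session.get_inputs()])
print("outputs:", [(o.name, o.shape) for o in session.get_outputs()])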

Citation

If you find HQ-SAM useful in your research or refer to the provided baseline results, please star ⭐ this repository and consider citing 📝:

@inproceedings{sam_hq,
    title={Segment Anything in High Quality},
    author={Ke, Lei and Ye, Mingqiao and Danelljan, Martin and Liu, Yifan and Tai, Yu-Wing and Tang, Chi-Keung and Yu, Fisher},
    booktitle={NeurIPS},
    year={2023}
}  

Related high-quality instance segmentation work:

@inproceedings{transfiner,
    title={Mask Transfiner for High-Quality Instance Segmentation},
    author={Ke, Lei and Danelljan, Martin and Li, Xia and Tai, Yu-Wing and Tang, Chi-Keung and Yu, Fisher},
    booktitle={CVPR},
    year={2022}
}

Acknowledgments

sam-hq's People

Contributors

eddogola, giswqs, lkeab, masterbin-iiau, ymq2017


sam-hq's Issues

Predictions of baseline MobileSAM inconsistent with official MobileSAM

When running the exact same input image and box prompt through the baseline MobileSAM (available in this repository) and the official MobileSAM model, there are differences in the predicted masks. I have tried to locate the particular differences in the code causing this inconsistent behavior, but without luck so far. I believe this is an urgent issue that should be investigated.

Baseline MobileSAM (SAM-HQ repository) vs Official MobileSAM (MobileSam repository):
BaselineMobileSAMInconsistency_HQSAMvsMobileSAM_comp

Code release

Hello,
Thank you for this excellent job.
Do you have any plans to release the code and weights?

About HQSEG-44K

Thanks for the excellent job! I am planning to do some fine-tuning on the SAM model with HQSEG-44K dataset. However, when I prepared the dataset, I found several issues:

  1. Some images in FSS are never labeled, like bamboo_slip/7.png
  2. Some images in MSRA-10K are mislabeled, like 97915.jpg
  3. The DUT-OMRON dataset only contains 5168 images, while actually the DUTS dataset contains 15572 images (including training and val splits). I cannot figure out which dataset is truly used in the HQSEG-44K.

Can you tell us more dataset preparation details? Thanks!

A very good SAM fine-tuning job

This is exciting work. I read your article, and I would like to ask whether your team has tested it on remote sensing imagery, which is a relatively complex task.

multimask_output with SAM-HQ

Hi everyone!

I was comparing the implementation of the original mask_decoder.py and the improved mask_decoder_hq.py, and I have found a difference when using the multimask_output flag.

In the original implementation (mask_decoder.py), when using multimask_output the output was a mask of size (1, 3, 256, 256) and iou_pred of size (1, 3).
However, with the new implementation (mask_decoder_hq.py), when using multimask_output the output is a mask of size (1, 1, 256, 256) and iou_pred of size (1, 1). Furthermore, this mask is the one associated with the maximum iou_pred (that is, the mask with the maximum "detection confidence").

In summary:

  • mask_decoder.py and multimask_output==True
    • masks: Tensor (1, 3, 256, 256)
    • iou_pred = Tensor (1, 3)
  • mask_decoder_hq.py and multimask_output==True
    • masks_sam: Tensor (1, 1, 256, 256)
    • iou_pred = Tensor (1, 1)

Is there any explanation for this implementation? When multimask_output==True, SAM used to return 3 masks but SAM-HQ now just returns 1.

In the new implementation, masks_sam is then added to masks_hq to obtain the final masks (masks = masks_sam + masks_hq). Couldn't this be done over the 3 obtained masks instead of selecting just the one that maximizes iou_pred?

Thanks in advance and great work @lkeab!

About Semantic Segmentation

Hello, I want to ask how to achieve the semantic segmentation effect shown in the article's images, such as identifying a person, a cow, or a car?

Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False

Traceback (most recent call last):
File "E:\sam-hq\demo\demo_hqsam.py", line 63, in
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\sam-hq\segment_anything\build_sam.py", line 28, in build_sam_vit_l
return _build_sam(
^^^^^^^^^^^
File "E:\sam-hq\segment_anything\build_sam.py", line 106, in _build_sam
state_dict = torch.load(f)
^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 809, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 1172, in _load
result = unpickler.load()
^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 1142, in persistent_load
typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 1116, in load_tensor
wrap_storage=restore_location(storage, location),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 217, in default_restore_location
result = fn(storage, location)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 182, in _cuda_deserialize
device = validate_cuda_device(location)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 166, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

Evaluation script to reproduce numbers in SAM-HQ paper

Great work! I am looking for a script that allows me to reproduce the IoU and boundary IoU numbers. I looked into the train folder and there is an evaluation example shown. However, it uses the checkpoint sam_vit_l_0b3195.pth.

The predicted masks from this checkpoint are of extremely poor quality, leading me to believe I should have been using sam_hq_vit_l.pth shown in the main readme of the repo. However, when I pass sam_hq_vit_l.pth to the checkpoint argument of train.py along with the --eval flag, it fails to load the checkpoint and errors out since the keys do not match.

Please advise how I can reproduce results.

sam = sam_model_registry[model_type](checkpoint=sam_checkpoint) is giving me this error

Error(s) in loading state_dict for Sam:
Unexpected key(s) in state_dict: "mask_decoder.hf_token.weight", "mask_decoder.hf_mlp.layers.0.weight", "mask_decoder.hf_mlp.layers.0.bias", "mask_decoder.hf_mlp.layers.1.weight", "mask_decoder.hf_mlp.layers.1.bias", "mask_decoder.hf_mlp.layers.2.weight", "mask_decoder.hf_mlp.layers.2.bias", "mask_decoder.compress_vit_feat.0.weight", "mask_decoder.compress_vit_feat.0.bias", "mask_decoder.compress_vit_feat.1.weight", "mask_decoder.compress_vit_feat.1.bias", "mask_decoder.compress_vit_feat.3.weight", "mask_decoder.compress_vit_feat.3.bias", "mask_decoder.embedding_encoder.0.weight", "mask_decoder.embedding_encoder.0.bias", "mask_decoder.embedding_encoder.1.weight", "mask_decoder.embedding_encoder.1.bias", "mask_decoder.embedding_encoder.3.weight", "mask_decoder.embedding_encoder.3.bias", "mask_decoder.embedding_maskfeature.0.weight", "mask_decoder.embedding_maskfeature.0.bias", "mask_decoder.embedding_maskfeature.1.weight", "mask_decoder.embedding_maskfeature.1.bias", "mask_decoder.embedding_maskfeature.3.weight", "mask_decoder.embedding_maskfeature.3.bias".

Prompt based Sam HQ

Hello,

How can I use a prompt for segmentation (like the demo version)?

Thank you

Training code

Thanks for the great work. When will you be able to release the training code?

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder; therefore, it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

image

image

Best Wishes,

Qiao

Any guidelines for an hq_token_only parameter?

Big applause for such a decent work!

Are there any guidelines for using the "hq_token_only" parameter at inference time?

Heuristically, the effect of this parameter seems to depend on the details of the image (or of the object to be masked). Even with the example images given in "demo", some images result in better segmentation of the object, while in other cases the segmentation of details is slightly worse.

If your team also took a heuristic approach to this parameter, could you suggest when this parameter works more effectively than leaving it as "False"?

Thanks in advance, and sorry if I have missed the explanation in the paper.
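
For side-by-side comparison, here is a hedged sketch of toggling the flag on the same prompt via the predictor interface used in the demo scripts; the checkpoint path, image and box are illustrative:

import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_l"](checkpoint="pretrained_checkpoint/sam_hq_vit_l.pth")
predictor = SamPredictor(sam)
predictor.set_image(cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB))
box = np.array([50, 50, 400, 300])

# fused SAM + HQ output (False) versus the HQ-token-only output (True) on the same prompt
masks_fused, _, _ = predictor.predict(box=box, multimask_output=False, hq_token_only=False)
masks_hq_only, _, _ = predictor.predict(box=box, multimask_output=False, hq_token_only=True)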

hqsam_light weights

Hey, I am not sure (maybe this is a local issue), but the hqsam_light weights on Hugging Face are not working, while the ones on Google Drive are.

Is it possible to achieve better results?

First, thank you for the impressive project!
I am trying to auto-generate masks for photometric stereo input and am encountering some problems with unclean masks.

image (extreme example)

Do you have any recommendations on how to generate better results (masks without holes, only the mask of the person/face)?

Thank you for your time!
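
Not an official recommendation, but one common post-processing sketch, using the opencv-python dependency listed above, is to keep only the largest connected component and close small holes in each predicted mask:

import cv2
import numpy as np

def clean_mask(mask: np.ndarray, kernel_size: int = 15) -> np.ndarray:
    """Keep the largest connected component of a boolean mask and close small holes."""
    m = mask.astype(np.uint8)
    # keep the largest connected component (drops stray blobs away from the subject)
    num, labels, stats, _ = cv2.connectedComponentsWithStats(m, connectivity=8)
    if num > 1:
        largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
        m = (labels == largest).astype(np.uint8)
    # morphological closing fills holes smaller than the kernel
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    m = cv2.morphologyEx(m, cv2.MORPH_CLOSE, kernel)
    return m.astype(bool)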

Data Loader throwing FileNotFound error After few epochs of training

I've used the training command, but every time after a random number of epochs I get a FileNotFound error from the dataloader. Does anyone know the solution?
error:
epoch: 14 learning rate: 1e-05
[ 0/333] eta: 0:14:51 training_loss: 0.1127 (0.1127) loss_mask: 0.0446 (0.0446) loss_dice: 0.0681 (0.0681) time: 2.6786 data: 0.3379 max mem: 10103
Traceback (most recent call last):
File "/content/drive/MyDrive/sam-hq/train/train.py", line 651, in
main(net, train_datasets, valid_datasets, args)
File "/content/drive/MyDrive/sam-hq/train/train.py", line 360, in main
train(args, net, optimizer, train_dataloaders, valid_dataloaders, lr_scheduler,writer)
File "/content/drive/MyDrive/sam-hq/train/train.py", line 396, in train
for data in metric_logger.log_every(train_dataloaders,1000):
File "/content/drive/MyDrive/sam-hq/train/utils/misc.py", line 237, in log_every
for obj in iterable:
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 633, in next
data = self._next_data()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
return self._process_data(data)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
data.reraise()
File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 644, in reraise
raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataset.py", line 243, in getitem
return self.datasets[dataset_idx][sample_idx]
File "/content/drive/MyDrive/sam-hq/train/utils/dataloader.py", line 244, in getitem
File "/usr/local/lib/python3.10/dist-packages/skimage/io/_io.py", line 53, in imread
img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
File "/usr/local/lib/python3.10/dist-packages/skimage/io/manage_plugins.py", line 207, in call_plugin
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/skimage/io/_plugins/imageio_plugin.py", line 15, in imread
return np.asarray(imageio_imread(*args, **kwargs))
File "/usr/local/lib/python3.10/dist-packages/imageio/v2.py", line 226, in imread
with imopen(uri, "ri", **imopen_args) as file:
File "/usr/local/lib/python3.10/dist-packages/imageio/core/imopen.py", line 113, in imopen
request = Request(uri, io_mode, format_hint=format_hint, extension=extension)
File "/usr/local/lib/python3.10/dist-packages/imageio/core/request.py", line 247, in init
self._parse_uri(uri)
File "/usr/local/lib/python3.10/dist-packages/imageio/core/request.py", line 407, in _parse_uri
raise FileNotFoundError("No such file: '%s'" % fn)
FileNotFoundError: No such file: '/content/drive/MyDrive/Iris-and-Needle-Segmentation-3/train/images/SID0615_jpg.rf.8dd4aeb70ce910df9c8716e3af21b2cd.jpg'

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2600) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2023-08-01_11:51:44
host : 6198cb800e23
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 2600)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
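
A FileNotFoundError appearing mid-training usually points at the storage layer (for example Google Drive in Colab failing to serve a file that existed when the dataloader built its file list) rather than at train.py itself. A hedged pre-flight sketch that scans an image folder for unreadable files before launching training; the directory below is taken from the traceback and should be replaced with yours:

import glob
import os
from skimage import io

image_dir = "/content/drive/MyDrive/Iris-and-Needle-Segmentation-3/train/images"

bad = []
for path in sorted(glob.glob(os.path.join(image_dir, "*"))):
    try:
        io.imread(path)  # fails on empty or partially synced files
    except Exception as err:
        bad.append((path, err))
print(len(bad), "unreadable files")

Copying the dataset from Drive to the local Colab disk before training also tends to avoid these transient read failures.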

result is bad?

When I use my own picture, the result is bad, even worse than the original SAM. Is this method truly useful? There is a gap between the paper and real-world results.

Running on M1/M2 or CPU

Hello,

I am trying to run the software on a Mac M1. I changed the device in the demo example (demo_hqsam.py) to "cpu" and "mps", but in both cases I got the error messages below. (The code works for demo_sam.py.)

File "/Users/Projects/SegmentAnything/SAMHQ/samhq/lib/python3.11/site-
packages/torch/serialization.py", line 217, in default_restore_location
result = fn(storage, location)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/Projects/SegmentAnything/SAMHQ/samhq/lib/python3.11/site-packages/torch/serialization.py", line 182, in _cuda_deserialize
device = validate_cuda_device(location)
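
The traceback shows the checkpoint itself being deserialized onto CUDA inside the model builder, so changing the model's device alone is not enough. A hedged sketch for Apple Silicon (assuming a recent PyTorch with MPS support and that the build functions accept checkpoint=None): load the weights to CPU first, then move the model to mps:

import torch
from segment_anything import sam_model_registry

device = "mps" if torch.backends.mps.is_available() else "cpu"
sam = sam_model_registry["vit_l"]()  # build without loading the checkpoint
sam.load_state_dict(torch.load("pretrained_checkpoint/sam_hq_vit_l.pth", map_location="cpu"))
sam.to(device)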

When fine-tuning one's own dataset, an error was reported as follows

Traceback (most recent call last):
File "train.py", line 694, in
main(net, train_datasets, valid_datasets, args)
File "train.py", line 327, in main
train_dataloaders, train_datasets = create_dataloaders(train_im_gt_list,
File "/home/quchunguang/datasets/sam-hq/train/utils/dataloader.py", line 71, in create_dataloaders
sampler = DistributedSampler(gos_dataset)
File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/utils/data/distributed.py", line 65, in init
num_replicas = dist.get_world_size()
File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 845, in get_world_size
return _get_group_size(group)
File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 306, in _get_group_size
default_pg = _get_default_group()
File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 410, in _get_default_group
raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

How to solve it?
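
The training dataloader wraps every dataset in a DistributedSampler, so train.py expects to run under a distributed launcher (torch.distributed.launch or torchrun) even on a single GPU. A hedged single-process workaround is to initialize a one-member process group before the dataloaders are created, for example:

import os
import torch.distributed as dist

# single-process "group of one" so DistributedSampler can be constructed without a launcher
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
if not dist.is_initialized():
    dist.init_process_group(backend="gloo", rank=0, world_size=1)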

Question about SAM-HQ

In this project, if I add a new image, can I tell whether there is a car in the image, for example?

Or is the purpose here to know if there are objects of interest in the images?

IoU Calculate

Hi, I have found that the calculation of IoU is done by computing IoU for each individual image and then taking the average over all the images. However, shouldn't we calculate the intersection and union over all the images first, and then compute IoU?
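
For reference, the two conventions do give different numbers; a toy sketch of mean per-image IoU versus dataset-level (pooled) IoU, where the pooled variant lets large objects dominate the score:

import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0

# toy example with two images
preds = [np.array([[1, 1], [0, 0]], bool), np.array([[1, 0], [0, 0]], bool)]
gts   = [np.array([[1, 1], [1, 0]], bool), np.array([[1, 1], [0, 0]], bool)]

mean_per_image = np.mean([iou(p, g) for p, g in zip(preds, gts)])      # ~0.583
pooled = iou(np.concatenate([p.ravel() for p in preds]),
             np.concatenate([g.ravel() for g in gts]))                 # 0.6
print(mean_per_image, pooled)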

Run this project

Hi everyone,

How do I run this project on a Mac M1 after installing opencv-python pycocotools matplotlib onnxruntime onnx?

And if I add a new image, how can I predict the result for it?

Error in paper about Params of SAM?

Hi,
In the HQ-SAM paper, it says that the parameter count of SAM-B is 358M.
But when I count with the code
model_total_params = sum(p.numel() for p in sam.parameters())
I get 93.9M. Which one is wrong?
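
One likely explanation (hedged; worth checking against the table header in the paper): the paper's column reports model size in MB rather than parameter count, and ~93.9M float32 parameters occupy roughly 93.9M × 4 bytes ≈ 358 MiB. A quick sketch of both numbers:

from segment_anything import sam_model_registry

sam = sam_model_registry["vit_b"]()  # architecture only; no checkpoint needed just for counting

num_params = sum(p.numel() for p in sam.parameters())
size_mb = num_params * 4 / 1024 ** 2  # float32 weights: 4 bytes per parameter
print(f"{num_params / 1e6:.1f}M parameters ≈ {size_mb:.0f} MB of float32 weights")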

Training problem

When I try to train HQ-SAM following the instructions in Colab, I get the error "HQ-SAM: error: unrecognized arguments: --local-rank=0" followed by "ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 1468) of binary: /usr/bin/python3". How do I resolve it?
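
Newer PyTorch launchers inject --local-rank (with a hyphen) while older scripts register only --local_rank; a hedged workaround is to accept both spellings wherever train.py builds its argument parser (or to read LOCAL_RANK from the environment instead):

import argparse

parser = argparse.ArgumentParser("HQ-SAM")  # stand-in for train.py's existing parser
# accept both spellings of the flag injected by torch.distributed.launch / torchrun
parser.add_argument("--local_rank", "--local-rank", type=int, default=0)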

train.py

How do I get 'labels_points' in line 555 and 'labels_noisemask' in line 559 of the 'train.py' file?
image

demo_hqsam

When I run demo_hqsam, the following problem arises: _pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.
How can I solve it?
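
That UnpicklingError usually means the file is not a valid PyTorch checkpoint, most often because the download (for example from Google Drive) saved an HTML error page instead of the weights. Not a guaranteed diagnosis, but a quick sanity check; the path is illustrative:

import os
import zipfile

path = "pretrained_checkpoint/sam_hq_vit_l.pth"
print("size (MB):", os.path.getsize(path) / 1024 ** 2)
print("looks like a torch zip checkpoint:", zipfile.is_zipfile(path))
# a tiny size or False here suggests a corrupt file: re-download the checkpoint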

I got an issue about an OMP error

image: 0
E:/github/sam-hq/demo/demo_hqsam.py:31: MatplotlibDeprecationWarning: Support for FigureCanvases without a required_interactive_framework attribute was deprecated in Matplotlib 3.6 and will be removed two minor releases later. plt.figure(figsize=(10,10))
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

Process finished with exit code 3

How do I fix this issue?
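
The message itself names the (unsafe, unsupported) workaround: either ensure only one OpenMP runtime is linked, or set KMP_DUPLICATE_LIB_OK before the conflicting libraries are imported. A hedged sketch for the top of the demo script:

import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"  # workaround named in the OMP error; may hide real conflicts

import matplotlib.pyplot as plt  # import the heavy libraries only after setting the variable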

gdown does not work

Hey guys,

I can only download the checkpoints manually; however, I don't know how to deploy the model in the cloud and download them via gdown.

Maybe the checkpoint-sharing approach could be improved.

Thanks

Confusion about the comment in the sampling method

I am confused by the sentences below, in train/utils/loss_mask.py line 67:

    # It is crucial to calculate uncertainty based on the sampled prediction value for the points.
    # Calculating uncertainties of the coarse predictions first and sampling them for points leads
    # to incorrect results.
    # To illustrate this: assume uncertainty_func(logits)=-abs(logits), a sampled point between
    # two coarse predictions with -1 and 1 logits has 0 logits, and therefore 0 uncertainty value.
    # However, if we calculate uncertainties for the coarse predictions first,
    # both will have -1 uncertainty, and the sampled point will get -1 uncertainty.

Why does a sampled point between two coarse predictions with -1 and 1 logits have 0 logits, and therefore 0 uncertainty? Is this caused by the bilinear upsampling?
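
A small numeric illustration of that comment (plain arithmetic, not the repo's code): yes, it is the bilinear interpolation in the point sampling that causes this, because it interpolates logits rather than uncertainty scores.

def uncertainty(logit):
    return -abs(logit)  # the uncertainty_func used in the comment

left_logit, right_logit = -1.0, 1.0

# sample the logits first (what bilinear point sampling does), then score:
midpoint_logit = 0.5 * left_logit + 0.5 * right_logit
print(uncertainty(midpoint_logit))  # 0.0 -> maximally uncertain, as a boundary point should be

# score first, then interpolate the scores (the "incorrect" order described in the comment):
print(0.5 * uncertainty(left_logit) + 0.5 * uncertainty(right_logit))  # -1.0 -> looks confident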

Error loading state dict

While running demo_hqsam.py I have the following error, using the sam_hq_vit_l model and vit_l for model_type:

RuntimeError: Error(s) in loading state_dict for Sam: Unexpected key(s) in state_dict: "mask_decoder.hf_token.weight", "mask_decoder.hf_mlp.layers.0.weight", "mask_decoder.hf_mlp.layers.0.bias", "mask_decoder.hf_mlp.layers.1.weight", "mask_decoder.hf_mlp.layers.1.bias", "mask_decoder.hf_mlp.layers.2.weight", "mask_decoder.hf_mlp.layers.2.bias", "mask_decoder.compress_vit_feat.0.weight", "mask_decoder.compress_vit_feat.0.bias", "mask_decoder.compress_vit_feat.1.weight", "mask_decoder.compress_vit_feat.1.bias", "mask_decoder.compress_vit_feat.3.weight", "mask_decoder.compress_vit_feat.3.bias", "mask_decoder.embedding_encoder.0.weight", "mask_decoder.embedding_encoder.0.bias", "mask_decoder.embedding_encoder.1.weight", "mask_decoder.embedding_encoder.1.bias", "mask_decoder.embedding_encoder.3.weight", "mask_decoder.embedding_encoder.3.bias", "mask_decoder.embedding_maskfeature.0.weight", "mask_decoder.embedding_maskfeature.0.bias", "mask_decoder.embedding_maskfeature.1.weight", "mask_decoder.embedding_maskfeature.1.bias", "mask_decoder.embedding_maskfeature.3.weight", "mask_decoder.embedding_maskfeature.3.bias".

[Feature] Export to ONNX SAM image encoder

Thank you very much for this incredible model.

I was looking at your guide for exporting the model to ONNX. I didn't understand why you don't want to export the SAM image encoder to ONNX. I think it is because you are executing the ONNX graph with onnxruntime on CPU.

However, it would be nice to have it for Triton Inference Server with CUDA backend.

Real Time & Crop

Is it possible to run in real time and crop the detected objects? Can you share code for this?
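
There is no official real-time pipeline in this repo, but cropping a detected object out of a predicted mask is straightforward; a hedged sketch assuming a boolean mask as returned by SamPredictor.predict:

import numpy as np

def crop_to_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Crop an (H, W, 3) image to the bounding box of a boolean (H, W) mask."""
    ys, xs = np.where(mask)
    if ys.size == 0:
        return image  # empty mask: nothing to crop
    return image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]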

pypi package?

Any plans to create a PyPI package so the necessary modules can simply be installed with pip and used across platforms, from APIs and other software?

Onnx export

Can you upload a script to correctly export the entire model as ONNX?

Dataset Download

Sorry, you are currently unable to view or download this file.
Too many users have recently viewed or downloaded this file. Please try to access this file again later. If the file you are trying to access is particularly large or has been shared with many people, you may have to wait up to 24 hours before you can view or download it. If you still cannot access the file after 24 hours, please contact your domain administrator.

Questions about Text Prompt

Hi! Many thanks for the work. I tried the Hugging Face demo and found that the text prompt option and the advanced options are really effective for the segmentation task I've been trying to do. However, I didn't find any guide or options for this part in the demo, so is there an alternative way to set these when I'm running my own code? Thanks for any reply!

About custom dataset.

I have a custom dataset in which each picture contains multiple objects. How should I modify the dataloader or the model to train on such a dataset?

About the code of finetuning on the specific dataset

Thanks for this significant and interesting work. Open-source work is helpful to developers.
In the paper, the authors write: "In particular, we use both global semantic context and local fine-grained features by fusing SAM's mask decoder features with early and late feature maps from its ViT encoder. During training, we freeze the entire pre-trained SAM parameters, while only updating our HQ-Output Token, its associated three-layer MLPs, and a small feature fusion block." Is there a demo for running the training code, or should we reproduce it ourselves?

Thanks & Regards!
Momo
