
A simple, fully convolutional model for real-time instance segmentation.

License: MIT License


yolact's Introduction

You Only Look At CoefficienTs

    ██╗   ██╗ ██████╗ ██╗      █████╗  ██████╗████████╗
    ╚██╗ ██╔╝██╔═══██╗██║     ██╔══██╗██╔════╝╚══██╔══╝
     ╚████╔╝ ██║   ██║██║     ███████║██║        ██║   
      ╚██╔╝  ██║   ██║██║     ██╔══██║██║        ██║   
       ██║   ╚██████╔╝███████╗██║  ██║╚██████╗   ██║   
       ╚═╝    ╚═════╝ ╚══════╝╚═╝  ╚═╝ ╚═════╝   ╚═╝ 

A simple, fully convolutional model for real-time instance segmentation. This is the code for our papers:

YOLACT++ (v1.2) released! (Changelog)

YOLACT++'s resnet50 model runs at 33.5 fps on a Titan Xp and achieves 34.1 mAP on COCO's test-dev (check out our journal paper here).

In order to use YOLACT++, make sure you compile the DCNv2 code. (See Installation)

For a real-time demo, check out our ICCV video.

Some examples from our YOLACT base model (33.5 fps on a Titan Xp and 29.8 mAP on COCO's test-dev):

[Example images 0-2]

Installation

  • Clone this repository and enter it:
    git clone https://github.com/dbolya/yolact.git
    cd yolact
  • Set up the environment using one of the following methods:
    • Using Anaconda
      • Run conda env create -f environment.yml
    • Manually with pip
      • Set up a Python 3 environment (e.g., using virtualenv).
      • Install PyTorch 1.0.1 (or higher) and TorchVision.
      • Install some other packages:
        # Cython needs to be installed before pycocotools
        pip install cython
        pip install opencv-python pillow pycocotools matplotlib 
  • If you'd like to train YOLACT, download the COCO dataset and the 2014/2017 annotations. Note that this script will take a while and dump 21 GB of files into ./data/coco.
    sh data/scripts/COCO.sh
  • If you'd like to evaluate YOLACT on test-dev, download test-dev with this script.
    sh data/scripts/COCO_test.sh
  • If you want to use YOLACT++, compile deformable convolutional layers (from DCNv2). Make sure you have the latest CUDA toolkit installed from NVIDIA's website.
    cd external/DCNv2
    python setup.py build develop
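
To sanity-check the build, you can try importing the compiled layer (the dcn_v2 module name is an assumption based on the bundled DCNv2 package; adjust if your copy differs):

# Run from inside external/DCNv2; an ImportError means the build failed.
python -c "from dcn_v2 import DCN; print('DCNv2 import OK')"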

Evaluation

Here are our YOLACT models (released on April 5th, 2019) along with their FPS on a Titan Xp and mAP on test-dev:

Image Size Backbone FPS mAP Weights
550 Resnet50-FPN 42.5 28.2 yolact_resnet50_54_800000.pth Mirror
550 Darknet53-FPN 40.0 28.7 yolact_darknet53_54_800000.pth Mirror
550 Resnet101-FPN 33.5 29.8 yolact_base_54_800000.pth Mirror
700 Resnet101-FPN 23.6 31.2 yolact_im700_54_800000.pth Mirror

YOLACT++ models (released on December 16th, 2019):

Image Size Backbone FPS mAP Weights
550 Resnet50-FPN 33.5 34.1 yolact_plus_resnet50_54_800000.pth Mirror
550 Resnet101-FPN 27.3 34.6 yolact_plus_base_54_800000.pth Mirror

To evaluate the model, put the corresponding weights file in the ./weights directory and run one of the following commands. The name of each config is everything before the numbers in the file name (e.g., yolact_base for yolact_base_54_800000.pth).
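
As an illustration, that naming rule amounts to stripping the trailing epoch/iteration suffix from the file name. A minimal sketch (a hypothetical helper, not code from this repo):

# Derive the config name from a weights file name, per the rule above.
import re

def config_from_weights(path):
    # Drop the trailing "_<epoch>_<iter>.pth" (e.g. "_54_800000.pth").
    return re.sub(r'_\d+_\d+\.pth$', '', path.split('/')[-1])

print(config_from_weights('weights/yolact_base_54_800000.pth'))  # yolact_base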

Quantitative Results on COCO

# Quantitatively evaluate a trained model on the entire validation set. Make sure you have COCO downloaded as above.
# This should get 29.92 validation mask mAP last time I checked.
python eval.py --trained_model=weights/yolact_base_54_800000.pth

# Output a COCOEval json to submit to the website or to use the run_coco_eval.py script.
# This command will create './results/bbox_detections.json' and './results/mask_detections.json' for detection and instance segmentation respectively.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_json

# You can run COCOEval on the files created in the previous command. The performance should match my implementation in eval.py.
python run_coco_eval.py

# To output a coco json file for test-dev, make sure you have test-dev downloaded from above and go
python eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_json --dataset=coco2017_testdev_dataset

Qualitative Results on COCO

# Display qualitative results on COCO. From here on I'll use a confidence threshold of 0.15.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --display

Benchmarking on COCO

# Run just the raw model on the first 1k images of the validation set
python eval.py --trained_model=weights/yolact_base_54_800000.pth --benchmark --max_images=1000

Images

# Display qualitative results on the specified image.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=my_image.png

# Process an image and save it to another file.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=input_image.png:output_image.png

# Process a whole folder of images.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --images=path/to/input/folder:path/to/output/folder

Video

# Display a video in real-time. "--video_multiframe" will process that many frames at once for improved performance.
# If you want, use "--display_fps" to draw the FPS directly on the frame.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=my_video.mp4

# Display a webcam feed in real-time. If you have multiple webcams pass the index of the webcam you want instead of 0.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=0

# Process a video and save it to another file. This uses the same pipeline as the ones above now, so it's fast!
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=input_video.mp4:output_video.mp4

As you can tell, eval.py can do a ton of stuff. Run the --help command to see everything it can do.

python eval.py --help
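
If you'd rather call the model from Python than go through eval.py's command line, a minimal sketch of the same pipeline looks like this (adapted from the imports eval.py uses; the exact calls and defaults are assumptions, so consult eval.py if anything differs):

import cv2
import torch
from data import set_cfg
from yolact import Yolact
from utils.augmentations import FastBaseTransform
from layers.output_utils import postprocess

set_cfg('yolact_base_config')  # must match the weights file
with torch.no_grad():
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
    net = Yolact()
    net.load_weights('weights/yolact_base_54_800000.pth')
    net.eval()

    img = cv2.imread('my_image.png')                      # HxWx3 BGR uint8
    frame = torch.from_numpy(img).cuda().float()
    preds = net(FastBaseTransform()(frame.unsqueeze(0)))  # add batch dimension
    h, w, _ = img.shape
    classes, scores, boxes, masks = postprocess(preds, w, h, score_threshold=0.15)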

Training

By default, we train on COCO. Make sure to download the entire dataset using the commands above.

  • To train, grab an imagenet-pretrained model and put it in ./weights.
    • For Resnet101, download resnet101_reducedfc.pth from here.
    • For Resnet50, download resnet50-19c8e357.pth from here.
    • For Darknet53, download darknet53.pth from here.
  • Run one of the training commands below.
    • Note that you can press ctrl+c while training and it will save an *_interrupt.pth file at the current iteration.
    • All weights are saved in the ./weights directory by default with the file name <config>_<epoch>_<iter>.pth.
# Trains using the base config with a batch size of 8 (the default).
python train.py --config=yolact_base_config

# Trains yolact_base_config with a batch_size of 5. For the 550px models, 1 batch takes up around 1.5 gigs of VRAM, so specify accordingly.
python train.py --config=yolact_base_config --batch_size=5

# Resume training yolact_base with a specific weight file and start from the iteration specified in the weight file's name.
python train.py --config=yolact_base_config --resume=weights/yolact_base_10_32100.pth --start_iter=-1

# Use the help option to see a description of all available command line arguments
python train.py --help

Multi-GPU Support

YOLACT now supports multiple GPUs seamlessly during training:

  • Before running any of the scripts, run: export CUDA_VISIBLE_DEVICES=[gpus]
    • Where you should replace [gpus] with a comma separated list of the index of each GPU you want to use (e.g., 0,1,2,3).
    • You should still do this if only using 1 GPU.
    • You can check the indices of your GPUs with nvidia-smi.
  • Then, simply set the batch size to 8*num_gpus with the training commands above (see the example below). The training script will automatically scale the hyperparameters to the right values.
    • If you have memory to spare you can increase the batch size further, but keep it a multiple of the number of GPUs you're using.
    • If you want to allocate different numbers of images to different GPUs, you can use --batch_alloc=[alloc], where [alloc] is a comma-separated list containing the number of images on each GPU. This must sum to batch_size.
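
For example, a four-GPU run following the rules above looks like this:

# Expose four GPUs, then scale the batch size to 8 * 4 = 32.
export CUDA_VISIBLE_DEVICES=0,1,2,3
python train.py --config=yolact_base_config --batch_size=32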

Logging

YOLACT now logs training and validation information by default. You can disable this with --no_log. A guide on how to visualize these logs is coming soon, but for now you can look at LogVisualizer in utils/logger.py for help.

Pascal SBD

We also include a config for training on Pascal SBD annotations (for rapid experimentation or comparing with other methods). To train on Pascal SBD, proceed with the following steps:

  1. Download the dataset from here. It's the first link in the top "Overview" section (and the file is called benchmark.tgz).
  2. Extract the dataset somewhere. In the dataset there should be a folder called dataset/img. Create the directory ./data/sbd (where . is YOLACT's root) and copy dataset/img to ./data/sbd/img.
  3. Download the COCO-style annotations from here.
  4. Extract the annotations into ./data/sbd/.
  5. Now you can train using --config=yolact_resnet50_pascal_config. Check that config to see how to extend it to other models.

I will automate this all with a script soon, don't worry. Also, if you want the script I used to convert the annotations, I put it in ./scripts/convert_sbd.py, but you'll have to check how it works to be able to use it because I don't actually remember at this point.

If you want to verify our results, you can download our yolact_resnet50_pascal_config weights from here. This model should get 72.3 mask AP_50 and 56.2 mask AP_70. Note that the "all" AP isn't the same as the "vol" AP reported in other papers for Pascal (they use an average over the IoU thresholds from 0.1 to 0.9 in increments of 0.1, instead of COCO's 0.5 to 0.95 in increments of 0.05).

Custom Datasets

You can also train on your own dataset by following these steps:

  • Create a COCO-style Object Detection JSON annotation file for your dataset. The specification for this can be found here. Note that we don't use some fields, so the following may be omitted:
    • info
    • license
    • Under image: license, flickr_url, coco_url, date_captured
    • categories (we use our own format for categories, see below)
  • Create a definition for your dataset under dataset_base in data/config.py (see the comments in dataset_base for an explanation of each field):
my_custom_dataset = dataset_base.copy({
    'name': 'My Dataset',

    'train_images': 'path_to_training_images',
    'train_info':   'path_to_training_annotation',

    'valid_images': 'path_to_validation_images',
    'valid_info':   'path_to_validation_annotation',

    'has_gt': True,
    'class_names': ('my_class_id_1', 'my_class_id_2', 'my_class_id_3', ...)
})
  • A couple things to note:
    • Class IDs in the annotation file should start at 1 and increase sequentially in the order of class_names. If this isn't the case for your annotation file (like in COCO), see the field label_map in dataset_base.
    • If you do not want to create a validation split, use the same image path and annotations file for validation. By default (see python train.py --help), train.py will output validation mAP for the first 5000 images in the dataset every 2 epochs.
  • Finally, in yolact_base_config in the same file, change the value for 'dataset' to 'my_custom_dataset' or whatever you named the config object above (a sketch follows below). Then you can use any of the training commands in the previous section.
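
A sketch of that last change, using the my_custom_dataset object defined above (yolact_base_config extends coco_base_config in data/config.py):

yolact_base_config = coco_base_config.copy({
    'name': 'yolact_base',

    # Point the config at the custom dataset defined above.
    'dataset': my_custom_dataset,
    'num_classes': len(my_custom_dataset.class_names) + 1,  # +1 for background

    # ... all other fields unchanged ...
})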

Creating a Custom Dataset from Scratch

See this nice post by @Amit12690 for tips on how to annotate a custom dataset and prepare it for use with YOLACT.

Citation

If you use YOLACT or this code base in your work, please cite

@inproceedings{yolact-iccv2019,
  author    = {Daniel Bolya and Chong Zhou and Fanyi Xiao and Yong Jae Lee},
  title     = {YOLACT: {Real-time} Instance Segmentation},
  booktitle = {ICCV},
  year      = {2019},
}

For YOLACT++, please cite

@article{yolact-plus-tpami2020,
  author  = {Daniel Bolya and Chong Zhou and Fanyi Xiao and Yong Jae Lee},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
  title   = {YOLACT++: Better Real-time Instance Segmentation}, 
  year    = {2020},
}

Contact

For questions about our paper or code, please contact Daniel Bolya.


yolact's Issues

A very very very strange problem on Windows

I think there is something incompatible with Windows in yolact.py.

Running eval.py gives a CUDA unknown error, located at 'torch.set_default_tensor_type('torch.cuda.FloatTensor')'. It looks like CUDA failed to initialize.

I tried moving 'torch.set_default_tensor_type('torch.cuda.FloatTensor')' from the top line downwards, like this:

#try1
import torch
torch.set_default_tensor_type('torch.cuda.FloatTensor')
from data import COCODetection, get_label_map, MEANS, COLORS
...

#try2
import torch
...
torch.set_default_tensor_type('torch.cuda.FloatTensor')
from yolact import Yolact
...

Then I found that placing it before 'from yolact import Yolact' works; otherwise it fails.

Now, the beginning of yolact.py is written as follows:

import torch
torch.set_default_tensor_type('torch.cuda.FloatTensor')
from data import COCODetection, get_label_map, MEANS, COLORS
...

Issue while running the eval.py script

I am running this on an Ubuntu 18.04 Linux box with Python 3 and the most recent versions of all the libraries. Any idea why I get this error?

$ python3 eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --video=/home/vib/Desktop/AndurilSRC/LPR_DATA/lotsofcars_1.mp4:output_video-det.mp4

Config not specified. Parsed yolact_base_config from the file name.

Loading model... Done.
Traceback (most recent call last):
File "eval.py", line 935, in
evaluate(net, dataset)
File "eval.py", line 722, in evaluate
savevideo(net, inp, out)
File "eval.py", line 682, in savevideo
preds = net(batch)
File "/home/vib/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/vib/Desktop/Personal/yolact/yolact.py", line 612, in forward
return self.detect(pred_outs)
File "/home/vib/Desktop/Personal/yolact/layers/functions/detection.py", line 76, in call
result = self.detect(batch_idx, conf_preds, decoded_boxes, mask_data, inst_data)
File "/home/vib/Desktop/Personal/yolact/layers/functions/detection.py", line 103, in detect
boxes, masks, classes, scores = self.fast_nms(boxes, masks, scores, self.nms_thresh, self.top_k)
File "/home/vib/Desktop/Personal/yolact/layers/functions/detection.py", line 148, in fast_nms
iou.triu_(diagonal=1)
RuntimeError: invalid argument 1: expected a matrix at /pytorch/aten/src/THC/generic/THCTensorMathPairwise.cu:203
FAIL

Compute Validation Loss

Hi, is there a way to get validation loss during training? I want to monitor it for overfitting cases.

I noticed you had it before (though it was giving me errors), but the overhaul has removed it.

Thanks.

Fine tuning with existing model

Hi,

I tried to train a model with a custom dataset and the ResNet101 backbone. I noticed that while half of the bounding boxes looked accurate, the masks were completely off. I drew the annotations and verified that they are correct.

It could be due to the size of the dataset: 1357 images and 21 classes. I would like to use yolact_im700_54_800000.pth and fine-tune it with my custom classes to see if this improves my results. What would be the steps to do this?

How do you produce 'mask coefficients'?

Hi, thanks a lot for your fantastic work! But I found that in your paper you produce mask coefficients using fc layers, while in your code you produce them with a conv layer. Can you tell me which kind of layer you use for producing mask coefficients? Thanks for your reply!

Problems encountered while training my own dataset

Hi,
In order to solve the stacking problem of the same object, I trained my dataset as required, but some masks cannot completely cover the object; only part is covered. Do you know what causes this? Do you have any suggestions for modification?
Looking forward to your reply, thank you.

Training speed

I am not getting consistent GPU utilization, and it says 18 days on one V100 GPU (p3.2xlarge) with a batch size of 12 and num_workers=8. Does this make sense?

Is there any explanation of the timer column, and is there a TensorBoard equivalent for viewing performance over time?

Thank you very much!

Training my dataset with multiple GPUs

Hi, thanks for your good work!
I want to train my dataset using 4 GPUs, but I find it slower than a single GPU (same batch_size). Why?

An issue with a custom dataset

Hi, thanks for your work. Recently I have been trying to train the net on my custom dataset. There is an issue that I find hard to debug by myself. Here is my problem. Thanks a lot for your help again.

[ 2] 2930 || B: 3.808 | C: 2.416 | M: 4.821 | S: 0.049 | T: 11.094 || ETA: 4 days, 14:47:05 || timer: 0.478
[ 2] 2940 || B: 3.795 | C: 2.418 | M: 4.838 | S: 0.049 | T: 11.101 || ETA: 4 days, 14:47:10 || timer: 0.497
[ 2] 2950 || B: 3.787 | C: 2.421 | M: 4.812 | S: 0.049 | T: 11.069 || ETA: 4 days, 14:49:01 || timer: 0.474
[ 2] 2960 || B: 3.778 | C: 2.422 | M: 4.846 | S: 0.049 | T: 11.095 || ETA: 4 days, 14:49:52 || timer: 0.512
[ 2] 2970 || B: 3.748 | C: 2.419 | M: 4.846 | S: 0.048 | T: 11.061 || ETA: 4 days, 14:49:04 || timer: 0.491

Computing validation mAP (this may take a while)...

Traceback (most recent call last):
File "train.py", line 377, in
train()
File "train.py", line 300, in train
compute_validation_map(yolact_net, val_dataset)
File "train.py", line 370, in compute_validation_map
eval_script.evaluate(yolact_net, dataset, train_mode=True)
File "/data/pancreas/root/yolact-master/eval.py", line 869, in evaluate
prep_metrics(ap_data, preds, img, gt, gt_masks, h, w, num_crowd, dataset.ids[image_idx], detections)
File "/data/pancreas/root/yolact-master/eval.py", line 433, in prep_metrics
ap_obj = ap_data[iou_type][iouIdx][_class]
IndexError: list index out of range

preserve_aspect_ratio question

I am training on Cityscapes, so I want to preserve the aspect ratio (1024×2048).
However, after turning on preserve_aspect_ratio, the loss keeps decreasing but the visualized bounding box positions are always wrong.

I also find that this line uses max_size for both width and height.
I think it should be b_w, b_h = (int(cfg.max_size / r_w * w), int(cfg.min_size / r_h * h)),
or directly b_w, b_h = w, h.
I don't understand the comment # A hack to scale the bboxes to the right size.
I wonder: is this a bug or some trick?

b_w, b_h = (cfg.max_size / r_w * w, cfg.max_size / r_h * h)

Thanks

eval.py does not process all 5k images

When I run:

python eval.py --trained_model=weights/yolact_base_54_800000.pth --dataset=coco2017_dataset

It only evaluates 4952 images. Any ideas on why it doesn't go through the 5000 images in ./data/coco/images/?

The image folder has 5000 images and the annotations_val2017.json file has annotations for those images.

What do I need to change so that it evaluates the complete set of images? (5k)

Training time is long?

Hi, dbolya.

Thanks for your work. I tried to reproduce the performance with the ResNet50 pre-trained model and used the command 'python train.py --config=yolact_resnet50_config'. While training, I found that it needs about 30 days to finish, which is too long. Then I set batch_size = 32 because I have 8 GPUs, but it remains the same; the total training time was still about 30 days.

Did I do anything wrong? Or the training time is actually long? How can I use Multi-GPU to accelerate training?

Thanks!

MemoryError

Memory is 12 GB, but only 8 GB is used.

python train.py --config=yolact_base_config --batch_size=5
loading annotations into memory...
Done (t=0.06s)
creating index...
index created!
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
Initializing weights...
Begin training!

[ 0] 0 || B: 8.264 | C: 14.452 | M: 14.870 | S: 3.010 | T: 40.595 || ETA: 0:00:00 || timer: 12.147
[ 0] 10 || B: 9.251 | C: 9.149 | M: 7.010 | S: 2.204 | T: 27.615 || ETA: 0:57:52 || timer: 0.445
[ 0] 20 || B: 8.156 | C: 7.494 | M: 6.613 | S: 1.537 | T: 23.800 || ETA: 1:00:17 || timer: 0.441
[ 0] 30 || B: 8.053 | C: 6.515 | M: 6.317 | S: 1.206 | T: 22.091 || ETA: 1:08:55 || timer: 0.437
[ 0] 40 || B: 7.631 | C: 5.865 | M: 6.203 | S: 0.981 | T: 20.680 || ETA: 1:22:37 || timer: 0.428
[ 0] 50 || B: 7.558 | C: 5.397 | M: 6.149 | S: 0.845 | T: 19.949 || ETA: 1:20:02 || timer: 0.432
Traceback (most recent call last):
File "train.py", line 374, in
train()
File "train.py", line 211, in train
for datum in data_loader:
File "/home/chase/anaconda3/envs/maskrcnn_benchmark1/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in next
return self._process_next_batch(batch)
File "/home/chase/anaconda3/envs/maskrcnn_benchmark1/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
MemoryError: Traceback (most recent call last):
File "/home/chase/anaconda3/envs/maskrcnn_benchmark1/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/chase/anaconda3/envs/maskrcnn_benchmark1/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/chase/yolact/data/coco.py", line 88, in getitem
im, gt, masks, h, w, num_crowds = self.pull_item(index)
File "/home/chase/yolact/data/coco.py", line 151, in pull_item
{'num_crowds': num_crowds, 'labels': target[:, 4]})
File "/home/chase/yolact/utils/augmentations.py", line 658, in call
return self.augment(img, masks, boxes, labels)
File "/home/chase/yolact/utils/augmentations.py", line 54, in call
img, masks, boxes, labels = t(img, masks, boxes, labels)
File "/home/chase/yolact/utils/augmentations.py", line 380, in call
current_masks = masks[mask, :, :].copy()
MemoryError

Custom dataset runtime error

Hello

I am trying to retrain YOLACT on Pascal-Part, a variation of Pascal VOC where each class has many sub-classes.
To simplify everything I made every sub-class a class, in addition to the 20 original ones, which gives me a set of 316 classes.
I generated three JSON files for each case.

When I start training I encounter the following error:
RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity
which happens here:
losses = criterion(out, wrapper, wrapper.make_mask())
in train.py, around line 262 (I had some prints in my file, so my line number is different).

Here:
eriklindernoren/PyTorch-YOLOv3#110

I read it might be a path issue; however, I rechecked and the image paths are correct.
Also, I am able to train Pascal VOC using the same image paths without issues.

I tried to investigate the forward method of the loss function, looking for an empty tensor, but I did not find any.

evaluation model download URL

Hi dbolya,

Can you upload your model to Google Drive or another host? The URL provided by ucdavis is not accessible.

Thanks

Is YOLACT feasible on mobile devices?

First of all, I would like to thank you for your outstanding contribution. Secondly, I would like to ask how the algorithm you proposed performs on mobile devices with insufficient computing power and memory. Could you give me some reasonable suggestions? Thank you so much!

More experiments would be nicer

The paper says that Box2Pix relies on an extremely lightweight backbone detector.
I think more experiments would be nicer, maybe a comparison table like this:

          kitti   cityscape   coco
box2pix
yolact

Also, a yolact-lite might be good, just like YOLO-LITE, using a lightweight backbone (like Xception).
This is YOLACT v1, just like YOLO v1.
I am wondering if the encoder-decoder architecture or the atrous convolution adopted by DeepLab v3+ may help.
Expecting YOLACT v2...

How to run eval.py without cuda?

Hello, I'm trying to run eval.py, but I get an error.
The error message is:

Traceback (most recent call last):
File "eval.py", line 990, in
torch.set_default_tensor_type('torch.cuda.FloatTensor')
File "/home/administrator/anaconda3/lib/python3.7/site-packages/torch/init.py", line 158, in set_default_tensor_type
_C._set_default_tensor_type(t)
File "/home/administrator/anaconda3/lib/python3.7/site-packages/torch/cuda/init.py", line 161, in _lazy_init
_check_driver()
File "/home/administrator/anaconda3/lib/python3.7/site-packages/torch/cuda/init.py", line 75, in _check_driver
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

I don't have a GPU in my PC; how can I run eval.py without CUDA? Thanks.

Inference speed problem in my own environment

First of all, thanks for sharing the amazing work!
Following the instructions, I have set up the environment and can execute the code successfully. However, when running eval.py, the inference speed is slower than expected.
For the ResNet101-FPN model, testing on the COCO validation set, the code returns about 9 FPS; testing on my own Kinect images (640×480), with plotting and saving disabled, it returns about 14 FPS.
My environment is: GTX 1080, CUDA 8.0, cudatoolkit 8.0. I am using Anaconda, and GPU support was checked via

torch.cuda.is_available()

I am new to PyTorch, so I am wondering whether some configuration or dependency is missing.

Thanks!

IndexError: list index out of range

Hello!
I trained this model with own dataset, but it fails in the mAP evaluation phase, does anyone have the same problem?

(tensorflow) root@gpuserver:/home/gpuserver/models/yolact# python train.py --config=yolact_base_config
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Initializing weights...
Begin training!

/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/nn/parallel/_functions.py:61: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
warnings.warn('Was asked to gather along dimension 0, but all '
[ 0] 0 || B: 5.480 | C: 23.075 | M: 5.976 | S: 67.004 | T: 101.536 || ETA: 0:00:00 || timer: 23.377
[ 0] 10 || B: 4.757 | C: 18.774 | M: 5.625 | S: 47.000 | T: 76.155 || ETA: 11 days, 7:25:02 || timer: 1.176
[ 0] 20 || B: 4.587 | C: 15.804 | M: 5.362 | S: 29.147 | T: 54.900 || ETA: 11 days, 7:50:00 || timer: 1.180
[ 0] 30 || B: 4.582 | C: 13.355 | M: 5.309 | S: 19.954 | T: 43.199 || ETA: 11 days, 6:14:29 || timer: 1.272
[ 0] 40 || B: 4.553 | C: 11.175 | M: 5.306 | S: 15.150 | T: 36.183 || ETA: 11 days, 6:35:18 || timer: 1.266
[ 0] 50 || B: 4.497 | C: 9.617 | M: 5.303 | S: 12.227 | T: 31.645 || ETA: 11 days, 6:12:37 || timer: 1.120
[ 0] 60 || B: 4.433 | C: 8.514 | M: 5.290 | S: 10.265 | T: 28.503 || ETA: 11 days, 4:48:22 || timer: 1.166
[ 1] 70 || B: 4.383 | C: 7.700 | M: 5.304 | S: 8.850 | T: 26.237 || ETA: 11 days, 7:53:19 || timer: 1.236
[ 1] 80 || B: 4.339 | C: 7.073 | M: 5.269 | S: 7.781 | T: 24.464 || ETA: 11 days, 6:55:00 || timer: 1.173
[ 1] 90 || B: 4.294 | C: 6.585 | M: 5.250 | S: 6.945 | T: 23.074 || ETA: 11 days, 6:22:39 || timer: 1.217
[ 1] 100 || B: 4.235 | C: 6.015 | M: 5.230 | S: 5.666 | T: 21.147 || ETA: 11 days, 5:30:35 || timer: 1.259
[ 1] 110 || B: 4.131 | C: 4.426 | M: 5.184 | S: 1.178 | T: 14.920 || ETA: 11 days, 4:57:38 || timer: 1.177
[ 1] 120 || B: 4.045 | C: 3.427 | M: 5.202 | S: 0.242 | T: 12.915 || ETA: 11 days, 4:43:12 || timer: 1.214
[ 2] 130 || B: 3.926 | C: 2.860 | M: 5.195 | S: 0.192 | T: 12.174 || ETA: 11 days, 5:57:53 || timer: 2.714
[ 2] 140 || B: 3.817 | C: 2.654 | M: 5.138 | S: 0.180 | T: 11.789 || ETA: 11 days, 5:43:35 || timer: 1.230
[ 2] 150 || B: 3.694 | C: 2.571 | M: 5.045 | S: 0.170 | T: 11.480 || ETA: 11 days, 5:23:29 || timer: 1.217
[ 2] 160 || B: 3.617 | C: 2.516 | M: 4.966 | S: 0.158 | T: 11.256 || ETA: 11 days, 5:37:45 || timer: 1.277
[ 2] 170 || B: 3.540 | C: 2.467 | M: 4.876 | S: 0.149 | T: 11.031 || ETA: 11 days, 5:16:56 || timer: 1.222
[ 2] 180 || B: 3.440 | C: 2.419 | M: 4.831 | S: 0.141 | T: 10.831 || ETA: 11 days, 4:54:42 || timer: 1.176
[ 2] 190 || B: 3.342 | C: 2.364 | M: 4.716 | S: 0.135 | T: 10.558 || ETA: 11 days, 4:41:58 || timer: 1.187

Computing validation mAP (this may take a while)...

Traceback (most recent call last):
File "train.py", line 374, in
train()
File "train.py", line 303, in train
compute_validation_map(yolact_net, val_dataset)
File "train.py", line 367, in compute_validation_map
eval_script.evaluate(yolact_net, dataset, train_mode=True)
File "/home/gpuserver/models/yolact/eval.py", line 791, in evaluate
prep_metrics(ap_data, preds, img, gt, gt_masks, h, w, num_crowd, dataset.ids[image_idx], detections)
File "/home/gpuserver/models/yolact/eval.py", line 401, in prep_metrics
ap_obj = ap_data[iou_type][iouIdx][_class]
IndexError: list index out of range

What inspired the protonet?

I know RetinaNet inspired the basic backbone, SSD inspired the loss, and Mask R-CNN inspired the branch, but I wonder what inspired the protonet?

How to see graph structure?

Hi sir.
I want to see the data flow to understand this article. However, I have never used torch. Could you send me a graph logdir made with tensorboardX? Thank you in advance.

Computational time with my own code

Hi, thank you for the awesome work!
For some reasons, I have to re-write your eval.py by myself.
However, when I run the code, it takes 2 seconds just for the prediction.
Do you have any idea why?

I already checked that the GPU is enabled.


import os
from data import COCODetection, MEANS, COLORS, COCO_CLASSES
from yolact import Yolact
from utils.augmentations import BaseTransform, FastBaseTransform, Resize
from utils.functions import MovingAverage, ProgressBar, SavePath
from layers.box_utils import jaccard, center_size
from utils import timer
from layers.output_utils import postprocess, undo_image_transformation
import pycocotools

from data import cfg, set_cfg, set_dataset

import numpy as np
import torch
import torch.backends.cudnn as cudnn
from torch.autograd import Variable
import argparse
import time
import random
import cProfile
import pickle
import json
from pathlib import Path
from collections import OrderedDict
from PIL import Image

import matplotlib.pyplot as plt

MAX_MASK_SIZE = 100  # not defined in the original snippet; assumed cap on kept detections

set_cfg("yolact_resnet50_config")
with torch.no_grad():
    torch.cuda.set_device(1)
    cudnn.benchmark = True
    cudnn.fastest = True
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
    net = Yolact()
    net.load_weights('./weights/yolact_resnet50_54_800000.pth')
    net.eval()
    net = net.cuda()
print('model loaded...')

# run your code
def execute(rgb_image):
    net.detect.cross_class_nms = True
    net.detect.use_fast_nms = True
    cfg.mask_proto_debug = False
    with torch.no_grad():
        frame = torch.Tensor(rgb_image).cuda().float()
        batch = FastBaseTransform()(frame.unsqueeze(0))
        time_start = time.perf_counter()  # time.clock() is deprecated
        preds = net(batch)
        torch.cuda.synchronize()          # CUDA is async; sync before stopping the timer
        time_elapsed = time.perf_counter() - time_start
        h, w, _ = rgb_image.shape
        t = postprocess(preds, w, h, visualize_lincomb=False, crop_masks=True, score_threshold=0)

        classes, scores, boxes, masks = [x[:MAX_MASK_SIZE].cpu().numpy() for x in t]

        print(time_elapsed)

A bug when training

[ 0] 3180 || B: 3.273 | C: 6.118 | M: 5.300 | S: 1.431 | T: 16.121 || ETA: 8 days, 0:27:19 || timer: 0.833
[ 0] 3190 || B: 3.251 | C: 6.134 | M: 5.046 | S: 1.343 | T: 15.774 || ETA: 8 days, 0:20:56 || timer: 0.924
[ 0] 3200 || B: 3.220 | C: 6.074 | M: 5.023 | S: 1.346 | T: 15.663 || ETA: 8 days, 0:14:25 || timer: 0.922
[ 0] 3210 || B: 3.249 | C: 6.012 | M: 4.997 | S: 1.397 | T: 15.655 || ETA: 8 days, 0:03:42 || timer: 0.824
[ 0] 3220 || B: 3.167 | C: 5.980 | M: 4.841 | S: 1.368 | T: 15.355 || ETA: 7 days, 23:56:10 || timer: 0.831
/opt/conda/conda-bld/pytorch_1550813258230/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [33,0,0], thread: [192,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1550813258230/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [33,0,0], thread: [193,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1550813258230/work/aten/src/THC/generated/../THCReduceAll.cuh line=317 error=59 : device-side assert triggered

Can you help me solve it? Thanks!

How to use your scripts to generate my own anchor sizes and scales?

Dear Sir:
I have some problems understanding your cluster_bbox_sizes.py, optimize_bboxes.py and bbox_recall.py. I really want to use them to set the parameters scales, aspect_ratios and conv_sizes more reasonably.
Could you please explain a little of what these mean? Thanks a lot!

I use the default parameters as yolact_base.cfg does, and test the scripts on a dataset:
scales = [ [24], [48], [96], [192], [384] ], aspect_ratios = [ [[1, 1/sqrt(2), sqrt(2)]] ]*5, conv_sizes = [(69, 69), (35, 35), (18, 18), (9, 9), (5, 5)]
Here are the results.

From cluster_bbox_sizes.py:

0.062 (18) aspect ratios:
17.71 (8)
5.23 (8)
109.76 (2)

0.146 (70) aspect ratios:
4.39 (34)
2.26 (30)
0.65 (6)

0.241 (125) aspect ratios:
1.12 (103)
0.23 (21)
0.00 (1)

From optimize_bboxes.py:

(Iteration 9) Aspect Ratios: [[[19.03, 0.55, 1.13]], [[13.94, 13.64, 14.24]], [[13.94, 13.64, 14.24]], [[13.94, 13.64, 14.24]], [[13.94, 13.64, 14.24]]]

scales = [[17.53], [60.94], [108.94], [204.94], [396.94]]

aspect_ratios = [[[19.03, 0.55, 1.13]], [[13.94, 13.64, 14.24]], [[13.94, 13.64, 14.24]], [[13.94, 13.64, 14.24]], [[13.94, 13.64, 14.24]]]

From bbox_recall.py:

Total recall: 33.80

small recall: 0.00
medium recall: 0.00
large recall: 46.75

Thanks a lot! It's a bit hard for me >o<

Not able to get 30+ fps processing speed on Nvidia RTX 2080 GPU

Hello, first off, thank you for sharing this amazing work. Much appreciated.

I wanted to report that I also could not get 30+ FPS on an NVIDIA RTX 2080 GPU with 8 GB RAM. I am getting 8-10 FPS with video; with images, I get ~16 FPS (0.06 sec/image) with the ResNet-101 model, ~20 FPS (0.05 sec/image) with the ResNet-50 model, and 17-18 FPS (0.055 sec/image) with the Darknet53 model. This is quite impressive, but it's roughly half of what is reported in the paper. For images, I used the Python timeit module to wrap the evalimage function to get my numbers. Also, it is odd that the difference in speed between the different models is not significant (especially between ResNet-101 and ResNet-50), which suggests to me that something is cutting the processing speed roughly in half for all the models.

The command I am using is as below (except I change the model name as needed):

python3 eval.py --trained_model=weights/yolact_resnet50_54_800000.pth --score_threshold=0.4 --top_k=100 --images=./test_images:./test_output_images

I also tried using --benchmark but there is no change in the numbers above.

I was wondering if I could get some help to figure this out.

Support for multi-GPU?

Hi, dbolya,

I did not find DataParallel in your yolact.py, which defines the model. So does the code in your repo not support multi-GPU properly?
I tried simply using CUDA_VISIBLE_DEVICES to assign multiple GPUs, but the performance is not right according to the training log.

Thanks!

About training implementation details of YOLACT

Thanks for sharing your great work!
I compared YOLACT's training config with that of RetinaNet, since YOLACT is based on RetinaNet (I think).
I have a few questions about the training config of YOLACT:
(1) The batch size on one GPU is 8, so how many GPUs did you use when training, 4 or 8? That would mean a total batch size of 32 or 64, while RetinaNet's batch size is 16.
(2) The iteration count is 800k, which is almost 10x larger than RetinaNet's. Why?
(3) The learning rate is 1e-3, which is 10 times smaller than RetinaNet's. Why?

Thanks!

KeyError while trying to retrain on Pascal

Hello

I am facing a little issue.
I am trying to retrain the model on the Pascal VOC 2012 dataset.
I took the COCO-like annotations from this source:
https://github.com/facebookresearch/multipathnet

Then I followed the instructions concerning the modifications to make in the file config.py.

But when I call : python train.py --config=yolact_base_config

I receive the following error:

KeyError (re-raised from the DataLoader worker):

Traceback (most recent call last):
  File "/home/smile/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/smile/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/hdd1/prog/yolact/data/coco.py", line 88, in __getitem__
    im, gt, masks, h, w, num_crowds = self.pull_item(index)
  File "/hdd1/prog/yolact/data/coco.py", line 145, in pull_item
    target = self.target_transform(target, width, height)
  File "/hdd1/prog/yolact/data/coco.py", line 39, in __call__
    label_idx = self.label_map[obj['category_id']] - 1
KeyError: 12

The error is not quite clear to me.

So what I did is create a new dataset:

PASCAL_VOC_CLASSES = ("aeroplane", "bicycle", "bird", "boat", "bottle",
		      "bus", "car", "cat", "chair", "cow", "diningtable",
        	      "dog", "horse", "motorbike", "person", "pottedplant",
		      "sheep", "sofa", "train", "tvmonitor")


PASCAL_VOC_LABEL_MAP = { 1:  1,  2:  2,  3:  3,  4:  4,  5:  5,  6:  6,  7:  7,  8:  8,
                   9:  9, 10: 10, 11: 11, 13: 12, 14: 13, 15: 14, 16: 15, 17: 16,
                  18: 17, 19: 18, 20: 19, 21: 20}

pascalvoc2012_dataset = dataset_base.copy({
    'name': 'PASCAL VOC 2012',
    
    'train_images':'/media/smile/45C142AD782A7053/Datasets/PASCAL_VOC/VOC2012/VOCdevkit/VOC2012/JPEGImages/',
    'train_info':'/home/smile/multipathnet/data/annotations/pascal_train2012.json',

    'valid_images':'/media/smile/45C142AD782A7053/Datasets/PASCAL_VOC/VOC2012/VOCdevkit/VOC2012/JPEGImages/',
    'valid_info':'/home/smile/multipathnet/data/annotations/pascal_val2012.json',

    'label_map': PASCAL_VOC_LABEL_MAP
})

I created a new base config which only changes the dataset to the one I previously created, with the proper number of classes:

pascalvoc_base_config = Config({
    'dataset': pascalvoc2012_dataset,
    'num_classes': 21, # This should include the background class
...

All the other fields are left untouched.

Finally I adapted yolact_base_config:

#yolact_base_config = coco_base_config.copy({
yolact_base_config = pascalvoc_base_config.copy({
    'name': 'yolact_base',

    # Dataset stuff
#    'dataset': coco2017_dataset,
#    'num_classes': len(coco2017_dataset.class_names) + 1,

    'dataset': pascalvoc2012_dataset,
    'num_classes': len(pascalvoc2012_dataset.class_names) + 1,

Here also, all the other fields are left untouched.

EDIT

After applying the modifications discussed here, the dataset configuration for training Pascal VOC is:

MEANS_PV = (103.17, 111.70, 116.69)
STD_PV = (61.11, 59.89, 61.00)

PASCAL_VOC_CLASSES = ("aeroplane", "bicycle", "bird", "boat", "bottle",
		      "bus", "car", "cat", "chair", "cow", "diningtable",
        	      "dog", "horse", "motorbike", "person", "pottedplant",
		      "sheep", "sofa", "train", "tvmonitor")


PASCAL_VOC_LABEL_MAP = { 1:  1,  2:  2,  3:  3,  4:  4,  5:  5,  6:  6,  7:  7,  8:  8,
                   9:  9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16,
                  17: 17, 18: 18, 19: 19, 20: 20}

pascalvoc2012_dataset = dataset_base.copy({
    'name': 'PASCAL VOC 2012',
    
    'train_images':'/media/smile/45C142AD782A7053/Datasets/PASCAL_VOC/VOC2012/VOCdevkit/VOC2012/JPEGImages/',
    'train_info':'/home/smile/multipathnet/data/annotations/pascal_train2012.json',

    'valid_images':'/media/smile/45C142AD782A7053/Datasets/PASCAL_VOC/VOC2012/VOCdevkit/VOC2012/JPEGImages/',
    'valid_info':'/home/smile/multipathnet/data/annotations/pascal_val2012.json',

    'label_map': PASCAL_VOC_LABEL_MAP,
    'class_names': PASCAL_VOC_CLASSES,
})
