
AST Model - Unsupervised Multi-object Segmentation Using Attention and Soft-argmax

Official implementation of the AST model described in the paper "Unsupervised Multi-object Segmentation Using Attention and Soft-argmax" and in the PhD thesis "Self-supervised learning of object-centric representations with multi-object detection and segmentation" (https://pastel.archives-ouvertes.fr/tel-04106903).

Requirements

The model requires PyTorch (>= 1.7.1) and Torchvision with CUDA support (see https://pytorch.org/ )

The model also requires OpenCV (>= 4.1) (see https://opencv.org/ )

To install other requirements:

pip install -r requirements.txt

The model has been tested on an Nvidia RTX 3090 GPU.
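To verify the environment quickly, the following minimal sanity check prints the installed versions and confirms that CUDA is visible:

```python
import torch
import torchvision
import cv2

# Print installed versions; they should meet the requirements above.
print('PyTorch:', torch.__version__)           # expected >= 1.7.1
print('Torchvision:', torchvision.__version__)
print('OpenCV:', cv2.__version__)              # expected >= 4.1
print('CUDA available:', torch.cuda.is_available())
```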

How to use the model

Four steps are required to obtain foreground segmentation mask samples:

  • train the background model
  • generate the background dataset using the trained background model
  • train the foreground model
  • apply the trained foreground model to get segmentation masks

Step 1: train the background model

  • go to the background directory and create a new entry in the background configuration file config.py with the path to the image dataset, the size of the images, the required number of iterations and a path to the background model output directory (a hypothetical example entry is sketched after this list). The number of iterations should be

    • 2,500 if the sequence shows a fixed background without any illumination changes
    • 10,000 for a video sequence taken from a fixed camera with some illumination changes
    • 24,000 iterations and at least 20 epochs for a video taken from a moving camera
    • 500,000 for non-video synthetic datasets with dynamic backgrounds such as ObjectsRoom, CLEVRTEX or Shapestacks
  • start training with the command python train.py. During training, some image samples are generated and saved using the path training_images_output_directory indicated in the configuration file
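As an illustration, a new configuration entry might look like the sketch below; the key names here are hypothetical, so check config.py for the exact fields the code expects:

```python
# Hypothetical sketch of a new entry for background/config.py; the key
# names below are illustrative and must be adapted to the actual
# structure used in config.py.
my_dataset_config = {
    'dataset_path': '/data/my_dataset/images',   # path to the image dataset
    'image_width': 128,                          # size of the images
    'image_height': 128,
    'n_iterations': 10000,   # e.g. fixed camera with illumination changes
    'model_output_directory': './output/background/my_dataset',
    'training_images_output_directory': './output/background/my_dataset/samples',
}
```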

Specific instructions for handling the ObjectsRoom, CLEVR, CLEVRTEX and Shapestacks datasets are provided in the dataset.py, create_clevr_data.py and create_objects_rooom.py files

Step 2: generate the background dataset

  • update the background configuration file with the path to the final background checkpoint, which can be found in the background model output directory
  • generate the background dataset with the command python generate_background_dataset.py. This command creates the following datasets in the background model output directory:
    • background image dataset (RGB)
    • background image with error prediction dataset (RGBA; see the reading sketch after this list)
    • copy of the input image dataset as a frame sequence (with the same ordering as the background images)
    • copy of the ground-truth segmentation masks, if available in the input dataset
    • background/foreground segmentation masks predicted by the background model
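For downstream inspection, the RGBA images can be read back with OpenCV; this is a minimal sketch assuming the fourth channel stores the error prediction, and the file path is illustrative:

```python
import cv2

# Minimal sketch: read back one generated RGBA image (OpenCV loads it
# in BGRA channel order) and split it into the background estimate and
# the error-prediction channel. The file path is hypothetical; use the
# actual files written to the background model output directory.
rgba = cv2.imread('output/background/my_dataset/bg_error/000000.png',
                  cv2.IMREAD_UNCHANGED)   # shape (H, W, 4)
background_bgr = rgba[:, :, :3]           # background estimate
error_map = rgba[:, :, 3]                 # high values suggest foreground
```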

Step 3: train the foreground model

  • create a new entry in the foreground configuration file MF_config.py with the image sizes, the value of K, and the paths to the required datasets in the background model output directory (a hypothetical example is sketched after this list). Make sure that the dataset name in the foreground configuration file is the same as in the background configuration file.

  • start training the foreground model with python MF_train.py. During training, image samples are generated and saved using the path training_images_output_directory set in the configuration file MF_config.py, which has to be updated accordingly.
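A hypothetical foreground entry, mirroring the background one (again, the key names are illustrative; see MF_config.py for the real fields):

```python
# Hypothetical sketch of a new entry for MF_config.py; key names are
# illustrative. The entry name must match the one used in the background
# configuration file.
my_dataset_config = {
    'image_width': 128,
    'image_height': 128,
    'K': 4,   # number of object slots, chosen per dataset
    'background_dataset_directory': './output/background/my_dataset',
    'training_images_output_directory': './output/foreground/my_dataset/samples',
}
```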

Step 4: apply the trained foreground model to get segmentation masks

  • update the foreground configuration file (object_model_checkpoint_path) with the path to the trained model
  • generate samples using the command python MF_generate_image_samples.py (a post-processing sketch follows below)
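If per-pixel hard labels are needed from the generated soft masks, a standard post-processing step is an argmax over the mask channels; this is a minimal sketch, assuming the model outputs masks of shape (B, K+1, H, W):

```python
import torch

# Minimal sketch, assuming the foreground model outputs soft masks of
# shape (B, K+1, H, W): one background channel plus K object slots.
# The random tensor below is only a placeholder for real model output.
soft_masks = torch.softmax(torch.rand(1, 5, 128, 128), dim=1)
hard_labels = soft_masks.argmax(dim=1)   # (B, H, W), 0 = background slot
```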

New in version 2

  • implementation of curriculum training and baseline training scenarios. Building the background dataset remains necessary, since it is used during phase 2 of curriculum training.
  • The object entropy loss is no longer needed and has been removed.
  • The following hyperparameters are no longer required and have been removed: threshold, object entropy loss weight, epsilon_M.


ast's Issues

Problem with reproducing results on CLEVR / CLEVRTEX

Thanks for sharing the code of the article.
We tried re-running the experiments, starting with CLEVR.
We ran it 3 times using the provided configuration.

Each time, after background training, the mIoU during segmentation training increases to at most 15% and stays at this value (compared to the 90.2% reported in the article).

Any idea of what we might need to change to reproduce the results?
