
mcomp-at-nus's Introduction

Source code of Zhang Shuo's dissertation

For the degree of Master of Computing at NUS.

Description

Saliency prediction aims to predict the distribution of human visual attention over a given RGB image. Most recent state-of-the-art methods rely on abstract, deep image representations produced by traditional CNNs. However, because of its small kernel size, the traditional convolutional structure cannot capture the global features of an image well. In addition, high-level factors that correlate closely with human visual perception, e.g., objects, color, and light, are not considered. Motivated by these limitations, we propose a Transformer-based method with semantic segmentation as an additional learning objective. The Transformer captures more global cues from the image, while simultaneously learning object segmentation simulates human visual perception. Our ablation and visualization experiments show the effectiveness of the proposed ideas, and our model achieves competitive performance compared with other state-of-the-art methods. We rank among the top five on the SALICON benchmark (user name: shuo3) and among the top two on the MIT300 benchmark under the traditional evaluation (user name: HATES), or among the top five when the probabilistic evaluation is included.
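To make the idea of a second learning objective concrete, here is a minimal sketch of a joint loss in plain Python. It is not the loss used in the dissertation: the choice of KL divergence for the saliency term, cross-entropy for the segmentation term, the `seg_weight` value, and all function names are assumptions made for illustration only.

```python
import math


def kl_divergence(pred, target, eps=1e-8):
    """KL(target || pred) between two saliency maps given as flat lists.

    Both maps are normalised to sum to 1 before comparison.
    """
    p_sum, t_sum = sum(pred), sum(target)
    total = 0.0
    for p, t in zip(pred, target):
        p, t = p / p_sum, t / t_sum
        if t > 0:
            total += t * math.log(t / (p + eps))
    return total


def cross_entropy(class_probs, labels, eps=1e-8):
    """Mean negative log-probability of the true class over all pixels."""
    losses = [-math.log(probs[y] + eps) for probs, y in zip(class_probs, labels)]
    return sum(losses) / len(losses)


def joint_loss(sal_pred, sal_target, seg_probs, seg_labels, seg_weight=0.5):
    """Weighted sum of the two objectives (seg_weight is a hypothetical value)."""
    saliency_term = kl_divergence(sal_pred, sal_target)
    segmentation_term = cross_entropy(seg_probs, seg_labels)
    return saliency_term + seg_weight * segmentation_term
```

In a sketch like this, both terms are computed from the same forward pass, so gradients from the segmentation term also shape the shared representation used for saliency.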

A link to the dissertation will be added here once it is available. The joint learning of saliency prediction and object detection is available at SalDetect.

Installation

  1. We use Anaconda as the base environment. Please create a new virtual environment with conda create -n pytorch python=3.6 (or 3.7).
  2. Install the dependencies with pip install -r requirements.txt (if necessary). The requirements.txt file is provided in this package.

Architecture

The first figure shows the architecture overview, while the second shows the MAM module.

Preparing datasets

Please download the SALICON and MIT1003 datasets from http://salicon.net/challenge-2017/ and https://saliency.tuebingen.ai/, respectively. SALICON is the largest saliency dataset collected by mouse clicking and was featured in the LSUN workshop at CVPR'17. MIT1003 is the best-known benchmark collected with an eye-tracking device. Please place the datasets in the same root directory as the code.

SALICON:

Please organize the data as follows after downloading:

salicon
├───images
│     │   *.jpg
│
└───maps
      │   *.png
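Before training, it can help to confirm that the folder really matches this layout. The following is a minimal sketch using only the standard library. The function name is hypothetical, and it assumes that each saliency map shares its file stem with the corresponding image, which the layout above does not state explicitly.

```python
from pathlib import Path


def pair_images_with_maps(root):
    """Match images/*.jpg with maps/*.png by file stem.

    Returns (pairs, missing): a list of (image_path, map_path) tuples and
    a list of image stems that have no map. Matching by stem is an assumption.
    """
    root = Path(root)
    images = {p.stem: p for p in sorted((root / "images").glob("*.jpg"))}
    maps = {p.stem: p for p in sorted((root / "maps").glob("*.png"))}
    pairs = [(images[stem], maps[stem]) for stem in images if stem in maps]
    missing = [stem for stem in images if stem not in maps]
    return pairs, missing


if __name__ == "__main__":
    pairs, missing = pair_images_with_maps("salicon")
    print(f"{len(pairs)} image/map pairs, {len(missing)} images without a map")
```

Images with no map are expected for a test split that ships without ground truth, so a non-empty `missing` list is not necessarily an error.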

MIT1003:

The data organization is the same as SALICON. Then please preprocess the data to make the image sizes consistent:

cd root_of_mit1003_dir
mkdir ../mitdata
mkdir ../mitdata/images
mkdir ../mitdata/maps
python preprocess_data_mit1003.py

The processed data is in the mitdata folder.
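The contents of preprocess_data_mit1003.py are not reproduced here. As an illustration of what "making the image sizes consistent" can involve, the sketch below computes the geometry for one common approach: scaling an image to fit a fixed canvas while keeping its aspect ratio, then centring it with padding. The function name and the target size in the usage line are hypothetical, and the actual script may resize differently.

```python
def fit_within(width, height, target_w, target_h):
    """Scale (width, height) to fit inside the target, keeping aspect ratio.

    Returns (new_w, new_h, pad_left, pad_top), where the padding values
    centre the resized image on a target_w x target_h canvas.
    """
    scale = min(target_w / width, target_h / height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_left = (target_w - new_w) // 2
    pad_top = (target_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top


if __name__ == "__main__":
    # Hypothetical target canvas; substitute the size your model expects.
    print(fit_within(768, 1024, 640, 480))
```

Whatever resizing is used, the image and its saliency map need the same transform so that they stay pixel-aligned.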

To prepare the object segmentation dataset, please refer to pytorch-fcn. Please place the dataset in the same root directory as the code.

Training

Please first download the ImageNet-pretrained Transformer model from TransUnet and place it under the backbone directory. We use a pretrained model for better results. For training on MIT1003, please start from the model pretrained on SALICON.

# train on salicon
python train_salicon.py --data_dir your_data_path --output_folder path_of_saved_models

# train on mit1003, please start the training from the pretrained model on salicon.
python train_mit1003.py --data_dir your_data_path --output_folder path_of_saved_models --model_path your_pretrained_model_on_salicon
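The two stages can also be chained from a small driver script. The sketch below only builds and runs the two commands shown above, using the flags as given. The helper names are hypothetical, and because the checkpoint filename written by the first stage is not specified here, it has to be passed in by the caller.

```python
import subprocess
import sys


def build_training_commands(salicon_dir, mit_dir, output_folder, salicon_checkpoint):
    """Return the SALICON command followed by the MIT1003 fine-tuning command."""
    salicon_cmd = [
        sys.executable, "train_salicon.py",
        "--data_dir", salicon_dir,
        "--output_folder", output_folder,
    ]
    mit_cmd = [
        sys.executable, "train_mit1003.py",
        "--data_dir", mit_dir,
        "--output_folder", output_folder,
        "--model_path", salicon_checkpoint,  # produced by the first stage
    ]
    return [salicon_cmd, mit_cmd]


def run_in_order(commands):
    """Run each command in turn; stop at the first one that fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_in_order(build_training_commands(...))` from the code root runs SALICON training first and starts the MIT1003 stage only if the first stage exits successfully.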

Test

# salicon, set save_segmentation as True if you want to output the segmentation.
python eval_salicon.py --image_model_path path_of_your_model --save_segmentation True

# mit1003
python eval_mit.py --image_model_path path_of_your_model

Some saliency prediction results

Acknowledgements

The code is heavily based on TransUnet, EML-Net and pytorch-fcn.
