EML-NET-Saliency

This repo contains the code and the pre-computed saliency maps used in our paper: "EML-NET: An Expandable Multi-Layer NETwork for saliency prediction". It has been shown that visual saliency relies on objectness within an image, but this may also limit performance when there are no (known) objects. Our work attempts to broaden the horizon of a saliency model by introducing more types of prior knowledge in an efficient way; deeper model architectures, e.g., NasNet, can be applied in an "almost" end-to-end fashion. You can also try our modified combined loss function as a plug-in to see how it works in your saliency system.

[Image columns: GroundTruth | Combined | ImageNet | PLACE]

Train a model

Our training code is based on the SALICON dataset. We assume you have already downloaded and unzipped the images and annotations under your workspace.

salicon
├───Images
│     │   *.jpg
│
├───fixations
│     │   *.mat
│
└───maps
      │   *.png
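
Before training, it can help to confirm that the workspace matches the layout above. The following is a minimal sketch and not part of the repo: the helper name `check_salicon_layout` is hypothetical, and it assumes only the folder names and file extensions shown in the tree.

```python
from pathlib import Path

# Sub-folder -> expected file extension, taken from the layout above.
EXPECTED = {"Images": ".jpg", "fixations": ".mat", "maps": ".png"}


def check_salicon_layout(root):
    """Return a list of problems found under `root` (empty if none).

    Hypothetical helper: it only checks that each folder exists and
    holds at least one file with the expected extension.
    """
    root = Path(root).expanduser()
    problems = []
    for folder, ext in EXPECTED.items():
        sub = root / folder
        if not sub.is_dir():
            problems.append(f"missing folder: {sub}")
        elif not any(p.suffix.lower() == ext for p in sub.iterdir()):
            problems.append(f"no *{ext} files in: {sub}")
    return problems
```

Calling `check_salicon_layout("~/salicon")` returns an empty list when all three folders are present and populated, and one message per problem otherwise.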

Our training code "train_resnet.py" takes two compulsory arguments: 1. "data_folder" (the path of your workspace). 2. "output_folder" (the folder where you want to save the trained model). One more optional argument you might want to set is "--model_path", the path to a backbone pre-trained on ImageNet or PLACE365 for classification; the model will train from scratch if it is not specified.

python train_resnet.py ~/salicon imagenet_resnet --model_path backbone/resnet50.pth.tar

A suffix of "_eml" will be added to the output path, e.g., imagenet_resnet_eml in this case. If you specify the loss flag, --mse, the added suffix will be "_mse". You can simply compare our proposed loss function against the standard mean squared error.
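
The argument handling and suffix rule described above can be sketched as follows. This is an illustration under stated assumptions, not the actual contents of "train_resnet.py": the function names `build_parser` and `resolve_output_folder` are hypothetical, and `--mse` is assumed to be a boolean switch.

```python
import argparse


def build_parser():
    """Mirror the arguments described above (a sketch, not the repo's code)."""
    parser = argparse.ArgumentParser(description="Train a saliency model on SALICON")
    parser.add_argument("data_folder", help="path of the SALICON workspace")
    parser.add_argument("output_folder", help="folder prefix for the trained model")
    parser.add_argument("--model_path", default=None,
                        help="pre-trained backbone; train from scratch if omitted")
    # Assumption: --mse is a plain on/off flag.
    parser.add_argument("--mse", action="store_true",
                        help="use mean squared error instead of the combined loss")
    return parser


def resolve_output_folder(output_folder, use_mse):
    """Append '_mse' when --mse is set, otherwise '_eml'."""
    return output_folder + ("_mse" if use_mse else "_eml")


# Same arguments as the example command above.
args = build_parser().parse_args(
    ["~/salicon", "imagenet_resnet", "--model_path", "backbone/resnet50.pth.tar"]
)
print(resolve_output_folder(args.output_folder, args.mse))  # imagenet_resnet_eml
```

Running the same arguments once with and once without `--mse` would give two separate output folders, which keeps the two loss functions easy to compare side by side.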

The ImageNet pre-trained model can be obtained from torchvision; the PLACE pre-trained one can be downloaded from their official project here. If you want to try a deeper CNN model, e.g., the NasNet used in our paper, you can download the backbone from this project. We would like to thank the authors and coders of the PyTorch framework and the PLACE dataset, and Remi Cadene for the pre-trained models.

After finetuning a backbone (resnet50 from ImageNet) on the SALICON dataset, we can combine multiple saliency models (ImageNet and PLACE) by training a decoder. In this case, two more compulsory arguments are needed: the model paths for ImageNet and PLACE. (You can change the code slightly to combine more models for a wider horizon.)

python train_decoder.py ~/salicon imagenet_resnet pretrained_sal/imagenet_sal.pth.tar pretrained_sal/place_sal.pth.tar

TODO: the nasnet training code will be merged into this training file and the dataloader will be discarded.

Make a prediction

You can make a prediction on any image you want by running "eval.py". This file takes two compulsory arguments: 1. "model_path" (where you saved the pre-trained saliency model). 2. "img_path" (the path of the input image).

python eval.py pretrained_sal/imagenet_sal.pth.tar examples/115.jpg

Download the pre-trained model first and save it somewhere, e.g., pretrained_sal.

TODO: upload the pre-trained model.

If you think our project is helpful, please cite our work:

@article{JIA20EML,
title = "EML-NET: An Expandable Multi-Layer NETwork for saliency prediction",
journal = "Image and Vision Computing",
volume = "95",
pages = "103887",
year = "2020",
issn = "0262-8856",
doi = "https://doi.org/10.1016/j.imavis.2020.103887",
url = "http://www.sciencedirect.com/science/article/pii/S0262885620300196",
author = "Sen Jia and Neil D.B. Bruce",
keywords = "Saliency detection, Scalability, Loss function",
}
