
Code for Detecting Visual Relationships with Deep Relational Networks

The code is written in Python, and all networks are implemented using Caffe.

Datasets

  • VRD
  • sVG: a subset of Visual Genome
    • Link
    • Images can be downloaded from the website of Visual Genome
    • Remarks: I eventually did not find time to clean it further. This subset has a manually cleaned list of relationship predicates. The object list may need further cleaning, although Faster R-CNN reaches a recall@20 of around 50% on it.
    • Using our method, you can obtain the corresponding results reported in the paper on this dataset.

Networks

This repo contains three kinds of networks. All of them compute the raw response for the predicate from both appearance cues and spatial cues, followed by a refinement based on the responses of the subject, the object, and the predicate. The networks are designed for the task of predicate recognition, where the ground-truth labels of the subject and the object are provided as inputs. Therefore, in these networks, the responses of the subject and the object are replaced with indicator vectors, and only the response of the predicate is refined.

In these networks, the subnet for appearance cues is VGG16, and the subnet for spatial cues consists of three conv layers. The outputs of both subnets are combined via a customized concatenate layer, followed by two fc layers that generate the raw response for the predicate.

The customized concatenate layer combines the output of an fc layer with the channels of the output of a conv layer. It can be replaced with Caffe's Concat layer if the last conv layer in the spatial subnet (conv3_p) is replaced with an equivalent fc layer.
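Why the swap is possible can be checked numerically: a conv layer whose kernel covers its entire input produces one value per output channel, which is exactly what an fc layer with the flattened kernels as weight rows produces. The pure-Python sketch below is not the repo's Caffe layer; it assumes conv3_p's kernel spans its whole input (so its output is 1x1 spatially), and the helper names and toy numbers are hypothetical.

```python
# Minimal pure-Python sketch (not the repo's actual Caffe layer) of why the
# customized concatenate layer can be swapped for Caffe's Concat layer.
# Assumption: conv3_p's kernel covers its whole input, so each of its output
# channels is a single value, exactly like a unit of an fc layer.


def conv_full_kernel(x, kernels):
    """Convolve x (C_in x H x W) with kernels (C_out x C_in x H x W), no padding.

    Each kernel is as large as the input, so every output channel reduces to
    one number: the sum of kernel * input over all channels and positions.
    """
    return [
        sum(
            k[c][i][j] * x[c][i][j]
            for c in range(len(x))
            for i in range(len(x[0]))
            for j in range(len(x[0][0]))
        )
        for k in kernels
    ]


def flatten(t):
    """Flatten a C x H x W nested list in channel-major (row-major) order."""
    return [v for channel in t for row in channel for v in row]


def fc(x_flat, weight_rows):
    """Fully connected layer without bias: one dot product per output unit."""
    return [sum(w * v for w, v in zip(row, x_flat)) for row in weight_rows]


def concat(appearance_out, spatial_out):
    """Channel-wise concatenation of two per-sample feature vectors."""
    return list(appearance_out) + list(spatial_out)


# Toy example: C_in = 2, H = W = 2, C_out = 2.
x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
kernels = [
    [[[1, 0], [0, 1]], [[0, 1], [1, 0]]],
    [[[1, 1], [1, 1]], [[0, 0], [0, 0]]],
]
conv_out = conv_full_kernel(x, kernels)
fc_out = fc(flatten(x), [flatten(k) for k in kernels])
assert conv_out == fc_out  # identical values, so a plain Concat suffices
# [0.5, -1.0] is a hypothetical stand-in for the appearance fc output.
combined = concat([0.5, -1.0], fc_out)
```

Once the spatial branch emits a flat vector per sample, both inputs to the merge are plain fc-style outputs, which is the situation Caffe's Concat layer handles directly.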

The details of these networks are as follows:

  • drnet_8units_softmax: it has 8 inference units, with softmax as the activation function.

  • drnet_8units_linear_shareweight: it has 8 inference units with no activation function, and the weights are shared across units.

  • drnet_8units_relu_shareweight: it has 8 inference units with ReLU as the activation function, and the weights are shared across units.
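The three variants differ only in the activation function and in whether the inference units share weights. The sketch below assumes that each unit updates the predicate response as q_r := f(a + W_s q_s + W_o q_o + W_r q_r), where a is the raw predicate response, q_s and q_o are the subject and object indicator vectors, and W_s, W_o, W_r are learned matrices. This update rule, the initialisation q_r = f(a), and all function names are assumptions for illustration, not a transcription of the repo's prototxts.

```python
import math


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def one_hot(index, size):
    """Indicator vector that replaces a subject/object response."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


ACTIVATIONS = {
    "softmax": softmax,
    "linear": lambda v: list(v),  # no activation function
    "relu": lambda v: [max(0.0, x) for x in v],
}


def refine_predicate(raw, subj, obj, weights, activation, num_units=8):
    """Refine a raw predicate response with `num_units` inference units.

    raw: raw predicate response of length P (from the A + S subnets).
    subj, obj: indicator vectors for the ground-truth subject/object classes.
    weights: a single (W_s, W_o, W_r) tuple shared by all units, or a list of
        `num_units` such tuples, one per unit (the non-shared variant).
    """
    f = ACTIVATIONS[activation]
    shared = isinstance(weights, tuple)
    q_r = f(raw)  # assumed initial predicate response
    for t in range(num_units):
        w_s, w_o, w_r = weights if shared else weights[t]
        # With one-hot subj/obj, W_s q_s and W_o q_o each select one column.
        pre = [
            a + s + o + r
            for a, s, o, r in zip(
                raw, matvec(w_s, subj), matvec(w_o, obj), matvec(w_r, q_r)
            )
        ]
        q_r = f(pre)
    return q_r
```

Passing one tuple mimics the *_shareweight variants; passing a list of eight tuples gives untied units, which is presumably the setting of drnet_8units_softmax since its name omits "shareweight".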

Training

The training procedure is component-by-component. Specifically, a network usually contains three components, namely the subnet for appearance cues (A), the subnet for spatial cues (S), and the DR-Net for statistical dependencies (D). We train the network as follows:

  • train A in isolation
  • train S in isolation
  • train A + S together (without D), with weights initialized from the previous steps
  • train A + S + D jointly, with weights initialized from the previous steps

In each step we use the same loss, and we apply dropout to avoid overfitting.
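The weight hand-off between these steps can be made explicit with a small bookkeeping sketch. The stage names and the `init_sources` helper are hypothetical; each step would presumably correspond to a prototxt plus the snapshot from the previous step, and the sketch says nothing about the loss or solver settings.

```python
# Hypothetical stage plan mirroring the four training steps above.
# A = appearance subnet, S = spatial subnet, D = DR-Net.
STAGES = [
    ("A", {"A"}),
    ("S", {"S"}),
    ("A+S", {"A", "S"}),
    ("A+S+D", {"A", "S", "D"}),
]


def init_sources(stages):
    """Return, per stage, where each trained component's weights come from.

    Each component is initialised from the most recent earlier stage that
    trained it; None means no earlier step trained it (for example D in the
    final step, or A and S in their first steps).
    """
    last_trained = {}
    plan = []
    for name, components in stages:
        plan.append((name, {c: last_trained.get(c) for c in sorted(components)}))
        for c in components:
            last_trained[c] = name
    return plan


for stage, sources in init_sources(STAGES):
    print(stage, sources)
```

The final step thus starts A and S from the A + S step and D from scratch, which is what "initialized from previous steps" amounts to under this reading.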

Recalls on Predicate Recognition

Network                            Recall@50 (%)   Recall@100 (%)
drnet_8units_softmax               75.22           77.55
drnet_8units_linear_shareweight    78.57           79.94
drnet_8units_relu_shareweight      80.86           81.83
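Recall@K in predicate recognition is commonly computed per image: keep the K highest-scoring (pair, predicate) predictions and count how many ground-truth relationships they recover. The sketch below implements that common definition under assumed data layouts; `recall_at_k` and its inputs are hypothetical, and the evaluation code in lib/ and tools/ may differ in details such as how many predicates are kept per pair.

```python
def recall_at_k(predictions, ground_truth, k):
    """Recall@K for predicate recognition over a set of images.

    predictions: {image_id: [(score, pair_id, predicate), ...]}
    ground_truth: {image_id: {(pair_id, predicate), ...}}
    Keeps the top-k scored predictions per image and returns the fraction of
    ground-truth (pair, predicate) relationships found among them.
    """
    hits = 0
    total = 0
    for image_id, gt in ground_truth.items():
        ranked = sorted(predictions.get(image_id, []), key=lambda p: p[0], reverse=True)
        top = {(pair, pred) for _, pair, pred in ranked[:k]}
        hits += len(gt & top)
        total += len(gt)
    return hits / total if total else 0.0


preds = {
    "img1": [(0.9, 0, "on"), (0.8, 0, "near"), (0.1, 1, "has")],
    "img2": [(0.7, 0, "wear")],
}
gt = {"img1": {(0, "on"), (1, "has")}, "img2": {(0, "wear")}}
print(recall_at_k(preds, gt, k=1), recall_at_k(preds, gt, k=3))
```

Because ground-truth boxes and labels are given in this task, matching reduces to comparing (pair, predicate) identities, with no IoU test.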

Code

  • lib/: Python layers, as well as auxiliary files for evaluation
  • prototxts/: training and testing prototxts
  • tools/: Python code for preparing data and evaluation
  • snapshots/: pretrained models

Finetune or Evaluate

  1. Download the VRD dataset
  2. Preprocess the dataset using tools/preprare_data.py
  3. Download one of the pretrained models in snapshots/
  4. Finetune or evaluate using the corresponding prototxts in prototxts/

Pair Filter

Structure

Training

To train this network, we randomly sample pairs of bounding boxes (with labels) from each training image, treating those with an IoU of 0.5 or above with any ground-truth pair (with the same labels) as positive samples and the rest as negative samples.
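A minimal sketch of this sampling and labelling step follows. It assumes boxes are (x1, y1, x2, y2) in continuous coordinates and that a sampled pair matches a ground-truth pair when both labels agree and both the subject box and the object box reach IoU >= 0.5 with their counterparts. This interpretation, the data layout, and all helper names are assumptions, not taken from the repo.

```python
import random


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_positive(pair, gt_pairs, thresh=0.5):
    """pair and each gt pair: ((subj_label, subj_box), (obj_label, obj_box))."""
    (s_lab, s_box), (o_lab, o_box) = pair
    return any(
        s_lab == gs_lab
        and o_lab == go_lab
        and iou(s_box, gs_box) >= thresh
        and iou(o_box, go_box) >= thresh
        for (gs_lab, gs_box), (go_lab, go_box) in gt_pairs
    )


def sample_pairs(objects, gt_pairs, num_samples, rng):
    """Randomly sample ordered (subject, object) pairs of labelled boxes.

    objects: list of (label, box) for one training image.
    Returns a list of (pair, is_positive) training samples.
    """
    candidates = [
        (s, o) for i, s in enumerate(objects) for j, o in enumerate(objects) if i != j
    ]
    chosen = rng.sample(candidates, min(num_samples, len(candidates)))
    return [(pair, is_positive(pair, gt_pairs)) for pair in chosen]


objects = [("person", (0, 0, 10, 20)), ("horse", (5, 10, 30, 30)), ("hat", (2, 0, 6, 4))]
gt_pairs = [(("person", (0, 0, 10, 20)), ("horse", (5, 10, 30, 30)))]
samples = sample_pairs(objects, gt_pairs, num_samples=4, rng=random.Random(0))
```

Pairs are ordered, so (horse, person) is a separate candidate; it is negative here because its labels do not match the ground-truth (person, horse) pair.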

Citation

If you use this code, please cite the following paper(s):

@inproceedings{dai2017detecting,
	title={Detecting Visual Relationships with Deep Relational Networks},
	author={Dai, Bo and Zhang, Yuqi and Lin, Dahua},
	booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
	year={2017}
}

License

This code is for research use only. See LICENSE for details.
