
loupe's Introduction

Learnable mOdUle for Pooling fEatures (LOUPE) TensorFlow Toolbox

LOUPE is a TensorFlow toolbox that efficiently implements several learnable pooling methods, such as NetVLAD[1], NetRVLAD[2], NetFV[2] and Soft-DBoW[2], as well as the Context Gating activation from:

Antoine Miech and Ivan Laptev and Josef Sivic, Learnable pooling with Context Gating for video classification, arXiv:1706.06905, 2017.

It was initially used by the winning approach of the YouTube-8M Kaggle Large-Scale Video Understanding challenge: https://www.kaggle.com/c/youtube8m. However, we think these are general pooling approaches that can be used in applications other than video representation, which is why we decided to create this small TensorFlow toolbox.

Usage example

Creating a NetVLAD block:

import loupe as lp

'''
Creating a NetVLAD layer with the following inputs:

feature_size: the dimensionality of the input features
max_samples: the maximum number of features per list
cluster_size: the number of clusters
output_dim: the dimensionality of the pooled features after 
dimension reduction
gating: If True, adds a Context Gating layer on top of the 
pooled representation
add_batch_norm: If True, adds batch normalization during training
is_training: If True, the graph is in training mode
'''
NetVLAD = lp.NetVLAD(feature_size=1024, max_samples=100, cluster_size=64, 
                     output_dim=1024, gating=True, add_batch_norm=True,
                     is_training=True)



'''
Forward pass of the pooling architecture with
tensor_input: A tensor of shape:
'batch_size'x'max_samples'x'feature_size'
tensor_output: The pooled representation of shape:
'batch_size'x'output_dim'
'''
tensor_output = NetVLAD.forward(tensor_input)

NetRVLAD, NetFV and Soft-DBoW are used in the same way.
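To make the shapes above concrete, here is a minimal NumPy sketch of VLAD-style pooling with soft assignment, following the formulation in [1]. It is not the toolbox's implementation: the names `vlad_pool`, `assign_w`, `assign_b` and `centers` are hypothetical, random values stand in for learned parameters, and the final projection to 'output_dim' and the Context Gating layer are left out.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def vlad_pool(x, assign_w, assign_b, centers, eps=1e-12):
    """Pool x of shape (batch, max_samples, feature_size).

    assign_w: (feature_size, cluster_size), assign_b: (cluster_size,)
    centers:  (feature_size, cluster_size)
    Returns an array of shape (batch, feature_size * cluster_size).
    """
    # Soft assignment of every feature to every cluster: (B, N, K).
    a = softmax(x @ assign_w + assign_b, axis=-1)
    # Sum over samples of a_nk * x_n: (B, D, K).
    weighted = np.einsum('bnk,bnd->bdk', a, x)
    # Sum over samples of a_nk * c_k: (B, 1, K) * (D, K) -> (B, D, K).
    a_sum = a.sum(axis=1, keepdims=True)
    vlad = weighted - a_sum * centers
    # Intra-normalization: L2-normalize each cluster's residual vector.
    vlad = vlad / np.maximum(np.linalg.norm(vlad, axis=1, keepdims=True), eps)
    # Flatten, then L2-normalize the whole descriptor.
    vlad = vlad.reshape(x.shape[0], -1)
    return vlad / np.maximum(np.linalg.norm(vlad, axis=1, keepdims=True), eps)

rng = np.random.default_rng(0)
batch_size, max_samples, feature_size, cluster_size = 2, 100, 16, 4
x = rng.normal(size=(batch_size, max_samples, feature_size))
pooled = vlad_pool(
    x,
    assign_w=rng.normal(size=(feature_size, cluster_size)),
    assign_b=np.zeros(cluster_size),
    centers=rng.normal(size=(feature_size, cluster_size)),
)
print(pooled.shape)
```

Before any dimension reduction, the pooled descriptor has 'feature_size' * 'cluster_size' entries per batch element.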

NOTE: The toolbox can only pool lists of features of the same length; it was specifically optimized to do so efficiently. One way to handle lists of features of variable length is to create, via a data augmentation technique, a tensor of shape 'batch_size'x'max_samples'x'feature_size', where 'max_samples' is the maximum number of features per list. Each list shorter than 'max_samples' is then padded with 0 values.
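The zero-padding idea can be sketched in NumPy as follows. The helper name `pad_feature_lists` is hypothetical and not part of the toolbox, and truncating lists longer than 'max_samples' is an assumption added here for safety.

```python
import numpy as np

def pad_feature_lists(feature_lists, max_samples, feature_size):
    """Stack variable-length feature lists into one zero-padded array.

    feature_lists: sequence of arrays, each of shape (n_i, feature_size).
    Returns an array of shape (len(feature_lists), max_samples, feature_size).
    """
    batch = np.zeros((len(feature_lists), max_samples, feature_size),
                     dtype=np.float32)
    for i, feats in enumerate(feature_lists):
        # Assumption: lists longer than max_samples are truncated.
        feats = np.asarray(feats, dtype=np.float32)[:max_samples]
        batch[i, :len(feats)] = feats
    return batch

# Two lists with 3 and 5 features of dimensionality 4.
lists = [np.ones((3, 4)), np.ones((5, 4))]
padded = pad_feature_lists(lists, max_samples=5, feature_size=4)
print(padded.shape)
```

The resulting array has the 'batch_size'x'max_samples'x'feature_size' layout described in the note, with the unused rows of the shorter list left at 0.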

References

[1] Arandjelovic, Relja and Gronat, Petre and Torii, Akihiko and Pajdla, Tomas and Sivic, Josef, NetVLAD: CNN architecture for weakly supervised place recognition, CVPR 2016

If you use this toolbox, please cite the following paper:

[2] Antoine Miech and Ivan Laptev and Josef Sivic, Learnable pooling with Context Gating for video classification, arXiv:1706.06905:

@article{miech17loupe,
  title={Learnable pooling with Context Gating for video classification},
  author={Miech, Antoine and Laptev, Ivan and Sivic, Josef},
  journal={arXiv:1706.06905},
  year={2017},
}

Antoine Miech

loupe's People

Contributors

antoine77340, mlopezantequera


loupe's Issues

ValueError: Shape must be rank 2 but is rank 3 for 'MatMul' (op: 'MatMul') with input shapes: [?,1000,2048], [2048,64].

This error is raised when I run the following code:
'''
import loupe as lp
import tensorflow as tf
x = tf.placeholder("float", [None,1000,2048])
NetVLAD = lp.NetVLAD(feature_size=2048, max_samples=1000, cluster_size=64,
                     output_dim=2048, gating=True, add_batch_norm=True,
                     is_training=True)
NetVLAD.forward(x)
'''
I believe x has shape batch_size x max_samples x feature_size. Should I change line 126 in loupe.py to
'''
cluster_weights = tf.get_variable("cluster_weights",
    [1, self.feature_size, self.cluster_size],
    initializer=tf.random_normal_initializer(
        stddev=1 / math.sqrt(self.feature_size)))
''' ?
But that also leads to other errors. Can you help me?
Thank you very much!
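The message reports a rank-3 tensor of shape [?,1000,2048] being multiplied by a rank-2 matrix of shape [2048,64], which the MatMul op rejects. One possible workaround, assuming forward accepts the features flattened to [batch_size*max_samples, feature_size] (an assumption, not checked against loupe.py here), is to reshape the input first, for example with tf.reshape(x, [-1, 2048]). The NumPy sketch below only illustrates the shape arithmetic, using small sizes and random values.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, max_samples, feature_size, cluster_size = 2, 10, 8, 4

x = rng.normal(size=(batch_size, max_samples, feature_size))
cluster_weights = rng.normal(size=(feature_size, cluster_size))

# Flatten the batch and sample axes so both operands are rank 2.
reshaped = x.reshape(-1, feature_size)           # (B*N, D)
activation = reshaped @ cluster_weights          # (B*N, K)

# Restore the per-sample layout once the product is computed.
activation = activation.reshape(-1, max_samples, cluster_size)  # (B, N, K)
print(activation.shape)
```

Flattening keeps the weight matrix at [feature_size, cluster_size], so no change to the variable's shape is needed under this assumption.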

Parameters Clarification

Hi, first of all, thank you for making this code public! I am trying to reproduce some of the experiments from the NetVLAD paper. To do this, I'd like to append the VLAD layer defined here to the end of a VGG16 network, as they do in the paper. The output of VGG16 without the final classification layer has shape (7, 7, 512). However, I can't figure out how to pass this to the VLAD layer as defined here. There seem to be four main input parameters: feature_size, max_samples, cluster_size and output_dim.

The paper describes an overview of the system: "Formally, given N D-dimensional local image descriptors as input, and K cluster centres (“visual words”) as VLAD parameters, the output VLAD image representation V is K×D-dimensional. For convenience we will write V as a K ×D matrix, but this matrix is converted into a vector and, after normalization, used as the image representation"

feature_size: should this be the flattened feature dimensions, i.e. 7x7x512, or just 512? Is this the D from the paper?
max_samples: I can't find an explicit mention of this parameter anywhere apart from this code.
cluster_size: the number of clusters (K in the original paper)?
output_dim: presumably this is KxD?

Some clarification of this would really be appreciated
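One plausible reading, based only on the parameter descriptions in the usage example and the paper quote above, is that each of the 7x7 spatial positions is one D-dimensional local descriptor, so feature_size = D = 512, max_samples = N = 49 and cluster_size = K. Since the usage example describes output_dim as the dimensionality after dimension reduction, it would be a free choice rather than KxD. The sketch below is an interpretation, not a confirmed answer, and cluster_size = 64 is an arbitrary value.

```python
import numpy as np

batch_size, height, width, channels = 2, 7, 7, 512

# Stand-in for the VGG16 feature map without the classification layer.
feature_map = np.zeros((batch_size, height, width, channels), dtype=np.float32)

feature_size = channels          # D: dimensionality of each local descriptor
max_samples = height * width     # N: number of local descriptors per image
cluster_size = 64                # K: number of clusters (arbitrary here)

# Treat every spatial position as one local descriptor.
tensor_input = feature_map.reshape(batch_size, max_samples, feature_size)

# Size of the VLAD representation before any dimension reduction.
vlad_dim = cluster_size * feature_size
print(tensor_input.shape, vlad_dim)
```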

l2 normalization bug in NetFV

The intra-normalization step in NetFV is:

    fv2 = tf.reshape(fv2,[-1,self.cluster_size*self.feature_size])
    fv2 = tf.nn.l2_normalize(fv2,1)
    fv2 = tf.reshape(fv2,[-1,self.cluster_size*self.feature_size])
    fv2 = tf.nn.l2_normalize(fv2,1)

Should the first reshape be removed?
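The effect of that first reshape can be checked with a small NumPy sketch. It assumes that fv2 has shape [batch, feature_size, cluster_size] before the quoted lines (an assumption, not verified against loupe.py), and the helper `l2_normalize` is a local stand-in for tf.nn.l2_normalize.

```python
import numpy as np

def l2_normalize(t, axis, eps=1e-12):
    # Divide by the L2 norm along the given axis.
    norm = np.sqrt(np.maximum(np.sum(t * t, axis=axis, keepdims=True), eps))
    return t / norm

rng = np.random.default_rng(0)
feature_size, cluster_size = 8, 4
fv2 = rng.normal(size=(2, feature_size, cluster_size))
flat_dim = cluster_size * feature_size

# Order quoted in the issue: flatten first, then normalize twice.
quoted = l2_normalize(fv2.reshape(-1, flat_dim), 1)
quoted = l2_normalize(quoted.reshape(-1, flat_dim), 1)

# Without the first reshape: per-cluster normalization, then global.
intra = l2_normalize(fv2, 1)
without_first_reshape = l2_normalize(intra.reshape(-1, flat_dim), 1)

# A single global normalization, for comparison.
global_only = l2_normalize(fv2.reshape(-1, flat_dim), 1)

print(np.allclose(quoted, global_only))
print(np.allclose(quoted, without_first_reshape))
```

Under this assumption, the quoted order is equivalent to a single global L2 normalization, because normalizing an already unit-norm vector changes nothing, whereas dropping the first reshape normalizes each cluster separately before the global step.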

Hello. About keras version

Hello. This is a very useful project. Do you mind if I rewrite the code to port LOUPE to Keras?
