
Comments (4)

meyerjo commented on August 24, 2024

@HobbitLong Since I don't know if there is any timeline on this issue, and just to be sure I understand everything correctly: to replicate the NYU RGB-D experiments, the following steps would be required, wouldn't they?

  1. Initialize the different models per modality:

     feat_l, feat_ab, feat_depth = model(inputs)

  2. Create different contrast objects for the different pairs (e.g., in the core-view scheme, contrast_l_ab and contrast_l_depth), which would enable something like:

     out_l, out_ab = contrast_l_and_ab(feat_l, feat_ab, index)
     out_l_2, out_depth = contrast_l_and_depth(feat_l, feat_depth, index)

  3. Create a criterion object for each of the modalities (twice for the core modality "L"), which would lead to something like:

     l_loss_from_l_and_depth = criterion_l2(out_l_2)
     depth_loss_from_l_and_depth = criterion_depth(out_depth)

  4. Add the results from above to the general loss term (a combined sketch follows below).
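
Put together, a minimal self-contained sketch of these four steps could look like the following (PairContrast, the criterion helper, and the toy encoders are illustrative stand-ins only, not the repo's actual API, which uses memory-bank based contrast objects):

import torch
import torch.nn as nn
import torch.nn.functional as F

class PairContrast(nn.Module):
    """Illustrative stand-in for a contrast object: scores each sample
    against its paired view, using the rest of the batch as negatives."""
    def __init__(self, temperature=0.07):
        super().__init__()
        self.t = temperature

    def forward(self, feat_a, feat_b, index=None):
        a = F.normalize(feat_a, dim=1)
        b = F.normalize(feat_b, dim=1)
        logits = a @ b.t() / self.t   # [N, N], positives on the diagonal
        return logits, logits.t()     # one score matrix per direction

def criterion(logits):
    """N-way classification: the matching sample is the positive."""
    target = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, target)

# 1. one (toy) encoder per modality: L, ab, depth
dim = 128
encoder_l = nn.Sequential(nn.Flatten(), nn.Linear(1 * 32 * 32, dim))
encoder_ab = nn.Sequential(nn.Flatten(), nn.Linear(2 * 32 * 32, dim))
encoder_depth = nn.Sequential(nn.Flatten(), nn.Linear(1 * 32 * 32, dim))

# 2. one contrast object per view pair (core-view scheme anchored at L)
contrast_l_and_ab = PairContrast()
contrast_l_and_depth = PairContrast()

# fake batch: three views of the same 8 images
l = torch.randn(8, 1, 32, 32)
ab = torch.randn(8, 2, 32, 32)
depth = torch.randn(8, 1, 32, 32)
feat_l, feat_ab, feat_depth = encoder_l(l), encoder_ab(ab), encoder_depth(depth)

out_l, out_ab = contrast_l_and_ab(feat_l, feat_ab)
out_l_2, out_depth = contrast_l_and_depth(feat_l, feat_depth)

# 3./4. one criterion term per output, summed into the general loss
loss = criterion(out_l) + criterion(out_ab) + criterion(out_l_2) + criterion(out_depth)
loss.backward()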

Is this the correct way to go forward?
How exactly did you do the patch-based training on the "L" modality? Could you provide some hyper-parameters for that?

Thanks for the otherwise very nice repo.


HobbitLong commented on August 24, 2024

@meyerjo ,

To your question: we used a "patch-based contrastive objective" for this task; please refer to Section 3.5.2.

That being said, for each of the modalities we extract a global feature as well as local features. Then we contrast (1) the global feature from modality A against the local features from modality B, and (2) the global feature from modality B against the local features from modality A. We build this global-local contrastive loss in exactly the same way as the DIM paper. The code for this loss can be found here.
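
As a rough, self-contained illustration of that global-to-local contrast (the feature shapes and the in-batch softmax formulation below are assumptions for the sketch; the linked loss code is the reference):

import torch
import torch.nn.functional as F

def global_local_loss(global_a, local_b, temperature=0.07):
    """Contrast the global feature of modality A against every spatial
    location of modality B's local feature map.

    global_a: [N, C]        pooled (global) feature of modality A
    local_b:  [N, C, H, W]  spatial (local) feature map of modality B
    """
    n, c, h, w = local_b.shape
    g = F.normalize(global_a, dim=1)            # [N, C]
    l = F.normalize(local_b.flatten(2), dim=1)  # [N, C, S] with S = H*W
    # scores[i, j, s]: global feature of image i vs. location s of image j
    scores = torch.einsum('ic,jcs->ijs', g, l) / temperature
    # classify, for each local location, which global feature it belongs to
    logits = scores.permute(1, 2, 0).reshape(n * h * w, n)
    target = torch.arange(n).repeat_interleave(h * w)
    return F.cross_entropy(logits, target)

# directions (1) and (2) for one modality pair (A, B)
g_a, map_a = torch.randn(8, 128), torch.randn(8, 128, 7, 7)
g_b, map_b = torch.randn(8, 128), torch.randn(8, 128, 7, 7)
loss_ab = global_local_loss(g_a, map_b) + global_local_loss(g_b, map_a)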

The way of extracting the global and local features can be found here.

We then sum up those pair-wise losses as described in our paper.
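
For example, with the core-view scheme anchored at "L" and the global_local_loss sketch above (the feature names here are made up), the summed objective over the (L, ab) and (L, depth) pairs would be:

loss = (global_local_loss(g_l, map_ab) + global_local_loss(g_ab, map_l)
        + global_local_loss(g_l, map_depth) + global_local_loss(g_depth, map_l))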


jnyjxn commented on August 24, 2024

@HobbitLong thanks for this additional info.

Just to confirm: is the use of a patch-wise method here only motivated by the small dataset?

I am also trying to find an explanation of the global/local feature concept in your CMC paper; is this a deviation from the method presented there?

Many thanks!


HobbitLong commented on August 24, 2024

@jnyjxn ,

Yes, the NYU dataset has fewer than 2k images.

This is different from the ImageNet experiment, but it is described in the supplementary material of the paper.

