Comments (1)
Hi @yuedajiong ,
- For the first question: if you directly matched the outputs of the context and target encoders, you would learn invariant representations that collapse the masked and unmasked regions to the same point. The idea behind I-JEPA is instead to predict masked regions from visible regions, not to collapse them together. Hence, the predictor is essential for computing this mapping.
- For the second question: this is definitely valid. You can pretrain the visual encoder and then train only the predictor in this frozen latent space. However, the idea behind I-JEPA is to learn to represent the world by predicting it, so we were interested in using this approach to actually train a visual encoder. You could of course still use the same network with unfrozen weights for the context and target, but you would need explicit regularization to prevent collapse (e.g., the information maximization terms in the diagram you pointed out). This would be an interesting research question!
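To make the collapse argument concrete, here is a toy sketch in plain Python (not code from the ijepa repository). It assumes a hypothetical degenerate encoder, `collapsed_encoder`, that ignores its input, and shows that directly matching context and target embeddings gives it a perfect loss. The `variance_penalty` helper is also an assumption: one possible form of an explicit anti-collapse regularizer (a hinge on the per-dimension standard deviation across a batch), not necessarily the term from the diagram in the question.

```python
from statistics import pvariance


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def collapsed_encoder(patch):
    # Hypothetical degenerate encoder: every input maps to the same point.
    return [0.0, 0.0]


def direct_matching_loss(encoder, visible, masked):
    # Match context and target embeddings directly, with no predictor.
    return mse(encoder(visible), encoder(masked))


def variance_penalty(embeddings, target_std=1.0, eps=1e-4):
    # Assumed regularizer: penalize embedding dimensions whose standard
    # deviation across the batch falls below target_std.
    dims = list(zip(*embeddings))
    hinge = [max(0.0, target_std - (pvariance(d) + eps) ** 0.5) for d in dims]
    return sum(hinge) / len(dims)


# Two made-up "patches" with clearly different content.
visible = [0.2, 0.9, 0.4]
masked = [0.7, 0.1, 0.5]

# The collapsed encoder minimizes the direct-matching loss trivially.
loss = direct_matching_loss(collapsed_encoder, visible, masked)

# The same collapsed embeddings are heavily penalized by the variance term.
batch = [collapsed_encoder(p) for p in (visible, masked)]
penalty = variance_penalty(batch)

print(loss, penalty)
```

The direct-matching loss cannot distinguish this useless encoder from a good one, which is why some extra mechanism is needed: either an asymmetric setup with a predictor, or an explicit regularizer such as the variance term sketched here.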
from ijepa.