jigsawpuzzlepytorch's Introduction

JigsawPuzzlePytorch

PyTorch implementation of the paper "Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles" by Mehdi Noroozi and Paolo Favaro.

Partially tested. Performance numbers coming soon.

Dependencies

  • Tested with Python 2.7
  • PyTorch v0.3
  • TensorFlow is used only for logging; remove the Logger from all scripts if TensorFlow is not installed

Train the JigsawPuzzleSolver

Setup Loader

Two data loaders are provided; each iteration loads data in image format (jpg, png, ...):

  • Dataset/JigsawImageLoader.py uses the PyTorch DataLoader and iterator
  • Dataset/ImageDataLoader.py is a custom implementation

The default loader is JigsawImageLoader.py. ImageDataLoader.py is slightly faster when using a single core.

The images can be preprocessed using produce_small_data.py, which resizes each image to 256 pixels on the shorter side, keeping the aspect ratio, and crops a 255x255 patch from the center.
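The preprocessing described above can be sketched with PIL; this is an illustration of the idea, not necessarily the exact code in produce_small_data.py:

```python
from PIL import Image

def preprocess(img, short_side=256, crop=255):
    # Resize so the shorter side equals `short_side`, keeping aspect ratio
    w, h = img.size
    scale = short_side / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)
    # Center-crop a `crop` x `crop` patch
    w, h = img.size
    left, top = (w - crop) // 2, (h - crop) // 2
    return img.crop((left, top, left + crop, top + crop))
```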

Run Training

Fill in the path information in run_jigsaw_training.sh. IMAGENET_FOLD needs to point to the folder containing ILSVRC2012_img_train.

./run_jigsaw_training.sh [GPU_ID]

or call the python script

python JigsawTrain.py [*path_to_imagenet*] --checkpoint [*path_checkpoints_and_logs*] --gpu [*GPU_ID*] --batch [*batch_size*]

By default the network uses 1000 permutations, selected to maximize Hamming distance using select_permutations.py.
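The selection idea, keeping the chosen permutations far apart in Hamming distance, can be sketched as a greedy search. This illustrates the principle only; select_permutations.py may differ in details such as the exact distance criterion (the training script exposes a --hamming option):

```python
import itertools

import numpy as np

def select_permutations(n_select=100, n_tiles=9, seed=0):
    """Greedily pick `n_select` permutations of `n_tiles` elements that
    stay far apart in mean Hamming distance (a sketch of the idea, not
    necessarily the repo's exact algorithm)."""
    rng = np.random.default_rng(seed)
    all_perms = np.array(list(itertools.permutations(range(n_tiles))))
    chosen = [all_perms[rng.integers(len(all_perms))]]  # random first pick
    for _ in range(n_select - 1):
        # Hamming distance of every candidate to each already-chosen perm
        dists = np.stack([(all_perms != p).sum(axis=1) for p in chosen])
        # keep the candidate with the largest mean distance to the chosen set
        chosen.append(all_perms[dists.mean(axis=0).argmax()])
    return np.stack(chosen)
```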

To change the permutation file that is loaded, open JigsawLoader.py and change the file name in the method retrive_permutations.

Details:

  • The input of the network should be 64x64, but the tiles are resized to 75x75; otherwise the output of conv5 is 2x2 instead of 3x3 as in the official architecture
  • The jigsaw task is trained using the approach of the paper: SGD, LRN layers, 70 epochs
  • Implemented shortcuts: spatial jittering, normalizing each patch independently, color jittering, 30% black-and-white images
  • The LRN layer crashes with PyTorch versions older than 0.3
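The listed shortcuts can be illustrated for a single tile. This is a hypothetical helper, not the repo's code, and color jittering is omitted for brevity:

```python
import numpy as np

def augment_patch(patch, rng, jitter=2):
    """Illustrative sketch of the augmentation shortcuts for one HxWx3
    float patch (hypothetical helper, not the repo's exact code)."""
    # ~30% of patches are used in black & white: average the channels
    if rng.random() < 0.3:
        patch = np.repeat(patch.mean(axis=2, keepdims=True), 3, axis=2)
    # spatial jittering: a random +/- `jitter` pixel shift stands in here
    # for cropping the tile at a randomly offset location
    dy, dx = rng.integers(-jitter, jitter + 1, size=2)
    patch = np.roll(patch, (dy, dx), axis=(0, 1))
    # normalize each patch independently (guarding against zero std)
    std = patch.std()
    return (patch - patch.mean()) / (std if std > 0 else 1.0)
```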

ToDo

  • TensorboardX
  • LMDB DataLoader

jigsawpuzzlepytorch's People

Contributors

bbrattoli, biagiobrattoli, topiaruss


jigsawpuzzlepytorch's Issues

Questions on the implementation detail

First of all, thank you for sharing this code! I am finding it really useful in trying to implement the puzzle solver published by Noroozi, 2017. I have a few questions on the implementation details.

  1. Dataset/JigsawImageLoader.py line 55
    What is the purpose of setting standard-deviation values of 0 to 1?

  2. Noroozi mentioned in the original publication that "To prevent mapping the appearance to an absolute position we feed multiple Jigsaw puzzles of the same image to the CFN (an average of 69 out of 1000 possible puzzle configurations) and make sure that the tiles are shuffled as much as possible by choosing configurations with sufficiently large average Hamming distance"
    How is this being accomplished in your implementation? I understand that JigsawImageLoader outputs a single puzzle configuration per image. Do you simply run multiple epochs to ensure that training sees multiple configurations per image?

  3. Noroozi reports 59.5 hours of total training time (until convergence). How long did your implementation take to train until convergence?
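Regarding question 1, one plausible reading (not confirmed by the author) is that it guards against division by zero when a patch is constant, e.g. a uniform black region, during per-patch normalization:

```python
import numpy as np

def normalize_patch(patch):
    """Per-channel mean/std normalization of an HxWx3 patch (illustrative
    sketch, not the repo's exact loader code)."""
    m = patch.mean(axis=(0, 1))
    s = patch.std(axis=(0, 1))
    # A zero std (uniform patch) would cause a division by zero; setting
    # it to 1 leaves the already-centered values at 0 instead.
    s[s == 0] = 1
    return (patch - m) / s
```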

repro?

Hi @bbrattoli -- just checking, were you actually able to get this to train? I'm trying to overfit on a small subset of ImageNet (see below) but the loss becomes nan unfortunately.

ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00006172.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00047236.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00033634.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00042206.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00022814.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00048145.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00017111.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00006257.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00017333.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00036231.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00022808.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00047438.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00047250.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00023027.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00004731.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00013475.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00038938.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00031336.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00025652.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00041514.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00014257.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00014589.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00035579.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00049435.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00024587.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00001408.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00036235.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00003069.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00044678.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00009586.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00006834.JPEG
JigsawPuzzlePytorch$ python JigsawTrain.py --classes=1000 --lr=0.1 --hamming=mean --batch 10 ~/ImageNet/
CPU mode
Start training: lr 0.100000, batch size 10, classes 1000
Checkpoint: checkpoints/
TESTING: 0), Accuracy 0.00%
Learning Rate 0.100000
[1/70] 0) LR 0.10000, Loss: 6.903, Accuracy 0.0%
Saved: checkpoints/
[repeated "Learning Rate 0.100000" / "Saved: checkpoints/" lines; TESTING at 25, 50, 75 all report Accuracy 0.00%]
TESTING: 100), Accuracy 0.00%
Learning Rate 0.010000
[21/70] 100) LR 0.01000, Loss: nan, Accuracy 0.0%
Saved: checkpoints/
[same pattern continues; TESTING: 125), Accuracy 0.00%]
Learning Rate 0.010000

What does "ilsvrc12_train.txt" entail?

Dear author, thanks a lot for your contribution! However, I can't find the file ilsvrc12_train.txt anywhere. I guess it is the list of training images. I am new to deep learning and not familiar with ImageNet either. Could you kindly attach the file?

Spatially Jittering?

In the original paper it says "We (spatially) jitter the color channels of the color images of each tile randomly by ±0, ±1, ±2 pixels", but it seems that the implementation jitters the pixel values instead. Should this be corrected? Thank you!
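For comparison, the paper's description, shifting each color channel spatially and independently, could be sketched as follows (hypothetical helper, not the repo's implementation):

```python
import numpy as np

def channel_spatial_jitter(tile, rng, max_shift=2):
    """Shift each color channel of an HxWx3 tile independently by up to
    +/- max_shift pixels, as the paper describes (illustrative sketch)."""
    out = np.empty_like(tile)
    for c in range(tile.shape[2]):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        # np.roll shifts the channel spatially without changing its values
        out[..., c] = np.roll(tile[..., c], (dy, dx), axis=(0, 1))
    return out
```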

Transfer Learning Experiments

Hi @bbrattoli, have you done any work or experiments on transferring the weights of the self-supervised model to a new model for transfer learning/semi-supervised learning for classification or detection?

how to do downstream task for image classification

Hi, this is a very good application. I have trained on my own dataset using your architecture in JigSawNetwork. Now I am confused about how to do the downstream classification task using a labeled dataset. Can you share how to do that?
Also, can we use a backbone like VGG16, ResNet, etc. to train self-supervised learning with this repo? If yes, how?

thanks
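One common pattern for the downstream task (a sketch under assumptions, not the repo's documented procedure) is to reuse the trained convolutional trunk as a frozen feature extractor and train a new classification head on the labeled data. DownstreamClassifier and the backbone passed to it are illustrative placeholders, not classes from this repo:

```python
import torch
import torch.nn as nn

class DownstreamClassifier(nn.Module):
    """Linear evaluation on top of a pretrained trunk (hypothetical sketch)."""

    def __init__(self, backbone, feat_dim, n_classes):
        super().__init__()
        self.backbone = backbone            # pretrained conv trunk
        for p in self.backbone.parameters():
            p.requires_grad = False         # freeze for linear evaluation
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, x):
        f = self.backbone(x)
        f = torch.flatten(f, 1)             # (N, feat_dim)
        return self.head(f)
```

Swapping in a VGG16 or ResNet backbone would follow the same shape: pretrain it on the jigsaw task, then attach a classification head as above.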

Wrong tiles coordinate with Python 3

Hi,
Thank you for your implementation. People might be interested in using this code with Python 3, and the whole code seems compatible with it, except in the following place, where a division creates a small issue that is not easy to detect:

for n in range(9):
    i = n / 3
    j = n % 3
    c = [a * i * 2 + a, a * j * 2 + a]

In Python 2, a plain / performs floor division on integers, which is not the case in Python 3, where you must use // instead.
Thus, under Python 3 the coordinates of the center c used for cropping are wrong, producing malformed tiles without raising any error:
(Screenshots in the original issue show the source image and the nine extracted tiles.)

Tiles 8 and 9 are missing pixels (because 7/3 > 2 and 8/3 > 2 under true division, hence the need for floor division), which makes the self-supervised task easier and prevents the network from learning interesting features.
Using floor division with // is also compatible with Python 2, so I think adding this change to the current version would improve compatibility and prevent potential misuse by users running Python 3.
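A corrected version of the loop, compatible with both Python 2 and 3; here `a` is set to an illustrative value (in the loader it is derived from the tile size):

```python
# Python 2/3-compatible version: // floors in both interpreter versions.
a = 25  # illustrative half-spacing, not the loader's actual value

tiles = []
for n in range(9):
    i = n // 3  # row index: floor division, correct under Python 3 too
    j = n % 3   # column index
    c = [a * i * 2 + a, a * j * 2 + a]  # center of tile n
    tiles.append(c)
```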

Problem in Training

The accuracy remains at zero forever and the loss value does not change either.
How can I solve this problem?


Question about the architecture and input size

I checked the architecture of the official implementation, and your implementation seems to match it. I think they use a patch size of 75x75, as in your implementation, rather than the 64x64 declared in the paper. I want to know whether I am making a mistake. Thanks!

batchnorm in training?

I tried training the current model in the current setting, i.e. without batch norm, and the jigsaw task did not reach the accuracy of 71% reported in the original paper. After adding batch norm I was able to get 71% accuracy on the jigsaw task, but the numbers on VOC classification were 4 mAP points below what the paper reported. Any ideas?
