jigsawpuzzlepytorch's Introduction

JigsawPuzzlePytorch

PyTorch implementation of the paper "Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles" by Mehdi Noroozi and Paolo Favaro.

Partially tested. Performance numbers coming soon.

Dependencies

  • Tested with Python 2.7
  • PyTorch v0.3
  • TensorFlow is used only for logging; remove the Logger from all scripts if TensorFlow is not installed

Train the JigsawPuzzleSolver

Setup Loader

Two data loaders are provided; each iteration loads data in image format (jpg, png, ...):

  • Dataset/JigsawImageLoader.py uses the PyTorch DataLoader and iterator
  • Dataset/ImageDataLoader.py is a custom implementation

The default loader is JigsawImageLoader.py. ImageDataLoader.py is slightly faster when using a single core.

The images can be preprocessed using produce_small_data.py, which resizes each image to 256 pixels on the shorter side, keeping the aspect ratio, and crops a 255x255 patch from the center.
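The preprocessing described above can be sketched with PIL; this is an illustration of the idea, not necessarily the exact code in produce_small_data.py:

```python
from PIL import Image

def preprocess(img, short_side=256, crop=255):
    # Resize so the shorter side equals `short_side`, keeping aspect ratio
    w, h = img.size
    scale = short_side / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)
    # Center-crop a `crop` x `crop` patch
    w, h = img.size
    left, top = (w - crop) // 2, (h - crop) // 2
    return img.crop((left, top, left + crop, top + crop))
```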

Run Training

Fill in the path information in run_jigsaw_training.sh. IMAGENET_FOLD needs to point to the folder containing ILSVRC2012_img_train.

./run_jigsaw_training.sh [GPU_ID]

or call the python script

python JigsawTrain.py [*path_to_imagenet*] --checkpoint [*path_checkpoints_and_logs*] --gpu [*GPU_ID*] --batch [*batch_size*]

By default the network uses 1000 permutations, selected to maximize Hamming distance using select_permutations.py.
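The selection idea, keeping the chosen permutations far apart in Hamming distance, can be sketched as a greedy search. This illustrates the principle only; select_permutations.py may differ in details such as the exact distance criterion (the training script exposes a --hamming option):

```python
import itertools

import numpy as np

def select_permutations(n_select=100, n_tiles=9, seed=0):
    """Greedily pick `n_select` permutations of `n_tiles` elements that
    stay far apart in mean Hamming distance (a sketch of the idea, not
    necessarily the repo's exact algorithm)."""
    rng = np.random.default_rng(seed)
    all_perms = np.array(list(itertools.permutations(range(n_tiles))))
    chosen = [all_perms[rng.integers(len(all_perms))]]  # random first pick
    for _ in range(n_select - 1):
        # Hamming distance of every candidate to each already-chosen perm
        dists = np.stack([(all_perms != p).sum(axis=1) for p in chosen])
        # keep the candidate with the largest mean distance to the chosen set
        chosen.append(all_perms[dists.mean(axis=0).argmax()])
    return np.stack(chosen)
```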

To change the permutation file that is loaded, open JigsawLoader.py and change the file name in the method retrive_permutations.

Details:

  • The input of the network should be 64x64, but the tiles are resized to 75x75; otherwise the output of conv5 is 2x2 instead of 3x3 as in the official architecture
  • The jigsaw task is trained using the approach of the paper: SGD, LRN layers, 70 epochs
  • Implemented shortcuts: spatial jittering, normalizing each patch independently, color jittering, 30% black-and-white images
  • The LRN layer crashes with PyTorch versions older than 0.3
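The listed shortcuts can be illustrated for a single tile. This is a hypothetical helper, not the repo's code, and color jittering is omitted for brevity:

```python
import numpy as np

def augment_patch(patch, rng, jitter=2):
    """Illustrative sketch of the augmentation shortcuts for one HxWx3
    float patch (hypothetical helper, not the repo's exact code)."""
    # ~30% of patches are used in black & white: average the channels
    if rng.random() < 0.3:
        patch = np.repeat(patch.mean(axis=2, keepdims=True), 3, axis=2)
    # spatial jittering: a random +/- `jitter` pixel shift stands in here
    # for cropping the tile at a randomly offset location
    dy, dx = rng.integers(-jitter, jitter + 1, size=2)
    patch = np.roll(patch, (dy, dx), axis=(0, 1))
    # normalize each patch independently (guarding against zero std)
    std = patch.std()
    return (patch - patch.mean()) / (std if std > 0 else 1.0)
```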

ToDo

  • TensorboardX
  • LMDB DataLoader

jigsawpuzzlepytorch's People

Contributors

bbrattoli, biagiobrattoli, topiaruss


jigsawpuzzlepytorch's Issues

Questions on the implementation detail

First of all, thank you for sharing this code! I am finding it really useful in trying to implement the puzzle solver published by Noroozi, 2017. I have a few questions on the implementation details.

  1. Dataset/JigsawImageLoader.py line 55
    What is the purpose of setting standard-deviation values of 0 to 1?

  2. Noroozi mentioned in the original publication that "To prevent mapping the appearance to an absolute position we feed multiple Jigsaw puzzles of the same image to the CFN (an average of 69 out of 1000 possible puzzle configurations) and make sure that the tiles are shuffled as much as possible by choosing configurations with sufficiently large average Hamming distance"
    How is this being accomplished in your implementation? I understand that JigsawImageLoader outputs a single puzzle configuration per image. Do you simply run multiple epochs to ensure that training sees multiple configurations per image?

  3. Noroozi reports 59.5 hours of total training time (until convergence). How long did your implementation take to train until convergence?
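Regarding question 1, one plausible reading (not confirmed by the author) is that it guards against division by zero when a patch is constant, e.g. a uniform black region, during per-patch normalization:

```python
import numpy as np

def normalize_patch(patch):
    """Per-channel mean/std normalization of an HxWx3 patch (illustrative
    sketch, not the repo's exact loader code)."""
    m = patch.mean(axis=(0, 1))
    s = patch.std(axis=(0, 1))
    # A zero std (uniform patch) would cause a division by zero; setting
    # it to 1 leaves the already-centered values at 0 instead.
    s[s == 0] = 1
    return (patch - m) / s
```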

repro?

Hi @bbrattoli -- just checking, were you actually able to get this to train? I'm trying to overfit on a small subset of ImageNet (see below) but the loss becomes nan unfortunately.

ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00006172.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00047236.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00033634.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00042206.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00022814.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00048145.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00017111.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00006257.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00017333.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00036231.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00022808.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00047438.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00047250.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00023027.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00004731.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00013475.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00038938.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00031336.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00025652.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00041514.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00014257.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00014589.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00035579.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00049435.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00024587.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00001408.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00036235.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00003069.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00044678.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00009586.JPEG
ImageNet/ILSVRC2012_img_val/val/ILSVRC2012_val_00006834.JPEG
JigsawPuzzlePytorch$ python JigsawTrain.py --classes=1000 --lr=0.1 --hamming=mean --batch 10 ~/ImageNet/
CPU mode
Start training: lr 0.100000, batch size 10, classes 1000
Checkpoint: checkpoints/
TESTING: 0), Accuracy 0.00%
Learning Rate 0.100000
[1/70] 0) LR 0.10000, Loss: 6.903, Accuracy 0.0%
Saved: checkpoints/
[repeated "Learning Rate 0.100000" / "Saved: checkpoints/" lines; TESTING at 25, 50, 75 all report Accuracy 0.00%]
TESTING: 100), Accuracy 0.00%
Learning Rate 0.010000
[21/70] 100) LR 0.01000, Loss: nan, Accuracy 0.0%
Saved: checkpoints/
[same pattern continues; TESTING: 125), Accuracy 0.00%]
Learning Rate 0.010000

What does "ilsvrc12_train.txt" entail?

Dear author, thanks a lot for your contribution! However, I can't find the file ilsvrc12_train.txt anywhere. I guess it is the list of training images. I am new to deep learning and not familiar with ImageNet either. Could you kindly attach the file?

Spatially Jittering?

In the original paper it says "We (spatially) jitter the color channels of the color images of each tile randomly by ±0, ±1, ±2 pixels", but it seems that the implementation jitters the pixel values instead. Should this be corrected? Thank you!
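For comparison, the paper's description, shifting each color channel spatially and independently, could be sketched as follows (hypothetical helper, not the repo's implementation):

```python
import numpy as np

def channel_spatial_jitter(tile, rng, max_shift=2):
    """Shift each color channel of an HxWx3 tile independently by up to
    +/- max_shift pixels, as the paper describes (illustrative sketch)."""
    out = np.empty_like(tile)
    for c in range(tile.shape[2]):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        # np.roll shifts the channel spatially without changing its values
        out[..., c] = np.roll(tile[..., c], (dy, dx), axis=(0, 1))
    return out
```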

Transfer Learning Experiments

Hi @bbrattoli, have you done any work or experiments on transferring the weights of the self-supervised model to a new model for transfer learning/semi-supervised learning for classification or detection?

how to do downstream task for image classification

Hi, this is a very good application. I have trained on my own dataset using your architecture in JigSawNetwork. Now I am confused about how to do the downstream classification task using a labeled dataset. Can you share how to do that?
Also, can we use a backbone like VGG16, ResNet, etc. to train self-supervised learning with this repo? If yes, how?

thanks
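One common pattern for the downstream task (a sketch under assumptions, not the repo's documented procedure) is to reuse the trained convolutional trunk as a frozen feature extractor and train a new classification head on the labeled data. DownstreamClassifier and the backbone passed to it are illustrative placeholders, not classes from this repo:

```python
import torch
import torch.nn as nn

class DownstreamClassifier(nn.Module):
    """Linear evaluation on top of a pretrained trunk (hypothetical sketch)."""

    def __init__(self, backbone, feat_dim, n_classes):
        super().__init__()
        self.backbone = backbone            # pretrained conv trunk
        for p in self.backbone.parameters():
            p.requires_grad = False         # freeze for linear evaluation
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, x):
        f = self.backbone(x)
        f = torch.flatten(f, 1)             # (N, feat_dim)
        return self.head(f)
```

Swapping in a VGG16 or ResNet backbone would follow the same shape: pretrain it on the jigsaw task, then attach a classification head as above.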

Wrong tiles coordinate with Python 3

Hi,
Thank you for your implementation. People might be interested in using this code with Python 3, and the whole code seems compatible with it, except in the following place, where a division creates a small issue that is not easy to detect:

for n in range(9):
    i = n / 3
    j = n % 3
    c = [a * i * 2 + a, a * j * 2 + a]

In Python 2, a plain / performs floor division on integers, which is not the case in Python 3, where you must use // instead.
Thus, under Python 3 the coordinates of the center c used for cropping are wrong, producing malformed tiles without raising any error:
(Screenshots in the original issue show the source image and the nine extracted tiles.)

Tiles 8 and 9 are missing pixels (because 7/3 > 2 and 8/3 > 2 under true division, hence the need for floor division), which makes the self-supervised task easier and prevents the network from learning interesting features.
Using floor division with // is also compatible with Python 2, so I think adding this change to the current version would improve compatibility and prevent potential misuse by users running Python 3.
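A corrected version of the loop, compatible with both Python 2 and 3; here `a` is set to an illustrative value (in the loader it is derived from the tile size):

```python
# Python 2/3-compatible version: // floors in both interpreter versions.
a = 25  # illustrative half-spacing, not the loader's actual value

tiles = []
for n in range(9):
    i = n // 3  # row index: floor division, correct under Python 3 too
    j = n % 3   # column index
    c = [a * i * 2 + a, a * j * 2 + a]  # center of tile n
    tiles.append(c)
```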

Problem in Training

The accuracy remains at zero forever and the loss value does not change either.
How can I solve this problem?


Question about the architecture and input size

I checked the architecture of the official implementation, and your implementation seems to match it. I think they use a patch size of 75x75, as in your implementation, rather than the 64x64 declared in the paper. I want to know whether I am making a mistake. Thanks!

batchnorm in training?

I tried training the current model in the current setting, i.e. without batch norm, and the jigsaw task did not reach the accuracy of 71% reported in the original paper. After adding batch norm I was able to get 71% accuracy on the jigsaw task, but the numbers on VOC classification were 4 mAP points below what the paper reported. Any ideas?
