Unofficial implementation of Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning.
Red and blue points show sampled locations in two image views. The large red dot marks a point in the first view, and the large blue dots are "matches" for that pixel in the second view (based on a distance threshold of 0.7).

The first pre-training run is complete. Results fall short of those reported in the PixPro paper, but are on par with other unsupervised pre-training algorithms like MoCo and SimCLR.
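The matching rule behind the figure can be sketched roughly as follows. This is a simplified illustration, not code from this repo: the function name, the coordinate convention, and the normalization constant are assumptions. The idea is that feature-map locations from the two views are mapped back into the original image's coordinate frame, and a pair is a "match" when their normalized distance falls below the threshold (0.7).

```python
import torch

def match_pixels(coords_a, coords_b, bin_diag, thresh=0.7):
    """Match feature-map locations across two augmented views.

    coords_a: (N, 2) coordinates of view A's feature locations, mapped
              back into the original image's pixel frame.
    coords_b: (M, 2) same for view B.
    bin_diag: normalization constant (e.g. the diagonal length of one
              feature-map bin, in original-image pixels).
    Returns a boolean (N, M) mask: True where the normalized distance
    is below the threshold.
    """
    dists = torch.cdist(coords_a.float(), coords_b.float())  # (N, M) pairwise distances
    return (dists / bin_diag) < thresh
```

For example, with `bin_diag=10`, a pair of locations 10 pixels apart has normalized distance 1.0 and is not matched, while coincident locations are.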
Update: This model was trained without excluding BatchNorm parameters and biases from weight decay and LARS adaptation. The omission has been corrected and may explain the suboptimal performance; updated results are TBD.
Results from VOC2007-test:
pretrain | AP50 | AP | AP75 |
---|---|---|---|
ImageNet-1M, supervised | 81.3 | 53.5 | 58.8 |
MoCo v1, 200ep | 81.5 | 55.9 | 62.6 |
SimCLR, 1000ep | 81.9 | 56.3 | 62.5 |
PixPro, 100ep (this repo) | 81.8 | 56.6 | 63.0 |
PixPro, 100ep (reported) | 83.0 | 58.8 | 66.5 |
At least some of this discrepancy may be due to differences in pre-training hyperparameters.
source | batch size | encoder momentum | epochs | GPUs |
---|---|---|---|---|
PixPro, 100ep (this repo) | 512 | 0.995 | 100 | 4 |
PixPro, 100ep (reported) | 1024 | 0.99 | 100 | 8 |
The training loss behaved as expected until around epoch 80, when it began to increase slightly.
Implementations of the dataloader, model, and train_backbone script for Pixel Propagation are complete:
- Pixel propagation module
- Support for all spatial transforms (crops, resizing, flips, rotations, grid/elastic deformations, etc.)
- Generic encoder and projection head for any torchvision model
- Consistency loss for pixel propagation (not pixel contrast)
- BYOL-style data augmentations
- Cosine learning rate schedule
- Momentum encoder's momentum schedule from BYOL (0.99 -> 1 during training)
- LARS optimizer
- Distributed training script for backbone network (e.g. resnet50)
- Support for mixed precision training
- Pre-trained ResNet50 backbone model
- Results on COCO and/or PASCAL
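The pixel propagation module in the checklist above can be sketched as follows. This is a rough illustration of the idea rather than this repo's implementation: the class name, the choice of a single 1x1 convolution for the transform g, and the default sharpness exponent gamma are assumptions. Each feature vector is replaced by a similarity-weighted sum of transformed features from every other spatial location, y_i = sum_j max(cos(x_i, x_j), 0)^gamma * g(x_j).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelPropagation(nn.Module):
    """Sketch of a pixel propagation module (PPM) in the style of PixPro."""

    def __init__(self, dim, gamma=2.0):
        super().__init__()
        self.gamma = gamma
        # Transform g(.); a single 1x1 conv is an assumption for illustration.
        self.transform = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):                    # x: (B, C, H, W)
        b, c, h, w = x.shape
        flat = x.flatten(2)                  # (B, C, HW)
        normed = F.normalize(flat, dim=1)    # unit-norm features for cosine sim
        sim = torch.einsum('bci,bcj->bij', normed, normed)  # (B, HW, HW) cosine sims
        sim = sim.clamp(min=0).pow(self.gamma)              # max(cos, 0)^gamma
        g = self.transform(x).flatten(2)                    # g(x): (B, C, HW)
        out = torch.einsum('bij,bcj->bci', sim, g)          # propagate across locations
        return out.view(b, c, h, w)
```

The output has the same shape as the input, so the module can be dropped in after the projection head on the online branch.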
Hyperparameters and training schedules have been reproduced with as much fidelity to the original publication as possible.
If using conda, set up a new environment with the required dependencies:

    conda env create -f environment.yml
Then, on an 8 GPU machine, run:

    python train_backbone.py {data_directory} {save_directory} -a resnet50 -b 1024 --lr 4 \
      --dist-url 'tcp://localhost:10001' --multiprocessing-distributed \
      --world-size 1 --rank 0 --momentum 0.9 --fp16

where {data_directory} should be a path to a folder containing ImageNet training data. The --fp16 flag (included above) enables mixed precision training.
For other batch sizes, scale the learning rate linearly: lr = base_lr x batch_size / 256, where base_lr = 1. Results were not reported in the paper for smaller batch sizes; however, assuming PixPro behaves like BYOL, the loss in performance should be small. In addition to scaling the learning rate for smaller batches, BYOL also increases the starting momentum for the encoder. PixPro's default momentum for a batch size of 1024 is 0.99; for a batch size of 256, a momentum closer to 0.995 (i.e. --pixpro-mom 0.995) may work better.
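The scaling above can be written out as a small helper. The linear LR rule comes straight from the text; the momentum formula is one possible heuristic (shrinking the gap to 1.0 with the square root of the batch-size ratio) chosen so that it reproduces the 0.99 / 0.995 pair mentioned above. It is a suggestion, not a rule from the paper.

```python
import math

def scaled_hparams(batch_size, base_lr=1.0, base_mom=0.99, base_batch=1024):
    """Suggest a learning rate and encoder momentum for a given batch size.

    lr follows the linear scaling rule: lr = base_lr * batch_size / 256.
    The momentum heuristic (an assumption, not from the paper) moves the
    starting momentum closer to 1 for smaller batches.
    """
    lr = base_lr * batch_size / 256
    momentum = 1.0 - (1.0 - base_mom) * math.sqrt(batch_size / base_batch)
    return lr, momentum
```

For batch size 1024 this gives lr = 4 and momentum = 0.99 (the defaults above); for batch size 256 it gives lr = 1 and momentum = 0.995.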