iic's Issues

MNIST training accuracy

I am loading the MNIST 685 model. According to .cluster_eval() I have around 99% accuracy, but evaluating it manually with the code listed below returns 85% accuracy. Am I doing something wrong?

  1. Load the model:
import pickle
import torch
import code.archs as archs

config_in = open("/content/code/mnist_original/config.pickle", "rb")
config = pickle.load(config_in)
net = archs.__dict__[config.arch](config)
net.load_state_dict(torch.load("/content/code/mnist_original/best_net.pytorch"))
  2. Load MNIST:
from code.utils.cluster.cluster_eval import cluster_eval
from code.utils.cluster.data import cluster_twohead_create_dataloaders

if "MNIST" in config.dataset:
  sobel = False
else:
  sobel = True

dataloaders_head_A, dataloaders_head_B, \
mapping_assignment_dataloader, mapping_test_dataloader = \
  cluster_twohead_create_dataloaders(config)

net.cuda()
cluster_eval(config, net,
             mapping_assignment_dataloader=mapping_assignment_dataloader,
             mapping_test_dataloader=mapping_test_dataloader,
             sobel=sobel, print_stats=True)
net.cpu()
  3. Set the mappings from semantic clusters to actual classes, as .cluster_eval() recommends:
    mappings = {0: 9, 1: 3, 2: 1, 3: 4, 4: 7, 5: 8, 6: 5, 7: 6, 8: 0, 9: 2}

  4. Manually test for accuracy:

batch = next(iter(mapping_test_dataloader))
imgs, labels = batch[0], batch[1]

def calculate_accuracy(imgs, labels, n):
    correct = 0.0
    for i in range(n):
        # net(...) returns one output per sub-head; [0] selects sub-head 0
        raw_prediction = torch.argmax(net(imgs[i].reshape(1, 1, 24, 24))[0]).item()
        predicted = mappings.get(raw_prediction, -1)
        label = labels[i].item()
        if int(predicted) == int(label):
            correct += 1
    return correct / float(n)

calculate_accuracy(imgs, labels, 700)  # np.mean around this scalar was a no-op
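For reference, a batched version of the same check over the entire test dataloader rather than a single 700-image batch (same net and mappings as above; [0] still selects sub-head 0, which may not be the sub-head that .cluster_eval() picked):

# Hypothetical helper, not from the repo: same manual check, full test set.
def calculate_accuracy_full(net, dataloader, mappings):
    net.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for imgs, labels in dataloader:
            preds = torch.argmax(net(imgs)[0], dim=1)  # sub-head 0
            for p, l in zip(preds.tolist(), labels.tolist()):
                correct += int(mappings.get(p, -1) == l)
            total += labels.shape[0]
    return correct / float(total)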

How to visualize segmentation results?

Using the pre-trained model 544, I want to visualize the results of image segmentation on Potsdam-3, as shown in Figure 7 of the IIC paper. However, downloading the models from the link given in README.md only gives me the config files from training. How do I run the models to see the unsupervised segmentation in action?
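For reference, a hedged sketch of one possible route (assuming the net is loaded as in the other issues here and returns one (N, K, H, W) output per sub-head; the colour map is arbitrary):

# Hypothetical sketch, not verified against the repo: argmax the per-pixel
# cluster probabilities of sub-head 0 and render the resulting label map.
import torch
import matplotlib.pyplot as plt

@torch.no_grad()
def show_segmentation(net, img):             # img: (1, C, H, W) float tensor
    net.eval()
    pred = torch.argmax(net(img)[0], dim=1)  # (1, H, W) cluster ids
    plt.imshow(pred[0].cpu().numpy(), cmap="tab10")
    plt.axis("off")
    plt.show()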

How to run this code on custom data

Thanks for your great work! I checked the code and found that it is hard-coded for the benchmarks given in the paper. Would it be possible to run it on my own custom data? Thanks a lot!

IID_segmentation_loss_uncollapsed ?

Hi,
Could you please tell me what the difference is between IID_segmentation_loss and IID_segmentation_loss_uncollapsed?
The only difference I can see is the normalization with respect to padding.

Some questions about training time

Hi, first thanks for the amazing work and providing the code.

I am testing the model on Potsdam-3 for unsupervised segmentation. With only two P100s (so 24 GB), the batch size is quite small, and your example command uses 4800 epochs, which is a lot (that might take me more than a month to complete). I was wondering if I am doing something wrong, or is the training time indeed this long? Could you also provide some metrics, such as the number of epochs needed for good performance in your experiments and the minimal loss values we can expect?

Thanks again for this work.

Metrics for unsupervised segmentation

Hi @xu-ji

I managed to use IID for segmentation on Potsdam (3 and 6) but was wondering how you performed objective measures, e.g. pixel accuracy, to compare my results with yours.

Do you map ground-truth masks to cluster predictions taken from a network maximising IID?
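For concreteness, this is the kind of matching I have in mind (a sketch using scipy's Hungarian solver, not taken from this repo):

# Hypothetical evaluation sketch: best one-to-one cluster-to-class assignment
# (Hungarian algorithm), then pixel accuracy under that assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_pixel_acc(preds, gt, k):
    # preds, gt: flat integer arrays of cluster ids / class ids in [0, k)
    overlap = np.zeros((k, k), dtype=np.int64)
    for c in range(k):
        for g in range(k):
            overlap[c, g] = np.sum((preds == c) & (gt == g))
    rows, cols = linear_sum_assignment(-overlap)   # maximise total overlap
    mapping = dict(zip(rows, cols))
    remapped = np.array([mapping[int(p)] for p in preds])
    return float(np.mean(remapped == gt))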

Thanks in advance.

batchnorm_track flag setting for CIFAR10

Hi @xu-ji,
Thanks for this wonderful work. I am re-running your code and noticed that in commands.txt the CIFAR10 setting is without --batchnorm_track, while most other commands include this flag. I can understand freezing BN in a finetuning setting, but that is apparently not the case here. Can you tell me why BN has been frozen for this particular setting, training CIFAR10 from scratch?
Thanks for your help in advance.

classify my images by IIC

Hello,
thank you for this great work.
I have 38K images of size 128x128, and I want to classify them into 7 groups using your model.
Unfortunately, I cannot figure out how to run your model for this problem; I have tried everything I could think of and nothing has worked.
I now have your code with all packages installed on my machine.
Can you tell me what I should do?

Supplementary Material for the Paper

Hi there,

Some details mentioned in the paper are deferred to the supplementary material. However, I cannot find the supplementary material attached to the paper. Can you provide it?

Stable loss for MNIST

Hi and thanks for your work,

I was just trying to plug in the IID loss function for the MNIST example, but the loss seems to stabilize at -0.55 after just 1 epoch. If I remove the NaN check, then instead of stabilizing, the network starts to output NaNs and fails to learn.

I checked this against the last version of your paper: "Mutual information (3) expands to I(z, z') = H(z) - H(z | z')." The largest value for H(z) is ln(C) and the minimum value for H(z|z') is 0. If the network has randomly assigned weights, it's fair to say that the predictions on the first mini-batches will have equal likelihood for each class and that H(z) will in fact be ln(C). Which checks out! But the conditional cluster assignment entropy term remains at -6 throughout the entire training process, and maximizing IID doesn't trade off between the individual and conditional assignments.

Here is a link to the repo; it's just something simple:
https://github.com/Bralio123/iic_simple

Just looking for a bit of insight as to what I may be doing wrong. Perhaps I should add an additional clustering head but I'm not sure if this will help.
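For reference, the decomposition I am checking against (a minimal sketch, not the repo's IID_loss):

# Minimal sketch of I(z, z') = H(z) - H(z|z') from paired softmax outputs;
# H(z) is bounded above by ln(C) and H(z|z') below by 0, as noted above.
import torch

def mutual_info(z, z_prime, eps=1e-8):
    # z, z_prime: (n, C) softmax outputs for an image and its transformed pair
    p = (z.unsqueeze(2) * z_prime.unsqueeze(1)).mean(dim=0)  # (C, C) joint
    p = ((p + p.t()) / 2).clamp(min=eps)                     # symmetrise
    pi = p.sum(dim=1)                                        # marginal of z
    pj = p.sum(dim=0)                                        # marginal of z'
    h_z = -(pi * pi.log()).sum()                             # H(z) <= ln(C)
    h_z_cond = -(p * (p / pj.unsqueeze(0)).log()).sum()      # H(z|z') >= 0
    return h_z - h_z_cond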

Training time for CIFAR10

Thank you for releasing your code! I tried to run it on CIFAR10 recently. According to the provided command, the total number of epochs is 2000. I am using 4 RTX 2080 Ti GPUs, and each epoch takes about 10 minutes, so the total training time for CIFAR10 with 4 GPUs will be about 14 days, which is too long. I am not sure whether I did something wrong, or whether it indeed needs 14 days? Thank you!

How to install javapackages==1.0.0?

I cannot pip install javapackages==1.0.0:
ERROR: Could not find a version that satisfies the requirement javapackages==1.0.0 (from versions: none)
ERROR: No matching distribution found for javapackages==1.0.0

About the scope of the IID_loss

Hi! I applied the IID_loss function to object classification and the loss is negative, and the accuracy decreases during training. Is there something wrong? Thank you for answering!

How to load pre-trained model

How do you recommend loading a pre-trained model? In particular, I am working with model-ind 685 (MNIST - unsupervised).
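A sketch of what I assume is the intended route, mirroring the loading snippet in the MNIST accuracy issue above (paths are illustrative):

# Rebuild the arch from the saved config, then restore the weights.
import pickle
import torch
import code.archs as archs

with open("out/685/config.pickle", "rb") as f:   # illustrative out_root path
    config = pickle.load(f)
net = archs.__dict__[config.arch](config)
net.load_state_dict(torch.load("out/685/best_net.pytorch", map_location="cpu"))
net.eval()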

About the local spatial invariance in segmentation loss

Hi xuji,
In Section 3.3 (segmentation) of your paper, in order to build spatial relationships between patches, you convolve within every gt_k channel of the prediction. Maybe this will lead to a loss of detail: the locally dominant class will dominate the whole region. Is that right? Do you have any thoughts on that?

Memory issue RuntimeError: CUDA out of memory

When I run on 4 GPUs:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m code.scripts.segmentation.segmentation_twohead --mode IID --dataset Coco164kCuratedFew --dataset_root /vulcan/scratch/shlok/IIC/datasets --model_ind 555 --arch SegmentationNet10aTwoHead --num_epochs 4800 --lr 0.0001 --lamb_A 1.0 --lamb_B 1.5 --num_sub_heads 1 --batch_sz 1 --num_dataloaders 1 --use_coarse_labels --output_k_A 15 --output_k_B 3 --gt_k 3 --pre_scale_all --pre_scale_factor 0.33 --input_sz 128 --half_T_side_sparse_min 0 --half_T_side_sparse_max 0 --half_T_side_dense 10 --include_rgb --coco_164k_curated_version 6 --use_uncollapsed_loss --batchnorm_track > gnoded1_gpu0123_m555_r1.out

I receive the following error:

File "/vulcan/scratch/shlok/Ana/envs/python2.7pytorch/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/vulcan/scratch/shlok/Ana/envs/python2.7pytorch/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/vulcan/scratch/shlok/IIC/code/scripts/segmentation/segmentation_twohead.py", line 451, in
train()
File "/vulcan/scratch/shlok/IIC/code/scripts/segmentation/segmentation_twohead.py", line 216, in train
using_IR=config.using_IR)
File "code/utils/segmentation/segmentation_eval.py", line 25, in segmentation_eval
verbose=verbose)
File "code/utils/cluster/cluster_eval.py", line 100, in cluster_subheads_eval
verbose=verbose)
File "code/utils/cluster/cluster_eval.py", line 160, in _get_assignment_data_matches
verbose=verbose)
File "code/utils/segmentation/segmentation_eval.py", line 127, in _segmentation_get_data
xrange(config.num_sub_heads)]
RuntimeError: CUDA out of memory. Tried to allocate 4.48 GiB (GPU 0; 10.92 GiB total capacity; 6.56 GiB already allocated; 3.77 GiB free; 1.39 MiB cached)

Do you know why this might occur? I have reduced my batch size to 2.
Also, how many GPUs did you use?

Error when training model: "Unsupported operation"

Running this command in a GPU Colab notebook:
export CUDA_VISIBLE_DEVICES=0 && python -m code.scripts.cluster.cluster_greyscale_twohead --model_ind 0 --arch ClusterNet6cTwoHead --mode IID --dataset_root /code/utils/cluster/MNIST.py --gt_k 10 --output_k_A 10 --output_k_B 10 --lamb_A 1.0 --lamb_B 1.0 --lr 0.0001 --num_epochs 3200 --batch_sz 10 --num_dataloaders 1 --num_sub_heads 1 --crop_orig --crop_other --tf1_crop centre_half --tf2_crop random --tf1_crop_sz 20 --tf2_crop_szs 16 20 24 --input_sz 24 --rot_val 25 --no_flip --head_B_epochs 2

Gets this error:
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/content/code/code/scripts/cluster/cluster_greyscale_twohead.py", line 495, in
train()
File "/content/code/code/scripts/cluster/cluster_greyscale_twohead.py", line 378, in train
avg_loss_batch.backward()
File "/usr/local/lib/python2.7/dist-packages/torch/tensor.py", line 150, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/init.py", line 99, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: unsupported operation: more than one element of the written-to tensor refers to a single memory location. Please clone() the tensor before performing the operation.

Several Questions about IID Segmentation

Hello Xu, thanks for the great work! I'm adopting your approach in my unsupervised segmentation task and encountered several problems about the scripts.

  1. About the 'local spatial invariance' technique in Section 3.3: where is the code implementing it? In the dataloaders for coco-stuff-3 and Potsdam, the paired patches only contain photometric transformations (one raw image, the other a colour-jittered version) instead of pairs with shifts. See here.

  2. I'm implementing unsupervised image segmentation for RGB images, which should be the same as 'fully unsupervised segmentation for coco-stuff-3'. After checking the code, I found that the ground-truth label/mask is loaded in the dataloader and used for computing the loss in the training phase, while in the dataloader for Potsdam the corresponding label/mask is set to all ones, which I think makes sense. So which is the right choice for mask_img1 in the unsupervised segmentation task?

  3. I noticed that the function pad_and_or_crop is used in the dataloader for both the training and testing parts. Does cropping images to size (h, w) perform better than directly resizing them to (h, w)? And if you crop during training, then to obtain the whole segmentation map for test images we need to generate masks using a sliding window of size (h, w) instead of just centre-cropping the image to (h, w), right?

Please correct me if there's any misunderstanding. Thanks for your time!

No module named 'cluster'

Hi, I'm getting import issues. Also, I have changed code to code1 as it was causing some dependency issues.
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m code1.scripts.segmentation.segmentation_twohead --mode IID --dataset Coco164kCuratedFew --dataset_root /scratch/local/ssd/xuji/COCO/CocoStuff164k --model_ind 714 --arch SegmentationNet10aTwoHead --num_epochs 4800 --lr 0.0001 --lamb_A 1.0 --lamb_B 1.0 --num_sub_heads 1 --batch_sz 120 --num_dataloaders 1 --use_coarse_labels --output_k_A 15 --output_k_B 3 --gt_k 3 --pre_scale_all --pre_scale_factor 0.33 --input_sz 128 --half_T_side_sparse_min 0 --half_T_side_sparse_max 0 --half_T_side_dense 10 --include_rgb --coco_164k_curated_version 6 --use_uncollapsed_loss --batchnorm_track > gnoded2_gpu0123_m714.out
Traceback (most recent call last):
File "/vulcan/scratch/shlok/Ana/envs/pytorch/lib/python3.5/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/vulcan/scratch/shlok/Ana/envs/pytorch/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/vulcan/scratch/shlok/IIC/code1/scripts/segmentation/segmentation_twohead.py", line 18, in
import code1.archs as archs
File "/vulcan/scratch/shlok/IIC/code1/archs/init.py", line 1, in
from cluster import *
ImportError: No module named 'cluster'
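(A hedged guess at the cause: the traceback shows Python 3.5 running code written for Python 2, and `from cluster import *` in code1/archs/__init__.py is an implicit relative import that Python 3 rejects. Under Python 3 it would need to be explicit:)

# Hypothetical fix in code1/archs/__init__.py, assuming the cause is the
# implicit relative import that Python 3 removed:
from .cluster import *   # instead of: from cluster import *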

about the learning rate

I see in your commands that all experiments are done with lr=1e-4 and many more epochs. I wonder why, since DeepCluster trains on ImageNet for only 200 epochs with an initial lr of 0.05. Does this make much difference to model performance?

Estimation of the joint distribution

Hello @xu-ji ,

May I ask why estimating the joint as P(X, Y) = P(X) × P(Y) holds in your equation?
That holds when X and Y are independent, and it is obvious that P(X) and P(T(X)) are dependent, so I would regard the estimate as biased.
You maximize the MI based on the average over z of P(X|z) × P(Y|z), where z is an image; does averaging the product help to reduce this bias?
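For reference, my reading of the estimator in the paper (Sec. 3.1; please correct me if this is not the equation in question) is a batch average of outer products of the paired predictions, then symmetrisation:

% Batch estimate of the joint over the C x C cluster assignments:
P = \frac{1}{n} \sum_{i=1}^{n} \Phi(x_i)\,\Phi(x_i')^{\top},
\qquad
P \leftarrow \tfrac{1}{2}\left(P + P^{\top}\right)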

Extracting images with relative cluster labels, and segmentation

Great work. This has a lot of applicability.

I am attempting to cluster (fully unsupervised) some binary data I have, but I want to cluster it into 10 groups (so I set gt_k and output_k_B to 10 without caring about the output accuracies).

I wanted to extract the assigned cluster labels so that I could then review the images relative to their semantic group. I noticed you did something similar in your paper - is there some code in the repo that I didn't notice which can output these labels?
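Something like this sketch is what I am after (sub-head 0 chosen arbitrarily; net/dataloader conventions as in the other issues here):

# Hypothetical: record each image's assigned cluster id for later review.
import torch

@torch.no_grad()
def assign_clusters(net, dataloader):
    net.eval()
    assignments = []
    for imgs, _ in dataloader:
        preds = torch.argmax(net(imgs)[0], dim=1)  # sub-head 0
        assignments.extend(preds.tolist())
    return assignments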

Also - is it possible to generate segmented images with your code without providing ground truths? I initially believed this was the case, but noticed you report GT images - was this just to compare to human performance?

Again thanks for your time. I really look forward to experimenting with your code!

Why not use KL-divergence or cross-entropy as the objective function?

Hi, I have read your paper and I'm interested in your approach. In my opinion, invariant information shares a similar idea with consistency training in semi-supervised learning [1].

In [1], they use KL-divergence to force consistent predictions between data and augmented versions. I think KL-divergence may also be appropriate in your implementation, without degeneracy.

Have you tried some experiments on KL-divergence or cross-entropy as objective function? Could you explain the differences among them?

[1] Unsupervised Data Augmentation for Consistency Training, Xie et al.

How to understand "auxiliary overclustering" ?

As you show in Fig. 2 of your paper, auxiliary overclustering can be used to improve cluster quality.
I read your code and noticed that you produce two dataloaders called "dataloaders_head_A" and "dataloaders_head_B".
I think you alternately use these two datasets for model training and share parameters between them.
But the model needs a fixed "num_classes", which means the parameters of the last few layers of the model cannot be shared.
Specifically, I want to use auxiliary overclustering for image clustering in a specific task of mine.
I think just creating another dataloader should be enough: if I just modify the num_classes of the model alternately during training on the same dataloader, can I achieve this "auxiliary overclustering"?

Namely, can you explain the basic steps of auxiliary overclustering?
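To make the question concrete, here is a sketch of what I understand the two-head setup to be (names like trunk_dim and overcluster_k are mine, not the repo's): one shared trunk with two permanent heads trained alternately, rather than one head whose num_classes changes:

import torch.nn as nn

class TwoHeadNet(nn.Module):
    def __init__(self, trunk, trunk_dim=512, gt_k=10, overcluster_k=50):
        super(TwoHeadNet, self).__init__()
        self.trunk = trunk  # shared feature extractor, maps (N, ...) -> (N, trunk_dim)
        self.head_A = nn.Sequential(nn.Linear(trunk_dim, overcluster_k),
                                    nn.Softmax(dim=1))  # auxiliary overclustering
        self.head_B = nn.Sequential(nn.Linear(trunk_dim, gt_k),
                                    nn.Softmax(dim=1))  # main clustering head

    def forward(self, x, head="B"):
        feats = self.trunk(x)
        return self.head_A(feats) if head == "A" else self.head_B(feats)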

Minimal tensorflow version

Hi, over the past few days I put together a minimal tensorflow version to study your information based clustering algorithm.

https://github.com/nathanin/IIC/tree/master/tensorflow

The MNIST example can be run in the tf_cluster.py script. It wants a directory called pointcloud to draw the outputs to.

The loss functions are tested to give nearly identical results in TensorFlow and PyTorch (see the readme there). Still, I'm having trouble training clustering as robust as your MNIST example. The differences would be in the conv-net architecture and the sub-head & auxiliary overclustering heads; unfortunately I'm not fluent in PyTorch and it's hard for me to parse the reference package. Currently I only have the MNIST example in place, with regular Python iteration to feed the data instead of a tf.data.Dataset. This results in somewhat slow training when image augmentations like flips and rotations are turned on.

I'll keep working on this and make updates to my fork linked above.

Best,

Evaluation for STL10

Hi, Thanks for sharing your work.

I had a question regarding the input size used for evaluation on STL10. Looking at the code in IID_semisup_STL10.py (698), the test data uses TenCrop evaluation with input image size = old_config.input_sz, which is 64x64 from the command used in 650.
Could you please confirm whether the numbers reported in Table 3 all use 64x64 as the input image size with TenCrop evaluation, including the supervised baseline for the Cutout networks?

About data splits/partitions (train/test)

Hi Xu,

A well-written paper! Thanks for the code as well.

I am trying to perform a timing analysis by loading a pretrained segmentation model. I have the following question w.r.t. the dataloader.

In the function segmentation_create_dataloaders(config), the train and test partitions use all the data (train/test/validation) for mode == 'IID', and IID seems to be the mode required by the code. Does this mean the entire dataset was used for training?

Thanks,
Kantha Girish

Negative loss for segmentation on custom data

Hi,
first of all, thank you for your great work.
I'm currently trying to set up the segmentation model for my custom dataset. The training script is running, but I'm receiving a negative loss and the model is not really converging (accuracy still around 0.5 after 100 epochs). Might that be an indication that something is wrong with my code?
Any help is greatly appreciated.
Thanks!

Understanding the 'no-transformation' case

I just had a question about the case where one uses NO transformation g for image clustering (i.e. x = x'). I am playing around with the mutual information loss and it appears to learn something even when trained without transformations. For starters, it learns to predict a one-hot distribution to minimize H(z|z'), but why then is it able to do better than random chance on the validation set? (10 classes, 18% accuracy after a few epochs.)

Could anyone help me understand why the network learns something when we maximize the mutual information between an image and the exact same image? Or is it more likely that there is a bug in my code? Thanks, and great work :)

Supplementary Material

Hi, first of all, thanks for the amazing work,

Sorry if this is not the place to ask this type of question; feel free to close the issue. Since you've mentioned the supplementary material in the paper quite a few times, I was wondering where I can find it.

Thanks.

Segmentation tasks

Hi Xu, thanks for the great code and really novel paper! I didn't understand the segmentation implementation too well - hoping you can clarify:

  • I want to know how to run your code for a segmentation dataset where I have 4 classes. Would I require 4 heads for this task?
  • Why does SegmentationNet10aTwoHead have two output heads with different sized channels? (Talking about output_k_A and output_k_B).
  • In your experience, does it take a long time to run the IID loss for, say, the COCO-Stuff 512 settings? It takes me roughly 7 seconds to compute the loss per batch for a 4-class dataset with batch=32 and image size 3x128x128 - I'm not sure if this is normal (my setup is a GTX 1080 Ti, CUDA 10, PyTorch 1.0.1).
    EDIT: The loss computation time was solved for me by removing the float here and by clipping the minimum value to EPS, using the following instead of this:
p_i_j = torch.clamp(p_i_j, min=EPS, max=10000)
p_i_mat = torch.clamp(p_i_mat, min=EPS, max=10000)
p_j_mat = torch.clamp(p_j_mat, min=EPS, max=10000)

lamb parameter

Have you investigated how lamb affects the performance of the model? Tuning it helps me get a better result, but I don't see anything about it in the paper.
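For context, my understanding of where lamb enters (modelled on the repo's IID_loss; the exact code may differ) is that it scales the marginal log-terms, so lamb > 1 pushes toward more uniform cluster usage:

# Hedged sketch of an IID-style loss with the lamb weighting.
import torch

def iid_loss(z, z_prime, lamb=1.0, eps=1e-8):
    # z, z_prime: (n, C) paired softmax outputs
    p = (z.unsqueeze(2) * z_prime.unsqueeze(1)).mean(dim=0)  # (C, C) joint
    p = ((p + p.t()) / 2).clamp(min=eps)                     # symmetrise
    pi = p.sum(dim=1).view(-1, 1)                            # marginals
    pj = p.sum(dim=0).view(1, -1)
    # lamb weights the marginal (entropy) terms relative to the joint term
    return -(p * (p.log() - lamb * pi.log() - lamb * pj.log())).sum()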

MNIST example does not work

I've been trying to run the unsupervised MNIST example. I took the command from here and adjusted dataset_root and out_root appropriately.

But when I run it, the program hangs, and to stop it I have to restart the computer (no response to ctrl-c/z). htop shows it's using one core for kernel calls only; the longest I've let it run is one hour. I tried looking into the code but haven't figured out what is going wrong. Is there an additional parameter one should change before running the command?

Side question: what are normal loss values when using the IIC loss function?

what's the explanation of datasets code

In the file code/datasets/segmentation/cocostuff.py, what operations are done with the datasets?

What is the difference between the following datasets, and where should I look to find out?
Coco10kFull
Coco10kFew
Coco164kFull
Coco164kFew
Coco164kCuratedFew
Coco164kCuratedFull

Unsupervised MNIST Image Clustering Accuracy

Thank you for sharing the source code for this great work!

I am trying to replicate your unsupervised image clustering results on the MNIST dataset. In the arXiv paper, the avg. and lowest-loss sub-head accuracies are reported to be 98.4% and 99.2% respectively. In one of your answers to an issue, you stated that the loss function goes down to -2.20, which is also what I obtained after running the training for a few hundred epochs. However, I was never able to reach 98% or 99% accuracy. After training from scratch five times, the best accuracy I obtained was around 97.87% on the training and test datasets for all heads.

Do you know what I might have done wrong? Did you get the accuracies reported in the paper with the same set of random transformations used in training?

Here is the command I used for training:

export CUDA_VISIBLE_DEVICES=0 && python -m code.scripts.cluster.cluster_greyscale_twohead --model_ind 685 --arch ClusterNet6cTwoHead --mode IID --dataset MNIST --dataset_root /root/IIC/dataset/MNIST --out_root /root/IIC/results --gt_k 10 --output_k_A 50 --output_k_B 10 --lamb_A 1.0 --lamb_B 1.0 --lr 0.0001 --num_epochs 3200 --batch_sz 700 --num_dataloaders 5 --num_sub_heads 5 --crop_orig --crop_other --tf1_crop centre_half --tf2_crop random --tf1_crop_sz 20 --tf2_crop_szs 16 20 24 --input_sz 24 --rot_val 25 --no_flip --head_B_epochs 2

And here is the result I got at the 606th epoch:

Starting e_i: 606
Model ind 685 epoch 606 head B batch: 0 avg loss -2.212163 avg loss no lamb -2.212163 time 2019-12-13 09:59:16.075421
Model ind 685 epoch 606 head B batch: 100 avg loss -2.227755 avg loss no lamb -2.227755 time 2019-12-13 09:59:43.266961
Model ind 685 epoch 606 head B batch: 200 avg loss -2.232833 avg loss no lamb -2.232833 time 2019-12-13 10:00:10.573869
Model ind 685 epoch 606 head B batch: 300 avg loss -2.246190 avg loss no lamb -2.246190 time 2019-12-13 10:00:37.024877
Model ind 685 epoch 606 head B batch: 400 avg loss -2.200957 avg loss no lamb -2.200957 time 2019-12-13 10:01:02.510503
Model ind 685 epoch 606 head B batch: 0 avg loss -2.246450 avg loss no lamb -2.246450 time 2019-12-13 10:01:30.403862
Model ind 685 epoch 606 head B batch: 100 avg loss -2.209442 avg loss no lamb -2.209442 time 2019-12-13 10:01:57.819133
Model ind 685 epoch 606 head B batch: 200 avg loss -2.216221 avg loss no lamb -2.216221 time 2019-12-13 10:02:25.090045
Model ind 685 epoch 606 head B batch: 300 avg loss -2.232495 avg loss no lamb -2.232495 time 2019-12-13 10:02:52.324494
Model ind 685 epoch 606 head B batch: 400 avg loss -2.208003 avg loss no lamb -2.208003 time 2019-12-13 10:03:20.211105
Model ind 685 epoch 606 head A batch: 0 avg loss -2.237204 avg loss no lamb -2.237204 time 2019-12-13 10:03:47.252569
Model ind 685 epoch 606 head A batch: 100 avg loss -2.212928 avg loss no lamb -2.212928 time 2019-12-13 10:04:14.799001
Model ind 685 epoch 606 head A batch: 200 avg loss -2.218388 avg loss no lamb -2.218388 time 2019-12-13 10:04:42.189923
Model ind 685 epoch 606 head A batch: 300 avg loss -2.255437 avg loss no lamb -2.255437 time 2019-12-13 10:05:09.242370
Model ind 685 epoch 606 head A batch: 400 avg loss -2.210399 avg loss no lamb -2.210399 time 2019-12-13 10:05:36.609692
Pre: time 2019-12-13 10:06:12.301906:
std: 2.8002994e-05
best_train_sub_head_match: [(0, 7), (1, 9), (2, 0), (3, 8), (4, 3), (5, 4), (6, 6), (7, 5), (8, 2), (9, 1)]
test_accs: [0.97877145, 0.97877145, 0.9787143, 0.9787143, 0.9787143]
train_accs: [0.97877145, 0.97877145, 0.9787143, 0.9787143, 0.9787143]
best_train_sub_head: 0
worst: 0.9787143
avg: 0.9787372
best: 0.97877145

Suitable for 3D segmentation

Hello, thanks for sharing these very nice results and code. It's up and running for some of my data now. I am kinda new to the field of Deep Learning and I was wondering if this method is suitable for 3D segmentation. Would it require too many modifications?

Thank you,

Giovanni

ModuleNotFoundError when running the code

Hello,

When I run the following command, I get the error "ModuleNotFoundError: No module named 'cluster'".

export CUDA_VISIBLE_DEVICES=0,1 && nohup python -m code.scripts.segmentation.segmentation --mode IID+ --dataset Potsdam --dataset_root /scratch/local/ssd/xuji/POTSDAM --model_ind 487 --arch SegmentationNet10a --num_epochs 4800 --lr 0.00001 --lamb 1.0 --num_sub_heads 1 --batch_sz 60 --num_dataloaders 1 --output_k 24 --gt_k 6 --input_sz 200 --half_T_side_sparse_min 0 --half_T_side_sparse_max 0 --half_T_side_dense 10 --include_rgb --no_sobel --jitter_brightness 0.1 --jitter_contrast 0.1 --jitter_saturation 0.1 --jitter_hue 0.1 --use_uncollapsed_loss --batchnorm_track > gnoded2_gpu01_m487_r1.out &

The results of reproducing this idea

Great work!
I referred to your code and paper and tried to reproduce the method; however, the results were not satisfactory.
First, I define three transformations in the __init__ method:
self.transform_tf1 = transforms.Compose([transforms.RandomCrop(20),
                                         transforms.Resize(32),
                                         custom_greyscale_to_tensor(include_rgb=False)])

self.transform_tf2 = transforms.Compose([transforms.RandomCrop(20),
                                         transforms.Resize(32),
                                         transforms.RandomHorizontalFlip(),
                                         transforms.ColorJitter(brightness=0.4, contrast=0.4,
                                                                saturation=0.4, hue=0.125),
                                         custom_greyscale_to_tensor(include_rgb=False)])

self.transform_tf3 = transforms.Compose([transforms.CenterCrop(20),
                                         transforms.Resize(32),
                                         custom_greyscale_to_tensor(include_rgb=False)])

Second, I apply these transformations in the __getitem__ method:

img1, target = self.all_data[index], self.all_labels[index]
img1 = Image.fromarray(img1)
if not self.mapping_test_dataloader_flag:   # build the head data
    img1_temp = []
    img2_temp = []
    img1_ = self.transform_tf1(img1)
    for i in range(config.num_loaders):
        img1_temp.append(img1_)
        img2_temp.append(self.transform_tf2(img1))
    return img1_temp, img2_temp
else:                                       # build the mapping data
    return self.transform_tf3(img1), target

Third, all other settings follow your code, including the optimizer, learning rate, etc. But in the training stage the best accuracy is low (at epoch 10, best acc = 15.14%), and the accuracy increases very slowly each epoch (about 0.5%).

Did I overlook any important settings? Looking forward to your reply.

Training with Potsdam dataset cuda out of memory

Hi,
I am training the segmentation code with the Potsdam dataset. When the model starts to iterate, GPU memory increases rapidly and it always runs out of memory.
I'm using 8 1080 Tis and my batch size is 2.
My training script is:
python -u -m code.scripts.segmentation.segmentation_twohead --mode IID --dataset Potsdam --dataset_root /mnt/lustre/lichuchen/lily/unsupervise/dataset/POTSDAM --model_ind 544 --arch SegmentationNet10aTwoHead --num_epochs 4800 --lr 0.000001 --lamb_A 1.0 --lamb_B 1.0 --num_sub_heads 1 --batch_sz 2 --num_dataloaders 1 --output_k_A 36 --output_k_B 6 --gt_k 6 --input_sz 200 --half_T_side_sparse_min 0 --half_T_side_sparse_max 0 --half_T_side_dense 5 --include_rgb --no_sobel --jitter_brightness 0.1 --jitter_contrast 0.1 --jitter_saturation 0.1 --jitter_hue 0.1 --use_uncollapsed_loss --batchnorm_track

Could you tell me the memory usage of your segmentation code?
Thanks~

Unsupervised Segmentation Loss - conceptual question

Hey, I was reading through the segmentation code and I noticed that you utilize the pixel labels in the loss through some sort of mask. Is this really unsupervised, or are you just using that as a prior or filter before computing the loss? Any insights would be appreciated, thanks a lot!
