
contrastive-clustering's People

Contributors

yunfan-li


contrastive-clustering's Issues

About the application in other fields

Dear Yunfan,

Thanks for your outstanding work! I have learned a lot from your paper, codes, and GitHub Issues.

In my research area, there are studies based on your work. However, according to my replication results, the clustering accuracy curve kept oscillating between high and very low values during the training process. I attribute this to the changes they made to your original loss function.

Disappointed by that, I applied your method directly to my dataset and was able to get good results. I was pleasantly surprised and grateful.

However, on my dataset the loss decreases as training proceeds, while the accuracy curve keeps falling from the very high values it starts at.

My data is characterized by a small image size, roughly 11x11, and a large number of channels, so I simply designed a three-layer CNN.

The accuracy curve follows the pattern shown in the attached ACC plot.

Could you help me figure out what the reason for this might be?

Thanks again.

Best wishes.

loss function

Is there no indicator $\mathbb{1}_{\{j \neq i\}}$ (as in SimCLR) in loss functions (2) and (5)?
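
For reference, the NT-Xent loss as written in the SimCLR paper does carry the indicator; a transcription of that form (not quoted from the CC paper) is

$\ell_{i,j} = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(\mathrm{sim}(z_i, z_k)/\tau)}$

so the question is whether the sums in (2) and (5) implicitly exclude the $k = i$ term.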

Typo in your dependencies

I think I spotted a typo in the dependencies section of the readme.
You are listing python twice. I think you mean pytorch, right?

Question about the baselines

Hello, a quick question: in the baseline comparison, how are the feature vectors for K-means obtained?

About loss backward and parameter update

Hello, and thank you for this excellent work.

I have a question about loss backpropagation and the projectors' parameter updates: the network contains an instance_projector and a cluster_projector in parallel, and each produces its own loss in the forward pass. You add the two losses together and then call loss.backward() once. Can a single gradient-descent step on the summed loss optimize both parallel projectors at the same time, and how does this differ from calling loss.backward() separately for each projector's loss?

Thanks.
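
To make the question concrete, a minimal sketch (toy modules, not the repository's network) checking that summing the two losses before a single backward() accumulates the same gradients as calling backward() on each loss in turn:

import torch
import torch.nn as nn

# Toy stand-ins for the shared backbone and the two parallel projectors.
backbone, head_a, head_b = nn.Linear(4, 4), nn.Linear(4, 2), nn.Linear(4, 3)
params = list(backbone.parameters()) + list(head_a.parameters()) + list(head_b.parameters())
x = torch.randn(8, 4)

def backbone_grads(separate):
    for p in params:
        p.grad = None
    h = backbone(x)
    loss_a, loss_b = head_a(h).pow(2).mean(), head_b(h).pow(2).mean()
    if separate:
        loss_a.backward(retain_graph=True)  # gradients accumulate into .grad
        loss_b.backward()
    else:
        (loss_a + loss_b).backward()        # single backward on the summed loss
    return [p.grad.clone() for p in backbone.parameters()]

# Backpropagation is linear in the loss, so both schemes give identical gradients.
assert all(torch.allclose(g1, g2) for g1, g2 in zip(backbone_grads(False), backbone_grads(True)))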

About the row-level interpretation in Fig. 1

Hello Yunfan!

I'd like to ask about Figure 1 of your paper. We usually think of contrastive learning as pulling together different augmentations of the same sample and pushing everything else apart, and the core of this paper is treating the label as a representation to achieve cluster-level contrastive learning. The cluster-level feature matrix comes out of a softmax, so it is a soft label, and the paper contrasts the columns of that matrix. The more I look at it, the more elegant it is!

But the row level is just ordinary contrastive learning, and the rows of the feature matrix are simply each instance's representation. Why, then, does Figure 1 also treat the Instance Representation as a soft label? This appears to conflict with the framework figure, which shows that a row of the feature matrix is the instance's own feature, with no row-level soft label of the kind Figure 1 depicts. Or have I misunderstood something? I hope you can clarify!

Imagenet-10

Thanks for sharing information about how to use different datasets.

How can I use the datasets info to download Imagenet-10?
Do I need to download the entire Imagenet dataset and put the required classes in a PyTorch dataset format?
Which year should I be downloading?
Any more information would be appreciated so that I can try reproducing the results.
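
In case it helps, a sketch of how the subset could be assembled once the full ILSVRC2012 training images are available; the synset IDs and paths below are placeholders, not the official ImageNet-10 class list:

import shutil
from pathlib import Path

SYNSETS = ["n02056570", "n02085936"]        # placeholder WordNet IDs; substitute the ImageNet-10 classes
src = Path("/data/ilsvrc2012/train")        # assumed location of the full ImageNet train split
dst = Path("/data/imagenet-10/train")       # target ImageFolder-style layout

for wnid in SYNSETS:
    (dst / wnid).mkdir(parents=True, exist_ok=True)
    for img in (src / wnid).glob("*.JPEG"):  # copy every image of the selected class
        shutil.copy(img, dst / wnid / img.name)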

Imbalanced Dataset

Hi,

Thank you for this implementation. It is my understanding that some contrastive frameworks build upon entropy maximization, which leads to inapplicability in the contexts of imbalanced datasets. From the paper, I could see that you are also maximizing the entropy in your loss function. Can the instance-level term mitigate the entropy maximization issue and make the method suitable for imbalanced datasets?

Thanks

Cluster-level and instance-level

Hello, the paper mentions that decoupling the representation into instance-level and cluster-level components gives better results. How should the relationship between the instance level and the cluster level be understood? Are the two independent?


About running the code

Hi, I'd like to ask: I'm on Windows, running train.py in PyCharm, and the only output is "Files already downloaded and verified" printed twice. What is going on? I want to run several datasets to compare against your method; could you provide some help?

Dedicated GPU memory fills up and batch size cannot reach 256

Hello, I'd like to ask: the paper says you used an Nvidia TITAN RTX 24G, and the batch size reaches 256 with both resnet34 and resnet50. When I try to reproduce resnet50 on a 24G RTX 3090 with batch size 256, I get CUDA out of memory. Is there any way to work around this? Thanks!
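
Not the authors' recipe, just one common workaround: running the forward/backward pass under automatic mixed precision usually lowers activation memory noticeably. A rough sketch (the loop only loosely mirrors the repository's train.py; names such as criterion_instance are assumptions):

import torch

scaler = torch.cuda.amp.GradScaler()

for step, ((x_i, x_j), _) in enumerate(data_loader):        # data_loader from the existing setup
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                         # half-precision forward pass
        z_i, z_j, c_i, c_j = model(x_i.cuda(), x_j.cuda())
        loss = criterion_instance(z_i, z_j) + criterion_cluster(c_i, c_j)
    scaler.scale(loss).backward()                           # scaled backward to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()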

Can I ditch feature extraction and use the method

Hello, I already have features extracted by a pre-trained network. Can I skip the feature-extraction backbone and learn representations from these features with the ICH module directly? (Maybe a slightly strange question.)
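
To make the question concrete, a hypothetical sketch (not code from this repository): feed the precomputed feature vectors into a small head and train only that head with the contrastive losses.

import torch.nn as nn

feature_dim, class_num = 512, 10            # assumed sizes of the precomputed features / clusters

# Hypothetical head applied directly to precomputed features instead of ResNet activations.
cluster_head = nn.Sequential(
    nn.Linear(feature_dim, feature_dim),
    nn.ReLU(),
    nn.Linear(feature_dim, class_num),
    nn.Softmax(dim=1),
)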

ImageNet 10

Hi,

I could not find a description of the class distribution of ImageNet-10 or the data augmentation you used in the paper or the code, and I get poor accuracy on this dataset.

Could you describe these in more detail, or let me know if I am getting something wrong?

About the computation that avoids trivial solutions

Hi Yunfan,

I noticed that your paper computes an entropy term, and I have a question about how P is computed:

The features produced by the softmax already sum to 1, so why does Equation (6) additionally normalize the cluster-level matrix with a 1-norm when computing P?

Thanks for clearing this up.
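
A small numeric sketch of the point in question (not the repository's code):

import torch

Y = torch.softmax(torch.randn(5, 3), dim=1)   # 5 samples, 3 clusters; each ROW sums to 1
col_mass = Y.sum(dim=0)                       # per-cluster COLUMN sums; these total N = 5, not 1
P = col_mass / col_mass.sum()                 # the 1-norm normalization turns them into a distribution
print(col_mass.sum().item(), P.sum().item())  # ~5.0 and ~1.0

So the rows sum to 1, but the column sums that enter the entropy do not, which is presumably what the extra normalization is for; confirmation would be appreciated.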

Instance Loss output different from NT Xent loss

Hey,

I found a difference between the output of the InstanceLoss implemented here and the NT-Xent loss taken from SimCLR (https://github.com/Spijkervet/SimCLR/blob/master/simclr/modules/nt_xent.py).

Although the functions look very similar, the outputs seem to be different. Could you please look into it and share your insights?

import torch
import torch.nn as nn
import math


class InstanceLoss(nn.Module):
    def __init__(self, batch_size, temperature, device):
        super(InstanceLoss, self).__init__()
        self.batch_size = batch_size
        self.temperature = temperature
        self.device = device

        self.mask = self.mask_correlated_samples(batch_size)
        self.criterion = nn.CrossEntropyLoss(reduction="sum")

    def mask_correlated_samples(self, batch_size):
        N = 2 * batch_size
        mask = torch.ones((N, N))
        mask = mask.fill_diagonal_(0)
        for i in range(batch_size):
            mask[i, batch_size + i] = 0
            mask[batch_size + i, i] = 0
        mask = mask.bool()
        return mask

    def forward(self, z_i, z_j):
        N = 2 * self.batch_size
        z = torch.cat((z_i, z_j), dim=0)

        sim = torch.matmul(z, z.T) / self.temperature
        sim_i_j = torch.diag(sim, self.batch_size)
        sim_j_i = torch.diag(sim, -self.batch_size)

        positive_samples = torch.cat((sim_i_j, sim_j_i), dim=0).reshape(N, 1)
        negative_samples = sim[self.mask].reshape(N, -1)

        labels = torch.zeros(N).to(positive_samples.device).long()
        logits = torch.cat((positive_samples, negative_samples), dim=1)
        loss = self.criterion(logits, labels)
        loss /= N

        return loss


class NT_Xent(nn.Module):
    """
    More than inspired from https://github.com/Spijkervet/SimCLR/blob/master/modules/nt_xent.py

    Notes
    =====

    Using this pytorch implementation, you don't actually need to l2-norm the inputs, the results will be
    identical, as shown if you run this file.
    """

    def __init__(self, batch_size, temperature, device):
        super(NT_Xent, self).__init__()
        self.batch_size = batch_size
        self.temperature = temperature
        self.mask = self.get_correlated_samples_mask()
        self.device = device

        self.criterion = nn.CrossEntropyLoss(reduction="sum")
        self.similarity_f = nn.CosineSimilarity(dim=2)

    def forward(self, z_i, z_j):
        """
        We do not sample negative examples explicitly.
        Instead, given a positive pair, similar to (Chen et al., 2017), we treat the other 2(N − 1) augmented examples within a minibatch as negative examples.
        """

        p1 = torch.cat((z_i, z_j), dim=0)
        sim = self.similarity_f(p1.unsqueeze(1), p1.unsqueeze(0)) / self.temperature

        sim_i_j = torch.diag(sim, self.batch_size)
        sim_j_i = torch.diag(sim, -self.batch_size)

        positive_samples = torch.cat((sim_i_j, sim_j_i), dim=0).reshape(self.batch_size * 2, 1)
        negative_samples = sim[self.mask].reshape(self.batch_size * 2, -1)

        labels = torch.zeros(self.batch_size * 2).to(self.device).long()
        logits = torch.cat((positive_samples, negative_samples), dim=1)
        loss = self.criterion(logits, labels)
        loss /= 2 * self.batch_size
        return loss

    def get_correlated_samples_mask(self):
        mask = torch.ones((self.batch_size * 2, self.batch_size * 2), dtype=bool)
        mask = mask.fill_diagonal_(0)
        for i in range(self.batch_size):
            mask[i, self.batch_size + i] = 0
            mask[self.batch_size + i, i] = 0
        return mask



a, b = torch.rand(8, 12), torch.rand(8, 12)
a_norm, b_norm = torch.nn.functional.normalize(a), torch.nn.functional.normalize(b)
cosine_sim = torch.nn.CosineSimilarity()
instance_loss = InstanceLoss(8, 0.5, "cpu")
ntxent_loss = NT_Xent(8, 0.5, "cpu")
print('Cosine')
print(cosine_sim(a, b))
print(cosine_sim(a_norm, b_norm))
print('NT Xent')
print(ntxent_loss(a, b))
print(ntxent_loss(a_norm, b_norm))
print('Instance')
print(instance_loss(a, b))
print(instance_loss(a_norm, b_norm))

Output:

Cosine
tensor([0.6606, 0.7330, 0.7845, 0.8602, 0.6992, 0.8224, 0.7167, 0.7500])
tensor([0.6606, 0.7330, 0.7845, 0.8602, 0.6992, 0.8224, 0.7167, 0.7500])

NT Xent
tensor(2.7081)
tensor(2.7081)

Instance
tensor(3.1286)
tensor(2.7081)

As you can see, InstanceLoss gives a different result when fed a_norm and b_norm, whereas the others don't.

Colab notebook:
https://github.com/Spijkervet/SimCLR/blob/master/simclr/modules/nt_xent.py
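
For what it is worth, the gap seems to come from InstanceLoss using raw dot products while NT_Xent uses cosine similarity; with l2-normalized inputs the two agree, consistent with the output above:

import torch.nn.functional as F

# With l2-normalized inputs, the dot products in InstanceLoss become cosine similarities,
# so the result matches NT_Xent on the raw tensors.
print(instance_loss(F.normalize(a, dim=1), F.normalize(b, dim=1)))   # tensor(2.7081)
print(ntxent_loss(a, b))                                             # tensor(2.7081)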

RuntimeError: numel: integer multiplication overflow

Hi, when I run this code on our own dataset, I often get:

sim_i_j = torch.diag(sim, self.batch_size)
RuntimeError: numel: integer multiplication overflow

This happens in contrastive_loss.py. Have you ever run into it?
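
One possible cause, offered as an assumption rather than a confirmed diagnosis: if the last mini-batch of the custom dataset is smaller than batch_size, sim becomes smaller than the precomputed mask expects and torch.diag is asked for a diagonal that does not exist. Dropping incomplete batches rules this out:

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1000, 3, 32, 32))   # stand-in for the custom dataset
# drop_last=True keeps every batch exactly batch_size samples, matching the loss's mask.
loader = DataLoader(dataset, batch_size=256, shuffle=True, drop_last=True)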

Plans for release supplementary materials

Hello, you mention that some experiments are in your supplementary material. Do you have any plans to release it? Thanks for your reply.

Online clustering

Hello, the paper mentions online clustering. How does the method perform clustering online?
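
To make the question concrete, a sketch of how assignment of an incoming sample might look (the attribute names resnet and cluster_projector are assumptions about the repository's Network class):

import torch

@torch.no_grad()
def predict_cluster(model, x):
    h = model.resnet(x)                 # shared backbone features
    c = model.cluster_projector(h)      # soft assignment (softmax over clusters)
    return torch.argmax(c, dim=1)       # hard cluster label, no extra k-means step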

About ACC compared with classification

Hello, I have a question about the reported ACC, e.g., on CIFAR-10 and CIFAR-100: is it task-dependent? Clustering can also be regarded as a classification task, yet contrastive learning already reaches very high classification accuracy on these datasets (e.g., SimCLR's classification ACC), and I don't understand the reason for the gap. I hope you can explain. Thanks!

About the implementation of the baseline algorithms

Hello, I am very interested in your work on CC.
In the experiments section of the paper, how were the baseline algorithms K-means and SC implemented?
Could you briefly describe how all of the baselines were implemented? I would like to reproduce the comparison experiments on my own dataset.

New dataset format

Hello,

I'd like to train this model on new image datasets. What should the file structure of the input dataset be? Since this is unsupervised, will it still follow the usual ImageFolder format, i.e.

/data
-> train
   -> class_1
      -> img1.png
      -> img2.png
      ...
   -> class_2
-> val
   -> class_1
      -> img1.png
      -> img2.png
      ...
   -> class_2

Can you please shed some light on this?

Also, if we are experimenting with new data, should we use just training data or train and test data both?
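
For context, a sketch of how such a folder could be loaded (the paths and transform are placeholders, and the repository's paired augmentation would be substituted for the transform):

from torch.utils import data
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])  # placeholder transform
train_dataset = datasets.ImageFolder("/data/train", transform=transform)
val_dataset = datasets.ImageFolder("/data/val", transform=transform)

# The class subfolders are only needed so ImageFolder can enumerate files;
# the labels they imply are not used during unsupervised training.
dataset = data.ConcatDataset([train_dataset, val_dataset])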

issue

Hi,
when I test on 'ImageNet-dogs', I get the error "IndexError: index 14 is out of bounds for axis 0 with size 1", and when I print the cost_matrix in 'evaluation.py' I only get [[0.]].
Is it because I trained for too few epochs?

Thanks.

cifar10 and cifar100

Hello! Are the cifar10 and cifar100 datasets available in .mat form? Also, can the cifar10 and cifar100 datasets in the code be trained with a fully connected network?

Clustering of images

Hey,
I have one question.
Is it possible to figure out which input image belongs to which cluster during inference, if one wants to pick images from one or two clusters?
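
Concretely, something along these lines (a sketch; resnet and cluster_projector are guesses at the attribute names, and the loader is assumed to use shuffle=False so the indices line up):

import torch

@torch.no_grad()
def indices_in_clusters(model, loader, wanted=(0, 1)):
    picked, offset = [], 0
    for x, _ in loader:
        c = model.cluster_projector(model.resnet(x.cuda()))    # soft cluster assignments
        pred = torch.argmax(c, dim=1).cpu().tolist()           # hard labels per image
        picked += [offset + i for i, p in enumerate(pred) if p in wanted]
        offset += x.size(0)
    return picked                                              # dataset indices of images in the chosen clusters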

weight_decay

Why is the parameter 'weight_decay' set to 0 in the code?

Data Augmentation

Hello, I have two questions.
1. Does each epoch need a fresh round of data augmentation, and is the same augmentation scheme used every time?
2. My own dataset is quite small, only a few hundred samples, but the dimensionality is close to ten thousand. Since it is not image data, I build positive pairs by adding two different noises to each sample in every epoch (see the sketch below), but training is very unstable. Is there any way to improve this?
Thanks!
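
For reference, a sketch of the pair-construction scheme described in question 2 (sigma is a placeholder):

import torch

def two_noisy_views(x, sigma=0.01):
    # Positive pair = the same sample plus two independent Gaussian noises.
    return x + sigma * torch.randn_like(x), x + sigma * torch.randn_like(x)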

Test metrics don't match

Hi, thanks for your work and for sharing the code. I ran cluster.py with the latest version of the code, which I just downloaded.

I changed start_epoch to 1000 to load the trained model from the save folder, set image_size to 32, and ran it on CIFAR-10. The results are as follows:

Step [0/120] Computing features...
Step [20/120] Computing features...
Step [40/120] Computing features...
Step [60/120] Computing features...
Step [80/120] Computing features...
Step [100/120] Computing features...
Features shape (60000,)
'NMI = 0.0201 ARI = 0.0099 F = 0.2395 ACC = 0.1421'

Nothing else was changed. How can this be resolved?

About STL-10

Hello, which network was used when training on STL-10, ResNet-34 or ResNet-50?

Idea

This idea is very similar to this paper 'Deep Robust Clustering by Contrastive Learning'.

Question about the resnet model's parameters

Hi, I have been studying your code fairly carefully and would like to ask the following:

Looking at train.py, the optimizer only optimizes the network's parameters; the resnet model's parameters do not seem to be included. Is this a problem?
Because, looking further back, the resnet.get_resnet function calls code in the modules directory, so it is not a pre-trained model.

    res = resnet.get_resnet(args.resnet)
    model = network.Network(res, args.feature_dim, class_num)
    model = model.to('cuda')
    # optimizer / loss
    optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate, weight_decay=args.weight_decay)
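
A quick sanity check (assuming Network registers the backbone passed into it as a submodule): every parameter of res should also appear in model.parameters(), in which case the optimizer does update the ResNet.

# Sanity check: the backbone's parameters are a subset of the optimized parameters.
backbone_ids = {id(p) for p in res.parameters()}
optimized_ids = {id(p) for p in model.parameters()}
print(backbone_ids <= optimized_ids)   # expected: True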

More details on conclusion

Hey!
The paper is extremely well written but could you give some more info on the following part -

The proposed CC shows its promising performance in clustering. In the future, we plan to extend it to other tasks and applications such as semi-supervised learning and transfer learning.

This was mentioned in the conclusion.
Is this a new area of research, or are there applications other than clustering that can be done with this existing repository?
Can you please give some more insights about this part?

Questions about multi-GPU training

Thanks for sharing such excellent work!
Are there any plans to provide multi-GPU training scripts?
I would be very grateful if you could provide them!
Thanks!
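
In the meantime, a minimal single-node stopgap is plain DataParallel (not an official script from this repository; model is the Network instance built as in train.py):

import torch
import torch.nn as nn

# model is assumed to be the Network instance from train.py.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)   # replicates the model and splits each batch across GPUs
model = model.to("cuda")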

About the dataset concatenation

Excellent work! We are grateful that the code has been released for study.
I have a question about the creation of the dataset:
dataset = data.ConcatDataset([train_dataset, test_dataset])
I guess here both train and test sets are used for training and testing. I can understand that in the task of unsupervised clustering, the true labels are invisible, so all data can be used for training. But I am wondering if this is a standard usage or definition in the field of "deep clustering", or I got a wrong understanding? Thanks~

Some questions about the softmax output

Hi, I assume you speak Chinese, so I originally wrote this in Chinese:

Here is the situation: I trained on my own data, and during inference I added logging at the softmax output and wrote it to disk. After sorting, I found that the largest output is not in a "winner-takes-all" state; there are even ties for the maximum value. In this situation, can the index obtained with argmax be used directly as the class label?
It may be that I am using the method incorrectly; I would appreciate your guidance.

Excellent work, thanks for sharing!

Low Accuracy and NMI

Hi, Thank you for sharing your code. I would like to reproduce your results on CIFAR-10, I ran your original code with 4 GPUs and the results are attached below. My final ACC (NMI) after 990 epochs is about 69% (64%). Did you use any special method and/or hyperparameters for training your network which is not uploaded on Github? I would appreciate it if you could help me to reproduce your results.
result_batch_256.txt

Loss magnitude

I ran it on my own dataset; the loss drops from 7.2 to 5.8 and then converges. Tuning the learning rate and batch size did not help. Is this amount of loss decrease normal?

Cluster Assignment Entropy

Hey Yunfan,

first of all, this is really great work and a well written paper! Thanks for providing the code.

I am trying to reimplement your method and am a bit confused about the way you penalize the cluster assignment matrix. In your code you do

p_i = c_i.sum(0).view(-1)
p_i /= p_i.sum()
ne_i = math.log(p_i.size(0)) + (p_i * torch.log(p_i)).sum()
p_j = c_j.sum(0).view(-1)
p_j /= p_j.sum()
ne_j = math.log(p_j.size(0)) + (p_j * torch.log(p_j)).sum()
ne_loss = ne_i + ne_j

This gives a loss of 1.97 for the example cluster assignment matrix Y with 3 clusters and 2 samples + 2 augmented samples below

Y = torch.Tensor([[0.98, 0.01, 0.01],
                   [0.98, 0.01, 0.01],
                   [0.98, 0.01, 0.01],
                   [0.98, 0.01, 0.01]])

c_i = Y[:2]
c_j = Y[2:]

Your code seems to differ quite a lot from how you write it in the paper. According to your paper I would have done the following

Y_one_norm = torch.linalg.norm(Y, ord=1)
c_i = c_i.sum(dim=0)/Y_one_norm
c_j = c_j.sum(dim=0)/Y_one_norm
ne_loss = (c_i*torch.log(c_i)+c_j*torch.log(c_j)).sum()

which gives a loss of -1.20.
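
Two details worth noting here: for a 2-D tensor, torch.linalg.norm(Y, ord=1) returns the induced matrix 1-norm (the maximum absolute column sum) rather than the entry-wise sum, and the repository's ne_i equals log(K) minus the Shannon entropy of p_i, so the log(K) term only shifts the value by a constant:

import torch

Y = torch.full((4, 3), 1.0 / 3)
print(torch.linalg.norm(Y, ord=1))   # tensor(1.3333): max column sum, the induced matrix 1-norm
print(Y.abs().sum())                 # tensor(4.): entry-wise 1-norm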

Could you kindly let me know your intention of implementing the loss the way you did and why it seems to differ from the maths in your paper?

Thanks!

Lukas

t-SNE plotting utils

Hey,

Thanks for sharing the code and writing such a good paper!
Can you give me some more info on how I can use your code to generate the t-SNE plots? I do not see any plot utilities in the code that you have shared. Thanks!
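
For anyone looking for the same thing, a sketch of a plotting helper (not from this repository; features and labels are assumed to come from a feature-extraction pass like the one in cluster.py):

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features, labels, out_path="tsne.png"):
    # features: (N, D) array of embeddings; labels: (N,) cluster assignments.
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)
    plt.figure(figsize=(6, 6))
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=2, cmap="tab10")
    plt.axis("off")
    plt.savefig(out_path, dpi=200)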
