
Code for our paper "Dual Contrastive Learning: Text Classification via Label-Aware Data Augmentation"

Home Page: https://arxiv.org/abs/2201.08702

License: MIT License

Topics: contrastive-learning, text-classification, transformers, bert, deep-learning, neural-networks, natural-language-processing

dual-contrastive-learning's Introduction

Dual-Contrastive-Learning


A PyTorch implementation for our paper "Dual Contrastive Learning: Text Classification via Label-Aware Data Augmentation".

You can download the paper via arXiv: https://arxiv.org/abs/2201.08702 (also indexed on PapersWithCode).

One-Sentence Summary

This paper proposes a novel contrastive learning framework for supervised classification tasks by simultaneously learning the features of input samples and the parameters of classifiers in the same space.

(Figure: overview of the DualCL method.)

Abstract

Contrastive learning has achieved remarkable success in representation learning via self-supervision in unsupervised settings. However, effectively adapting contrastive learning to supervised learning tasks remains a challenge in practice. In this work, we introduce a dual contrastive learning (DualCL) framework that simultaneously learns the features of input samples and the parameters of classifiers in the same space. Specifically, DualCL regards the classifier parameters as augmented samples associated with different labels and then exploits contrastive learning between the input samples and these augmented samples. Empirical studies on five benchmark text classification datasets and their low-resource versions demonstrate improvements in classification accuracy and confirm DualCL's capability of learning discriminative representations.
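To make the idea concrete, below is a minimal sketch of the dual objective in PyTorch. It assumes each sample yields a feature z and a matrix of label-aware features theta (one row per class), and instantiates both contrastive terms with a standard InfoNCE form; the tensor names, shapes, and exact positive/negative construction are assumptions for illustration, not the repository's actual code.

import torch
import torch.nn.functional as F

def info_nce(sim, pos_mask):
    # InfoNCE with possibly multiple positives per anchor; anchors that
    # have no positive in the batch are skipped.
    B = sim.size(0)
    diag = torch.eye(B, dtype=torch.bool, device=sim.device)
    logits = sim.masked_fill(diag, float('-inf'))             # drop self-pairs
    log_prob = logits - logits.logsumexp(dim=1, keepdim=True)
    pos_mask = pos_mask & ~diag
    has_pos = pos_mask.any(dim=1)
    pos_sum = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    return -(pos_sum[has_pos] / pos_mask.sum(dim=1)[has_pos]).mean()

def dual_contrastive_loss(z, theta, labels, tau=0.1):
    # z: (B, D) sample features; theta: (B, K, D) label-aware features;
    # labels: (B,) ground-truth class indices.
    B = z.size(0)
    z = F.normalize(z, dim=-1)
    # theta_j^{y_j}: each sample's feature row for its own true class
    theta_y = F.normalize(theta[torch.arange(B), labels], dim=-1)
    pos = labels.unsqueeze(0) == labels.unsqueeze(1)          # same-label pairs
    loss_z = info_nce(z @ theta_y.t() / tau, pos)             # anchor: z_i
    loss_theta = info_nce(theta_y @ z.t() / tau, pos)         # anchor: theta_i^{y_i}
    return loss_z + loss_theta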

Requirements

  • Python = 3.7
  • torch = 1.11.0
  • numpy = 1.17.2
  • transformers = 4.19.2

Preparation

Clone

git clone https://github.com/hiyouga/Dual-Contrastive-Learning.git
cd Dual-Contrastive-Learning

Create an anaconda environment:

conda create -n dualcl python=3.7
conda activate dualcl
pip install -r requirements.txt

Usage

python main.py --method dualcl

Citation

If this work is helpful, please cite as:

@article{chen2022dual,
  title={Dual Contrastive Learning: Text Classification via Label-Aware Data Augmentation},
  author={Qianben Chen and Richong Zhang and Yaowei Zheng and Yongyi Mao},
  journal={arXiv preprint arXiv:2201.08702},
  year={2022}
}

Contact

hiyouga [AT] buaa [DOT] edu [DOT] cn

License

MIT

dual-contrastive-learning's People

Contributors

chenqianben, hiyouga


dual-contrastive-learning's Issues

Some questions with baselines

Your work is very good and effective, but I have some questions about the baseline approaches. I tried different hyperparameters to fine-tune BERT with supervised contrastive learning or unsupervised contrastive learning and then classify, but I have never been able to do better than plain cross-entropy. I wonder what I failed to take into account? Many papers report that contrastive learning helps classification, yet here I always get the opposite. Could you share the hyperparameters you used when running the comparison?
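Not the authors' answer, but one thing often worth checking: in much of the literature the contrastive term supplements the cross-entropy head rather than replacing it. A hedged sketch of that combination (the 0.1 weight and all names are assumptions, not the authors' settings):

import torch.nn.functional as F

def combined_loss(logits, features, labels, supcon_loss, lam=0.1):
    # Keep the CE head and add a weighted contrastive term; supcon_loss is
    # any supervised contrastive loss, lam is a tunable weight.
    return F.cross_entropy(logits, labels) + lam * supcon_loss(features, labels)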

Some logical problems

Using the contrastive loss computation in the source code, the calculated loss may be negative. My inputs have shapes (batch_size × dim), (batch_size × class_num × dim), and (class_num,), and L_z and L_θ may both be negative at the same time.
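For reference, each per-anchor InfoNCE term is minus a log-softmax probability and therefore non-negative by construction; a quick self-contained check of that property:

import torch
import torch.nn.functional as F

sim = torch.randn(8, 8) / 0.1                    # arbitrary similarity logits
log_prob = F.log_softmax(sim, dim=1)             # log-probabilities are <= 0
pos = torch.randint(0, 8, (8,))                  # arbitrary positive indices
terms = -log_prob[torch.arange(8), pos]          # per-anchor InfoNCE terms
assert (terms >= 0).all()                        # so each term is >= 0

If L_z and L_θ both come out negative, the way positives are masked and averaged in the local copy of the loss is worth inspecting first.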

Why shuffle the order of labels when using DualCL?

It's great work, but I have a question: why does DualCL deliberately shuffle the order of the labels? This operation does not change the true label in binary classification, but it does change it in multi-class classification. I don't understand the significance of this.

In fact, I followed this setup and trained on my own dataset, a binary classification task like dialogue intention recognition, for 30 epochs using RoBERTa, with very poor results. Is DualCL not suitable for this kind of task? I hope you can point out my misunderstanding.
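One reading of the shuffle is that the order of the label tokens is permuted while the target index is remapped consistently, so the ground-truth label never actually changes; a sketch of that invariant (names are hypothetical, not the repo's API):

import torch

def shuffle_label_tokens(label_token_ids, target):
    # Permute the order in which the K label tokens are prepended to the
    # input, and remap the target index so the true label is preserved.
    perm = torch.randperm(len(label_token_ids))
    shuffled = [label_token_ids[i] for i in perm]
    new_target = (perm == target).nonzero(as_tuple=True)[0].item()
    return shuffled, new_target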

Issue regarding the evaluation procedure

Hi, thank you for your exciting work. I've noticed a potential problem with the evaluation procedure. To the best of my knowledge, the best model is currently selected based on the test data. This is not desirable, since in real conditions it is not possible to choose the model based on the test data. Beyond making the reported numbers hard to compare, this invites overfitting: although the test data is not used for gradient updates, the model is chosen by its test performance, so we have no way of knowing whether the proposed model is simply better at leaking information through model selection. As an extreme case, if you randomly guess enough times on the test set, you can reach 100%. That is generally why prior works use a validation split.
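For contrast, the standard protocol selects the checkpoint on a held-out validation split and touches the test set exactly once; a minimal sketch (model, loaders, and the training/evaluation helpers are hypothetical names, not the repo's functions):

import copy

best_val_acc, best_state = 0.0, None
for epoch in range(num_epochs):
    train_one_epoch(model, train_loader)
    val_acc = evaluate(model, val_loader)    # held-out split, never the test set
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        best_state = copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)
test_acc = evaluate(model, test_loader)      # evaluated once, at the very end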

About Chinese datasets

Hello author,
if the model runs on a Chinese dataset, what parts need to be modified and what should be paid attention to?
Thank you!
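A minimal starting point would be swapping the backbone and tokenizer for Chinese ones and making sure the label words fed to the model are Chinese tokens; the checkpoint name below is one example, not the repository's configuration:

from transformers import AutoModel, AutoTokenizer

# Replace the English backbone with a Chinese checkpoint; the label words
# prepended to the input must be re-specified in Chinese as well.
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
backbone = AutoModel.from_pretrained("bert-base-chinese")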

t-SNE plot visualization

Hi there!
I think this code and the paper are awesome! When I run the code, I can see the accuracy increasing.

But I also want to see how the class representations and the sentence feature representations move. Could you please upload the t-SNE visualization code to GitHub as well?

Have a good day.
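Until that lands, a generic scikit-learn recipe is enough to eyeball the space; features and labels stand for whatever arrays you extract from the model (placeholder names, not the repo's API):

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Project (N, D) features to 2-D and color the points by class label.
emb2d = TSNE(n_components=2, perplexity=30).fit_transform(features)
plt.scatter(emb2d[:, 0], emb2d[:, 1], c=labels, s=5, cmap="tab10")
plt.savefig("tsne.png")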

Problem when saving the model!

Hi, thank you for your exciting work.
When I try to save the model, I get this error:

'Transformer' object has no attribute 'save_pretrained'

How can I save the model after training it with your code? In fact, I want to save the model and upload it to Hugging Face so that I can load and use it later.
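The training wrapper is a plain torch module rather than a Hugging Face model, so one option is saving its state dict directly; if the Hugging Face backbone is exposed as an attribute, it can also be exported in Hub format (the attribute name below is an assumption about the wrapper's internals):

import torch

# Option 1: save the whole wrapper's weights as a plain checkpoint.
torch.save(model.state_dict(), "dualcl.pt")

# Option 2: export only the Hugging Face backbone for upload to the Hub
# ("encoder" is a hypothetical attribute name).
model.encoder.save_pretrained("dualcl-bert")
tokenizer.save_pretrained("dualcl-bert")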

Why did the dual loss gradient collapse on my own Chinese dataset?

Dear author, your framework works on English datasets, but when I used the dual contrastive loss on my own Chinese dataset, gradient collapse occurred. My Chinese labels are two characters each; could this be the cause? Or do I need to adjust something else? Thank you very much; I look forward to hearing from you soon.
