Using the loss computation in the source code, the calculated loss can be negative. My inputs have shapes (batch_size × dim), (batch_size × class_num × dim), and (class_num), and Lz and Lθ are sometimes both negative at the same time.
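For reference, an InfoNCE-style contrastive term is the negative log of a softmax probability, so it cannot go below zero; a negative Lz or Lθ therefore usually points to a differing sign or averaging convention rather than expected behavior. A minimal sketch (names and shapes are illustrative, not the repo's actual code):

```python
import numpy as np

def info_nce(anchor, candidates, pos_idx, temperature=0.1):
    # anchor: (batch, dim); candidates: (batch, n, dim); pos_idx: (batch,)
    logits = np.einsum("bd,bnd->bn", anchor, candidates) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # -log p(positive) with p <= 1, so every term (and the mean) is >= 0
    return -log_probs[np.arange(len(pos_idx)), pos_idx].mean()

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 8))
c = rng.normal(size=(4, 5, 8))
pos = rng.integers(0, 5, size=4)
loss = info_nce(a, c, pos)
print(loss >= 0)  # True: the negative log of a probability is non-negative
```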
Hi, thank you for your exciting work.
When I want to save the model, I get this error:
`Transformer` object has no attribute `save_pretrained`
How can I save the model after training it with your code?
In fact, I want to save the model and upload it to the Hugging Face Hub so that I can load and use it later.
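As a possible workaround in the meantime, plain PyTorch checkpointing works for any `nn.Module`, including a custom wrapper without `save_pretrained`. A sketch (the `nn.Linear` stands in for the repo's `Transformer` wrapper, and the `.bert` attribute name is a guess):

```python
import torch
import torch.nn as nn

# Dummy stand-in for the repo's Transformer wrapper (illustrative only).
model = nn.Linear(4, 2)

# state_dict-based saving needs no save_pretrained method.
torch.save(model.state_dict(), "dualcl_checkpoint.pt")

# Later: rebuild the same architecture, then restore the weights.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load("dualcl_checkpoint.pt"))

# If the wrapper exposes the inner HF model (attribute name is a guess),
# that sub-module could be exported in Hugging Face format for the Hub:
# model.bert.save_pretrained("my_dualcl_model")
```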
Hi, thank you for your exciting work. I've noticed a potential problem regarding the evaluation procedure. To the best of my knowledge, the best model is currently selected based on the test data. However, this is not desirable, since in real conditions it is not possible to choose the model based on the test data. Beyond making the reported numbers hard to compare, one probable issue is overfitting to the test set. Although the test data is not used for gradient updates, the model is chosen by its test performance, so we have no way of knowing whether the proposed model is simply better at leaking information through model selection. As an extreme case, if you randomly guess enough times on the test set, you can reach 100%. That is generally why a validation split is used in prior works [1].
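The protocol being asked for can be sketched in a few lines (the per-epoch accuracy lists are illustrative, not numbers from the repo):

```python
# Select the checkpoint by validation accuracy, then report the test
# accuracy of that single checkpoint; the test set never drives the choice.
def select_and_report(val_accs, test_accs):
    # val_accs / test_accs: per-epoch accuracies from the same run
    best_epoch = max(range(len(val_accs)), key=lambda e: val_accs[e])
    return best_epoch, test_accs[best_epoch]

val = [0.70, 0.78, 0.81, 0.80]
test = [0.68, 0.75, 0.79, 0.82]
epoch, reported = select_and_report(val, test)
print(epoch, reported)  # 2 0.79 -- epoch 2 wins on validation; 0.82 is never seen
```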
Dear author, your framework works well on English datasets, but when I used the dual contrastive loss on my Chinese dataset, gradient collapse occurred. Each of my Chinese labels is two characters; could that be the cause? Or do I need to adjust something? Thank you very much. I look forward to hearing from you soon.
Your work is very good and effective, but I have some questions about the baseline approach. I tried different hyperparameters to fine-tune BERT with supervised or unsupervised contrastive learning before classifying, but I've never been able to do better than plain cross-entropy. I wonder what I failed to take into account? Many papers report that contrastive learning helps classification, yet here I always get the opposite. Could you share the hyperparameters you used when running the comparison?
Hi there!
I think this code and paper are awesome!
When I run the code, I can see the accuracy increasing,
but I'd also like to see how the class representations and sentence feature representations move.
Could you please upload the t-SNE visualization code to GitHub as well?
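In case it helps while waiting, a minimal t-SNE projection of pooled features can be sketched with scikit-learn; here `feats` and `labels` are random placeholders standing in for the model's real sentence/class representations:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
feats = rng.normal(size=(60, 16))      # stand-in for pooled sentence features
labels = rng.integers(0, 4, size=60)   # stand-in for the class of each sentence

# Project the high-dimensional features to 2-D for plotting.
emb = TSNE(n_components=2, perplexity=15, init="pca",
           random_state=0).fit_transform(feats)
print(emb.shape)  # (60, 2): one 2-D point per sentence

# Plotting is then a one-liner with matplotlib, coloring by class, e.g.
# plt.scatter(emb[:, 0], emb[:, 1], c=labels)
```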
It's great work, but I have a question: why does DualCL specifically shuffle the labels out of order? This operation does not change the true label in binary classification, but it does change it in multi-class classification. I don't understand the significance of this.
In fact, I followed this setup and trained on my own dataset, a binary classification task like dialogue intention recognition, for 30 epochs using RoBERTa, with very poor results. Isn't DualCL suitable for this kind of task? I hope you can point out my misunderstanding.