zliucr / crossner
CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)
License: MIT License
Hello,
Thank you for sharing the code and the datasets.
I am trying to reproduce the experiments, but I am getting an error because the vocab.txt file is not present in any of the domain folders.
I am trying to run the baseline:
python main.py --exp_name politics_bilstm_wordchar --exp_id 1 --tgt_dm politics --bilstm --dropout 0.3 --lr 1e-3 --usechar --emb_dim 400
but I am getting the error:
FileNotFoundError: [Errno 2] No such file or directory: 'ner_data/conll2003/vocab.txt'
Maybe I am missing a step needed to generate the vocab files.
Regards,
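PS: In case it helps to show what I mean, here is the rough script I was planning to use to build vocab.txt myself. I am assuming each data file has one "token label" pair per line with blank lines between sentences, and that vocab.txt is just one token per line — please correct me if main.py expects a different format.

import os

# Hypothetical vocab builder; the file layout and output format are my guesses.
def build_vocab(data_dir, splits=("train.txt", "dev.txt", "test.txt")):
    vocab, seen = [], set()
    for split in splits:
        path = os.path.join(data_dir, split)
        if not os.path.exists(path):
            continue
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:                  # blank line = sentence boundary
                    continue
                token = line.split()[0]       # first column = word, second = NER tag
                if token not in seen:
                    seen.add(token)
                    vocab.append(token)
    with open(os.path.join(data_dir, "vocab.txt"), "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")

build_vocab("ner_data/conll2003")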
Hi,
Thanks for open-sourcing your work. I was exploring this repo and would like to reproduce the results.
Since domain-adaptive pre-training is compute-heavy and expensive, could you share the pre-trained weights to enable experimentation on your datasets?
For example, one would need "politics_spanlevel_integrated/pytorch_model.bin" to train any baseline for the politics domain.
It would be great if you could share these model files.
PS: The vocab.txt files are also missing from the data folder. Although one can create them easily, it would be great if you could share your version to ensure consistency.
Thanks,
-Nitesh
Fixed it by replacing max_len with model_max_length.
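For anyone hitting the same issue: newer versions of transformers renamed the tokenizer attribute max_len to model_max_length, so the change looks roughly like this (the exact line in run_language_modeling.py may differ):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# old attribute, removed in newer transformers releases:
# block_size = tokenizer.max_len

# replacement that works on current versions:
block_size = tokenizer.model_max_length
print(block_size)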
According to the preprint (https://arxiv.org/pdf/2012.04373.pdf), the "AI" domain is not supposed to have the entity "programlang", but that entity appears in https://github.com/zliucr/CrossNER/blob/main/ner_data/ai/train.txt#L96. Could you clarify the correct list of entities for AI and the other domains? Thank you.
Hi,
Thank you for open-sourcing your work. In run_language_modeling.py, I notice that you set "num_train_epochs" to 15. Is there a reason for doing that? The default value in the Hugging Face script is 3. Also, there isn't an evaluation file; is there any risk of overfitting?
Hi, I am a little confused about what "Pre-train" means in this paper. It sometimes seems to refer to the span-level MLM task and sometimes to the NER task.
According to the repo, the pre-training on the source domain in Pre-train then Fine-tune performs the NER task on the source domain rather than the MLM task. So the main difference between Pre-train then Fine-tune and Jointly Train is whether you first train on the source domain, select the best model, and then train on the target domain, or instead mix the source-domain and target-domain data (including the target-domain augmentation) in a single training stage. Do I understand it correctly?
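To make my question concrete, here is a rough pseudo-code sketch of the two schemes as I currently understand them; the function names are placeholders, not the repo's actual API:

# Placeholder stubs so the sketch runs; they stand in for the real model and training code.
def init_ner_model(): return {"weights": None}
def train(model, data): return model
def select_best_checkpoint(model): return model

def pretrain_then_finetune(source_data, target_data):
    model = init_ner_model()
    model = train(model, source_data)        # stage 1: NER on the source domain (CoNLL)
    model = select_best_checkpoint(model)    # keep the best source-domain model
    return train(model, target_data)         # stage 2: fine-tune on the target domain

def jointly_train(source_data, target_data, target_augmented):
    model = init_ner_model()
    mixed = source_data + target_data + target_augmented   # one stage on the mixed data
    return train(model, mixed)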
Hello,
Thank you for this great work.
I am getting this error:
FileNotFoundError: [Errno 2] No such file or directory: 'ner_data/conll2003/vocab.txt'
Could you please provide the vocab file?
Thanks.
Hi, I have another question regarding the pre-training on the source domain (CoNLL) when doing pre-train then fine-tune.
Here: Line 121 in 2e7ba2a