
zliucr / crossner

115 · 4 · 24 · 2.33 MB

CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)

License: MIT License

Python 96.11% Shell 3.89%
named-entity-recognition ner cross-domain low-resource domain-adaptation sequence-labeling multi-domain multi-domain-adaptation corpora dataset

crossner's People

Contributors: zliucr

crossner's Issues

Vocab files

Hello,

Thank you for sharing the code and the datasets.
I am trying to reproduce the experiments, but I am getting an error because the vocab.txt file is not present in any of the domain folders. I am running the baseline:

python main.py --exp_name politics_bilstm_wordchar --exp_id 1 --tgt_dm politics --bilstm --dropout 0.3 --lr 1e-3 --usechar --emb_dim 400

but I am getting this error:

FileNotFoundError: [Errno 2] No such file or directory: 'ner_data/conll2003/vocab.txt'

Maybe I am missing a step to generate the vocab files.
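In case it helps others hitting the same error, the missing vocab.txt can plausibly be regenerated from the CoNLL-format splits in each data folder. The helper below is a sketch under two assumptions (check the repo's data loader before relying on it): that each non-blank line is "token TAG" separated by whitespace, and that vocab.txt simply lists one unique token per line. The split filenames train.txt/dev.txt/test.txt are also assumptions.

```python
# Hypothetical vocab.txt generator for CoNLL-style NER data.
# Assumes each non-blank line is "token TAG" and that vocab.txt
# is one unique token per line -- verify against the repo's loader.
import os


def build_vocab(data_dir, splits=("train.txt", "dev.txt", "test.txt")):
    vocab = {}
    for split in splits:
        path = os.path.join(data_dir, split)
        if not os.path.exists(path):
            continue  # skip missing splits
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # blank line = sentence boundary
                token = line.split()[0]
                vocab.setdefault(token, len(vocab))  # keep first-seen order
    out_path = os.path.join(data_dir, "vocab.txt")
    with open(out_path, "w", encoding="utf-8") as f:
        for token in vocab:
            f.write(token + "\n")
    return out_path
```

Run it once per domain folder (e.g. `build_vocab("ner_data/conll2003")`) before launching main.py.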

Regards,

Request to share pretrained model checkpoints

Hi,
Thanks for open-sourcing your work. I have been exploring this repo and would like to reproduce the results.
Since domain-adaptive pre-training is compute-heavy and expensive, could you share the pre-trained weights to enable experimentation on your datasets?
For example, one would need "politics_spanlevel_integrated/pytorch_model.bin" to train any baseline for the politics domain.
It would be great if you could share these model files.

PS: vocab.txt files are also missing in the data folder. Although one can create it easily, it would be great if you could share your version to ensure consistency.

Thanks,
-Nitesh

BERT Training Epoch Number

Hi,
Thank you for opening source you work. In run_language_modeling.py, I notice that you set "num_train_epochs" as 15. Is there any reason doing that? Because default value in huggingface script is 3. And there isn't an evaluation file. Is there any risk of overfitting?

Pre-train then Fine-tune Comparing with Jointly Train

Hi, I am a little confused about what "pre-train" means in this paper. Sometimes it seems to refer to the span-level MLM task and sometimes to the NER task.
According to the repo, the pre-training on the source domain in Pre-train then Fine-tune performs the NER task on the source domain rather than the MLM task. So the main difference between Pre-train then Fine-tune and Jointly Train is whether you first train on the source domain, select the best model, and then train on the target domain, or instead mix the source- and target-domain data (including the target-domain augmentation) in a single training stage. Do I understand this correctly?
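The two regimes described above can be sketched schematically. This is not the repo's actual training code, just a toy illustration of the control flow, with train_step standing in for one NER optimization step on a batch:

```python
# Toy sketch (not the repo's code) contrasting the two training regimes.
# train_step(model, batch) stands in for one NER training pass.

def pretrain_then_finetune(model, source_data, target_data, train_step):
    # Stage 1: train on source-domain NER only (best checkpoint kept).
    for batch in source_data:
        model = train_step(model, batch)
    # Stage 2: fine-tune that checkpoint on target-domain NER.
    for batch in target_data:
        model = train_step(model, batch)
    return model


def jointly_train(model, source_data, target_data, train_step):
    # Single stage: source and target batches mixed in one training run.
    for batch in source_data + target_data:
        model = train_step(model, batch)
    return model
```

Both see the same data overall; the difference is whether the source domain is consumed in a separate first stage or interleaved with the target domain.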

Vocab file

Hello,

Thank you for this great work.

I am getting this error:
FileNotFoundError: [Errno 2] No such file or directory: 'ner_data/conll2003/vocab.txt'

Could you please provide the vocab file?

Thanks.
