salt-nlp / disfluency-generation-and-detection

Code for "Planning and Generating Natural and Diverse Disfluent Texts as Augmentation for Disfluency Detection"

License: MIT License

Python 98.56% Shell 1.44%
disfluency-detection dataaugmentation textgeneration

Disfluency-Generation-and-Detection

This repo contains the code for the following paper:

Jingfeng Yang, Zhaoran Ma, Diyi Yang: Planning and Generating Natural and Diverse Disfluent Texts as Augmentation for Disfluency Detection. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP'2020)

If you would like to refer to it, please cite the paper mentioned above.

Getting Started

These instructions will get the code running.

Requirements

  • Python 3.6 or higher
  • PyTorch >= 1.3.0
  • pytorch_transformers (also known as transformers)
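
The version requirements above can be checked before launching a long training run. The following is a minimal sketch, not part of the repo: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and only the Python and PyTorch minimums are checked because the README does not state a transformers version.

```python
import re
import sys

# Minimums taken from the Requirements list above.
MIN_PYTHON = (3, 6)
MIN_TORCH = (1, 3, 0)


def parse_version(text):
    """Turn a string such as '1.3.0+cu100' into the tuple (1, 3, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(version_text, minimum):
    """Compare numerically, so that '1.10.0' counts as newer than '1.3.0'."""
    return parse_version(version_text) >= minimum


def check_environment():
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.6 or higher is required")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch.__version__, MIN_TORCH):
            problems.append(f"PyTorch >= 1.3.0 is required, found {torch.__version__}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Versions are compared as integer tuples rather than strings because a plain string comparison would rank "1.10.0" below "1.3.0".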

Planner and Generator Disfluency Generation

cd disf_gen_coarse2fine &&
python train.py -learning_rate 0.001 -no_share_emb_layout_encoder -seprate_encoder -batch_size 64 -max_grad_norm 0.1 -layout_weight 1 -optim adam &&
python evaluate.py &&
cd ..
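
For hyperparameter sweeps it may be convenient to assemble the training command above from Python instead of editing the shell line by hand. This is only a sketch under assumptions: the flag names and values are copied verbatim from the command above (including the `-seprate_encoder` spelling), while `build_train_command` and `run_generation_pipeline` are hypothetical helpers that do not exist in the repo.

```python
import subprocess

# Options copied from the README command; True marks a bare switch.
TRAIN_OPTIONS = {
    "learning_rate": 0.001,
    "no_share_emb_layout_encoder": True,
    "seprate_encoder": True,  # spelled exactly as in the README command
    "batch_size": 64,
    "max_grad_norm": 0.1,
    "layout_weight": 1,
    "optim": "adam",
}


def build_train_command(options, script="train.py"):
    """Convert an options dict into an argv list using single-dash flags."""
    argv = ["python", script]
    for name, value in options.items():
        if value is True:
            argv.append(f"-{name}")  # bare switch, no value
        elif value is False or value is None:
            continue  # switch turned off: omit it entirely
        else:
            argv.extend([f"-{name}", str(value)])
    return argv


def run_generation_pipeline(workdir="disf_gen_coarse2fine"):
    """Train, then evaluate; check=True stops at the first failure, like '&&'."""
    subprocess.run(build_train_command(TRAIN_OPTIONS), cwd=workdir, check=True)
    subprocess.run(["python", "evaluate.py"], cwd=workdir, check=True)
```

Calling `run_generation_pipeline()` from the repository root would mirror the shell sequence above; passing `cwd=workdir` plays the role of the `cd` steps.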

Heuristic Planner + GPT2 Generator for data augmentation

cd disfluency-detection &&
CUDA_VISIBLE_DEVICES=0 python transformers/examples/run_language_modeling.py --output_dir=news3m_ml_finetune_st --model_type=gpt2 --model_name_or_path=gpt2 --do_train --train_data_file=news_3m --do_eval --eval_data_file=swbd_LM_val --line_by_line --eval_all_checkpoints --num_train_epochs 6 --logging_steps 6000 --save_steps 6000 &&
python createFakeLMdist.py -infile news_to_fake_3m -outfile news_fake_3m_newstune360000_mp -model_path news3m_ml_finetune_st/checkpoint-360000 -gpu 2222333333555555 &&
python writePretrain.py &&
cd ..
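
The command above hard-codes `checkpoint-360000` as the model path. With `--save_steps 6000` the fine-tuning run writes one `checkpoint-<step>` directory every 6000 steps, and the final step count depends on the dataset and batch size, which the README does not state. The sketch below is one way to look up the available checkpoints instead of assuming a step number; `list_checkpoints` and `latest_checkpoint` are hypothetical helpers, not part of the repo.

```python
import os
import re

# Matches directory names such as 'checkpoint-360000'.
CHECKPOINT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def list_checkpoints(output_dir):
    """Return (step, path) pairs for checkpoint directories, sorted by step."""
    found = []
    for name in os.listdir(output_dir):
        match = CHECKPOINT_PATTERN.match(name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path):
            found.append((int(match.group(1)), path))
    return sorted(found)


def latest_checkpoint(output_dir):
    """Return the path of the highest-step checkpoint, or None if there is none."""
    checkpoints = list_checkpoints(output_dir)
    return checkpoints[-1][1] if checkpoints else None
```

For example, `latest_checkpoint("news3m_ml_finetune_st")` could supply the value passed to `-model_path`. Steps are sorted as integers because a lexicographic sort would place `checkpoint-6000` after `checkpoint-360000`.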

Disfluency detection w/ or w/o augmented data

cd disfluency-detection &&
python trainBertPretrain.py &&   # alternatively: python trainBertPretrain.py -p
cd ..

Acknowledgement

The disfluency generation code is adapted from OpenNMT and Coarse2fine Semantic Parsing.

Contributors

jiaaoc, jingfengyang

disfluency-generation-and-detection's Issues

Error during generation

I get this error and I honestly have no idea why this is the case. Can someone give me an explanation, and possibly some guidance?

transformer version

Thanks a lot for open-sourcing this, but I can't find the version of transformers in the README.
Could you please tell me which version of transformers is required?

where are "train.pt/valid.pt/test.pt"?

There are no train.pt/valid.pt/test.pt files in swbdIO, only train.txt/valid.txt/test.txt. However, the *.pt files are needed at the beginning of the main() function in train.py. Looking forward to your reply. Thank you very much.
