
pariajm / joint-disfluency-detector-and-parser

46 stars · 1 watcher · 11 forks · 367 KB

Improving Disfluency Detection by Self-Training a Self-Attentive Model

Home Page: https://www.aclweb.org/anthology/2020.acl-main.346/

License: MIT License

Languages: Python 77.10%, C 20.69%, Scilab 2.12%, Perl 0.05%, Makefile 0.03%
Topics: constituency-parsing, switchboard-trees, pretrained-models, speech-transcripts, disfluency-detection, bert-based-parser, elmo-based-parser, self-attentive-disfluency-detector, transformer-disfluency-detection, fisher-trees


joint-disfluency-detector-and-parser's Issues

Trying to parse new sentences gives runtime error

When I run the parser on the default sentences in best_models/raw_sentences.txt, it parses them correctly. But when I make any change to those sentences or add any new sentence, I get a runtime error:
return self.layer_norm(outputs + residual), attns_padded
RuntimeError: The size of tensor a (51) must match the size of tensor b (38) at non-singleton dimension 0

PyTorch version: 1.8.1+cu101
Python version: 3.7.10

Any idea what the issue is? Can't we use the pretrained model (swbd_fisher_bert_Edev.0.9078) to parse our own sentences?

Runtime error

When I run:
python src/main.py parse --input-path best_models/raw_sentences.txt --output-path best_models/parsed_sentences.txt --model-path-base best_models/swbd_fisher_bert_Edev.0.9078.pt >best_models/out.log

I get the error:
File "src/main.py", line 657, in
main()
File "src/main.py", line 653, in main
args.callback(args)
File "src/main.py", line 530, in run_parse
predicted, _ = parser.parse_batch(subbatch_sentences)
File "/mnt/home/v_shizhan03/joint-disfluency-detector-and-parser/src/parse_nk.py", line 1027, in parse_batch
annotations, _ = self.encoder(emb_idxs, batch_idxs, extra_content_annotations=extra_content_annotations)
File "/mnt/home/v_shizhan03/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/home/v_shizhan03/joint-disfluency-detector-and-parser/src/parse_nk.py", line 615, in forward
res, current_attns = attn(res, batch_idxs)
File "/mnt/home/v_shizhan03/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/home/v_shizhan03/joint-disfluency-detector-and-parser/src/parse_nk.py", line 358, in forward
return self.layer_norm(outputs + residual), attns_padded
RuntimeError: The size of tensor a (102) must match the size of tensor b (82) at non-singleton dimension 0

Any clues?

How to parse a sentence faster?

When I run the parser on a GPU, a single sentence takes about 6 seconds, yet parsing 5,000 sentences takes only about 40 seconds. Is there any way to parse a single sentence faster?
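
Most of that 6 seconds is likely fixed overhead (loading the checkpoint and initializing CUDA on every invocation of src/main.py), which is why 5,000 sentences in one run take only about 40 seconds. A workaround is to load the model once and keep it in memory, then call parse_batch on batches of sentences. The sketch below illustrates the idea; the torch.load/from_spec loading step and the input format are assumptions based on the self-attentive parser this repo builds on, while parse_batch itself is the method visible in the traceback above.

    import torch
    import parse_nk  # from the repo's src/ directory

    # Load the checkpoint once and keep the parser resident in memory.
    # (from_spec and the checkpoint keys are assumptions; check src/main.py's
    # run_parse for the exact loading code used by this repo.)
    info = torch.load("best_models/swbd_fisher_bert_Edev.0.9078.pt")
    parser = parse_nk.NKChartParser.from_spec(info["spec"], info["state_dict"])

    def parse_sentences(sentences, batch_size=64):
        # sentences: a list of tokenized sentences, each a list of (tag, word)
        # pairs (this input format is also an assumption).
        trees = []
        for start in range(0, len(sentences), batch_size):
            subbatch = sentences[start:start + batch_size]
            # parse_batch is the call shown in the traceback above.
            predicted, _ = parser.parse_batch(subbatch)
            trees.extend(predicted)
        return trees

With the model kept resident like this, per-sentence latency should be dominated by a single forward pass and chart decode rather than by model loading.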

Calling cuda() with async results in SyntaxError

The error occurs on the .cuda(async=True) call in this block:

if use_cuda:
    torch_t = torch.cuda
    def from_numpy(ndarray):
        return torch.from_numpy(ndarray).pin_memory().cuda(async=True)

The reason for this error is that async became a reserved keyword in Python 3.7, so it can no longer be used as an argument name. The signature of the cuda() method has changed as well; it now looks like this:

cuda(device=None, non_blocking=False)

async=True can be replaced by non_blocking=True.

non_blocking (bool):
If True and the source is in pinned memory, the copy will be asynchronous with respect to the host. Otherwise, the argument has no effect. Default: False.
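
With that change, the corrected snippet would look like this (the same code as above, with async=True replaced by non_blocking=True):

    if use_cuda:
        torch_t = torch.cuda
        def from_numpy(ndarray):
            # non_blocking=True replaces the removed async=True argument; the copy
            # is asynchronous only because the source tensor is in pinned memory.
            return torch.from_numpy(ndarray).pin_memory().cuda(non_blocking=True)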

parse_nk.py does not support BERT

I was not able to run the given scripts with a BERT model. It seems that parse_nk.py does not support BERT models yet. Please let me know if that is the case and whether the repo will be updated.
